# Genomic Consultant

A practical genomics analysis toolkit for trio WES (whole exome sequencing) data, including ClinVar/ACMG annotation, GWAS trait analysis, and pharmacogenomics.

## Analysis Scripts

### 1. Trio Analysis (`trio_analysis.py`)
Trio-based variant analysis covering de novo detection, compound heterozygosity, and inheritance pattern annotation.

```bash
python trio_analysis.py <vcf_path> <output_dir>
```

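As a rough illustration of what de novo detection involves, the sketch below flags a site when the proband carries a non-reference allele that neither parent has. It is not taken from `trio_analysis.py`; the function names `parse_gt` and `is_de_novo_candidate` are hypothetical, and a real pipeline would also filter on depth and genotype quality.

```python
from typing import Optional, Tuple


def parse_gt(gt: str) -> Optional[Tuple[int, int]]:
    """Parse a diploid VCF GT string such as '0/1' or '1|0'.

    Returns None for missing or non-diploid genotypes.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return int(alleles[0]), int(alleles[1])


def is_de_novo_candidate(mother_gt: str, father_gt: str, proband_gt: str) -> bool:
    """True if the proband has a non-reference allele absent from both parents.

    Assumption: any missing genotype in the trio makes the site uncallable.
    """
    mother, father, proband = (parse_gt(g) for g in (mother_gt, father_gt, proband_gt))
    if mother is None or father is None or proband is None:
        return False
    parental_alleles = set(mother) | set(father)
    return any(a != 0 and a not in parental_alleles for a in proband)


if __name__ == "__main__":
    print(is_de_novo_candidate("0/0", "0/0", "0/1"))  # proband-only variant
    print(is_de_novo_candidate("0/1", "0/0", "0/1"))  # inherited from mother
```

The argument order matches the sample index mapping used in this README (mother, father, proband).
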
### 2. ClinVar/ACMG Annotation (`clinvar_acmg_annotate.py`)
Annotates variants with ClinVar clinical significance and generates ACMG-style evidence tags.

```bash
python clinvar_acmg_annotate.py <vcf_path> <output_path> [sample_idx]
```

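One way such tagging could work is sketched below: read the `CLNSIG` key from a ClinVar-annotated INFO column and map it to a tag. The mapping to `PP5` and `BP6` (the "reputable source" criteria in the 2015 ACMG/AMP guidelines) is an assumption made for illustration; the tags that `clinvar_acmg_annotate.py` actually emits are not documented here, and `parse_info` and `evidence_tags` are hypothetical names.

```python
import re
from typing import Dict, List


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key/value dict (flag keys map to '')."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def evidence_tags(info: str) -> List[str]:
    """Map ClinVar CLNSIG terms to illustrative ACMG-style tags (assumed mapping)."""
    clnsig = parse_info(info).get("CLNSIG", "")
    # CLNSIG may combine terms, e.g. 'Pathogenic/Likely_pathogenic'.
    terms = {t.lower() for t in re.split(r"[/|,]", clnsig) if t}
    tags: List[str] = []
    if terms & {"pathogenic", "likely_pathogenic"}:
        tags.append("PP5")
    if terms & {"benign", "likely_benign"}:
        tags.append("BP6")
    return tags


if __name__ == "__main__":
    print(evidence_tags("ALLELEID=1;CLNSIG=Pathogenic/Likely_pathogenic"))
    print(evidence_tags("CLNSIG=Uncertain_significance"))
```

Conflicting or uncertain classifications yield no tag in this sketch, which keeps them visible for manual review instead of forcing a call.
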
### 3. GWAS Comprehensive Analysis (`gwas_comprehensive.py`)
GWAS trait analysis with 201 curated SNPs across 18 categories:
- Gout / Uric acid metabolism
- Kidney disease
- Hearing loss
- Autoimmune diseases
- Cancer risk
- Blood clotting / Thrombosis
- Thyroid disorders
- Bone health / Osteoporosis
- Liver disease (NAFLD)
- Migraine
- Longevity / Aging
- Sleep
- Skin conditions
- Cardiovascular disease
- Metabolic disorders
- Eye conditions
- Neuropsychiatric
- Other traits

```bash
python gwas_comprehensive.py <vcf_path> <output_path> [sample_idx]
```

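A curated SNP lookup of this kind typically comes down to counting how many copies of a risk allele a sample carries. The sketch below shows that step only; the `CURATED` table, its placeholder rsIDs, and the function name are hypothetical and do not reflect the contents of `gwas_comprehensive.py`. It also assumes the risk allele is reported on the same strand as the VCF.

```python
from typing import Dict, List, NamedTuple, Optional


class RiskSnp(NamedTuple):
    category: str
    risk_allele: str


# Hypothetical entries with placeholder rsIDs, for illustration only.
CURATED: Dict[str, RiskSnp] = {
    "rs_example_1": RiskSnp("Gout / Uric acid metabolism", "T"),
    "rs_example_2": RiskSnp("Sleep", "A"),
}


def risk_allele_count(ref: str, alts: List[str], gt: str, risk_allele: str) -> Optional[int]:
    """Count copies of risk_allele in a genotype; None if the call is missing."""
    alleles = [ref] + alts  # VCF allele indices: 0 = REF, 1.. = ALT
    indices = gt.replace("|", "/").split("/")
    if "." in indices:
        return None
    return sum(1 for i in indices if alleles[int(i)] == risk_allele)


if __name__ == "__main__":
    snp = CURATED["rs_example_1"]
    count = risk_allele_count("C", ["T"], "0/1", snp.risk_allele)
    print(f"{snp.category}: {count} risk allele(s)")
```

Note that the risk allele can be the reference allele, so a homozygous-reference genotype may still carry two risk copies.
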
### 4. PharmGKB Full Analysis (`pharmgkb_full_analysis.py`)
Pharmacogenomics analysis using the PharmGKB clinical annotations database.

```bash
python pharmgkb_full_analysis.py <vcf_path> <output_path> [sample_idx]
```

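The core of an analysis like this is a join between the rsIDs present in a sample and the rows of the clinical annotations TSV. The sketch below shows that join on an in-memory table. The column names (`Variant/Haplotypes`, `Gene`, `Level of Evidence`, `Drug(s)`) are assumptions about the TSV layout and should be checked against the downloaded file; the rows are placeholders, not real annotations.

```python
import csv
import io
from typing import Dict, List, Set, TextIO

# Placeholder table standing in for the PharmGKB clinical annotations TSV.
# Column names are assumed; the rows are invented for illustration.
SAMPLE_TSV = (
    "Variant/Haplotypes\tGene\tLevel of Evidence\tDrug(s)\n"
    "rs_example_1\tGENE_A\t1A\tdrug_a\n"
    "rs_example_2\tGENE_B\t3\tdrug_b\n"
)


def annotations_for(rsids: Set[str], tsv_handle: TextIO) -> List[Dict[str, str]]:
    """Return the annotation rows whose variant column matches one of rsids."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return [row for row in reader if row["Variant/Haplotypes"] in rsids]


if __name__ == "__main__":
    hits = annotations_for({"rs_example_1"}, io.StringIO(SAMPLE_TSV))
    for row in hits:
        print(row["Gene"], row["Drug(s)"], row["Level of Evidence"])
```

Passing a file handle instead of a path keeps the function easy to test without the real database on disk.
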
### 5. GWAS Trait Lookup (`gwas_trait_lookup.py`)
The original curated GWAS trait lookup, using a smaller SNP set than `gwas_comprehensive.py`.

```bash
python gwas_trait_lookup.py <vcf_path> <output_path> [sample_idx]
```

### 6. Basic Pharmacogenomics (`pharmacogenomics.py`)
Basic pharmacogenomics analysis with common drug-gene interactions.

## Prerequisites

- Python 3.8+
- A conda environment with bioinformatics tools:
```bash
conda create -n genomics python=3.10
conda activate genomics
conda install -c bioconda bcftools snpeff gatk4
```

## Reference Databases Required

- **ClinVar**: VCF from NCBI
- **PharmGKB**: Clinical annotations TSV
- **dbSNP**: For rsID annotation
- **GRCh37/hg19 reference genome**

## Data Directory Structure

```
/Volumes/NV2/
├── genomics_analysis/
│   └── vcf/
│       ├── trio_joint.vcf.gz # Joint-called VCF
│       ├── trio_joint.rsid.vcf.gz # With rsID annotations
│       └── trio_joint.snpeff.vcf # With SnpEff annotations
└── genomics_reference/
    ├── clinvar/
    ├── pharmgkb/
    ├── dbsnp/
    └── gwas_catalog/
```

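Before running the scripts it can help to confirm that this layout is in place. The following is a minimal sketch of such a check, not part of the toolkit; `EXPECTED` simply restates the paths from the tree above, and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# Paths relative to the data root, copied from the directory tree above.
EXPECTED = [
    "genomics_analysis/vcf/trio_joint.vcf.gz",
    "genomics_analysis/vcf/trio_joint.rsid.vcf.gz",
    "genomics_analysis/vcf/trio_joint.snpeff.vcf",
    "genomics_reference/clinvar",
    "genomics_reference/pharmgkb",
    "genomics_reference/dbsnp",
    "genomics_reference/gwas_catalog",
]


def missing_paths(root: str) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("/Volumes/NV2"):
        print(f"missing: {rel}")
```
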
## Sample Index Mapping

For trio VCF files:
- Index 0: Mother
- Index 1: Father
- Index 2: Proband

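To show how a `sample_idx` value relates to the VCF columns, here is a minimal sketch that pulls one sample's genotype out of a data line. A VCF with genotypes has nine fixed columns (`CHROM` through `FORMAT`) followed by one column per sample, so index 0 is the tenth column. The helper name `sample_genotype` is hypothetical, and the mapping assumes the samples appear in the order listed above.

```python
TRIO_ROLES = {0: "Mother", 1: "Father", 2: "Proband"}
FIXED_COLUMNS = 9  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT


def sample_genotype(vcf_line: str, sample_idx: int) -> str:
    """Return the GT value for the sample at sample_idx in a VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")
    sample_values = cols[FIXED_COLUMNS + sample_idx].split(":")
    return sample_values[format_keys.index("GT")]


if __name__ == "__main__":
    line = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:30\t0/0:28\t0/1:35"
    for idx, role in TRIO_ROLES.items():
        print(role, sample_genotype(line, idx))
```

The actual column order is set by the `#CHROM` header line of the VCF, so it is worth checking that header before relying on these indices.
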
## Output Reports

Each script generates a report that includes:
- Summary statistics
- Risk variant identification
- Family comparison (for trio data)
- Clinical annotations and recommendations

## License

Private use only.