genomic-consultant/README.md
gbanyan d13d58df8b Refactor: Replace scaffolding with working analysis scripts
- Add trio_analysis.py for trio-based variant analysis with de novo detection
- Add clinvar_acmg_annotate.py for ClinVar/ACMG annotation
- Add gwas_comprehensive.py with 201 SNPs across 18 categories
- Add pharmgkb_full_analysis.py for pharmacogenomics analysis
- Add gwas_trait_lookup.py for basic GWAS trait lookup
- Add pharmacogenomics.py for basic PGx analysis
- Remove unused scaffolding code (src/, configs/, docs/, tests/)
- Update README.md with new documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 22:36:02 +08:00

# Genomic Consultant
A practical toolkit for analyzing trio WES (whole-exome sequencing) data, covering ClinVar/ACMG annotation, GWAS trait analysis, and pharmacogenomics.
## Analysis Scripts
### 1. Trio Analysis (`trio_analysis.py`)
Comprehensive trio-based variant analysis with de novo detection, compound heterozygosity, and inheritance pattern annotation.
```bash
python trio_analysis.py <vcf_path> <output_dir>
```
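As a rough illustration of what de novo detection involves, the sketch below flags a site as a candidate when both parents are homozygous reference and the proband carries an alternate allele. It is not taken from `trio_analysis.py`; the function names `parse_gt` and `is_candidate_de_novo` are hypothetical, and a real caller would likely also filter on depth and genotype quality.

```python
from typing import Optional, Tuple


def parse_gt(gt: str) -> Optional[Tuple[int, ...]]:
    """Parse a VCF GT string such as '0/1' or '0|1'.

    Returns a sorted tuple of allele indices, or None if any allele is missing.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    return tuple(sorted(int(a) for a in alleles))


def is_candidate_de_novo(mother_gt: str, father_gt: str, proband_gt: str) -> bool:
    """Return True when both parents are hom-ref and the proband carries an ALT allele.

    Hypothetical helper: sites with any missing genotype are never flagged.
    """
    mother, father, proband = (parse_gt(g) for g in (mother_gt, father_gt, proband_gt))
    if mother is None or father is None or proband is None:
        return False
    parents_hom_ref = all(a == 0 for a in mother + father)
    proband_carries_alt = any(a > 0 for a in proband)
    return parents_hom_ref and proband_carries_alt


if __name__ == "__main__":
    # Genotypes in the README's sample order: mother, father, proband
    print(is_candidate_de_novo("0/0", "0/0", "0/1"))
    print(is_candidate_de_novo("0/1", "0/0", "0/1"))
```

The first call describes a variant absent from both parents, the second one inherited from the mother.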
### 2. ClinVar/ACMG Annotation (`clinvar_acmg_annotate.py`)
Annotates variants with ClinVar clinical significance and generates ACMG-style evidence tags.
```bash
python clinvar_acmg_annotate.py <vcf_path> <output_path> [sample_idx]
```
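For orientation, the sketch below reads ClinVar's `CLNSIG` key from a VCF INFO string and checks whether it reports a pathogenic or likely pathogenic classification. It assumes the input VCF already carries `CLNSIG` in INFO, which may not match how `clinvar_acmg_annotate.py` joins ClinVar data; the helper names are hypothetical, and the ACMG-style evidence tags are not reproduced here.

```python
import re
from typing import Dict, Optional, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def clnsig_from_info(info: str) -> Optional[str]:
    """Return the raw CLNSIG value, or None when the key is absent."""
    value = parse_info(info).get("CLNSIG")
    return value if isinstance(value, str) else None


def is_reported_pathogenic(clnsig: str) -> bool:
    """True if any term is exactly 'Pathogenic' or 'Likely_pathogenic'.

    Terms are split on '|', '/' and ',' so that combined values such as
    'Pathogenic/Likely_pathogenic' are handled; conflicting classifications
    do not match because no single term equals either label.
    """
    terms = re.split(r"[|/,]", clnsig.lower())
    return any(t.strip("_ ") in {"pathogenic", "likely_pathogenic"} for t in terms)


if __name__ == "__main__":
    info = "AF=0.01;CLNSIG=Likely_pathogenic;DB"  # illustrative INFO string
    sig = clnsig_from_info(info)
    print(sig, is_reported_pathogenic(sig) if sig else None)
```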
### 3. GWAS Comprehensive Analysis (`gwas_comprehensive.py`)
Comprehensive GWAS trait analysis with 201 curated SNPs across 18 categories:
- Gout / Uric acid metabolism
- Kidney disease
- Hearing loss
- Autoimmune diseases
- Cancer risk
- Blood clotting / Thrombosis
- Thyroid disorders
- Bone health / Osteoporosis
- Liver disease (NAFLD)
- Migraine
- Longevity / Aging
- Sleep
- Skin conditions
- Cardiovascular disease
- Metabolic disorders
- Eye conditions
- Neuropsychiatric
- Other traits
```bash
python gwas_comprehensive.py <vcf_path> <output_path> [sample_idx]
```
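A GWAS lookup of this kind typically comes down to counting how many copies of a risk allele a sample carries at each SNP. The sketch below shows that step under simplifying assumptions: the risk allele is given on the same strand as the VCF's REF/ALT alleles, and the function name `count_risk_alleles` is hypothetical rather than taken from `gwas_comprehensive.py`.

```python
from typing import List, Optional


def count_risk_alleles(ref: str, alts: List[str], gt: str, risk_allele: str) -> Optional[int]:
    """Count copies of risk_allele in a genotype call.

    ref/alts come from the VCF REF and ALT columns, gt is the GT string
    (e.g. '0/1'). Returns None for missing calls. Assumes the risk allele
    is reported on the same strand as the VCF.
    """
    alleles = [ref] + alts
    calls = gt.replace("|", "/").split("/")
    if any(c in (".", "") for c in calls):
        return None
    return sum(1 for c in calls if alleles[int(c)] == risk_allele)


if __name__ == "__main__":
    # Illustrative site: REF=A, ALT=G, heterozygous call, risk allele G
    print(count_risk_alleles("A", ["G"], "0/1", "G"))
```

Returning `None` for missing calls keeps "no data" distinct from "zero risk alleles", which matters when comparing family members.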
### 4. PharmGKB Full Analysis (`pharmgkb_full_analysis.py`)
Comprehensive pharmacogenomics analysis using the PharmGKB clinical annotations database.
```bash
python pharmgkb_full_analysis.py <vcf_path> <output_path> [sample_idx]
```
### 5. GWAS Trait Lookup (`gwas_trait_lookup.py`)
The original curated GWAS trait lookup, covering a smaller SNP set than `gwas_comprehensive.py`.
```bash
python gwas_trait_lookup.py <vcf_path> <output_path> [sample_idx]
```
### 6. Basic Pharmacogenomics (`pharmacogenomics.py`)
Basic pharmacogenomics analysis with common drug-gene interactions.
## Prerequisites
- Python 3.8+
- conda environment with bioinformatics tools:
```bash
conda create -n genomics python=3.10
conda activate genomics
conda install -c bioconda bcftools snpeff gatk4
```
## Reference Databases Required
- **ClinVar**: VCF from NCBI
- **PharmGKB**: Clinical annotations TSV
- **dbSNP**: For rsID annotation
- **GRCh37/hg19 reference genome**
## Data Directory Structure
```
/Volumes/NV2/
├── genomics_analysis/
│   └── vcf/
│       ├── trio_joint.vcf.gz          # Joint-called VCF
│       ├── trio_joint.rsid.vcf.gz     # With rsID annotations
│       └── trio_joint.snpeff.vcf      # With SnpEff annotations
└── genomics_reference/
    ├── clinvar/
    ├── pharmgkb/
    ├── dbsnp/
    └── gwas_catalog/
```
## Sample Index Mapping
For trio VCF files, the optional `[sample_idx]` argument selects a sample by index:
- Index 0: Mother
- Index 1: Father
- Index 2: Proband
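Before relying on this mapping, it can help to confirm the sample order in your own VCF. The sketch below reads the sample names from a `#CHROM` header line (the sample columns follow the nine fixed VCF columns) and pairs them with the roles listed above. The helper names and the sample IDs in the example are hypothetical.

```python
from typing import Dict, List

# Role assignment as documented in this README
ROLES = {0: "Mother", 1: "Father", 2: "Proband"}


def sample_names(header_line: str) -> List[str]:
    """Return sample names from a VCF '#CHROM' header line.

    The first nine tab-separated columns (#CHROM .. FORMAT) are fixed;
    everything after them is a sample column.
    """
    columns = header_line.rstrip("\n").split("\t")
    return columns[9:]


def role_map(header_line: str) -> Dict[str, str]:
    """Map each documented role to the sample name at that index."""
    names = sample_names(header_line)
    return {ROLES[i]: name for i, name in enumerate(names) if i in ROLES}


if __name__ == "__main__":
    # Hypothetical header; S1..S3 stand in for real sample IDs
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3"
    print(role_map(header))
```

If the names printed for each role do not match your family members, the VCF's sample order differs from the one assumed here and the index passed to the scripts should be adjusted accordingly.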
## Output Reports
Each script generates detailed reports including:
- Summary statistics
- Risk variant identification
- Family comparison (for trio data)
- Clinical annotations and recommendations
## License
Private use only.