genomic-consultant/README.md
gbanyan d13d58df8b Refactor: Replace scaffolding with working analysis scripts
- Add trio_analysis.py for trio-based variant analysis with de novo detection
- Add clinvar_acmg_annotate.py for ClinVar/ACMG annotation
- Add gwas_comprehensive.py with 201 SNPs across 18 categories
- Add pharmgkb_full_analysis.py for pharmacogenomics analysis
- Add gwas_trait_lookup.py for basic GWAS trait lookup
- Add pharmacogenomics.py for basic PGx analysis
- Remove unused scaffolding code (src/, configs/, docs/, tests/)
- Update README.md with new documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 22:36:02 +08:00

Genomic Consultant

A practical genomics toolkit for analyzing trio WES (whole exome sequencing) data, covering ClinVar/ACMG annotation, GWAS trait analysis, and pharmacogenomics.

Analysis Scripts

1. Trio Analysis (trio_analysis.py)

Trio-based variant analysis covering de novo detection, compound heterozygosity, and inheritance pattern annotation.

python trio_analysis.py <vcf_path> <output_dir>
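
As a rough illustration of what de novo detection involves, the sketch below flags a proband genotype that carries a non-reference allele found in neither parent. It is not taken from trio_analysis.py: the function names `parse_gt` and `is_de_novo_candidate` are hypothetical, and a real pipeline would normally also filter on read depth and genotype quality.

```python
def parse_gt(gt):
    """Split a VCF GT string such as '0/1' or '0|1' into allele indices.

    Returns None when any allele is missing ('.').
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return alleles


def is_de_novo_candidate(mother_gt, father_gt, proband_gt):
    """True if the proband has a non-reference allele absent from both parents."""
    mother = parse_gt(mother_gt)
    father = parse_gt(father_gt)
    proband = parse_gt(proband_gt)
    if mother is None or father is None or proband is None:
        return False  # cannot judge inheritance with a missing call
    parental = set(mother) | set(father)
    return any(a != "0" and a not in parental for a in proband)
```

For example, `is_de_novo_candidate("0/0", "0/0", "0/1")` is true, while a variant inherited from a heterozygous mother (`"0/1", "0/0", "0/1"`) is not flagged.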

2. ClinVar/ACMG Annotation (clinvar_acmg_annotate.py)

Annotates variants with ClinVar clinical significance and generates ACMG-style evidence tags.

python clinvar_acmg_annotate.py <vcf_path> <output_path> [sample_idx]
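
The ClinVar VCF from NCBI records clinical significance in the `CLNSIG` INFO key. A minimal sketch of reading that key from an INFO string is shown below; it assumes the key is present unchanged in the annotated VCF, and the helper names are hypothetical rather than taken from clinvar_acmg_annotate.py.

```python
def clinvar_significance(info):
    """Return the raw CLNSIG value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "CLNSIG":
            return value
    return None


def looks_pathogenic(info):
    """Crude flag (an assumption, not an ACMG rule): pathogenic or likely
    pathogenic, excluding conflicting interpretations."""
    value = clinvar_significance(info)
    if value is None:
        return False
    lowered = value.lower()
    return "pathogenic" in lowered and "conflicting" not in lowered
```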

3. GWAS Comprehensive Analysis (gwas_comprehensive.py)

Comprehensive GWAS trait analysis with 201 curated SNPs across 18 categories:

  • Gout / Uric acid metabolism
  • Kidney disease
  • Hearing loss
  • Autoimmune diseases
  • Cancer risk
  • Blood clotting / Thrombosis
  • Thyroid disorders
  • Bone health / Osteoporosis
  • Liver disease (NAFLD)
  • Migraine
  • Longevity / Aging
  • Sleep
  • Skin conditions
  • Cardiovascular disease
  • Metabolic disorders
  • Eye conditions
  • Neuropsychiatric
  • Other traits

python gwas_comprehensive.py <vcf_path> <output_path> [sample_idx]
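
At its core, a curated-SNP lookup of this kind comes down to counting how many copies of a risk allele a sample carries at a given rsID. The sketch below shows that step in isolation. The function name `count_risk_alleles` and its parameters are hypothetical, and it assumes the risk allele is reported on the same strand as the VCF; it is not the logic of gwas_comprehensive.py.

```python
def count_risk_alleles(ref, alts, gt, risk_allele):
    """Count copies of risk_allele in a VCF genotype.

    ref: REF allele, alts: list of ALT alleles, gt: GT string such as '0/1'.
    Returns None when the genotype call is missing.
    """
    alleles = [ref] + list(alts)  # GT index 0 is REF, 1 and up are ALT alleles
    indices = gt.replace("|", "/").split("/")
    if "." in indices:
        return None
    return sum(1 for i in indices if alleles[int(i)] == risk_allele)
```

A heterozygous call such as `count_risk_alleles("A", ["G"], "0/1", "G")` gives one copy, and a homozygous alternate call gives two.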

4. PharmGKB Full Analysis (pharmgkb_full_analysis.py)

Comprehensive pharmacogenomics analysis using the PharmGKB clinical annotations database.

python pharmgkb_full_analysis.py <vcf_path> <output_path> [sample_idx]

5. GWAS Trait Lookup (gwas_trait_lookup.py)

The original curated GWAS trait lookup, covering a smaller SNP set than gwas_comprehensive.py.

python gwas_trait_lookup.py <vcf_path> <output_path> [sample_idx]

6. Basic Pharmacogenomics (pharmacogenomics.py)

Basic pharmacogenomics analysis with common drug-gene interactions.

Prerequisites

  • Python 3.8+
  • conda environment with bioinformatics tools:
    conda create -n genomics python=3.10
    conda activate genomics
    conda install -c bioconda bcftools snpeff gatk4

Reference Databases Required

  • ClinVar: VCF from NCBI
  • PharmGKB: Clinical annotations TSV
  • dbSNP: For rsID annotation
  • GRCh37/hg19 reference genome

Data Directory Structure

/Volumes/NV2/
├── genomics_analysis/
│   └── vcf/
│       ├── trio_joint.vcf.gz          # Joint-called VCF
│       ├── trio_joint.rsid.vcf.gz     # With rsID annotations
│       └── trio_joint.snpeff.vcf      # With SnpEff annotations
└── genomics_reference/
    ├── clinvar/
    ├── pharmgkb/
    ├── dbsnp/
    └── gwas_catalog/
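
Before running the scripts, it can help to confirm that this layout is in place. The following is a small sketch of such a check: the relative paths mirror the tree above, while the `EXPECTED` list and the `missing_paths` helper are hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED = [
    "genomics_analysis/vcf/trio_joint.vcf.gz",
    "genomics_analysis/vcf/trio_joint.rsid.vcf.gz",
    "genomics_analysis/vcf/trio_joint.snpeff.vcf",
    "genomics_reference/clinvar",
    "genomics_reference/pharmgkb",
    "genomics_reference/dbsnp",
    "genomics_reference/gwas_catalog",
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_paths("/Volumes/NV2")` returns an empty list when every expected file and directory is present.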

Sample Index Mapping

For trio VCF files, the optional sample_idx argument selects one of the sample columns:

  • Index 0: Mother
  • Index 1: Father
  • Index 2: Proband
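
In a VCF, sample columns start after the nine fixed columns (CHROM through FORMAT), so a sample index of 0, 1, or 2 corresponds to column 9, 10, or 11. The sketch below shows how an index could be resolved to a sample name and genotype. The helper `sample_genotype` and the sample names in the usage example are hypothetical, and the column order is assumed to match the mapping above.

```python
FIXED_COLUMNS = 9  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT


def sample_genotype(header_line, record_line, sample_idx):
    """Return (sample_name, GT) for the sample at sample_idx.

    header_line is the '#CHROM ...' line; record_line is one variant line.
    """
    names = header_line.rstrip("\n").split("\t")[FIXED_COLUMNS:]
    fields = record_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")  # e.g. ['GT', 'DP']
    sample_values = fields[FIXED_COLUMNS + sample_idx].split(":")
    gt = sample_values[format_keys.index("GT")]
    return names[sample_idx], gt
```

With a header ending in `mother`, `father`, `proband` and a record whose last three columns are `0/0:30`, `0/1:28`, `0/1:35` under FORMAT `GT:DP`, index 2 resolves to `("proband", "0/1")`.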

Output Reports

Each script generates detailed reports including:

  • Summary statistics
  • Risk variant identification
  • Family comparison (for trio data)
  • Clinical annotations and recommendations

License

Private use only.