2025-11-28 11:52:04 +08:00


Genomic Consultant

Early design for a personal genomic risk and drug-interaction decision-support system. Requirements come from genomic_decision_support_system_spec_v0.1.md.

Vision (per spec)

  • Phase 1: trio variant calling, annotation, queryable genomic DB, initial ACMG evidence tagging.
  • Phase 2: pharmacogenomics (PGx) genotype-to-phenotype mapping plus drug-drug interaction (DDI) checks.
  • Phase 3: supplement/herb normalization and interaction risk layering.
  • Phase 4: LLM-driven query orchestration and report generation.

Repository Layout

  • docs/: system architecture notes, phase plans, and data models (work in progress).
  • configs/: example ACMG config and gene panel JSON.
  • configs/phenotype_to_genes.example.json: placeholder phenotype/HPO → gene mappings.
  • configs/phenotype_to_genes.hpo_seed.json: seed HPO mappings (to be replaced with full HPO/GenCC-derived panels).
  • sample_data/: a small annotated TSV for demos.
  • src/genomic_consultant/: Python scaffolding (pipelines, store, panel lookup, ACMG tagging, reporting).
  • genomic_decision_support_system_spec_v0.1.md: original requirements draft.

Contributing/next steps

  1. Finalize Phase 1 tech selection (variant caller, annotation stack, reference/DB versions).
  2. Stand up the Phase 1 pipelines and a minimal query API.
  3. Add the ACMG evidence-tagging config and human-review logging.
  4. Layer in the PGx/DDI (Phase 2) and supplement (Phase 3) modules.

Data safety: keep genomic and clinical data on local storage; the repository's .gitignore excludes common genomic output formats by default.
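To confirm that a particular file is actually excluded before committing, git check-ignore can be queried per path: it exits 0 when a path matches an ignore rule and 1 when it does not. The helper below is a minimal sketch around that command; the name report_ignored is hypothetical and not part of this project.

```shell
#!/usr/bin/env bash
# Sketch: report whether git would ignore each given path in the current repo.
# report_ignored is a hypothetical helper, not part of genomic-consultant.
set -euo pipefail

# report_ignored PATH...
# Prints "ignored: PATH" or "not ignored: PATH" for each argument, using the
# .gitignore rules of the repository in the current working directory.
# Files already tracked by git are reported as "not ignored".
report_ignored() {
  local path rc
  for path in "$@"; do
    rc=0
    # check-ignore: 0 = matches an ignore rule, 1 = no match, >1 = git error.
    git check-ignore -q "$path" || rc=$?
    case "$rc" in
      0) echo "ignored: $path" ;;
      1) echo "not ignored: $path" ;;
      *) echo "error checking: $path" >&2; return "$rc" ;;
    esac
  done
}
```

Run it from the repository root, e.g. `report_ignored runtime/demo.md sample_data/example_annotated.tsv`; what it reports depends on the project's actual .gitignore. Running `git check-ignore -v <path>` directly also prints which pattern matched.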

Quickstart (CLI scaffolding)

pip install -e .

# 1) Show the trio calling plan (commands are printed, not executed); each --sample is NAME:BAM_PATH
genomic-consultant plan-call \
  --sample proband:/data/proband.bam \
  --sample father:/data/father.bam \
  --sample mother:/data/mother.bam \
  --reference /refs/GRCh38.fa \
  --workdir /tmp/trio

# 1b) Execute the calling plan (requires GATK) and write a JSON run log
genomic-consultant run-call \
  --sample proband:/data/proband.bam \
  --sample father:/data/father.bam \
  --sample mother:/data/mother.bam \
  --reference /refs/GRCh38.fa \
  --workdir /tmp/trio \
  --log /tmp/trio/run_call_log.json \
  --probe-tools

# 2) Show annotation plan for a joint VCF
genomic-consultant plan-annotate \
  --vcf /tmp/trio/trio.joint.vcf.gz \
  --workdir /tmp/trio/annot \
  --prefix trio \
  --reference /refs/GRCh38.fa

# 2b) Execute the annotation plan (requires VEP and bcftools) and write a JSON run log
genomic-consultant run-annotate \
  --vcf /tmp/trio/trio.joint.vcf.gz \
  --workdir /tmp/trio/annot \
  --prefix trio \
  --reference /refs/GRCh38.fa \
  --log /tmp/trio/annot/run_annot_log.json \
  --probe-tools

# 3) Demo panel report on the sample data, with genes taken from a panel file
genomic-consultant panel-report \
  --tsv sample_data/example_annotated.tsv \
  --panel configs/panel.example.json \
  --acmg-config configs/acmg_config.example.yaml \
  --individual-id demo \
  --format markdown \
  --log /tmp/panel_log.json

# 3b) Same demo report, with genes taken from an HPO phenotype mapping (HP:0000365, hearing impairment)
genomic-consultant panel-report \
  --tsv sample_data/example_annotated.tsv \
  --phenotype-id HP:0000365 \
  --phenotype-mapping configs/phenotype_to_genes.hpo_seed.json \
  --acmg-config configs/acmg_config.example.yaml \
  --individual-id demo \
  --format markdown

# 3c) Merge multiple phenotype→gene mappings into one
genomic-consultant build-phenotype-mapping \
  --output configs/phenotype_to_genes.merged.json \
  configs/phenotype_to_genes.example.json configs/phenotype_to_genes.hpo_seed.json

# 4) End-to-end Phase 1 pipeline; --skip-call/--skip-annotate bypass calling and annotation so the sample TSV is used directly
genomic-consultant phase1-run \
  --tsv sample_data/example_annotated.tsv \
  --skip-call --skip-annotate \
  --panel configs/panel.example.json \
  --acmg-config configs/acmg_config.example.yaml \
  --workdir runtime \
  --prefix demo

# Run tests
pytest
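The run-call and run-annotate commands depend on external tools (GATK for calling; VEP and bcftools for annotation). Independently of whatever --probe-tools checks internally, a shell pre-flight like the one below can flag a missing executable before a long run starts. It is a sketch: missing_tools is a hypothetical helper, and the executable names gatk, vep, and bcftools are assumptions about how the tools are installed on your system.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: verify that required executables resolve on PATH.
# missing_tools is hypothetical; tool names are assumptions about your install.
set -euo pipefail

# missing_tools TOOL...
# Prints each tool name that cannot be found on PATH, one per line.
# Returns 0 if all tools were found, 1 if any is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example: `missing_tools gatk vep bcftools || echo "install the tools listed above first" >&2`. Only presence on PATH is checked, not tool versions or reference/cache setup.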

Optional Parquet-backed store

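Install the store extra (which adds pandas) to enable Parquet ingestion. Quote the argument so shells such as zsh do not treat the brackets as a glob:

pip install -e '.[store]'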
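pandas reads Parquet through an engine library, pyarrow or fastparquet, and pandas.read_parquet raises an ImportError if neither is installed. Whether the store extra pulls one in is not stated here, so a quick check like the following can help. parquet_support is a hypothetical helper, and it assumes python3 (or the interpreter you pass) is the one you installed the package into.

```shell
#!/usr/bin/env bash
# Sketch: report whether pandas and a Parquet engine are importable.
# parquet_support is hypothetical; defaults to python3 as the interpreter.
set -euo pipefail

# parquet_support [PYTHON]
# Prints "<module>: yes|no" for pandas, pyarrow and fastparquet.
# Uses importlib.util.find_spec, so the modules are located but not imported.
parquet_support() {
  local py="${1:-python3}"
  "$py" - <<'PY'
import importlib.util

for name in ("pandas", "pyarrow", "fastparquet"):
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'yes' if found else 'no'}")
PY
}
```

For example, `parquet_support .venv/bin/python` checks a virtualenv interpreter. If both engines show "no", installing pyarrow (pandas tries it first when engine="auto") is one option.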

Notes on VEP plugins (SpliceAI/CADD)

  • The annotation plan already queries SpliceAI and CADD_PHRED fields; ensure your VEP run includes plugins/flags that produce them, e.g.:
    • --plugin SpliceAI,snv=/path/to/spliceai.snv.vcf.gz,indel=/path/to/spliceai.indel.vcf.gz
    • --plugin CADD,/path/to/whole_genome_SNVs.tsv.gz,/path/to/InDels.tsv.gz
  • Pass these to plan-annotate / run-annotate via --plugin and/or --extra-flag so the fields are written into the TSV.
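Assuming VEP writes VCF output, its annotations sit in a CSQ INFO field whose header line lists the subfields after "Format:", so inspecting that header confirms the plugin columns exist before the TSV is built. The sketch below does this with standard tools; csq_fields and require_csq_fields are hypothetical helpers, and because SpliceAI field names vary between plugin versions it matches on name prefixes rather than exact names.

```shell
#!/usr/bin/env bash
# Sketch: check that a VEP-annotated VCF declares the expected CSQ subfields.
# Assumes VCF output, where subfields follow "Format: " in the
# ##INFO=<ID=CSQ,...> header line. Helper names are hypothetical.
set -eu  # no pipefail: awk intentionally stops reading after the header

# csq_fields VCF
# Prints CSQ subfield names, one per line. Accepts plain or gzip/bgzip input.
csq_fields() {
  gzip -cdf "$1" \
    | awk '/^##INFO=<ID=CSQ,/ { sub(/.*Format: /, ""); sub(/">.*$/, ""); print; exit }
           /^#CHROM/ { exit }' \
    | tr '|' '\n'
}

# require_csq_fields VCF PREFIX...
# For each PREFIX, reports whether any CSQ subfield name starts with it.
# Returns 1 if any prefix has no matching subfield.
require_csq_fields() {
  local vcf="$1"; shift
  local fields prefix status=0
  fields="$(csq_fields "$vcf")"
  for prefix in "$@"; do
    if printf '%s\n' "$fields" | grep -q "^${prefix}"; then
      echo "present: $prefix"
    else
      echo "missing: $prefix"
      status=1
    fi
  done
  return "$status"
}
```

For example, `require_csq_fields /path/to/annotated.vcf.gz CADD_PHRED SpliceAI` prints one present/missing line per prefix and returns non-zero if any is absent. With bcftools installed, `bcftools +split-vep -l` on the same file lists the annotation fields as well.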