# Genomic Consultant

Early design for a personal genomic risk and drug–interaction decision support system. Requirements are taken from `genomic_decision_support_system_spec_v0.1.md`.

## Vision (per spec)

- Phase 1: trio variant calling, annotation, queryable genomic DB, initial ACMG evidence tagging.
- Phase 2: pharmacogenomics genotype-to-phenotype mapping plus drug–drug interaction checks.
- Phase 3: supplement/herb normalization and interaction risk layering.
- Phase 4: LLM-driven query orchestration and report generation.

## Repository Layout

- `docs/`: system architecture notes, phase plans, data models (work in progress).
- `configs/`: example ACMG config and gene panel JSON.
- `configs/phenotype_to_genes.example.json`: placeholder phenotype/HPO → gene mappings.
- `configs/phenotype_to_genes.hpo_seed.json`: seed HPO mappings (replace with full HPO/GenCC-derived panels).
- `sample_data/`: tiny annotated TSV for demo.
- `src/genomic_consultant/`: Python scaffolding (pipelines, store, panel lookup, ACMG tagging, reporting).
- `genomic_decision_support_system_spec_v0.1.md`: original requirements draft.

## Contributing/next steps

1. Finalize Phase 1 tech selection (variant caller, annotation stack, reference/DB versions).
2. Stand up the Phase 1 pipelines and a minimal query API surface.
3. Add ACMG evidence tagging config and human-review logging.
4. Layer in PGx/DDI and supplement modules per later phases.

Data safety: keep genomic/clinical data local; the `.gitignore` blocks common genomic outputs by default.

## Quickstart (CLI scaffolding)

```bash
pip install -e .

# 1) Show trio calling plan (commands only; not executed)
genomic-consultant plan-call \
  --sample proband:/data/proband.bam \
  --sample father:/data/father.bam \
  --sample mother:/data/mother.bam \
  --reference /refs/GRCh38.fa \
  --workdir /tmp/trio

# 1b) Execute calling plan (requires GATK) and emit a run log
genomic-consultant run-call \
  --sample proband:/data/proband.bam \
  --sample father:/data/father.bam \
  --sample mother:/data/mother.bam \
  --reference /refs/GRCh38.fa \
  --workdir /tmp/trio \
  --log /tmp/trio/run_call_log.json \
  --probe-tools

# 2) Show annotation plan for a joint VCF
genomic-consultant plan-annotate \
  --vcf /tmp/trio/trio.joint.vcf.gz \
  --workdir /tmp/trio/annot \
  --prefix trio \
  --reference /refs/GRCh38.fa

# 2b) Execute annotation plan (requires VEP, bcftools) with a run log
genomic-consultant run-annotate \
  --vcf /tmp/trio/trio.joint.vcf.gz \
  --workdir /tmp/trio/annot \
  --prefix trio \
  --reference /refs/GRCh38.fa \
  --log /tmp/trio/annot/run_annot_log.json \
  --probe-tools

# 3) Demo panel report using sample data (panel file)
genomic-consultant panel-report \
  --tsv sample_data/example_annotated.tsv \
  --panel configs/panel.example.json \
  --acmg-config configs/acmg_config.example.yaml \
  --individual-id demo \
  --format markdown \
  --log /tmp/panel_log.json

# 3b) Demo panel report using phenotype mapping (HPO)
genomic-consultant panel-report \
  --tsv sample_data/example_annotated.tsv \
  --phenotype-id HP:0000365 \
  --phenotype-mapping configs/phenotype_to_genes.hpo_seed.json \
  --acmg-config configs/acmg_config.example.yaml \
  --individual-id demo \
  --format markdown

# 3c) Merge multiple phenotype→gene mappings into one
genomic-consultant build-phenotype-mapping \
  --output configs/phenotype_to_genes.merged.json \
  configs/phenotype_to_genes.example.json configs/phenotype_to_genes.hpo_seed.json

# 4) End-to-end Phase 1 pipeline (optionally skip call/annotate; use sample TSV)
genomic-consultant phase1-run \
  --tsv sample_data/example_annotated.tsv \
  --skip-call --skip-annotate \
  --panel configs/panel.example.json \
  --acmg-config configs/acmg_config.example.yaml \
  --workdir runtime \
  --prefix demo

# Run tests
pytest
```

### Optional Parquet-backed store

Install the optional `store` extra (pandas) to enable Parquet ingestion. Quote the argument so shells such as zsh do not treat the brackets as a glob:

```bash
pip install -e ".[store]"
```

### Notes on VEP plugins (SpliceAI/CADD)

- The annotation plan already queries the `SpliceAI` and `CADD_PHRED` fields; make sure your VEP run includes the plugins/flags that produce them, e.g.:
  - `--plugin SpliceAI,snv=/path/to/spliceai.snv.vcf.gz,indel=/path/to/spliceai.indel.vcf.gz`
  - `--plugin CADD,/path/to/whole_genome_SNVs.tsv.gz,/path/to/InDels.tsv.gz`
- Pass these via `--plugin` and/or `--extra-flag` on `run-annotate` / `plan-annotate` so the fields are written into the TSV.
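
Once the fields are in the TSV, downstream code can use them to flag variants for review. The sketch below is a hypothetical illustration, not part of the `genomic_consultant` package. It assumes a tab-separated file with a header row, a `SpliceAI` column holding the pipe-delimited layout of SpliceAI's own VCF INFO field (`ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL`, with comma-separated entries per allele/gene), and a numeric `CADD_PHRED` column. The VEP plugin's output layout can differ by version and by how fields are queried, so check the real columns first. The function names, the other column names (`CHROM`, `POS`, `SYMBOL`), and the cutoffs (0.5 for the SpliceAI delta score, 20 for CADD PHRED) are illustrative assumptions, not project settings.

```python
import csv
import io
from typing import Dict, Iterator, List, Optional, Tuple

# Assumed SpliceAI layout (as in SpliceAI's own VCF INFO field):
# ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL
DS_INDEXES = range(2, 6)  # positions of the four delta scores (DS_AG..DS_DL)


def max_spliceai_delta(value: Optional[str]) -> Optional[float]:
    """Return the largest delta score across all comma-separated entries, or None."""
    if not value or value in {".", "-"}:
        return None  # missing annotation
    best: Optional[float] = None
    for entry in value.split(","):
        parts = entry.split("|")
        if len(parts) < 6:
            continue  # malformed or truncated entry: skip it
        for i in DS_INDEXES:
            try:
                score = float(parts[i])
            except ValueError:
                continue  # non-numeric placeholder
            best = score if best is None else max(best, score)
    return best


def parse_cadd(value: Optional[str]) -> Optional[float]:
    """Parse a CADD PHRED value, returning None when absent or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def flag_rows(
    tsv_text: str, splice_cutoff: float = 0.5, cadd_cutoff: float = 20.0
) -> Iterator[Tuple[Dict[str, str], List[str]]]:
    """Yield (row, reasons) for rows meeting either hypothetical cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        reasons: List[str] = []
        ds = max_spliceai_delta(row.get("SpliceAI"))
        if ds is not None and ds >= splice_cutoff:
            reasons.append(f"SpliceAI max DS={ds:.2f}")
        cadd = parse_cadd(row.get("CADD_PHRED"))
        if cadd is not None and cadd >= cadd_cutoff:
            reasons.append(f"CADD_PHRED={cadd:g}")
        if reasons:
            yield row, reasons


# Hypothetical two-row TSV for demonstration only.
demo = (
    "CHROM\tPOS\tSYMBOL\tSpliceAI\tCADD_PHRED\n"
    "chr1\t100\tGENE1\tT|GENE1|0.01|0.00|0.72|0.05|-3|12|2|-40\t24.1\n"
    "chr2\t200\tGENE2\t.\t12.0\n"
)
for row, reasons in flag_rows(demo):
    print(row["SYMBOL"], reasons)
# prints: GENE1 ['SpliceAI max DS=0.72', 'CADD_PHRED=24.1']
```

Taking the maximum over the four delta scores and over every comma-separated entry is a common way to summarize a SpliceAI prediction as one number; a flag from this sketch is a prompt for human review rather than a classification.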