Initial commit

2025-11-28 11:52:04 +08:00
commit f74dc351f7
51 changed files with 2402 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,112 @@
+# Genomic Consultant
+
+Early design for a personal genomic-risk and drug-interaction decision support system. The specification lives in `genomic_decision_support_system_spec_v0.1.md`.
+
+## Vision (per spec)
+- Phase 1: trio variant calling, annotation, queryable genomic DB, initial ACMG evidence tagging.
+- Phase 2: pharmacogenomics genotype-to-phenotype mapping plus drug–drug interaction checks.
+- Phase 3: supplement/herb normalization and interaction risk layering.
+- Phase 4: LLM-driven query orchestration and report generation.
+
+## Repository Layout
+- `docs/` — system architecture notes, phase plans, data models (work in progress).
+- `configs/` — example ACMG config and gene panel JSON.
+- `configs/phenotype_to_genes.example.json` — placeholder phenotype/HPO → gene mappings.
+- `configs/phenotype_to_genes.hpo_seed.json` — seed HPO mappings (replace with full HPO/GenCC-derived panels).
+- `sample_data/` — tiny annotated TSV for demo.
+- `src/genomic_consultant/` — Python scaffolding (pipelines, store, panel lookup, ACMG tagging, reporting).
+- `genomic_decision_support_system_spec_v0.1.md` — original requirements draft.
+
+## Contributing / Next Steps
+1. Finalize Phase 1 tech selection (variant caller, annotation stack, reference/DB versions).
+2. Stand up the Phase 1 pipelines and minimal query API surface.
+3. Add ACMG evidence tagging config and human-review logging.
+4. Layer in PGx/DDI and supplement modules per later phases.
+
+Data safety: keep genomic/clinical data local; the `.gitignore` blocks common genomic outputs by default.
+
+## Quickstart (CLI scaffolding)
+```
+pip install -e .
+
+# 1) Show trio calling plan (commands only; not executed)
+genomic-consultant plan-call \
+  --sample proband:/data/proband.bam \
+  --sample father:/data/father.bam \
+  --sample mother:/data/mother.bam \
+  --reference /refs/GRCh38.fa \
+  --workdir /tmp/trio
+
+# 1b) Execute calling plan (requires GATK installed) and emit run log
+genomic-consultant run-call \
+  --sample proband:/data/proband.bam \
+  --sample father:/data/father.bam \
+  --sample mother:/data/mother.bam \
+  --reference /refs/GRCh38.fa \
+  --workdir /tmp/trio \
+  --log /tmp/trio/run_call_log.json \
+  --probe-tools
+
+# 2) Show annotation plan for a joint VCF
+genomic-consultant plan-annotate \
+  --vcf /tmp/trio/trio.joint.vcf.gz \
+  --workdir /tmp/trio/annot \
+  --prefix trio \
+  --reference /refs/GRCh38.fa
+
+# 2b) Execute annotation plan (requires VEP, bcftools) with run log
+genomic-consultant run-annotate \
+  --vcf /tmp/trio/trio.joint.vcf.gz \
+  --workdir /tmp/trio/annot \
+  --prefix trio \
+  --reference /refs/GRCh38.fa \
+  --log /tmp/trio/annot/run_annot_log.json \
+  --probe-tools
+
+# 3) Demo panel report using sample data (panel file)
+genomic-consultant panel-report \
+  --tsv sample_data/example_annotated.tsv \
+  --panel configs/panel.example.json \
+  --acmg-config configs/acmg_config.example.yaml \
+  --individual-id demo \
+  --format markdown \
+  --log /tmp/panel_log.json
+
+# 3b) Demo panel report using phenotype mapping (HPO)
+genomic-consultant panel-report \
+  --tsv sample_data/example_annotated.tsv \
+  --phenotype-id HP:0000365 \
+  --phenotype-mapping configs/phenotype_to_genes.hpo_seed.json \
+  --acmg-config configs/acmg_config.example.yaml \
+  --individual-id demo \
+  --format markdown
+
+# 3c) Merge multiple phenotype→gene mappings into one
+genomic-consultant build-phenotype-mapping \
+  --output configs/phenotype_to_genes.merged.json \
+  configs/phenotype_to_genes.example.json configs/phenotype_to_genes.hpo_seed.json
+
+# 4) End-to-end Phase 1 pipeline (here skipping call/annotate and using the sample TSV)
+genomic-consultant phase1-run \
+  --tsv sample_data/example_annotated.tsv \
+  --skip-call --skip-annotate \
+  --panel configs/panel.example.json \
+  --acmg-config configs/acmg_config.example.yaml \
+  --workdir runtime \
+  --prefix demo
+
+# Run tests
+pytest
+```
+
+### Optional Parquet-backed store
+Install the optional `store` extra, which brings in pandas, to enable Parquet ingestion:
+```
+pip install -e ".[store]"
+```
+
+### Notes on VEP plugins (SpliceAI/CADD)
+- The annotation plan already queries `SpliceAI` and `CADD_PHRED` fields; ensure your VEP run includes plugins/flags that produce them, e.g.:
+  - `--plugin SpliceAI,snv=/path/to/spliceai.snv.vcf.gz,indel=/path/to/spliceai.indel.vcf.gz`
+  - `--plugin CADD,/path/to/whole_genome_SNVs.tsv.gz,/path/to/InDels.tsv.gz`
+- Pass these via `--plugin` and/or `--extra-flag` on `run-annotate` / `plan-annotate` to embed fields into the TSV.