genomic-consultant/README.md
2025-11-28 11:52:04 +08:00

# Genomic Consultant
Early design for a personal genomic risk and drug-interaction decision support system. Specs are sourced from `genomic_decision_support_system_spec_v0.1.md`.
## Vision (per spec)
- Phase 1: trio variant calling, annotation, queryable genomic DB, initial ACMG evidence tagging.
- Phase 2: pharmacogenomics genotype-to-phenotype mapping plus drug-drug interaction checks.
- Phase 3: supplement/herb normalization and interaction risk layering.
- Phase 4: LLM-driven query orchestration and report generation.
## Repository Layout
- `docs/` — system architecture notes, phase plans, data models (work in progress).
- `configs/` — example ACMG config and gene panel JSON.
- `configs/phenotype_to_genes.example.json` — placeholder phenotype/HPO → gene mappings.
- `configs/phenotype_to_genes.hpo_seed.json` — seed HPO mappings (replace with full HPO/GenCC derived panels).
- `sample_data/` — tiny annotated TSV for demo.
- `src/genomic_consultant/` — Python scaffolding (pipelines, store, panel lookup, ACMG tagging, reporting).
- `genomic_decision_support_system_spec_v0.1.md` — original requirements draft.
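
The schema of the phenotype mapping files is not described in this README. As a rough sketch, assuming each file is a JSON object mapping an HPO ID to a list of gene symbols (an assumption, not confirmed by the spec), combining the example and seed mappings could look like the following. `merge_mappings` is a hypothetical helper, not a function from the package, and the gene symbols are placeholders.

```python
from typing import Dict, Iterable, List, Set

# Assumed schema: {"HP:0000365": ["GENE_A", ...]}; the real files may differ.
Mapping = Dict[str, List[str]]


def merge_mappings(mappings: Iterable[Mapping]) -> Mapping:
    """Union the gene lists per phenotype ID; output is sorted and de-duplicated."""
    merged: Dict[str, Set[str]] = {}
    for mapping in mappings:
        for phenotype_id, genes in mapping.items():
            merged.setdefault(phenotype_id, set()).update(genes)
    return {pid: sorted(genes) for pid, genes in sorted(merged.items())}


# Placeholder data for illustration only; not curated panels.
example = {"HP:0000365": ["GENE_A", "GENE_B"]}
seed = {"HP:0000365": ["GENE_A", "GENE_C"], "HP:0001250": ["GENE_D"]}
merged = merge_mappings([example, seed])
print(merged)
```

Taking the union per phenotype keeps every gene that any source lists, which seems the safer default for a screening panel; a stricter merge (intersection, or source-weighted) would be an equally valid design choice.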
## Contributing/next steps
1. Finalize Phase 1 tech selection (variant caller, annotation stack, reference/DB versions).
2. Stand up the Phase 1 pipelines and minimal query API surface.
3. Add ACMG evidence tagging config and human-review logging.
4. Layer in PGx/DDI and supplement modules per later phases.
Data safety: keep genomic/clinical data local; the `.gitignore` blocks common genomic outputs by default.
## Quickstart (CLI scaffolding)
```
pip install -e .
# 1) Show trio calling plan (commands only; not executed)
genomic-consultant plan-call \
--sample proband:/data/proband.bam \
--sample father:/data/father.bam \
--sample mother:/data/mother.bam \
--reference /refs/GRCh38.fa \
--workdir /tmp/trio
# 1b) Execute calling plan (requires GATK installed) and emit run log
genomic-consultant run-call \
--sample proband:/data/proband.bam \
--sample father:/data/father.bam \
--sample mother:/data/mother.bam \
--reference /refs/GRCh38.fa \
--workdir /tmp/trio \
--log /tmp/trio/run_call_log.json \
--probe-tools
# 2) Show annotation plan for a joint VCF
genomic-consultant plan-annotate \
--vcf /tmp/trio/trio.joint.vcf.gz \
--workdir /tmp/trio/annot \
--prefix trio \
--reference /refs/GRCh38.fa
# 2b) Execute annotation plan (requires VEP, bcftools) with run log
genomic-consultant run-annotate \
--vcf /tmp/trio/trio.joint.vcf.gz \
--workdir /tmp/trio/annot \
--prefix trio \
--reference /refs/GRCh38.fa \
--log /tmp/trio/annot/run_annot_log.json \
--probe-tools
# 3) Demo panel report using sample data (panel file)
genomic-consultant panel-report \
--tsv sample_data/example_annotated.tsv \
--panel configs/panel.example.json \
--acmg-config configs/acmg_config.example.yaml \
--individual-id demo \
--format markdown \
--log /tmp/panel_log.json
# 3b) Demo panel report using phenotype mapping (HPO)
genomic-consultant panel-report \
--tsv sample_data/example_annotated.tsv \
--phenotype-id HP:0000365 \
--phenotype-mapping configs/phenotype_to_genes.hpo_seed.json \
--acmg-config configs/acmg_config.example.yaml \
--individual-id demo \
--format markdown
# 3c) Merge multiple phenotype→gene mappings into one
genomic-consultant build-phenotype-mapping \
--output configs/phenotype_to_genes.merged.json \
configs/phenotype_to_genes.example.json configs/phenotype_to_genes.hpo_seed.json
# 4) End-to-end Phase 1 pipeline (optionally skip call/annotate; use sample TSV)
genomic-consultant phase1-run \
--tsv sample_data/example_annotated.tsv \
--skip-call --skip-annotate \
--panel configs/panel.example.json \
--acmg-config configs/acmg_config.example.yaml \
--workdir runtime \
--prefix demo
# Run tests
pytest
```
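
The `--sample` arguments above use a `name:path` form. How the CLI parses them internally is not shown here; a minimal sketch of one way to do it, with the hypothetical helper `parse_sample_specs`, is:

```python
from pathlib import PurePosixPath
from typing import Dict, List


def parse_sample_specs(specs: List[str]) -> Dict[str, PurePosixPath]:
    """Parse 'NAME:PATH' strings, splitting on the first colon only."""
    samples: Dict[str, PurePosixPath] = {}
    for spec in specs:
        name, sep, path = spec.partition(":")
        if not sep or not name or not path:
            raise ValueError(f"expected NAME:PATH, got {spec!r}")
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        samples[name] = PurePosixPath(path)
    return samples


trio = parse_sample_specs([
    "proband:/data/proband.bam",
    "father:/data/father.bam",
    "mother:/data/mother.bam",
])
print(sorted(trio))
```

Splitting on the first colon only means a path that itself contains a colon still parses, and rejecting duplicate names catches a trio where the same role is passed twice.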
### Optional Parquet-backed store
Install the optional `store` extra (which brings in pandas) to enable Parquet ingestion:
```
pip install -e ".[store]"
```
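
How the package detects the optional dependency is not documented here. One common pattern, sketched under that assumption, is to probe for pandas and a Parquet engine before enabling the store. `parquet_store_available` is a hypothetical name; pandas itself needs either `pyarrow` or `fastparquet` to read and write Parquet.

```python
import importlib.util


def parquet_store_available() -> bool:
    """Return True when pandas and at least one Parquet engine can be imported."""
    has_pandas = importlib.util.find_spec("pandas") is not None
    has_engine = any(
        importlib.util.find_spec(name) is not None
        for name in ("pyarrow", "fastparquet")
    )
    return has_pandas and has_engine


available = parquet_store_available()
print("Parquet store available:", available)
```

Checking with `find_spec` avoids importing pandas just to test for it, so the core CLI stays fast when the extra is not installed.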
### Notes on VEP plugins (SpliceAI/CADD)
- The annotation plan already queries `SpliceAI` and `CADD_PHRED` fields; ensure your VEP run includes plugins/flags that produce them, e.g.:
  - `--plugin SpliceAI,snv=/path/to/spliceai.snv.vcf.gz,indel=/path/to/spliceai.indel.vcf.gz`
  - `--plugin CADD,/path/to/whole_genome_SNVs.tsv.gz,/path/to/InDels.tsv.gz`
- Pass these via `--plugin` and/or `--extra-flag` on `run-annotate` / `plan-annotate` to embed fields into the TSV.
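
When driving the CLI from a script, the plugin strings above can be assembled programmatically. The sketch below only builds argument lists in the format shown in the notes; the helper names are hypothetical, and it assumes `--plugin` can be repeated once per plugin, which this README does not confirm.

```python
from typing import List


def spliceai_plugin(snv: str, indel: str) -> str:
    """Build the SpliceAI plugin string in the form shown in the notes above."""
    return f"SpliceAI,snv={snv},indel={indel}"


def cadd_plugin(snv: str, indel: str) -> str:
    """Build the CADD plugin string in the form shown in the notes above."""
    return f"CADD,{snv},{indel}"


def plugin_args(plugins: List[str]) -> List[str]:
    """Expand plugin strings into repeated '--plugin VALUE' pairs (assumed usage)."""
    args: List[str] = []
    for plugin in plugins:
        args += ["--plugin", plugin]
    return args


args = plugin_args([
    spliceai_plugin("/path/to/spliceai.snv.vcf.gz", "/path/to/spliceai.indel.vcf.gz"),
    cadd_plugin("/path/to/whole_genome_SNVs.tsv.gz", "/path/to/InDels.tsv.gz"),
])
print(" ".join(args))
```

The resulting list can be appended to a `plan-annotate` or `run-annotate` invocation built with `subprocess`, keeping each plugin string as a single argument so that commas and equals signs are not reinterpreted by a shell.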