Commit Graph

3 Commits

Author SHA1 Message Date
a88b0eea60 feat(02-01): add gnomAD constraint data models and download module
- Create evidence layer package structure
- Define ConstraintRecord Pydantic model with NULL preservation
- Implement streaming download with httpx and tenacity retry
- Add lazy TSV parser with column name variant handling
- Add httpx and structlog dependencies
2026-02-11 18:11:49 +08:00
f33b048635 feat(01-04): add CLI entry point with setup and info commands
- Create click-based CLI with command group (--config, --verbose options)
- Add 'info' command displaying pipeline version, config hash, data source versions
- Add 'setup' command orchestrating full infrastructure flow:
  - Load config -> create store/provenance
  - Fetch gene universe (with checkpoint-restart)
  - Map Ensembl IDs to HGNC + UniProt
  - Validate mapping quality gates
  - Save to DuckDB with provenance sidecar
- Update pyproject.toml entry point to usher_pipeline.cli.main:cli
- Add .gitignore for data/, *.duckdb, build artifacts, provenance files
2026-02-11 16:39:50 +08:00
4a80a0398e feat(01-01): create Python package scaffold with config system
- pyproject.toml: installable package with bioinformatics dependencies
- Pydantic config schema with validation (ensembl_release >= 100, directory creation)
- YAML config loader with override support
- Default config with Ensembl 113, gnomAD v4.1
- 5 passing tests for config validation and hashing
2026-02-11 16:24:35 +08:00