|
|
8aa66987f8
|
feat(03-06): implement literature evidence models, PubMed fetch, and scoring
- Create LiteratureRecord Pydantic model with context-specific counts
- Implement PubMed query via Biopython Entrez with rate limiting (3/sec default, 10/sec with API key)
- Define SEARCH_CONTEXTS for cilia, sensory, cytoskeleton, cell_polarity queries
- Implement evidence tier classification: direct_experimental > functional_mention > hts_hit > incidental > none
- Implement quality-weighted scoring with bias mitigation via log2(total_pubmed_count) normalization
- Add biopython>=1.84 dependency to pyproject.toml
- Support checkpoint-restart for long-running PubMed queries (estimated 3-11 hours for 20K genes)
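
A minimal sketch of how the quality-weighted, log2-normalized score described above might look. The tier names and their order come from this commit; the numeric weights, the function name `literature_score`, and the floor on the denominator are assumptions made for illustration, not the pipeline's actual values.

```python
import math

# Hypothetical weights: the commit gives the tier order, not the numbers.
TIER_WEIGHTS = {
    "direct_experimental": 1.0,
    "functional_mention": 0.6,
    "hts_hit": 0.3,
    "incidental": 0.1,
    "none": 0.0,
}


def literature_score(context_counts, total_pubmed_count, tier):
    """Weight context-specific hits by evidence tier and damp well-studied genes.

    Returns None when the total count is unknown, so missing data is not
    silently treated as zero evidence.
    """
    if total_pubmed_count is None:
        return None
    context_total = sum(context_counts.values())
    # Assumption: floor the count at 2 so log2 is at least 1 and the
    # division is always defined for genes with 0 or 1 publications.
    denominator = math.log2(max(total_pubmed_count, 2))
    return TIER_WEIGHTS[tier] * context_total / denominator
```

With these assumed weights, four cilia hits score 1.0 for a gene with 16 publications but only 0.4 for a gene with 1024, which is the bias mitigation the normalization is meant to provide.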
|
2026-02-11 19:00:20 +08:00 |
|
|
|
a88b0eea60
|
feat(02-01): add gnomAD constraint data models and download module
- Create evidence layer package structure
- Define ConstraintRecord Pydantic model with NULL preservation
- Implement streaming download with httpx and tenacity retry
- Add lazy TSV parser with column name variant handling
- Add httpx and structlog dependencies
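
The lazy parser with column-name variants and NULL preservation could be sketched roughly as follows. This uses only the standard library rather than the project's Pydantic model; the variant lists in `COLUMN_VARIANTS`, the NULL tokens, and the function name `iter_constraint_rows` are assumptions for illustration.

```python
import csv
import io

# Assumed header spellings across gnomAD releases; adjust to the real files.
COLUMN_VARIANTS = {
    "gene_symbol": ("gene", "gene_symbol"),
    "loeuf": ("lof.oe_ci.upper", "oe_lof_upper"),
    "pli": ("lof.pLI", "pLI"),
}
# Assumed tokens that mean "no value" in the TSV.
NULL_TOKENS = {"", "NA", "NaN", "nan"}


def _to_float(raw):
    """Convert a cell to float, keeping missing values as None (not 0.0)."""
    if raw is None or raw.strip() in NULL_TOKENS:
        return None
    return float(raw)


def iter_constraint_rows(handle):
    """Yield one dict per row without loading the whole file into memory."""
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    resolved = {}
    for field, variants in COLUMN_VARIANTS.items():
        # Pick the first header spelling that is present in this file.
        match = next((name for name in variants if name in fieldnames), None)
        if match is None:
            raise KeyError(f"no column found for {field!r}; tried {variants}")
        resolved[field] = match
    for row in reader:
        yield {
            "gene_symbol": row[resolved["gene_symbol"]],
            "loeuf": _to_float(row[resolved["loeuf"]]),
            "pli": _to_float(row[resolved["pli"]]),
        }
```

Because the function is a generator, rows are parsed only as the caller consumes them, and a missing score stays `None` so downstream code can tell "unknown" apart from a true zero.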
|
2026-02-11 18:11:49 +08:00 |
|
|
|
f33b048635
|
feat(01-04): add CLI entry point with setup and info commands
- Create click-based CLI with command group (--config, --verbose options)
- Add 'info' command displaying pipeline version, config hash, data source versions
- Add 'setup' command orchestrating full infrastructure flow:
  - Load config -> create store/provenance
  - Fetch gene universe (with checkpoint-restart)
  - Map Ensembl IDs to HGNC + UniProt
  - Validate mapping quality gates
  - Save to DuckDB with provenance sidecar
- Update pyproject.toml entry point to usher_pipeline.cli.main:cli
- Add .gitignore for data/, *.duckdb, build artifacts, provenance files
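
One way the checkpoint-restart behaviour mentioned for the gene universe fetch could work is sketched below. The JSON checkpoint format, the helper names `run_with_checkpoint` and `_atomic_write`, and the `save_every` parameter are hypothetical; the commit does not say how the real checkpoint is stored.

```python
import json
import os
from pathlib import Path


def _atomic_write(path, payload):
    """Write to a temp file, then rename, so a crash never leaves a torn file."""
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, path)


def run_with_checkpoint(items, fetch, checkpoint_path, save_every=100):
    """Call fetch(item) for each item, skipping items already in the checkpoint.

    Results are keyed by item (assumed to be a string such as a gene ID) and
    must be JSON-serializable.
    """
    path = Path(checkpoint_path)
    results = json.loads(path.read_text()) if path.exists() else {}
    pending = [item for item in items if item not in results]
    for done, item in enumerate(pending, start=1):
        results[item] = fetch(item)
        if done % save_every == 0:
            _atomic_write(path, results)
    # Final save covers the tail that did not hit a save_every boundary.
    _atomic_write(path, results)
    return results
```

Rerunning with the same checkpoint path only fetches items that are not yet recorded, so an interrupted run loses at most `save_every` items of work.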
|
2026-02-11 16:39:50 +08:00 |
|
|
|
4a80a0398e
|
feat(01-01): create Python package scaffold with config system
- pyproject.toml: installable package with bioinformatics dependencies
- Pydantic config schema with validation (ensembl_release >= 100, directory creation)
- YAML config loader with override support
- Default config with Ensembl 113, gnomAD v4.1
- 5 passing tests for config validation and hashing
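
The validation and hashing ideas above can be sketched as follows. A standard-library dataclass stands in for the Pydantic schema so the example has no dependencies; the field names other than `ensembl_release`, the `config_hash` helper, and the choice of SHA-256 over canonical JSON are assumptions, not the package's confirmed design.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Defaults mirror the versions named in this commit.
    ensembl_release: int = 113
    gnomad_version: str = "v4.1"  # hypothetical field name
    data_dir: str = "data"        # hypothetical field name

    def __post_init__(self):
        # Same rule the commit describes: reject releases older than 100.
        if self.ensembl_release < 100:
            raise ValueError("ensembl_release must be >= 100")


def config_hash(config):
    """Hash a canonical JSON form so equal configs always hash the same."""
    canonical = json.dumps(asdict(config), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys and fixing separators makes the hash independent of field order and whitespace, which is what lets a hash like this identify a run's configuration in provenance records.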
|
2026-02-11 16:24:35 +08:00 |
|