6605ff0f2b
fix: resolve runtime bugs for pipeline execution on Python 3.14 + latest deps
...
- gene_mapping: wrap mygene fetch_all generator in list() to fix len() error
- gene_mapping: raise MAX_EXPECTED_GENES to 23000 (mygene DB growth)
- setup_cmd: rename gene_universe columns to gene_id/gene_symbol for
  consistency with all downstream evidence-layer code
- gnomad: handle missing coverage columns in v4.1 constraint TSV
- expression: fix HPA URL (v23.proteinatlas.org) and GTEx URL (v8 path)
- expression: adapt to Polars pivot() API change (columns -> on); collect before pivoting
- expression: handle missing GTEx tissues (Eye - Retina not in v8)
- expression: ensure all expected columns exist even when sources unavailable
- expression/load: safely check column existence before filtering
- localization: fix HPA subcellular URL to v23
- animal_models: call response.read() on httpx streamed responses before accessing .text
- animal_models: increase infer_schema_length for HCOP and MGI TSV parsing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 03:44:01 +08:00
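The gene_mapping fix in the commit above (wrapping a generator in list() so len() works) can be illustrated with a minimal sketch. It does not call mygene; `fake_fetch_all` is a hypothetical stand-in that only assumes the real call yields hits lazily, as the commit message implies.

```python
from typing import Iterator


def fake_fetch_all(gene_ids: list[str]) -> Iterator[dict]:
    """Hypothetical stand-in for a paginated query that yields hits lazily."""
    for gene_id in gene_ids:
        yield {"query": gene_id, "symbol": f"SYM_{gene_id}"}


hits = fake_fetch_all(["ENSG01", "ENSG02", "ENSG03"])

try:
    len(hits)  # generators do not implement __len__
    len_failed = False
except TypeError:
    len_failed = True

# Materialize the generator once; the resulting list supports len()
# and can be iterated more than once.
records = list(hits)
n_records = len(records)
```

Materializing trades memory for convenience, which seems acceptable at the scale suggested by MAX_EXPECTED_GENES (23000).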
0200395d9e
feat(01-02): create mapping validation gates with tests
...
- Add MappingValidator with configurable success rate thresholds (min_success_rate, warn_threshold)
- Add validate_gene_universe for gene count, format, and duplicate checks
- Add save_unmapped_report for manual review output
- Implement 15 comprehensive tests with mocked mygene responses (no real API calls)
- Tests cover: successful mapping, notfound handling, uniprot list parsing, batching, validation gates, universe validation
2026-02-11 16:33:36 +08:00
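A rough sketch of how a success-rate gate with the two thresholds named in the commit above (min_success_rate, warn_threshold) might behave. The class body, the `validate` method, its return values, and the default thresholds are all assumptions made for illustration; they are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class MappingValidator:
    """Hypothetical gate: fail below min_success_rate, warn below warn_threshold."""

    min_success_rate: float = 0.90  # assumed default, not from the source
    warn_threshold: float = 0.95    # assumed default, not from the source

    def validate(self, total: int, mapped: int) -> str:
        """Return "pass", "warn", or "fail" for a mapping run."""
        if total <= 0:
            raise ValueError("total must be positive")
        rate = mapped / total
        if rate < self.min_success_rate:
            return "fail"
        if rate < self.warn_threshold:
            return "warn"
        return "pass"


validator = MappingValidator(min_success_rate=0.90, warn_threshold=0.95)
status_ok = validator.validate(total=1000, mapped=980)    # rate 0.98
status_warn = validator.validate(total=1000, mapped=930)  # rate 0.93
status_fail = validator.validate(total=1000, mapped=800)  # rate 0.80
```

Keeping the warning band separate from the hard failure lets a pipeline continue on a slightly degraded mapping while still flagging it for review.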