Three bugs prevented gnomAD and expression data from contributing to scores:

1. gnomAD COLUMN_VARIANTS mapped "gene" (an HGNC symbol) to gene_id instead of gene_symbol, causing JOIN misses against gene_universe (Ensembl IDs).
2. Expression: HPA data was fetched but never merged (lf_hpa was unused).
3. GTEx versioned Ensembl IDs (ENSG*.5) didn't match gene_universe.

Results: gnomAD 78.5% coverage, expression 87.4%, 19946 genes with ≥4 layers. HIGH tier refined from 44 → 18 candidates. Validation PASSED (CDH23 96.5th pctl).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
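Bugs 1 and 3 are both key mismatches at join time: gene_universe is keyed on Ensembl gene IDs, but one source supplied HGNC symbols in the ID column and another supplied versioned IDs. Below is a minimal standard-library sketch of the two normalizations, assuming gene_universe holds unversioned IDs. The helpers `strip_ensembl_version`, `remap_symbols`, and `join_on_gene_id`, as well as the placeholder IDs and symbols, are hypothetical; the actual fix presumably lives in the pipeline's polars/DuckDB code.

```python
import re

# Ensembl gene IDs may carry a trailing version suffix (e.g. the ".5" noted for GTEx).
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_ensembl_version(gene_id: str) -> str:
    """Drop a trailing '.<digits>' version suffix from an Ensembl gene ID."""
    return _VERSION_SUFFIX.sub("", gene_id)


def remap_symbols(symbol_rows: dict[str, float], symbol_to_id: dict[str, str]) -> dict[str, float]:
    """Re-key HGNC-symbol-keyed rows by Ensembl ID; symbols without a mapping are dropped."""
    return {symbol_to_id[s]: v for s, v in symbol_rows.items() if s in symbol_to_id}


def join_on_gene_id(universe: set[str], rows: dict[str, float]) -> dict[str, float]:
    """Keep rows whose unversioned gene ID is present in the gene universe."""
    joined: dict[str, float] = {}
    for raw_id, value in rows.items():
        gene_id = strip_ensembl_version(raw_id)
        if gene_id in universe:
            joined[gene_id] = value
    return joined


# Placeholder IDs and values, not real pipeline data.
universe = {"ENSG00000000001", "ENSG00000000002"}
gtex = {"ENSG00000000001.5": 12.3, "ENSG00000000009.2": 4.0}
print(join_on_gene_id(universe, gtex))  # → {'ENSG00000000001': 12.3}
```

Dropping the version compares genes by stable ID alone, which is usually the intended granularity for a gene-level join.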
# Pipeline Reproducibility Report

**Run ID:** `e7486ff1-f9be-403b-a68d-115fc845f4a1`

**Timestamp:** 2026-02-15T21:13:12.288563+00:00

**Pipeline Version:** 0.1.0
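The run ID has the form of a random (version 4) UUID, and the timestamp matches the output of Python's `datetime.isoformat()` for a UTC-aware datetime. Here is a minimal sketch of how such header fields could be produced, assuming the pipeline relies on the standard library for them; `new_run_metadata` is a hypothetical helper.

```python
import uuid
from datetime import datetime, timezone


def new_run_metadata(pipeline_version: str) -> dict[str, str]:
    """Create run-level provenance fields: a random UUID4 run ID and a UTC ISO 8601 timestamp."""
    return {
        "run_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "pipeline_version": pipeline_version,
    }


meta = new_run_metadata("0.1.0")
print(f"**Run ID:** `{meta['run_id']}`")
print(f"**Timestamp:** {meta['timestamp']}")
print(f"**Pipeline Version:** {meta['pipeline_version']}")
```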
## Parameters

**Scoring Weights:**

- gnomAD: 0.20
- Expression: 0.20
- Annotation: 0.15
- Localization: 0.15
- Animal Model: 0.15
- Literature: 0.15
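The six weights sum to 1.0 (0.20 + 0.20 + 4 × 0.15), so they can combine per-layer scores into a single composite. The sketch below shows one such weighted combination and assumes each layer score is already normalized to [0, 1]. Renormalizing over missing layers is an assumption, prompted by the commit message's "≥4 layers" count, and not documented pipeline behavior. `composite_score` is a hypothetical name.

```python
import math
from typing import Optional

# Layer weights as listed under "Scoring Weights" (keys renamed to snake_case).
WEIGHTS = {
    "gnomad": 0.20,
    "expression": 0.20,
    "annotation": 0.15,
    "localization": 0.15,
    "animal_model": 0.15,
    "literature": 0.15,
}


def composite_score(layer_scores: dict[str, Optional[float]]) -> Optional[float]:
    """Weighted mean of the available layer scores.

    ASSUMPTION: layers that are absent or None are skipped and the remaining
    weights are renormalized; the real pipeline may instead treat missing
    layers as 0 or handle them some other way.
    """
    present = {k: v for k, v in layer_scores.items() if k in WEIGHTS and v is not None}
    total_weight = sum(WEIGHTS[k] for k in present)
    if total_weight == 0:
        return None  # no evidence layers at all
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight


# The listed weights sum to 1.0 (up to floating-point rounding).
assert math.isclose(sum(WEIGHTS.values()), 1.0)

# A gene with only gnomAD and expression evidence (illustrative scores).
print(composite_score({"gnomad": 0.8, "expression": 0.4}))
```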
## Data Versions

- **ensembl_release:** 113
- **gnomad_version:** v4.1
- **gtex_version:** v8
- **hpa_version:** 23.0
## Software Environment

- **python:** 3.14.3
- **polars:** 1.38.1
- **duckdb:** 1.4.4
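Rather than typing these values by hand, the environment section can be captured at run time. This minimal sketch relies only on the standard library (`platform.python_version` and `importlib.metadata.version`). `software_versions` is a hypothetical helper, and the actual report generator may gather these values differently.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def software_versions(packages: list[str]) -> dict[str, str]:
    """Record the running Python version plus the installed version of each named package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Emit the section in the same bullet style this report uses.
for name, ver in software_versions(["polars", "duckdb"]).items():
    print(f"- **{name}:** {ver}")
```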
## Filtering Steps

| Step | Input Count | Output Count | Criteria |
|------|-------------|--------------|----------|
| load_scored_genes | 0 | 0 | |
| apply_tier_classification | 0 | 0 | |
| write_candidate_output | 0 | 0 | |
| generate_visualizations | 0 | 0 | |
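In this run, every step records 0 rows in and 0 rows out with no criteria, which is inconsistent with the 21103 candidates in the tier statistics. Below is a minimal sketch of a recorder that fills those columns from actual row counts as each step runs. `ProvenanceLog`, `StepRecord`, and the `example_filter` step are hypothetical; only `load_scored_genes` comes from the table.

```python
from dataclasses import dataclass, field


@dataclass
class StepRecord:
    step: str
    input_count: int
    output_count: int
    criteria: str = ""


@dataclass
class ProvenanceLog:
    """Hypothetical recorder for per-step row counts in a reproducibility report."""

    steps: list[StepRecord] = field(default_factory=list)

    def record(self, step: str, input_count: int, output_count: int, criteria: str = "") -> None:
        self.steps.append(StepRecord(step, input_count, output_count, criteria))

    def to_markdown(self) -> str:
        """Render the steps as a Markdown table with this report's column headers."""
        lines = [
            "| Step | Input Count | Output Count | Criteria |",
            "|------|-------------|--------------|----------|",
        ]
        for s in self.steps:
            lines.append(f"| {s.step} | {s.input_count} | {s.output_count} | {s.criteria} |")
        return "\n".join(lines)


log = ProvenanceLog()
rows = list(range(5))  # stand-in for a loaded table
log.record("load_scored_genes", len(rows), len(rows))
kept = [r for r in rows if r % 2 == 0]  # illustrative filter
log.record("example_filter", len(rows), len(kept), "even rows only (illustrative)")
print(log.to_markdown())
```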
## Tier Statistics

- **Total Candidates:** 21103
- **HIGH:** 18
- **MEDIUM:** 9577
- **LOW:** 11508
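The tiers partition the candidate set exactly: 18 + 9577 + 11508 = 21103. A check like the one below (a hypothetical helper, not part of the documented pipeline) could catch tiering that drops or double-counts genes. It does not try to reproduce the tier thresholds, which this report does not state.

```python
# Counts copied from the Tier Statistics section.
TIER_COUNTS = {"HIGH": 18, "MEDIUM": 9577, "LOW": 11508}
TOTAL_CANDIDATES = 21103


def tiers_are_consistent(tier_counts: dict[str, int], total: int) -> bool:
    """True when the per-tier counts add up to the total candidate count."""
    return sum(tier_counts.values()) == total


assert tiers_are_consistent(TIER_COUNTS, TOTAL_CANDIDATES)

# Share of candidates in each tier.
for tier, n in TIER_COUNTS.items():
    print(f"{tier}: {n} ({100 * n / TOTAL_CANDIDATES:.2f}%)")
```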