| phase | plan | subsystem | duration | completed |
|---|---|---|---|---|
| 04-scoring-integration | 01 | scoring | 4min | 2026-02-11 |
Phase 04 Plan 01: Known Gene Compilation and Multi-Evidence Scoring Summary
NULL-preserving weighted scoring engine joining 6 evidence layers with configurable weights, plus OMIM/SYSCILIA known gene compilation for validation
Performance
- Duration: 4 minutes (248 seconds)
- Started: 2026-02-11T12:38:05Z
- Completed: 2026-02-11T12:42:13Z
- Tasks: 2
- Files modified: 4
Accomplishments
- Compiled 38 known cilia/Usher genes from OMIM (10 genes) and SYSCILIA SCGS v2 core (28 genes)
- Implemented ScoringWeights.validate_sum() enforcing weight sum constraint (1.0 ± 1e-6)
- Created join_evidence_layers() LEFT JOINing all 6 evidence tables preserving NULLs
- Built compute_composite_scores() with NULL-preserving weighted average (weighted_sum / available_weight)
- Added quality flag classification (sufficient/moderate/sparse/no_evidence) based on evidence count
- Included per-layer contribution columns for explainability
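The weight-sum constraint from the list above can be sketched as follows. This is a minimal illustration, not the code in config/schema.py: it uses a plain dataclass, and the six field names and their default values are hypothetical placeholders, since the summary does not name the evidence layers or their weights.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScoringWeights:
    # Hypothetical layer names and defaults; the real schema may differ.
    layer_1: float = 0.20
    layer_2: float = 0.20
    layer_3: float = 0.15
    layer_4: float = 0.15
    layer_5: float = 0.15
    layer_6: float = 0.15

    def validate_sum(self, tolerance: float = 1e-6) -> None:
        """Raise ValueError unless the weights sum to 1.0 within the tolerance."""
        total = sum(getattr(self, f.name) for f in fields(self))
        if abs(total - 1.0) > tolerance:
            raise ValueError(f"Scoring weights must sum to 1.0, got {total:.6f}")


# The defaults pass; an unbalanced set of weights is rejected.
ScoringWeights().validate_sum()
try:
    ScoringWeights(layer_1=0.50).validate_sum()
except ValueError as exc:
    print(exc)
```

Checking against a tolerance rather than exact equality avoids spurious failures from floating-point rounding when the weights are written as decimals.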
Task Commits
Each task was committed atomically:
- Task 1: Known gene compilation and ScoringWeights validation - 0cd2f7c (feat)
- Task 2: Multi-evidence weighted scoring integration - f441e8c (feat)
Files Created/Modified
- src/usher_pipeline/scoring/__init__.py - Scoring module exports
- src/usher_pipeline/scoring/known_genes.py - OMIM_USHER_GENES (10), SYSCILIA_SCGS_V2_CORE (28), compile_known_genes()
- src/usher_pipeline/scoring/integration.py - join_evidence_layers(), compute_composite_scores(), persist_scored_genes()
- src/usher_pipeline/config/schema.py - Added ScoringWeights.validate_sum() method
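As a rough illustration of what compiling two source lists without de-duplication looks like, the sketch below emits one row per (gene, source) pair. Everything in it is an assumption: the gene symbols are placeholders rather than the curated lists, the source labels are invented, and the function name carries a `_sketch` suffix to keep it distinct from the real compile_known_genes().

```python
from typing import Dict, List

# Placeholder symbols only; the real lists hold 10 and 28 curated genes.
OMIM_DEMO = ["GENE_A", "GENE_B"]
SYSCILIA_DEMO = ["GENE_B", "GENE_C"]


def compile_known_genes_sketch() -> List[Dict[str, str]]:
    """Return one row per (gene_symbol, source) pair, without de-duplication."""
    rows: List[Dict[str, str]] = []
    sources = (
        ("omim_usher", OMIM_DEMO),            # hypothetical source label
        ("syscilia_scgs_v2", SYSCILIA_DEMO),  # hypothetical source label
    )
    for source, symbols in sources:
        for symbol in symbols:
            rows.append({"gene_symbol": symbol, "source": source})
    return rows


rows = compile_known_genes_sketch()
# GENE_B is present in both demo lists, so it yields two rows, one per source.
shared = [r["source"] for r in rows if r["gene_symbol"] == "GENE_B"]
print(len(rows), shared)
```

Any consumer that needs a unique gene set (for example, an exclusion filter) would have to collapse the rows on gene_symbol itself.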
Decisions Made
- Known gene curation: Limited SYSCILIA SCGS v2 to 28 core genes (a subset of the full 686-gene list) for initial positive-control validation. A future enhancement could add fetch_scgs_v2() to download the complete list from the publication's supplementary data.
- Multi-source provenance: compile_known_genes() does NOT de-duplicate gene_symbols across sources. A gene appearing in both OMIM and SYSCILIA has two rows (one per source), which preserves provenance for validation and analysis.
- NULL-preserving weighted average: Implemented as weighted_sum / available_weight, where available_weight is the sum of weights for non-NULL layers only. Genes with 0 evidence layers receive a NULL composite_score (not 0), preserving the semantic distinction between "no evidence" and "weak evidence".
- Quality flags: Classification by evidence_count thresholds (>=4 sufficient, >=2 moderate, >=1 sparse, 0 no_evidence) to guide downstream filtering and prioritization.
- Explainability: Per-layer contribution columns (score * weight) make it possible to trace which evidence layers drove a gene's composite score. This is critical for manual review and trust.
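The scoring decisions above can be restated in plain Python for a single gene. This is a sketch under stated assumptions, not the implementation in integration.py: the function names, the dictionary-based inputs, and the layer names "a", "b", "c" are hypothetical, and only three layers are used to keep the arithmetic easy to follow.

```python
from typing import Dict, Optional


def quality_flag(evidence_count: int) -> str:
    """Map the number of non-NULL layers to the flag thresholds in the summary."""
    if evidence_count >= 4:
        return "sufficient"
    if evidence_count >= 2:
        return "moderate"
    if evidence_count >= 1:
        return "sparse"
    return "no_evidence"


def score_gene(scores: Dict[str, Optional[float]],
               weights: Dict[str, float]) -> dict:
    """NULL-preserving weighted average over the layers that have a score."""
    # Per-layer contribution is score * weight; missing layers stay None.
    contributions = {
        layer: None if scores.get(layer) is None else scores[layer] * weight
        for layer, weight in weights.items()
    }
    present = [layer for layer, c in contributions.items() if c is not None]
    weighted_sum = sum(contributions[layer] for layer in present)
    available_weight = sum(weights[layer] for layer in present)
    # No evidence at all yields None rather than 0.
    composite = weighted_sum / available_weight if available_weight > 0 else None
    return {
        "composite_score": composite,
        "evidence_count": len(present),
        "quality_flag": quality_flag(len(present)),
        "contributions": contributions,
    }


weights = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical layers and weights
result = score_gene({"a": 0.8, "b": None, "c": 0.4}, weights)
print(result["composite_score"], result["quality_flag"])
```

In this example the missing layer "b" is dropped from both numerator and denominator, so the score is (0.8 * 0.5 + 0.4 * 0.2) / (0.5 + 0.2) rather than being pulled toward zero by an implicit 0 for "b".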
Deviations from Plan
None - plan executed exactly as written.
Issues Encountered
None. Both verification tests passed on first attempt.
User Setup Required
None - no external service configuration required.
Next Phase Readiness
Ready for Phase 04 Plan 02 (ranked candidate list generation):
- Known gene set compiled and ready for exclusion filtering
- Composite scoring engine functional with NULL preservation
- Quality flags available for filtering
- Per-layer contributions available for ranking criteria
No blockers. Next plan can implement:
- Exclusion of known genes
- Ranking by composite score
- Quality flag filtering
- Top-N candidate selection
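One possible shape for these four steps is sketched below. It is purely illustrative: Plan 02 is not yet written, so the function name, the row layout (dicts with gene_symbol, composite_score, and quality_flag keys), the default allowed flags, and the choice to drop genes with a NULL score are all assumptions.

```python
from typing import Any, Dict, Iterable, List, Set


def rank_candidates(scored: Iterable[Dict[str, Any]],
                    known_symbols: Set[str],
                    allowed_flags: Iterable[str] = ("sufficient", "moderate"),
                    top_n: int = 10) -> List[Dict[str, Any]]:
    """Exclude known genes, filter by quality flag, rank by score, keep top N."""
    allowed = set(allowed_flags)
    candidates = [
        gene for gene in scored
        if gene["gene_symbol"] not in known_symbols   # exclusion of known genes
        and gene["composite_score"] is not None       # NULL scores cannot be ranked
        and gene["quality_flag"] in allowed           # quality flag filtering
    ]
    candidates.sort(key=lambda gene: gene["composite_score"], reverse=True)
    return candidates[:top_n]


# Placeholder rows; symbols and scores are invented for illustration.
scored = [
    {"gene_symbol": "GENE_A", "composite_score": 0.9, "quality_flag": "sufficient"},
    {"gene_symbol": "GENE_B", "composite_score": 0.7, "quality_flag": "moderate"},
    {"gene_symbol": "GENE_C", "composite_score": 0.8, "quality_flag": "sparse"},
    {"gene_symbol": "GENE_D", "composite_score": None, "quality_flag": "no_evidence"},
    {"gene_symbol": "GENE_E", "composite_score": 0.75, "quality_flag": "sufficient"},
]
top = rank_candidates(scored, known_symbols={"GENE_A"}, top_n=2)
print([gene["gene_symbol"] for gene in top])
```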
Self-Check: PASSED
All claimed files and commits verified:
- src/usher_pipeline/scoring/__init__.py - FOUND
- src/usher_pipeline/scoring/known_genes.py - FOUND
- src/usher_pipeline/scoring/integration.py - FOUND
- Commit 0cd2f7c (Task 1) - FOUND
- Commit f441e8c (Task 2) - FOUND
Phase: 04-scoring-integration
Plan: 01
Completed: 2026-02-11