- SUMMARY.md: CLI orchestration with checkpoint-restart + 10 comprehensive tests
- STATE.md: Updated position (Phase 4 complete), progress (75%), velocity, decisions
- Duration: 3 minutes, 2 tasks, 4 files (3 created, 1 modified)
| phase | plan | subsystem | tags | dependency_graph | tech_stack | key_files | decisions | metrics |
|---|---|---|---|---|---|---|---|---|
| 04-scoring-integration | 03 | CLI and Testing | | | | | | |
Phase 04 Plan 03: CLI Score Command and Tests Summary
One-liner: CLI command orchestrating full scoring pipeline (known genes → composite scores → QC → validation) with comprehensive test coverage for NULL preservation and validation logic
Tasks Completed
Task 1: CLI Score Command with Checkpoint-Restart
Commit: d57a5f2
Created src/usher_pipeline/cli/score_cmd.py following the established pattern from evidence_cmd.py:
Implementation:
- Single score command (not a group) with options: --force, --skip-qc, --skip-validation
- Uses @click.pass_context to access config_path from CLI context
- 5-step pipeline flow:
  - Load known genes (OMIM Usher + SYSCILIA SCGS) to DuckDB
  - Compute composite scores with NULL-preserving weighted average
  - Persist scored_genes table with per-layer contributions
  - Run QC checks (unless --skip-qc) with warnings/errors display
  - Validate known gene rankings (unless --skip-validation) with report generation
Checkpoint-restart:
- Checks store.has_checkpoint('scored_genes') before processing
- If exists and not --force, displays summary and returns early
- Allows fast iteration during development
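The pipeline flow and checkpoint-restart behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in score_cmd.py: only has_checkpoint('scored_genes') and the three flags come from this summary, while InMemoryStore, run_score, and the step callables are hypothetical stand-ins that leave out Click and DuckDB entirely.

```python
from typing import Callable, Dict, List


class InMemoryStore:
    """Hypothetical stand-in for the pipeline's DuckDB-backed store."""

    def __init__(self) -> None:
        self._tables: Dict[str, list] = {}

    def has_checkpoint(self, name: str) -> bool:
        # A checkpoint exists once the named table has been persisted.
        return name in self._tables

    def save(self, name: str, rows: list) -> None:
        self._tables[name] = rows


def run_score(
    store: InMemoryStore,
    load_known_genes: Callable[[], None],
    compute_scores: Callable[[], list],
    run_qc: Callable[[list], None],
    run_validation: Callable[[list], None],
    force: bool = False,
    skip_qc: bool = False,
    skip_validation: bool = False,
) -> List[str]:
    """Run the scoring steps and return the names of the steps executed."""
    executed: List[str] = []

    # Checkpoint-restart: reuse existing results unless --force is given.
    if store.has_checkpoint("scored_genes") and not force:
        executed.append("summary_from_checkpoint")
        return executed

    load_known_genes()
    executed.append("load_known_genes")

    rows = compute_scores()
    executed.append("compute_scores")

    store.save("scored_genes", rows)
    executed.append("persist")

    if not skip_qc:
        run_qc(rows)
        executed.append("qc")

    if not skip_validation:
        run_validation(rows)
        executed.append("validation")

    return executed
```

Returning the list of executed steps is only a convenience for the sketch; it makes the early return on an existing checkpoint easy to observe in a test.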
CLI integration:
- Updated src/usher_pipeline/cli/main.py to import and register score command
- Command appears in usher-pipeline --help alongside setup and evidence
Output:
- Displays comprehensive summary: total genes, mean score, quality flag distribution
- Shows QC pass/fail status and validation pass/fail status
- Saves provenance sidecar to data_dir/scoring/scoring.provenance.json
Verification:
# CLI help works
usher-pipeline score --help # Shows --force, --skip-qc, --skip-validation options
# Command registered
usher-pipeline --help # Lists score command
Task 2: Unit and Integration Tests for Scoring Module
Commit: a6ad6c6
Created comprehensive test coverage for scoring module with 10 tests using synthetic data.
test_scoring.py (7 unit tests):
- test_compile_known_genes_returns_expected_structure
  - Verifies DataFrame structure (gene_symbol, source, confidence columns)
  - Asserts >= 38 genes (10 OMIM Usher + 28 SYSCILIA SCGS v2)
  - Confirms MYO7A and IFT88 present
  - Validates all confidence = HIGH
  - Checks both sources present (omim_usher, syscilia_scgs_v2)
- test_compile_known_genes_no_duplicates_within_source
  - Verifies no duplicate gene_symbol within same source
  - Allows genes to appear in both sources (separate rows)
- test_scoring_weights_validate_sum_defaults
  - ScoringWeights() with defaults passes validate_sum()
- test_scoring_weights_validate_sum_custom_valid
  - Custom weights summing to 1.0 pass validation
- test_scoring_weights_validate_sum_invalid
  - Weights summing to 1.35 raise ValueError
- test_scoring_weights_validate_sum_close_to_one
  - Weights within 1e-6 of 1.0 pass (0.999999)
  - Weights outside tolerance fail (0.99)
- test_null_preservation_in_composite
  - Creates in-memory DuckDB with 3 genes
  - GENE1: 2 evidence layers (gnomad + annotation) → non-NULL score, moderate_evidence
  - GENE2: 1 evidence layer (gnomad only) → non-NULL score, sparse_evidence
  - GENE3: 0 evidence layers → NULL score, no_evidence
  - Verifies NULL preservation (not zero) for genes without evidence
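The weight validation and NULL preservation exercised by these unit tests can be illustrated with a small pure-Python sketch. It is not the project's implementation (which runs against DuckDB): the default weight values are placeholders, composite_score and quality_flag are hypothetical helpers, and renormalizing by the weights of the layers that are present is an assumption about what "NULL-preserving weighted average" means here. Only the six layer names, the 1e-6 tolerance, and the flags for 0, 1, and 2 layers come from this summary.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ScoringWeights:
    # Placeholder defaults chosen to sum to 1.0; the project's real
    # default values are not stated in this summary.
    gnomad: float = 0.20
    expression: float = 0.20
    annotation: float = 0.15
    localization: float = 0.15
    animal_models: float = 0.15
    literature: float = 0.15

    def validate_sum(self, tol: float = 1e-6) -> None:
        """Raise ValueError unless the weights sum to 1.0 within tol."""
        total = sum(asdict(self).values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"Weights must sum to 1.0, got {total:.6f}")


def composite_score(
    layers: Dict[str, Optional[float]], weights: ScoringWeights
) -> Tuple[Optional[float], int]:
    """Return (score, evidence_count); score is None when no layer has data."""
    w = asdict(weights)
    present = {name: v for name, v in layers.items() if v is not None}
    if not present:
        # NULL preservation: no evidence means "unknown", not zero.
        return None, 0
    weight_sum = sum(w[name] for name in present)
    if weight_sum == 0:
        return None, len(present)
    # Assumption: renormalize by the weights of the available layers.
    score = sum(w[name] * v for name, v in present.items()) / weight_sum
    return score, len(present)


def quality_flag(evidence_count: int) -> str:
    """Map evidence_count to a flag (only the 0/1/2-layer cases are sourced)."""
    if evidence_count == 0:
        return "no_evidence"
    if evidence_count == 1:
        return "sparse_evidence"
    # Any tiers above moderate_evidence are not modeled in this sketch.
    return "moderate_evidence"
```

With this scheme a gene scored on a single layer keeps that layer's score unchanged, which is why the evidence count and quality flag are reported alongside the composite score.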
test_scoring_integration.py (3 integration tests):
- test_scoring_pipeline_end_to_end
  - Creates synthetic store with 20 genes (17 generic + 3 known: MYO7A, IFT88, CDH23)
  - 6 evidence tables with varying NULL rates:
    - gnomad: 15/20 (75%)
    - expression: 12/20 (60%)
    - annotation: 18/20 (90%)
    - localization: 10/20 (50%)
    - animal_models: 8/20 (40%)
    - literature: 14/20 (70%)
  - Known genes receive high scores (0.8-0.95) across all 6 layers
  - Verifies:
    - All 20 genes present in results
    - Genes with evidence have non-NULL composite_score
    - Genes without evidence have NULL composite_score
    - evidence_count values correct (0-6)
    - quality_flag matches evidence_count thresholds
    - At least 2 of the 3 known genes rank in the top 5
- test_qc_detects_missing_data
  - Creates synthetic store with 100 genes
  - gnomad: 5% coverage (95% NULL) → should trigger ERROR (>80% threshold)
  - expression: 40% coverage (60% NULL) → should trigger WARNING (>50% threshold)
  - Other layers: >50% coverage (no warnings)
  - Verifies QC detects and reports errors for high missing data rates
- test_validation_passes_with_known_genes_ranked_highly
  - Uses synthetic_store with known genes scoring highly
  - Loads known genes, computes scores, persists, runs validation
  - Verifies:
    - validation_passed = True
    - median_percentile >= 0.75 (top quartile threshold)
    - Known genes rank highly as expected
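The missing-data check exercised by test_qc_detects_missing_data can be sketched as a per-layer classification. The 50% and 80% thresholds and the strict "greater than" comparison follow this summary; the function names and the "ok"/"warning"/"error" labels are hypothetical and may not match the QC module's actual interface.

```python
from typing import Dict, List, Optional

WARN_THRESHOLD = 0.50   # more than 50% missing -> warning
ERROR_THRESHOLD = 0.80  # more than 80% missing -> error


def missing_rate(values: List[Optional[float]]) -> float:
    """Fraction of entries that are None (treated as NULL)."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def classify_layers(layers: Dict[str, List[Optional[float]]]) -> Dict[str, str]:
    """Label each evidence layer as 'ok', 'warning', or 'error'."""
    result: Dict[str, str] = {}
    for name, values in layers.items():
        rate = missing_rate(values)
        if rate > ERROR_THRESHOLD:
            result[name] = "error"
        elif rate > WARN_THRESHOLD:
            result[name] = "warning"
        else:
            result[name] = "ok"
    return result
```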
Test Execution:
pytest tests/test_scoring.py tests/test_scoring_integration.py -v
# 10 passed, 7 warnings in 0.68s
Deviations from Plan
None - plan executed exactly as written.
Self-Check: PASSED
Files Created:
[ -f "/Users/gbanyan/Project/usher-exploring/src/usher_pipeline/cli/score_cmd.py" ] && echo "FOUND: score_cmd.py" || echo "MISSING: score_cmd.py"
# FOUND: score_cmd.py
[ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring.py" ] && echo "FOUND: test_scoring.py" || echo "MISSING: test_scoring.py"
# FOUND: test_scoring.py
[ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring_integration.py" ] && echo "FOUND: test_scoring_integration.py" || echo "MISSING: test_scoring_integration.py"
# FOUND: test_scoring_integration.py
Commits Exist:
git log --oneline --all | grep -q "d57a5f2" && echo "FOUND: d57a5f2" || echo "MISSING: d57a5f2"
# FOUND: d57a5f2
git log --oneline --all | grep -q "a6ad6c6" && echo "FOUND: a6ad6c6" || echo "MISSING: a6ad6c6"
# FOUND: a6ad6c6
CLI Command Works:
usher-pipeline score --help
# Shows --force, --skip-qc, --skip-validation options
usher-pipeline --help | grep score
# score Compute multi-evidence composite scores for all genes.
Tests Pass:
pytest tests/test_scoring.py tests/test_scoring_integration.py -v
# 10 passed, 7 warnings in 0.68s
Verification
All success criteria met:
- ✅ usher-pipeline score --help shows available options
- ✅ Score command registered in main CLI
- ✅ Unit tests pass: known genes, weight validation, NULL handling
- ✅ Integration tests pass: end-to-end scoring with synthetic data, QC detection, validation
- ✅ All tests runnable with pytest tests/test_scoring*.py
Notes
- Tests use synthetic data exclusively (no external API calls, fast, reproducible)
- NULL preservation pattern validated: genes with no evidence get NULL composite_score, not zero
- Known genes designed to rank highly in synthetic data (0.8-0.95 scores across all layers)
- QC thresholds: more than 50% missing = warning, more than 80% missing = error
- Validation threshold: median percentile >= 0.75 (top quartile)
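The validation threshold in the notes above can be illustrated with a short sketch. The 0.75 median-percentile threshold comes from this summary; the percentile definition used here (fraction of scored genes with a score less than or equal to the gene's own), the exclusion of unscored genes, and the function names are assumptions made for illustration only.

```python
from bisect import bisect_right
from statistics import median
from typing import Dict, Iterable, Optional, Tuple


def percentile_ranks(scores: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Percentile of each scored gene; genes with a None score are skipped."""
    scored = {gene: s for gene, s in scores.items() if s is not None}
    ordered = sorted(scored.values())
    n = len(ordered)
    # bisect_right counts how many scores are <= the gene's score.
    return {gene: bisect_right(ordered, s) / n for gene, s in scored.items()}


def validate_known_genes(
    scores: Dict[str, Optional[float]],
    known_genes: Iterable[str],
    threshold: float = 0.75,
) -> Tuple[bool, Optional[float]]:
    """Return (validation_passed, median_percentile) for the known genes."""
    ranks = percentile_ranks(scores)
    known = [ranks[g] for g in known_genes if g in ranks]
    if not known:
        # No known gene has a score, so validation cannot pass.
        return False, None
    med = median(known)
    return med >= threshold, med
```

Using the median rather than the mean keeps one poorly ranked known gene from failing the whole check.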