--- phase: 04-scoring-integration plan: 03 subsystem: CLI and Testing tags: - cli - scoring - testing - validation - checkpoint-restart dependency_graph: requires: - 04-01 (scoring integration module) - 04-02 (QC and validation modules) provides: - CLI score command with full pipeline orchestration - Comprehensive unit and integration test coverage affects: - main CLI entry point (command registration) - test suite coverage metrics tech_stack: added: - click command with --force, --skip-qc, --skip-validation options - pytest fixtures for synthetic DuckDB stores patterns: - Checkpoint-restart pattern (skip if scored_genes exists) - Synthetic data testing (no external dependencies) - NULL preservation validation in tests key_files: created: - src/usher_pipeline/cli/score_cmd.py (343 lines) - tests/test_scoring.py (269 lines) - tests/test_scoring_integration.py (305 lines) modified: - src/usher_pipeline/cli/main.py (added score command registration) decisions: - Score command follows evidence_cmd.py pattern for consistency - Separate --skip-qc and --skip-validation flags for flexible iteration - Tests use tmp_path fixtures for isolated DuckDB instances - Synthetic test data designed to ensure known genes rank highly (0.8-0.95 scores across all layers) metrics: duration_minutes: 3 tasks_completed: 2 files_created: 3 files_modified: 1 test_coverage: 10 tests (7 unit + 3 integration) completed_date: 2026-02-11 --- # Phase 04 Plan 03: CLI Score Command and Tests Summary **One-liner:** CLI command orchestrating full scoring pipeline (known genes → composite scores → QC → validation) with comprehensive test coverage for NULL preservation and validation logic ## Tasks Completed ### Task 1: CLI Score Command with Checkpoint-Restart **Commit:** d57a5f2 Created `src/usher_pipeline/cli/score_cmd.py` following the established pattern from `evidence_cmd.py`: **Implementation:** - Single `score` command (not a group) with options: `--force`, `--skip-qc`, `--skip-validation` - Uses `@click.pass_context` to access config_path from CLI context - 5-step pipeline flow: 1. Load known genes (OMIM Usher + SYSCILIA SCGS) to DuckDB 2. Compute composite scores with NULL-preserving weighted average 3. Persist scored_genes table with per-layer contributions 4. Run QC checks (unless --skip-qc) with warnings/errors display 5. Validate known gene rankings (unless --skip-validation) with report generation **Checkpoint-restart:** - Checks `store.has_checkpoint('scored_genes')` before processing - If exists and not --force, displays summary and returns early - Allows fast iteration during development **CLI integration:** - Updated `src/usher_pipeline/cli/main.py` to import and register score command - Command appears in `usher-pipeline --help` alongside setup and evidence **Output:** - Displays comprehensive summary: total genes, mean score, quality flag distribution - Shows QC pass/fail status and validation pass/fail status - Saves provenance sidecar to `data_dir/scoring/scoring.provenance.json` **Verification:** ```bash # CLI help works usher-pipeline score --help # Shows --force, --skip-qc, --skip-validation options # Command registered usher-pipeline --help # Lists score command ``` ### Task 2: Unit and Integration Tests for Scoring Module **Commit:** a6ad6c6 Created comprehensive test coverage for scoring module with 10 tests using synthetic data. #### test_scoring.py (7 unit tests): 1. **test_compile_known_genes_returns_expected_structure** - Verifies DataFrame structure (gene_symbol, source, confidence columns) - Asserts >= 38 genes (10 OMIM Usher + 28 SYSCILIA SCGS v2) - Confirms MYO7A and IFT88 present - Validates all confidence = HIGH - Checks both sources present (omim_usher, syscilia_scgs_v2) 2. **test_compile_known_genes_no_duplicates_within_source** - Verifies no duplicate gene_symbol within same source - Allows genes to appear in both sources (separate rows) 3. **test_scoring_weights_validate_sum_defaults** - ScoringWeights() with defaults passes validate_sum() 4. **test_scoring_weights_validate_sum_custom_valid** - Custom weights summing to 1.0 pass validation 5. **test_scoring_weights_validate_sum_invalid** - Weights summing to 1.35 raise ValueError 6. **test_scoring_weights_validate_sum_close_to_one** - Weights within 1e-6 of 1.0 pass (0.999999) - Weights outside tolerance fail (0.99) 7. **test_null_preservation_in_composite** - Creates in-memory DuckDB with 3 genes - GENE1: 2 evidence layers (gnomad + annotation) → non-NULL score, moderate_evidence - GENE2: 1 evidence layer (gnomad only) → non-NULL score, sparse_evidence - GENE3: 0 evidence layers → **NULL score, no_evidence** - Verifies NULL preservation (not zero) for genes without evidence #### test_scoring_integration.py (3 integration tests): 1. **test_scoring_pipeline_end_to_end** - Creates synthetic store with 20 genes (17 generic + 3 known: MYO7A, IFT88, CDH23) - 6 evidence tables with varying NULL rates: - gnomad: 15/20 (75%) - expression: 12/20 (60%) - annotation: 18/20 (90%) - localization: 10/20 (50%) - animal_models: 8/20 (40%) - literature: 14/20 (70%) - Known genes receive high scores (0.8-0.95) across all 6 layers - Verifies: - All 20 genes present in results - Genes with evidence have non-NULL composite_score - Genes without evidence have NULL composite_score - evidence_count values correct (0-6) - quality_flag matches evidence_count thresholds - Known genes rank in top 5 (at least 2 of 3) 2. **test_qc_detects_missing_data** - Creates synthetic store with 100 genes - gnomad: 5% coverage (95% NULL) → should trigger ERROR (>80% threshold) - expression: 40% coverage (60% NULL) → should trigger WARNING (>50% threshold) - Other layers: >50% coverage (no warnings) - Verifies QC detects and reports errors for high missing data rates 3. **test_validation_passes_with_known_genes_ranked_highly** - Uses synthetic_store with known genes scoring highly - Loads known genes, computes scores, persists, runs validation - Verifies: - validation_passed = True - median_percentile >= 0.75 (top quartile threshold) - Known genes rank highly as expected **Test Execution:** ```bash pytest tests/test_scoring.py tests/test_scoring_integration.py -v # 10 passed, 7 warnings in 0.68s ``` ## Deviations from Plan None - plan executed exactly as written. ## Self-Check: PASSED ### Files Created: ```bash [ -f "/Users/gbanyan/Project/usher-exploring/src/usher_pipeline/cli/score_cmd.py" ] && echo "FOUND: score_cmd.py" || echo "MISSING: score_cmd.py" # FOUND: score_cmd.py [ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring.py" ] && echo "FOUND: test_scoring.py" || echo "MISSING: test_scoring.py" # FOUND: test_scoring.py [ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring_integration.py" ] && echo "FOUND: test_scoring_integration.py" || echo "MISSING: test_scoring_integration.py" # FOUND: test_scoring_integration.py ``` ### Commits Exist: ```bash git log --oneline --all | grep -q "d57a5f2" && echo "FOUND: d57a5f2" || echo "MISSING: d57a5f2" # FOUND: d57a5f2 git log --oneline --all | grep -q "a6ad6c6" && echo "FOUND: a6ad6c6" || echo "MISSING: a6ad6c6" # FOUND: a6ad6c6 ``` ### CLI Command Works: ```bash usher-pipeline score --help # Shows --force, --skip-qc, --skip-validation options usher-pipeline --help | grep score # score Compute multi-evidence composite scores for all genes. ``` ### Tests Pass: ```bash pytest tests/test_scoring.py tests/test_scoring_integration.py -v # 10 passed, 7 warnings in 0.68s ``` ## Verification All success criteria met: - ✅ `usher-pipeline score --help` shows available options - ✅ Score command registered in main CLI - ✅ Unit tests pass: known genes, weight validation, NULL handling - ✅ Integration tests pass: end-to-end scoring with synthetic data, QC detection, validation - ✅ All tests runnable with `pytest tests/test_scoring*.py` ## Notes - Tests use synthetic data exclusively (no external API calls, fast, reproducible) - NULL preservation pattern validated: genes with no evidence get NULL composite_score, not zero - Known genes designed to rank highly in synthetic data (0.8-0.95 scores across all layers) - QC thresholds: 50% missing = warning, 80% missing = error - Validation threshold: median percentile >= 0.75 (top quartile)