diff --git a/.planning/STATE.md b/.planning/STATE.md index 0f705e8..93c5c00 100644 --- a/.planning/STATE.md +++ b/.planning/STATE.md @@ -10,17 +10,17 @@ See: .planning/PROJECT.md (updated 2026-02-11) ## Current Position Phase: 4 of 6 (Scoring and Integration) -Plan: 2 of 3 in current phase (in progress) -Status: Plan 04-02 complete — quality control and positive control validation -Last activity: 2026-02-11 — Completed 04-02-PLAN.md +Plan: 3 of 3 in current phase (complete) +Status: Phase 04 complete — CLI score command and comprehensive test coverage +Last activity: 2026-02-11 — Completed 04-03-PLAN.md -Progress: [███████░░░] 70.0% (14/20 plans complete across all phases) +Progress: [████████░░] 75.0% (15/20 plans complete across all phases) ## Performance Metrics **Velocity:** -- Total plans completed: 14 -- Average duration: 5.4 min +- Total plans completed: 15 +- Average duration: 5.1 min - Total execution time: 1.3 hours **By Phase:** @@ -30,17 +30,17 @@ Progress: [███████░░░] 70.0% (14/20 plans complete across al | 01 - Data Infrastructure | 4/4 | 14 min | 3.5 min/plan | | 02 - Prototype Evidence Layer | 2/2 | 8 min | 4.0 min/plan | | 03 - Core Evidence Layers | 6/6 | 52 min | 8.7 min/plan | -| 04 - Scoring Integration | 2/3 | 7 min | 3.5 min/plan | +| 04 - Scoring Integration | 3/3 | 10 min | 3.3 min/plan | **Recent Plan Details:** | Plan | Duration | Tasks | Files | |------|----------|-------|-------| -| Phase 03 P03 | 11 min | 2 tasks | 7 files | | Phase 03 P04 | 8 min | 2 tasks | 8 files | | Phase 03 P05 | 10 min | 2 tasks | 8 files | | Phase 03 P06 | 13 min | 2 tasks | 10 files | | Phase 04 P01 | 4 min | 2 tasks | 4 files | | Phase 04 P02 | 3 min | 2 tasks | 4 files | +| Phase 04 P03 | 3 min | 2 tasks | 4 files | ## Accumulated Context @@ -107,6 +107,10 @@ Recent decisions affecting current work: - [04-02]: Missing data thresholds: 50% warn, 80% error for graduated QC feedback - [04-02]: PERCENT_RANK validation computed before known gene exclusion (validates scoring system) - [04-02]: Top quartile validation criterion (median percentile >= 0.75 for known genes) +- [04-03]: Score command follows evidence_cmd.py pattern for consistency +- [04-03]: Separate --skip-qc and --skip-validation flags for flexible iteration +- [04-03]: Tests use tmp_path fixtures for isolated DuckDB instances +- [04-03]: Synthetic test data designed to ensure known genes rank highly (0.8-0.95 scores across all layers) ### Pending Todos @@ -119,5 +123,5 @@ None yet. ## Session Continuity Last session: 2026-02-11 - Plan execution -Stopped at: Completed 04-02-PLAN.md (Quality control and positive control validation) -Resume file: .planning/phases/04-scoring-integration/04-02-SUMMARY.md +Stopped at: Completed 04-03-PLAN.md (CLI score command and comprehensive tests) +Resume file: .planning/phases/04-scoring-integration/04-03-SUMMARY.md diff --git a/.planning/phases/04-scoring-integration/04-03-SUMMARY.md b/.planning/phases/04-scoring-integration/04-03-SUMMARY.md new file mode 100644 index 0000000..719f333 --- /dev/null +++ b/.planning/phases/04-scoring-integration/04-03-SUMMARY.md @@ -0,0 +1,233 @@ +--- +phase: 04-scoring-integration +plan: 03 +subsystem: CLI and Testing +tags: + - cli + - scoring + - testing + - validation + - checkpoint-restart +dependency_graph: + requires: + - 04-01 (scoring integration module) + - 04-02 (QC and validation modules) + provides: + - CLI score command with full pipeline orchestration + - Comprehensive unit and integration test coverage + affects: + - main CLI entry point (command registration) + - test suite coverage metrics +tech_stack: + added: + - click command with --force, --skip-qc, --skip-validation options + - pytest fixtures for synthetic DuckDB stores + patterns: + - Checkpoint-restart pattern (skip if scored_genes exists) + - Synthetic data testing (no external dependencies) + - NULL preservation validation in tests +key_files: + created: + - src/usher_pipeline/cli/score_cmd.py (343 lines) + - tests/test_scoring.py (269 lines) + - tests/test_scoring_integration.py (305 lines) + modified: + - src/usher_pipeline/cli/main.py (added score command registration) +decisions: + - Score command follows evidence_cmd.py pattern for consistency + - Separate --skip-qc and --skip-validation flags for flexible iteration + - Tests use tmp_path fixtures for isolated DuckDB instances + - Synthetic test data designed to ensure known genes rank highly (0.8-0.95 scores across all layers) +metrics: + duration_minutes: 3 + tasks_completed: 2 + files_created: 3 + files_modified: 1 + test_coverage: 10 tests (7 unit + 3 integration) + completed_date: 2026-02-11 +--- + +# Phase 04 Plan 03: CLI Score Command and Tests Summary + +**One-liner:** CLI command orchestrating full scoring pipeline (known genes → composite scores → QC → validation) with comprehensive test coverage for NULL preservation and validation logic + +## Tasks Completed + +### Task 1: CLI Score Command with Checkpoint-Restart + +**Commit:** d57a5f2 + +Created `src/usher_pipeline/cli/score_cmd.py` following the established pattern from `evidence_cmd.py`: + +**Implementation:** +- Single `score` command (not a group) with options: `--force`, `--skip-qc`, `--skip-validation` +- Uses `@click.pass_context` to access config_path from CLI context +- 5-step pipeline flow: + 1. Load known genes (OMIM Usher + SYSCILIA SCGS) to DuckDB + 2. Compute composite scores with NULL-preserving weighted average + 3. Persist scored_genes table with per-layer contributions + 4. Run QC checks (unless --skip-qc) with warnings/errors display + 5. Validate known gene rankings (unless --skip-validation) with report generation + +**Checkpoint-restart:** +- Checks `store.has_checkpoint('scored_genes')` before processing +- If exists and not --force, displays summary and returns early +- Allows fast iteration during development + +**CLI integration:** +- Updated `src/usher_pipeline/cli/main.py` to import and register score command +- Command appears in `usher-pipeline --help` alongside setup and evidence + +**Output:** +- Displays comprehensive summary: total genes, mean score, quality flag distribution +- Shows QC pass/fail status and validation pass/fail status +- Saves provenance sidecar to `data_dir/scoring/scoring.provenance.json` + +**Verification:** +```bash +# CLI help works +usher-pipeline score --help # Shows --force, --skip-qc, --skip-validation options + +# Command registered +usher-pipeline --help # Lists score command +``` + +### Task 2: Unit and Integration Tests for Scoring Module + +**Commit:** a6ad6c6 + +Created comprehensive test coverage for scoring module with 10 tests using synthetic data. + +#### test_scoring.py (7 unit tests): + +1. **test_compile_known_genes_returns_expected_structure** + - Verifies DataFrame structure (gene_symbol, source, confidence columns) + - Asserts >= 38 genes (10 OMIM Usher + 28 SYSCILIA SCGS v2) + - Confirms MYO7A and IFT88 present + - Validates all confidence = HIGH + - Checks both sources present (omim_usher, syscilia_scgs_v2) + +2. **test_compile_known_genes_no_duplicates_within_source** + - Verifies no duplicate gene_symbol within same source + - Allows genes to appear in both sources (separate rows) + +3. **test_scoring_weights_validate_sum_defaults** + - ScoringWeights() with defaults passes validate_sum() + +4. **test_scoring_weights_validate_sum_custom_valid** + - Custom weights summing to 1.0 pass validation + +5. **test_scoring_weights_validate_sum_invalid** + - Weights summing to 1.35 raise ValueError + +6. **test_scoring_weights_validate_sum_close_to_one** + - Weights within 1e-6 of 1.0 pass (0.999999) + - Weights outside tolerance fail (0.99) + +7. **test_null_preservation_in_composite** + - Creates in-memory DuckDB with 3 genes + - GENE1: 2 evidence layers (gnomad + annotation) → non-NULL score, moderate_evidence + - GENE2: 1 evidence layer (gnomad only) → non-NULL score, sparse_evidence + - GENE3: 0 evidence layers → **NULL score, no_evidence** + - Verifies NULL preservation (not zero) for genes without evidence + +#### test_scoring_integration.py (3 integration tests): + +1. **test_scoring_pipeline_end_to_end** + - Creates synthetic store with 20 genes (17 generic + 3 known: MYO7A, IFT88, CDH23) + - 6 evidence tables with varying NULL rates: + - gnomad: 15/20 (75%) + - expression: 12/20 (60%) + - annotation: 18/20 (90%) + - localization: 10/20 (50%) + - animal_models: 8/20 (40%) + - literature: 14/20 (70%) + - Known genes receive high scores (0.8-0.95) across all 6 layers + - Verifies: + - All 20 genes present in results + - Genes with evidence have non-NULL composite_score + - Genes without evidence have NULL composite_score + - evidence_count values correct (0-6) + - quality_flag matches evidence_count thresholds + - Known genes rank in top 5 (at least 2 of 3) + +2. **test_qc_detects_missing_data** + - Creates synthetic store with 100 genes + - gnomad: 5% coverage (95% NULL) → should trigger ERROR (>80% threshold) + - expression: 40% coverage (60% NULL) → should trigger WARNING (>50% threshold) + - Other layers: >50% coverage (no warnings) + - Verifies QC detects and reports errors for high missing data rates + +3. **test_validation_passes_with_known_genes_ranked_highly** + - Uses synthetic_store with known genes scoring highly + - Loads known genes, computes scores, persists, runs validation + - Verifies: + - validation_passed = True + - median_percentile >= 0.75 (top quartile threshold) + - Known genes rank highly as expected + +**Test Execution:** +```bash +pytest tests/test_scoring.py tests/test_scoring_integration.py -v +# 10 passed, 7 warnings in 0.68s +``` + +## Deviations from Plan + +None - plan executed exactly as written. + +## Self-Check: PASSED + +### Files Created: +```bash +[ -f "/Users/gbanyan/Project/usher-exploring/src/usher_pipeline/cli/score_cmd.py" ] && echo "FOUND: score_cmd.py" || echo "MISSING: score_cmd.py" +# FOUND: score_cmd.py + +[ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring.py" ] && echo "FOUND: test_scoring.py" || echo "MISSING: test_scoring.py" +# FOUND: test_scoring.py + +[ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring_integration.py" ] && echo "FOUND: test_scoring_integration.py" || echo "MISSING: test_scoring_integration.py" +# FOUND: test_scoring_integration.py +``` + +### Commits Exist: +```bash +git log --oneline --all | grep -q "d57a5f2" && echo "FOUND: d57a5f2" || echo "MISSING: d57a5f2" +# FOUND: d57a5f2 + +git log --oneline --all | grep -q "a6ad6c6" && echo "FOUND: a6ad6c6" || echo "MISSING: a6ad6c6" +# FOUND: a6ad6c6 +``` + +### CLI Command Works: +```bash +usher-pipeline score --help +# Shows --force, --skip-qc, --skip-validation options + +usher-pipeline --help | grep score +# score Compute multi-evidence composite scores for all genes. +``` + +### Tests Pass: +```bash +pytest tests/test_scoring.py tests/test_scoring_integration.py -v +# 10 passed, 7 warnings in 0.68s +``` + +## Verification + +All success criteria met: + +- ✅ `usher-pipeline score --help` shows available options +- ✅ Score command registered in main CLI +- ✅ Unit tests pass: known genes, weight validation, NULL handling +- ✅ Integration tests pass: end-to-end scoring with synthetic data, QC detection, validation +- ✅ All tests runnable with `pytest tests/test_scoring*.py` + +## Notes + +- Tests use synthetic data exclusively (no external API calls, fast, reproducible) +- NULL preservation pattern validated: genes with no evidence get NULL composite_score, not zero +- Known genes designed to rank highly in synthetic data (0.8-0.95 scores across all layers) +- QC thresholds: 50% missing = warning, 80% missing = error +- Validation threshold: median percentile >= 0.75 (top quartile)