docs(04-03): complete CLI score command and tests plan
- SUMMARY.md: CLI orchestration with checkpoint-restart + 10 comprehensive tests
- STATE.md: Updated position (Phase 4 complete), progress (75%), velocity, decisions
- Duration: 3 minutes, 2 tasks, 4 files (3 created, 1 modified)
@@ -10,17 +10,17 @@ See: .planning/PROJECT.md (updated 2026-02-11)
 ## Current Position
 
 Phase: 4 of 6 (Scoring and Integration)
-Plan: 2 of 3 in current phase (in progress)
-Status: Plan 04-02 complete — quality control and positive control validation
-Last activity: 2026-02-11 — Completed 04-02-PLAN.md
+Plan: 3 of 3 in current phase (complete)
+Status: Phase 04 complete — CLI score command and comprehensive test coverage
+Last activity: 2026-02-11 — Completed 04-03-PLAN.md
 
-Progress: [███████░░░] 70.0% (14/20 plans complete across all phases)
+Progress: [████████░░] 75.0% (15/20 plans complete across all phases)
 
 ## Performance Metrics
 
 **Velocity:**
-- Total plans completed: 14
-- Average duration: 5.4 min
+- Total plans completed: 15
+- Average duration: 5.1 min
 - Total execution time: 1.3 hours
 
 **By Phase:**
@@ -30,17 +30,17 @@ Progress: [███████░░░] 70.0% (14/20 plans complete across al
 | 01 - Data Infrastructure | 4/4 | 14 min | 3.5 min/plan |
 | 02 - Prototype Evidence Layer | 2/2 | 8 min | 4.0 min/plan |
 | 03 - Core Evidence Layers | 6/6 | 52 min | 8.7 min/plan |
-| 04 - Scoring Integration | 2/3 | 7 min | 3.5 min/plan |
+| 04 - Scoring Integration | 3/3 | 10 min | 3.3 min/plan |
 
 **Recent Plan Details:**
 | Plan | Duration | Tasks | Files |
 |------|----------|-------|-------|
-| Phase 03 P03 | 11 min | 2 tasks | 7 files |
 | Phase 03 P04 | 8 min | 2 tasks | 8 files |
 | Phase 03 P05 | 10 min | 2 tasks | 8 files |
 | Phase 03 P06 | 13 min | 2 tasks | 10 files |
 | Phase 04 P01 | 4 min | 2 tasks | 4 files |
 | Phase 04 P02 | 3 min | 2 tasks | 4 files |
+| Phase 04 P03 | 3 min | 2 tasks | 4 files |
 
 ## Accumulated Context
 
@@ -107,6 +107,10 @@ Recent decisions affecting current work:
 - [04-02]: Missing data thresholds: 50% warn, 80% error for graduated QC feedback
 - [04-02]: PERCENT_RANK validation computed before known gene exclusion (validates scoring system)
 - [04-02]: Top quartile validation criterion (median percentile >= 0.75 for known genes)
+- [04-03]: Score command follows evidence_cmd.py pattern for consistency
+- [04-03]: Separate --skip-qc and --skip-validation flags for flexible iteration
+- [04-03]: Tests use tmp_path fixtures for isolated DuckDB instances
+- [04-03]: Synthetic test data designed to ensure known genes rank highly (0.8-0.95 scores across all layers)
 
 ### Pending Todos
 
@@ -119,5 +123,5 @@ None yet.
 ## Session Continuity
 
 Last session: 2026-02-11 - Plan execution
-Stopped at: Completed 04-02-PLAN.md (Quality control and positive control validation)
-Resume file: .planning/phases/04-scoring-integration/04-02-SUMMARY.md
+Stopped at: Completed 04-03-PLAN.md (CLI score command and comprehensive tests)
+Resume file: .planning/phases/04-scoring-integration/04-03-SUMMARY.md
233
.planning/phases/04-scoring-integration/04-03-SUMMARY.md
Normal file
@@ -0,0 +1,233 @@
---
phase: 04-scoring-integration
plan: 03
subsystem: CLI and Testing
tags:
  - cli
  - scoring
  - testing
  - validation
  - checkpoint-restart
dependency_graph:
  requires:
    - 04-01 (scoring integration module)
    - 04-02 (QC and validation modules)
  provides:
    - CLI score command with full pipeline orchestration
    - Comprehensive unit and integration test coverage
  affects:
    - main CLI entry point (command registration)
    - test suite coverage metrics
tech_stack:
  added:
    - click command with --force, --skip-qc, --skip-validation options
    - pytest fixtures for synthetic DuckDB stores
  patterns:
    - Checkpoint-restart pattern (skip if scored_genes exists)
    - Synthetic data testing (no external dependencies)
    - NULL preservation validation in tests
key_files:
  created:
    - src/usher_pipeline/cli/score_cmd.py (343 lines)
    - tests/test_scoring.py (269 lines)
    - tests/test_scoring_integration.py (305 lines)
  modified:
    - src/usher_pipeline/cli/main.py (added score command registration)
decisions:
  - Score command follows evidence_cmd.py pattern for consistency
  - Separate --skip-qc and --skip-validation flags for flexible iteration
  - Tests use tmp_path fixtures for isolated DuckDB instances
  - Synthetic test data designed to ensure known genes rank highly (0.8-0.95 scores across all layers)
metrics:
  duration_minutes: 3
  tasks_completed: 2
  files_created: 3
  files_modified: 1
  test_coverage: 10 tests (7 unit + 3 integration)
completed_date: 2026-02-11
---

# Phase 04 Plan 03: CLI Score Command and Tests Summary

**One-liner:** CLI command orchestrating full scoring pipeline (known genes → composite scores → QC → validation) with comprehensive test coverage for NULL preservation and validation logic

## Tasks Completed

### Task 1: CLI Score Command with Checkpoint-Restart

**Commit:** d57a5f2

Created `src/usher_pipeline/cli/score_cmd.py` following the established pattern from `evidence_cmd.py`:

**Implementation:**
- Single `score` command (not a group) with options: `--force`, `--skip-qc`, `--skip-validation`
- Uses `@click.pass_context` to access config_path from CLI context
- 5-step pipeline flow:
  1. Load known genes (OMIM Usher + SYSCILIA SCGS) to DuckDB
  2. Compute composite scores with NULL-preserving weighted average
  3. Persist scored_genes table with per-layer contributions
  4. Run QC checks (unless --skip-qc) with warnings/errors display
  5. Validate known gene rankings (unless --skip-validation) with report generation

**Checkpoint-restart:**
- Checks `store.has_checkpoint('scored_genes')` before processing
- If exists and not --force, displays summary and returns early
- Allows fast iteration during development
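
The control flow described above can be sketched in plain Python. This is an illustration only, not the contents of `score_cmd.py`: `InMemoryStore`, `run_score`, and the step names are hypothetical stand-ins, and the real command is a click command backed by DuckDB. Only the `has_checkpoint('scored_genes')` check and the three flags come from the summary.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the pipeline's DuckDB-backed store."""
    tables: dict = field(default_factory=dict)

    def has_checkpoint(self, name: str) -> bool:
        return name in self.tables

    def save(self, name: str, rows: list) -> None:
        self.tables[name] = rows


def run_score(store, force=False, skip_qc=False, skip_validation=False):
    """Return the names of the steps executed, mirroring the five-step flow."""
    # Checkpoint-restart: reuse existing results unless force is requested.
    if store.has_checkpoint("scored_genes") and not force:
        return ["summary_from_checkpoint"]

    steps = ["load_known_genes", "compute_composite_scores"]
    store.save("scored_genes", [])  # placeholder rows; real scoring omitted
    steps.append("persist_scored_genes")
    if not skip_qc:
        steps.append("run_qc")
    if not skip_validation:
        steps.append("validate_known_genes")
    return steps


store = InMemoryStore()
first = run_score(store, skip_qc=True)   # fresh run, QC skipped
second = run_score(store)                # checkpoint exists, returns early
forced = run_score(store, force=True)    # force recomputes all five steps
print(first)
print(second)
print(forced)
```

Keeping the checkpoint test at the top of the function means a repeated invocation costs only a table lookup, which is what makes iteration fast during development.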

**CLI integration:**
- Updated `src/usher_pipeline/cli/main.py` to import and register score command
- Command appears in `usher-pipeline --help` alongside setup and evidence

**Output:**
- Displays comprehensive summary: total genes, mean score, quality flag distribution
- Shows QC pass/fail status and validation pass/fail status
- Saves provenance sidecar to `data_dir/scoring/scoring.provenance.json`

**Verification:**
```bash
# CLI help works
usher-pipeline score --help   # Shows --force, --skip-qc, --skip-validation options

# Command registered
usher-pipeline --help         # Lists score command
```

### Task 2: Unit and Integration Tests for Scoring Module

**Commit:** a6ad6c6

Created comprehensive test coverage for scoring module with 10 tests using synthetic data.

#### test_scoring.py (7 unit tests):

1. **test_compile_known_genes_returns_expected_structure**
   - Verifies DataFrame structure (gene_symbol, source, confidence columns)
   - Asserts >= 38 genes (10 OMIM Usher + 28 SYSCILIA SCGS v2)
   - Confirms MYO7A and IFT88 present
   - Validates all confidence = HIGH
   - Checks both sources present (omim_usher, syscilia_scgs_v2)

2. **test_compile_known_genes_no_duplicates_within_source**
   - Verifies no duplicate gene_symbol within same source
   - Allows genes to appear in both sources (separate rows)

3. **test_scoring_weights_validate_sum_defaults**
   - ScoringWeights() with defaults passes validate_sum()

4. **test_scoring_weights_validate_sum_custom_valid**
   - Custom weights summing to 1.0 pass validation

5. **test_scoring_weights_validate_sum_invalid**
   - Weights summing to 1.35 raise ValueError

6. **test_scoring_weights_validate_sum_close_to_one**
   - Weights within 1e-6 of 1.0 pass (0.999999)
   - Weights outside tolerance fail (0.99)

7. **test_null_preservation_in_composite**
   - Creates in-memory DuckDB with 3 genes
   - GENE1: 2 evidence layers (gnomad + annotation) → non-NULL score, moderate_evidence
   - GENE2: 1 evidence layer (gnomad only) → non-NULL score, sparse_evidence
   - GENE3: 0 evidence layers → **NULL score, no_evidence**
   - Verifies NULL preservation (not zero) for genes without evidence
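
The weight check and NULL-preserving average exercised by these tests can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `LayerWeights` is a hypothetical stand-in for the project's `ScoringWeights`, the default weight values are invented, and renormalizing over the layers that have data is one plausible reading of "NULL-preserving weighted average" rather than the confirmed implementation. Python `None` plays the role of SQL NULL.

```python
from dataclasses import dataclass, asdict


@dataclass
class LayerWeights:
    """Hypothetical weights; the real defaults are not shown in this summary."""
    gnomad: float = 0.2
    expression: float = 0.2
    annotation: float = 0.15
    localization: float = 0.15
    animal_models: float = 0.15
    literature: float = 0.15

    def validate_sum(self, tol: float = 1e-6) -> None:
        # Reject weight sets whose total drifts from 1.0 by more than tol.
        total = sum(asdict(self).values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"weights sum to {total:.6f}, expected 1.0")


def composite_score(layer_scores: dict, weights: LayerWeights):
    """Weighted average over layers with data; (None, 0) when no layer has data."""
    w = asdict(weights)
    present = {k: v for k, v in layer_scores.items() if v is not None}
    if not present:
        return None, 0  # NULL is preserved, not coerced to 0.0
    weight_sum = sum(w[k] for k in present)
    score = sum(w[k] * v for k, v in present.items()) / weight_sum
    return score, len(present)


weights = LayerWeights()
weights.validate_sum()
gene1 = composite_score({"gnomad": 0.8, "annotation": 0.6}, weights)
gene2 = composite_score({"gnomad": 0.5, "annotation": None}, weights)
gene3 = composite_score({"gnomad": None, "annotation": None}, weights)
print(gene1, gene2, gene3)
```

Returning `None` for a gene with no evidence keeps "unknown" distinguishable from "scored zero", which is the property `test_null_preservation_in_composite` checks.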

#### test_scoring_integration.py (3 integration tests):

1. **test_scoring_pipeline_end_to_end**
   - Creates synthetic store with 20 genes (17 generic + 3 known: MYO7A, IFT88, CDH23)
   - 6 evidence tables with varying NULL rates:
     - gnomad: 15/20 (75%)
     - expression: 12/20 (60%)
     - annotation: 18/20 (90%)
     - localization: 10/20 (50%)
     - animal_models: 8/20 (40%)
     - literature: 14/20 (70%)
   - Known genes receive high scores (0.8-0.95) across all 6 layers
   - Verifies:
     - All 20 genes present in results
     - Genes with evidence have non-NULL composite_score
     - Genes without evidence have NULL composite_score
     - evidence_count values correct (0-6)
     - quality_flag matches evidence_count thresholds
     - Known genes rank in top 5 (at least 2 of 3)

2. **test_qc_detects_missing_data**
   - Creates synthetic store with 100 genes
   - gnomad: 5% coverage (95% NULL) → should trigger ERROR (>80% threshold)
   - expression: 40% coverage (60% NULL) → should trigger WARNING (>50% threshold)
   - Other layers: >50% coverage (no warnings)
   - Verifies QC detects and reports errors for high missing data rates

3. **test_validation_passes_with_known_genes_ranked_highly**
   - Uses synthetic_store with known genes scoring highly
   - Loads known genes, computes scores, persists, runs validation
   - Verifies:
     - validation_passed = True
     - median_percentile >= 0.75 (top quartile threshold)
     - Known genes rank highly as expected
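
The graduated missing-data check in `test_qc_detects_missing_data` can be sketched as follows. The function names are hypothetical and the strict `>` comparisons are an assumption taken from the ">80%" and ">50%" wording above; the project's QC module may treat the boundary values differently.

```python
def classify_missing_rate(missing_rate: float, warn: float = 0.5, error: float = 0.8) -> str:
    """Map a layer's missing-data fraction to 'ok', 'warning', or 'error'."""
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be between 0 and 1")
    if missing_rate > error:
        return "error"
    if missing_rate > warn:
        return "warning"
    return "ok"


def qc_report(layers: dict) -> dict:
    """Classify each evidence layer by its fraction of None (NULL) values."""
    report = {}
    for name, values in layers.items():
        if not values:
            raise ValueError(f"layer {name!r} has no rows")
        missing = sum(v is None for v in values) / len(values)
        report[name] = classify_missing_rate(missing)
    return report


# 100 synthetic genes per layer, mirroring the coverage figures in the test.
layers = {
    "gnomad": [0.5] * 5 + [None] * 95,       # 95% missing
    "expression": [0.5] * 40 + [None] * 60,  # 60% missing
    "annotation": [0.5] * 90 + [None] * 10,  # 10% missing
}
report = qc_report(layers)
print(report)
```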

**Test Execution:**
```bash
pytest tests/test_scoring.py tests/test_scoring_integration.py -v
# 10 passed, 7 warnings in 0.68s
```

## Deviations from Plan

None - plan executed exactly as written.

## Self-Check: PASSED

### Files Created:
```bash
[ -f "/Users/gbanyan/Project/usher-exploring/src/usher_pipeline/cli/score_cmd.py" ] && echo "FOUND: score_cmd.py" || echo "MISSING: score_cmd.py"
# FOUND: score_cmd.py

[ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring.py" ] && echo "FOUND: test_scoring.py" || echo "MISSING: test_scoring.py"
# FOUND: test_scoring.py

[ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring_integration.py" ] && echo "FOUND: test_scoring_integration.py" || echo "MISSING: test_scoring_integration.py"
# FOUND: test_scoring_integration.py
```

### Commits Exist:
```bash
git log --oneline --all | grep -q "d57a5f2" && echo "FOUND: d57a5f2" || echo "MISSING: d57a5f2"
# FOUND: d57a5f2

git log --oneline --all | grep -q "a6ad6c6" && echo "FOUND: a6ad6c6" || echo "MISSING: a6ad6c6"
# FOUND: a6ad6c6
```

### CLI Command Works:
```bash
usher-pipeline score --help
# Shows --force, --skip-qc, --skip-validation options

usher-pipeline --help | grep score
# score Compute multi-evidence composite scores for all genes.
```

### Tests Pass:
```bash
pytest tests/test_scoring.py tests/test_scoring_integration.py -v
# 10 passed, 7 warnings in 0.68s
```

## Verification

All success criteria met:

- ✅ `usher-pipeline score --help` shows available options
- ✅ Score command registered in main CLI
- ✅ Unit tests pass: known genes, weight validation, NULL handling
- ✅ Integration tests pass: end-to-end scoring with synthetic data, QC detection, validation
- ✅ All tests runnable with `pytest tests/test_scoring*.py`

## Notes

- Tests use synthetic data exclusively (no external API calls, fast, reproducible)
- NULL preservation pattern validated: genes with no evidence get NULL composite_score, not zero
- Known genes designed to rank highly in synthetic data (0.8-0.95 scores across all layers)
- QC thresholds: 50% missing = warning, 80% missing = error
- Validation threshold: median percentile >= 0.75 (top quartile)
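
The validation threshold noted above can be illustrated with a small, self-contained sketch. It assumes the percentile is SQL-style `PERCENT_RANK` over ascending composite scores, `(rank - 1) / (n - 1)`, and that genes with a NULL score are left out of the ranking; the function names and the synthetic scores are hypothetical and are not taken from the project's validation module.

```python
from bisect import bisect_left
from statistics import median


def percent_rank(scores: dict) -> dict:
    """SQL-style PERCENT_RANK over ascending scores, ignoring None values."""
    scored = {gene: s for gene, s in scores.items() if s is not None}
    n = len(scored)
    if n < 2:
        return {gene: 0.0 for gene in scored}
    ordered = sorted(scored.values())
    # bisect_left counts the scores strictly below s, which equals rank - 1.
    return {gene: bisect_left(ordered, s) / (n - 1) for gene, s in scored.items()}


def validate_known_genes(scores: dict, known: list, threshold: float = 0.75) -> dict:
    """Pass when the median percentile of the known genes reaches the threshold."""
    ranks = percent_rank(scores)
    known_ranks = [ranks[g] for g in known if g in ranks]
    if not known_ranks:
        return {"validation_passed": False, "median_percentile": None}
    med = median(known_ranks)
    return {"validation_passed": med >= threshold, "median_percentile": med}


# 17 generic genes with scores 0.00-0.80 plus 3 known genes scoring above them.
scores = {f"GENE{i}": i / 20 for i in range(17)}
scores.update({"MYO7A": 0.95, "IFT88": 0.90, "CDH23": 0.85})
result = validate_known_genes(scores, ["MYO7A", "IFT88", "CDH23"])
print(result)
```

With 20 scored genes the three known genes occupy the top three ranks, so their median percentile is 18/19 and the check passes.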