Files
gbanyan 386fbf51b2 docs(04-03): complete CLI score command and tests plan
- SUMMARY.md: CLI orchestration with checkpoint-restart + 10 comprehensive tests
- STATE.md: Updated position (Phase 4 complete), progress (75%), velocity, decisions
- Duration: 3 minutes, 2 tasks, 4 files (3 created, 1 modified)
2026-02-11 20:56:31 +08:00

8.4 KiB

phase, plan, subsystem, tags, dependency_graph, tech_stack, key_files, decisions, metrics
phase plan subsystem tags dependency_graph tech_stack key_files decisions metrics
04-scoring-integration 03 CLI and Testing
cli
scoring
testing
validation
checkpoint-restart
requires provides affects
04-01 (scoring integration module)
04-02 (QC and validation modules)
CLI score command with full pipeline orchestration
Comprehensive unit and integration test coverage
main CLI entry point (command registration)
test suite coverage metrics
added patterns
click command with --force, --skip-qc, --skip-validation options
pytest fixtures for synthetic DuckDB stores
Checkpoint-restart pattern (skip if scored_genes exists)
Synthetic data testing (no external dependencies)
NULL preservation validation in tests
created modified
src/usher_pipeline/cli/score_cmd.py (343 lines)
tests/test_scoring.py (269 lines)
tests/test_scoring_integration.py (305 lines)
src/usher_pipeline/cli/main.py (added score command registration)
Score command follows evidence_cmd.py pattern for consistency
Separate --skip-qc and --skip-validation flags for flexible iteration
Tests use tmp_path fixtures for isolated DuckDB instances
Synthetic test data designed to ensure known genes rank highly (0.8-0.95 scores across all layers)
duration_minutes tasks_completed files_created files_modified test_coverage completed_date
3 2 3 1 10 tests (7 unit + 3 integration) 2026-02-11

Phase 04 Plan 03: CLI Score Command and Tests Summary

One-liner: CLI command orchestrating full scoring pipeline (known genes → composite scores → QC → validation) with comprehensive test coverage for NULL preservation and validation logic

Tasks Completed

Task 1: CLI Score Command with Checkpoint-Restart

Commit: d57a5f2

Created src/usher_pipeline/cli/score_cmd.py following the established pattern from evidence_cmd.py:

Implementation:

  • Single score command (not a group) with options: --force, --skip-qc, --skip-validation
  • Uses @click.pass_context to access config_path from CLI context
  • 5-step pipeline flow:
    1. Load known genes (OMIM Usher + SYSCILIA SCGS) to DuckDB
    2. Compute composite scores with NULL-preserving weighted average
    3. Persist scored_genes table with per-layer contributions
    4. Run QC checks (unless --skip-qc) with warnings/errors display
    5. Validate known gene rankings (unless --skip-validation) with report generation

Checkpoint-restart:

  • Checks store.has_checkpoint('scored_genes') before processing
  • If exists and not --force, displays summary and returns early
  • Allows fast iteration during development

CLI integration:

  • Updated src/usher_pipeline/cli/main.py to import and register score command
  • Command appears in usher-pipeline --help alongside setup and evidence

Output:

  • Displays comprehensive summary: total genes, mean score, quality flag distribution
  • Shows QC pass/fail status and validation pass/fail status
  • Saves provenance sidecar to data_dir/scoring/scoring.provenance.json

Verification:

# CLI help works
usher-pipeline score --help  # Shows --force, --skip-qc, --skip-validation options

# Command registered
usher-pipeline --help  # Lists score command

Task 2: Unit and Integration Tests for Scoring Module

Commit: a6ad6c6

Created comprehensive test coverage for scoring module with 10 tests using synthetic data.

test_scoring.py (7 unit tests):

  1. test_compile_known_genes_returns_expected_structure

    • Verifies DataFrame structure (gene_symbol, source, confidence columns)
    • Asserts >= 38 genes (10 OMIM Usher + 28 SYSCILIA SCGS v2)
    • Confirms MYO7A and IFT88 present
    • Validates all confidence = HIGH
    • Checks both sources present (omim_usher, syscilia_scgs_v2)
  2. test_compile_known_genes_no_duplicates_within_source

    • Verifies no duplicate gene_symbol within same source
    • Allows genes to appear in both sources (separate rows)
  3. test_scoring_weights_validate_sum_defaults

    • ScoringWeights() with defaults passes validate_sum()
  4. test_scoring_weights_validate_sum_custom_valid

    • Custom weights summing to 1.0 pass validation
  5. test_scoring_weights_validate_sum_invalid

    • Weights summing to 1.35 raise ValueError
  6. test_scoring_weights_validate_sum_close_to_one

    • Weights within 1e-6 of 1.0 pass (0.999999)
    • Weights outside tolerance fail (0.99)
  7. test_null_preservation_in_composite

    • Creates in-memory DuckDB with 3 genes
    • GENE1: 2 evidence layers (gnomad + annotation) → non-NULL score, moderate_evidence
    • GENE2: 1 evidence layer (gnomad only) → non-NULL score, sparse_evidence
    • GENE3: 0 evidence layers → NULL score, no_evidence
    • Verifies NULL preservation (not zero) for genes without evidence

test_scoring_integration.py (3 integration tests):

  1. test_scoring_pipeline_end_to_end

    • Creates synthetic store with 20 genes (17 generic + 3 known: MYO7A, IFT88, CDH23)
    • 6 evidence tables with varying NULL rates:
      • gnomad: 15/20 (75%)
      • expression: 12/20 (60%)
      • annotation: 18/20 (90%)
      • localization: 10/20 (50%)
      • animal_models: 8/20 (40%)
      • literature: 14/20 (70%)
    • Known genes receive high scores (0.8-0.95) across all 6 layers
    • Verifies:
      • All 20 genes present in results
      • Genes with evidence have non-NULL composite_score
      • Genes without evidence have NULL composite_score
      • evidence_count values correct (0-6)
      • quality_flag matches evidence_count thresholds
      • Known genes rank in top 5 (at least 2 of 3)
  2. test_qc_detects_missing_data

    • Creates synthetic store with 100 genes
    • gnomad: 5% coverage (95% NULL) → should trigger ERROR (>80% threshold)
    • expression: 40% coverage (60% NULL) → should trigger WARNING (>50% threshold)
    • Other layers: >50% coverage (no warnings)
    • Verifies QC detects and reports errors for high missing data rates
  3. test_validation_passes_with_known_genes_ranked_highly

    • Uses synthetic_store with known genes scoring highly
    • Loads known genes, computes scores, persists, runs validation
    • Verifies:
      • validation_passed = True
      • median_percentile >= 0.75 (top quartile threshold)
      • Known genes rank highly as expected

Test Execution:

pytest tests/test_scoring.py tests/test_scoring_integration.py -v
# 10 passed, 7 warnings in 0.68s

Deviations from Plan

None - plan executed exactly as written.

Self-Check: PASSED

Files Created:

[ -f "/Users/gbanyan/Project/usher-exploring/src/usher_pipeline/cli/score_cmd.py" ] && echo "FOUND: score_cmd.py" || echo "MISSING: score_cmd.py"
# FOUND: score_cmd.py

[ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring.py" ] && echo "FOUND: test_scoring.py" || echo "MISSING: test_scoring.py"
# FOUND: test_scoring.py

[ -f "/Users/gbanyan/Project/usher-exploring/tests/test_scoring_integration.py" ] && echo "FOUND: test_scoring_integration.py" || echo "MISSING: test_scoring_integration.py"
# FOUND: test_scoring_integration.py

Commits Exist:

git log --oneline --all | grep -q "d57a5f2" && echo "FOUND: d57a5f2" || echo "MISSING: d57a5f2"
# FOUND: d57a5f2

git log --oneline --all | grep -q "a6ad6c6" && echo "FOUND: a6ad6c6" || echo "MISSING: a6ad6c6"
# FOUND: a6ad6c6

CLI Command Works:

usher-pipeline score --help
# Shows --force, --skip-qc, --skip-validation options

usher-pipeline --help | grep score
# score     Compute multi-evidence composite scores for all genes.

Tests Pass:

pytest tests/test_scoring.py tests/test_scoring_integration.py -v
# 10 passed, 7 warnings in 0.68s

Verification

All success criteria met:

  • usher-pipeline score --help shows available options
  • Score command registered in main CLI
  • Unit tests pass: known genes, weight validation, NULL handling
  • Integration tests pass: end-to-end scoring with synthetic data, QC detection, validation
  • All tests runnable with pytest tests/test_scoring*.py

Notes

  • Tests use synthetic data exclusively (no external API calls, fast, reproducible)
  • NULL preservation pattern validated: genes with no evidence get NULL composite_score, not zero
  • Known genes designed to rank highly in synthetic data (0.8-0.95 scores across all layers)
  • QC thresholds: 50% missing = warning, 80% missing = error
  • Validation threshold: median percentile >= 0.75 (top quartile)