usher-exploring/.planning/phases/04-scoring-integration/04-03-PLAN.md

---
phase: 04-scoring-integration
plan: 03
type: execute
wave: 3
depends_on:
  - 04-01
  - 04-02
files_modified:
  - src/usher_pipeline/cli/score_cmd.py
  - src/usher_pipeline/cli/main.py
  - tests/test_scoring.py
  - tests/test_scoring_integration.py
autonomous: true
must_haves:
  truths:
    - "CLI 'usher-pipeline score' command orchestrates full scoring pipeline with checkpoint-restart"
    - "Scoring pipeline can be run end-to-end on synthetic test data"
    - "Unit tests verify NULL preservation, weight validation, and known gene compilation"
    - "Integration test verifies full scoring pipeline with synthetic evidence data"
  artifacts:
    - path: "src/usher_pipeline/cli/score_cmd.py"
      provides: "CLI command for scoring pipeline orchestration"
      contains: "click.command"
    - path: "tests/test_scoring.py"
      provides: "Unit tests for scoring module"
      contains: "test_compile_known_genes"
    - path: "tests/test_scoring_integration.py"
      provides: "Integration tests for full scoring pipeline"
      contains: "test_scoring_pipeline"
  key_links:
    - from: "src/usher_pipeline/cli/score_cmd.py"
      to: "src/usher_pipeline/scoring/"
      via: "imports integration, known_genes, quality_control, validation"
      pattern: "from usher_pipeline.scoring import"
    - from: "src/usher_pipeline/cli/main.py"
      to: "src/usher_pipeline/cli/score_cmd.py"
      via: "cli.add_command(score)"
      pattern: "add_command.*score"
    - from: "tests/test_scoring_integration.py"
      to: "src/usher_pipeline/scoring/integration.py"
      via: "synthetic DuckDB data -> compute_composite_scores"
      pattern: "compute_composite_scores"
---

Create CLI score command and comprehensive tests for the scoring module.

Purpose: The CLI command provides the user-facing interface for running the scoring pipeline (integrating all evidence, running QC, validating against known genes). Tests ensure correctness of NULL handling, weight validation, and end-to-end scoring with synthetic data.

Output: src/usher_pipeline/cli/score_cmd.py, tests/test_scoring.py, tests/test_scoring_integration.py.

<execution_context> @/Users/gbanyan/.claude/get-shit-done/workflows/execute-plan.md @/Users/gbanyan/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/04-scoring-integration/04-RESEARCH.md
@.planning/phases/04-scoring-integration/04-01-SUMMARY.md
@.planning/phases/04-scoring-integration/04-02-SUMMARY.md
@src/usher_pipeline/cli/evidence_cmd.py
@src/usher_pipeline/cli/main.py
@src/usher_pipeline/scoring/__init__.py

Task 1: CLI score command with checkpoint-restart

Files: src/usher_pipeline/cli/score_cmd.py, src/usher_pipeline/cli/main.py

1. Create `src/usher_pipeline/cli/score_cmd.py` following the established pattern from `evidence_cmd.py`:

Import: click, structlog, sys, Path, load_config, PipelineStore, ProvenanceTracker, and from scoring module: load_known_genes_to_duckdb, compute_composite_scores, persist_scored_genes, run_qc_checks, validate_known_gene_ranking, generate_validation_report. Import ScoringWeights from config.schema.

Create Click command score (not a group -- single command):

  • Options:
    • --force (is_flag): Re-run scoring even if scored_genes checkpoint exists
    • --skip-qc (is_flag): Skip quality control checks (for faster iteration)
    • --skip-validation (is_flag): Skip known gene validation
  • Uses @click.pass_context to get config_path from ctx.obj['config_path']

Implementation flow (follows the evidence_cmd.py pattern):

  a. Load config, initialize store and provenance.
  b. Check checkpoint: store.has_checkpoint('scored_genes'). If it exists and --force is not set, show a summary and return.
  c. Load and validate scoring weights: read config.scoring and call validate_sum().
  d. Step 1 - Load known genes: call load_known_genes_to_duckdb(store), display the count.
  e. Step 2 - Compute composite scores: call compute_composite_scores(store, config.scoring), display a summary (total genes, mean score, quality flag distribution).
  f. Step 3 - Persist scores: call persist_scored_genes(store, scored_df, config.scoring).
  g. Step 4 (unless --skip-qc) - Run QC: call run_qc_checks(store), display warnings/errors, log missing data rates.
  h. Step 5 (unless --skip-validation) - Validate: call validate_known_gene_ranking(store), display results with generate_validation_report().
  i. Save the provenance sidecar to data_dir/scoring/scoring.provenance.json.
  j. Display the final summary: total scored genes, mean composite score, quality flag counts, QC pass/fail, validation pass/fail.

Use Click styling consistent with evidence_cmd.py: click.style("=== Title ===", bold=True), green for success, yellow for warnings, red for errors.

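Error handling: wrap each step in try/except. On failure, print the error in red (click.echo with click.style(..., fg='red')) and call sys.exit(1). Always close the store in a finally block, so it is released on both the success and the failure path.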

2. Update src/usher_pipeline/cli/main.py:
    • Import score command from score_cmd.py
    • Add it to the CLI group: cli.add_command(score) (same pattern as evidence command)
    • The score command should be a top-level command (not nested under evidence), since it's a different pipeline phase

Run:

cd /Users/gbanyan/Project/usher-exploring && python -c "
from usher_pipeline.cli.score_cmd import score
import click.testing
runner = click.testing.CliRunner()
result = runner.invoke(score, ['--help'])
print(result.output)
assert result.exit_code == 0
assert '--force' in result.output
assert '--skip-qc' in result.output
assert '--skip-validation' in result.output
print('CLI score command --help works')
" && python -c "
from usher_pipeline.cli.main import cli
import click.testing
runner = click.testing.CliRunner()
result = runner.invoke(cli, ['--help'])
print(result.output)
assert 'score' in result.output, 'score command not registered in main CLI'
print('Score command registered in main CLI')
"

Done:
    • usher-pipeline score command exists with --force, --skip-qc, --skip-validation options
    • Score command registered in main CLI group
    • Follows established pattern: config load -> checkpoint check -> process -> persist -> provenance
    • Orchestrates full pipeline: known genes -> scoring -> QC -> validation
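
The checkpoint-restart control flow for Task 1 can be sketched without Click or the real store. This is a minimal illustration under stated assumptions: `StubStore`, `save_checkpoint`, `STEPS`, and `run_score` are hypothetical names invented here, the steps are no-op placeholders for the real scoring functions, and errors simply propagate instead of being printed and turned into `sys.exit(1)`.

```python
from typing import Callable, List, Set, Tuple


class StubStore:
    """Hypothetical in-memory stand-in for PipelineStore (not the real class)."""

    def __init__(self) -> None:
        self.checkpoints: Set[str] = set()
        self.closed = False

    def has_checkpoint(self, name: str) -> bool:
        return name in self.checkpoints

    def save_checkpoint(self, name: str) -> None:  # hypothetical helper
        self.checkpoints.add(name)

    def close(self) -> None:
        self.closed = True


Step = Tuple[str, Callable[[StubStore], None]]

# Placeholder steps in the plan's order; a real command would call the
# corresponding scoring function instead of a no-op.
STEPS: List[Step] = [
    ("known_genes", lambda store: None),
    ("scoring", lambda store: None),
    ("persist", lambda store: store.save_checkpoint("scored_genes")),
    ("qc", lambda store: None),
    ("validation", lambda store: None),
]


def run_score(
    store: StubStore,
    steps: List[Step],
    force: bool = False,
    skip_qc: bool = False,
    skip_validation: bool = False,
) -> List[str]:
    """Run the steps in order and return the names of those that executed."""
    skipped = set()
    if skip_qc:
        skipped.add("qc")
    if skip_validation:
        skipped.add("validation")

    executed: List[str] = []
    try:
        if store.has_checkpoint("scored_genes") and not force:
            return executed  # checkpoint hit: nothing to recompute
        for name, func in steps:
            if name in skipped:
                continue
            func(store)
            executed.append(name)
        return executed
    finally:
        # Runs on success, on the early return, and when a step raises.
        store.close()
```

Putting the checkpoint test inside the `try` keeps a single exit path for closing the store, which is the property the "always close store in finally block" requirement is after.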
Task 2: Unit and integration tests for scoring module

Files: tests/test_scoring.py, tests/test_scoring_integration.py

1. Create `tests/test_scoring.py` with unit tests:

Import: pytest, polars, from scoring module all functions, ScoringWeights.

Test class or functions:

a. test_compile_known_genes_returns_expected_structure:
  - Call compile_known_genes()
  - Assert it returns a polars DataFrame with columns: gene_symbol, source, confidence
  - Assert height >= 38 (10 Usher + 28+ SYSCILIA)
  - Assert "MYO7A" in gene_symbol values
  - Assert "IFT88" in gene_symbol values
  - Assert all confidence values == "HIGH"
  - Assert sources include both "omim_usher" and "syscilia_scgs_v2"

b. test_compile_known_genes_no_duplicates_within_source:
  - Verify no duplicate gene_symbol within the same source
  - (A gene CAN appear in both sources as separate rows)

c. test_scoring_weights_validate_sum_defaults:
  - ScoringWeights() with defaults should pass validate_sum()

d. test_scoring_weights_validate_sum_custom_valid:
  - ScoringWeights with custom weights summing to 1.0 should pass

e. test_scoring_weights_validate_sum_invalid:
  - ScoringWeights(gnomad=0.5) sums to 1.35 -> validate_sum() raises ValueError

f. test_scoring_weights_validate_sum_close_to_one:
  - Weights that sum to 0.999999 (within 1e-6) should pass
  - Weights that sum to 0.99 should fail

g. test_null_preservation_in_composite:
  - Create a synthetic PipelineStore (in-memory DuckDB: duckdb.connect(':memory:'))
  - Create a minimal gene_universe table with 3 genes
  - Create gnomad_constraint table with scores for genes 1 and 2 (gene 3 has no entry)
  - Create annotation_completeness with scores for gene 1 only
  - Create empty/missing entries for other evidence tables (create them with no rows or only partial rows)
  - Call join_evidence_layers and verify gene 3 has NULL gnomad_score and NULL annotation_score
  - Call compute_composite_scores and verify gene 3 with zero evidence layers has composite_score = NULL
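
The behaviour that tests c through g pin down can be sketched in plain Python. This is an illustration under assumptions, not the project's code: `SketchWeights` is a hypothetical stand-in for `ScoringWeights`, its default values are placeholders chosen only so that `gnomad=0.5` sums to 1.35 as in test e, and the "weighted average over the layers that have a score" formula is an assumed reading of NULL preservation.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class SketchWeights:
    """Hypothetical stand-in for ScoringWeights; defaults are placeholders."""

    gnomad: float = 0.15
    expression: float = 0.20
    annotation: float = 0.15
    localization: float = 0.15
    animal_model: float = 0.15
    literature: float = 0.20

    def validate_sum(self, tol: float = 1e-6) -> None:
        """Raise ValueError unless the weights sum to 1.0 within tol."""
        total = sum(getattr(self, f.name) for f in fields(self))
        if abs(total - 1.0) > tol:
            raise ValueError(f"weights sum to {total:.6f}, expected 1.0")


def composite_score(
    scores: Dict[str, Optional[float]], weights: SketchWeights
) -> Optional[float]:
    """Weighted average over layers with a score; None if no layer has one.

    Assumed formula: missing layers are dropped and the remaining weights
    are renormalised, so a missing layer is never treated as a zero score.
    """
    available = {name: s for name, s in scores.items() if s is not None}
    weight_total = sum(getattr(weights, name) for name in available)
    if not available or weight_total == 0:
        return None
    weighted = sum(s * getattr(weights, name) for name, s in available.items())
    return weighted / weight_total
```

One caution for test f: a sum of exactly 0.999999 sits on the 1e-6 tolerance boundary, where floating-point rounding may tip the comparison either way, so a value clearly inside the tolerance (for example 0.9999995) is likely a safer fixture.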

  2. Create tests/test_scoring_integration.py with integration tests:

    a. test_scoring_pipeline_end_to_end:

    • Create in-memory PipelineStore (wrap duckdb.connect(':memory:') in a PipelineStore-like interface, OR create tmp file with pytest tmp_path)
    • Create synthetic tables for all 7 tables (gene_universe + 6 evidence):
      • gene_universe: 20 genes (gene_001 through gene_020) with gene_symbols
      • Include some known genes (MYO7A, IFT88, CDH23) in the gene universe as genes 18-20
      • gnomad_constraint: 15 genes with loeuf_normalized scores, 5 NULL
      • tissue_expression: 12 genes with expression_score_normalized, 8 NULL
      • annotation_completeness: 18 genes with annotation_score_normalized
      • subcellular_localization: 10 genes with localization_score_normalized
      • animal_model_phenotypes: 8 genes with animal_model_score_normalized
      • literature_evidence: 14 genes with literature_score_normalized
      • Give known genes (MYO7A, IFT88, CDH23) HIGH scores in multiple layers (0.8-0.95) to ensure they rank highly
    • Run compute_composite_scores with default ScoringWeights
    • Assert: all 20 genes present in result
    • Assert: composite_score is not NULL for genes with at least 1 evidence layer
    • Assert: evidence_count values are correct (count of non-NULL scores)
    • Assert: quality_flag values are correct based on evidence_count
    • Assert: known genes (MYO7A, IFT88, CDH23) have high composite scores (among top 5)

    b. test_qc_detects_missing_data:

    • Create scored_genes table where one layer is 90% NULL
    • Run run_qc_checks
    • Assert that layer appears in errors (>80% missing)

    c. test_validation_passes_with_known_genes_ranked_highly:

    • Use scored_genes from end-to-end test (known genes scored highly)
    • Run validate_known_gene_ranking
    • Assert validation_passed is True

    Use the tmp_path fixture for DuckDB file-based stores. Use PipelineStore(tmp_path / "test.duckdb") for store creation. Follow existing test patterns from tests/test_gnomad_integration.py.

    Run: cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_scoring.py tests/test_scoring_integration.py -v --tb=short 2>&1 | tail -30

    • test_scoring.py: 7+ unit tests covering known genes, weight validation, NULL preservation
    • test_scoring_integration.py: 3+ integration tests covering end-to-end pipeline with synthetic data
    • All tests pass with pytest tests/test_scoring.py tests/test_scoring_integration.py
    • Tests verify NULL preservation (genes with no evidence get NULL composite score)
    • Tests verify known genes rank highly when given high scores
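
The synthetic fixture for the end-to-end test can be prototyped with the standard library before wiring it to the real store. The sketch below makes several assumptions that should be read as such: `sqlite3` stands in for DuckDB, every evidence table uses a generic `score` column instead of the per-layer `*_normalized` names, missing evidence is modelled as an absent row (the LEFT JOIN turns it into NULL), and the composite is an unweighted mean rather than the weighted score. `build_fixture` and `score_genes` are hypothetical helper names.

```python
import sqlite3

# Layer table -> number of genes that have a score, as listed in the plan.
COVERAGE = {
    "gnomad_constraint": 15,
    "tissue_expression": 12,
    "annotation_completeness": 18,
    "subcellular_localization": 10,
    "animal_model_phenotypes": 8,
    "literature_evidence": 14,
}
KNOWN = {18: "MYO7A", 19: "IFT88", 20: "CDH23"}  # genes 18-20 are known genes


def build_fixture(conn: sqlite3.Connection) -> None:
    """Create gene_universe (20 genes) and six partially filled evidence tables."""
    conn.execute("CREATE TABLE gene_universe (gene_id TEXT PRIMARY KEY, gene_symbol TEXT)")
    for i in range(1, 21):
        conn.execute(
            "INSERT INTO gene_universe VALUES (?, ?)",
            (f"gene_{i:03d}", KNOWN.get(i, f"SYM{i}")),
        )
    for table, n_scored in COVERAGE.items():
        conn.execute(f"CREATE TABLE {table} (gene_id TEXT PRIMARY KEY, score REAL)")
        # Known genes always get a high score; the remaining slots are
        # filled from gene_001 upward with a low score.
        scored = list(KNOWN) + list(range(1, n_scored - 2))
        for i in scored:
            conn.execute(
                f"INSERT INTO {table} VALUES (?, ?)",
                (f"gene_{i:03d}", 0.9 if i in KNOWN else 0.3),
            )


def score_genes(conn: sqlite3.Connection):
    """Return (gene_symbol, evidence_count, composite_score), best first."""
    tables = list(COVERAGE)
    count_expr = " + ".join(f"({t}.score IS NOT NULL)" for t in tables)
    sum_expr = " + ".join(f"COALESCE({t}.score, 0.0)" for t in tables)
    joins = " ".join(f"LEFT JOIN {t} ON {t}.gene_id = g.gene_id" for t in tables)
    sql = (
        f"SELECT g.gene_symbol, {count_expr} AS evidence_count, "
        # NULLIF turns a zero evidence count into NULL, so the composite
        # stays NULL instead of becoming 0 or raising a division error.
        f"({sum_expr}) / NULLIF({count_expr}, 0) AS composite_score "
        f"FROM gene_universe AS g {joins} "
        "ORDER BY composite_score DESC"
    )
    return conn.execute(sql).fetchall()
```

A usage sketch: call `build_fixture(sqlite3.connect(":memory:"))`, then check that `score_genes` returns 20 rows, that the known genes lead the ranking, and that any gene absent from every evidence table has `evidence_count` 0 and a NULL composite.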
- `usher-pipeline score --help` shows available options
- Score command registered in main CLI
- Unit tests pass: known genes, weight validation, NULL handling
- Integration tests pass: end-to-end scoring with synthetic data, QC detection, validation
- All tests runnable with `pytest tests/test_scoring*.py`
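
For the QC detection item, the missing-data check that `test_qc_detects_missing_data` exercises can be sketched as a per-layer NULL rate compared against thresholds. Only the 80% error threshold comes from this plan; the warning threshold, the column names, and the helper names `missing_rates` and `classify` are assumptions for illustration and do not describe `run_qc_checks` itself.

```python
from typing import Dict, List, Optional, Sequence, Tuple

ERROR_THRESHOLD = 0.80  # from the plan: a layer that is >80% missing is an error
WARN_THRESHOLD = 0.50   # assumed for illustration; the plan does not state it

Row = Dict[str, Optional[float]]


def missing_rates(rows: Sequence[Row], layers: Sequence[str]) -> Dict[str, float]:
    """Fraction of rows whose value for each layer is None."""
    if not rows:
        return {}
    return {
        layer: sum(row.get(layer) is None for row in rows) / len(rows)
        for layer in layers
    }


def classify(rates: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split layers into (warnings, errors) by missing-data rate."""
    warnings = sorted(
        name for name, rate in rates.items()
        if WARN_THRESHOLD < rate <= ERROR_THRESHOLD
    )
    errors = sorted(name for name, rate in rates.items() if rate > ERROR_THRESHOLD)
    return warnings, errors


# Ten synthetic genes: one complete layer, one 60% missing, one 90% missing.
rows = [
    {
        "gnomad_score": 0.5,
        "literature_score": 0.4 if i < 4 else None,
        "expression_score": 0.7 if i == 0 else None,
    }
    for i in range(10)
]
rates = missing_rates(rows, ["gnomad_score", "literature_score", "expression_score"])
warnings, errors = classify(rates)
```

With this fixture the 90%-missing layer is the only entry in `errors`, which is the shape of assertion the integration test asks for.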

<success_criteria>

  • CLI score command orchestrates: known genes -> composite scoring -> QC -> validation
  • Checkpoint-restart: skips if scored_genes table exists (unless --force)
  • pytest tests/test_scoring.py passes all unit tests
  • pytest tests/test_scoring_integration.py passes all integration tests
  • Tests use synthetic data (no external API calls, fast, reproducible)

</success_criteria>

After completion, create `.planning/phases/04-scoring-integration/04-03-SUMMARY.md`