docs(04): create phase plan for scoring and integration
.planning/phases/04-scoring-integration/04-03-PLAN.md
---
phase: 04-scoring-integration
plan: 03
type: execute
wave: 3
depends_on: ["04-01", "04-02"]
files_modified:
  - src/usher_pipeline/cli/score_cmd.py
  - src/usher_pipeline/cli/main.py
  - tests/test_scoring.py
  - tests/test_scoring_integration.py
autonomous: true

must_haves:
  truths:
    - "CLI 'usher-pipeline score' command orchestrates full scoring pipeline with checkpoint-restart"
    - "Scoring pipeline can be run end-to-end on synthetic test data"
    - "Unit tests verify NULL preservation, weight validation, and known gene compilation"
    - "Integration test verifies full scoring pipeline with synthetic evidence data"
  artifacts:
    - path: "src/usher_pipeline/cli/score_cmd.py"
      provides: "CLI command for scoring pipeline orchestration"
      contains: "click.command"
    - path: "tests/test_scoring.py"
      provides: "Unit tests for scoring module"
      contains: "test_compile_known_genes"
    - path: "tests/test_scoring_integration.py"
      provides: "Integration tests for full scoring pipeline"
      contains: "test_scoring_pipeline"
  key_links:
    - from: "src/usher_pipeline/cli/score_cmd.py"
      to: "src/usher_pipeline/scoring/"
      via: "imports integration, known_genes, quality_control, validation"
      pattern: "from usher_pipeline.scoring import"
    - from: "src/usher_pipeline/cli/main.py"
      to: "src/usher_pipeline/cli/score_cmd.py"
      via: "cli.add_command(score)"
      pattern: "add_command.*score"
    - from: "tests/test_scoring_integration.py"
      to: "src/usher_pipeline/scoring/integration.py"
      via: "synthetic DuckDB data -> compute_composite_scores"
      pattern: "compute_composite_scores"
---

<objective>
Create CLI score command and comprehensive tests for the scoring module.

Purpose: The CLI command provides the user-facing interface for running the scoring pipeline (integrating all evidence, running QC, validating against known genes). Tests ensure correctness of NULL handling, weight validation, and end-to-end scoring with synthetic data.

Output: `src/usher_pipeline/cli/score_cmd.py`, `tests/test_scoring.py`, `tests/test_scoring_integration.py`.
</objective>

<execution_context>
@/Users/gbanyan/.claude/get-shit-done/workflows/execute-plan.md
@/Users/gbanyan/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/04-scoring-integration/04-RESEARCH.md
@.planning/phases/04-scoring-integration/04-01-SUMMARY.md
@.planning/phases/04-scoring-integration/04-02-SUMMARY.md
@src/usher_pipeline/cli/evidence_cmd.py
@src/usher_pipeline/cli/main.py
@src/usher_pipeline/scoring/__init__.py
</context>

<tasks>

<task type="auto">
<name>Task 1: CLI score command with checkpoint-restart</name>
<files>
src/usher_pipeline/cli/score_cmd.py
src/usher_pipeline/cli/main.py
</files>
<action>
1. Create `src/usher_pipeline/cli/score_cmd.py` following the established pattern from `evidence_cmd.py`:

Import: click, structlog, sys, Path, load_config, PipelineStore, ProvenanceTracker, and from the scoring module: load_known_genes_to_duckdb, compute_composite_scores, persist_scored_genes, run_qc_checks, validate_known_gene_ranking, generate_validation_report. Import ScoringWeights from config.schema.

Create Click command `score` (a single command, not a group):
- Options:
  - `--force` (is_flag): Re-run scoring even if the scored_genes checkpoint exists
  - `--skip-qc` (is_flag): Skip quality control checks (for faster iteration)
  - `--skip-validation` (is_flag): Skip known gene validation
- Uses `@click.pass_context` to get config_path from `ctx.obj['config_path']`

Implementation flow (follows evidence_cmd.py pattern):
a. Load config, initialize store and provenance
b. Check checkpoint: `store.has_checkpoint('scored_genes')`; if it exists and --force is not set, show a summary and return
c. Load and validate scoring weights: take `config.scoring` and call its `validate_sum()`
d. Step 1 - Load known genes: call `load_known_genes_to_duckdb(store)`, display count
e. Step 2 - Compute composite scores: call `compute_composite_scores(store, config.scoring)`, display a summary (total genes, mean score, quality flag distribution)
f. Step 3 - Persist scores: call `persist_scored_genes(store, scored_df, config.scoring)`, where `scored_df` is the result of Step 2
g. Step 4 (unless --skip-qc) - Run QC: call `run_qc_checks(store)`, display warnings/errors, log missing data rates
h. Step 5 (unless --skip-validation) - Validate: call `validate_known_gene_ranking(store)`, display results with `generate_validation_report()`
i. Save provenance sidecar to `data_dir/scoring/scoring.provenance.json`
j. Display final summary: total scored genes, mean composite score, quality flag counts, QC pass/fail, validation pass/fail
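
The checkpoint-restart and skip-flag logic in steps b and d-h can be sketched without Click. This is a minimal sketch under stated assumptions: `InMemoryStore`, `run_score`, and the step names are hypothetical stand-ins for `PipelineStore` and the real scoring functions, whose signatures are not reproduced here. Weight validation (step c), provenance (step i), option parsing, and styling are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for PipelineStore: remembers which tables exist."""

    tables: dict = field(default_factory=dict)

    def has_checkpoint(self, name: str) -> bool:
        return name in self.tables

    def save(self, name: str, value: object) -> None:
        self.tables[name] = value


def run_score(
    store: InMemoryStore,
    steps: dict[str, Callable[[InMemoryStore], object]],
    force: bool = False,
    skip_qc: bool = False,
    skip_validation: bool = False,
) -> list[str]:
    """Run the scoring steps and return the names of those that ran."""
    # Step b: checkpoint-restart. Existing scores short-circuit everything.
    if store.has_checkpoint("scored_genes") and not force:
        return []

    ran = []
    # Steps d-f always run: known genes -> composite scores -> persist.
    for name in ("load_known_genes", "compute_scores", "persist_scores"):
        steps[name](store)
        ran.append(name)
    # Steps g-h are optional and gated by the skip flags.
    if not skip_qc:
        steps["run_qc"](store)
        ran.append("run_qc")
    if not skip_validation:
        steps["validate"](store)
        ran.append("validate")
    return ran


def noop(store: InMemoryStore) -> None:
    pass


def persist(store: InMemoryStore) -> None:
    store.save("scored_genes", "placeholder")


steps = {
    "load_known_genes": noop,
    "compute_scores": noop,
    "persist_scores": persist,
    "run_qc": noop,
    "validate": noop,
}

store = InMemoryStore()
print(run_score(store, steps, skip_qc=True))
# ['load_known_genes', 'compute_scores', 'persist_scores', 'validate']
print(run_score(store, steps))  # checkpoint exists -> []
print(run_score(store, steps, force=True))  # all five steps run again
```

Keeping the control flow in a plain function like this lets the checkpoint and skip-flag behavior be tested without invoking Click. The Click command would then only parse options and delegate.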

Use Click styling consistent with evidence_cmd.py: `click.style("=== Title ===", bold=True)`, green for success, yellow for warnings, red for errors.

Error handling: wrap each step in try/except. On failure, display the error with `click.echo(click.style(..., fg='red'))` (`click.style` only formats text; `click.echo` prints it) and call `sys.exit(1)`. Always close the store in a `finally` block.
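
The error-handling contract can also be illustrated without Click. `ClosableStore` and `run_steps` are hypothetical names, and the sketch uses `print` to stderr where the real command would use Click styling.

```python
import sys


class ClosableStore:
    """Hypothetical store that only records whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def run_steps(store, steps) -> None:
    """Run (name, callable) pairs in order; exit with status 1 on the first failure."""
    try:
        for name, step in steps:
            try:
                step()
            except Exception as exc:
                # The real command would style this red via Click instead of print().
                print(f"Error in {name}: {exc}", file=sys.stderr)
                sys.exit(1)
    finally:
        # Runs after success, after a failure, and while SystemExit propagates.
        store.close()


def failing_step() -> None:
    raise RuntimeError("evidence table missing")


store = ClosableStore()
try:
    run_steps(store, [("load_known_genes", lambda: None), ("compute_scores", failing_step)])
except SystemExit as exit_info:
    print(exit_info.code, store.closed)  # 1 True
```

`sys.exit(1)` raises `SystemExit`, which derives from `BaseException` rather than `Exception`. The inner handler therefore does not swallow it, and the `finally` block still closes the store.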

2. Update `src/usher_pipeline/cli/main.py`:
- Import score command from score_cmd.py
- Add it to the CLI group: `cli.add_command(score)` (same pattern as evidence command)
- The score command should be a top-level command (not nested under evidence), since it's a different pipeline phase
</action>
<verify>
Run: `cd /Users/gbanyan/Project/usher-exploring && python -c "
from usher_pipeline.cli.score_cmd import score
import click.testing
runner = click.testing.CliRunner()
result = runner.invoke(score, ['--help'])
print(result.output)
assert result.exit_code == 0
assert '--force' in result.output
assert '--skip-qc' in result.output
assert '--skip-validation' in result.output
print('CLI score command --help works')
" && python -c "
from usher_pipeline.cli.main import cli
import click.testing
runner = click.testing.CliRunner()
result = runner.invoke(cli, ['--help'])
print(result.output)
assert 'score' in result.output, 'score command not registered in main CLI'
print('Score command registered in main CLI')
"`
</verify>
<done>
- `usher-pipeline score` command exists with --force, --skip-qc, --skip-validation options
- Score command registered in main CLI group
- Follows established pattern: config load -> checkpoint check -> process -> persist -> provenance
- Orchestrates full pipeline: known genes -> scoring -> QC -> validation
</done>
</task>

<task type="auto">
<name>Task 2: Unit and integration tests for scoring module</name>
<files>
tests/test_scoring.py
tests/test_scoring_integration.py
</files>
<action>
1. Create `tests/test_scoring.py` with unit tests:

Import: pytest, polars, all functions from the scoring module, and ScoringWeights.

Tests (as a test class or standalone functions):

a. `test_compile_known_genes_returns_expected_structure`:
- Call compile_known_genes()
- Assert it returns a polars DataFrame with columns: gene_symbol, source, confidence
- Assert height >= 38 (10 Usher + 28+ SYSCILIA)
- Assert "MYO7A" in gene_symbol values
- Assert "IFT88" in gene_symbol values
- Assert all confidence values == "HIGH"
- Assert sources include both "omim_usher" and "syscilia_scgs_v2"

b. `test_compile_known_genes_no_duplicates_within_source`:
- Verify no duplicate gene_symbol within the same source
- (A gene CAN appear in both sources as separate rows)
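
The checks in tests a and b can be expressed in plain Python. This sketch assumes the known-gene table has already been converted to `(gene_symbol, source, confidence)` tuples; for a polars DataFrame, `df.select(["gene_symbol", "source", "confidence"]).rows()` produces such tuples. `check_known_genes` is a hypothetical helper, and the sample rows are illustrative placeholders, not the real gene lists.

```python
from collections import Counter


def check_known_genes(rows, min_count=38):
    """Return problems found in (gene_symbol, source, confidence) rows; [] means OK."""
    problems = []
    if len(rows) < min_count:
        problems.append(f"expected >= {min_count} rows, got {len(rows)}")
    symbols = {symbol for symbol, _, _ in rows}
    for required in ("MYO7A", "IFT88"):
        if required not in symbols:
            problems.append(f"missing {required}")
    if any(confidence != "HIGH" for _, _, confidence in rows):
        problems.append("non-HIGH confidence value present")
    sources = {source for _, source, _ in rows}
    for expected in ("omim_usher", "syscilia_scgs_v2"):
        if expected not in sources:
            problems.append(f"missing source {expected}")
    # Test b: a (source, symbol) pair may appear only once; the same symbol
    # under two different sources is allowed.
    counts = Counter((source, symbol) for symbol, source, _ in rows)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicates within source: {duplicates}")
    return problems


# Illustrative placeholder rows, not the real Usher/SYSCILIA lists.
sample = [
    ("MYO7A", "omim_usher", "HIGH"),
    ("IFT88", "syscilia_scgs_v2", "HIGH"),
    ("GENE_X", "omim_usher", "HIGH"),
    ("GENE_X", "syscilia_scgs_v2", "HIGH"),  # same gene, two sources: allowed
]
print(check_known_genes(sample, min_count=4))  # []
print(check_known_genes(sample + [sample[1]], min_count=4))  # reports the IFT88 duplicate
```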

c. `test_scoring_weights_validate_sum_defaults`:
- ScoringWeights() with defaults should pass validate_sum()

d. `test_scoring_weights_validate_sum_custom_valid`:
- ScoringWeights with custom weights summing to 1.0 should pass

e. `test_scoring_weights_validate_sum_invalid`:
- ScoringWeights(gnomad=0.5) sums to 1.35 -> validate_sum() raises ValueError

f. `test_scoring_weights_validate_sum_close_to_one`:
- Weights whose sum is within 1e-6 of 1.0 (e.g. 0.9999995) should pass. Avoid a sum of exactly 0.999999: its deviation sits on the 1e-6 boundary, and `1.0 - 0.999999` evaluates to slightly more than 1e-6 in floating point, so a check like `abs(total - 1.0) <= 1e-6` rejects it
- Weights that sum to 0.99 should fail
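
The tolerance boundary in test f can be checked in isolation. `weights_sum_ok` is a hypothetical helper that mirrors the 1e-6 tolerance stated above. How `ScoringWeights.validate_sum()` actually sums and compares is not specified in this plan, and the layer names here are illustrative.

```python
from __future__ import annotations

import math


def weights_sum_ok(weights: dict[str, float], tol: float = 1e-6) -> bool:
    """True if the weights sum to 1.0 within an absolute tolerance."""
    # math.fsum sums the floats without accumulating intermediate rounding error.
    return abs(math.fsum(weights.values()) - 1.0) <= tol


print(weights_sum_ok({"gnomad": 0.5, "other_layers": 0.5}))        # True
print(weights_sum_ok({"gnomad": 0.5, "other_layers": 0.4999995}))  # True: off by 5e-7
print(weights_sum_ok({"gnomad": 0.5, "other_layers": 0.49}))       # False: sums to 0.99
# Exactly 1e-6 short on paper, but 1.0 - 0.999999 is slightly above 1e-6 as a float:
print(weights_sum_ok({"gnomad": 0.999999}))                        # False
```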

g. `test_null_preservation_in_composite`:
- Create a synthetic PipelineStore on a temporary DuckDB file (`PipelineStore(tmp_path / "test.duckdb")`, per the store-creation note at the end of this task; a raw `duckdb.connect(':memory:')` connection would need a PipelineStore-like wrapper)
- Create a minimal gene_universe table with 3 genes
- Create a gnomad_constraint table with scores for genes 1 and 2 (gene 3 has no entry)
- Create annotation_completeness with scores for gene 1 only
- Create the remaining evidence tables with no rows, or with rows for genes 1-2 only, so every table the join reads exists and gene 3 has no evidence in any layer
- Call join_evidence_layers and verify gene 3 has NULL gnomad_score and NULL annotation_score
- Call compute_composite_scores and verify that gene 3, which has zero evidence layers, has composite_score = NULL
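
The NULL-preservation property in test g follows from driving the join off `gene_universe` with LEFT JOINs. The sketch below uses the standard-library `sqlite3` module as a stand-in for DuckDB and covers only two layers. The 0.6/0.4 weights and the weighted mean over the layers that are present are illustrative assumptions, not necessarily what `compute_composite_scores` does. The point is that a gene with no evidence in any layer gets a NULL composite score rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene_universe (gene_id TEXT PRIMARY KEY);
    CREATE TABLE gnomad_constraint (gene_id TEXT, loeuf_normalized REAL);
    CREATE TABLE annotation_completeness (gene_id TEXT, annotation_score_normalized REAL);
    INSERT INTO gene_universe VALUES ('gene_1'), ('gene_2'), ('gene_3');
    INSERT INTO gnomad_constraint VALUES ('gene_1', 0.9), ('gene_2', 0.4);
    INSERT INTO annotation_completeness VALUES ('gene_1', 0.7);
    """
)

QUERY = """
SELECT
    u.gene_id,
    g.loeuf_normalized AS gnomad_score,
    a.annotation_score_normalized AS annotation_score,
    (CASE WHEN g.loeuf_normalized IS NOT NULL THEN 1 ELSE 0 END)
      + (CASE WHEN a.annotation_score_normalized IS NOT NULL THEN 1 ELSE 0 END) AS evidence_count,
    -- Weighted mean over the layers that are present; NULLIF turns a zero
    -- weight total (no evidence at all) into NULL instead of dividing by zero.
    (COALESCE(0.6 * g.loeuf_normalized, 0) + COALESCE(0.4 * a.annotation_score_normalized, 0))
      / NULLIF(
          (CASE WHEN g.loeuf_normalized IS NOT NULL THEN 0.6 ELSE 0 END)
          + (CASE WHEN a.annotation_score_normalized IS NOT NULL THEN 0.4 ELSE 0 END),
          0
        ) AS composite_score
FROM gene_universe AS u
LEFT JOIN gnomad_constraint AS g ON g.gene_id = u.gene_id
LEFT JOIN annotation_completeness AS a ON a.gene_id = u.gene_id
ORDER BY u.gene_id
"""

for row in conn.execute(QUERY):
    print(row)  # the gene_3 row is ('gene_3', None, None, 0, None)
```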

2. Create `tests/test_scoring_integration.py` with integration tests:

a. `test_scoring_pipeline_end_to_end`:
- Create a PipelineStore, preferably `PipelineStore(tmp_path / "test.duckdb")` (see the store-creation note at the end of this task); alternatively, wrap `duckdb.connect(':memory:')` in a PipelineStore-like interface
- Create synthetic data for all 7 tables (gene_universe + 6 evidence tables):
  - gene_universe: 20 genes (gene_001 through gene_020) with gene_symbols
  - Include some known genes (MYO7A, IFT88, CDH23) in the gene universe as genes 18-20
  - gnomad_constraint: 15 genes with loeuf_normalized scores, 5 NULL
  - tissue_expression: 12 genes with expression_score_normalized, 8 NULL
  - annotation_completeness: 18 genes with annotation_score_normalized
  - subcellular_localization: 10 genes with localization_score_normalized
  - animal_model_phenotypes: 8 genes with animal_model_score_normalized
  - literature_evidence: 14 genes with literature_score_normalized
- Give the known genes (MYO7A, IFT88, CDH23) HIGH scores (0.8-0.95) in multiple layers so they rank highly, and make sure genes 18-20 are among the genes those layers cover (a layer covering only genes 1-15 would leave them NULL)
- Run compute_composite_scores with default ScoringWeights
- Assert: all 20 genes present in result
- Assert: composite_score is not NULL for genes with at least 1 evidence layer
- Assert: evidence_count values are correct (count of non-NULL scores)
- Assert: quality_flag values match the expected flag for each gene's evidence_count
- Assert: known genes (MYO7A, IFT88, CDH23) have high composite scores (all three in the top 5)
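
The synthetic coverage can be laid out so that the test's expected values are easy to derive. In this sketch each layer covers the last n genes, which keeps genes 18-20 in every layer. The layer sizes and the 0.8-0.95 range come from the plan; the placeholder symbols (`SYN…`), the low scores for the other genes, and the helper names are hypothetical. With this layout gene_001 and gene_002 end up with no evidence at all, which also exercises the NULL-composite assertion.

```python
# Layer sizes from the plan. Covering the *last* n genes of each layer is an
# illustrative choice that keeps genes 18-20 (the known genes) in every layer.
LAYER_COVERAGE = {
    "gnomad_constraint": 15,
    "tissue_expression": 12,
    "annotation_completeness": 18,
    "subcellular_localization": 10,
    "animal_model_phenotypes": 8,
    "literature_evidence": 14,
}
KNOWN_GENES = {18: "MYO7A", 19: "IFT88", 20: "CDH23"}


def gene_id(i: int) -> str:
    return f"gene_{i:03d}"


def build_synthetic(n_genes: int = 20):
    """Return (universe rows, {layer: {gene_id: score}}); absent genes mean NULL."""
    universe = [
        {"gene_id": gene_id(i), "gene_symbol": KNOWN_GENES.get(i, f"SYN{i:03d}")}
        for i in range(1, n_genes + 1)
    ]
    layers = {}
    for layer, n in LAYER_COVERAGE.items():
        scores = {}
        for i in range(n_genes - n + 1, n_genes + 1):
            if i in KNOWN_GENES:
                scores[gene_id(i)] = 0.80 + 0.05 * (i - 18)  # 0.80, 0.85, 0.90
            else:
                scores[gene_id(i)] = round(0.10 + 0.02 * i, 2)  # at most 0.44
        layers[layer] = scores
    return universe, layers


def expected_evidence_count(gid: str, layers: dict) -> int:
    """Number of layers with a non-NULL score for this gene."""
    return sum(gid in scores for scores in layers.values())


universe, layers = build_synthetic()
for i in (1, 3, 6, 13, 18):
    print(gene_id(i), expected_evidence_count(gene_id(i), layers))
# gene_001 0, gene_003 1, gene_006 2, gene_013 6, gene_018 6
```

Genes missing from a layer's dict can be written as NULL rows or omitted, depending on how the real evidence tables are expected to look.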

b. `test_qc_detects_missing_data`:
- Create scored_genes table where one layer is 90% NULL
- Run run_qc_checks
- Assert that layer appears in errors (>80% missing)
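
The QC expectation in test b reduces to a per-layer NULL fraction compared against the 80% threshold. `missing_rate_errors` and the `*_score` column names are hypothetical. The structure that `run_qc_checks` actually returns, and any separate warning threshold, are defined elsewhere.

```python
def missing_rate_errors(rows, layers, error_threshold=0.8):
    """Return {layer: missing_fraction} for layers whose NULL share exceeds the threshold."""
    if not rows:
        return {}
    errors = {}
    for layer in layers:
        missing = sum(row.get(layer) is None for row in rows)
        rate = missing / len(rows)
        if rate > error_threshold:  # strictly more than 80% missing
            errors[layer] = rate
    return errors


# Ten synthetic scored genes: literature_score is present for one gene only (90% NULL).
rows = [
    {"gnomad_score": 0.5, "literature_score": 0.7 if i == 0 else None}
    for i in range(10)
]
print(missing_rate_errors(rows, ["gnomad_score", "literature_score"]))
# {'literature_score': 0.9}
```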

c. `test_validation_passes_with_known_genes_ranked_highly`:
- Build the same synthetic scored_genes data as the end-to-end test (known genes scored highly), for example through a shared fixture or helper, rather than relying on state left behind by another test
- Run validate_known_gene_ranking
- Assert validation_passed is True
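
The top-5 assertion in integration test a needs a small ranking helper, which can also serve as a sanity check on the data fed to test c. `known_genes_in_top_k` is a hypothetical name. It ranks only non-NULL composite scores and does not reproduce whatever criterion `validate_known_gene_ranking` applies.

```python
from __future__ import annotations


def known_genes_in_top_k(scores: dict[str, float | None], known: set[str], k: int = 5) -> bool:
    """True if every known gene is among the k highest non-NULL composite scores."""
    ranked = sorted(
        (symbol for symbol, score in scores.items() if score is not None),
        key=lambda symbol: scores[symbol],
        reverse=True,  # sorted() is stable, so ties keep their input order
    )
    return known <= set(ranked[:k])


known = {"MYO7A", "IFT88", "CDH23"}
scores = {"MYO7A": 0.90, "IFT88": 0.85, "CDH23": 0.80, "SYN001": None, "SYN002": 0.44, "SYN003": 0.30}
print(known_genes_in_top_k(scores, known))  # True
print(known_genes_in_top_k({**scores, "CDH23": 0.10}, known, k=3))  # False
```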

Use `tmp_path` fixture for DuckDB file-based stores. Use `PipelineStore(tmp_path / "test.duckdb")` for store creation. Follow existing test patterns from tests/test_gnomad_integration.py.
</action>
<verify>
Run: `cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_scoring.py tests/test_scoring_integration.py -v --tb=short 2>&1 | tail -30`
</verify>
<done>
- test_scoring.py: 7+ unit tests covering known genes, weight validation, NULL preservation
- test_scoring_integration.py: 3+ integration tests covering end-to-end pipeline with synthetic data
- All tests pass with `pytest tests/test_scoring.py tests/test_scoring_integration.py`
- Tests verify NULL preservation (genes with no evidence get NULL composite score)
- Tests verify known genes rank highly when given high scores
</done>
</task>

</tasks>

<verification>
- `usher-pipeline score --help` shows available options
- Score command registered in main CLI
- Unit tests pass: known genes, weight validation, NULL handling
- Integration tests pass: end-to-end scoring with synthetic data, QC detection, validation
- All tests runnable with `pytest tests/test_scoring*.py`
</verification>

<success_criteria>
- CLI score command orchestrates: known genes -> composite scoring -> QC -> validation
- Checkpoint-restart: skips if scored_genes table exists (unless --force)
- pytest tests/test_scoring.py passes all unit tests
- pytest tests/test_scoring_integration.py passes all integration tests
- Tests use synthetic data (no external API calls, fast, reproducible)
</success_criteria>

<output>
After completion, create `.planning/phases/04-scoring-integration/04-03-SUMMARY.md`
</output>