docs(06): create phase plan
211
.planning/phases/06-validation/06-03-PLAN.md
Normal file
@@ -0,0 +1,211 @@
---
phase: 06-validation
plan: 03
type: execute
wave: 2
depends_on: ["06-01", "06-02"]
files_modified:
  - src/usher_pipeline/scoring/validation_report.py
  - src/usher_pipeline/cli/validate_cmd.py
  - src/usher_pipeline/cli/main.py
  - tests/test_validation.py
autonomous: true

must_haves:
  truths:
    - "CLI validate command runs positive controls, negative controls, and sensitivity analysis in sequence"
    - "Comprehensive validation report documents all three validation prongs with pass/fail verdicts"
    - "Weight tuning recommendations are generated based on validation results with documented rationale"
    - "Tests verify negative control logic, recall@k computation, weight perturbation, and report generation"
  artifacts:
    - path: "src/usher_pipeline/scoring/validation_report.py"
      provides: "Comprehensive validation report combining all three validation prongs"
      exports: ["generate_comprehensive_validation_report", "recommend_weight_tuning"]
    - path: "src/usher_pipeline/cli/validate_cmd.py"
      provides: "CLI validate command orchestrating full validation pipeline"
      exports: ["validate"]
    - path: "tests/test_validation.py"
      provides: "Unit tests for negative controls, recall@k, sensitivity, and validation report"
  key_links:
    - from: "src/usher_pipeline/cli/validate_cmd.py"
      to: "src/usher_pipeline/scoring/negative_controls.py"
      via: "validate_negative_controls import"
      pattern: "from usher_pipeline.scoring import validate_negative_controls"
    - from: "src/usher_pipeline/cli/validate_cmd.py"
      to: "src/usher_pipeline/scoring/sensitivity.py"
      via: "run_sensitivity_analysis import"
      pattern: "from usher_pipeline.scoring import run_sensitivity_analysis"
    - from: "src/usher_pipeline/cli/validate_cmd.py"
      to: "src/usher_pipeline/scoring/validation.py"
      via: "validate_positive_controls_extended import"
      pattern: "from usher_pipeline.scoring import validate_positive_controls_extended"
    - from: "src/usher_pipeline/cli/main.py"
      to: "src/usher_pipeline/cli/validate_cmd.py"
      via: "Click group add_command"
      pattern: "cli.add_command.*validate"
---

<objective>
Create comprehensive validation report generator, CLI validate command, and unit tests for all Phase 6 validation modules.

Purpose: This plan wires together the positive control, negative control, and sensitivity analysis modules (from Plans 01 and 02) into a single CLI command and comprehensive report. Tests ensure correctness with synthetic data. This completes Phase 6 by providing the user-facing validation workflow.

Output: validation_report.py, validate_cmd.py (CLI), updated main.py, and test_validation.py.
</objective>

<execution_context>
@/Users/gbanyan/.claude/get-shit-done/workflows/execute-plan.md
@/Users/gbanyan/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/06-validation/06-RESEARCH.md

# Need SUMMARYs from Plans 01 and 02 for what was actually built
@.planning/phases/06-validation/06-01-SUMMARY.md
@.planning/phases/06-validation/06-02-SUMMARY.md

@src/usher_pipeline/scoring/__init__.py
@src/usher_pipeline/scoring/negative_controls.py
@src/usher_pipeline/scoring/sensitivity.py
@src/usher_pipeline/scoring/validation.py
@src/usher_pipeline/cli/score_cmd.py
@src/usher_pipeline/cli/main.py
@tests/test_scoring.py
</context>

<tasks>

<task type="auto">
<name>Task 1: Create comprehensive validation report and CLI validate command</name>
<files>src/usher_pipeline/scoring/validation_report.py, src/usher_pipeline/cli/validate_cmd.py, src/usher_pipeline/cli/main.py</files>
<action>
**Create `src/usher_pipeline/scoring/validation_report.py`:**

1. **generate_comprehensive_validation_report(positive_metrics: dict, negative_metrics: dict, sensitivity_result: dict, sensitivity_summary: dict) -> str** function:
   - Generate a multi-section Markdown report combining all three validation prongs
   - Section 1: "Positive Control Validation" -- median percentile, recall@k table, per-source breakdown, pass/fail
   - Section 2: "Negative Control Validation" -- median percentile, top quartile count, in-HIGH-tier count, pass/fail
   - Section 3: "Sensitivity Analysis" -- Spearman rho table (layer x delta), overall stability verdict, most/least sensitive layers
   - Section 4: "Overall Validation Summary" -- all-pass/partial-fail/fail verdict
   - Section 5: "Weight Tuning Recommendations" -- call recommend_weight_tuning()
   - Return the full Markdown string

2. **recommend_weight_tuning(positive_metrics: dict, negative_metrics: dict, sensitivity_summary: dict) -> str** function:
   - Analyze validation results and suggest weight adjustments
   - If positive controls pass AND negative controls pass AND sensitivity stable: "Current weights are validated. No tuning recommended."
   - If positive controls fail: suggest increasing weights for layers where known genes score highly
   - If negative controls fail (housekeeping genes ranking too high): suggest examining which layers boost housekeeping genes
   - If sensitivity unstable: identify most sensitive layer and suggest reducing its weight
   - Document rationale for each recommendation
   - CRITICAL: Note that any tuning is "post-validation" and flag circular validation risk per research pitfall
   - Return formatted recommendation text

3. **save_validation_report(report_text: str, output_path: Path) -> None**: Write report to file
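
The branching described for `recommend_weight_tuning` could look roughly like the sketch below. It is illustrative only: the dictionary keys `validation_passed`, `overall_stable`, and `most_sensitive_layer` are assumptions about what Plans 01 and 02 return, and the wording of each recommendation is a placeholder.

```python
def recommend_weight_tuning(positive_metrics: dict,
                            negative_metrics: dict,
                            sensitivity_summary: dict) -> str:
    """Sketch of the decision logic; all dict keys are assumed, not confirmed."""
    pos_ok = positive_metrics.get("validation_passed", False)
    neg_ok = negative_metrics.get("validation_passed", False)
    stable = sensitivity_summary.get("overall_stable", False)

    # All three prongs pass: nothing to tune.
    if pos_ok and neg_ok and stable:
        return "Current weights are validated. No tuning recommended."

    lines = []
    if not pos_ok:
        lines.append("- Positive controls failed: consider increasing weights "
                     "for layers where known genes score highly.")
    if not neg_ok:
        lines.append("- Negative controls failed: examine which layers boost "
                     "housekeeping genes.")
    if not stable:
        layer = sensitivity_summary.get("most_sensitive_layer", "unknown")
        lines.append("- Sensitivity unstable: consider reducing the weight of "
                     f"the most sensitive layer ({layer}).")

    # Any recommendation is post-validation, so flag the circularity risk.
    lines.append("- CAUTION: tuning is post-validation; re-validating against "
                 "the same control sets risks circular validation.")
    return "\n".join(lines)
```

Returning early on the all-pass case keeps the "No tuning recommended" message free of the circular-validation caveat, which only applies once a change is actually suggested.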

**Create `src/usher_pipeline/cli/validate_cmd.py`:**

Follow the established CLI pattern from score_cmd.py (config load, store init, checkpoint, steps, summary, cleanup):

1. Click command `validate` with options:
   - `--force`: Re-run even if validation checkpoint exists
   - `--skip-sensitivity`: Skip sensitivity analysis (faster iteration)
   - `--output-dir`: Output directory for validation report (default: {data_dir}/validation)
   - `--top-n`: Top N genes for sensitivity analysis (default: 100)

2. Pipeline steps:
   - Step 1: Load configuration and initialize store
   - Step 2: Check scored_genes checkpoint exists (error if not -- must run `score` first)
   - Step 3: Run positive control validation (validate_positive_controls_extended)
   - Step 4: Run negative control validation (validate_negative_controls)
   - Step 5: Run sensitivity analysis (unless --skip-sensitivity) -- run_sensitivity_analysis + summarize_sensitivity
   - Step 6: Generate comprehensive validation report (generate_comprehensive_validation_report)
   - Step 7: Save report to output_dir/validation_report.md and provenance sidecar

3. Use click.echo with styled output matching score_cmd.py patterns (green for success, yellow for warnings, red for errors, bold for step headers)

4. Provenance tracking: record_step for each validation phase with metrics

5. Final summary: display overall pass/fail, recall@top-10%, housekeeping median percentile, sensitivity stability
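
Steps 3-5 and the overall verdict can be sketched without any CLI framework, as below. This is a minimal illustration, not the Click command itself: `run_validation`, the `passed` key, and the rule that a skipped sensitivity run does not count toward the verdict are all assumptions made for the example.

```python
from typing import Callable

# Each prong is modelled as a zero-argument callable returning a metrics dict
# with a boolean "passed" key (an assumed shape, for illustration only).
Step = Callable[[], dict]


def run_validation(positive: Step, negative: Step, sensitivity: Step,
                   skip_sensitivity: bool = False) -> dict:
    """Run the prongs in sequence and derive an overall verdict."""
    results = {
        "positive": positive(),
        "negative": negative(),
        # Mirrors --skip-sensitivity: the step is not executed at all.
        "sensitivity": None if skip_sensitivity else sensitivity(),
    }

    # Only prongs that actually ran contribute to the verdict.
    outcomes = [r["passed"] for r in results.values() if r is not None]
    if all(outcomes):
        verdict = "all-pass"
    elif any(outcomes):
        verdict = "partial-fail"
    else:
        verdict = "fail"
    return {"results": results, "verdict": verdict}
```

In the real command, each callable would wrap the corresponding function from `usher_pipeline.scoring`, and the returned verdict would feed both the report and the final summary.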

**Update `src/usher_pipeline/cli/main.py`:**
- Import validate from validate_cmd
- Add: `cli.add_command(validate)`
- Follow the existing pattern used for score and report commands
</action>
<verify>
Run: `cd /Users/gbanyan/Project/usher-exploring && python -c "from usher_pipeline.cli.validate_cmd import validate; print(f'Command name: {validate.name}'); print('OK')"` exits 0 AND `cd /Users/gbanyan/Project/usher-exploring && python -c "from usher_pipeline.cli.main import cli; commands = list(cli.commands.keys()); print(f'CLI commands: {commands}'); assert 'validate' in commands; print('OK')"` exits 0
</verify>
<done>
validation_report.py generates comprehensive multi-section Markdown report with weight tuning recommendations. validate_cmd.py provides CLI command running all three validation prongs. main.py registers validate as a CLI subcommand. All follow established patterns from score_cmd.py.
</done>
</task>

<task type="auto">
<name>Task 2: Create unit tests for all validation modules</name>
<files>tests/test_validation.py</files>
<action>
Create `tests/test_validation.py` with comprehensive tests using synthetic DuckDB data. Follow the test pattern from tests/test_scoring.py (use tmp_path fixtures, create in-memory DuckDB with synthetic data).

**Test helper: create_synthetic_scored_db(tmp_path)** function:
- Create DuckDB with gene_universe (20 genes: GENE001-GENE020)
- Create scored_genes table with composite_score and all 6 layer scores
- Design scores so that:
  - MYO7A, IFT88, BBS1 (known cilia genes) get high scores (0.8-0.95)
  - GAPDH, ACTB, RPL13A (housekeeping genes) get low scores (0.1-0.3)
  - Other genes get mid-range scores (0.3-0.6)
- This ensures positive controls rank high and negative controls rank low in tests
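
The effect of this score design can be checked on plain Python data before involving DuckDB. The sketch below is a simplified stand-in for the real helper: the filler gene names, the exact score values, and the percentile convention (fraction of genes scoring at or below the target) are assumptions chosen for illustration.

```python
from statistics import median

KNOWN = ["MYO7A", "IFT88", "BBS1"]          # positive controls
HOUSEKEEPING = ["GAPDH", "ACTB", "RPL13A"]  # negative controls


def build_synthetic_scores() -> dict:
    """Return 20 gene -> composite score pairs following the design above."""
    scores = {"MYO7A": 0.95, "IFT88": 0.90, "BBS1": 0.80,
              "GAPDH": 0.10, "ACTB": 0.20, "RPL13A": 0.30}
    # 14 hypothetical filler genes with mid-range scores (0.31 to 0.57).
    for i in range(14):
        scores[f"GENE{i + 1:03d}"] = round(0.31 + 0.02 * i, 2)
    return scores


def recall_at_k(scores: dict, known: list, k: int) -> float:
    """Fraction of known genes found among the k highest-scoring genes."""
    top_k = set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top_k & set(known)) / len(known)


def percentile_rank(scores: dict, gene: str) -> float:
    """Fraction of genes scoring at or below `gene` (1.0 means top-ranked)."""
    target = scores[gene]
    return sum(s <= target for s in scores.values()) / len(scores)


scores = build_synthetic_scores()
recall_5 = recall_at_k(scores, KNOWN, k=5)
hk_median = median(percentile_rank(scores, g) for g in HOUSEKEEPING)
```

With these values all three known genes land in the top 5 and the housekeeping genes occupy the bottom three ranks, which is the separation the tests below rely on.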

**Tests to include:**

1. **test_compile_housekeeping_genes_structure**: Verify compile_housekeeping_genes() returns DataFrame with 13 genes, correct columns (gene_symbol, source, confidence), all confidence=HIGH, all source=literature_validated

2. **test_compile_housekeeping_genes_known_genes_present**: Assert GAPDH, ACTB, RPL13A, TBP are in the gene_symbol column

3. **test_validate_negative_controls_with_synthetic_data**: Use synthetic DB where housekeeping genes score low. Assert validation_passed=True, median_percentile < 0.5

4. **test_validate_negative_controls_inverted_logic**: Create a DB where housekeeping genes score HIGH (artificial scenario). Assert validation_passed=False

5. **test_compute_recall_at_k**: Use synthetic DB. Assert recall@k returns dict with recalls_absolute and recalls_percentage keys. With 3 known genes in top 5 of 20, recall@5 should be high (>0.5)

6. **test_perturb_weight_renormalizes**: Perturb gnomad by +0.10, assert weights still sum to 1.0. Perturb by -0.25 (more than weight value), assert weight >= 0.0 and sum = 1.0

7. **test_perturb_weight_invalid_layer**: perturb_weight with layer="nonexistent" should raise ValueError

8. **test_generate_comprehensive_validation_report_format**: Pass mock metrics dicts, assert report contains expected sections ("Positive Control", "Negative Control", "Sensitivity Analysis", "Weight Tuning")

9. **test_recommend_weight_tuning_all_pass**: Pass metrics indicating all validations pass. Assert response contains "No tuning recommended" or similar

All tests should use tmp_path for DuckDB isolation. Import from usher_pipeline.scoring (not internal modules directly where possible). Use PipelineStore with direct conn assignment pattern from test_scoring.py.
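
For tests 6 and 7, the behaviour being asserted is clamp-then-renormalize. The sketch below shows one way that could work; it is not the Plan 02 implementation, and every layer name other than `gnomad` as well as the starting weights are hypothetical.

```python
import math


def perturb_weight(weights: dict, layer: str, delta: float) -> dict:
    """Shift one layer's weight by delta, clamp at 0, rescale to sum to 1."""
    if layer not in weights:
        raise ValueError(f"Unknown layer: {layer!r}")

    perturbed = dict(weights)  # leave the caller's weights untouched
    perturbed[layer] = max(0.0, perturbed[layer] + delta)

    total = sum(perturbed.values())
    if total == 0:
        raise ValueError("All weights are zero after perturbation")
    return {name: w / total for name, w in perturbed.items()}


# Hypothetical six-layer weighting; only "gnomad" is named in the plan.
base = {"gnomad": 0.20, "layer_2": 0.16, "layer_3": 0.16,
        "layer_4": 0.16, "layer_5": 0.16, "layer_6": 0.16}

up = perturb_weight(base, "gnomad", +0.10)
down = perturb_weight(base, "gnomad", -0.25)  # larger than the weight itself

assert math.isclose(sum(up.values()), 1.0)
assert down["gnomad"] == 0.0 and math.isclose(sum(down.values()), 1.0)
```

Comparing sums with `math.isclose` rather than `==` avoids spurious failures from floating-point rounding after the division.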
</action>
<verify>
Run: `cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_validation.py -v --tb=short` -- all tests pass
</verify>
<done>
test_validation.py contains 9+ tests covering negative controls, recall@k, weight perturbation, sensitivity analysis, and report generation. All tests pass using synthetic DuckDB data with designed score patterns ensuring known genes rank high and housekeeping genes rank low.
</done>
</task>

</tasks>

<verification>
- `python -m pytest tests/test_validation.py -v` -- all validation tests pass
- `python -c "from usher_pipeline.cli.main import cli; assert 'validate' in cli.commands"` -- CLI command registered
- `python -c "from usher_pipeline.scoring.validation_report import generate_comprehensive_validation_report, recommend_weight_tuning"` -- report functions importable
- `usher-pipeline validate --help` displays usage information with all options
</verification>

<success_criteria>
- CLI `validate` command runs positive + negative + sensitivity validations and generates comprehensive report
- Validation report includes all three prongs with pass/fail verdicts and weight tuning recommendations
- Unit tests cover negative controls, recall@k, perturbation, and report generation
- All tests pass with synthetic data
- validate command registered in main CLI
</success_criteria>

<output>
After completion, create `.planning/phases/06-validation/06-03-SUMMARY.md`
</output>