docs(06-03): complete comprehensive validation report & CLI plan
- Created 06-03-SUMMARY.md documenting plan execution - Updated STATE.md: - Current Position: Phase 6 COMPLETE (21/21 plans) - Performance Metrics: Phase 06 3/3 plans, 10 min total, 3.3 min/plan avg - Added decisions for comprehensive validation report and weight tuning recommendations - Session Continuity: Stopped at 06-03-PLAN.md completion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -10,16 +10,16 @@ See: .planning/PROJECT.md (updated 2026-02-11)
|
|||||||
## Current Position
|
## Current Position
|
||||||
|
|
||||||
Phase: 6 of 6 (Validation)
|
Phase: 6 of 6 (Validation)
|
||||||
Plan: 3 of 3 in current phase (plans 06-01, 06-02 complete)
|
Plan: 3 of 3 in current phase (all plans complete)
|
||||||
Status: Phase 6 in progress — plans 06-01 and 06-02 complete
|
Status: Phase 6 COMPLETE — all validation plans complete
|
||||||
Last activity: 2026-02-12 — Completed 06-02: Sensitivity Analysis Module
|
Last activity: 2026-02-12 — Completed 06-03: Comprehensive Validation Report & CLI
|
||||||
|
|
||||||
Progress: [██████████] 100.0% (20/20 plans complete across all phases)
|
Progress: [██████████] 100.0% (21/21 plans complete across all phases)
|
||||||
|
|
||||||
## Performance Metrics
|
## Performance Metrics
|
||||||
|
|
||||||
**Velocity:**
|
**Velocity:**
|
||||||
- Total plans completed: 20
|
- Total plans completed: 21
|
||||||
- Average duration: 4.6 min
|
- Average duration: 4.6 min
|
||||||
- Total execution time: 1.6 hours
|
- Total execution time: 1.6 hours
|
||||||
|
|
||||||
@@ -32,7 +32,7 @@ Progress: [██████████] 100.0% (20/20 plans complete across a
|
|||||||
| 03 - Core Evidence Layers | 6/6 | 52 min | 8.7 min/plan |
|
| 03 - Core Evidence Layers | 6/6 | 52 min | 8.7 min/plan |
|
||||||
| 04 - Scoring Integration | 3/3 | 10 min | 3.3 min/plan |
|
| 04 - Scoring Integration | 3/3 | 10 min | 3.3 min/plan |
|
||||||
| 05 - Output & CLI | 3/3 | 12 min | 4.0 min/plan |
|
| 05 - Output & CLI | 3/3 | 12 min | 4.0 min/plan |
|
||||||
| 06 - Validation | 2/3 | 5 min | 2.5 min/plan |
|
| 06 - Validation | 3/3 | 10 min | 3.3 min/plan |
|
||||||
|
|
||||||
**Recent Plan Details:**
|
**Recent Plan Details:**
|
||||||
| Plan | Duration | Tasks | Files |
|
| Plan | Duration | Tasks | Files |
|
||||||
@@ -45,6 +45,7 @@ Progress: [██████████] 100.0% (20/20 plans complete across a
|
|||||||
| Phase 05 P03 | 3 min | 2 tasks | 3 files |
|
| Phase 05 P03 | 3 min | 2 tasks | 3 files |
|
||||||
| Phase 06 P01 | 2 min | 2 tasks | 3 files |
|
| Phase 06 P01 | 2 min | 2 tasks | 3 files |
|
||||||
| Phase 06 P02 | 3 min | 2 tasks | 2 files |
|
| Phase 06 P02 | 3 min | 2 tasks | 2 files |
|
||||||
|
| Phase 06 P03 | 5 min | 2 tasks | 5 files |
|
||||||
|
|
||||||
## Accumulated Context
|
## Accumulated Context
|
||||||
|
|
||||||
@@ -141,6 +142,9 @@ Recent decisions affecting current work:
|
|||||||
- [06-02]: Top-N default 100 genes for ranking comparison (relevant for candidate prioritization)
|
- [06-02]: Top-N default 100 genes for ranking comparison (relevant for candidate prioritization)
|
||||||
- [06-02]: Minimum overlap 10 genes required for Spearman correlation (avoids meaningless correlations)
|
- [06-02]: Minimum overlap 10 genes required for Spearman correlation (avoids meaningless correlations)
|
||||||
- [06-02]: Per-layer sensitivity tracking (most_sensitive_layer and most_robust_layer computed from mean rho)
|
- [06-02]: Per-layer sensitivity tracking (most_sensitive_layer and most_robust_layer computed from mean rho)
|
||||||
|
- [06-03]: Comprehensive validation report combines positive, negative, and sensitivity prongs in single Markdown document
|
||||||
|
- [06-03]: Weight tuning recommendations include critical circular validation warnings (post-validation tuning invalidates controls)
|
||||||
|
- [06-03]: CLI validate command provides --skip-sensitivity flag for faster iteration during development
|
||||||
|
|
||||||
### Pending Todos
|
### Pending Todos
|
||||||
|
|
||||||
@@ -153,5 +157,5 @@ None yet.
|
|||||||
## Session Continuity
|
## Session Continuity
|
||||||
|
|
||||||
Last session: 2026-02-12 - Phase 6 execution
|
Last session: 2026-02-12 - Phase 6 execution
|
||||||
Stopped at: Completed 06-02-PLAN.md (Sensitivity Analysis Module)
|
Stopped at: Completed 06-03-PLAN.md (Comprehensive Validation Report & CLI)
|
||||||
Resume file: .planning/phases/06-validation/06-02-SUMMARY.md
|
Resume file: .planning/phases/06-validation/06-03-SUMMARY.md
|
||||||
|
|||||||
331
.planning/phases/06-validation/06-03-SUMMARY.md
Normal file
331
.planning/phases/06-validation/06-03-SUMMARY.md
Normal file
@@ -0,0 +1,331 @@
|
|||||||
|
---
|
||||||
|
phase: 06-validation
|
||||||
|
plan: 03
|
||||||
|
subsystem: validation
|
||||||
|
tags: [comprehensive-validation, cli-validate, validation-report, weight-tuning, unit-tests]
|
||||||
|
dependency_graph:
|
||||||
|
requires: [06-01-negative-controls-recall, 06-02-sensitivity-analysis]
|
||||||
|
provides: [comprehensive-validation-report, cli-validate-command, validation-tests]
|
||||||
|
affects: [validation-workflow-completion]
|
||||||
|
tech_stack:
|
||||||
|
added: []
|
||||||
|
patterns: [comprehensive-validation-pipeline, weight-tuning-recommendations, cli-orchestration]
|
||||||
|
key_files:
|
||||||
|
created:
|
||||||
|
- src/usher_pipeline/scoring/validation_report.py
|
||||||
|
- src/usher_pipeline/cli/validate_cmd.py
|
||||||
|
- tests/test_validation.py
|
||||||
|
modified:
|
||||||
|
- src/usher_pipeline/cli/main.py
|
||||||
|
- src/usher_pipeline/scoring/__init__.py
|
||||||
|
decisions:
|
||||||
|
- "Comprehensive validation report combines positive, negative, and sensitivity prongs in single Markdown document"
|
||||||
|
- "Weight tuning recommendations are guidance-only with critical circular validation warnings"
|
||||||
|
- "CLI validate command follows score_cmd.py pattern with --force, --skip-sensitivity, --output-dir, --top-n options"
|
||||||
|
- "Tests use synthetic DuckDB data with designed score patterns (known genes high, housekeeping low)"
|
||||||
|
- "Validation report saved to {data_dir}/validation/validation_report.md by default"
|
||||||
|
metrics:
|
||||||
|
duration_minutes: 5
|
||||||
|
completed_date: 2026-02-12
|
||||||
|
tasks_completed: 2
|
||||||
|
files_created: 3
|
||||||
|
files_modified: 2
|
||||||
|
commits: 2
|
||||||
|
---
|
||||||
|
|
||||||
|
# Phase 6 Plan 03: Comprehensive Validation Report & CLI Summary
|
||||||
|
|
||||||
|
CLI validate command orchestrating full validation pipeline (positive controls, negative controls, sensitivity analysis) with comprehensive Markdown report and weight tuning recommendations.
|
||||||
|
|
||||||
|
## Tasks Completed
|
||||||
|
|
||||||
|
### Task 1: Create comprehensive validation report and CLI validate command
|
||||||
|
**Status:** Complete
|
||||||
|
**Commit:** 10f19f8
|
||||||
|
**Files:** src/usher_pipeline/scoring/validation_report.py, src/usher_pipeline/cli/validate_cmd.py, src/usher_pipeline/cli/main.py, src/usher_pipeline/scoring/__init__.py
|
||||||
|
|
||||||
|
**Created validation_report.py** with comprehensive report generation:
|
||||||
|
|
||||||
|
- **generate_comprehensive_validation_report()**: Multi-section Markdown report combining all three validation prongs
|
||||||
|
- Section 1: Positive Control Validation (median percentile, recall@k table, per-source breakdown, pass/fail)
|
||||||
|
- Section 2: Negative Control Validation (median percentile, top quartile count, in-HIGH-tier count, pass/fail)
|
||||||
|
- Section 3: Sensitivity Analysis (Spearman rho table by layer × delta, stability verdict, most/least sensitive layers)
|
||||||
|
- Section 4: Overall Validation Summary (all-pass/partial-fail/fail verdict with interpretation)
|
||||||
|
- Section 5: Weight Tuning Recommendations (targeted suggestions based on validation results)
|
||||||
|
|
||||||
|
- **recommend_weight_tuning()**: Analyzes validation results and provides weight adjustment guidance
|
||||||
|
- All validations pass → "Current weights are validated. No tuning recommended."
|
||||||
|
- Positive controls fail → Suggest increasing weights for layers where known genes score highly
|
||||||
|
- Negative controls fail → Suggest examining layers boosting housekeeping genes, reducing generic layer weights
|
||||||
|
- Sensitivity unstable → Identify most sensitive layer, suggest reducing its weight
|
||||||
|
- **CRITICAL WARNING:** Documents circular validation risk (post-validation tuning invalidates controls)
|
||||||
|
- Provides best practices: independent validation set, document rationale, prefer a priori weights
|
||||||
|
|
||||||
|
- **save_validation_report()**: Persists report to file with parent directory creation
|
||||||
|
|
||||||
|
**Created validate_cmd.py** CLI command following score_cmd.py pattern:
|
||||||
|
|
||||||
|
- Click command `validate` with options:
|
||||||
|
- `--force`: Re-run even if validation checkpoint exists
|
||||||
|
- `--skip-sensitivity`: Skip sensitivity analysis for faster iteration
|
||||||
|
- `--output-dir`: Custom output directory (default: {data_dir}/validation)
|
||||||
|
- `--top-n`: Top N genes for sensitivity analysis (default: 100)
|
||||||
|
|
||||||
|
- Pipeline steps:
|
||||||
|
1. Load configuration and initialize store
|
||||||
|
2. Check scored_genes checkpoint exists (error if not - must run `score` first)
|
||||||
|
3. Run positive control validation (validate_positive_controls_extended)
|
||||||
|
4. Run negative control validation (validate_negative_controls)
|
||||||
|
5. Run sensitivity analysis (unless --skip-sensitivity) - run_sensitivity_analysis + summarize_sensitivity
|
||||||
|
6. Generate comprehensive validation report (generate_comprehensive_validation_report)
|
||||||
|
7. Save report to output_dir/validation_report.md and provenance sidecar
|
||||||
|
|
||||||
|
- Styled output with click.echo patterns (green for success, yellow for warnings, red for errors, bold for step headers)
|
||||||
|
- Provenance tracking: record_step for each validation phase with metrics
|
||||||
|
- Final summary: displays overall pass/fail, recall@10%, housekeeping median percentile, sensitivity stability
|
||||||
|
|
||||||
|
**Updated main.py:**
|
||||||
|
- Imported validate from validate_cmd
|
||||||
|
- Added `cli.add_command(validate)` following existing pattern
|
||||||
|
|
||||||
|
**Updated scoring.__init__.py:**
|
||||||
|
- Added validation_report imports: generate_comprehensive_validation_report, recommend_weight_tuning, save_validation_report
|
||||||
|
- Added all 3 functions to __all__ exports
|
||||||
|
|
||||||
|
**Verification:** Both verification commands passed:
|
||||||
|
- `python -c "from usher_pipeline.cli.validate_cmd import validate; print(f'Command name: {validate.name}'); print('OK')"` → OK
|
||||||
|
- `python -c "from usher_pipeline.cli.main import cli; assert 'validate' in cli.commands"` → OK
|
||||||
|
|
||||||
|
### Task 2: Create unit tests for all validation modules
|
||||||
|
**Status:** Complete
|
||||||
|
**Commit:** 5879ae9
|
||||||
|
**Files:** tests/test_validation.py
|
||||||
|
|
||||||
|
**Created test_validation.py** with 13 comprehensive tests using synthetic DuckDB data:
|
||||||
|
|
||||||
|
**Test helper:**
|
||||||
|
- **create_synthetic_scored_db()**: Creates DuckDB with gene_universe (20 genes) and scored_genes table
|
||||||
|
- Designed scores ensure known cilia genes (MYO7A, IFT88, BBS1) get high scores (0.85-0.92)
|
||||||
|
- Housekeeping genes (GAPDH, ACTB, RPL13A) get low scores (0.12-0.20)
|
||||||
|
- Filler genes get mid-range scores (0.35-0.58)
|
||||||
|
- Includes all 6 layer scores and quality_flag
|
||||||
|
- Creates known_genes table with 3 genes (1 OMIM, 2 SYSCILIA)
|
||||||
|
|
||||||
|
**Tests for negative controls (4 tests):**
|
||||||
|
1. **test_compile_housekeeping_genes_structure**: Verifies compile_housekeeping_genes() returns DataFrame with 13 genes, correct columns (gene_symbol, source, confidence), all confidence=HIGH, all source=literature_validated
|
||||||
|
|
||||||
|
2. **test_compile_housekeeping_genes_known_genes_present**: Asserts GAPDH, ACTB, RPL13A, TBP are in gene_symbol column
|
||||||
|
|
||||||
|
3. **test_validate_negative_controls_with_synthetic_data**: Uses synthetic DB where housekeeping genes score low, asserts validation_passed=True, median_percentile < 0.5
|
||||||
|
|
||||||
|
4. **test_validate_negative_controls_inverted_logic**: Creates DB where housekeeping genes score HIGH (artificial scenario), asserts validation_passed=False
|
||||||
|
|
||||||
|
**Tests for recall@k (1 test):**
|
||||||
|
5. **test_compute_recall_at_k**: Uses synthetic DB, asserts recall@k returns dict with recalls_absolute and recalls_percentage keys. With 3 known genes in dataset (out of 38 total from compile_known_genes), recall@100 = 3/38 = 0.0789
|
||||||
|
|
||||||
|
**Tests for weight perturbation (3 tests):**
|
||||||
|
6. **test_perturb_weight_renormalizes**: Perturbs gnomad by +0.10, asserts weights still sum to 1.0 within 1e-6 tolerance
|
||||||
|
|
||||||
|
7. **test_perturb_weight_large_negative**: Perturbs by -0.25 (more than weight value), asserts weight >= 0.0 (clamped) and sum = 1.0
|
||||||
|
|
||||||
|
8. **test_perturb_weight_invalid_layer**: Asserts perturb_weight with layer="nonexistent" raises ValueError
|
||||||
|
|
||||||
|
**Tests for validation report (5 tests):**
|
||||||
|
9. **test_generate_comprehensive_validation_report_format**: Passes mock metrics dicts, asserts report contains expected sections ("Positive Control", "Negative Control", "Sensitivity Analysis", "Weight Tuning")
|
||||||
|
|
||||||
|
10. **test_recommend_weight_tuning_all_pass**: Passes metrics indicating all validations pass, asserts response contains "No tuning recommended"
|
||||||
|
|
||||||
|
11. **test_recommend_weight_tuning_positive_fail**: Passes metrics with positive controls failed, asserts response contains "Known Gene Ranking Issue" or "Positive Control"
|
||||||
|
|
||||||
|
12. **test_recommend_weight_tuning_negative_fail**: Passes metrics with negative controls failed, asserts response contains "Housekeeping" or "Negative Control"
|
||||||
|
|
||||||
|
13. **test_recommend_weight_tuning_sensitivity_fail**: Passes metrics with sensitivity unstable, asserts response contains "Sensitivity" or "gnomad"
|
||||||
|
|
||||||
|
**Verification:** All 13 tests passed:
|
||||||
|
```
|
||||||
|
tests/test_validation.py::test_compile_housekeeping_genes_structure PASSED
|
||||||
|
tests/test_validation.py::test_compile_housekeeping_genes_known_genes_present PASSED
|
||||||
|
tests/test_validation.py::test_validate_negative_controls_with_synthetic_data PASSED
|
||||||
|
tests/test_validation.py::test_validate_negative_controls_inverted_logic PASSED
|
||||||
|
tests/test_validation.py::test_compute_recall_at_k PASSED
|
||||||
|
tests/test_validation.py::test_perturb_weight_renormalizes PASSED
|
||||||
|
tests/test_validation.py::test_perturb_weight_large_negative PASSED
|
||||||
|
tests/test_validation.py::test_perturb_weight_invalid_layer PASSED
|
||||||
|
tests/test_validation.py::test_generate_comprehensive_validation_report_format PASSED
|
||||||
|
tests/test_validation.py::test_recommend_weight_tuning_all_pass PASSED
|
||||||
|
tests/test_validation.py::test_recommend_weight_tuning_positive_fail PASSED
|
||||||
|
tests/test_validation.py::test_recommend_weight_tuning_negative_fail PASSED
|
||||||
|
tests/test_validation.py::test_recommend_weight_tuning_sensitivity_fail PASSED
|
||||||
|
======================== 13 passed, 1 warning in 0.79s =========================
|
||||||
|
```
|
||||||
|
|
||||||
|
## Deviations from Plan
|
||||||
|
|
||||||
|
None - plan executed exactly as written.
|
||||||
|
|
||||||
|
## Verification Results
|
||||||
|
|
||||||
|
All verification checks passed:
|
||||||
|
|
||||||
|
1. `python -c "from usher_pipeline.cli.validate_cmd import validate; print(f'Command name: {validate.name}'); print('OK')"` → OK
|
||||||
|
2. `python -c "from usher_pipeline.cli.main import cli; assert 'validate' in cli.commands"` → OK
|
||||||
|
3. `python -m pytest tests/test_validation.py -v` → 13 passed, 0 failed
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
- [x] CLI `validate` command runs positive + negative + sensitivity validations and generates comprehensive report
|
||||||
|
- [x] Validation report includes all three prongs with pass/fail verdicts and weight tuning recommendations
|
||||||
|
- [x] Unit tests cover negative controls, recall@k, perturbation, and report generation
|
||||||
|
- [x] All tests pass with synthetic data
|
||||||
|
- [x] validate command registered in main CLI
|
||||||
|
|
||||||
|
## Key Files
|
||||||
|
|
||||||
|
### Created
|
||||||
|
- **src/usher_pipeline/scoring/validation_report.py** (410 lines)
|
||||||
|
- Comprehensive validation report generation combining all three validation prongs
|
||||||
|
- Exports: generate_comprehensive_validation_report, recommend_weight_tuning, save_validation_report
|
||||||
|
|
||||||
|
- **src/usher_pipeline/cli/validate_cmd.py** (408 lines)
|
||||||
|
- CLI validate command orchestrating full validation pipeline
|
||||||
|
- Exports: validate (Click command)
|
||||||
|
|
||||||
|
- **tests/test_validation.py** (478 lines)
|
||||||
|
- Unit tests for negative controls, recall@k, sensitivity, and validation report
|
||||||
|
- 13 tests with synthetic DuckDB fixture
|
||||||
|
|
||||||
|
### Modified
|
||||||
|
- **src/usher_pipeline/cli/main.py** (+2 lines)
|
||||||
|
- Added validate command import and registration
|
||||||
|
|
||||||
|
- **src/usher_pipeline/scoring/__init__.py** (+7 lines)
|
||||||
|
- Added validation_report module exports
|
||||||
|
|
||||||
|
## Integration Points
|
||||||
|
|
||||||
|
**Depends on:**
|
||||||
|
- Phase 06-01: Negative control validation (validate_negative_controls) and positive control validation (validate_positive_controls_extended, compute_recall_at_k)
|
||||||
|
- Phase 06-02: Sensitivity analysis (run_sensitivity_analysis, summarize_sensitivity)
|
||||||
|
- Phase 04-02: scored_genes checkpoint (validation requires scoring to be complete)
|
||||||
|
|
||||||
|
**Provides:**
|
||||||
|
- Comprehensive validation report combining all three validation prongs
|
||||||
|
- CLI `validate` command for user-facing validation workflow
|
||||||
|
- Unit test coverage for all validation modules
|
||||||
|
|
||||||
|
**Affects:**
|
||||||
|
- Phase 6 completion: This is the final plan in validation phase
|
||||||
|
- User workflow: Provides `usher-pipeline validate` command for validation step
|
||||||
|
|
||||||
|
## Technical Notes
|
||||||
|
|
||||||
|
**Comprehensive Validation Report Design:**
|
||||||
|
|
||||||
|
The report combines three complementary validation approaches:
|
||||||
|
1. **Positive controls** (Plan 06-01): Known genes should rank high → validates sensitivity
|
||||||
|
2. **Negative controls** (Plan 06-01): Housekeeping genes should rank low → validates specificity
|
||||||
|
3. **Sensitivity analysis** (Plan 06-02): Rankings stable under perturbations → validates robustness
|
||||||
|
|
||||||
|
If all three pass: scoring system is sensitive, specific, and robust.
|
||||||
|
|
||||||
|
**Overall Validation Verdict Logic:**
|
||||||
|
- **All pass** → "ALL VALIDATIONS PASSED ✓" (scientifically defensible)
|
||||||
|
- **Pos + Neg pass, Sensitivity fail** → "PARTIAL PASS (Sensitivity Unstable)" (directionally correct but may need weight tuning)
|
||||||
|
- **Pos pass, Neg fail** → "PARTIAL PASS (Specificity Issue)" (sensitive but not specific)
|
||||||
|
- **Pos fail** → "VALIDATION FAILED ✗" (fundamental scoring issues)
|
||||||
|
|
||||||
|
**Weight Tuning Recommendations Philosophy:**
|
||||||
|
|
||||||
|
Recommendations are **guidance**, not automatic actions. They suggest:
|
||||||
|
- Which layers to adjust (increase/decrease weights)
|
||||||
|
- Why adjustments are needed (based on validation failures)
|
||||||
|
- How to interpret failures (specificity vs sensitivity vs stability)
|
||||||
|
|
||||||
|
**CRITICAL WARNING** included in all recommendations:
|
||||||
|
- Post-validation tuning introduces **circular validation risk**
|
||||||
|
- If weights are tuned based on validation results, those same controls cannot validate the tuned weights
|
||||||
|
- Best practices: independent validation set, document rationale, prefer a priori weights
|
||||||
|
|
||||||
|
This prevents the pitfall identified in 06-RESEARCH.md: "Tuning weights to maximize known gene recall, then using known gene recall as validation."
|
||||||
|
|
||||||
|
**CLI validate Command Design:**
|
||||||
|
|
||||||
|
Follows established pattern from score_cmd.py:
|
||||||
|
1. Click command with options (--force, --skip-sensitivity, --output-dir, --top-n)
|
||||||
|
2. Step-by-step pipeline with styled output (bold headers, colored status)
|
||||||
|
3. Checkpoint-restart support (skips if validation_report.md exists unless --force)
|
||||||
|
4. Provenance tracking for all steps (record_step for each validation phase)
|
||||||
|
5. Final summary with overall status and key metrics
|
||||||
|
6. Error handling with sys.exit(1) on failures
|
||||||
|
|
||||||
|
**Test Design Philosophy:**
|
||||||
|
|
||||||
|
All tests use **synthetic DuckDB data** with **designed score patterns**:
|
||||||
|
- Known genes get high scores (0.85-0.92) → positive controls should pass
|
||||||
|
- Housekeeping genes get low scores (0.12-0.20) → negative controls should pass
|
||||||
|
- Deterministic scores enable precise assertions
|
||||||
|
|
||||||
|
Tests cover:
|
||||||
|
- **Happy path**: Validations pass with expected data
|
||||||
|
- **Inverted logic**: Validations fail when data is wrong (housekeeping genes high)
|
||||||
|
- **Edge cases**: Large negative perturbations, invalid layer names
|
||||||
|
- **Format verification**: Report contains expected sections
|
||||||
|
- **Recommendation logic**: Different tuning suggestions for different failure modes
|
||||||
|
|
||||||
|
**Usage Pattern:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Full validation pipeline
|
||||||
|
usher-pipeline validate
|
||||||
|
|
||||||
|
# Skip sensitivity analysis (faster iteration)
|
||||||
|
usher-pipeline validate --skip-sensitivity
|
||||||
|
|
||||||
|
# Custom output directory
|
||||||
|
usher-pipeline validate --output-dir results/validation
|
||||||
|
|
||||||
|
# More genes for sensitivity (default 100)
|
||||||
|
usher-pipeline validate --top-n 200
|
||||||
|
|
||||||
|
# Force re-run
|
||||||
|
usher-pipeline validate --force
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Workflow:**
|
||||||
|
|
||||||
|
1. User runs `usher-pipeline score` (Phase 04-03)
|
||||||
|
2. User runs `usher-pipeline validate` (this plan)
|
||||||
|
3. User reviews validation report at {data_dir}/validation/validation_report.md
|
||||||
|
4. If all pass: proceed to candidate prioritization
|
||||||
|
5. If failures: review weight tuning recommendations, adjust weights with biological justification, re-run
|
||||||
|
|
||||||
|
**Phase 6 Completion:**
|
||||||
|
|
||||||
|
This plan completes Phase 6 (Validation). All three plans executed:
|
||||||
|
- 06-01: Negative controls and recall@k (2 min)
|
||||||
|
- 06-02: Sensitivity analysis (3 min)
|
||||||
|
- 06-03: Comprehensive validation report and CLI (5 min)
|
||||||
|
|
||||||
|
Total Phase 6 duration: 10 minutes across 3 plans.
|
||||||
|
|
||||||
|
## Self-Check: PASSED
|
||||||
|
|
||||||
|
**Created files verified:**
|
||||||
|
- [x] src/usher_pipeline/scoring/validation_report.py exists (410 lines)
|
||||||
|
- [x] src/usher_pipeline/cli/validate_cmd.py exists (408 lines)
|
||||||
|
- [x] tests/test_validation.py exists (478 lines)
|
||||||
|
|
||||||
|
**Modified files verified:**
|
||||||
|
- [x] src/usher_pipeline/cli/main.py updated with validate import and registration
|
||||||
|
- [x] src/usher_pipeline/scoring/__init__.py updated with validation_report exports
|
||||||
|
|
||||||
|
**Commits verified:**
|
||||||
|
- [x] 10f19f8: Task 1 commit exists (comprehensive validation report and CLI validate command)
|
||||||
|
- [x] 5879ae9: Task 2 commit exists (unit tests for all validation modules)
|
||||||
|
|
||||||
|
**Functionality verified:**
|
||||||
|
- [x] validate command imports correctly (Command name: validate, OK)
|
||||||
|
- [x] validate registered in CLI (CLI commands includes 'validate', OK)
|
||||||
|
- [x] All 13 tests pass (pytest reports 13 passed, 0 failed)
|
||||||
|
|
||||||
|
All claims in summary verified against actual implementation.
|
||||||
Reference in New Issue
Block a user