---
phase: 06-validation
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
  - src/usher_pipeline/scoring/sensitivity.py
  - src/usher_pipeline/scoring/__init__.py
autonomous: true

must_haves:
  truths:
    - "Sensitivity analysis perturbs each weight by +-5% and +-10% and measures rank stability"
    - "Spearman rank correlation is computed for top-100 genes between baseline and perturbed configurations"
    - "Weight perturbation renormalizes remaining weights to maintain sum=1.0 constraint"
    - "Rank stability assessment classifies each perturbation as stable (rho>=0.85) or unstable"
  artifacts:
    - path: "src/usher_pipeline/scoring/sensitivity.py"
      provides: "Parameter sweep sensitivity analysis with Spearman correlation"
      exports: ["perturb_weight", "run_sensitivity_analysis", "summarize_sensitivity"]
  key_links:
    - from: "src/usher_pipeline/scoring/sensitivity.py"
      to: "src/usher_pipeline/scoring/integration.py"
      via: "compute_composite_scores import"
      pattern: "from usher_pipeline.scoring.integration import compute_composite_scores"
    - from: "src/usher_pipeline/scoring/sensitivity.py"
      to: "scipy.stats"
      via: "spearmanr import"
      pattern: "from scipy.stats import spearmanr"
    - from: "src/usher_pipeline/scoring/sensitivity.py"
      to: "src/usher_pipeline/config/schema.py"
      via: "ScoringWeights import"
      pattern: "from usher_pipeline.config.schema import ScoringWeights"
---

<objective>
Implement the sensitivity analysis module for parameter-sweep validation of the scoring weights.

Purpose: Demonstrate that top candidate rankings are robust to reasonable weight perturbations (+-5% to +-10%), satisfying success criterion 3 (rank stability). This is the core of the "are our results defensible?" validation.

Output: a sensitivity.py module providing weight perturbation, Spearman rank-correlation analysis, and stability classification.
</objective>

<execution_context>
@/Users/gbanyan/.claude/get-shit-done/workflows/execute-plan.md
@/Users/gbanyan/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/06-validation/06-RESEARCH.md

@src/usher_pipeline/scoring/integration.py
@src/usher_pipeline/config/schema.py
@src/usher_pipeline/persistence/duckdb_store.py
</context>

<tasks>

<task type="auto">
<name>Task 1: Create sensitivity analysis module with weight perturbation and rank correlation</name>
<files>src/usher_pipeline/scoring/sensitivity.py</files>
<action>
Create `src/usher_pipeline/scoring/sensitivity.py` with:

1. **EVIDENCE_LAYERS** list constant: ["gnomad", "expression", "annotation", "localization", "animal_model", "literature"]

2. **DEFAULT_DELTAS** list constant: [-0.10, -0.05, 0.05, 0.10] (absolute offsets added to a layer's weight before renormalization, not relative percentages)

3. **STABILITY_THRESHOLD** float constant: 0.85 (Spearman rho threshold for "stable")

4. **perturb_weight(baseline: ScoringWeights, layer: str, delta: float) -> ScoringWeights** function:
   - Raise ValueError if layer is not in EVIDENCE_LAYERS (check this first, before indexing the weights dict)
   - Get the baseline weights as a dict via baseline.model_dump()
   - Apply the perturbation: w_dict[layer] = max(0.0, min(1.0, w_dict[layer] + delta))
   - Renormalize ALL weights so they sum to 1.0: divide each weight by the new total
   - Return a new ScoringWeights instance

5. **run_sensitivity_analysis(store: PipelineStore, baseline_weights: ScoringWeights, deltas: list[float] | None = None, top_n: int = 100) -> dict** function:
   - Default deltas to DEFAULT_DELTAS if None
   - Compute baseline scores via compute_composite_scores(store, baseline_weights)
   - Sort by composite_score DESC and take the top_n genes as the baseline ranking
   - For each layer in EVIDENCE_LAYERS, for each delta in deltas:
     - Create perturbed weights via perturb_weight()
     - Compute perturbed scores via compute_composite_scores(store, perturbed_weights)
     - Sort by composite_score DESC and take the top_n genes
     - Inner join baseline and perturbed on gene_symbol to get paired scores
     - If fewer than 10 genes overlap, log a warning and record spearman_rho=None (and spearman_pval=None)
     - Otherwise compute spearmanr() on the paired composite_score columns
     - Record: layer, delta, perturbed_weights (as dict), spearman_rho, spearman_pval, overlap_count (how many of the top_n genes appear in both rankings), top_n
   - Return a dict with keys: baseline_weights (dict), results (list of per-perturbation dicts), top_n, total_perturbations (len(EVIDENCE_LAYERS) x len(deltas), i.e. 24 with the defaults)
   - Use structlog for progress logging (log each perturbation result)
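
A minimal sketch of the clamp-then-renormalize step in item 4, written against a plain dict so it runs without the project installed. The dict stand-in, the name perturb_weight_dict, and the baseline numbers are hypothetical; they are not the real ScoringWeights defaults. The actual function would presumably start from baseline.model_dump() and wrap the result in ScoringWeights(**...).

```python
EVIDENCE_LAYERS = ["gnomad", "expression", "annotation", "localization", "animal_model", "literature"]


def perturb_weight_dict(weights: dict[str, float], layer: str, delta: float) -> dict[str, float]:
    """Hypothetical dict-based stand-in for perturb_weight()."""
    # Check the layer first so an unknown name raises ValueError, not KeyError.
    if layer not in EVIDENCE_LAYERS:
        raise ValueError(f"Unknown evidence layer: {layer!r}")
    w = dict(weights)  # copy so the baseline is never mutated
    # Apply the absolute offset, clamped to [0.0, 1.0].
    w[layer] = max(0.0, min(1.0, w[layer] + delta))
    # Renormalize every weight (including the perturbed one) so they sum to 1.0.
    total = sum(w.values())
    return {name: value / total for name, value in w.items()}


# Hypothetical baseline that sums to 1.0 (not the pipeline's real defaults).
baseline = {
    "gnomad": 0.20, "expression": 0.20, "annotation": 0.15,
    "localization": 0.15, "animal_model": 0.15, "literature": 0.15,
}
up = perturb_weight_dict(baseline, "gnomad", 0.10)
print(round(up["gnomad"], 4), round(sum(up.values()), 6))  # prints: 0.2727 1.0
```

Every weight is divided by the new total, so the perturbed layer's effective change is smaller than delta. Here 0.20 + 0.10 becomes 0.30 / 1.10 ≈ 0.273. A delta larger than the weight itself, such as the -0.25 edge case in the verify step, clamps the layer to 0.0 before the other weights are scaled up.
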

IMPORTANT: compute_composite_scores re-queries the DB on every call. This is by design: different weights produce different composite_score values from the same underlying evidence-layer scores.

For the Spearman correlation, inner-join the baseline_top_n and perturbed_top_n DataFrames on gene_symbol and use each side's composite_score as the paired values. This measures whether the relative ordering of the shared top genes is preserved. Genes that drop out of the top_n in either run are excluded by the join, so they do not lower rho. overlap_count is what captures that turnover.
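
The join-and-correlate step can be sketched without polars or scipy to make the mechanics visible. The sketch ranks each run, keeps the top n, inner-joins on gene symbol, and computes Spearman's rho as the Pearson correlation of average ranks. That is the coefficient scipy.stats.spearmanr reports; the p-value is omitted here. The (gene_symbol, composite_score) tuple input and the helper names are illustrative assumptions, not the module's API. The real implementation would use a polars inner join and spearmanr().

```python
from math import sqrt
from typing import Optional

MIN_OVERLAP = 10  # mirrors the plan's "fewer than 10 overlapping genes" rule


def top_n(scores: list[tuple[str, float]], n: int) -> dict[str, float]:
    """Sort (gene_symbol, composite_score) pairs by score descending and keep n."""
    return dict(sorted(scores, key=lambda pair: pair[1], reverse=True)[:n])


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / (sx * sy)


def paired_top_n_rho(
    baseline: list[tuple[str, float]],
    perturbed: list[tuple[str, float]],
    n: int = 100,
) -> tuple[Optional[float], int]:
    """Spearman rho over genes present in both top-n lists, plus the overlap count."""
    base_top, pert_top = top_n(baseline, n), top_n(perturbed, n)
    shared = sorted(base_top.keys() & pert_top.keys())  # inner join on gene_symbol
    if len(shared) < MIN_OVERLAP:
        return None, len(shared)
    # Spearman rho = Pearson correlation of the average ranks of the paired scores.
    x = average_ranks([base_top[g] for g in shared])
    y = average_ranks([pert_top[g] for g in shared])
    return pearson(x, y), len(shared)


# Hypothetical scores: 20 genes, then a perturbed run that swaps the top two.
genes = [f"GENE{i}" for i in range(20)]
baseline = [(g, 1.0 - i * 0.01) for i, g in enumerate(genes)]
perturbed = [(genes[0], baseline[1][1]), (genes[1], baseline[0][1])] + baseline[2:]
rho, overlap = paired_top_n_rho(baseline, perturbed, n=20)
print(round(rho, 4), overlap)  # prints: 0.9985 20
```

Swapping two adjacent genes among 20 gives rho = 1 - 6*2/(20*399) ≈ 0.9985. That is well above the 0.85 STABILITY_THRESHOLD.
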

6. **summarize_sensitivity(analysis_result: dict) -> dict** function:
   - From the results list, compute:
     - min_rho, max_rho, mean_rho across all perturbations (excluding None values)
     - count of stable perturbations (rho >= STABILITY_THRESHOLD)
     - count of unstable perturbations (rho < STABILITY_THRESHOLD); perturbations with rho=None count as neither
     - most_sensitive_layer: the layer with the lowest mean rho across its perturbations
     - most_robust_layer: the layer with the highest mean rho across its perturbations
     - overall_stable: bool = all non-None rhos >= STABILITY_THRESHOLD
   - Return a dict with: min_rho, max_rho, mean_rho, stable_count, unstable_count, total_perturbations, overall_stable, most_sensitive_layer, most_robust_layer

7. **generate_sensitivity_report(analysis_result: dict, summary: dict) -> str** function:
   - Follow the formatting pattern of generate_validation_report() in validation.py
   - Show a table: Layer | Delta | Spearman rho | p-value | Stable?
   - Show a summary: overall stability verdict, most sensitive and most robust layers
   - Include interpretation text
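
A minimal sketch of the aggregation in item 6. It takes the results list directly and assumes each record carries the layer, delta, and spearman_rho keys listed in item 5. The function name and the rho values in the example are hypothetical. The plan does not say how to break ties between layers with equal mean rho; this sketch keeps the first layer encountered.

```python
from statistics import mean

STABILITY_THRESHOLD = 0.85


def summarize(results: list[dict]) -> dict:
    """Hypothetical stand-in for summarize_sensitivity(), taking the results list."""
    rhos = [r["spearman_rho"] for r in results if r["spearman_rho"] is not None]
    # Group non-None rhos by layer to find the most and least sensitive layers.
    by_layer: dict[str, list[float]] = {}
    for r in results:
        if r["spearman_rho"] is not None:
            by_layer.setdefault(r["layer"], []).append(r["spearman_rho"])
    layer_means = {layer: mean(vals) for layer, vals in by_layer.items()}
    return {
        "min_rho": min(rhos) if rhos else None,
        "max_rho": max(rhos) if rhos else None,
        "mean_rho": mean(rhos) if rhos else None,
        "stable_count": sum(rho >= STABILITY_THRESHOLD for rho in rhos),
        "unstable_count": sum(rho < STABILITY_THRESHOLD for rho in rhos),
        "total_perturbations": len(results),
        "overall_stable": all(rho >= STABILITY_THRESHOLD for rho in rhos),
        "most_sensitive_layer": min(layer_means, key=layer_means.get) if layer_means else None,
        "most_robust_layer": max(layer_means, key=layer_means.get) if layer_means else None,
    }


# Hypothetical per-perturbation records (rho values invented for illustration).
results = [
    {"layer": "gnomad", "delta": -0.10, "spearman_rho": 0.97},
    {"layer": "gnomad", "delta": 0.10, "spearman_rho": 0.95},
    {"layer": "literature", "delta": -0.10, "spearman_rho": 0.80},
    {"layer": "literature", "delta": 0.10, "spearman_rho": None},
]
summary = summarize(results)
print(summary["stable_count"], summary["unstable_count"], summary["overall_stable"])  # prints: 2 1 False
print(summary["most_sensitive_layer"], summary["most_robust_layer"])  # prints: literature gnomad
```

One consequence of defining overall_stable as "all non-None rhos >= STABILITY_THRESHOLD" concerns the case where every perturbation recorded rho=None. Python's all() over an empty sequence returns True, so the summary would report a stable result with no evidence behind it. The plan leaves this case open; guarding on the number of computed rhos is one option.
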

Import structlog, polars, and spearmanr from scipy.stats. Import compute_composite_scores from usher_pipeline.scoring.integration, ScoringWeights from usher_pipeline.config.schema, and PipelineStore from usher_pipeline.persistence.duckdb_store.
</action>

<verify>
Run: `cd /Users/gbanyan/Project/usher-exploring && python -c "
from usher_pipeline.scoring.sensitivity import perturb_weight, EVIDENCE_LAYERS, STABILITY_THRESHOLD, DEFAULT_DELTAS
from usher_pipeline.config.schema import ScoringWeights

# Test weight perturbation
w = ScoringWeights()
p = perturb_weight(w, 'gnomad', 0.10)
p.validate_sum()  # Must not raise
print(f'Original gnomad: {w.gnomad}, Perturbed: {p.gnomad:.4f}')
assert p.gnomad > w.gnomad, 'Perturbed weight should be higher'

# Test renormalization
total = p.gnomad + p.expression + p.annotation + p.localization + p.animal_model + p.literature
assert abs(total - 1.0) < 1e-6, f'Weights must sum to 1.0, got {total}'

# Test edge: perturb to near-zero
p_low = perturb_weight(w, 'gnomad', -0.25)
p_low.validate_sum()
assert p_low.gnomad >= 0.0, 'Weight must not go negative'

print('All perturb_weight tests passed')
"` exits 0
</verify>

<done>
sensitivity.py exists with perturb_weight (renormalizing), run_sensitivity_analysis (computing Spearman rho for top-N genes across all layer/delta combinations), summarize_sensitivity (stability classification), and generate_sensitivity_report (formatted output). Weights always renormalize to sum=1.0 after perturbation.
</done>
</task>

<task type="auto">
<name>Task 2: Export sensitivity module from scoring package</name>
<files>src/usher_pipeline/scoring/__init__.py</files>
<action>
Update `src/usher_pipeline/scoring/__init__.py` to add imports and exports for the sensitivity module:

Add imports:
```python
from usher_pipeline.scoring.sensitivity import (
    perturb_weight,
    run_sensitivity_analysis,
    summarize_sensitivity,
    generate_sensitivity_report,
    EVIDENCE_LAYERS,
    STABILITY_THRESHOLD,
)
```

Add to __all__ list: "perturb_weight", "run_sensitivity_analysis", "summarize_sensitivity", "generate_sensitivity_report", "EVIDENCE_LAYERS", "STABILITY_THRESHOLD"

NOTE: Plan 01 may have already updated __init__.py to add negative_controls exports. If so, ADD the sensitivity imports alongside those -- do not remove them. Read the file first to check current state.
</action>

<verify>
Run: `cd /Users/gbanyan/Project/usher-exploring && python -c "from usher_pipeline.scoring import perturb_weight, run_sensitivity_analysis, summarize_sensitivity, generate_sensitivity_report, EVIDENCE_LAYERS, STABILITY_THRESHOLD; print(f'EVIDENCE_LAYERS: {EVIDENCE_LAYERS}'); print(f'STABILITY_THRESHOLD: {STABILITY_THRESHOLD}'); print('All sensitivity exports OK')"` exits 0
</verify>

<done>
All sensitivity analysis functions and constants are importable from usher_pipeline.scoring. Existing exports from negative_controls (Plan 01) are preserved.
</done>
</task>

</tasks>

<verification>
- `python -c "from usher_pipeline.scoring.sensitivity import perturb_weight; from usher_pipeline.config.schema import ScoringWeights; w = ScoringWeights(); p = perturb_weight(w, 'gnomad', 0.05); p.validate_sum(); print('OK')"` -- weight perturbation works and renormalizes
- `python -c "from usher_pipeline.scoring import run_sensitivity_analysis, summarize_sensitivity, generate_sensitivity_report"` -- all exports available
- `python -c "from usher_pipeline.scoring.sensitivity import STABILITY_THRESHOLD; assert STABILITY_THRESHOLD == 0.85"` -- threshold configured
</verification>

<success_criteria>
- perturb_weight correctly perturbs one layer and renormalizes to sum=1.0
- run_sensitivity_analysis computes Spearman rho for all layer x delta combinations
- summarize_sensitivity classifies perturbations as stable/unstable
- generate_sensitivity_report produces human-readable output
- All functions exported from scoring package
</success_criteria>

<output>
After completion, create `.planning/phases/06-validation/06-02-SUMMARY.md`
</output>