docs(05-02): complete visualization and reproducibility report plan

- Plan 05-02 executed successfully
- 2 tasks completed with 2 commits
- 13 tests passing (6 visualization + 7 reproducibility)
- 4 files created, 2 files modified
- Duration: 5 minutes
- Updated STATE.md with progress (17/20 plans complete, 85%)
2026-02-12 04:03:08 +08:00
parent 434c79c0a8
commit 5f14dc2e64
2 changed files with 264 additions and 10 deletions
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -10,18 +10,18 @@ See: .planning/PROJECT.md (updated 2026-02-11)
 ## Current Position
 Phase: 5 of 6 (Output & CLI)
-Plan: 1 of 3 in current phase (plan 05-01 complete)
+Plan: 2 of 3 in current phase (plans 05-01, 05-02 complete)
-Status: Phase 5 in progress — 05-01 complete
+Status: Phase 5 in progress — 05-02 complete
-Last activity: 2026-02-11 — Plan 05-01 executed and verified
+Last activity: 2026-02-11 — Plan 05-02 executed and verified
-Progress: [████████░░] 80.0% (16/20 plans complete across all phases)
+Progress: [█████████░] 85.0% (17/20 plans complete across all phases)
 ## Performance Metrics
 **Velocity:**
-- Total plans completed: 16
+- Total plans completed: 17
 - Average duration: 5.0 min
-- Total execution time: 1.4 hours
+- Total execution time: 1.5 hours
 **By Phase:**
@@ -31,17 +31,17 @@ Progress: [████████░░] 80.0% (16/20 plans complete across al
 | 02 - Prototype Evidence Layer | 2/2 | 8 min | 4.0 min/plan |
 | 03 - Core Evidence Layers | 6/6 | 52 min | 8.7 min/plan |
 | 04 - Scoring Integration | 3/3 | 10 min | 3.3 min/plan |
-| 05 - Output & CLI | 1/3 | 4 min | 4.0 min/plan |
+| 05 - Output & CLI | 2/3 | 9 min | 4.5 min/plan |
 **Recent Plan Details:**
 | Plan | Duration | Tasks | Files |
 |------|----------|-------|-------|
 | Phase 03 P05 | 10 min | 2 tasks | 8 files |
 | Phase 03 P06 | 13 min | 2 tasks | 10 files |
 | Phase 04 P01 | 4 min | 2 tasks | 4 files |
 | Phase 04 P02 | 3 min | 2 tasks | 4 files |
 | Phase 04 P03 | 3 min | 2 tasks | 4 files |
 | Phase 05 P01 | 4 min | 2 tasks | 5 files |
 | Phase 05 P02 | 5 min | 2 tasks | 6 files |
 ## Accumulated Context
@@ -118,6 +118,12 @@ Recent decisions affecting current work:
 - [05-01]: Dual-format TSV+Parquet with identical data for downstream tool compatibility
 - [05-01]: YAML provenance sidecar includes statistics (tier counts) and column metadata
 - [05-01]: Fixed deprecated pl.count() -> pl.len() usage for polars 0.20.5+ compatibility
 - [05-02]: matplotlib Agg backend for headless/CLI safety (non-interactive visualization)
 - [05-02]: 300 DPI for publication-quality plots
 - [05-02]: Tier color scheme: GREEN/ORANGE/RED for HIGH/MEDIUM/LOW (consistent across all plots)
 - [05-02]: Graceful degradation (individual plot failures don't block batch generation)
 - [05-02]: Dual-format reproducibility reports (JSON machine-readable + Markdown human-readable)
 - [05-02]: Optional validation metrics in reproducibility reports (report generates whether or not validation provided)
 ### Pending Todos
@@ -130,5 +136,5 @@ None yet.
 ## Session Continuity
 Last session: 2026-02-11 - Phase 5 execution
-Stopped at: Plan 05-01 complete — tiering, evidence summary, and dual-format writer implemented with tests
+Stopped at: Plan 05-02 complete — visualization and reproducibility report modules implemented with tests
-Resume file: .planning/phases/05-output-cli/05-01-SUMMARY.md
+Resume file: .planning/phases/05-output-cli/05-02-SUMMARY.md
--- a/.planning/phases/05-output-cli/05-02-SUMMARY.md
+++ b/.planning/phases/05-output-cli/05-02-SUMMARY.md
@@ -0,0 +1,248 @@
 ---
 phase: 05-output-cli
 plan: 02
 subsystem: output
 tags: [visualization, reproducibility, reporting, matplotlib, seaborn]
 completed: 2026-02-11
 duration_minutes: 5
 dependencies:
  requires:
    - config.schema (PipelineConfig, ScoringWeights, DataSourceVersions)
    - persistence.provenance (ProvenanceTracker)
  provides:
    - visualizations.generate_all_plots
    - reproducibility.generate_reproducibility_report
  affects:
    - output.__init__ (exports visualization and reproducibility functions)
 tech_stack:
  added:
    - matplotlib>=3.8.0 (visualization library with Agg backend)
    - seaborn>=0.13.0 (statistical visualization on top of matplotlib)
  patterns:
    - Non-interactive backend (Agg) for headless/CLI safety
    - Proper figure cleanup with plt.close() to prevent memory leaks
    - Graceful degradation (individual plot failures don't block others)
    - Dataclass-based report structure with dual format output (JSON + Markdown)
 key_files:
  created:
    - src/usher_pipeline/output/visualizations.py (plot generation functions)
    - src/usher_pipeline/output/reproducibility.py (report generation)
    - tests/test_visualizations.py (6 tests for plot creation)
    - tests/test_reproducibility.py (7 tests for report content)
  modified:
    - pyproject.toml (added matplotlib and seaborn dependencies)
    - src/usher_pipeline/output/__init__.py (exported new functions)
 decisions:
  - matplotlib_backend: "Use Agg (non-interactive) backend for headless/CLI safety"
  - plot_dpi: "300 DPI for publication-quality output"
  - tier_colors: "GREEN/ORANGE/RED for HIGH/MEDIUM/LOW (consistent across plots)"
  - error_handling: "Wrap each plot in try/except so failures don't block batch generation"
  - report_formats: "JSON for machine-readable + Markdown for human-readable"
  - validation_optional: "Validation metrics are optional in reproducibility report"
 metrics:
  tasks: 2
  commits: 2
  tests: 13
  files_created: 4
  files_modified: 2
 ---
 # Phase 05 Plan 02: Visualization and Reproducibility Reports Summary
 **One-liner:** Matplotlib/seaborn visualizations (score distributions, layer contributions, tier breakdowns) and dual-format reproducibility reports (JSON + Markdown) with parameters, data versions, filtering steps, and validation metrics.
 ## Execution Flow
 ### Task 1: Visualization module with matplotlib/seaborn plots (commit: 150417f)
 **Created visualization module with 3 plot types + orchestrator:**
 1. **plot_score_distribution**: Histogram of composite scores colored by confidence tier (HIGH=green, MEDIUM=orange, LOW=red), 30 bins, stacked display
 2. **plot_layer_contributions**: Bar chart showing non-null count per evidence layer (gnomAD, expression, annotation, localization, animal model, literature)
 3. **plot_tier_breakdown**: Pie chart with percentage labels for tier distribution
 4. **generate_all_plots**: Orchestrator that creates all 3 plots with error handling (one failure doesn't block others)
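The "one failure doesn't block others" behaviour of the orchestrator can be sketched as follows. This is a minimal, stdlib-only illustration and not the code in `visualizations.py`: `run_all`, the `PlotFn` signature, and the `<name>.png` naming are assumptions made for the example.

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical plot-function signature: receives a target path and writes a file there.
PlotFn = Callable[[Path], None]


def run_all(plotters: dict[str, PlotFn], output_dir: Path) -> dict[str, Path]:
    """Run every plotter and return the paths of the ones that succeeded."""
    output_dir.mkdir(parents=True, exist_ok=True)
    created: dict[str, Path] = {}
    for name, plot_fn in plotters.items():
        target = output_dir / f"{name}.png"
        try:
            plot_fn(target)
        except Exception:
            # A failed plot is logged rather than re-raised, so the batch continues.
            logger.warning("Plot %r failed; continuing", name, exc_info=True)
            continue
        created[name] = target
    return created
```

Returning only the successful entries lets a caller compare the result against the requested names to see which plots were skipped.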
 **Key technical decisions:**
 - Use matplotlib Agg backend (non-interactive, headless-safe)
 - Save all plots at 300 DPI for publication quality
 - Always call `plt.close(fig)` after savefig to prevent memory leaks (critical pitfall from research)
 - Convert polars DataFrame to pandas via `to_pandas()` for seaborn compatibility (acceptable overhead for small result sets)
 **Dependencies added:**
 - matplotlib>=3.8.0
 - seaborn>=0.13.0
 **Tests created (6 tests):**
 - test_plot_score_distribution_creates_file
 - test_plot_layer_contributions_creates_file
 - test_plot_tier_breakdown_creates_file
 - test_generate_all_plots_creates_all_files
 - test_generate_all_plots_returns_paths
 - test_plots_handle_empty_dataframe (edge case)
 All tests pass with only expected warnings for empty DataFrame edge cases.
 ### Task 2: Reproducibility report module (commit: 5af63ea)
 **Created reproducibility report generation with dual formats:**
 1. **FilteringStep dataclass**: Captures step_name, input_count, output_count, criteria
 2. **ReproducibilityReport dataclass**: Contains the following fields:
   - run_id (UUID4), timestamp, pipeline_version
   - parameters (scoring weights)
   - data_versions (Ensembl/gnomAD/GTEx/HPA)
   - software_environment (Python/polars/duckdb versions)
   - filtering_steps
   - validation_metrics (optional)
   - tier_statistics
 3. **generate_reproducibility_report**: Extracts all metadata from config, provenance, and tiered DataFrame
 4. **Report output methods:**
   - `to_json()`: Indented JSON for machine parsing
   - `to_markdown()`: Tables and headers for human reading
   - `to_dict()`: Dictionary for programmatic access
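A reduced sketch of the dataclass-plus-dual-format pattern described above, assuming only the standard library. `ReportSketch` is a hypothetical stand-in with a subset of the fields; the field types, defaults, and Markdown layout are assumptions, and the methods return strings here for simplicity rather than writing to a path.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FilteringStep:
    step_name: str
    input_count: int
    output_count: int
    criteria: str


@dataclass
class ReportSketch:
    # Field names follow the summary above; types and defaults are assumptions.
    parameters: dict
    filtering_steps: list[FilteringStep]
    tier_statistics: dict
    validation_metrics: Optional[dict] = None
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_dict(self) -> dict:
        # asdict() recurses into nested dataclasses such as FilteringStep.
        return asdict(self)

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), indent=2)

    def to_markdown(self) -> str:
        lines = [
            "# Reproducibility Report",
            "",
            f"Run ID: {self.run_id}",
            "",
            "## Filtering Steps",
            "",
            "| Step | Input | Output | Criteria |",
            "|------|-------|--------|----------|",
        ]
        for s in self.filtering_steps:
            lines.append(
                f"| {s.step_name} | {s.input_count} | {s.output_count} | {s.criteria} |"
            )
        # The validation section is emitted only when metrics were supplied.
        if self.validation_metrics is not None:
            lines += ["", "## Validation Metrics", ""]
            lines += [f"- {k}: {v}" for k, v in self.validation_metrics.items()]
        return "\n".join(lines)
```

Deriving both formats from the same dataclass instance is one way to keep the JSON and Markdown outputs from drifting apart.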
 **Key design patterns:**
 - NULL-preserving tier counting with polars `group_by().agg(pl.len())`
 - Optional validation metrics (report generates whether or not validation results are provided)
 - Filtering steps extracted from ProvenanceTracker.get_steps()
 - Software versions captured at report generation time (sys.version, pl.__version__, duckdb.__version__)
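Capturing software versions at report time might look like the sketch below. Note that it swaps in `importlib.metadata` instead of reading `pl.__version__` and `duckdb.__version__` directly, so that a missing package is recorded instead of raising `ImportError`; the function name and the `"not installed"` placeholder are assumptions for illustration.

```python
import platform
from importlib import metadata


def capture_software_environment(
    packages: tuple[str, ...] = ("polars", "duckdb"),
) -> dict[str, str]:
    """Record the Python version and the installed version of each package."""
    env = {"python": platform.python_version()}
    for name in packages:
        try:
            env[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly so the report still generates.
            env[name] = "not installed"
    return env
```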
 **Tests created (7 tests):**
 - test_generate_report_has_all_fields
 - test_report_to_json_parseable
 - test_report_to_markdown_has_headers
 - test_report_tier_statistics_match
 - test_report_includes_validation_when_provided
 - test_report_without_validation
 - test_report_software_versions
 All tests pass.
 ## Deviations from Plan
 **Auto-fixed Issues:**
 **1. [Rule 1 - Bug] Fixed deprecated polars API**
 - **Found during:** Task 2 testing
 - **Issue:** `pl.count()` is deprecated in polars 0.20.5+, replaced with `pl.len()`
 - **Fix:** Updated `pl.count().alias("count")` to `pl.len().alias("count")` in both visualizations.py and reproducibility.py
 - **Files modified:** src/usher_pipeline/output/visualizations.py, src/usher_pipeline/output/reproducibility.py
 - **Commit:** Included in 5af63ea
 **2. [Rule 1 - Bug] Fixed matplotlib/seaborn deprecation warnings**
 - **Found during:** Task 1 testing
 - **Issue:** seaborn barplot warning about passing `palette` without `hue`, and `set_xticklabels()` warning about fixed ticks
 - **Fix:** Added `hue=labels` and `legend=False` to barplot call, changed `ax.set_xticklabels()` to `plt.setp()`
 - **Files modified:** src/usher_pipeline/output/visualizations.py
 - **Commit:** Included in 150417f (amended during testing)
 **3. [Rule 3 - Blocking] __init__.py already updated**
 - **Found during:** Task 2 commit preparation
 - **Issue:** Discovered that src/usher_pipeline/output/__init__.py was already updated with reproducibility and visualization exports by a parallel process
 - **Resolution:** No action needed - integration work already completed by plan 05-01
 - **Impact:** Positive - reduces risk of merge conflicts and ensures consistency
 ## Verification Results
 **Import verification:**
 ```
 ✓ from usher_pipeline.output.visualizations import generate_all_plots - OK
 ✓ from usher_pipeline.output.reproducibility import generate_reproducibility_report, ReproducibilityReport - OK
 ```
 **Test results:**
 ```
 ✓ 6/6 visualization tests pass
 ✓ 7/7 reproducibility tests pass
 ✓ Total: 13/13 tests pass
 ```
 **Success criteria met:**
 - [x] Visualization module produces 3 PNG plots at 300 DPI
 - [x] Score distribution plot with tier color coding (GREEN/ORANGE/RED)
 - [x] Layer contributions bar chart showing evidence coverage
 - [x] Tier breakdown pie chart with percentages
 - [x] Reproducibility report generates in both JSON and Markdown formats
 - [x] Report contains parameters, data versions, filtering steps, tier statistics
 - [x] Optional validation metrics handled gracefully
 - [x] matplotlib Agg backend used (no display required)
 - [x] All tests pass
 - [x] Proper figure cleanup (plt.close) implemented
 ## Self-Check: PASSED
 **Created files exist:**
 ```
 ✓ FOUND: src/usher_pipeline/output/visualizations.py
 ✓ FOUND: src/usher_pipeline/output/reproducibility.py
 ✓ FOUND: tests/test_visualizations.py
 ✓ FOUND: tests/test_reproducibility.py
 ```
 **Commits exist:**
 ```
 ✓ FOUND: 150417f (Task 1 - visualization module)
 ✓ FOUND: 5af63ea (Task 2 - reproducibility module)
 ```
 **Tests pass:**
 ```
 ✓ 13/13 tests pass (6 visualization + 7 reproducibility)
 ```
 **Dependencies installed:**
 ```
 ✓ matplotlib>=3.8.0 installed
 ✓ seaborn>=0.13.0 installed
 ```
 All verification checks passed successfully.
 ## Integration Notes
 **For downstream consumers:**
 1. **Visualization usage:**
   ```python
   from usher_pipeline.output.visualizations import generate_all_plots
   plots = generate_all_plots(tiered_df, output_dir)
   # Returns: {"score_distribution": Path, "layer_contributions": Path, "tier_breakdown": Path}
   ```
 2. **Reproducibility report usage:**
   ```python
   from usher_pipeline.output.reproducibility import generate_reproducibility_report
   report = generate_reproducibility_report(
       config=config,
       tiered_df=tiered_df,
       provenance=provenance,
       validation_result=validation_dict  # Optional
   )
   report.to_json(output_dir / "reproducibility.json")
   report.to_markdown(output_dir / "reproducibility.md")
   ```
 3. **Expected columns in tiered_df:**
   - composite_score, confidence_tier
   - gnomad_score, expression_score, annotation_score, localization_score, animal_model_score, literature_score (can be NULL)
 4. **Plot output:**
   - All plots saved as PNG at 300 DPI
   - Figures properly closed (no memory leaks)
   - Empty DataFrames handled gracefully
 5. **Report content:**
   - JSON: Machine-readable, parseable with standard json library
   - Markdown: Human-readable with tables, headers, formatted statistics
   - Both formats contain the same information; only the presentation differs
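A caller could check the expected `tiered_df` columns listed in item 3 before invoking either module. The helper below is a hypothetical sketch and not part of the package; it takes a plain sequence of column names (for a polars DataFrame, `df.columns`).

```python
from collections.abc import Iterable

# Column names taken from the "Expected columns in tiered_df" list above.
REQUIRED_COLUMNS = ("composite_score", "confidence_tier")
LAYER_SCORE_COLUMNS = (
    "gnomad_score",
    "expression_score",
    "annotation_score",
    "localization_score",
    "animal_model_score",
    "literature_score",
)


def missing_columns(columns: Iterable[str]) -> list[str]:
    """Return the expected column names that are absent, in declaration order."""
    present = set(columns)
    return [c for c in REQUIRED_COLUMNS + LAYER_SCORE_COLUMNS if c not in present]
```

This checks only that the columns are present; NULL values within the layer score columns remain allowed.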
 ## Next Steps
 Plan 05-03 will integrate these modules into the CLI with an `output` command that:
 - Loads tiered results from DuckDB
 - Generates all plots to output directory
 - Creates reproducibility reports in both formats
 - Provides summary statistics to console
 This plan completes the reporting infrastructure. All visualization and documentation generation logic is now available as reusable modules.