---
phase: 05-output-cli
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
- src/usher_pipeline/output/visualizations.py
- src/usher_pipeline/output/reproducibility.py
- pyproject.toml
- tests/test_visualizations.py
- tests/test_reproducibility.py
autonomous: true
must_haves:
truths:
- "Pipeline generates score distribution histogram with tier color coding as PNG"
- "Pipeline generates evidence layer contribution bar chart as PNG"
- "Pipeline generates tier breakdown pie chart as PNG"
- "Reproducibility report documents scoring parameters, data versions, gene counts per filtering step, and validation metrics"
- "Reproducibility report is generated in both JSON (machine-readable) and Markdown (human-readable) formats"
artifacts:
- path: "src/usher_pipeline/output/visualizations.py"
provides: "matplotlib/seaborn visualization functions"
exports: ["plot_score_distribution", "plot_layer_contributions", "plot_tier_breakdown", "generate_all_plots"]
- path: "src/usher_pipeline/output/reproducibility.py"
provides: "Reproducibility report generation"
exports: ["generate_reproducibility_report", "ReproducibilityReport"]
- path: "tests/test_visualizations.py"
provides: "Tests for visualization file creation"
- path: "tests/test_reproducibility.py"
provides: "Tests for report content and formatting"
key_links:
- from: "src/usher_pipeline/output/visualizations.py"
to: "matplotlib/seaborn"
via: "to_pandas() conversion for seaborn compatibility"
pattern: "to_pandas.*sns\\."
- from: "src/usher_pipeline/output/reproducibility.py"
to: "provenance tracker and config"
via: "reads ProvenanceTracker metadata and PipelineConfig"
pattern: "provenance.*create_metadata|config.*model_dump"
---
<objective>
Create visualization and reproducibility report modules: score distribution plots, evidence layer contribution charts, tier breakdowns, and comprehensive reproducibility documentation in JSON+Markdown formats.
Purpose: Provides the visual and textual reporting layer that makes pipeline results interpretable for researchers and satisfies reproducibility requirements for scientific pipelines.
Output: `src/usher_pipeline/output/visualizations.py`, `src/usher_pipeline/output/reproducibility.py`, and associated tests.
</objective>
<execution_context>
@/Users/gbanyan/.claude/get-shit-done/workflows/execute-plan.md
@/Users/gbanyan/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/05-output-cli/05-RESEARCH.md
@src/usher_pipeline/config/schema.py
@src/usher_pipeline/persistence/provenance.py
@src/usher_pipeline/scoring/quality_control.py
@src/usher_pipeline/scoring/validation.py
</context>
<tasks>
<task type="auto">
<name>Task 1: Visualization module with matplotlib/seaborn plots</name>
<files>
src/usher_pipeline/output/visualizations.py
pyproject.toml
tests/test_visualizations.py
</files>
<action>
**pyproject.toml**: Add matplotlib and seaborn to dependencies list:
- "matplotlib>=3.8.0"
- "seaborn>=0.13.0"
**visualizations.py**: Create the visualization module with 3 plot functions and 1 orchestrator (a sketch of the module follows the function list below).
Use matplotlib backend "Agg" (non-interactive, safe for headless/CLI use): call `matplotlib.use("Agg")` before importing pyplot.
1. `plot_score_distribution(df: pl.DataFrame, output_path: Path) -> Path`:
- Converts to pandas via df.to_pandas() (small result set, acceptable overhead per research)
- Sets seaborn theme: `sns.set_theme(style="whitegrid", context="paper")`
- Creates histogram of composite_score colored by confidence_tier
- Uses `sns.histplot(data=pdf, x="composite_score", hue="confidence_tier", hue_order=["HIGH", "MEDIUM", "LOW"], palette={"HIGH": "#2ecc71", "MEDIUM": "#f39c12", "LOW": "#e74c3c"}, bins=30, multiple="stack")`
- Labels: x="Composite Score", y="Candidate Count", title="Score Distribution by Confidence Tier"
- Saves as PNG at 300 DPI with bbox_inches='tight'
- CRITICAL: Always call plt.close(fig) after savefig (memory leak pitfall from research)
- Returns output_path
2. `plot_layer_contributions(df: pl.DataFrame, output_path: Path) -> Path`:
- Counts non-null values per layer score column: gnomad_score, expression_score, annotation_score, localization_score, animal_model_score, literature_score
- Creates bar chart using seaborn barplot with viridis palette
- X-axis labels cleaned (remove "_score" suffix), rotated 45 degrees
- Labels: x="Evidence Layer", y="Candidates with Evidence", title="Evidence Layer Coverage"
- Saves PNG at 300 DPI, closes figure
- Returns output_path
3. `plot_tier_breakdown(df: pl.DataFrame, output_path: Path) -> Path`:
- Counts genes per confidence_tier
- Creates pie chart with percentage labels (autopct='%1.1f%%')
- Colors match score_distribution palette (green/orange/red for HIGH/MEDIUM/LOW)
- Title: "Candidate Tier Breakdown"
- Saves PNG at 300 DPI, closes figure
- Returns output_path
4. `generate_all_plots(df: pl.DataFrame, output_dir: Path) -> dict[str, Path]`:
- Creates output_dir if it does not already exist
- Calls all 3 plot functions with standard filenames: score_distribution.png, layer_contributions.png, tier_breakdown.png
- Returns dict mapping plot name to file path
- Wraps each plot in try/except so one failure doesn't block others (log warning on failure)
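A minimal sketch of how visualizations.py could be structured, assuming the column names listed above (composite_score, confidence_tier, the six *_score columns). The shared `_save` helper is an implementation choice rather than a requirement, and the empty-DataFrame guards exercised by test 6 are omitted for brevity.

```python
# Hedged sketch of visualizations.py; column names are taken from the plan above,
# empty-DataFrame guards are omitted.
import logging
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; must be set before pyplot is imported
import matplotlib.pyplot as plt
import polars as pl
import seaborn as sns

logger = logging.getLogger(__name__)

TIER_PALETTE = {"HIGH": "#2ecc71", "MEDIUM": "#f39c12", "LOW": "#e74c3c"}
LAYER_COLUMNS = [
    "gnomad_score", "expression_score", "annotation_score",
    "localization_score", "animal_model_score", "literature_score",
]


def _save(fig: plt.Figure, output_path: Path) -> Path:
    """Save at 300 DPI and always close the figure to avoid the memory-leak pitfall."""
    fig.savefig(output_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
    return output_path


def plot_score_distribution(df: pl.DataFrame, output_path: Path) -> Path:
    sns.set_theme(style="whitegrid", context="paper")
    pdf = df.to_pandas()  # small result set, so the conversion overhead is acceptable
    fig, ax = plt.subplots()
    sns.histplot(
        data=pdf, x="composite_score", hue="confidence_tier",
        hue_order=list(TIER_PALETTE), palette=TIER_PALETTE,
        bins=30, multiple="stack", ax=ax,
    )
    ax.set(xlabel="Composite Score", ylabel="Candidate Count",
           title="Score Distribution by Confidence Tier")
    return _save(fig, output_path)


def plot_layer_contributions(df: pl.DataFrame, output_path: Path) -> Path:
    sns.set_theme(style="whitegrid", context="paper")
    layers = [c.removesuffix("_score") for c in LAYER_COLUMNS]
    counts = [df[c].drop_nulls().len() for c in LAYER_COLUMNS]  # non-null values per layer
    fig, ax = plt.subplots()
    sns.barplot(x=layers, y=counts, hue=layers, palette="viridis", legend=False, ax=ax)
    ax.set(xlabel="Evidence Layer", ylabel="Candidates with Evidence",
           title="Evidence Layer Coverage")
    ax.tick_params(axis="x", rotation=45)
    return _save(fig, output_path)


def plot_tier_breakdown(df: pl.DataFrame, output_path: Path) -> Path:
    tier_counts = {t: int((df["confidence_tier"] == t).sum()) for t in TIER_PALETTE}
    fig, ax = plt.subplots()
    ax.pie(list(tier_counts.values()), labels=list(tier_counts),
           colors=list(TIER_PALETTE.values()), autopct="%1.1f%%")
    ax.set_title("Candidate Tier Breakdown")
    return _save(fig, output_path)


def generate_all_plots(df: pl.DataFrame, output_dir: Path) -> dict[str, Path]:
    output_dir.mkdir(parents=True, exist_ok=True)
    registry = {
        "score_distribution": plot_score_distribution,
        "layer_contributions": plot_layer_contributions,
        "tier_breakdown": plot_tier_breakdown,
    }
    results: dict[str, Path] = {}
    for name, func in registry.items():
        try:
            results[name] = func(df, output_dir / f"{name}.png")
        except Exception:  # broad on purpose: one failed plot must not block the others
            logger.warning("Failed to generate %s plot", name, exc_info=True)
    return results
```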
**tests/test_visualizations.py**: Test file creation.
Create a synthetic DataFrame fixture with ~30 rows including confidence_tier and all 6 layer score columns (some NULL); an illustrative fixture sketch follows the test list below.
Tests:
1. test_plot_score_distribution_creates_file: Verify PNG file created and size > 0
2. test_plot_layer_contributions_creates_file: Verify PNG file created
3. test_plot_tier_breakdown_creates_file: Verify PNG file created
4. test_generate_all_plots_creates_all_files: Verify all 3 PNG files exist in output_dir
5. test_generate_all_plots_returns_paths: Verify returned dict has 3 entries
6. test_plots_handle_empty_dataframe: Empty DataFrame produces plots without crashing (edge case)
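One possible shape for the fixture and two of the file-creation tests, using pytest's built-in tmp_path. The gene_id column and the specific values are illustrative; only the column names from the plan matter.

```python
# Illustrative fixture and two file-creation tests for tests/test_visualizations.py.
from pathlib import Path

import polars as pl
import pytest

from usher_pipeline.output.visualizations import generate_all_plots, plot_score_distribution


@pytest.fixture
def tiered_df() -> pl.DataFrame:
    n = 30
    layer_cols = [
        "gnomad_score", "expression_score", "annotation_score",
        "localization_score", "animal_model_score", "literature_score",
    ]
    data: dict[str, list] = {
        "gene_id": [f"ENSG{i:011d}" for i in range(n)],  # illustrative identifier column
        "composite_score": [i / n for i in range(n)],
        "confidence_tier": ["HIGH", "MEDIUM", "LOW"] * (n // 3),
    }
    for col in layer_cols:
        # every fifth value is NULL so the layer-coverage counts are exercised
        data[col] = [None if i % 5 == 0 else i / n for i in range(n)]
    return pl.DataFrame(data)


def test_plot_score_distribution_creates_file(tiered_df: pl.DataFrame, tmp_path: Path) -> None:
    out = plot_score_distribution(tiered_df, tmp_path / "score_distribution.png")
    assert out.exists() and out.stat().st_size > 0


def test_generate_all_plots_creates_all_files(tiered_df: pl.DataFrame, tmp_path: Path) -> None:
    paths = generate_all_plots(tiered_df, tmp_path / "plots")
    assert len(paths) == 3
    assert all(p.exists() for p in paths.values())
```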
</action>
<verify>
Run: `cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_visualizations.py -v`
</verify>
<done>
All 6 visualization tests pass. PNG files are created at 300 DPI. Plots handle edge cases (empty data, all-NULL columns) without crashing. matplotlib figures are properly closed after saving.
</done>
</task>
<task type="auto">
<name>Task 2: Reproducibility report module with JSON and Markdown output</name>
<files>
src/usher_pipeline/output/reproducibility.py
src/usher_pipeline/output/__init__.py
tests/test_reproducibility.py
</files>
<action>
**reproducibility.py**: Create the reproducibility report generation module (a dataclass sketch follows the method list below).
Define `FilteringStep` dataclass:
- step_name: str
- input_count: int
- output_count: int
- criteria: str
Define `ReproducibilityReport` dataclass:
- run_id: str (UUID4)
- timestamp: str (ISO format)
- pipeline_version: str
- parameters: dict (scoring weights, thresholds, etc.)
- data_versions: dict (ensembl_release, gnomad_version, gtex_version, hpa_version)
- software_environment: dict (python version, polars version, duckdb version, etc.)
- filtering_steps: list[FilteringStep]
- validation_metrics: dict (from validation.py output if available)
- tier_statistics: dict (total, high, medium, low counts)
Methods on ReproducibilityReport:
- `to_json(path: Path) -> Path`: Write as indented JSON file
- `to_markdown(path: Path) -> Path`: Write as human-readable Markdown with tables for filtering steps, parameters section, software versions, tier statistics, validation metrics
- `to_dict() -> dict`: Return as plain dict for programmatic access
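A possible shape for the two dataclasses and their serialization methods, following the field lists above. The exact Markdown layout (bullet formatting, table columns, the extra Software Environment section) is illustrative rather than prescribed.

```python
# Hedged sketch of the report dataclasses; field names follow the plan,
# the Markdown layout details are illustrative.
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class FilteringStep:
    step_name: str
    input_count: int
    output_count: int
    criteria: str


@dataclass
class ReproducibilityReport:
    run_id: str
    timestamp: str
    pipeline_version: str
    parameters: dict
    data_versions: dict
    software_environment: dict
    filtering_steps: list[FilteringStep] = field(default_factory=list)
    validation_metrics: dict = field(default_factory=dict)
    tier_statistics: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # asdict() recursively converts the nested FilteringStep entries
        return asdict(self)

    def to_json(self, path: Path) -> Path:
        path.write_text(json.dumps(self.to_dict(), indent=2))
        return path

    def to_markdown(self, path: Path) -> Path:
        def bullets(d: dict) -> list[str]:
            return [f"- **{k}**: {v}" for k, v in d.items()]

        lines = [
            "# Pipeline Reproducibility Report",
            f"Run `{self.run_id}` at {self.timestamp} (pipeline {self.pipeline_version})",
            "",
            "## Parameters", *bullets(self.parameters), "",
            "## Data Versions", *bullets(self.data_versions), "",
            "## Software Environment", *bullets(self.software_environment), "",
            "## Filtering Steps",
            "| Step | Input | Output | Criteria |",
            "| --- | --- | --- | --- |",
            *[
                f"| {s.step_name} | {s.input_count} | {s.output_count} | {s.criteria} |"
                for s in self.filtering_steps
            ],
            "",
            "## Tier Statistics", *bullets(self.tier_statistics),
        ]
        if self.validation_metrics:
            lines += ["", "## Validation Metrics", *bullets(self.validation_metrics)]
        path.write_text("\n".join(lines) + "\n")
        return path
```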
Implement `generate_reproducibility_report(config: PipelineConfig, tiered_df: pl.DataFrame, provenance: ProvenanceTracker, validation_result: dict | None = None) -> ReproducibilityReport` (sketch after this list):
- Extracts parameters from config (scoring weights via config.scoring.model_dump(), data_versions via config.versions.model_dump())
- Computes tier_statistics from tiered_df confidence_tier column
- Builds filtering_steps from provenance.get_steps() -- each recorded step with gene counts
- Captures software versions: sys.version, polars.__version__, duckdb.__version__
- Generates UUID4 run_id
- If validation_result provided, includes median_percentile, top_quartile_fraction, validation_passed
- Returns ReproducibilityReport instance
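A sketch of the builder, continuing from the dataclass sketch above. The structure of the step records returned by provenance.get_steps() and the usher_pipeline.__version__ attribute are assumptions, not confirmed APIs; the config.scoring.model_dump() and config.versions.model_dump() calls follow the plan text.

```python
# Sketch of the report builder, reusing FilteringStep / ReproducibilityReport from
# the dataclass sketch above. The step-record keys from provenance.get_steps() and
# the usher_pipeline.__version__ attribute are assumptions, not confirmed APIs.
import sys
import uuid
from datetime import datetime, timezone

import duckdb
import polars as pl

from usher_pipeline import __version__ as pipeline_version  # assumed package attribute
from usher_pipeline.config.schema import PipelineConfig
from usher_pipeline.persistence.provenance import ProvenanceTracker


def generate_reproducibility_report(
    config: PipelineConfig,
    tiered_df: pl.DataFrame,
    provenance: ProvenanceTracker,
    validation_result: dict | None = None,
) -> ReproducibilityReport:
    tiers = tiered_df["confidence_tier"]
    tier_statistics = {
        "total": tiered_df.height,
        "high": int((tiers == "HIGH").sum()),
        "medium": int((tiers == "MEDIUM").sum()),
        "low": int((tiers == "LOW").sum()),
    }
    filtering_steps = [
        FilteringStep(
            step_name=step["name"],
            input_count=step["input_count"],
            output_count=step["output_count"],
            criteria=step.get("criteria", ""),
        )
        for step in provenance.get_steps()  # assumed to yield dict-like step records
    ]
    return ReproducibilityReport(
        run_id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        pipeline_version=pipeline_version,
        parameters=config.scoring.model_dump(),
        data_versions=config.versions.model_dump(),
        software_environment={
            "python": sys.version,
            "polars": pl.__version__,
            "duckdb": duckdb.__version__,
        },
        filtering_steps=filtering_steps,
        validation_metrics=validation_result or {},
        tier_statistics=tier_statistics,
    )
```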
**Update __init__.py**: Export generate_reproducibility_report, ReproducibilityReport, generate_all_plots, and the individual plot functions from the output package.
**tests/test_reproducibility.py**: Test report content.
Create a mock config, a mock provenance tracker, and a synthetic tiered DataFrame (illustrative fixtures are sketched after the test list below).
Tests:
1. test_generate_report_has_all_fields: Report contains run_id, timestamp, pipeline_version, parameters, data_versions, software_environment, tier_statistics
2. test_report_to_json_parseable: Write JSON, read back with json.load, verify it's valid JSON with expected keys
3. test_report_to_markdown_has_headers: Markdown output contains "# Pipeline Reproducibility Report", "## Parameters", "## Data Versions", "## Filtering Steps", "## Tier Statistics"
4. test_report_tier_statistics_match: tier_statistics.total == tiered_df.height, high + medium + low == total
5. test_report_includes_validation_when_provided: When validation_result dict is passed, report contains validation_metrics section
6. test_report_without_validation: When validation_result is None, report still generates without error
7. test_report_software_versions: software_environment contains python, polars, duckdb keys
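Illustrative fixtures and one of the listed tests. The stub config and tracker shapes are stand-ins for the real PipelineConfig and ProvenanceTracker; the actual tests could construct real objects or richer mocks instead.

```python
# Illustrative fixtures for tests/test_reproducibility.py plus one of the listed
# tests; the stub config/tracker shapes are stand-ins, not the real interfaces.
import json
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import MagicMock

import polars as pl
import pytest

from usher_pipeline.output.reproducibility import generate_reproducibility_report


@pytest.fixture
def tiered_df() -> pl.DataFrame:
    return pl.DataFrame({
        "gene_id": ["ENSG01", "ENSG02", "ENSG03", "ENSG04"],
        "confidence_tier": ["HIGH", "MEDIUM", "LOW", "LOW"],
    })


@pytest.fixture
def mock_config() -> SimpleNamespace:
    scoring, versions = MagicMock(), MagicMock()
    scoring.model_dump.return_value = {"gnomad_weight": 0.25}
    versions.model_dump.return_value = {"ensembl_release": "110"}
    return SimpleNamespace(scoring=scoring, versions=versions)


@pytest.fixture
def mock_provenance() -> MagicMock:
    tracker = MagicMock()
    tracker.get_steps.return_value = [
        {"name": "initial_load", "input_count": 20000,
         "output_count": 18000, "criteria": "protein-coding only"},
    ]
    return tracker


def test_report_tier_statistics_match(tiered_df, mock_config, mock_provenance, tmp_path: Path):
    report = generate_reproducibility_report(mock_config, tiered_df, mock_provenance)
    stats = report.tier_statistics
    assert stats["total"] == tiered_df.height
    assert stats["high"] + stats["medium"] + stats["low"] == stats["total"]
    # round-trip check corresponding to test_report_to_json_parseable
    parsed = json.loads(report.to_json(tmp_path / "report.json").read_text())
    assert "run_id" in parsed
```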
</action>
<verify>
Run: `cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_reproducibility.py -v`
</verify>
<done>
All 7 reproducibility tests pass. Report generates in both JSON and Markdown formats. JSON is valid and parseable. Markdown contains all required sections with proper formatting. Tier statistics are accurate. Validation metrics are optional and handled gracefully.
</done>
</task>
</tasks>
<verification>
- `python -c "from usher_pipeline.output.visualizations import generate_all_plots; print('viz OK')"` succeeds
- `python -c "from usher_pipeline.output.reproducibility import generate_reproducibility_report, ReproducibilityReport; print('report OK')"` succeeds
- `python -m pytest tests/test_visualizations.py tests/test_reproducibility.py -v` -- all tests pass
- matplotlib Agg backend used (no display required)
</verification>
<success_criteria>
- Visualization module produces 3 PNG plots (score distribution, layer contributions, tier breakdown) at 300 DPI
- Reproducibility report module generates both JSON and Markdown formats with parameters, data versions, filtering steps, tier statistics, and optional validation metrics
- All tests pass
- No matplotlib display window opened (Agg backend)
</success_criteria>
<output>
After completion, create `.planning/phases/05-output-cli/05-02-SUMMARY.md`
</output>