---
phase: 05-output-cli
plan: 02
subsystem: output
tags: [visualization, reproducibility, reporting, matplotlib, seaborn]
completed: 2026-02-11
duration_minutes: 5
dependencies:
  requires:
    - config.schema (PipelineConfig, ScoringWeights, DataSourceVersions)
    - persistence.provenance (ProvenanceTracker)
  provides:
    - visualizations.generate_all_plots
    - reproducibility.generate_reproducibility_report
  affects:
    - output.__init__ (exports visualization and reproducibility functions)
tech_stack:
  added:
    - matplotlib>=3.8.0 (visualization library with Agg backend)
    - seaborn>=0.13.0 (statistical visualization on top of matplotlib)
  patterns:
    - Non-interactive backend (Agg) for headless/CLI safety
    - Proper figure cleanup with plt.close() to prevent memory leaks
    - Graceful degradation (individual plot failures don't block others)
    - Dataclass-based report structure with dual format output (JSON + Markdown)
key_files:
  created:
    - src/usher_pipeline/output/visualizations.py (plot generation functions)
    - src/usher_pipeline/output/reproducibility.py (report generation)
    - tests/test_visualizations.py (6 tests for plot creation)
    - tests/test_reproducibility.py (7 tests for report content)
  modified:
    - pyproject.toml (added matplotlib and seaborn dependencies)
    - src/usher_pipeline/output/__init__.py (exported new functions)
decisions:
  matplotlib_backend: Use Agg (non-interactive) backend for headless/CLI safety
  plot_dpi: 300 DPI for publication-quality output
  tier_colors: GREEN/ORANGE/RED for HIGH/MEDIUM/LOW (consistent across plots)
  error_handling: Wrap each plot in try/except so failures don't block batch generation
  report_formats: JSON for machine-readable + Markdown for human-readable
  validation_optional: Validation metrics are optional in reproducibility report
metrics:
  tasks: 2
  commits: 2
  tests: 13
  files_created: 4
  files_modified: 2
---

Phase 05 Plan 02: Visualization and Reproducibility Reports Summary

One-liner: Matplotlib/seaborn visualizations (score distributions, layer contributions, tier breakdowns) and dual-format reproducibility reports (JSON + Markdown) with parameters, data versions, filtering steps, and validation metrics.

Execution Flow

Task 1: Visualization module with matplotlib/seaborn plots (commit: 150417f)

Created visualization module with 3 plot types + orchestrator:

  1. plot_score_distribution: Histogram of composite scores colored by confidence tier (HIGH=green, MEDIUM=orange, LOW=red), 30 bins, stacked display
  2. plot_layer_contributions: Bar chart showing non-null count per evidence layer (gnomAD, expression, annotation, localization, animal model, literature)
  3. plot_tier_breakdown: Pie chart with percentage labels for tier distribution
  4. generate_all_plots: Orchestrator that creates all 3 plots with error handling (one failure doesn't block others)
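
Structurally, the orchestrator follows the pattern sketched below (a sketch only, assuming it lives in the same module as the three plot functions above; the exact signatures may differ):

```python
# Hypothetical sketch of the graceful-degradation orchestrator in visualizations.py.
import logging
from pathlib import Path

import polars as pl

logger = logging.getLogger(__name__)


def generate_all_plots(tiered_df: pl.DataFrame, output_dir: Path) -> dict[str, Path]:
    """Create every plot; an individual failure is logged, not fatal."""
    plot_funcs = {
        "score_distribution": plot_score_distribution,
        "layer_contributions": plot_layer_contributions,
        "tier_breakdown": plot_tier_breakdown,
    }
    paths: dict[str, Path] = {}
    for name, func in plot_funcs.items():
        out_path = output_dir / f"{name}.png"
        try:
            func(tiered_df, out_path)
            paths[name] = out_path
        except Exception:  # one failed plot must not block the others
            logger.exception("Plot %s failed; continuing with remaining plots", name)
    return paths
```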

Key technical decisions:

  • Use matplotlib Agg backend (non-interactive, headless-safe)
  • Save all plots at 300 DPI for publication quality
  • Always call plt.close(fig) after savefig to prevent memory leaks (critical pitfall from research)
  • Convert polars DataFrame to pandas via to_pandas() for seaborn compatibility (acceptable overhead for small result sets)
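
Taken together, these decisions look roughly like the following sketch of a single plot function (column names, figure size, and styling are illustrative assumptions, not the exact implementation):

```python
# Illustrative sketch combining the Agg backend, 300 DPI output, to_pandas()
# conversion, and explicit figure cleanup.
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: safe on headless/CI machines
import matplotlib.pyplot as plt
import polars as pl
import seaborn as sns


def plot_score_distribution(tiered_df: pl.DataFrame, output_path: Path) -> None:
    pdf = tiered_df.to_pandas()  # seaborn expects pandas; cheap for small result sets
    fig, ax = plt.subplots(figsize=(8, 5))
    sns.histplot(
        data=pdf,
        x="composite_score",
        hue="confidence_tier",
        bins=30,
        multiple="stack",
        palette={"HIGH": "green", "MEDIUM": "orange", "LOW": "red"},
        ax=ax,
    )
    ax.set_title("Composite score distribution by confidence tier")
    fig.savefig(output_path, dpi=300)  # publication-quality output
    plt.close(fig)  # always close to avoid leaking figures in batch runs
```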

Dependencies added:

  • matplotlib>=3.8.0
  • seaborn>=0.13.0

Tests created (6 tests):

  • test_plot_score_distribution_creates_file
  • test_plot_layer_contributions_creates_file
  • test_plot_tier_breakdown_creates_file
  • test_generate_all_plots_creates_all_files
  • test_generate_all_plots_returns_paths
  • test_plots_handle_empty_dataframe (edge case)

All tests pass with only expected warnings for empty DataFrame edge cases.
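
For reference, one of these tests might look roughly like this (a hypothetical sketch assuming a `(df, output_path)` signature and pytest's built-in tmp_path fixture):

```python
# Hypothetical shape of a test in tests/test_visualizations.py; real fixtures may differ.
import polars as pl

from usher_pipeline.output.visualizations import plot_score_distribution


def test_plot_score_distribution_creates_file(tmp_path):
    df = pl.DataFrame(
        {
            "composite_score": [0.9, 0.5, 0.1],
            "confidence_tier": ["HIGH", "MEDIUM", "LOW"],
        }
    )
    out = tmp_path / "score_distribution.png"
    plot_score_distribution(df, out)
    assert out.exists() and out.stat().st_size > 0
```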

Task 2: Reproducibility report module (commit: 5af63ea)

Created reproducibility report generation with dual formats:

  1. FilteringStep dataclass: Captures step_name, input_count, output_count, criteria
  2. ReproducibilityReport dataclass: Contains run_id (UUID4), timestamp, pipeline_version, parameters (scoring weights), data_versions (Ensembl/gnomAD/GTEx/HPA), software_environment (Python/polars/duckdb versions), filtering_steps, validation_metrics (optional), tier_statistics
  3. generate_reproducibility_report: Extracts all metadata from config, provenance, and tiered DataFrame
  4. Report output methods:
    • to_json(): Indented JSON for machine parsing
    • to_markdown(): Tables and headers for human reading
    • to_dict(): Dictionary for programmatic access
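
A condensed sketch of these structures (field types and method bodies here are assumptions inferred from this summary, not the actual code):

```python
# Hypothetical outline of reproducibility.py's report dataclasses.
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class FilteringStep:
    step_name: str
    input_count: int
    output_count: int
    criteria: str


@dataclass
class ReproducibilityReport:
    run_id: str                # UUID4 string
    timestamp: str
    pipeline_version: str
    parameters: dict           # scoring weights
    data_versions: dict        # Ensembl / gnomAD / GTEx / HPA
    software_environment: dict # Python / polars / duckdb versions
    filtering_steps: list[FilteringStep] = field(default_factory=list)
    validation_metrics: dict | None = None
    tier_statistics: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return asdict(self)

    def to_json(self, path: Path) -> None:
        path.write_text(json.dumps(self.to_dict(), indent=2))

    def to_markdown(self, path: Path) -> None:
        # The real method renders tables and headers; this only shows the idea.
        lines = [f"# Reproducibility Report ({self.run_id})", ""]
        for step in self.filtering_steps:
            lines.append(
                f"- {step.step_name}: {step.input_count} -> {step.output_count} ({step.criteria})"
            )
        path.write_text("\n".join(lines))
```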

Key design patterns:

  • NULL-preserving tier counting with polars group_by().agg(pl.len())
  • Optional validation metrics (the report is generated whether or not validation results are provided)
  • Filtering steps extracted from ProvenanceTracker.get_steps()
  • Software versions captured at report generation time (sys.version, pl.__version__, duckdb.__version__)
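
The first and last of these patterns can be sketched as follows (helper names here are illustrative, not the actual function names):

```python
# Sketch of NULL-preserving tier counting and software-version capture.
import sys

import duckdb
import polars as pl


def count_tiers(tiered_df: pl.DataFrame) -> dict:
    # group_by().agg(pl.len()) keeps rows with a NULL confidence_tier as their own group
    counts = tiered_df.group_by("confidence_tier").agg(pl.len().alias("count"))
    return {row["confidence_tier"]: row["count"] for row in counts.to_dicts()}


def capture_software_environment() -> dict:
    # Captured at report-generation time so the report reflects the actual run
    return {
        "python": sys.version,
        "polars": pl.__version__,
        "duckdb": duckdb.__version__,
    }
```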

Tests created (7 tests):

  • test_generate_report_has_all_fields
  • test_report_to_json_parseable
  • test_report_to_markdown_has_headers
  • test_report_tier_statistics_match
  • test_report_includes_validation_when_provided
  • test_report_without_validation
  • test_report_software_versions

All tests pass.

Deviations from Plan

Auto-fixed Issues:

1. [Rule 1 - Bug] Fixed deprecated polars API

  • Found during: Task 2 testing
  • Issue: pl.count() is deprecated in polars 0.20.5+, replaced with pl.len()
  • Fix: Updated pl.count().alias("count") to pl.len().alias("count") in both visualizations.py and reproducibility.py
  • Files modified: src/usher_pipeline/output/visualizations.py, src/usher_pipeline/output/reproducibility.py
  • Commit: Included in 5af63ea

2. [Rule 1 - Bug] Fixed matplotlib/seaborn deprecation warnings

  • Found during: Task 1 testing
  • Issue: seaborn barplot warning about passing palette without hue, and set_xticklabels() warning about fixed ticks
  • Fix: Added hue=labels and legend=False to barplot call, changed ax.set_xticklabels() to plt.setp()
  • Files modified: src/usher_pipeline/output/visualizations.py
  • Commit: Included in 150417f (amended during testing)
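
The warning-free calls look roughly like this (variable names and example data are illustrative):

```python
# Sketch of the seaborn>=0.13-friendly barplot call and tick-label rotation.
import matplotlib.pyplot as plt
import seaborn as sns

layer_names = ["gnomad", "expression", "annotation", "localization", "animal_model", "literature"]
non_null_counts = [412, 390, 365, 120, 88, 301]  # hypothetical counts

fig, ax = plt.subplots()
sns.barplot(
    x=layer_names,
    y=non_null_counts,
    hue=layer_names,   # satisfies the "palette requires hue" rule in seaborn>=0.13
    palette="viridis",
    legend=False,      # hue is only a coloring device, not a real grouping
    ax=ax,
)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")  # replaces ax.set_xticklabels()
plt.close(fig)
```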

3. [Rule 3 - Blocking] __init__.py already updated

  • Found during: Task 2 commit preparation
  • Issue: Discovered that src/usher_pipeline/output/__init__.py was already updated with reproducibility and visualization exports by a parallel process
  • Resolution: No action needed - integration work already completed by plan 05-01
  • Impact: Positive - reduces risk of merge conflicts and ensures consistency

Verification Results

Import verification:

✓ from usher_pipeline.output.visualizations import generate_all_plots - OK
✓ from usher_pipeline.output.reproducibility import generate_reproducibility_report, ReproducibilityReport - OK

Test results:

✓ 6/6 visualization tests pass
✓ 7/7 reproducibility tests pass
✓ Total: 13/13 tests pass

Success criteria met:

  • Visualization module produces 3 PNG plots at 300 DPI
  • Score distribution plot with tier color coding (GREEN/ORANGE/RED)
  • Layer contributions bar chart showing evidence coverage
  • Tier breakdown pie chart with percentages
  • Reproducibility report generates in both JSON and Markdown formats
  • Report contains parameters, data versions, filtering steps, tier statistics
  • Optional validation metrics handled gracefully
  • matplotlib Agg backend used (no display required)
  • All tests pass
  • Proper figure cleanup (plt.close) implemented

Self-Check: PASSED

Created files exist:

✓ FOUND: src/usher_pipeline/output/visualizations.py
✓ FOUND: src/usher_pipeline/output/reproducibility.py
✓ FOUND: tests/test_visualizations.py
✓ FOUND: tests/test_reproducibility.py

Commits exist:

✓ FOUND: 150417f (Task 1 - visualization module)
✓ FOUND: 5af63ea (Task 2 - reproducibility module)

Tests pass:

✓ 13/13 tests pass (6 visualization + 7 reproducibility)

Dependencies installed:

✓ matplotlib>=3.8.0 installed
✓ seaborn>=0.13.0 installed

All verification checks passed successfully.

Integration Notes

For downstream consumers:

  1. Visualization usage:

    from usher_pipeline.output.visualizations import generate_all_plots
    
    plots = generate_all_plots(tiered_df, output_dir)
    # Returns: {"score_distribution": Path, "layer_contributions": Path, "tier_breakdown": Path}
    
  2. Reproducibility report usage:

    from usher_pipeline.output.reproducibility import generate_reproducibility_report
    
    report = generate_reproducibility_report(
        config=config,
        tiered_df=tiered_df,
        provenance=provenance,
        validation_result=validation_dict  # Optional
    )
    
    report.to_json(output_dir / "reproducibility.json")
    report.to_markdown(output_dir / "reproducibility.md")
    
  3. Expected columns in tiered_df:

    • composite_score, confidence_tier
    • gnomad_score, expression_score, annotation_score, localization_score, animal_model_score, literature_score (can be NULL)
  4. Plot output:

    • All plots saved as PNG at 300 DPI
    • Figures properly closed (no memory leaks)
    • Empty DataFrames handled gracefully
  5. Report content:

    • JSON: Machine-readable, parseable with standard json library
    • Markdown: Human-readable with tables, headers, formatted statistics
    • Both contain identical information, just different presentations
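
As a reference for the expected-columns contract in item 3 above, a minimal tiered_df could be constructed like this (a sketch; the real pipeline produces many more columns):

```python
import polars as pl

# Hypothetical minimal frame satisfying the expected-columns contract
tiered_df = pl.DataFrame(
    {
        "composite_score": [0.91, 0.55, 0.12],
        "confidence_tier": ["HIGH", "MEDIUM", "LOW"],
        "gnomad_score": [0.8, 0.4, None],   # layer scores may be NULL
        "expression_score": [0.9, None, 0.1],
        "annotation_score": [0.95, 0.6, 0.2],
        "localization_score": [None, 0.5, None],
        "animal_model_score": [0.7, None, None],
        "literature_score": [0.85, 0.3, 0.05],
    }
)
```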

Next Steps

Plan 05-03 will integrate these modules into the CLI with an output command that:

  • Loads tiered results from DuckDB
  • Generates all plots to output directory
  • Creates reproducibility reports in both formats
  • Provides summary statistics to console

This plan completes the reporting infrastructure. All visualization and documentation generation logic is now available as reusable modules.