| phase | plan | type | wave | depends_on | files_modified | autonomous | must_haves |
|---|---|---|---|---|---|---|---|
| 05-output-cli | 02 | execute | 1 | | | true | |
Purpose: Provides the visual and textual reporting layer that makes pipeline results interpretable for researchers and satisfies reproducibility requirements for scientific pipelines.
Output: src/usher_pipeline/output/visualizations.py, src/usher_pipeline/output/reproducibility.py, and associated tests.
<execution_context> @/Users/gbanyan/.claude/get-shit-done/workflows/execute-plan.md @/Users/gbanyan/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/05-output-cli/05-RESEARCH.md @src/usher_pipeline/config/schema.py @src/usher_pipeline/persistence/provenance.py @src/usher_pipeline/scoring/quality_control.py @src/usher_pipeline/scoring/validation.py

Task 1: Visualization module with matplotlib/seaborn plots

Files: src/usher_pipeline/output/visualizations.py, pyproject.toml, tests/test_visualizations.py

**pyproject.toml**: Add matplotlib and seaborn to the dependencies list:
- "matplotlib>=3.8.0"
- "seaborn>=0.13.0"

**visualizations.py**: Create the visualization module with 3 plot functions and 1 orchestrator.
Use matplotlib backend "Agg" (non-interactive, safe for headless/CLI use): call matplotlib.use("Agg") before importing pyplot.
- plot_score_distribution(df: pl.DataFrame, output_path: Path) -> Path:
  - Converts to pandas via df.to_pandas() (small result set, acceptable overhead per research)
  - Sets seaborn theme: sns.set_theme(style="whitegrid", context="paper")
  - Creates histogram of composite_score colored by confidence_tier
  - Uses sns.histplot(data=pdf, x="composite_score", hue="confidence_tier", hue_order=["HIGH", "MEDIUM", "LOW"], palette={"HIGH": "#2ecc71", "MEDIUM": "#f39c12", "LOW": "#e74c3c"}, bins=30, multiple="stack")
  - Labels: x="Composite Score", y="Candidate Count", title="Score Distribution by Confidence Tier"
  - Saves as PNG at 300 DPI with bbox_inches='tight'
  - CRITICAL: Always call plt.close(fig) after savefig (memory leak pitfall from research)
  - Returns output_path
- plot_layer_contributions(df: pl.DataFrame, output_path: Path) -> Path:
  - Counts non-null values per layer score column: gnomad_score, expression_score, annotation_score, localization_score, animal_model_score, literature_score
  - Creates bar chart using seaborn barplot with viridis palette
  - X-axis labels cleaned (remove "_score" suffix), rotated 45 degrees
  - Labels: x="Evidence Layer", y="Candidates with Evidence", title="Evidence Layer Coverage"
  - Saves PNG at 300 DPI, closes figure
  - Returns output_path
- plot_tier_breakdown(df: pl.DataFrame, output_path: Path) -> Path:
  - Counts genes per confidence_tier
  - Creates pie chart with percentage labels (autopct='%1.1f%%')
  - Colors match the score_distribution palette (green/orange/red for HIGH/MEDIUM/LOW)
  - Title: "Candidate Tier Breakdown"
  - Saves PNG at 300 DPI, closes figure
  - Returns output_path
- generate_all_plots(df: pl.DataFrame, output_dir: Path) -> dict[str, Path]:
  - Creates output_dir if it does not exist
  - Calls all 3 plot functions with standard filenames: score_distribution.png, layer_contributions.png, tier_breakdown.png
  - Returns dict mapping plot name to file path
  - Wraps each plot in try/except so one failure doesn't block the others (log a warning on failure)
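A minimal sketch of how these four functions could hang together, assuming the column names above and standard matplotlib/seaborn/polars APIs (not the final implementation):

```python
# Sketch only: assumes composite_score, confidence_tier, and the six *_score columns exist.
import logging
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; must run before pyplot is imported
import matplotlib.pyplot as plt
import polars as pl
import seaborn as sns

logger = logging.getLogger(__name__)

TIER_ORDER = ["HIGH", "MEDIUM", "LOW"]
TIER_PALETTE = {"HIGH": "#2ecc71", "MEDIUM": "#f39c12", "LOW": "#e74c3c"}
LAYER_COLUMNS = [
    "gnomad_score", "expression_score", "annotation_score",
    "localization_score", "animal_model_score", "literature_score",
]


def _save(fig, output_path: Path) -> Path:
    fig.savefig(output_path, dpi=300, bbox_inches="tight")
    plt.close(fig)  # always close to avoid the memory-leak pitfall
    return output_path


def plot_score_distribution(df: pl.DataFrame, output_path: Path) -> Path:
    sns.set_theme(style="whitegrid", context="paper")
    fig, ax = plt.subplots()
    sns.histplot(
        data=df.to_pandas(), x="composite_score", hue="confidence_tier",
        hue_order=TIER_ORDER, palette=TIER_PALETTE, bins=30, multiple="stack", ax=ax,
    )
    ax.set(xlabel="Composite Score", ylabel="Candidate Count",
           title="Score Distribution by Confidence Tier")
    return _save(fig, output_path)


def plot_layer_contributions(df: pl.DataFrame, output_path: Path) -> Path:
    # count non-null entries per evidence layer, strip the "_score" suffix for labels
    counts = {c.removesuffix("_score"): df[c].drop_nulls().len() for c in LAYER_COLUMNS}
    fig, ax = plt.subplots()
    sns.barplot(x=list(counts), y=list(counts.values()), palette="viridis", ax=ax)
    ax.set(xlabel="Evidence Layer", ylabel="Candidates with Evidence",
           title="Evidence Layer Coverage")
    ax.tick_params(axis="x", rotation=45)
    return _save(fig, output_path)


def plot_tier_breakdown(df: pl.DataFrame, output_path: Path) -> Path:
    counts = {t: df.filter(pl.col("confidence_tier") == t).height for t in TIER_ORDER}
    counts = {t: n for t, n in counts.items() if n > 0}  # drop empty wedges
    fig, ax = plt.subplots()
    ax.pie(list(counts.values()), labels=list(counts), autopct="%1.1f%%",
           colors=[TIER_PALETTE[t] for t in counts])
    ax.set_title("Candidate Tier Breakdown")
    return _save(fig, output_path)


def generate_all_plots(df: pl.DataFrame, output_dir: Path) -> dict[str, Path]:
    output_dir.mkdir(parents=True, exist_ok=True)
    plots = {
        "score_distribution": plot_score_distribution,
        "layer_contributions": plot_layer_contributions,
        "tier_breakdown": plot_tier_breakdown,
    }
    results: dict[str, Path] = {}
    for name, func in plots.items():
        try:
            results[name] = func(df, output_dir / f"{name}.png")
        except Exception:  # one failing plot should not block the others
            logger.warning("Failed to generate %s plot", name, exc_info=True)
    return results
```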
**tests/test_visualizations.py**: Test file creation.
Create synthetic DataFrame fixture with ~30 rows including confidence_tier and all 6 layer score columns (some NULL).
Tests:
- test_plot_score_distribution_creates_file: Verify PNG file created and size > 0
- test_plot_layer_contributions_creates_file: Verify PNG file created
- test_plot_tier_breakdown_creates_file: Verify PNG file created
- test_generate_all_plots_creates_all_files: Verify all 3 PNG files exist in output_dir
- test_generate_all_plots_returns_paths: Verify returned dict has 3 entries
- test_plots_handle_empty_dataframe: Empty DataFrame produces plots without crashing (edge case)
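A sketch of the fixture and two representative tests, assuming pytest's tmp_path fixture; the remaining tests follow the same file-exists pattern and the fixture values are illustrative:

```python
# Sketch only: column values are synthetic, chosen to exercise tiers and NULL handling.
from pathlib import Path

import polars as pl
import pytest

from usher_pipeline.output.visualizations import generate_all_plots, plot_score_distribution

LAYER_COLUMNS = [
    "gnomad_score", "expression_score", "annotation_score",
    "localization_score", "animal_model_score", "literature_score",
]


@pytest.fixture
def tiered_df() -> pl.DataFrame:
    n = 30
    return pl.DataFrame({
        "composite_score": [i / n for i in range(n)],
        "confidence_tier": ["HIGH", "MEDIUM", "LOW"] * 10,
        # every other row NULL to exercise the non-null counting logic
        **{col: [i / n if i % 2 else None for i in range(n)] for col in LAYER_COLUMNS},
    })


def test_plot_score_distribution_creates_file(tiered_df, tmp_path: Path):
    out = plot_score_distribution(tiered_df, tmp_path / "score_distribution.png")
    assert out.exists() and out.stat().st_size > 0


def test_generate_all_plots_creates_all_files(tiered_df, tmp_path: Path):
    results = generate_all_plots(tiered_df, tmp_path / "plots")
    assert len(results) == 3
    assert all(p.exists() for p in results.values())
```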
Run:
cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_visualizations.py -v

Verification: All 6 visualization tests pass. PNG files are created at 300 DPI. Plots handle edge cases (empty data, all-NULL columns) without crashing. matplotlib figures are properly closed after saving.
Task 2: Reproducibility report module

Files: src/usher_pipeline/output/reproducibility.py, src/usher_pipeline/output/__init__.py, tests/test_reproducibility.py

**reproducibility.py**: Define FilteringStep dataclass:
- step_name: str
- input_count: int
- output_count: int
- criteria: str
Define ReproducibilityReport dataclass:
- run_id: str (UUID4)
- timestamp: str (ISO format)
- pipeline_version: str
- parameters: dict (scoring weights, thresholds, etc.)
- data_versions: dict (ensembl_release, gnomad_version, gtex_version, hpa_version)
- software_environment: dict (python version, polars version, duckdb version, etc.)
- filtering_steps: list[FilteringStep]
- validation_metrics: dict (from validation.py output if available)
- tier_statistics: dict (total, high, medium, low counts)
Methods on ReproducibilityReport:
- to_json(path: Path) -> Path: Write as indented JSON file
- to_markdown(path: Path) -> Path: Write as human-readable Markdown with tables for filtering steps, parameters section, software versions, tier statistics, validation metrics
- to_dict() -> dict: Return as plain dict for programmatic access
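A minimal sketch of the two dataclasses and their serialization methods, using stdlib dataclasses and json; the Markdown layout shown is illustrative, only the section headers are fixed by the tests below:

```python
# Sketch only: field names follow the plan above; serialization details are illustrative.
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class FilteringStep:
    step_name: str
    input_count: int
    output_count: int
    criteria: str


@dataclass
class ReproducibilityReport:
    run_id: str
    timestamp: str
    pipeline_version: str
    parameters: dict
    data_versions: dict
    software_environment: dict
    filtering_steps: list[FilteringStep] = field(default_factory=list)
    validation_metrics: dict = field(default_factory=dict)
    tier_statistics: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return asdict(self)  # recursively converts nested FilteringStep entries

    def to_json(self, path: Path) -> Path:
        path.write_text(json.dumps(self.to_dict(), indent=2))
        return path

    def to_markdown(self, path: Path) -> Path:
        lines = ["# Pipeline Reproducibility Report", "", f"Run ID: {self.run_id}", ""]
        lines += ["## Parameters", "", *(f"- {k}: {v}" for k, v in self.parameters.items()), ""]
        lines += ["## Data Versions", "", *(f"- {k}: {v}" for k, v in self.data_versions.items()), ""]
        lines += ["## Filtering Steps", "", "| Step | Input | Output | Criteria |", "|---|---|---|---|"]
        lines += [f"| {s.step_name} | {s.input_count} | {s.output_count} | {s.criteria} |"
                  for s in self.filtering_steps]
        lines += ["", "## Tier Statistics", "", *(f"- {k}: {v}" for k, v in self.tier_statistics.items())]
        path.write_text("\n".join(lines) + "\n")
        return path
```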
Implement generate_reproducibility_report(config: PipelineConfig, tiered_df: pl.DataFrame, provenance: ProvenanceTracker, validation_result: dict | None = None) -> ReproducibilityReport:
- Extracts parameters from config (scoring weights via config.scoring.model_dump(), data_versions via config.versions.model_dump())
- Computes tier_statistics from tiered_df confidence_tier column
- Builds filtering_steps from provenance.get_steps() -- each recorded step with gene counts
- Captures software versions: sys.version, polars.__version__, duckdb.__version__
- Generates UUID4 run_id
- If validation_result provided, includes median_percentile, top_quartile_fraction, validation_passed
- Returns ReproducibilityReport instance
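Continuing the sketch above, one possible shape for the generator. config.scoring.model_dump(), config.versions.model_dump(), and provenance.get_steps() come from the plan; the package name passed to importlib.metadata and the dict keys returned by get_steps() are assumptions:

```python
# Sketch only: relies on the ReproducibilityReport/FilteringStep sketch above.
import sys
import uuid
from datetime import datetime, timezone
from importlib.metadata import version as pkg_version

import duckdb
import polars as pl


def generate_reproducibility_report(
    config, tiered_df: pl.DataFrame, provenance, validation_result: dict | None = None
) -> ReproducibilityReport:
    tiers = tiered_df["confidence_tier"]
    tier_statistics = {
        "total": tiered_df.height,
        "high": (tiers == "HIGH").sum(),
        "medium": (tiers == "MEDIUM").sum(),
        "low": (tiers == "LOW").sum(),
    }
    return ReproducibilityReport(
        run_id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        pipeline_version=pkg_version("usher-pipeline"),  # package name is an assumption
        parameters=config.scoring.model_dump(),
        data_versions=config.versions.model_dump(),
        software_environment={
            "python": sys.version,
            "polars": pl.__version__,
            "duckdb": duckdb.__version__,
        },
        filtering_steps=[
            # the key names in each provenance step dict are an assumption
            FilteringStep(s["step_name"], s["input_count"], s["output_count"], s["criteria"])
            for s in provenance.get_steps()
        ],
        validation_metrics=validation_result or {},
        tier_statistics=tier_statistics,
    )
```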
Update src/usher_pipeline/output/__init__.py: Add generate_reproducibility_report, ReproducibilityReport, generate_all_plots, and the individual plot functions to the exports. Also add the visualizations imports.
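A sketch of the resulting src/usher_pipeline/output/__init__.py (the exact export list may differ):

```python
# src/usher_pipeline/output/__init__.py -- sketch of the re-exports described above
from usher_pipeline.output.reproducibility import (
    ReproducibilityReport,
    generate_reproducibility_report,
)
from usher_pipeline.output.visualizations import (
    generate_all_plots,
    plot_layer_contributions,
    plot_score_distribution,
    plot_tier_breakdown,
)

__all__ = [
    "ReproducibilityReport",
    "generate_all_plots",
    "generate_reproducibility_report",
    "plot_layer_contributions",
    "plot_score_distribution",
    "plot_tier_breakdown",
]
```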
**tests/test_reproducibility.py**: Test report content.
Create mock config, mock provenance tracker, and synthetic tiered DataFrame.
Tests:
- test_generate_report_has_all_fields: Report contains run_id, timestamp, pipeline_version, parameters, data_versions, software_environment, tier_statistics
- test_report_to_json_parseable: Write JSON, read back with json.load, verify it's valid JSON with expected keys
- test_report_to_markdown_has_headers: Markdown output contains "# Pipeline Reproducibility Report", "## Parameters", "## Data Versions", "## Filtering Steps", "## Tier Statistics"
- test_report_tier_statistics_match: tier_statistics.total == tiered_df.height, high + medium + low == total
- test_report_includes_validation_when_provided: When validation_result dict is passed, report contains validation_metrics section
- test_report_without_validation: When validation_result is None, report still generates without error
- test_report_software_versions: software_environment contains python, polars, duckdb keys
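A sketch of the fixtures and two representative tests; the SimpleNamespace stand-ins for PipelineConfig and ProvenanceTracker are assumptions, chosen only to satisfy the interfaces used by generate_reproducibility_report:

```python
# Sketch only: mock attribute names and values are illustrative.
import json
from types import SimpleNamespace

import polars as pl
import pytest

from usher_pipeline.output.reproducibility import generate_reproducibility_report


class _Dumps(SimpleNamespace):
    def model_dump(self) -> dict:
        return dict(vars(self))  # mimic the pydantic model_dump() used by the real config


@pytest.fixture
def mock_config():
    return SimpleNamespace(
        scoring=_Dumps(gnomad_weight=0.3, expression_weight=0.2),
        versions=_Dumps(ensembl_release="110", gnomad_version="4.0"),
    )


@pytest.fixture
def mock_provenance():
    return SimpleNamespace(get_steps=lambda: [
        {"step_name": "qc_filter", "input_count": 200, "output_count": 150, "criteria": "QC pass"},
    ])


@pytest.fixture
def tiered_df() -> pl.DataFrame:
    return pl.DataFrame({"confidence_tier": ["HIGH"] * 5 + ["MEDIUM"] * 10 + ["LOW"] * 15})


def test_report_tier_statistics_match(mock_config, mock_provenance, tiered_df):
    report = generate_reproducibility_report(mock_config, tiered_df, mock_provenance)
    stats = report.tier_statistics
    assert stats["total"] == tiered_df.height
    assert stats["high"] + stats["medium"] + stats["low"] == stats["total"]


def test_report_to_json_parseable(mock_config, mock_provenance, tiered_df, tmp_path):
    report = generate_reproducibility_report(mock_config, tiered_df, mock_provenance)
    data = json.loads(report.to_json(tmp_path / "report.json").read_text())
    assert {"run_id", "parameters", "data_versions", "tier_statistics"} <= data.keys()
```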
Run:
cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_reproducibility.py -v

Verification: All 7 reproducibility tests pass. Report generates in both JSON and Markdown formats. JSON is valid and parseable. Markdown contains all required sections with proper formatting. Tier statistics are accurate. Validation metrics are optional and handled gracefully.
<success_criteria>
- Visualization module produces 3 PNG plots (score distribution, layer contributions, tier breakdown) at 300 DPI
- Reproducibility report module generates both JSON and Markdown formats with parameters, data versions, filtering steps, tier statistics, and optional validation metrics
- All tests pass
- No matplotlib display window opened (Agg backend) </success_criteria>