| phase | plan | type | wave | depends_on | files_modified | autonomous | must_haves |
|---|---|---|---|---|---|---|---|
| 05-output-cli | 02 | execute | 1 | | | true | |
Purpose: Provides the visual and textual reporting layer that makes pipeline results interpretable for researchers and satisfies reproducibility requirements for scientific pipelines.
Output: src/usher_pipeline/output/visualizations.py, src/usher_pipeline/output/reproducibility.py, and associated tests.
<execution_context> @/Users/gbanyan/.claude/get-shit-done/workflows/execute-plan.md @/Users/gbanyan/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/05-output-cli/05-RESEARCH.md @src/usher_pipeline/config/schema.py @src/usher_pipeline/persistence/provenance.py @src/usher_pipeline/scoring/quality_control.py @src/usher_pipeline/scoring/validation.py

Task 1: Visualization module with matplotlib/seaborn plots

Files: src/usher_pipeline/output/visualizations.py, pyproject.toml, tests/test_visualizations.py

**pyproject.toml**: Add matplotlib and seaborn to the dependencies list:
- "matplotlib>=3.8.0"
- "seaborn>=0.13.0"

**visualizations.py**: Create the visualization module with 3 plot functions and 1 orchestrator.
Use matplotlib backend "Agg" (non-interactive, safe for headless/CLI use): call matplotlib.use("Agg") before importing pyplot.
- plot_score_distribution(df: pl.DataFrame, output_path: Path) -> Path:
  - Converts to pandas via df.to_pandas() (small result set, acceptable overhead per research)
  - Sets seaborn theme: sns.set_theme(style="whitegrid", context="paper")
  - Creates histogram of composite_score colored by confidence_tier
  - Uses sns.histplot(data=pdf, x="composite_score", hue="confidence_tier", hue_order=["HIGH", "MEDIUM", "LOW"], palette={"HIGH": "#2ecc71", "MEDIUM": "#f39c12", "LOW": "#e74c3c"}, bins=30, multiple="stack")
  - Labels: x="Composite Score", y="Candidate Count", title="Score Distribution by Confidence Tier"
  - Saves as PNG at 300 DPI with bbox_inches='tight'
  - CRITICAL: Always call plt.close(fig) after savefig (memory leak pitfall from research)
  - Returns output_path
- plot_layer_contributions(df: pl.DataFrame, output_path: Path) -> Path:
  - Counts non-null values per layer score column: gnomad_score, expression_score, annotation_score, localization_score, animal_model_score, literature_score
  - Creates bar chart using seaborn barplot with viridis palette
  - X-axis labels cleaned (remove "_score" suffix), rotated 45 degrees
  - Labels: x="Evidence Layer", y="Candidates with Evidence", title="Evidence Layer Coverage"
  - Saves PNG at 300 DPI, closes figure
  - Returns output_path
- plot_tier_breakdown(df: pl.DataFrame, output_path: Path) -> Path:
  - Counts genes per confidence_tier
  - Creates pie chart with percentage labels (autopct='%1.1f%%')
  - Colors match the score_distribution palette (green/orange/red for HIGH/MEDIUM/LOW)
  - Title: "Candidate Tier Breakdown"
  - Saves PNG at 300 DPI, closes figure
  - Returns output_path
- generate_all_plots(df: pl.DataFrame, output_dir: Path) -> dict[str, Path]:
  - Creates output_dir if it does not exist
  - Calls all 3 plot functions with standard filenames: score_distribution.png, layer_contributions.png, tier_breakdown.png
  - Returns dict mapping plot name to file path
  - Wraps each plot in try/except so one failure doesn't block the others (log a warning on failure)
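A minimal sketch of how these four functions could hang together, assuming the column names above and standard matplotlib/seaborn/polars APIs (not the final implementation):

```python
# Sketch only: assumes composite_score, confidence_tier, and the six *_score columns exist.
import logging
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; must run before pyplot is imported
import matplotlib.pyplot as plt
import polars as pl
import seaborn as sns

logger = logging.getLogger(__name__)

TIER_ORDER = ["HIGH", "MEDIUM", "LOW"]
TIER_PALETTE = {"HIGH": "#2ecc71", "MEDIUM": "#f39c12", "LOW": "#e74c3c"}
LAYER_COLUMNS = [
    "gnomad_score", "expression_score", "annotation_score",
    "localization_score", "animal_model_score", "literature_score",
]


def _save(fig, output_path: Path) -> Path:
    fig.savefig(output_path, dpi=300, bbox_inches="tight")
    plt.close(fig)  # always close to avoid the memory-leak pitfall
    return output_path


def plot_score_distribution(df: pl.DataFrame, output_path: Path) -> Path:
    sns.set_theme(style="whitegrid", context="paper")
    fig, ax = plt.subplots()
    sns.histplot(
        data=df.to_pandas(), x="composite_score", hue="confidence_tier",
        hue_order=TIER_ORDER, palette=TIER_PALETTE, bins=30, multiple="stack", ax=ax,
    )
    ax.set(xlabel="Composite Score", ylabel="Candidate Count",
           title="Score Distribution by Confidence Tier")
    return _save(fig, output_path)


def plot_layer_contributions(df: pl.DataFrame, output_path: Path) -> Path:
    # count non-null entries per evidence layer, strip the "_score" suffix for labels
    counts = {c.removesuffix("_score"): df[c].drop_nulls().len() for c in LAYER_COLUMNS}
    fig, ax = plt.subplots()
    sns.barplot(x=list(counts), y=list(counts.values()), palette="viridis", ax=ax)
    ax.set(xlabel="Evidence Layer", ylabel="Candidates with Evidence",
           title="Evidence Layer Coverage")
    ax.tick_params(axis="x", rotation=45)
    return _save(fig, output_path)


def plot_tier_breakdown(df: pl.DataFrame, output_path: Path) -> Path:
    counts = {t: df.filter(pl.col("confidence_tier") == t).height for t in TIER_ORDER}
    counts = {t: n for t, n in counts.items() if n > 0}  # drop empty wedges
    fig, ax = plt.subplots()
    ax.pie(list(counts.values()), labels=list(counts), autopct="%1.1f%%",
           colors=[TIER_PALETTE[t] for t in counts])
    ax.set_title("Candidate Tier Breakdown")
    return _save(fig, output_path)


def generate_all_plots(df: pl.DataFrame, output_dir: Path) -> dict[str, Path]:
    output_dir.mkdir(parents=True, exist_ok=True)
    plots = {
        "score_distribution": plot_score_distribution,
        "layer_contributions": plot_layer_contributions,
        "tier_breakdown": plot_tier_breakdown,
    }
    results: dict[str, Path] = {}
    for name, func in plots.items():
        try:
            results[name] = func(df, output_dir / f"{name}.png")
        except Exception:  # one failing plot should not block the others
            logger.warning("Failed to generate %s plot", name, exc_info=True)
    return results
```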
**tests/test_visualizations.py**: Test file creation.
Create synthetic DataFrame fixture with ~30 rows including confidence_tier and all 6 layer score columns (some NULL).
Tests:
- test_plot_score_distribution_creates_file: Verify PNG file created and size > 0
- test_plot_layer_contributions_creates_file: Verify PNG file created
- test_plot_tier_breakdown_creates_file: Verify PNG file created
- test_generate_all_plots_creates_all_files: Verify all 3 PNG files exist in output_dir
- test_generate_all_plots_returns_paths: Verify returned dict has 3 entries
- test_plots_handle_empty_dataframe: Empty DataFrame produces plots without crashing (edge case)
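A sketch of the fixture and two representative tests, assuming pytest's tmp_path fixture; the remaining tests follow the same file-exists pattern and the fixture values are illustrative:

```python
# Sketch only: column values are synthetic, chosen to exercise tiers and NULL handling.
from pathlib import Path

import polars as pl
import pytest

from usher_pipeline.output.visualizations import generate_all_plots, plot_score_distribution

LAYER_COLUMNS = [
    "gnomad_score", "expression_score", "annotation_score",
    "localization_score", "animal_model_score", "literature_score",
]


@pytest.fixture
def tiered_df() -> pl.DataFrame:
    n = 30
    return pl.DataFrame({
        "composite_score": [i / n for i in range(n)],
        "confidence_tier": ["HIGH", "MEDIUM", "LOW"] * 10,
        # every other row NULL to exercise the non-null counting logic
        **{col: [i / n if i % 2 else None for i in range(n)] for col in LAYER_COLUMNS},
    })


def test_plot_score_distribution_creates_file(tiered_df, tmp_path: Path):
    out = plot_score_distribution(tiered_df, tmp_path / "score_distribution.png")
    assert out.exists() and out.stat().st_size > 0


def test_generate_all_plots_creates_all_files(tiered_df, tmp_path: Path):
    results = generate_all_plots(tiered_df, tmp_path / "plots")
    assert len(results) == 3
    assert all(p.exists() for p in results.values())
```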
Run:
cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_visualizations.py -v

Verification: All 6 visualization tests pass. PNG files are created at 300 DPI. Plots handle edge cases (empty data, all-NULL columns) without crashing. matplotlib figures are properly closed after saving.
Task 2: Reproducibility report module

Files: src/usher_pipeline/output/reproducibility.py, src/usher_pipeline/output/__init__.py, tests/test_reproducibility.py

**reproducibility.py**: Define FilteringStep dataclass:
- step_name: str
- input_count: int
- output_count: int
- criteria: str
Define ReproducibilityReport dataclass:
- run_id: str (UUID4)
- timestamp: str (ISO format)
- pipeline_version: str
- parameters: dict (scoring weights, thresholds, etc.)
- data_versions: dict (ensembl_release, gnomad_version, gtex_version, hpa_version)
- software_environment: dict (python version, polars version, duckdb version, etc.)
- filtering_steps: list[FilteringStep]
- validation_metrics: dict (from validation.py output if available)
- tier_statistics: dict (total, high, medium, low counts)
Methods on ReproducibilityReport:
- to_json(path: Path) -> Path: Write as indented JSON file
- to_markdown(path: Path) -> Path: Write as human-readable Markdown with tables for filtering steps, parameters section, software versions, tier statistics, validation metrics
- to_dict() -> dict: Return as plain dict for programmatic access
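A minimal sketch of the two dataclasses and their serialization methods, using stdlib dataclasses and json; the Markdown layout shown is illustrative, only the section headers are fixed by the tests below:

```python
# Sketch only: field names follow the plan above; serialization details are illustrative.
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class FilteringStep:
    step_name: str
    input_count: int
    output_count: int
    criteria: str


@dataclass
class ReproducibilityReport:
    run_id: str
    timestamp: str
    pipeline_version: str
    parameters: dict
    data_versions: dict
    software_environment: dict
    filtering_steps: list[FilteringStep] = field(default_factory=list)
    validation_metrics: dict = field(default_factory=dict)
    tier_statistics: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return asdict(self)  # recursively converts nested FilteringStep entries

    def to_json(self, path: Path) -> Path:
        path.write_text(json.dumps(self.to_dict(), indent=2))
        return path

    def to_markdown(self, path: Path) -> Path:
        lines = ["# Pipeline Reproducibility Report", "", f"Run ID: {self.run_id}", ""]
        lines += ["## Parameters", "", *(f"- {k}: {v}" for k, v in self.parameters.items()), ""]
        lines += ["## Data Versions", "", *(f"- {k}: {v}" for k, v in self.data_versions.items()), ""]
        lines += ["## Filtering Steps", "", "| Step | Input | Output | Criteria |", "|---|---|---|---|"]
        lines += [f"| {s.step_name} | {s.input_count} | {s.output_count} | {s.criteria} |"
                  for s in self.filtering_steps]
        lines += ["", "## Tier Statistics", "", *(f"- {k}: {v}" for k, v in self.tier_statistics.items())]
        path.write_text("\n".join(lines) + "\n")
        return path
```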
Implement generate_reproducibility_report(config: PipelineConfig, tiered_df: pl.DataFrame, provenance: ProvenanceTracker, validation_result: dict | None = None) -> ReproducibilityReport:
- Extracts parameters from config (scoring weights via config.scoring.model_dump(), data_versions via config.versions.model_dump())
- Computes tier_statistics from tiered_df confidence_tier column
- Builds filtering_steps from provenance.get_steps() -- each recorded step with gene counts
- Captures software versions: sys.version, polars.__version__, duckdb.__version__
- Generates UUID4 run_id
- If validation_result provided, includes median_percentile, top_quartile_fraction, validation_passed
- Returns ReproducibilityReport instance
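Continuing the sketch above, one possible shape for the generator. config.scoring.model_dump(), config.versions.model_dump(), and provenance.get_steps() come from the plan; the package name passed to importlib.metadata and the dict keys returned by get_steps() are assumptions:

```python
# Sketch only: relies on the ReproducibilityReport/FilteringStep sketch above.
import sys
import uuid
from datetime import datetime, timezone
from importlib.metadata import version as pkg_version

import duckdb
import polars as pl


def generate_reproducibility_report(
    config, tiered_df: pl.DataFrame, provenance, validation_result: dict | None = None
) -> ReproducibilityReport:
    tiers = tiered_df["confidence_tier"]
    tier_statistics = {
        "total": tiered_df.height,
        "high": (tiers == "HIGH").sum(),
        "medium": (tiers == "MEDIUM").sum(),
        "low": (tiers == "LOW").sum(),
    }
    return ReproducibilityReport(
        run_id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        pipeline_version=pkg_version("usher-pipeline"),  # package name is an assumption
        parameters=config.scoring.model_dump(),
        data_versions=config.versions.model_dump(),
        software_environment={
            "python": sys.version,
            "polars": pl.__version__,
            "duckdb": duckdb.__version__,
        },
        filtering_steps=[
            # the key names in each provenance step dict are an assumption
            FilteringStep(s["step_name"], s["input_count"], s["output_count"], s["criteria"])
            for s in provenance.get_steps()
        ],
        validation_metrics=validation_result or {},
        tier_statistics=tier_statistics,
    )
```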
Update src/usher_pipeline/output/__init__.py: Add generate_reproducibility_report, ReproducibilityReport, generate_all_plots, and the individual plot functions to the exports. Also add the visualizations imports.
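A sketch of the resulting src/usher_pipeline/output/__init__.py (the exact export list may differ):

```python
# src/usher_pipeline/output/__init__.py -- sketch of the re-exports described above
from usher_pipeline.output.reproducibility import (
    ReproducibilityReport,
    generate_reproducibility_report,
)
from usher_pipeline.output.visualizations import (
    generate_all_plots,
    plot_layer_contributions,
    plot_score_distribution,
    plot_tier_breakdown,
)

__all__ = [
    "ReproducibilityReport",
    "generate_all_plots",
    "generate_reproducibility_report",
    "plot_layer_contributions",
    "plot_score_distribution",
    "plot_tier_breakdown",
]
```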
**tests/test_reproducibility.py**: Test report content.
Create mock config, mock provenance tracker, and synthetic tiered DataFrame.
Tests:
- test_generate_report_has_all_fields: Report contains run_id, timestamp, pipeline_version, parameters, data_versions, software_environment, tier_statistics
- test_report_to_json_parseable: Write JSON, read back with json.load, verify it's valid JSON with expected keys
- test_report_to_markdown_has_headers: Markdown output contains "# Pipeline Reproducibility Report", "## Parameters", "## Data Versions", "## Filtering Steps", "## Tier Statistics"
- test_report_tier_statistics_match: tier_statistics.total == tiered_df.height, high + medium + low == total
- test_report_includes_validation_when_provided: When validation_result dict is passed, report contains validation_metrics section
- test_report_without_validation: When validation_result is None, report still generates without error
- test_report_software_versions: software_environment contains python, polars, duckdb keys
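A sketch of the fixtures and two representative tests; the SimpleNamespace stand-ins for PipelineConfig and ProvenanceTracker are assumptions, chosen only to satisfy the interfaces used by generate_reproducibility_report:

```python
# Sketch only: mock attribute names and values are illustrative.
import json
from types import SimpleNamespace

import polars as pl
import pytest

from usher_pipeline.output.reproducibility import generate_reproducibility_report


class _Dumps(SimpleNamespace):
    def model_dump(self) -> dict:
        return dict(vars(self))  # mimic the pydantic model_dump() used by the real config


@pytest.fixture
def mock_config():
    return SimpleNamespace(
        scoring=_Dumps(gnomad_weight=0.3, expression_weight=0.2),
        versions=_Dumps(ensembl_release="110", gnomad_version="4.0"),
    )


@pytest.fixture
def mock_provenance():
    return SimpleNamespace(get_steps=lambda: [
        {"step_name": "qc_filter", "input_count": 200, "output_count": 150, "criteria": "QC pass"},
    ])


@pytest.fixture
def tiered_df() -> pl.DataFrame:
    return pl.DataFrame({"confidence_tier": ["HIGH"] * 5 + ["MEDIUM"] * 10 + ["LOW"] * 15})


def test_report_tier_statistics_match(mock_config, mock_provenance, tiered_df):
    report = generate_reproducibility_report(mock_config, tiered_df, mock_provenance)
    stats = report.tier_statistics
    assert stats["total"] == tiered_df.height
    assert stats["high"] + stats["medium"] + stats["low"] == stats["total"]


def test_report_to_json_parseable(mock_config, mock_provenance, tiered_df, tmp_path):
    report = generate_reproducibility_report(mock_config, tiered_df, mock_provenance)
    data = json.loads(report.to_json(tmp_path / "report.json").read_text())
    assert {"run_id", "parameters", "data_versions", "tier_statistics"} <= data.keys()
```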
Run:
cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_reproducibility.py -v

Verification: All 7 reproducibility tests pass. Report generates in both JSON and Markdown formats. JSON is valid and parseable. Markdown contains all required sections with proper formatting. Tier statistics are accurate. Validation metrics are optional and handled gracefully.
<success_criteria>
- Visualization module produces 3 PNG plots (score distribution, layer contributions, tier breakdown) at 300 DPI
- Reproducibility report module generates both JSON and Markdown formats with parameters, data versions, filtering steps, tier statistics, and optional validation metrics
- All tests pass
- No matplotlib display window opened (Agg backend) </success_criteria>