---
phase: 03-core-evidence-layers
plan: 05
type: execute
wave: 1
depends_on: []
files_modified:
  - src/usher_pipeline/evidence/animal_models/__init__.py
  - src/usher_pipeline/evidence/animal_models/models.py
  - src/usher_pipeline/evidence/animal_models/fetch.py
  - src/usher_pipeline/evidence/animal_models/transform.py
  - src/usher_pipeline/evidence/animal_models/load.py
  - tests/test_animal_models.py
  - tests/test_animal_models_integration.py
  - src/usher_pipeline/cli/evidence_cmd.py
autonomous: true

must_haves:
  truths:
    - "Pipeline retrieves gene knockout/perturbation phenotypes from MGI, ZFIN, and IMPC"
    - "Phenotypes are filtered for relevance to sensory function, balance, vision, hearing, and cilia morphology"
    - "Ortholog mapping uses established databases with confidence scoring, handling one-to-many mappings"
  artifacts:
    - path: "src/usher_pipeline/evidence/animal_models/fetch.py"
      provides: "MGI, ZFIN, and IMPC phenotype data retrieval"
      exports: ["fetch_mgi_phenotypes", "fetch_zfin_phenotypes", "fetch_impc_phenotypes", "fetch_ortholog_mapping"]
    - path: "src/usher_pipeline/evidence/animal_models/transform.py"
      provides: "Phenotype relevance filtering and scoring"
      exports: ["filter_sensory_phenotypes", "score_animal_evidence", "process_animal_model_evidence"]
    - path: "src/usher_pipeline/evidence/animal_models/load.py"
      provides: "DuckDB persistence for animal model evidence"
      exports: ["load_to_duckdb"]
    - path: "tests/test_animal_models.py"
      provides: "Unit tests for phenotype filtering and ortholog handling"
  key_links:
    - from: "src/usher_pipeline/evidence/animal_models/fetch.py"
      to: "MGI/ZFIN bulk downloads and IMPC SOLR API"
      via: "httpx with tenacity retry for bulk downloads"
      pattern: "informatics\\.jax\\.org|mousephenotype\\.org|zfin\\.org"
    - from: "src/usher_pipeline/evidence/animal_models/fetch.py"
      to: "HCOP ortholog database"
      via: "HCOP bulk download from HGNC"
      pattern: "genenames.*hcop"
    - from: "src/usher_pipeline/evidence/animal_models/transform.py"
      to: "src/usher_pipeline/evidence/animal_models/fetch.py"
      via: "filters phenotypes and scores animal model evidence"
      pattern: "filter_sensory_phenotypes|score_animal_evidence"
    - from: "src/usher_pipeline/evidence/animal_models/load.py"
      to: "src/usher_pipeline/persistence/duckdb_store.py"
      via: "store.save_dataframe"
      pattern: "save_dataframe.*animal_model_phenotypes"
---

<objective>
Implement the Animal Model Phenotypes evidence layer (ANIM-01/02/03): retrieve knockout/perturbation phenotypes from MGI (mouse), ZFIN (zebrafish), and IMPC, map orthologs with confidence scoring, filter for sensory/cilia relevance, and score.

Purpose: Animal model phenotypes provide functional evidence -- genes whose orthologs, when knocked out or perturbed, cause sensory, balance, hearing, or cilia defects in mouse/zebrafish are strong candidates. Ortholog confidence scoring limits false positives from paralog mis-mapping.
Output: animal_model_phenotypes DuckDB table with per-gene ortholog mapping, phenotype summaries, sensory relevance flags, and normalized animal model score.
</objective>

<execution_context>
@/Users/gbanyan/.claude/get-shit-done/workflows/execute-plan.md
@/Users/gbanyan/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/03-core-evidence-layers/03-RESEARCH.md
@.planning/phases/02-prototype-evidence-layer/02-01-SUMMARY.md
@.planning/phases/02-prototype-evidence-layer/02-02-SUMMARY.md
@src/usher_pipeline/evidence/gnomad/fetch.py
@src/usher_pipeline/evidence/gnomad/load.py
@src/usher_pipeline/cli/evidence_cmd.py
@src/usher_pipeline/persistence/duckdb_store.py
</context>

<tasks>

<task type="auto">
  <name>Task 1: Create animal model evidence data model, fetch (orthologs + phenotypes), and transform</name>
  <files>
    src/usher_pipeline/evidence/animal_models/__init__.py
    src/usher_pipeline/evidence/animal_models/models.py
    src/usher_pipeline/evidence/animal_models/fetch.py
    src/usher_pipeline/evidence/animal_models/transform.py
  </files>
  <action>
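    Create the animal model evidence layer following the established fetch->transform pattern. Re-export the public fetch and transform functions from `__init__.py` so they are importable from `usher_pipeline.evidence.animal_models` (the verify step depends on this).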

    **models.py**: Define AnimalModelRecord pydantic model with fields: gene_id (str), gene_symbol (str), mouse_ortholog (str|None), mouse_ortholog_confidence (str|None -- "HIGH"/"MEDIUM"/"LOW" based on HCOP source count), zebrafish_ortholog (str|None), zebrafish_ortholog_confidence (str|None -- same tiers), has_mouse_phenotype (bool|None), has_zebrafish_phenotype (bool|None), has_impc_phenotype (bool|None), sensory_phenotype_count (int|None -- number of sensory-relevant phenotypes), phenotype_categories (str|None -- semicolon-separated list of matched MP/ZP terms), animal_model_score_normalized (float|None -- 0-1 composite). Define ANIMAL_TABLE_NAME = "animal_model_phenotypes". Define SENSORY_MP_KEYWORDS (keywords matched against Mammalian Phenotype ontology term names): ["hearing", "deaf", "vestibular", "balance", "retina", "photoreceptor", "vision", "blind", "cochlea", "stereocilia", "cilia", "cilium", "flagellum", "situs inversus", "laterality", "hydrocephalus", "kidney cyst", "polycystic"]. Define SENSORY_ZP_KEYWORDS analogously for zebrafish phenotype (ZP) term names.

    **fetch.py**: Four functions:
    1. `fetch_ortholog_mapping(gene_ids: list[str]) -> pl.DataFrame` -- Download HCOP ortholog data from HGNC: https://ftp.ebi.ac.uk/pub/databases/genenames/hcop/human_mouse_hcop_fifteen_column.txt.gz and human_zebrafish equivalent. Parse with polars. Extract columns: human gene ID, ortholog ID, ortholog symbol, support count (number of databases agreeing). Assign confidence: HIGH (8+ sources), MEDIUM (4-7), LOW (1-3). For one-to-many mappings: keep the ortholog with highest support count. Flag genes with multiple orthologs (ortholog_count column). Return DataFrame with gene_id, mouse_ortholog, mouse_ortholog_confidence, zebrafish_ortholog, zebrafish_ortholog_confidence.
    2. `fetch_mgi_phenotypes(mouse_gene_ids: list[str]) -> pl.DataFrame` -- Download the MGI gene-phenotype report from the MGI downloads site: https://www.informatics.jax.org/downloads/reports/MGI_GenePheno.rpt (or HMD_HumanPhenotype.rpt for human orthologs). Parse tab-separated report. Extract: mouse gene ID, allele symbol, MP term name, MP term ID. Return DataFrame with mouse gene and phenotype terms. Use httpx streaming download with retry.
    3. `fetch_zfin_phenotypes(zebrafish_gene_ids: list[str]) -> pl.DataFrame` -- Download ZFIN phenotype data from: https://zfin.org/downloads/phenoGeneCleanData_fish.txt. Parse tab-separated. Extract: zebrafish gene, phenotype terms. Return DataFrame. Use httpx streaming download with retry.
    4. `fetch_impc_phenotypes(mouse_gene_ids: list[str]) -> pl.DataFrame` -- Query IMPC SOLR API: https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=marker_symbol:{gene}&rows=1000. Process in batches (50 genes at a time with rate limiting). Extract: gene symbol, MP term, p-value, effect size. Return DataFrame. Use httpx with tenacity retry, ratelimit (5 req/sec conservative). If IMPC API is unreliable, fall back to bulk download from https://www.mousephenotype.org/data/release.
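
The confidence tiers and one-to-many rule described for `fetch_ortholog_mapping` can be sketched in plain Python as follows. This is only an illustration of the logic, not the planned polars implementation. The row layout (`gene_id`, `ortholog`, `support_count`), the helper names, and the gene identifiers are hypothetical, and it assumes one row per human-gene/ortholog pair.

```python
from collections import defaultdict


def confidence_from_support(support_count: int) -> str:
    """Map the number of agreeing HCOP sources to a confidence tier."""
    if support_count >= 8:
        return "HIGH"
    if support_count >= 4:
        return "MEDIUM"
    return "LOW"


def select_best_orthologs(rows: list[dict]) -> dict[str, dict]:
    """Keep the best-supported ortholog per human gene and record ortholog_count.

    `rows` is a hypothetical, already-parsed view of one HCOP file.
    On ties, the first candidate encountered wins.
    """
    by_gene: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_gene[row["gene_id"]].append(row)

    result = {}
    for gene_id, candidates in by_gene.items():
        best = max(candidates, key=lambda r: r["support_count"])
        result[gene_id] = {
            "ortholog": best["ortholog"],
            "confidence": confidence_from_support(best["support_count"]),
            "ortholog_count": len(candidates),
        }
    return result


# Synthetic rows: HUMAN_B has two candidate orthologs.
rows = [
    {"gene_id": "HUMAN_A", "ortholog": "orthA1", "support_count": 11},
    {"gene_id": "HUMAN_B", "ortholog": "orthB1", "support_count": 2},
    {"gene_id": "HUMAN_B", "ortholog": "orthB2", "support_count": 5},
]
best = select_best_orthologs(rows)
```

Here `best["HUMAN_B"]` keeps `orthB2` with MEDIUM confidence and an `ortholog_count` of 2, so downstream steps can still see that the mapping was ambiguous.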

    **transform.py**: Three functions:
    1. `filter_sensory_phenotypes(phenotype_df: pl.DataFrame, keywords: list[str]) -> pl.DataFrame` -- Filter phenotype terms for sensory/cilia relevance using keyword matching (case-insensitive substring). Return only rows where the MP/ZP term matches any entry in `keywords` (pass SENSORY_MP_KEYWORDS for mouse/IMPC data and SENSORY_ZP_KEYWORDS for zebrafish data). Count matches per gene as sensory_phenotype_count. Concatenate matched terms into the semicolon-separated phenotype_categories string.
    2. `score_animal_evidence(df: pl.DataFrame) -> pl.DataFrame` -- Compute animal_model_score_normalized. Scoring formula: start from base_score = 0 (the value when no phenotypes are found). For each source with sensory phenotypes, add: mouse +0.4 (multiplied by the ortholog confidence weight: HIGH=1.0, MEDIUM=0.7, LOW=0.4), zebrafish +0.3 (same confidence weighting), IMPC +0.3 (independent confirmation bonus). Clamp to [0, 1], then multiply by log2(sensory_phenotype_count + 1) / log2(max_count + 1), where max_count is the largest sensory_phenotype_count across all genes, to reward more phenotypes with diminishing returns. The score is NULL if no ortholog mapping exists.
    3. `process_animal_model_evidence(gene_ids: list[str]) -> pl.DataFrame` -- End-to-end: fetch orthologs -> fetch MGI -> fetch ZFIN -> fetch IMPC -> join on orthologs -> filter sensory -> score -> collect.
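
The keyword filter and per-gene summary described for `filter_sensory_phenotypes` can be sketched in plain Python as below. It illustrates the matching rule only. The row layout (`gene`, `term`), the helper names, and the sample rows are hypothetical, and the keyword list is an abbreviated subset of SENSORY_MP_KEYWORDS.

```python
# Abbreviated subset of the plan's keyword list, for illustration only.
SENSORY_MP_KEYWORDS = ["hearing", "deaf", "vestibular", "cilia"]


def filter_sensory_terms(rows: list[dict], keywords: list[str]) -> list[dict]:
    """Return rows whose phenotype term contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [r for r in rows if any(k in r["term"].lower() for k in lowered)]


def summarize_by_gene(matched_rows: list[dict]) -> dict[str, dict]:
    """Count matches per gene and join the distinct matched terms with semicolons."""
    summary: dict[str, dict] = {}
    for r in matched_rows:
        entry = summary.setdefault(r["gene"], {"count": 0, "terms": []})
        entry["count"] += 1
        if r["term"] not in entry["terms"]:
            entry["terms"].append(r["term"])
    return {
        gene: {
            "sensory_phenotype_count": e["count"],
            "phenotype_categories": ";".join(e["terms"]),
        }
        for gene, e in summary.items()
    }


# Synthetic phenotype rows.
rows = [
    {"gene": "geneA", "term": "hearing loss"},
    {"gene": "geneA", "term": "increased body weight"},
    {"gene": "geneA", "term": "Abnormal Stereocilia morphology"},
    {"gene": "geneB", "term": "increased body weight"},
]
matched = filter_sensory_terms(rows, SENSORY_MP_KEYWORDS)
summary = summarize_by_gene(matched)
```

Because the match is a substring test, "cilia" also matches "stereocilia", which is the intended behavior here. Genes with no matching terms (such as `geneB`) are absent from the summary, so the join back to the gene universe decides whether they become zero or NULL.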

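    Follow established patterns: NULL preservation (no ortholog = NULL, not zero), structlog logging. Handle one-to-many orthologs: report the best-supported ortholog and its confidence, but aggregate phenotypes across all orthologs for that human gene (so the full ortholog list must be retained until the phenotype join).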
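
The scoring formula for `score_animal_evidence`, combined with the NULL-preservation rule, can be sketched for a single gene as below. This is a plain-Python illustration under stated assumptions: the `has_*` flags are taken to mean sensory-relevant phenotypes from that source, the IMPC bonus is not confidence-weighted, and `max_count` is the largest sensory_phenotype_count across the genes being scored. The function name `score_gene` is hypothetical.

```python
import math

CONFIDENCE_WEIGHT = {"HIGH": 1.0, "MEDIUM": 0.7, "LOW": 0.4}


def score_gene(gene: dict, max_count: int):
    """Return the normalized score for one gene, or None when no ortholog is mapped."""
    mouse_conf = gene.get("mouse_ortholog_confidence")
    fish_conf = gene.get("zebrafish_ortholog_confidence")
    if mouse_conf is None and fish_conf is None:
        return None  # NULL preservation: unknown, not zero

    base = 0.0
    if gene.get("has_mouse_phenotype") and mouse_conf is not None:
        base += 0.4 * CONFIDENCE_WEIGHT[mouse_conf]
    if gene.get("has_zebrafish_phenotype") and fish_conf is not None:
        base += 0.3 * CONFIDENCE_WEIGHT[fish_conf]
    if gene.get("has_impc_phenotype"):
        base += 0.3  # assumption: IMPC bonus is not confidence-weighted
    base = min(max(base, 0.0), 1.0)  # clamp to [0, 1]

    count = gene.get("sensory_phenotype_count") or 0
    if max_count <= 0:
        return 0.0  # no gene has sensory phenotypes, so nothing to scale by
    # Log scaling gives diminishing returns for additional phenotypes.
    scale = math.log2(count + 1) / math.log2(max_count + 1)
    return base * scale


high = score_gene(
    {"mouse_ortholog_confidence": "HIGH", "has_mouse_phenotype": True,
     "sensory_phenotype_count": 3},
    max_count=3,
)
low = score_gene(
    {"mouse_ortholog_confidence": "LOW", "has_mouse_phenotype": True,
     "sensory_phenotype_count": 3},
    max_count=3,
)
unmapped = score_gene({"sensory_phenotype_count": None}, max_count=3)
```

With these synthetic inputs the HIGH-confidence gene scores above the LOW-confidence one, and the gene without any ortholog mapping yields `None` rather than 0.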
  </action>
  <verify>
    cd /Users/gbanyan/Project/usher-exploring && python -c "from usher_pipeline.evidence.animal_models import fetch_ortholog_mapping, fetch_mgi_phenotypes, fetch_zfin_phenotypes, fetch_impc_phenotypes, filter_sensory_phenotypes, score_animal_evidence, process_animal_model_evidence; print('imports OK')"
  </verify>
  <done>
    Animal model fetch retrieves orthologs from HCOP and phenotypes from MGI, ZFIN, IMPC. Transform filters for sensory/cilia relevance and scores with ortholog confidence weighting. All functions importable.
  </done>
</task>

<task type="auto">
  <name>Task 2: Create animal model DuckDB loader, CLI command, and tests</name>
  <files>
    src/usher_pipeline/evidence/animal_models/load.py
    src/usher_pipeline/cli/evidence_cmd.py
    tests/test_animal_models.py
    tests/test_animal_models_integration.py
  </files>
  <action>
    **load.py**: Follow gnomad/load.py pattern. Create `load_to_duckdb(df, store, provenance, description)` saving to "animal_model_phenotypes" table. Record provenance: genes with mouse orthologs, genes with zebrafish orthologs, genes with sensory phenotypes, ortholog confidence distribution, mean sensory phenotype count. Create `query_sensory_phenotype_genes(store, min_score=0.3) -> pl.DataFrame` helper.
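
The provenance statistics listed for `load_to_duckdb` can be sketched as a small summary function. This is a plain-Python illustration over a list of row dicts. The function name `summarize_provenance` and the output key names are hypothetical, and the real loader would presumably compute these from the polars DataFrame and pass them to the existing provenance object.

```python
from collections import Counter


def summarize_provenance(records: list[dict]) -> dict:
    """Compute summary statistics for the provenance sidecar (hypothetical keys)."""
    # NULL counts are excluded from the mean rather than treated as zero.
    counts = [
        r["sensory_phenotype_count"]
        for r in records
        if r.get("sensory_phenotype_count") is not None
    ]
    confidences = Counter(
        r["mouse_ortholog_confidence"]
        for r in records
        if r.get("mouse_ortholog_confidence") is not None
    )
    return {
        "genes_with_mouse_ortholog": sum(
            1 for r in records if r.get("mouse_ortholog") is not None
        ),
        "genes_with_zebrafish_ortholog": sum(
            1 for r in records if r.get("zebrafish_ortholog") is not None
        ),
        "genes_with_sensory_phenotypes": sum(1 for c in counts if c > 0),
        "mouse_confidence_distribution": dict(confidences),
        "mean_sensory_phenotype_count": sum(counts) / len(counts) if counts else None,
    }


# Synthetic records: the third gene has no ortholog mapping at all.
records = [
    {"mouse_ortholog": "m1", "mouse_ortholog_confidence": "HIGH",
     "zebrafish_ortholog": "z1", "sensory_phenotype_count": 4},
    {"mouse_ortholog": "m2", "mouse_ortholog_confidence": "LOW",
     "zebrafish_ortholog": None, "sensory_phenotype_count": 0},
    {"mouse_ortholog": None, "mouse_ortholog_confidence": None,
     "zebrafish_ortholog": None, "sensory_phenotype_count": None},
]
stats = summarize_provenance(records)
```

Excluding NULL counts from the mean keeps the statistic consistent with the NULL-preservation rule: unmapped genes do not drag the average toward zero.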

    **evidence_cmd.py**: Add `animal-models` subcommand to evidence command group. Follow gnomad pattern: checkpoint check (has_checkpoint('animal_model_phenotypes')), --force flag, load gene universe for gene_ids, call process_animal_model_evidence, load to DuckDB, save provenance sidecar to data/animal_models/phenotypes.provenance.json. Display summary: ortholog coverage, sensory phenotype counts by organism, top scoring genes.

    **tests/test_animal_models.py**: Unit tests with synthetic data. Mock httpx for all downloads. Test cases:
    - test_ortholog_confidence_high: 8+ supporting sources -> HIGH
    - test_ortholog_confidence_low: 1-3 sources -> LOW
    - test_one_to_many_best_selected: Multiple mouse orthologs -> highest confidence kept
    - test_sensory_keyword_match: "hearing loss" phenotype matches SENSORY_MP_KEYWORDS
    - test_non_sensory_filtered: "increased body weight" phenotype filtered out
    - test_score_with_confidence_weighting: HIGH confidence ortholog scores higher than LOW
    - test_score_null_no_ortholog: Gene without ortholog -> NULL score
    - test_multi_organism_bonus: Phenotypes in both mouse and zebrafish -> higher score
    - test_phenotype_count_scaling: More sensory phenotypes -> higher score (diminishing returns via log)
    - test_impc_integration: IMPC phenotypes contribute to score

    **tests/test_animal_models_integration.py**: Integration tests. Mock HCOP download, MGI/ZFIN/IMPC responses. Test full pipeline, checkpoint-restart, provenance. Synthetic phenotype report fixtures.
  </action>
  <verify>
    cd /Users/gbanyan/Project/usher-exploring && python -m pytest tests/test_animal_models.py tests/test_animal_models_integration.py -v
  </verify>
  <done>
    All animal model unit and integration tests pass. CLI `evidence animal-models` command registered. DuckDB stores animal_model_phenotypes table with ortholog mappings, phenotype summaries, and confidence-weighted scores. Checkpoint-restart works.
  </done>
</task>

</tasks>

<verification>
- `python -m pytest tests/test_animal_models.py tests/test_animal_models_integration.py -v` -- all tests pass
- `python -c "from usher_pipeline.evidence.animal_models import *"` -- all exports importable
- `usher-pipeline evidence animal-models --help` -- CLI help displays
- DuckDB animal_model_phenotypes table includes columns: gene_id, gene_symbol, mouse_ortholog, mouse_ortholog_confidence, sensory_phenotype_count, phenotype_categories, animal_model_score_normalized
</verification>

<success_criteria>
- ANIM-01: Phenotypes retrieved from MGI (mouse), ZFIN (zebrafish), and IMPC via bulk downloads and API
- ANIM-02: Phenotypes filtered for sensory/balance/vision/hearing/cilia relevance via keyword matching
- ANIM-03: Ortholog mapping via HCOP with confidence scoring (HIGH/MEDIUM/LOW), one-to-many handled by taking best confidence
- Pattern compliance: fetch->transform->load->CLI->tests matching evidence layer structure
</success_criteria>

<output>
After completion, create `.planning/phases/03-core-evidence-layers/03-05-SUMMARY.md`
</output>