usher-exploring/src/usher_pipeline/evidence/expression/__init__.py
gbanyan 8aa66987f8 feat(03-06): implement literature evidence models, PubMed fetch, and scoring
- Create LiteratureRecord pydantic model with context-specific counts
- Implement PubMed query via Biopython Entrez with rate limiting (3/sec default, 10/sec with API key)
- Define SEARCH_CONTEXTS for cilia, sensory, cytoskeleton, cell_polarity queries
- Implement evidence tier classification: direct_experimental > functional_mention > hts_hit > incidental > none
- Implement quality-weighted scoring with bias mitigation via log2(total_pubmed_count) normalization
- Add biopython>=1.84 dependency to pyproject.toml
- Support checkpoint-restart for long-running PubMed queries (estimated 3-11 hours for 20K genes)
2026-02-11 19:00:20 +08:00
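The commit message above mentions rate-limited PubMed queries: 3 requests/sec by default and 10/sec with an NCBI API key. The sketch below shows one way to enforce such limits by waiting a minimum interval between calls. It assumes a simple interval strategy, not a token bucket. The class name `RequestThrottle` is hypothetical and is not taken from the pipeline's code.

```python
import time
from typing import Optional


class RequestThrottle:
    """Hypothetical helper: enforce a minimum interval between requests."""

    def __init__(self, api_key: Optional[str] = None) -> None:
        # Rates from the commit message: 3/sec by default, 10/sec with an API key.
        self.rate_per_sec = 10 if api_key else 3
        self.min_interval = 1.0 / self.rate_per_sec
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough to stay under the configured rate."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        # Record when this request is allowed to go out.
        self._last_call = time.monotonic()
```

Call `throttle.wait()` immediately before each Entrez request. A monotonic clock is used so that wall-clock adjustments cannot shorten the interval.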


"""Tissue expression evidence layer for Usher-relevant tissues.

This package retrieves expression data from:

- HPA (Human Protein Atlas): Tissue-level RNA/protein expression
- GTEx: Tissue-level RNA expression across diverse samples
- CellxGene: Single-cell RNA-seq data for specific cell types

Target tissues/cell types:

- Retina, photoreceptor cells (retinal rod, retinal cone)
- Inner ear, hair cells (cochlea, vestibular)
- Cilia-rich tissues (cerebellum, testis, fallopian tube)

Expression enrichment in these tissues is treated as evidence of potential
cilia/Usher involvement.
"""

from usher_pipeline.evidence.expression.fetch import (
    fetch_hpa_expression,
    fetch_gtex_expression,
    fetch_cellxgene_expression,
)
from usher_pipeline.evidence.expression.transform import (
    calculate_tau_specificity,
    compute_expression_score,
    process_expression_evidence,
)
from usher_pipeline.evidence.expression.load import (
    load_to_duckdb,
    query_tissue_enriched,
)
from usher_pipeline.evidence.expression.models import (
    ExpressionRecord,
    EXPRESSION_TABLE_NAME,
    TARGET_TISSUES,
)

__all__ = [
    "fetch_hpa_expression",
    "fetch_gtex_expression",
    "fetch_cellxgene_expression",
    "calculate_tau_specificity",
    "compute_expression_score",
    "process_expression_evidence",
    "load_to_duckdb",
    "query_tissue_enriched",
    "ExpressionRecord",
    "EXPRESSION_TABLE_NAME",
    "TARGET_TISSUES",
]
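The package exports `calculate_tau_specificity`, but this file does not show its signature or preprocessing. The sketch below implements the standard tau tissue-specificity index (Yanai et al., 2005) for a single gene. It is offered only as an illustration of the metric the name suggests. The function name `tau_specificity` and the choice to return NaN for a gene with no expression are assumptions, not the pipeline's actual behavior.

```python
import math
from typing import Sequence


def tau_specificity(expression: Sequence[float]) -> float:
    """Hypothetical helper: tau tissue-specificity index for one gene.

    tau = sum(1 - x_i / max(x)) / (n - 1) over n tissues. It ranges from
    0 (equal expression in every tissue) to 1 (expressed in exactly one).
    Returns NaN when the gene is not expressed in any tissue.
    """
    values = [float(x) for x in expression]
    if len(values) < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    if any(x < 0 for x in values):
        raise ValueError("expression values must be non-negative")

    peak = max(values)
    if peak == 0.0:
        # No expression anywhere, so specificity is undefined.
        return math.nan

    # Normalize each tissue by the peak tissue, then average the shortfalls.
    return sum(1.0 - x / peak for x in values) / (len(values) - 1)
```

For example, expression values of 4, 2 and 0 across three tissues give (0 + 0.5 + 1) / 2 = 0.75. Tau is commonly computed on log-transformed values such as log2(x + 1). Whether `calculate_tau_specificity` applies such a transform is not visible here.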