feat(03-06): implement literature evidence models, PubMed fetch, and scoring

- Create LiteratureRecord pydantic model with context-specific counts
- Implement PubMed query via Biopython Entrez with rate limiting (3/sec default, 10/sec with API key)
- Define SEARCH_CONTEXTS for cilia, sensory, cytoskeleton, cell_polarity queries
- Implement evidence tier classification: direct_experimental > functional_mention > hts_hit > incidental > none
- Implement quality-weighted scoring with bias mitigation via log2(total_pubmed_count) normalization
- Add biopython>=1.84 dependency to pyproject.toml
- Support checkpoint-restart for long-running PubMed queries (estimated 3-11 hours for 20K genes)
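
The scoring code itself is not part of the diff shown here, so the following is only a minimal sketch of how tier-weighted scoring with log2(total_pubmed_count) normalization could work. The tier names and their order come from the commit message. The numeric weights, the function name `literature_score`, and the `+ 1` inside the logarithm are assumptions made for illustration.

```python
import math

# Hypothetical weights: the commit message gives the tier order only,
# not the numeric values.
TIER_WEIGHTS = {
    "direct_experimental": 1.0,
    "functional_mention": 0.6,
    "hts_hit": 0.3,
    "incidental": 0.1,
    "none": 0.0,
}


def literature_score(tier: str, context_count: int, total_pubmed_count: int) -> float:
    """Weight context-specific hits by tier, then damp by overall publication volume."""
    if context_count <= 0 or total_pubmed_count <= 0:
        return 0.0
    weighted = TIER_WEIGHTS[tier] * context_count
    # Assumption: add 1 so the denominator is at least 1 when a gene has
    # a single publication (log2(1) alone would be 0).
    return weighted / math.log2(total_pubmed_count + 1)
```

Dividing by the log of the total publication count means a heavily studied gene needs proportionally more context-specific hits to reach the same score as a sparsely studied one, which is the bias the commit message describes mitigating.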
48
src/usher_pipeline/evidence/expression/__init__.py
Normal file
@@ -0,0 +1,48 @@
"""Tissue expression evidence layer for Usher-relevant tissues.

This module retrieves expression data from:
- HPA (Human Protein Atlas): Tissue-level RNA/protein expression
- GTEx: Tissue-level RNA expression across diverse samples
- CellxGene: Single-cell RNA-seq data for specific cell types

Target tissues/cell types:
- Retina, photoreceptor cells (retinal rod, retinal cone)
- Inner ear, hair cells (cochlea, vestibular)
- Cilia-rich tissues (cerebellum, testis, fallopian tube)

Expression enrichment in these tissues is evidence for potential cilia/Usher involvement.
"""

from usher_pipeline.evidence.expression.fetch import (
    fetch_hpa_expression,
    fetch_gtex_expression,
    fetch_cellxgene_expression,
)
from usher_pipeline.evidence.expression.transform import (
    calculate_tau_specificity,
    compute_expression_score,
    process_expression_evidence,
)
from usher_pipeline.evidence.expression.load import (
    load_to_duckdb,
    query_tissue_enriched,
)
from usher_pipeline.evidence.expression.models import (
    ExpressionRecord,
    EXPRESSION_TABLE_NAME,
    TARGET_TISSUES,
)

__all__ = [
    "fetch_hpa_expression",
    "fetch_gtex_expression",
    "fetch_cellxgene_expression",
    "calculate_tau_specificity",
    "compute_expression_score",
    "process_expression_evidence",
    "load_to_duckdb",
    "query_tissue_enriched",
    "ExpressionRecord",
    "EXPRESSION_TABLE_NAME",
    "TARGET_TISSUES",
]
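
The module above re-exports `calculate_tau_specificity`, but its body is not included in this diff. For orientation, here is a minimal sketch of the standard Tau tissue-specificity index, assuming one non-negative expression value per tissue. The function name `tau_specificity` and its signature are illustrative and may not match the pipeline's implementation.

```python
from typing import Optional, Sequence


def tau_specificity(expression: Sequence[float]) -> Optional[float]:
    """Tau index: 0 means uniform expression, 1 means a single-tissue gene.

    Returns None when Tau is undefined (fewer than two tissues, or no
    expression in any tissue).
    """
    n = len(expression)
    if n < 2:
        return None
    x_max = max(expression)
    if x_max <= 0:
        return None
    # Sum each tissue's shortfall relative to the peak tissue, scaled by n - 1.
    return sum(1 - x / x_max for x in expression) / (n - 1)
```

A gene expressed only in retina among the target tissues would score near 1, while a housekeeping gene with flat expression would score near 0.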