- Create LiteratureRecord pydantic model with context-specific counts
- Implement PubMed query via Biopython Entrez with rate limiting (3/sec default, 10/sec with API key)
- Define SEARCH_CONTEXTS for cilia, sensory, cytoskeleton, cell_polarity queries
- Implement evidence tier classification: direct_experimental > functional_mention > hts_hit > incidental > none
- Implement quality-weighted scoring with bias mitigation via log2(total_pubmed_count) normalization
- Add biopython>=1.84 dependency to pyproject.toml
- Support checkpoint-restart for long-running PubMed queries (estimated 3-11 hours for 20K genes)
49 lines
1.4 KiB
Python
"""Tissue expression evidence layer for Usher-relevant tissues.

This module retrieves expression data from:
- HPA (Human Protein Atlas): Tissue-level RNA/protein expression
- GTEx: Tissue-level RNA expression across diverse samples
- CellxGene: Single-cell RNA-seq data for specific cell types

Target tissues/cell types:
- Retina, photoreceptor cells (retinal rod, retinal cone)
- Inner ear, hair cells (cochlea, vestibular)
- Cilia-rich tissues (cerebellum, testis, fallopian tube)

Expression enrichment in these tissues is evidence for potential cilia/Usher involvement.
"""

from usher_pipeline.evidence.expression.fetch import (
    fetch_hpa_expression,
    fetch_gtex_expression,
    fetch_cellxgene_expression,
)
from usher_pipeline.evidence.expression.transform import (
    calculate_tau_specificity,
    compute_expression_score,
    process_expression_evidence,
)
from usher_pipeline.evidence.expression.load import (
    load_to_duckdb,
    query_tissue_enriched,
)
from usher_pipeline.evidence.expression.models import (
    ExpressionRecord,
    EXPRESSION_TABLE_NAME,
    TARGET_TISSUES,
)

__all__ = [
    "fetch_hpa_expression",
    "fetch_gtex_expression",
    "fetch_cellxgene_expression",
    "calculate_tau_specificity",
    "compute_expression_score",
    "process_expression_evidence",
    "load_to_duckdb",
    "query_tissue_enriched",
    "ExpressionRecord",
    "EXPRESSION_TABLE_NAME",
    "TARGET_TISSUES",
]
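
The module exports `calculate_tau_specificity`, but its implementation is not shown in this file. As a rough illustration of what a tissue-specificity measure of this kind typically computes, here is a minimal sketch of the widely used Tau index, assuming it is defined as the mean of `1 - x_i / max(x)` over tissues with denominator `n - 1`. The function name `tau_index` and the sample values are hypothetical and are not taken from the pipeline; the real function may differ in signature, input format, and normalization.

```python
from typing import Optional, Sequence


def tau_index(expression: Sequence[float]) -> Optional[float]:
    """Tau tissue-specificity index for one gene (illustrative sketch).

    Returns a value between 0.0 (equal expression in every tissue) and
    1.0 (expression confined to a single tissue). Returns None when the
    index is undefined: fewer than two tissues, or no expression at all.
    """
    # Assumption: negative values are treated as zero expression.
    values = [max(float(v), 0.0) for v in expression]
    n = len(values)
    if n < 2:
        return None
    peak = max(values)
    if peak == 0.0:
        return None
    # Sum of (1 - normalized expression), divided by n - 1.
    return sum(1.0 - v / peak for v in values) / (n - 1)


# Hypothetical expression values (e.g. TPM) across four tissues.
uniform = tau_index([10.0, 10.0, 10.0, 10.0])  # same level everywhere
specific = tau_index([0.0, 0.0, 50.0, 0.0])    # one tissue only
mixed = tau_index([5.0, 10.0, 20.0, 5.0])      # partially enriched
```

Under this definition, a gene expressed only in retina or cochlea would score close to 1.0, while a housekeeping gene would score close to 0.0. Returning `None` for undefined cases, rather than 0.0, keeps "no data" distinct from "ubiquitous expression" in this sketch.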