docs: update gitignore to track report results, add README cross-links

- Revised .gitignore to ignore raw data/cache but track data/report/
  (candidates TSV/Parquet, plots, reproducibility metadata)
- Added zh↔en cross-links between README.md and README.en.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 05:57:27 +08:00
parent 674a9ae845
commit dc36730cb4
11 changed files with 18283 additions and 10 deletions

Binary file not shown.

@@ -0,0 +1,33 @@
generated_at: '2026-02-12T18:42:59.932245+00:00'
output_files:
- candidates.tsv
- candidates.parquet
statistics:
total_candidates: 18116
high_count: 0
medium_count: 2151
low_count: 15965
column_count: 22
column_names:
- gene_id
- gene_symbol
- gnomad_score
- expression_score
- annotation_score
- localization_score
- animal_model_score
- literature_score
- evidence_count
- available_weight
- weighted_sum
- composite_score
- quality_flag
- gnomad_contribution
- expression_contribution
- annotation_contribution
- localization_contribution
- animal_model_contribution
- literature_contribution
- confidence_tier
- supporting_layers
- evidence_gaps

18117
data/report/candidates.tsv Normal file

File diff suppressed because it is too large.

Binary file not shown (image, 112 KiB).

Binary file not shown (image, 80 KiB).

Binary file not shown (image, 88 KiB).

@@ -0,0 +1,57 @@
{
  "run_id": "5f00f9da-e548-4a58-b1b3-028d05c94d32",
  "timestamp": "2026-02-12T18:43:00.223842+00:00",
  "pipeline_version": "0.1.0",
  "parameters": {
    "gnomad": 0.2,
    "expression": 0.2,
    "annotation": 0.15,
    "localization": 0.15,
    "animal_model": 0.15,
    "literature": 0.15
  },
  "data_versions": {
    "ensembl_release": 113,
    "gnomad_version": "v4.1",
    "gtex_version": "v8",
    "hpa_version": "23.0"
  },
  "software_environment": {
    "python": "3.14.3",
    "polars": "1.38.1",
    "duckdb": "1.4.4"
  },
  "filtering_steps": [
    {
      "step_name": "load_scored_genes",
      "input_count": 0,
      "output_count": 0,
      "criteria": ""
    },
    {
      "step_name": "apply_tier_classification",
      "input_count": 0,
      "output_count": 0,
      "criteria": ""
    },
    {
      "step_name": "write_candidate_output",
      "input_count": 0,
      "output_count": 0,
      "criteria": ""
    },
    {
      "step_name": "generate_visualizations",
      "input_count": 0,
      "output_count": 0,
      "criteria": ""
    }
  ],
  "validation_metrics": {},
  "tier_statistics": {
    "total": 18116,
    "high": 0,
    "medium": 2151,
    "low": 15965
  }
}

@@ -0,0 +1,45 @@
# Pipeline Reproducibility Report

**Run ID:** `5f00f9da-e548-4a58-b1b3-028d05c94d32`

**Timestamp:** 2026-02-12T18:43:00.223842+00:00

**Pipeline Version:** 0.1.0

## Parameters

**Scoring Weights:**

- gnomAD: 0.20
- Expression: 0.20
- Annotation: 0.15
- Localization: 0.15
- Animal Model: 0.15
- Literature: 0.15

## Data Versions

- **ensembl_release:** 113
- **gnomad_version:** v4.1
- **gtex_version:** v8
- **hpa_version:** 23.0

## Software Environment

- **python:** 3.14.3
- **polars:** 1.38.1
- **duckdb:** 1.4.4

## Filtering Steps

| Step | Input Count | Output Count | Criteria |
|------|-------------|--------------|----------|
| load_scored_genes | 0 | 0 | |
| apply_tier_classification | 0 | 0 | |
| write_candidate_output | 0 | 0 | |
| generate_visualizations | 0 | 0 | |

## Tier Statistics

- **Total Candidates:** 18116
- **HIGH:** 0
- **MEDIUM:** 2151
- **LOW:** 15965
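
The run metadata above can be sanity-checked with a few lines of standard-library Python. The sketch below is not part of the pipeline: the function names `check_metadata` and `steps_without_counts` are hypothetical, and the embedded JSON is only the subset of the metadata needed for the checks. It verifies that the scoring weights sum to 1.0, that the tier counts add up to the reported total, and lists any filtering steps recorded with zero input and output counts.

```python
import json
import math

# Subset of the run metadata shown above, copied verbatim.
METADATA = json.loads("""
{
  "parameters": {
    "gnomad": 0.2, "expression": 0.2, "annotation": 0.15,
    "localization": 0.15, "animal_model": 0.15, "literature": 0.15
  },
  "filtering_steps": [
    {"step_name": "load_scored_genes", "input_count": 0, "output_count": 0},
    {"step_name": "apply_tier_classification", "input_count": 0, "output_count": 0},
    {"step_name": "write_candidate_output", "input_count": 0, "output_count": 0},
    {"step_name": "generate_visualizations", "input_count": 0, "output_count": 0}
  ],
  "tier_statistics": {"total": 18116, "high": 0, "medium": 2151, "low": 15965}
}
""")


def check_metadata(meta):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []

    # Scoring weights are expected to sum to 1.0 (within float tolerance).
    weight_sum = sum(meta["parameters"].values())
    if not math.isclose(weight_sum, 1.0, abs_tol=1e-9):
        problems.append(f"weights sum to {weight_sum}, expected 1.0")

    # The three tier counts should add up to the reported total.
    tiers = meta["tier_statistics"]
    tier_sum = tiers["high"] + tiers["medium"] + tiers["low"]
    if tier_sum != tiers["total"]:
        problems.append(f"tiers sum to {tier_sum}, total is {tiers['total']}")

    return problems


def steps_without_counts(meta):
    """Names of filtering steps whose input and output counts are both zero."""
    return [
        step["step_name"]
        for step in meta["filtering_steps"]
        if step["input_count"] == 0 and step["output_count"] == 0
    ]


if __name__ == "__main__":
    print(check_metadata(METADATA))
    print(steps_without_counts(METADATA))
```

For the values in this commit, `check_metadata` returns an empty list, while `steps_without_counts` returns all four step names, since every filtering step is recorded with zero counts and an empty criteria string.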