Major fixes per codex (gpt-5.4) review:
## Structural fixes
- Fixed three-method convergence overclaim: added Script 20 to run KDE
antimode, BD/McCrary, and Beta mixture EM on accountant-level means.
Accountant-level 1D convergence: KDE antimode=0.973, Beta-2=0.979,
LogGMM-2=0.976 (within ~0.006). BD/McCrary finds no transition at
accountant level (consistent with smooth clustering, not sharp
discontinuity).
- Disambiguated Method 1: KDE crossover (between two labeled distributions,
used at signature all-pairs level) vs KDE antimode (single-distribution
local minimum, used at accountant level).
- Addressed Firm A circular validation: Script 21 adds a CPA-level 70/30
held-out fold. Calibration thresholds are derived from the 70% fold only;
held-out rates are reported with Wilson 95% CIs (e.g. cos>0.95
held-out=93.61% [93.21%-93.98%]).
- Fixed 139+32 vs 180: the split is 139/32 of 171 Firm A CPAs with >=10
signatures (9 CPAs excluded for insufficient sample). Reconciled across
intro, results, discussion, conclusion.
- Added document-level classification aggregation rule (worst-case signature
label determines document label).
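
The KDE antimode named in the Method 1 item above (single-distribution local minimum) can be sketched as follows. This is a minimal illustration and not Script 20 itself: the function names, the fixed bandwidth, the grid size, and the toy input values are all assumptions made for the example. It fits a Gaussian KDE on a grid and returns the density minimum between the two highest modes.

```python
import math

def gaussian_kde(values, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]

def kde_antimode(values, bandwidth, grid_size=512):
    """Return the density minimum between the two highest KDE modes.

    Returns None when the estimate has fewer than two interior modes.
    """
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = gaussian_kde(values, bandwidth, grid)
    # Interior local maxima of the estimated density.
    peaks = [
        i for i in range(1, grid_size - 1)
        if dens[i - 1] < dens[i] >= dens[i + 1]
    ]
    if len(peaks) < 2:
        return None
    # Keep the two tallest peaks, ordered by position on the grid.
    left, right = sorted(sorted(peaks, key=lambda i: dens[i], reverse=True)[:2])
    trough = min(range(left, right + 1), key=lambda i: dens[i])
    return grid[trough]

# Hypothetical toy data: two symmetric clusters of accountant-level means.
offsets = [0.002 * k for k in range(-5, 6)]
toy_means = [0.90 + d for d in offsets] + [0.99 + d for d in offsets]
antimode = kde_antimode(toy_means, bandwidth=0.01)
print(antimode)
```

The bandwidth is fixed here only to keep the sketch short; the antimode location depends on it, so a real analysis would need a stated bandwidth rule.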
## Pixel-identity validation strengthened
- Script 21: built ~50,000-pair inter-CPA random negative anchor (replaces
the original n=35 same-CPA low-similarity negative set, which had untenable
Wilson CIs).
- Added Wilson 95% CI for every FAR in Table X.
- Proper EER interpolation (FAR=FRR point) in Table X.
- Softened "conservative recall" claim to "non-generalizable subset"
language per codex feedback (byte-identical positives are a subset, not
a representative positive class).
- Added inter-CPA stats: mean=0.762, P95=0.884, P99=0.913.
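
For reference, the Wilson 95% CI and the interpolated EER (FAR=FRR point) mentioned in this list can be sketched as below. This is an illustrative sketch, not the code in Script 21: the function names are hypothetical, and the counts of zero false accepts and the two-point FAR/FRR curve are made-up inputs. Only the sample sizes n=35 and ~50,000 come from the items above.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.959964 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def interpolate_eer(thresholds, far, frr):
    """Linearly interpolate the point where FAR equals FRR.

    Inputs are parallel lists ordered by threshold. Returns (threshold, eer).
    """
    diffs = [a - r for a, r in zip(far, frr)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return thresholds[i], far[i]
        if d0 * d1 < 0:
            # Fraction of the way from point i to point i+1 where FAR-FRR hits 0.
            t = d0 / (d0 - d1)
            thr = thresholds[i] + t * (thresholds[i + 1] - thresholds[i])
            eer = far[i] + t * (far[i + 1] - far[i])
            return thr, eer
    if diffs[-1] == 0:
        return thresholds[-1], far[-1]
    raise ValueError("FAR and FRR curves do not cross on this grid")

# Hypothetical: zero observed false accepts at two negative-set sizes.
low35, high35 = wilson_interval(0, 35)
low50k, high50k = wilson_interval(0, 50_000)
print(high35, high50k)

# Hypothetical two-point FAR/FRR curve.
eer_threshold, eer = interpolate_eer([0.90, 0.95], [0.10, 0.02], [0.02, 0.06])
print(eer_threshold, eer)
```

With zero observed false accepts, the upper Wilson bound at n=35 is close to 10%, while at 50,000 pairs it falls below 0.01%, which illustrates why the larger negative anchor gives usable FAR intervals.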
## Terminology & sentence-level fixes
- "statistically independent methods" -> "methodologically distinct methods"
throughout (three diagnostics on the same sample are not independent).
- "formal bimodality check" -> "unimodality test" (dip test tests H0 of
unimodality; rejection is consistent with but not a direct test of
bimodality).
- "Firm A near-universally non-hand-signed" -> already corrected to
"replication-dominated" in prior commit; this commit strengthens that
framing with explicit held-out validation.
- "discrete-behavior regimes" -> "clustered accountant-level heterogeneity"
(BD/McCrary non-transition at accountant level rules out sharp discrete
boundaries; the defensible claim is clustered-but-smooth).
- Softened White 1982 quasi-MLE claim (no longer framed as a guarantee).
- Fixed VLM 1.2% FP overclaim (now acknowledges the 1.2% could be VLM FP
or YOLO FN).
- Unified "310 byte-identical signatures" language across Abstract,
Results, Discussion (previously alternated between pairs/signatures).
- Defined min_dhash_independent explicitly in Section III-G.
- Fixed table numbering (held-out Table XI added, classification moved to
XII, ablation to XIII).
- Explained 84,386 vs 85,042 gap (656 docs have only one signature, so no
pairwise stat can be computed).
- Made Table IX explicitly a "consistency check" not "validation"; paired
it with Table XI held-out rates as the genuine external check.
- Defined 0.941 threshold (calibration-fold Firm A cosine P5).
- Computed 0.945 Firm A rate exactly (94.52%) instead of interpolated.
- Fixed Ref [24] Qwen2.5-VL to full IEEE format (arXiv:2502.13923).
## New artifacts
- Script 20: accountant-level three-method threshold analysis
- Script 21: expanded validation (inter-CPA anchor, held-out Firm A 70/30)
- paper/codex_review_gpt54_v3.md: preserved review feedback
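
A CPA-level 70/30 held-out check of the kind Script 21 performs can be sketched as follows. This is a hedged illustration only: the function names, the seed, the record layout, and the toy scores are assumptions, not the contents of Script 21. The key point it shows is that whole CPAs go to one fold, the threshold (here the cosine P5) comes from the calibration fold alone, and the rate is then measured on held-out CPAs.

```python
import math
import random

def split_by_cpa(cpa_ids, calib_frac=0.7, seed=0):
    """Assign whole CPAs to a calibration or held-out fold (no CPA in both)."""
    unique = sorted(set(cpa_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    cut = round(calib_frac * len(unique))
    return set(unique[:cut]), set(unique[cut:])

def percentile(values, q):
    """Linear-interpolation percentile, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def held_out_rate(records, threshold, held_out):
    """Share of held-out signatures whose cosine exceeds the threshold."""
    scores = [cos for cpa, cos in records if cpa in held_out]
    return sum(cos > threshold for cos in scores) / len(scores)

# Hypothetical toy records: (cpa_id, best-match cosine) per signature.
records = [
    (f"cpa{c}", 0.90 + 0.001 * ((7 * c + 3 * s) % 100))
    for c in range(10)
    for s in range(20)
]
calib, held = split_by_cpa([cpa for cpa, _ in records])

# Threshold derived from the calibration fold only (5th percentile).
calib_scores = [cos for cpa, cos in records if cpa in calib]
threshold = percentile(calib_scores, 5)
rate = held_out_rate(records, threshold, held)
print(threshold, rate)
```

Splitting by CPA rather than by signature keeps all of one accountant's signatures in a single fold, so the held-out rate is not inflated by the same signer appearing on both sides.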
Output: Paper_A_IEEE_Access_Draft_v3.docx (391 KB, rebuilt from v3.1
markdown sources).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- export_paper_to_docx.py: build script combining paper_a_*.md sections into docx
- Paper_A_IEEE_TAI_Draft_20260403.docx: intermediate draft before AI review rounds
- Paper_A_IEEE_TAI_Draft_v2.docx: current draft after 3 AI reviews (GPT-5.4, Opus 4.6, Gemini 3 Pro) and Firm A recalibration
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Paper draft includes all sections (Abstract through Conclusion), 36 references,
and supporting scripts. Key methodology: cosine similarity + dHash dual-method
verification with thresholds calibrated against a known-replication firm (Firm A).
Includes:
- 8 section markdown files (paper_a_*.md)
- Ablation study script (ResNet-50 vs VGG-16 vs EfficientNet-B0)
- Recalibrated classification script (84,386 PDFs, 5-tier system)
- Figure generation and Word export scripts
- Citation renumbering script ([1]-[36])
- Signature analysis pipeline (12 steps)
- YOLO extraction scripts
Three rounds of AI review completed (GPT-5.4, Claude Opus 4.6, Gemini 3 Pro).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>