Codex independent peer review (paper/codex_review_gpt55_v3_18_1.md) audited
empirical claims against scripts/JSON reports rather than rubber-stamping
prior Accept verdicts. Verdict: Minor Revision. This commit addresses every
flagged item.
- Soften mechanism-identification language (Results IV-D.1, Discussion B):
per-signature cosine "fails to reject unimodality" rather than "reflects a
single dominant generative mechanism"; framing tied to joint evidence.
- Replace overly absolute "single stored image" with multi-template phrasing
in Introduction and Methodology III-A.
- Reframe Methodology III-H so practitioner knowledge is non-load-bearing;
evidentiary basis is the paper's own image evidence.
- Fix stale section cross-references after the v3.18 retitling: IV-F.* ->
IV-G.* in 11 locations across methodology and results.
- Fix 0.941 / 0.945 / 0.9407 wording in Methodology III-K to use the
calibration-fold P5 = 0.9407 and the rounded sensitivity cut 0.945.
- Soften "sharp discontinuity" in Results IV-G.3 to "23-28 percentage-point
gap consistent with firm-wide non-hand-signing practice".
- Soften Conclusion's "directly generalizable" with explicit conditions on
analogous anchors and artifact-generation physics.
- Add Appendix B: table-to-script provenance map (15 manuscript tables
mapped to generating scripts and JSON report artifacts).
- New script signature_analysis/28_byte_identity_decomposition.py produces
reproducible artifacts for two previously unverified claims:
(a) 145 / 50 / 180 / 35 Firm A byte-identity decomposition (verified);
(b) cross-firm dual-descriptor convergence -- corrected from the previous
manuscript text "non-Firm-A 11.3% vs Firm A 58.7% (5x)" to the
database-verified "non-Firm-A 42.12% vs Firm A 88.32% (~2.1x)".
- Clarify scripts 19 / 21 docstrings: legacy EER / FRR / Precision / F1
helpers are retained for diagnostic use only and are NOT cited as
biometric performance in the paper. Remove "interview evidence" wording.
- Rebuild Paper_A_IEEE_Access_Draft_v3.docx.
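The ratio correction in item (b) can be checked by hand. A minimal sketch,
using only the percentages quoted above; the variable and function names are
illustrative and are not taken from 28_byte_identity_decomposition.py:

```python
# Percentages quoted in item (b); names here are hypothetical, not from the script.
old_non_firm_a, old_firm_a = 11.3, 58.7      # superseded manuscript text
new_non_firm_a, new_firm_a = 42.12, 88.32    # database-verified values


def ratio(firm_a_pct: float, other_pct: float) -> float:
    """Return how many times larger the Firm A rate is than the comparison rate."""
    return firm_a_pct / other_pct


old_ratio = ratio(old_firm_a, old_non_firm_a)
new_ratio = ratio(new_firm_a, new_non_firm_a)
print(f"old: {old_ratio:.2f}x, corrected: {new_ratio:.2f}x")
# prints "old: 5.19x, corrected: 2.10x"
```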
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Interview evidence from multiple Firm A accountants confirms that MOST
use replication (stamping / firm-level e-signing) but a MINORITY may
still hand-sign. Firm A is therefore a "replication-dominated" population,
not a "pure" one. This framing is consistent with:
- 92.5% of Firm A signatures exceed cosine 0.95 (majority replication)
- The long left tail (~7%) captures the minority hand-signers, not scan
noise or preprocessing artifacts
- Hartigan dip test: Firm A cosine unimodal long-tail (p=0.17)
- Accountant-level GMM: of 180 Firm A accountants, 139 cluster in C1
(high-replication) and 32 in C2 (middle band = minority hand-signers)
Updates docstrings and report text in Scripts 15, 16, 18, 19 to match.
Partner v3's "near-universal non-hand-signing" language corrected.
Script 19 regenerated with the updated text.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements Partner v3's statistical-rigor requirements at both analysis
units, signature level and accountant level:
- Script 15 (Hartigan dip test): formal unimodality test via `diptest`.
Result: Firm A cosine UNIMODAL (unimodality not rejected, p=0.17; read as
a pure non-hand-signed population); full-sample cosine MULTIMODAL
(p<0.001, mix of two regimes); accountant-level aggregates MULTIMODAL on
both cos and dHash.
- Script 16 (Burgstahler-Dichev / McCrary): discretised Z-score transition
detection. Firm A and full-sample cosine transitions at 0.985; dHash
at 2.0.
- Script 17 (Beta mixture EM + logit-GMM): 2- and 3-component Beta
mixtures fitted by EM with a method-of-moments (MoM) M-step, plus a
parallel Gaussian mixture on the logit transform as a White (1982)
robustness check. Beta-3 BIC < Beta-2 BIC at signature level indicates
that the 2-component model is a forced fit -- supporting the pivot to
the accountant-level mixture.
- Script 18 (Accountant-level GMM): rebuilds the 2026-04-16 analysis
that was done inline and not saved. BIC-best K=3, with components
matching the values recalled from that analysis almost exactly:
C1 (cos=0.983, dh=2.41, 20%, Deloitte 139/141), C2 (0.954, 6.99, 51%,
KPMG/PwC/EY), C3 (0.928, 11.17, 28%, small firms). 2-component natural
thresholds: cos=0.9450, dh=8.10.
- Script 19 (Pixel-identity validation): no human annotation needed.
Uses pixel_identical_to_closest (310 sigs) as gold positive and
Firm A as anchor positive. Confirms that 92.51% of Firm A signatures
have cosine > 0.95 (matches the prior 2026-04-08 finding of 92.5%); the
dual rule cos>0.95 AND dhash_indep<=8 captures 89.95% of Firm A.
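The dual rule reported for Script 19 can be sketched as below. This is an
illustration only: the `Signature` record, its field names, and the toy data
are assumptions made for the example, not the schema or data used by the
script. Only the two cut-offs (cos > 0.95, dhash_indep <= 8) come from the
text above.

```python
from dataclasses import dataclass
from typing import Iterable

COS_CUT = 0.95    # cosine must be strictly greater than this
DHASH_CUT = 8     # independent dHash distance must be at most this


@dataclass(frozen=True)
class Signature:
    """Hypothetical record holding the two descriptors of one signature."""
    cosine: float
    dhash_indep: int


def passes_dual_rule(sig: Signature) -> bool:
    """True when both descriptors agree that the signature is replicated."""
    return sig.cosine > COS_CUT and sig.dhash_indep <= DHASH_CUT


def capture_rate(sigs: Iterable[Signature]) -> float:
    """Fraction of signatures that satisfy the dual rule."""
    sigs = list(sigs)
    if not sigs:
        raise ValueError("capture_rate needs at least one signature")
    return sum(passes_dual_rule(s) for s in sigs) / len(sigs)


# Toy data, not real measurements: two of the four records pass.
toy = [Signature(0.99, 2), Signature(0.96, 9),
       Signature(0.94, 3), Signature(0.97, 8)]
print(capture_rate(toy))  # prints 0.5
```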
Python deps added: diptest, scikit-learn (installed into venv).
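For reference, a method-of-moments M-step of the kind Script 17 describes can
be sketched as follows. This is a generic textbook form under stated
assumptions, not the script's actual code: the function name, argument names,
and example values are hypothetical. Given responsibilities for one component,
it matches the weighted mean m and variance v to a Beta(alpha, beta) via
alpha = m * c and beta = (1 - m) * c, with c = m * (1 - m) / v - 1.

```python
from typing import Sequence, Tuple


def beta_mom_m_step(x: Sequence[float],
                    resp: Sequence[float]) -> Tuple[float, float]:
    """Weighted method-of-moments update for one Beta mixture component.

    x    -- observations in the open interval (0, 1)
    resp -- responsibilities of this component for each observation
    """
    total = sum(resp)
    if total <= 0:
        raise ValueError("responsibilities must sum to a positive value")
    mean = sum(r * v for r, v in zip(resp, x)) / total
    var = sum(r * (v - mean) ** 2 for r, v in zip(resp, x)) / total
    # A Beta distribution requires 0 < var < mean * (1 - mean).
    if not 0 < var < mean * (1 - mean):
        raise ValueError("weighted variance is incompatible with a Beta fit")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


# Illustrative cosine-like values and responsibilities (made up for the sketch).
alpha, beta = beta_mom_m_step([0.90, 0.95, 0.98, 0.99],
                              [0.10, 0.40, 0.90, 0.95])
print(f"alpha={alpha:.2f}, beta={beta:.2f}")
```

The moment-matching update avoids the numerical root-finding that a full
maximum-likelihood M-step for Beta parameters would need, at the cost of not
strictly increasing the likelihood at every iteration.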
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>