Paper A v3.18.2: address codex GPT-5.5 round-16 Minor-Revision findings

Codex independent peer review (paper/codex_review_gpt55_v3_18_1.md) audited
empirical claims against scripts/JSON reports rather than rubber-stamping
prior Accept verdicts. Verdict: Minor Revision. This commit addresses every
flagged item.

- Soften mechanism-identification language (Results IV-D.1, Discussion B):
per-signature cosine "fails to reject unimodality" rather than "reflects a
single dominant generative mechanism"; framing tied to joint evidence.
- Replace overly absolute "single stored image" with multi-template phrasing
in Introduction and Methodology III-A.
- Reframe Methodology III-H so practitioner knowledge is non-load-bearing;
evidentiary basis is the paper's own image evidence.
- Fix stale section cross-references after the v3.18 retitling: IV-F.* ->
IV-G.* in 11 locations across methodology and results.
- Fix 0.941 / 0.945 / 0.9407 wording in Methodology III-K to use the
calibration-fold P5 = 0.9407 and the rounded sensitivity cut 0.945.
- Soften "sharp discontinuity" in Results IV-G.3 to "23-28 percentage-point
gap consistent with firm-wide non-hand-signing practice".
- Soften Conclusion's "directly generalizable" with explicit conditions on
analogous anchors and artifact-generation physics.
- Add Appendix B: table-to-script provenance map (15 manuscript tables
mapped to generating scripts and JSON report artifacts).
- New script signature_analysis/28_byte_identity_decomposition.py produces
reproducible artifacts for two previously-unverified claims:
(a) 145 / 50 / 180 / 35 Firm A byte-identity decomposition (verified);
(b) cross-firm dual-descriptor convergence -- corrected from the previous
manuscript text "non-Firm-A 11.3% vs Firm A 58.7% (5x)" to the
database-verified "non-Firm-A 42.12% vs Firm A 88.32% (~2.1x)".
- Clarify scripts 19 / 21 docstrings: legacy EER / FRR / Precision / F1
helpers are retained for diagnostic use only and are NOT cited as
biometric performance in the paper. Remove "interview evidence" wording.
- Rebuild Paper_A_IEEE_Access_Draft_v3.docx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
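
The corrected cross-firm ratio in item (b) above can be checked by hand. A minimal arithmetic sketch, using only the four percentages quoted in this message (the variable names are illustrative and do not come from script 28):

```python
# Percentages quoted in the commit message; names are hypothetical.
previous_non_firm_a, previous_firm_a = 11.3, 58.7      # earlier manuscript text
verified_non_firm_a, verified_firm_a = 42.12, 88.32    # database-verified values

previous_ratio = previous_firm_a / previous_non_firm_a
verified_ratio = verified_firm_a / verified_non_firm_a
verified_gap_pp = verified_firm_a - verified_non_firm_a  # percentage points

print(f"previous ratio: {previous_ratio:.2f}x")
print(f"verified ratio: {verified_ratio:.2f}x")
print(f"verified gap:   {verified_gap_pp:.2f} pp")
```

The verified ratio rounds to the "~2.1x" stated above, while the earlier figures correspond to the superseded "5x" claim.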
@@ -2,26 +2,39 @@
 """
 Script 21: Expanded Validation with Larger Negative Anchor + Held-out Firm A
 ============================================================================
-Addresses codex review weaknesses of Script 19's pixel-identity validation:
+Addresses three weaknesses of Script 19's pixel-identity validation:

 (a) Negative anchor of n=35 (cosine<0.70) is too small to give
 meaningful FAR confidence intervals.
-(b) Pixel-identical positive anchor is an easy subset, not
-representative of the broader positive class.
-(c) Firm A is both the calibration anchor and the validation anchor
-(circular).
+(b) Pixel-identical positive anchor is a CONSERVATIVE SUBSET of the
+true non-hand-signed class, not representative of the broader
+positive class. Recall against this subset is therefore a
+lower-bound calibration check, not a generalizable recall
+estimate.
+(c) Firm A is both the calibration anchor and a validation anchor
+(circular). The 70/30 fold split makes within-Firm-A sampling
+variance visible without claiming external validation.

 This script:
 1. Constructs a large inter-CPA negative anchor (~50,000 pairs) by
 randomly sampling pairs from different CPAs. Inter-CPA high
 similarity is highly unlikely to arise from legitimate signing.
 2. Splits Firm A CPAs 70/30 into CALIBRATION and HELDOUT folds.
-Re-derives signature-level / accountant-level thresholds from the
-calibration fold only, then reports all metrics (including Firm A
-anchor rates) on the heldout fold.
-3. Computes proper EER (FAR = FRR interpolated) in addition to
-metrics at canonical thresholds.
-4. Computes 95% Wilson confidence intervals for each FAR/FRR.
+Re-derives signature-level thresholds from the calibration fold
+only, then reports capture rates on the heldout fold.
+3. Computes 95% Wilson confidence intervals for FAR at canonical
+thresholds (Table X in the manuscript).

+Legacy / diagnostic-only metrics:
+Helper functions for EER, Precision, Recall, F1, and FRR remain in
+this script for backward compatibility. The manuscript intentionally
+OMITS these metrics from Table X because the byte-identical positive
+anchor has cosine ~= 1 by construction (so FRR / EER are arithmetic
+tautologies) and because positive and negative anchors are
+constructed from different sampling units, making prevalence
+arbitrary (so Precision and F1 have no meaningful population
+interpretation). Only FAR against the large inter-CPA negative
+anchor is reported as a biometric metric in the paper.
+
 Output:
 reports/expanded_validation/expanded_validation_report.md

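
The docstring motivates the larger negative anchor by the width of the FAR confidence interval. The sketch below shows how much a 95% Wilson interval narrows when the anchor grows from n=35 to roughly 50,000 pairs. It is a minimal illustration, not the code in Script 21: the function name `wilson_interval` is hypothetical, and the false-accept count of 0 is an assumption chosen only to make the two sample sizes comparable.

```python
import math
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z=1.959964 gives 95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Assumed for illustration: zero false accepts observed in each negative anchor.
small_anchor = wilson_interval(0, 35)       # Script 19's n=35 anchor size
large_anchor = wilson_interval(0, 50_000)   # ~50,000 inter-CPA pairs

print(f"n=35:     FAR 95% CI upper bound = {small_anchor[1]:.4f}")
print(f"n=50000:  FAR 95% CI upper bound = {large_anchor[1]:.6f}")
```

With zero observed false accepts, the upper bound depends only on n, which is why the small anchor cannot support a tight FAR claim no matter how clean its results look.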