Paper A v3.8: resolve Gemini 3.1 Pro round-6 independent-review findings

Gemini round-6 (paper/gemini_review_v3_7.md) gave Minor Revision but flagged three issues that five rounds of codex review had missed. This commit addresses all three. BLOCKER: Accountant-level BD/McCrary null is a power artifact, not proof of smoothness (Gemini Issue 1) - At N=686 accountants the BD/McCrary test has limited statistical power; interpreting a failure-to-reject as affirmative proof of smoothness is a Type II error risk. - Discussion V-B: "itself diagnostic of smoothness" replaced with "failure-to-reject rather than a failure of the method --- informative alongside the other evidence but subject to the power caveat in Section V-G". - Discussion V-G (Sixth limitation): added a power-aware paragraph naming N=686 explicitly and clarifying that the substantive claim of smoothly-mixed clustering rests on the JOINT weight of dip test + BIC-selected GMM + BD null, not on BD alone. - Results IV-D.1 and IV-E: reframe accountant-level null as "consistent with --- not affirmative proof of" clustered-but- smoothly-mixed, citing V-G for the power caveat. - Appendix A interpretation paragraph: explicit inferential-asymmetry sentence ("consistency is what the BD null delivers, not affirmative proof"); "itself evidence for" removed. - Conclusion: "consistent with clustered but smoothly mixed" rephrased with explicit power caveat ("at N = 686 the test has limited power and cannot affirmatively establish smoothness"). MAJOR: Table X FRR / EER was tautological reviewer-bait (Gemini Issue 2) - Byte-identical positive anchor has cosine approx 1 by construction, so FRR against that subset is trivially 0 at every threshold below 1 and any EER calculation is arithmetic tautology, not biometric performance. - Results IV-G.1: removed EER row; dropped FRR column from Table X; added a table note explaining the omission and directing readers to Section V-F for the conservative-subset discussion. - Methodology III-K: removed the EER / FRR-against-byte-identical reporting clause; clarified that FAR against inter-CPA negatives is the primary reported quantity. - Table X is now FAR + Wilson 95% CI only, which is the quantity that actually carries empirical content on this anchor design. MINOR: Document-level worst-case aggregation narrative (Gemini Issue 3) + 15-signature delta (Gemini spot-check) - Results IV-I: added two sentences explicitly noting that the document-level percentages reflect the Section III-L worst-case aggregation rule (a report with one stamped + one hand-signed signature inherits the most-replication-consistent label), and cross-referencing Section IV-H.3 / Table XVI for the mixed-report composition that qualifies the headline percentages. - Results IV-D: added a one-sentence footnote explaining that the 15-signature delta between the Table III CPA-matched count (168,755) and the all-pairs analyzed count (168,740) is due to CPAs with exactly one signature, for whom no same-CPA pairwise best-match statistic exists. Abstract remains 243 words, comfortably under the IEEE Access 250-word cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:47:48 +08:00
parent 552b6b80d4
commit fcce58aff0
6 changed files with 157 additions and 25 deletions
@@ -244,7 +244,8 @@ The heldout fold is used exclusively to report post-hoc capture rates with Wilso
 4. **Low-similarity same-CPA anchor (supplementary negative):** signatures whose maximum same-CPA cosine similarity is below 0.70.
 This anchor is retained for continuity with prior work but is small in our dataset ($n = 35$) and is reported only as a supplementary reference; its confidence intervals are too wide for quantitative inference.

-From these anchors we report FAR with Wilson 95% confidence intervals (against the inter-CPA negative anchor) and FRR (against the byte-identical positive anchor), together with the Equal Error Rate (EER) interpolated at the threshold where FAR $=$ FRR, following biometric-verification reporting conventions [3].
+From these anchors we report FAR with Wilson 95% confidence intervals against the inter-CPA negative anchor.
+We do not report an Equal Error Rate or FRR column against the byte-identical positive anchor, because byte-identical pairs have cosine $\approx 1$ by construction and any FRR computed against that subset is trivially $0$ at every threshold below $1$; the conservative-subset role of the byte-identical anchor is instead discussed qualitatively in Section V-F.
 Precision and $F_1$ are not meaningful in this anchor-based evaluation because the positive and negative anchors are constructed from different sampling units (intra-CPA byte-identical pairs vs random inter-CPA pairs), so their relative prevalence in the combined set is an arbitrary construction rather than a population parameter; we therefore omit precision and $F_1$ from Table X.
 The 70/30 held-out Firm A fold of Section IV-G.2 additionally reports capture rates with Wilson 95% confidence intervals computed within the held-out fold, which is a valid population for rate inference.
 We additionally draw a small stratified sample (30 signatures across high-confidence replication, borderline, style-only, pixel-identical, and likely-genuine strata) for manual visual sanity inspection; this sample is used only for spot-check and does not contribute to reported metrics.