Paper A v3.12: resolve Gemini 3.1 Pro round-11 full-paper review findings

Round-11 Gemini 3.1 Pro fresh full-paper review (Minor Revision)
surfaced four issues that the prior 10 rounds (codex gpt-5.4 x4, codex
gpt-5.5 x1, Gemini 3.1 Pro x2, Opus 4.7 x1, paragraph-level v3.11
review) all missed:

1. MAJOR - Percentile-terminology contradiction between Section III-L
   L290 and Section III-H L160. III-L called 0.95 the "whole-sample
   Firm A P95" of the per-signature best-match cosine distribution,
   but III-H states 92.5% of Firm A signatures exceed 0.95. Under
   standard bottom-up percentile convention this makes 0.95 the P7.5,
   not the P95; Table XI calibration-fold data (Firm A cosine
   median = 0.9862, P5 = 0.9407) confirms true P95 is near 0.998.
   Fix: rewrote III-L L290 to state 0.95 corresponds to approximately
   the whole-sample Firm A P7.5 with the 92.5%/7.5% complement stated
   explicitly. dHash P95 claims elsewhere (Table XI, L229/L233) were
   already correct under standard convention and are unchanged.

2. MINOR - Firm A CPA count inconsistency. Discussion V-C L44 said
   "Nine additional Firm A CPAs are excluded from the GMM for having
   fewer than 10 signatures" but Results IV-G.2 L216 defines 178
   valid Firm A CPAs (180 registry minus 2 disambiguation-excluded);
   178 - 171 = 7. Fix: corrected to "seven are outside the GMM" with
   explicit 178-baseline and cross-reference to IV-G.2.

3. MINOR - Table XVI mixed-firm handling broken promise. Results
   L355-356 previously said "mixed-firm reports are reported
   separately" but Table XVI only lists single-firm rows summing to
   exactly 83,970, and no subsequent prose reports the 384 mixed-firm
   agreement rate. Fix: rewrote L355-356 to state Table XVI covers
   the 83,970 single-firm reports only and that the 384 mixed-firm
   reports (0.46%) are excluded because firm-level agreement is not
   well defined when the two signers are at different firms.

4. MINOR - Contribution-count structural inconsistency. Introduction
   enumerates seven contributions, Conclusion opens with "Our
   contributions are fourfold." Fix: rewrote the Conclusion lead to
   "The seven numbered contributions listed in Section I can be
   grouped into four broader methodological themes," making the
   grouping explicit.

No re-computation. All tables (IV-XVIII) and Appendix A numbers
unchanged. Abstract unchanged (still 248/250 words).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-24 20:10:20 +08:00
parent d2f8673a67
commit 9b0b8358a2
4 changed files with 5 additions and 5 deletions
+1 -1
View File
@@ -287,7 +287,7 @@ High feature-level similarity without structural corroboration---consistent with
5. **Likely hand-signed:** Cosine below the all-pairs KDE crossover threshold.
We note three conventions about the thresholds.
First, the cosine cutoff $0.95$ is the whole-sample Firm A P95 of the per-signature best-match cosine distribution (chosen for its transparent percentile interpretation in the whole-sample reference distribution), and the cosine crossover $0.837$ is the all-pairs intra/inter KDE crossover; both are derived from whole-sample distributions rather than from the 70% calibration fold, so the classifier inherits its operational cosine cuts from the whole-sample Firm A and all-pairs distributions.
First, the cosine cutoff $0.95$ corresponds to approximately the whole-sample Firm A P7.5 of the per-signature best-match cosine distribution---that is, 92.5% of whole-sample Firm A signatures exceed this cutoff and 7.5% fall at or below it (Section III-H)---chosen as a round-number lower-tail boundary whose complement (92.5% above) has a transparent interpretation in the whole-sample reference distribution; the cosine crossover $0.837$ is the all-pairs intra/inter KDE crossover; both are derived from whole-sample distributions rather than from the 70% calibration fold, so the classifier inherits its operational cosine cuts from the whole-sample Firm A and all-pairs distributions.
Section IV-G.2 reports both calibration-fold and held-out-fold capture rates for this classifier so that fold-level sampling variance is visible.
Second, the dHash cutoffs $\leq 5$ and $> 15$ are chosen from the whole-sample Firm A $\text{dHash}_\text{indep}$ distribution: $\leq 5$ captures the upper tail of the high-similarity mode (whole-sample Firm A median $\text{dHash}_\text{indep} = 2$, P75 $\approx 4$, so $\leq 5$ is the band immediately above median), while $> 15$ marks the regime in which independent-minimum structural similarity is no longer indicative of image reproduction.
Third, the three accountant-level 1D estimators (KDE antimode $0.973$, Beta-2 crossing $0.979$, logit-GMM-2 crossing $0.976$) and the accountant-level 2D GMM marginal ($0.945$) are *not* the operational thresholds of this classifier: they are the *convergent external reference* that supports the choice of signature-level operational cut.