From 1dfbc5f00032b81cb95fad21f0d665bcdc1e069d Mon Sep 17 00:00:00 2001 From: gbanyan Date: Sat, 25 Apr 2026 01:01:58 +0800 Subject: [PATCH] Paper A v3.15: resolve Gemini 3.1 Pro round-15 Accept-verdict minor polish Gemini 3.1 Pro round-15 full-paper review of v3.14 returned Accept with four MINOR polish suggestions. All four applied in this commit. 1. Table XIII column header: "mean cosine" renamed to "mean best-match cosine" to match the underlying metric (per- signature best-match over the full same-CPA pool) and prevent readers from inferring a simpler per-year statistic. 2. Methodology III-L (L284): added a forward-pointer in the first threshold-convention note to Section IV-G.3, explicitly confirming that replacing the 0.95 round-number heuristic with the nearby accountant-level 2D-GMM marginal crossing 0.945 alters aggregate firm-level capture rates by at most ~1.2 percentage points. This pre-empts a reader who might worry about the methodological tension between the heuristic and the mixture-derived convergence band. 3. Results IV-I document-level aggregation (L383): "Document-level rates therefore bound the share..." rewritten as "represent the share..." Gemini correctly noted that worst-case aggregation directly assigns (subject to classifier error), so "bound" spuriously implies an inequality not actually present. 4. Results IV-G.4 Sanity Sample (L273): "inter-rater agreement with the classifier" rewritten as "full human--classifier agreement (30/30)". Inter-rater conventionally refers to human-vs-human agreement; human-vs-classifier is the correct term here. No substantive changes; no tables recomputed. Gemini round-15 verdict was Accept with these four items framed as nice-to-have rather than blockers; applying them brings v3.15 to a fully polished state before manual DOCX packaging. Co-Authored-By: Claude Opus 4.7 (1M context) --- paper/paper_a_methodology_v3.md | 1 + paper/paper_a_results_v3.md | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/paper/paper_a_methodology_v3.md b/paper/paper_a_methodology_v3.md index e4938b8..7c0ac02 100644 --- a/paper/paper_a_methodology_v3.md +++ b/paper/paper_a_methodology_v3.md @@ -282,6 +282,7 @@ High feature-level similarity without structural corroboration---consistent with We note three conventions about the thresholds. First, the cosine cutoff $0.95$ corresponds to approximately the whole-sample Firm A P7.5 of the per-signature best-match cosine distribution---that is, 92.5% of whole-sample Firm A signatures exceed this cutoff and 7.5% fall at or below it (Section III-H)---chosen as a round-number lower-tail boundary whose complement (92.5% above) has a transparent interpretation in the whole-sample reference distribution; the cosine crossover $0.837$ is the all-pairs intra/inter KDE crossover; both are derived from whole-sample distributions rather than from the 70% calibration fold, so the classifier inherits its operational cosine cuts from the whole-sample Firm A and all-pairs distributions. +Section IV-G.3 reports a sensitivity check confirming that replacing $0.95$ with the nearby accountant-level 2D-GMM marginal crossing $0.945$ alters aggregate firm-level capture rates by at most $\approx 1.2$ percentage points, so the round-number heuristic is robust to mixture-derived alternatives within the accountant-level convergence band. Section IV-G.2 reports both calibration-fold and held-out-fold capture rates for this classifier so that fold-level sampling variance is visible. Second, the dHash cutoffs $\leq 5$ and $> 15$ are chosen from the whole-sample Firm A $\text{dHash}_\text{indep}$ distribution: $\leq 5$ captures the upper tail of the high-similarity mode (whole-sample Firm A median $\text{dHash}_\text{indep} = 2$, P75 $\approx 4$, so $\leq 5$ is the band immediately above median), while $> 15$ marks the regime in which independent-minimum structural similarity is no longer indicative of image reproduction. Third, the three accountant-level 1D estimators (KDE antimode $0.973$, Beta-2 crossing $0.979$, logit-GMM-2 crossing $0.976$) and the accountant-level 2D GMM marginal ($0.945$) are *not* the operational thresholds of this classifier: they are the *convergent external reference* that supports the choice of signature-level operational cut. diff --git a/paper/paper_a_results_v3.md b/paper/paper_a_results_v3.md index 178cff9..e555a2c 100644 --- a/paper/paper_a_results_v3.md +++ b/paper/paper_a_results_v3.md @@ -270,7 +270,7 @@ The paper therefore retains cos $> 0.95$ as the primary operational cut for tran ### 4) Sanity Sample -A 30-signature stratified visual sanity sample (six signatures each from pixel-identical, high-cos/low-dh, borderline, style-only, and likely-genuine strata) produced inter-rater agreement with the classifier in all 30 cases; this sample contributed only to spot-check and is not used to compute reported metrics. +A 30-signature stratified visual sanity sample (six signatures each from pixel-identical, high-cos/low-dh, borderline, style-only, and likely-genuine strata) yielded full human--classifier agreement (30/30); this sample contributed only to spot-check and is not used to compute reported metrics. ## H. Additional Firm A Benchmark Validation @@ -288,7 +288,7 @@ Consistent with the scope-of-claims framing in Section III-G, we report the rate Under the alternative hypothesis that the left tail is an artifact of scan or compression noise, the share should shrink as scanning and PDF-compression technology improved over 2013-2023.