Paper A v3.15: resolve Gemini 3.1 Pro round-15 Accept-verdict minor polish

Gemini 3.1 Pro round-15 full-paper review of v3.14 returned Accept with four MINOR polish suggestions. All four applied in this commit.

1. Table XIII column header: "mean cosine" renamed to "mean best-match cosine" to match the underlying metric (per-signature best-match over the full same-CPA pool) and prevent readers from inferring a simpler per-year statistic.

2. Methodology III-L (L284): added a forward-pointer in the first threshold-convention note to Section IV-G.3, explicitly confirming that replacing the 0.95 round-number heuristic with the nearby accountant-level 2D-GMM marginal crossing 0.945 alters aggregate firm-level capture rates by at most ~1.2 percentage points. This pre-empts a reader who might worry about the methodological tension between the heuristic and the mixture-derived convergence band.

3. Results IV-I document-level aggregation (L383): "Document-level rates therefore bound the share..." rewritten as "represent the share..." Gemini correctly noted that worst-case aggregation directly assigns (subject to classifier error), so "bound" spuriously implies an inequality not actually present.

4. Results IV-G.4 Sanity Sample (L273): "inter-rater agreement with the classifier" rewritten as "full human--classifier agreement (30/30)". Inter-rater conventionally refers to human-vs-human agreement; human-vs-classifier is the correct term here.

No substantive changes; no tables recomputed. Gemini round-15 verdict was Accept with these four items framed as nice-to-have rather than blockers; applying them brings v3.15 to a fully polished state before manual DOCX packaging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 01:01:58 +08:00
parent d3b63fc0b7
commit 1dfbc5f000
2 changed files with 4 additions and 3 deletions
@@ -270,7 +270,7 @@ The paper therefore retains cos $> 0.95$ as the primary operational cut for tran

 ### 4) Sanity Sample

-A 30-signature stratified visual sanity sample (six signatures each from pixel-identical, high-cos/low-dh, borderline, style-only, and likely-genuine strata) produced inter-rater agreement with the classifier in all 30 cases; this sample contributed only to spot-check and is not used to compute reported metrics.
+A 30-signature stratified visual sanity sample (six signatures each from pixel-identical, high-cos/low-dh, borderline, style-only, and likely-genuine strata) yielded full human--classifier agreement (30/30); this sample contributed only to spot-check and is not used to compute reported metrics.

 ## H. Additional Firm A Benchmark Validation

@@ -288,7 +288,7 @@ Consistent with the scope-of-claims framing in Section III-G, we report the rate
 Under the alternative hypothesis that the left tail is an artifact of scan or compression noise, the share should shrink as scanning and PDF-compression technology improved over 2013-2023.

 <!-- TABLE XIII: Firm A Per-Year Cosine Distribution
-| Year | N sigs | mean cosine | % below 0.95 |
+| Year | N sigs | mean best-match cosine | % below 0.95 |
 |------|--------|-------------|--------------|
 | 2013 | 2,167 | 0.9733 | 12.78% |
 | 2014 | 5,256 | 0.9781 | 8.69% |
@@ -380,7 +380,7 @@ We note that this test uses the calibrated classifier of Section III-L rather th
 Table XVII presents the final classification results under the dual-descriptor framework with Firm A-calibrated thresholds for 84,386 documents.
 The document count (84,386) differs from the 85,042 documents with any YOLO detection (Table III) because 656 documents carry only a single detected signature, for which no same-CPA pairwise comparison and therefore no best-match cosine / min dHash statistic is available; those documents are excluded from the classification reported here.
 We emphasize that the document-level proportions below reflect the *worst-case aggregation rule* of Section III-L: a report carrying one stamped signature and one hand-signed signature is labeled with the most-replication-consistent of the two signature-level verdicts.
-Document-level rates therefore bound the share of reports in which *at least one* signature is non-hand-signed rather than the share in which *both* are; the intra-report agreement analysis of Section IV-H.3 (Table XVI) reports how frequently the two co-signers share the same signature-level label within each firm, so that readers can judge what fraction of the non-hand-signed document-level share corresponds to fully non-hand-signed reports versus mixed reports.
+Document-level rates therefore represent the share of reports in which *at least one* signature is non-hand-signed rather than the share in which *both* are; the intra-report agreement analysis of Section IV-H.3 (Table XVI) reports how frequently the two co-signers share the same signature-level label within each firm, so that readers can judge what fraction of the non-hand-signed document-level share corresponds to fully non-hand-signed reports versus mixed reports.

 <!-- TABLE XVII: Document-Level Classification (Dual-Descriptor: Cosine + dHash)
 | Verdict | N (PDFs) | % | Firm A | Firm A % |