Apply codex round-7 Phase 5 copy-edit fixes + refresh STATE.md

Mechanical copy-edit closing the OPEN/PARTIAL items from paper/codex_review_gpt55_v4_round7.md; substantive empirical content unchanged. Manuscript-splice items (strip internal draft notes, update stale abstract-count note) deferred to final splice. - Phase 4 prose §V-G + §III-K methodology: "candidate classifiers" -> "candidate checks" (closes round-7 m13 + Spot-check 3 wording leak) - Phase 4 prose §II: remove placeholder caveat sentence at the LOOO paragraph (closes round-7 M6 + A4) - References v3: add [42] Stone 1974, [43] Geisser 1975, [44] Vehtari et al. 2017 (44 entries; was 41) — backs the §II LOOO addition - Round-7 review: add row-count clarification note (11 Major / 15 Minor labelled rows vs. the prompt's 9/12 tally) - STATE.md: refresh from stale Phase-2 snapshot to current Phase 5 status — Phases 1-4 complete; codex rounds 1-7 closed at Minor Revision; pending Gemini + Opus rounds + round-2/3 convergence Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:21:59 +08:00
parent 980295d5bd
commit 9604b273c0
5 changed files with 50 additions and 27 deletions
@@ -16,6 +16,8 @@ The remaining issues are packaging / copy-edit blockers, not empirical blockers:

 ## Round-26 finding closure table

+Note: the round-26 file contains 11 labelled Major rows and 15 labelled Minor rows; this table covers every labelled row. The prompt's 9 Major / 12 Minor tally appears to merge duplicate themes.
+
 ### Major findings

 | # | Round-26 finding | v2 status | Round-27 note |
@@ -84,4 +84,10 @@

 [41] H. White, "Maximum likelihood estimation of misspecified models," *Econometrica*, vol. 50, no. 1, pp. 1–25, 1982.

-<!-- Total: 41 references (v2: 36 + 5 new statistical methods refs) -->
+[42] M. Stone, "Cross-validatory choice and assessment of statistical predictions," *J. R. Statist. Soc. B*, vol. 36, no. 2, pp. 111–147, 1974.
+
+[43] S. Geisser, "The predictive sample reuse method with applications," *J. Amer. Statist. Assoc.*, vol. 70, no. 350, pp. 320–328, 1975.
+
+[44] A. Vehtari, A. Gelman, and J. Gabry, "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC," *Stat. Comput.*, vol. 27, no. 5, pp. 1413–1432, 2017.
+
+<!-- Total: 44 references (v3: 41 + 3 new LOOO refs for v4 §II addition) -->
@@ -146,9 +146,9 @@ The Script 39 report verdict is `SIG_CONVERGENCE_MODERATE`. The $\kappa = 0.870$

 **4. Positive-anchor miss rate on byte-identical signatures (Script 40).** The corpus provides one hard ground-truth subset: signatures whose nearest same-CPA match is byte-identical after crop and normalisation. Independent hand-signing cannot produce pixel-identical images, so byte-identical signatures are conservative-subset ground truth for the *replicated* class. The Big-4 byte-identical subset comprises $n = 262$ signatures ($145 / 8 / 107 / 2$ across Firms A through D; Script 40).

-We report each candidate classifier's *positive-anchor miss rate* — the fraction of byte-identical signatures classified as belonging to the less-replication-dominated descriptor positions. This is a one-sided check against a conservative positive subset, **not a paired specificity metric in the usual two-class sense**; we do not report a paired negative-anchor metric here because no signature-level hand-signed ground truth exists. The corresponding signature-level inter-CPA negative-anchor ICCR evidence is developed in §III-L.1 (Big-4 sample) and the v3.x §IV-I corpus-wide version (reported under prior "FAR" terminology):
+We report each candidate check's *positive-anchor miss rate* — the fraction of byte-identical signatures classified as belonging to the less-replication-dominated descriptor positions. This is a one-sided check against a conservative positive subset, **not a paired specificity metric in the usual two-class sense**; we do not report a paired negative-anchor metric here because no signature-level hand-signed ground truth exists. The corresponding signature-level inter-CPA negative-anchor ICCR evidence is developed in §III-L.1 (Big-4 sample) and the v3.x §IV-I corpus-wide version (reported under prior "FAR" terminology):

-| Candidate classifier | Pixel-identity miss rate (Wilson 95% CI) |
+| Candidate check | Pixel-identity miss rate (Wilson 95% CI) |
 |---|---|
 | Inherited Paper A binary high-confidence box rule (cos $> 0.95$ AND dHash $\leq 5$) | $0\%$ $[0\%, 1.45\%]$ |
 | K=3 per-CPA hard label (C3 high-cos / low-dHash corner; descriptive only) | $0\%$ $[0\%, 1.45\%]$ |
@@ -64,7 +64,7 @@ The remainder of the paper is organized as follows. Section II reviews related w

 > *Note for the Phase 4 review pass: §II is inherited substantively unchanged from v3.20.0 §II in the master manuscript, with one new paragraph added below. The unchanged content is not reproduced in this Phase 4 file; readers reviewing this draft should consult `paper/paper_a_related_work_v3.md` for the v3.20.0 §II text covering offline signature verification, near-duplicate detection, copy-move forgery detection, perceptual hashing, deep-feature similarity, and the statistical methods adopted (Hartigan dip test, finite mixture EM, Burgstahler-Dichev / McCrary density-smoothness diagnostic). The paragraph below is the only v4.0-specific §II addition.*

-**Addition for v4.0: leave-one-firm-out cross-validation in a small-cluster scope.** Cross-validation methodology in the leave-one-out tradition has been developed extensively in statistics since Stone [42] and Geisser [43], and modern surveys including Vehtari et al. [44] discuss its application to mixture models. In document-forensics calibration the technique has been used selectively, typically with the individual document or signature as the hold-out unit. Our application in §III-K differs in two respects from the standard usage: (i) the hold-out unit is the *firm* (not the individual CPA or signature), so the analysis directly probes cross-firm reproducibility of the fitted mixture rather than within-firm sampling variance; and (ii) the held-out predictions are interpreted as a *composition-sensitivity band* on the candidate mixture boundary, not as a sufficiency claim for the inherited five-way operational classifier (which is calibrated separately; §III-L). We treat LOOO drift as descriptive information about how the mixture characterisation moves when training composition changes, not as a pass/fail test for the operational classifier. Numerical references [42]–[44] are placeholders in this draft and will be replaced with the project's preferred references at copy-edit time.
+**Addition for v4.0: leave-one-firm-out cross-validation in a small-cluster scope.** Cross-validation methodology in the leave-one-out tradition has been developed extensively in statistics since Stone [42] and Geisser [43], and modern surveys including Vehtari et al. [44] discuss its application to mixture models. In document-forensics calibration the technique has been used selectively, typically with the individual document or signature as the hold-out unit. Our application in §III-K differs in two respects from the standard usage: (i) the hold-out unit is the *firm* (not the individual CPA or signature), so the analysis directly probes cross-firm reproducibility of the fitted mixture rather than within-firm sampling variance; and (ii) the held-out predictions are interpreted as a *composition-sensitivity band* on the candidate mixture boundary, not as a sufficiency claim for the inherited five-way operational classifier (which is calibrated separately; §III-L). We treat LOOO drift as descriptive information about how the mixture characterisation moves when training composition changes, not as a pass/fail test for the operational classifier.

 ---

@@ -104,7 +104,7 @@ Two additional findings refine the calibration story. First, the per-pair condit

 ## G. Pixel-Identity as a Hard Positive Anchor; Inherited Inter-CPA Negative Anchor Reframed as Coincidence Rate

-The only hard ground-truth subset in the corpus is pixel-identical signatures: those whose nearest same-CPA match is byte-identical after crop and normalisation. Independent hand-signing cannot produce byte-identical images, so these signatures are conservative-subset ground truth for the *replicated* class. On the Big-4 subset ($n = 262$ pixel-identical signatures), all three candidate classifiers — the inherited box rule, the K=3 hard label, and the reverse-anchor metric with a prevalence-calibrated cut — achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$). We caution that this result is necessary but not sufficient: for the box rule it is close to tautological, because byte-identical neighbours have cosine $\approx 1$ and dHash $\approx 0$, well inside the rule's high-confidence region. The corresponding signature-level *negative* anchor evidence is developed in §III-L.1 above (v4 spike: cos$>0.95$ per-comparison ICCR $= 0.00060$, replicating v3.20.0's reported $0.0005$ under prior "FAR" terminology). We frame the per-comparison rate as a specificity proxy under the assumption that inter-CPA pairs constitute a clean negative anchor, and we document in §III-L.4 that this assumption is partially violated by within-firm cross-CPA template-like collision structures.
+The only hard ground-truth subset in the corpus is pixel-identical signatures: those whose nearest same-CPA match is byte-identical after crop and normalisation. Independent hand-signing cannot produce byte-identical images, so these signatures are conservative-subset ground truth for the *replicated* class. On the Big-4 subset ($n = 262$ pixel-identical signatures), all three candidate checks — the inherited box rule, the K=3 hard label, and the reverse-anchor metric with a prevalence-calibrated cut — achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$). We caution that this result is necessary but not sufficient: for the box rule it is close to tautological, because byte-identical neighbours have cosine $\approx 1$ and dHash $\approx 0$, well inside the rule's high-confidence region. The corresponding signature-level *negative* anchor evidence is developed in §III-L.1 above (v4 spike: cos$>0.95$ per-comparison ICCR $= 0.00060$, replicating v3.20.0's reported $0.0005$ under prior "FAR" terminology). We frame the per-comparison rate as a specificity proxy under the assumption that inter-CPA pairs constitute a clean negative anchor, and we document in §III-L.4 that this assumption is partially violated by within-firm cross-CPA template-like collision structures.

 ## G. Limitations