Phase 6 manuscript splice (1/2): Abstract / §I / §II / §III spliced

Splices v4 drafts into v3.20.0 master sub-files. Drops the "paper/v4/" working drafts and lands the v4.0 content in the master file structure. Internal draft notes / close-out checklists / open-questions blocks stripped at splice (per round-1 through round-6 deferral).

Abstract (paper_a_abstract_v3.md):
- Replaced v3.20.0 abstract (240w) with v4.0 abstract (247w).

§I Introduction (paper_a_introduction_v3.md):
- Replaced v3.20.0 §I with v4.0 §I (16 paragraphs + 8-item contributions list).

§II Related Work (paper_a_related_work_v3.md):
- Inserted v4.0 LOOO addition paragraph after the existing finite-mixture paragraph; added refs [42]-[44] to the internal reference annotation list.

§III Methodology (paper_a_methodology_v3.md):
- §III-A..F (Pipeline / Data / Page ID / Detection / Features / Dual Descriptors): kept v3.20.0 content unchanged.
- §III-G..M: replaced v3.20.0 §III-G..K with v4.0 §III-G..M (Unit & Scope / Reference Populations / Distributional Diagnostics + composition decomposition / K=3 descriptive / Convergent internal-consistency / Anchor-based ICCR L.0-L.7 / Validation strategy + Table XXVII ten-tool collection).
- §III-N Data Source & Anonymization: kept v3.20.0 §III-L content, renumbered to §III-N (after v4 §III-M).
- §III-E ablation cross-reference: updated "§IV-I" -> "§IV-L" to match the renumbered §IV.
- §III-F pixel-identity cross-reference: updated "§III-J" -> "§III-K".

Gemini round-2 artifact paper/gemini_review_v4_round2.md also added (was uncommitted from the parallel-review batch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:35:53 +08:00
parent 8dddc3b87c
commit c79329457a
5 changed files with 357 additions and 195 deletions
@@ -72,6 +72,9 @@ For observations bounded on $[0,1]$---such as cosine similarity and normalized H
 Under mild regularity conditions, White's quasi-MLE result [41] supports interpreting maximum-likelihood estimates under a mis-specified parametric family as consistent estimators of the pseudo-true parameter that minimizes the Kullback-Leibler divergence to the data-generating distribution within that family; we use this result to justify the Beta-mixture fit as a principled approximation rather than as a guarantee that the true distribution is Beta.

 The present study combines all three families, using each to produce an independent threshold estimate and treating cross-method convergence---or principled divergence---as evidence of where in the analysis hierarchy the mixture structure is statistically supported.
+
+*Cross-validation in a small-cluster scope.*
+Cross-validation methodology in the leave-one-out tradition has been developed extensively in statistics since Stone [42] and Geisser [43], and modern surveys including Vehtari et al. [44] discuss its application to mixture models. In document-forensics calibration the technique has been used selectively, typically with the individual document or signature as the hold-out unit. Our application in §III-K differs in two respects from the standard usage: (i) the hold-out unit is the *firm* (not the individual CPA or signature), so the analysis directly probes cross-firm reproducibility of the fitted mixture rather than within-firm sampling variance; and (ii) the held-out predictions are interpreted as a *composition-sensitivity band* on the candidate mixture boundary, not as a sufficiency claim for the inherited five-way operational classifier (which is calibrated separately; §III-L). We treat LOOO drift as descriptive information about how the mixture characterisation moves when training composition changes, not as a pass/fail test for the operational classifier.
 <!--
 REFERENCES for Related Work (see paper_a_references_v3.md for full list):
 [3]  Bromley et al. 1993 — Siamese TDNN (NeurIPS)
@@ -101,4 +104,7 @@ REFERENCES for Related Work (see paper_a_references_v3.md for full list):
 [39] McCrary 2008 — density discontinuity test
 [40] Dempster, Laird & Rubin 1977 — EM algorithm
 [41] White 1982 — quasi-MLE consistency
+[42] Stone 1974 — cross-validatory choice
+[43] Geisser 1975 — predictive sample reuse
+[44] Vehtari, Gelman & Gabry 2017 — practical Bayesian LOO/WAIC
 -->
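
The firm-level hold-out described in the added §II paragraph can be sketched roughly as follows. This is a minimal illustration, not the manuscript's procedure: the function name `leave_one_firm_out_band`, the `fit_boundary` callback, and the demo firms and scores are all hypothetical, and the median is used only as a stand-in for the mixture-boundary fit, which this commit does not show.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence, Tuple


def leave_one_firm_out_band(
    scores_by_firm: Dict[str, Sequence[float]],
    fit_boundary: Callable[[List[float]], float],
) -> Tuple[Dict[str, float], Tuple[float, float]]:
    """Refit a boundary once per held-out firm and report the spread.

    Assumption: `fit_boundary` maps pooled training scores to a single
    candidate boundary. Any estimator with that shape can be plugged in.
    """
    boundaries: Dict[str, float] = {}
    for held_out in scores_by_firm:
        # The hold-out unit is the whole firm, not an individual score.
        train = [
            score
            for firm, scores in scores_by_firm.items()
            if firm != held_out
            for score in scores
        ]
        boundaries[held_out] = fit_boundary(train)
    # The min-max range across folds is read descriptively, as a
    # composition-sensitivity band, not as a pass/fail criterion.
    band = (min(boundaries.values()), max(boundaries.values()))
    return boundaries, band


# Hypothetical similarity scores for three made-up firms.
demo = {
    "firm_a": [0.90, 0.92, 0.94],
    "firm_b": [0.80, 0.82, 0.84],
    "firm_c": [0.70, 0.72, 0.74],
}
# The median is a placeholder estimator, not a Beta-mixture fit.
boundaries, band = leave_one_firm_out_band(demo, median)
print(boundaries, band)
```

Passing the estimator as a callback keeps the fold logic separate from the fit itself, so the same loop would apply to whichever boundary estimator is actually used.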