Phase 6 round-6: soften firm-heterogeneity framing, fix DOCX table render

Framing softening (per partner tone decision: own the limitation rather than defend the strong claim). Abstract: "Firm heterogeneity is decisive ... consistent with firm-level template-like reuse" -> "The framework surfaces pronounced firm-level heterogeneity ... consistent with firm-level template-like reuse but not independently diagnostic, since descriptor-only data cannot separate reuse from digitisation-pipeline or signing-style homogeneity within a firm; we report it as a scope limitation rather than a mechanism finding." S V-H Limitations: new bullet "Mechanism attribution for the firm-level heterogeneity is not identifiable from descriptor-only data." enumerates three non-mutually-exclusive firm-level mechanisms (template-like reuse / digitisation-pipeline homogeneity / signing-style homogeneity), notes the (cosine, dHash) descriptor pair is by construction indifferent to which mechanism generated a near-identical pair, and lists what additional data would be needed for attribution. S VI Conclusion items (3) and (4): "firm heterogeneity quantification" -> "firm-level heterogeneity surfaced by the framework ... reported as a framework-discriminative observation rather than a mechanism finding"; item (4) expanded from template/stamp/document-production reuse alone to the three-mechanism scope, with explicit "not independently establishing" and S V-H cross-reference. DOCX export fix (export_v3.py): add missing LaTeX-to-Unicode tokens (\checkmark, \lvert/\rvert, \lVert/\rVert, \in, \notin, \max, \min, \log, \ln, \exp, \bullet) that were silently dropping content from Table III rows 2-4 (integer-jitter robustness check marks empty) and Table XVIII drift column (|Delta| empty). Rebuild Paper_A_IEEE_Access_Draft_v3.docx via export_v3.py and install copy as Paper_A_IEEE_Access_Draft_v4.0_20260515.docx (replaces prior pandoc-built v4 DOCX which had empty cells in every table header with LaTeX math and inconsistent column widths). All 43 tables now have non-empty cells with sub/superscript runs. Mirrored in paper_a_v4_combined.md for consistency with the single-file combined source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 02:16:05 +08:00
parent 4efbb7f4b8
commit 3672c9343e
7 changed files with 19 additions and 5 deletions
@@ -44,6 +44,8 @@ Several limitations should be transparent. We group them into primary methodolog

 *Inter-CPA negative-anchor assumption is partially violated and the violation is firm-dependent.* The cross-firm hit matrix of §III-L.4 shows that under the deployed any-pair rule, within-firm collision concentration is $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms), consistent with firm-specific template, stamp, or document-production reuse. The inter-CPA-as-negative assumption is therefore not exactly satisfied — some inter-CPA pairs may share firm-level templates rather than being independent random matches. Our reported per-comparison ICCRs are best read as specificity-proxy rates under a partially-violated assumption, not as calibrated FARs. Because the violation is firm-dependent, Firm A's per-firm ICCR is more contaminated by within-firm sharing than Firms B/C/D's; the per-firm B/C/D rates of $0.09$–$0.16$ may therefore be less contaminated than the pooled rate, and the Firm A vs Firms B/C/D contrast reflects both genuine firm heterogeneity and a firm-dependent proxy-contamination gradient.

+*Mechanism attribution for the firm-level heterogeneity is not identifiable from descriptor-only data.* The observed firm-level contrast (Firm A's per-document HC$+$MC ICCR of $0.62$ versus $0.09$–$0.16$ at Firms B/C/D; within-firm collision concentration $77$–$99\%$ under the deployed any-pair rule; byte-identical evidence of §IV-H) is consistent with at least three non-mutually-exclusive firm-level mechanisms: (i) template, stamp, or e-signature production reuse; (ii) digitisation-pipeline homogeneity — shared scanners, common PDF generation infrastructure, identical compression and form-template settings — that systematically inflates image-descriptor similarity without signature replication; and (iii) signing-style or training homogeneity that produces correlated handwritten signatures within a firm. The descriptor pair (cosine, dHash) operates at the image-similarity level and is, by construction, indifferent to which mechanism generated a given near-identical pair. We therefore report the firm contrast as a methodological observation — the framework discriminates at firm-level resolution — rather than as a mechanism finding. The byte-identical Firm A signatures across $\sim 50$ distinct partners (§IV-H, §V-C) provide direct evidence for (i) at Firm A specifically, but do not exclude additive contribution from (ii) or (iii); the milder within-firm collision patterns at Firms B/C/D are individually consistent with all three mechanisms. Image-acquisition metadata (scanner identifiers, PDF generator fingerprints, compression-codec markers), partner-level intent records, or controlled hand-signed baselines would be needed to attribute the contrast across (i), (ii), and (iii).
+
 *Scope.* The primary analyses are scoped to the Big-4 sub-corpus. We did not perform the full per-signature pool-normalised ICCR analysis at the full $n = 686$ scope; the §IV-K full-dataset Spearman re-run shows the K=3 $+$ deployed box-rule rank-convergence is preserved at $n = 686$ but does not establish portability of the Big-4 operational ICCRs, the LOOO firm-fold structure, or the five-way operational classifier at the broader scope.

 **Secondary scope and validation caveats.**