Paper A v3.11: reframe Section III-G unit hierarchy + propagate implications

Rewrites Section III-G (Unit of Analysis and Summary Statistics) after
self-review identified three logical issues in v3.10:

1. Ordering inversion: the three units are now ordered signature ->
   auditor-year -> accountant, with auditor-year as the principled
   middle unit under within-year assumptions and accountant as a
   deliberate cross-year pooling.

2. Oversold assumption: the old "within-auditor-year no-mixing
   identification assumption" is split into A1 (pair-detectability,
   weak statistical, cross-year scope matching the detector) and A2
   (within-year label uniformity, interpretive convention). The
   arithmetic statistics reported in the paper do not require A2; A2
   only underwrites interpretive readings (notably IV-H.1's partner-
   level "minority of hand-signers" framing).

3. Motivation-assumption mismatch: removed the "longitudinal behaviour
   of interest" framing and explicitly disclaimed across-year
   homogeneity. Accountant-level coordinates are now described as a
   pooled observed tendency rather than a time-invariant regime.

Propagated implications across Introduction, Discussion, and Results:
softened "tends to cluster into a dominant regime" and "directly
quantifying the minority of hand-signers" to "pooled observed
tendency" / "consistent with within-firm heterogeneity"; rewrote the
Limitations fifth point (was "treats all signatures from a CPA as
a single class"); added a seventh Limitation acknowledging the
source-template edge case; added a per-signature best-match cross-year
caveat to Section IV-H.2; softened IV-H.2's "direct consequence" to
"consistent with"; reframed pixel-identity anchor as pair-level proof
of image reuse (with source-template exception) rather than absolute
signature-level positive.

Process: self-review (9 findings) -> full-pass fixes -> codex
gpt-5.5 xhigh round-10 verification (8 RESOLVED, 1 PARTIAL, 4 MINOR
regression findings) -> regression fixes.

No re-computation. All tables (IV-XVIII) and Appendix A numbers
unchanged. Abstract at 248/250 words.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-24 19:52:45 +08:00
parent 615059a2c1
commit d2f8673a67
5 changed files with 41 additions and 22 deletions
+4 -3
View File
@@ -69,7 +69,7 @@ The $N = 168{,}740$ count used in Table V and in the downstream same-CPA per-sig
| Per-accountant dHash mean | 686 | 0.0277 | <0.001 | Multimodal |
-->
Firm A's per-signature cosine distribution is *unimodal* ($p = 0.17$), reflecting a single dominant generative mechanism (non-hand-signing) with a long left tail attributable to the minority of hand-signing Firm A partners identified in the accountant-level mixture (Section IV-E).
Firm A's per-signature cosine distribution is *unimodal* ($p = 0.17$), reflecting a single dominant generative mechanism (non-hand-signing) with a long left tail attributable to within-firm heterogeneity---consistent with a minority of hand-signing Firm A partners---as identified in the accountant-level mixture (Section IV-E).
The all-CPA cosine distribution, which mixes many firms with heterogeneous signing practices, is *multimodal* ($p < 0.001$).
At the per-accountant aggregate level both cosine and dHash means are strongly multimodal, foreshadowing the mixture structure analyzed in Section IV-E.
@@ -184,7 +184,7 @@ We report three validation analyses corresponding to the anchors of Section III-
### 1) Pixel-Identity Positive Anchor with Inter-CPA Negative Anchor
Of the 182,328 extracted signatures, 310 have a same-CPA nearest match that is byte-identical after crop and normalization (pixel-identical-to-closest = 1); these form the gold-positive anchor.
Of the 182,328 extracted signatures, 310 have a same-CPA nearest match that is byte-identical after crop and normalization (pixel-identical-to-closest = 1); these form the byte-identity positive anchor---a pair-level proof of image reuse that serves as conservative ground truth for non-hand-signed signatures, subject to the source-template edge case discussed in Section V-G.
As the gold-negative anchor we sample 50,000 random cross-CPA signature pairs (inter-CPA cosine: mean $= 0.762$, $P_{95} = 0.884$, $P_{99} = 0.913$, max $= 0.988$).
Because the positive and negative anchor populations are constructed from different sampling units (byte-identical same-CPA pairs vs random inter-CPA pairs), their relative prevalence in the combined anchor set is arbitrary, and precision / $F_1$ / recall therefore have no meaningful population interpretation.
We accordingly report FAR with Wilson 95% confidence intervals against the large inter-CPA negative anchor in Table X.
@@ -314,6 +314,7 @@ We test this prediction directly.
For each auditor-year (CPA $\times$ fiscal year) with at least 5 signatures we compute the mean best-match cosine similarity across the year's signatures, yielding 4,629 auditor-years across 2013-2023.
Firm A accounts for 1,287 of these (27.8% baseline share).
Table XIV reports per-firm occupancy of the top $K\%$ of the ranked distribution.
The per-signature best-match cosine underlying each auditor-year mean is taken over the full same-CPA pool (Section III-G) and may match against signatures from other fiscal years, so the auditor-year mean reflects the year's signatures' position within the CPA's full-sample similarity structure rather than purely within-year similarity; a within-year-restricted sensitivity replication is a natural robustness check and is left to future work.
<!-- TABLE XIV: Top-K Similarity Rank Occupancy by Firm (pooled 2013-2023)
| Top-K | k in bucket | Firm A | Firm B | Firm C | Firm D | Non-Big-4 | Firm A share |
@@ -342,7 +343,7 @@ Year-by-year (Table XV), the top-10% Firm A share ranges from 88.4% (2020) to 10
| 2023 | 474 | 47 | 46 | 97.9% | 27.4% |
-->
This over-representation is a direct consequence of firm-wide non-hand-signing practice and is not derived from any threshold we subsequently calibrate.
This over-representation is consistent with firm-wide non-hand-signing practice at Firm A and is not derived from any threshold we subsequently calibrate.
It therefore constitutes genuine cross-firm evidence for Firm A's benchmark status.
### 3) Intra-Report Consistency