Paper A v3.11: reframe Section III-G unit hierarchy + propagate implications

Rewrites Section III-G (Unit of Analysis and Summary Statistics) after
self-review identified three logical issues in v3.10:

1. Ordering inversion: the three units are now ordered signature ->
   auditor-year -> accountant, with auditor-year as the principled
   middle unit under within-year assumptions and accountant as a
   deliberate cross-year pooling.

2. Oversold assumption: the old "within-auditor-year no-mixing
   identification assumption" is split into A1 (pair-detectability,
   weak statistical, cross-year scope matching the detector) and A2
   (within-year label uniformity, interpretive convention). The
   arithmetic statistics reported in the paper do not require A2; A2
   only underwrites interpretive readings (notably IV-H.1's partner-
   level "minority of hand-signers" framing).

3. Motivation-assumption mismatch: removed the "longitudinal behaviour
   of interest" framing and explicitly disclaimed across-year
   homogeneity. Accountant-level coordinates are now described as a
   pooled observed tendency rather than a time-invariant regime.

Propagated implications across Introduction, Discussion, and Results:
softened "tends to cluster into a dominant regime" and "directly
quantifying the minority of hand-signers" to "pooled observed
tendency" / "consistent with within-firm heterogeneity"; rewrote the
Limitations fifth point (was "treats all signatures from a CPA as
a single class"); added a seventh Limitation acknowledging the
source-template edge case; added a per-signature best-match cross-year
caveat to Section IV-H.2; softened IV-H.2's "direct consequence" to
"consistent with"; reframed pixel-identity anchor as pair-level proof
of image reuse (with source-template exception) rather than absolute
signature-level positive.

Process: self-review (9 findings) -> full-pass fixes -> codex
gpt-5.5 xhigh round-10 verification (8 RESOLVED, 1 PARTIAL, 4 MINOR
regression findings) -> regression fixes.

No re-computation. All tables (IV-XVIII) and Appendix A numbers
unchanged. Abstract at 248/250 words.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-24 19:52:45 +08:00
parent 615059a2c1
commit d2f8673a67
5 changed files with 41 additions and 22 deletions
+1 -1
View File
@@ -56,7 +56,7 @@ Adopting the replication-dominated framing---rather than a near-universal framin
A third distinctive feature is our unit-of-analysis treatment.
Our threshold-framework analysis reveals an informative asymmetry between the signature level and the accountant level: per-signature similarity forms a continuous quality spectrum for which no two-mechanism mixture provides a good fit, whereas per-accountant aggregates are clustered into three recognizable groups (BIC-best $K = 3$).
The substantive reading is that *pixel-level output quality* is a continuous spectrum shaped by firm-specific reproduction technologies and scan conditions, while *accountant-level aggregate behaviour* is clustered but not sharply discrete---a given CPA tends to cluster into a dominant regime (high-replication, middle-band, or hand-signed-tendency), though the boundaries between regimes are smooth rather than discontinuous.
The substantive reading is that *pixel-level output quality* is a continuous spectrum shaped by firm-specific reproduction technologies and scan conditions, while *accountant-level aggregate behaviour* is clustered but not sharply discrete: each CPA's cross-year-pooled coordinates sit closest to one of three recognizable groups (high-replication, middle-band, or hand-signed-tendency), reflecting a pooled observed tendency rather than a time-invariant regime, with smooth rather than discontinuous boundaries between groups.
At the accountant level, the KDE antimode and the two mixture-based estimators (Beta-2 crossing and its logit-Gaussian robustness counterpart) converge within $\sim 0.006$ on a cosine threshold of approximately $0.975$, while the Burgstahler-Dichev / McCrary density-smoothness diagnostic finds no significant transition---an outcome (robust across a bin-width sweep, Appendix A) consistent with smoothly mixed clusters.
The two-dimensional GMM marginal crossings (cosine $= 0.945$, dHash $= 8.10$) are reported as a complementary cross-check rather than as the primary accountant-level threshold.