Paper A v3.11: reframe Section III-G unit hierarchy + propagate implications

Rewrites Section III-G (Unit of Analysis and Summary Statistics) after
self-review identified three logical issues in v3.10:

1. Ordering inversion: the three units are now ordered signature ->
   auditor-year -> accountant, with auditor-year as the principled
   middle unit under within-year assumptions and accountant as a
   deliberate cross-year pooling.

2. Oversold assumption: the old "within-auditor-year no-mixing
   identification assumption" is split into A1 (pair-detectability,
   weak statistical, cross-year scope matching the detector) and A2
   (within-year label uniformity, interpretive convention). The
   arithmetic statistics reported in the paper do not require A2; A2
   only underwrites interpretive readings (notably IV-H.1's partner-
   level "minority of hand-signers" framing).

3. Motivation-assumption mismatch: removed the "longitudinal behaviour
   of interest" framing and explicitly disclaimed across-year
   homogeneity. Accountant-level coordinates are now described as a
   pooled observed tendency rather than a time-invariant regime.

Propagated implications across Introduction, Discussion, and Results:
softened "tends to cluster into a dominant regime" and "directly
quantifying the minority of hand-signers" to "pooled observed
tendency" / "consistent with within-firm heterogeneity"; rewrote the
Limitations fifth point (was "treats all signatures from a CPA as
a single class"); added a seventh Limitation acknowledging the
source-template edge case; added a per-signature best-match cross-year
caveat to Section IV-H.2; softened IV-H.2's "direct consequence" to
"consistent with"; reframed pixel-identity anchor as pair-level proof
of image reuse (with source-template exception) rather than absolute
signature-level positive.

Process: self-review (9 findings) -> full-pass fixes -> codex
gpt-5.5 xhigh round-10 verification (8 RESOLVED, 1 PARTIAL, 4 MINOR
regression findings) -> regression fixes.

No re-computation. All tables (IV-XVIII) and Appendix A numbers
unchanged. Abstract at 248/250 words.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:52:45 +08:00
parent 615059a2c1
commit d2f8673a67
5 changed files with 41 additions and 22 deletions
@@ -113,19 +113,32 @@ Cosine similarity and dHash are both robust to the noise introduced by the print
## G. Unit of Analysis and Summary Statistics
Two unit-of-analysis choices are relevant for this study: (i) the *signature*---one signature image extracted from one report---and (ii) the *accountant*---the collection of all signatures attributed to a single CPA across the sample period.
A third composite unit---the *auditor-year*, i.e. all signatures by one CPA within one fiscal year---is also natural when longitudinal behavior is of interest, and we treat auditor-year analyses as a direct extension of accountant-level analysis at finer temporal resolution.
Three unit-of-analysis choices are relevant for this study, ordered from finest to coarsest: (i) the *signature*---one signature image extracted from one report; (ii) the *auditor-year*---all signatures by one CPA within one fiscal year; and (iii) the *accountant*---the collection of all signatures attributed to a single CPA across the full sample period.
All three are well-defined as descriptive groupings without additional assumptions. The distinction that matters for *regime interpretation*---i.e., reading a unit's summary as "this CPA's signing mechanism for that unit"---is this: the auditor-year is the smallest CPA-level aggregation that remains coherent under the stipulations below without any across-year homogeneity, whereas the accountant unit is a deliberate cross-year pooling that may blend distinct signing-mechanism regimes when a CPA's practice changes over the sample period.
We use all three units in the paper and specify the role of each at the point of use.
For per-signature classification we compute, for each signature, the maximum pairwise cosine similarity and the minimum dHash Hamming distance against every other signature attributed to the same CPA.
For per-signature classification we compute, for each signature, the maximum pairwise cosine similarity and the minimum dHash Hamming distance against every other signature attributed to the same CPA (over the full same-CPA set, not restricted to the same fiscal year).
The max/min (rather than mean) formulation reflects the identification logic for non-hand-signing: if even one other signature of the same CPA is a pixel-level reproduction, that pair will dominate the extremes and reveal the non-hand-signed mechanism.
Mean statistics would dilute this signal.
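The max/min computation over a CPA's pooled signature set can be sketched as follows. This is an illustrative sketch, not the paper's released code; the function name and the assumption that each signature arrives as an embedding vector plus a dHash bit array are ours:

```python
import numpy as np

def per_signature_extremes(embeddings: np.ndarray, dhashes: np.ndarray):
    """For each signature of one CPA, return its max cosine similarity and
    min dHash Hamming distance against every *other* same-CPA signature
    (pooled across all fiscal years).

    embeddings : (n, d) float feature vectors for the CPA's n signatures
    dhashes    : (n, b) 0/1 bit arrays (e.g. b = 64 for a standard dHash)
    """
    # Cosine similarity via unit-normalized embeddings.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cos = normed @ normed.T
    np.fill_diagonal(cos, -np.inf)      # exclude self-match
    max_cos = cos.max(axis=1)

    # Pairwise Hamming distances between dHash bit vectors.
    ham = (dhashes[:, None, :] != dhashes[None, :, :]).sum(axis=2)
    np.fill_diagonal(ham, 10**9)        # exclude self-match
    min_ham = ham.min(axis=1)
    return max_cos, min_ham
```

A near-identical pair anywhere in the pool drives that pair's `max_cos` toward 1 and `min_ham` toward 0, which is exactly the extreme-dominance behavior the text relies on; a mean over all pairs would not have this property.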
We also adopt an explicit *within-auditor-year no-mixing* identification assumption.
Specifically, within any single fiscal year we treat a given CPA's signing mechanism as uniform: a CPA who reproduces one signature image in that year is assumed to do so for every report, and a CPA who hand-signs in that year is assumed to hand-sign every report in that year.
Domain knowledge of industry practice at Firm A is consistent with this assumption for that firm during the sample period.
Under the assumption, per-auditor-year summary statistics are well defined and robust to outliers: if even one pair of same-CPA signatures in the year is near-identical, the max/min captures it.
The intra-report consistency analysis in Section IV-H.3 is a related but distinct check: it tests whether the *two co-signing CPAs on the same report* receive the same signature-level label (firm-level signing-practice homogeneity) rather than testing whether a single CPA mixes mechanisms within a fiscal year.
A direct empirical check of the within-auditor-year assumption at the same-CPA level would require labeling multiple reports of the same CPA in the same year and is left to future work; in this paper we maintain the assumption as an identification convention motivated by industry practice and bounded by the worst-case aggregation rule of Section III-L.
We distinguish two stipulations by the role each plays, in order to avoid overstating the paper's reliance on them.
**(A1) Pair-detectability** is a statistical assumption scoped to the same-CPA pool (pooled across fiscal years, matching the max/min computation above): if a CPA uses image replication anywhere in the corpus, at least one pair of same-CPA signatures is near-identical after reproduction noise, so that max cosine / min dHash detects the replication.
This is plausible for high-volume stamping or firm-level electronic-signing workflows---where a stored image is typically reused many times under similar scan and compression conditions---but is not guaranteed in sparse CPA corpora with only one observed replicated report, when multiple template variants are in use, or when scan-stage noise pushes a replicated pair outside the detection regime.
A1 is what the per-signature detector requires to be sensitive to replication; it is a cross-year pair-existence property, not a within-year uniformity claim.
**(A2) Within-year label uniformity** is an interpretive convention used when a signature-level label is *read as* "this CPA's signing mechanism for that fiscal year": within any single fiscal year we treat the CPA's mechanism as uniform, i.e., a CPA who replicates any signature image in that year is treated as doing so for every report in that year, and a hand-signer is treated as hand-signing every report in that year.
A2 is consistent with industry practice at Firm A during the sample period, but may weaken at other Big-4 firms during the 2019--2021 digitalization-transition years, in which a CPA's mechanism could in principle shift mid-year as firm-level electronic-signing systems were rolled out.
We therefore read A2 as a domain-motivated default rather than a universally validated empirical claim.
The arithmetic statistics reported in this paper do not require A2 for their definition or computation: the per-signature classifier (Section III-L) operates at signature level, the accountant-level mixture (Section III-J) uses mean statistics over the full same-CPA pool, and the partner-level ranking (Section IV-H.2) uses a per-auditor-year mean---none of which require within-year uniformity to be well-defined.
A2 does, however, underwrite certain *interpretive* readings---most notably, the framing in Section IV-H.1 of Firm A's yearly left-tail share as a partner-level "minority of hand-signers" rather than a bare signature-level rate---and the downstream use of per-signature or per-auditor-year labels as regime labels for auditor-behavior research.
We explicitly *do not* assume across-year homogeneity.
A CPA's mechanism may change across fiscal years---the 2019--2021 Big-4 digitalization trends documented in Section IV-H are consistent with such changes---and accountant-level summary statistics (Section III-J) therefore represent a cross-year pooled summary that may blend multiple regimes for the same CPA.
We treat this as a design choice: the accountant-level aggregates characterize each CPA's overall distribution over the full sample period, not a single time-invariant regime.
The intra-report consistency analysis in Section IV-H.3 is a related but distinct check: it tests whether the *two co-signing CPAs on the same report* receive the same signature-level label (firm-level signing-practice homogeneity) rather than testing A2 at the same-CPA level.
A direct empirical check of A2 would require labeling multiple reports of the same CPA in the same year and is left to future work; as noted above, no reported statistic relies on A2, and A2's interpretive scope is further bounded by the worst-case aggregation rule of Section III-L.
For accountant-level analysis we additionally aggregate these per-signature statistics to the CPA level by computing the mean best-match cosine and the mean *independent minimum dHash* across all signatures of that CPA.
The *independent minimum dHash* of a signature is defined as the minimum Hamming distance to *any* other signature of the same CPA (over the full same-CPA set).
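The CPA-level aggregation described above is a plain group-by mean over the per-signature statistics. A minimal sketch, assuming a tabular layout with hypothetical column names (`cpa_id`, `max_cosine`, `min_dhash`), not taken from the paper's code:

```python
import pandas as pd

def accountant_level_aggregates(sig_stats: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-signature statistics to the CPA level.

    sig_stats columns (assumed): 'cpa_id'; 'max_cosine' (best-match cosine
    vs. any other same-CPA signature); 'min_dhash' (independent minimum
    dHash over the full same-CPA set).
    """
    return (sig_stats
            .groupby('cpa_id')
            .agg(mean_best_match_cosine=('max_cosine', 'mean'),
                 mean_independent_min_dhash=('min_dhash', 'mean'))
            .reset_index())
```

Because the mean is taken over the full sample period, a CPA who switched mechanisms mid-sample lands at a weighted mix of the corresponding regimes, consistent with the pooled-tendency reading above.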
@@ -219,7 +232,8 @@ The accountant-level estimates from the two threshold estimators (together with
## J. Accountant-Level Mixture Model
In addition to the signature-level analysis, we fit a Gaussian mixture model in two dimensions to the per-accountant aggregates (mean best-match cosine, mean independent minimum dHash).
The motivation is the expectation---consistent with industry-practice knowledge at Firm A---that an individual CPA's signing *practice* is clustered (typically consistent adoption of non-hand-signing or consistent hand-signing within a given year) even when the output pixel-level *quality* lies on a continuous spectrum.
The motivation is that an individual CPA's cross-year-pooled signing *tendency*---their full-sample distribution of best-match statistics---is expected to cluster with other CPAs of similar tendency, even when the output pixel-level *quality* at the signature level lies on a continuous spectrum.
Cluster membership in the accountant-level fit is accordingly best read as a *pooled observed tendency* over the CPA's full sample-period signature set rather than as a time-invariant signing regime; where a CPA switched mechanisms during the sample period, their accountant-level coordinates reflect a weighted mix of the corresponding regimes.
We fit mixtures with $K \in \{1, 2, 3, 4, 5\}$ components under full covariance, selecting $K^*$ by BIC with 15 random initializations per $K$.
For the selected $K^*$ we report component means, weights, per-component firm composition, and the marginal-density crossing points from the two-component fit, which serve as the natural per-accountant thresholds.
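The BIC-selected mixture fit described here can be realized with scikit-learn's `GaussianMixture`; the sketch below is one plausible implementation under the stated settings ($K \in \{1,\dots,5\}$, full covariance, 15 random initializations per $K$), with seed handling and data layout as our own assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_accountant_mixture(X: np.ndarray, seed: int = 0) -> GaussianMixture:
    """Fit full-covariance Gaussian mixtures for K = 1..5 and select K* by BIC.

    X : (n_accountants, 2) array of per-CPA coordinates
        (mean best-match cosine, mean independent minimum dHash).
    """
    best, best_bic = None, np.inf
    for k in range(1, 6):
        gmm = GaussianMixture(n_components=k, covariance_type='full',
                              n_init=15, random_state=seed).fit(X)
        bic = gmm.bic(X)          # lower BIC is better
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best
```

The per-accountant thresholds mentioned in the text would then come from the marginal-density crossing points of the two-component fit, which this sketch does not compute.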
@@ -229,14 +243,14 @@ For the selected $K^*$ we report component means, weights, per-component firm co
Rather than construct a stratified manual-annotation validation set, we validate the classifier using four naturally occurring reference populations that require no human labeling:
1. **Pixel-identical anchor (gold positive, conservative subset):** signatures whose nearest same-CPA match is byte-identical after crop and normalization.
Handwriting physics makes byte-identity impossible under independent signing events, so this anchor is absolute ground truth *for the byte-identical subset* of non-hand-signed signatures.
We emphasize that this anchor is a *subset* of the true positive class---only those non-hand-signed signatures whose nearest match happens to be byte-identical---and perfect recall against this anchor therefore does not establish recall against the full non-hand-signed population (Section V-G discusses this further).
Handwriting physics makes byte-identity impossible under independent signing events, so a byte-identical same-CPA pair is pair-level proof of image reuse and---for the byte-identical subset---conservative ground truth for non-hand-signed signatures; the narrow exception, in which a genuinely hand-signed exemplar was subsequently reused as the stamping or e-signature template, is discussed as a Limitation in Section V-G.
We further emphasize that this anchor is a *subset* of the true positive class---only those non-hand-signed signatures whose nearest match happens to be byte-identical---and perfect recall against this anchor therefore does not establish recall against the full non-hand-signed population (Section V-G discusses this further).
2. **Inter-CPA negative anchor (large gold negative):** $\sim$50,000 pairs of signatures randomly sampled from *different* CPAs.
Inter-CPA pairs cannot arise from reuse of a single signer's stored signature image, so this population is a reliable negative class for threshold sweeps.
This anchor is substantially larger than a simple low-similarity-same-CPA negative and yields tight Wilson 95% confidence intervals on FAR at each candidate threshold.
3. **Firm A anchor (replication-dominated prior positive):** Firm A signatures, treated as a majority-positive reference whose left tail contains a minority of hand-signers, as directly evidenced by the 32/171 middle-band share in the accountant-level mixture (Section III-H).
3. **Firm A anchor (replication-dominated prior positive):** Firm A signatures, treated as a majority-positive reference with within-firm heterogeneity in the left tail (consistent with a minority of hand-signers), as evidenced by the 32/171 middle-band share in the accountant-level mixture (Section III-H).
Because Firm A is both used for empirical percentile calibration in Section III-H and as a validation anchor, we make the within-Firm-A sampling variance visible by splitting Firm A CPAs randomly (at the CPA level, not the signature level) into a 70% *calibration* fold and a 30% *heldout* fold.
Median, 1st percentile, and 95th percentile of signature-level cosine/dHash distributions are derived from the calibration fold only.
The heldout fold is used exclusively to report post-hoc capture rates with Wilson 95% confidence intervals.
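The Wilson 95% intervals used for the FAR sweeps (anchor 2) and the heldout capture rates have a simple closed form; a self-contained sketch (the function name is ours):

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.959964) -> tuple:
    """Wilson score confidence interval for a binomial proportion
    (z = 1.96 gives the 95% interval)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (max(0.0, centre - half), min(1.0, centre + half))
```

Unlike the normal-approximation interval, the Wilson interval stays informative at the extremes: even with zero false alarms among $\sim$50,000 inter-CPA pairs, it yields a strictly positive upper bound on the FAR rather than a degenerate $[0, 0]$ interval.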