Paper A v3.11: reframe Section III-G unit hierarchy + propagate implications

Rewrites Section III-G (Unit of Analysis and Summary Statistics) after self-review identified three logical issues in v3.10: 1. Ordering inversion: the three units are now ordered signature -> auditor-year -> accountant, with auditor-year as the principled middle unit under within-year assumptions and accountant as a deliberate cross-year pooling. 2. Oversold assumption: the old "within-auditor-year no-mixing identification assumption" is split into A1 (pair-detectability, weak statistical, cross-year scope matching the detector) and A2 (within-year label uniformity, interpretive convention). The arithmetic statistics reported in the paper do not require A2; A2 only underwrites interpretive readings (notably IV-H.1's partner- level "minority of hand-signers" framing). 3. Motivation-assumption mismatch: removed the "longitudinal behaviour of interest" framing and explicitly disclaimed across-year homogeneity. Accountant-level coordinates are now described as a pooled observed tendency rather than a time-invariant regime. Propagated implications across Introduction, Discussion, and Results: softened "tends to cluster into a dominant regime" and "directly quantifying the minority of hand-signers" to "pooled observed tendency" / "consistent with within-firm heterogeneity"; rewrote the Limitations fifth point (was "treats all signatures from a CPA as a single class"); added a seventh Limitation acknowledging the source-template edge case; added a per-signature best-match cross-year caveat to Section IV-H.2; softened IV-H.2's "direct consequence" to "consistent with"; reframed pixel-identity anchor as pair-level proof of image reuse (with source-template exception) rather than absolute signature-level positive. Process: self-review (9 findings) -> full-pass fixes -> codex gpt-5.5 xhigh round-10 verification (8 RESOLVED, 1 PARTIAL, 4 MINOR regression findings) -> regression fixes. No re-computation. All tables (IV-XVIII) and Appendix A numbers unchanged. Abstract at 248/250 words. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:52:45 +08:00
parent 615059a2c1
commit d2f8673a67
5 changed files with 41 additions and 22 deletions
@@ -2,6 +2,6 @@

 <!-- IEEE Access target: <= 250 words, single paragraph -->

-Regulations require Certified Public Accountants (CPAs) to attest to each audit report by affixing a signature. Digitization makes reusing a stored signature image across reports trivial---through administrative stamping or firm-level electronic signing---potentially undermining individualized attestation. Unlike forgery, *non-hand-signed* reproduction reuses the legitimate signer's own stored image, making it visually invisible to report users and infeasible to audit at scale manually. We present a pipeline integrating a Vision-Language Model for signature-page identification, YOLOv11 for signature detection, and ResNet-50 for feature extraction, followed by dual-descriptor verification combining cosine similarity and difference hashing. For threshold determination we apply two estimators---kernel-density antimode with a Hartigan unimodality test and an EM-fitted Beta mixture with a logit-Gaussian robustness check---plus a Burgstahler-Dichev/McCrary density-smoothness diagnostic, at the signature and accountant levels. Applied to 90,282 audit reports filed in Taiwan over 2013-2023 (182,328 signatures from 758 CPAs), the methods reveal a level asymmetry: signature-level similarity is a continuous quality spectrum that no two-component mixture separates, while accountant-level aggregates cluster into three groups with the antimode and two mixture estimators converging within $\sim$0.006 at cosine $\approx 0.975$. A major Big-4 firm is used as a *replication-dominated* (not pure) calibration anchor, with visual inspection and accountant-level mixture evidence supporting majority non-hand-signing and a minority of hand-signers; capture rates on both 70/30 calibration and held-out folds are reported with Wilson 95% intervals to make fold-level variance visible. Validation against 310 byte-identical positives and a $\sim$50,000-pair inter-CPA negative anchor yields FAR $\leq$ 0.001 at all accountant-level thresholds.
+Regulations require Certified Public Accountants (CPAs) to attest to each audit report by affixing a signature. Digitization makes reusing a stored signature image across reports trivial---through administrative stamping or firm-level electronic signing---potentially undermining individualized attestation. Unlike forgery, *non-hand-signed* reproduction reuses the legitimate signer's own stored image, making it visually invisible to report users and infeasible to audit at scale manually. We present a pipeline integrating a Vision-Language Model for signature-page identification, YOLOv11 for signature detection, and ResNet-50 for feature extraction, followed by dual-descriptor verification combining cosine similarity and difference hashing. For threshold determination we apply two estimators---kernel-density antimode with a Hartigan unimodality test and an EM-fitted Beta mixture with a logit-Gaussian robustness check---plus a Burgstahler-Dichev/McCrary density-smoothness diagnostic, at the signature and accountant levels. Applied to 90,282 audit reports filed in Taiwan over 2013-2023 (182,328 signatures from 758 CPAs), the methods reveal a level asymmetry: signature-level similarity is a continuous quality spectrum that no two-component mixture separates, while accountant-level aggregates cluster into three groups with the antimode and two mixture estimators converging within $\sim$0.006 at cosine $\approx 0.975$. A major Big-4 firm is used as a *replication-dominated* (not pure) calibration anchor, with visual inspection and accountant-level mixture evidence supporting majority non-hand-signing alongside within-firm heterogeneity consistent with a minority of hand-signers; capture rates on both 70/30 calibration and held-out folds are reported with Wilson 95% intervals to make fold-level variance visible. Validation against 310 byte-identical positives and a $\sim$50,000-pair inter-CPA negative anchor yields FAR $\leq$ 0.001 at all accountant-level thresholds.

 <!-- Target word count: 240 -->
@@ -40,7 +40,7 @@ Our evidence across multiple analyses rules out that assumption for Firm A while
 Three convergent strands of evidence support the replication-dominated framing.
 First, the visual-inspection evidence: randomly sampled Firm A reports exhibit pixel-identical signature images across different audit engagements and fiscal years for the majority of partners---a physical impossibility under independent hand-signing events.
 Second, the signature-level statistical evidence: Firm A's per-signature cosine distribution is unimodal long-tail rather than a tight single peak; 92.5% of Firm A signatures exceed cosine 0.95, with the remaining 7.5% forming the left tail.
-Third, the accountant-level evidence: of the 171 Firm A CPAs with enough signatures ($\geq 10$) to enter the accountant-level GMM, 32 (19%) fall into the middle-band C2 cluster rather than the high-replication C1 cluster---directly quantifying the within-firm minority of hand-signers.
+Third, the accountant-level evidence: of the 171 Firm A CPAs with enough signatures ($\geq 10$) to enter the accountant-level GMM, 32 (19%) fall into the middle-band C2 cluster rather than the high-replication C1 cluster---consistent with within-firm heterogeneity in signing practice (spanning a minority of hand-signers, CPAs undergoing mid-sample mechanism transitions, and CPAs whose pooled coordinates reflect mixed-quality replication) rather than a pure replication population.
 Nine additional Firm A CPAs are excluded from the GMM for having fewer than 10 signatures, so we cannot place them in a cluster from the cross-sectional analysis alone.
 The held-out Firm A 70/30 validation (Section IV-G.2) gives capture rates on a non-calibration Firm A subset that sit in the same replication-dominated regime as the calibration fold across the full range of operating rules (extreme rules are statistically indistinguishable; operational rules in the 85–95% band differ between folds by 1–5 percentage points, reflecting within-Firm-A heterogeneity in replication intensity rather than a generalization failure).
 The accountant-level GMM (Section IV-E) and the threshold-independent partner-ranking analysis (Section IV-H.2) are the cross-checks that are robust to fold-level sampling variance.
@@ -72,7 +72,7 @@ The framing we adopt---replication-dominated rather than replication-pure---is a
 ## F. Pixel-Identity and Inter-CPA Anchors as Annotation-Free Validation

 A further methodological contribution is the combination of byte-level pixel identity as an annotation-free *conservative* gold positive and a large random-inter-CPA negative anchor.
-Handwriting physics makes byte-identity impossible under independent signing events, so any pair of same-CPA signatures that are byte-identical after crop and normalization is an absolute positive for non-hand-signing, requiring no human review.
+Handwriting physics makes byte-identity impossible under independent signing events, so any pair of same-CPA signatures that are byte-identical after crop and normalization is pair-level proof of image reuse and, modulo the narrow source-template edge case discussed in the seventh limitation below, a conservative positive for non-hand-signing without requiring human review.
 In our corpus 310 signatures satisfied this condition.
 We emphasize that byte-identical pairs are a *subset* of the true non-hand-signed positive class---they capture only those whose nearest same-CPA match happens to be bytewise identical, excluding replications that are pixel-near-identical but not byte-identical (for example, under different scan or compression pathways).
 Perfect recall against this subset therefore does not generalize to perfect recall against the full non-hand-signed population; it is a lower-bound calibration check on the classifier's ability to catch the clearest positives rather than a generalizable recall estimate.
@@ -99,13 +99,17 @@ This effect would bias classification toward false negatives rather than false p
 Fourth, scanning equipment, PDF generation software, and compression algorithms may have changed over the 10-year study period (2013--2023), potentially affecting similarity measurements.
 While cosine similarity and dHash are designed to be robust to such variations, longitudinal confounds cannot be entirely excluded, and we note that our accountant-level aggregates could mask within-accountant temporal transitions.

-Fifth, the classification framework treats all signatures from a CPA as belonging to a single class, not accounting for potential changes in signing practice over time (e.g., a CPA who signed genuinely in early years but adopted non-hand-signing later).
-Extending the accountant-level analysis to auditor-year units is a natural next step.
+Fifth, the accountant-level summary (Section III-J) is a cross-year pooled statistic by construction, so a CPA whose signing mechanism changed mid-sample is placed at a weighted mix of component means rather than at a single regime centroid.
+Extending the accountant-level analysis to auditor-year units---using the same convergent threshold framework at finer temporal resolution---is the natural next step for resolving such within-accountant transitions.

 Sixth, the BD/McCrary transition estimates fall inside rather than between modes for the per-signature cosine distribution, and the test produces no significant transition at all at the accountant level.
 In our application, therefore, BD/McCrary contributes diagnostic information about local density-smoothness rather than an independent accountant-level threshold estimate; that role is played by the KDE antimode and the two mixture-based estimators.
 We emphasize that the accountant-level BD/McCrary null is *consistent with*---not affirmative proof of---smoothly mixed cluster boundaries: the BD/McCrary test is known to have limited statistical power at modest sample sizes, and with $N = 686$ accountants in our analysis the test cannot reliably detect anything less than a sharp cliff-type density discontinuity.
 Failure to reject the smoothness null at this sample size therefore reinforces BD/McCrary's role as a diagnostic rather than a definitive estimator; the substantive claim of smoothly-mixed accountant-level clustering rests on the joint weight of the dip-test and Beta-mixture evidence together with the BD null, not on the BD null alone.

+Seventh, the max/min detection logic treats both ends of a near-identical same-CPA pair as non-hand-signed.
+In the rare case that one of the two documents contains a genuinely hand-signed exemplar that was subsequently reused as the stamping or e-signature template, the pair correctly identifies image reuse but misattributes the non-hand-signed status to the source exemplar.
+This misattribution affects at most one source document per template variant per CPA (the exemplar from which the template was produced), is not expected to be common given that stored signature templates are typically generated in a separate acquisition step rather than extracted from submitted audit reports, and does not materially affect aggregate capture rates at the firm level.
+
 Finally, the legal and regulatory implications of our findings depend on jurisdictional definitions of "signature" and "signing."
 Whether non-hand-signing of a CPA's own stored signature constitutes a violation of signing requirements is a legal question that our technical analysis can inform but cannot resolve.
@@ -56,7 +56,7 @@ Adopting the replication-dominated framing---rather than a near-universal framin

 A third distinctive feature is our unit-of-analysis treatment.
 Our threshold-framework analysis reveals an informative asymmetry between the signature level and the accountant level: per-signature similarity forms a continuous quality spectrum for which no two-mechanism mixture provides a good fit, whereas per-accountant aggregates are clustered into three recognizable groups (BIC-best $K = 3$).
-The substantive reading is that *pixel-level output quality* is a continuous spectrum shaped by firm-specific reproduction technologies and scan conditions, while *accountant-level aggregate behaviour* is clustered but not sharply discrete---a given CPA tends to cluster into a dominant regime (high-replication, middle-band, or hand-signed-tendency), though the boundaries between regimes are smooth rather than discontinuous.
+The substantive reading is that *pixel-level output quality* is a continuous spectrum shaped by firm-specific reproduction technologies and scan conditions, while *accountant-level aggregate behaviour* is clustered but not sharply discrete: each CPA's cross-year-pooled coordinates sit closest to one of three recognizable groups (high-replication, middle-band, or hand-signed-tendency), reflecting a pooled observed tendency rather than a time-invariant regime, with smooth rather than discontinuous boundaries between groups.
 At the accountant level, the KDE antimode and the two mixture-based estimators (Beta-2 crossing and its logit-Gaussian robustness counterpart) converge within $\sim 0.006$ on a cosine threshold of approximately $0.975$, while the Burgstahler-Dichev / McCrary density-smoothness diagnostic finds no significant transition---an outcome (robust across a bin-width sweep, Appendix A) consistent with smoothly mixed clusters.
 The two-dimensional GMM marginal crossings (cosine $= 0.945$, dHash $= 8.10$) are reported as a complementary cross-check rather than as the primary accountant-level threshold.

@@ -113,19 +113,32 @@ Cosine similarity and dHash are both robust to the noise introduced by the print

 ## G. Unit of Analysis and Summary Statistics

-Two unit-of-analysis choices are relevant for this study: (i) the *signature*---one signature image extracted from one report---and (ii) the *accountant*---the collection of all signatures attributed to a single CPA across the sample period.
-A third composite unit---the *auditor-year*, i.e. all signatures by one CPA within one fiscal year---is also natural when longitudinal behavior is of interest, and we treat auditor-year analyses as a direct extension of accountant-level analysis at finer temporal resolution.
+Three unit-of-analysis choices are relevant for this study, ordered from finest to coarsest: (i) the *signature*---one signature image extracted from one report; (ii) the *auditor-year*---all signatures by one CPA within one fiscal year; and (iii) the *accountant*---the collection of all signatures attributed to a single CPA across the full sample period.
+All three are well-defined as descriptive groupings without additional assumptions; the distinction that matters for *regime interpretation*---i.e., reading a unit's summary as "this CPA's signing mechanism for that unit"---is that the auditor-year is the smallest CPA-level aggregation that is coherent under the stipulations below without additional across-year homogeneity, whereas the accountant unit is a deliberate cross-year pooling that may blend distinct signing-mechanism regimes when a CPA's practice changes over the sample period.
+We use all three units in the paper and specify the role of each at the point of use.

-For per-signature classification we compute, for each signature, the maximum pairwise cosine similarity and the minimum dHash Hamming distance against every other signature attributed to the same CPA.
+For per-signature classification we compute, for each signature, the maximum pairwise cosine similarity and the minimum dHash Hamming distance against every other signature attributed to the same CPA (over the full same-CPA set, not restricted to the same fiscal year).
 The max/min (rather than mean) formulation reflects the identification logic for non-hand-signing: if even one other signature of the same CPA is a pixel-level reproduction, that pair will dominate the extremes and reveal the non-hand-signed mechanism.
 Mean statistics would dilute this signal.

-We also adopt an explicit *within-auditor-year no-mixing* identification assumption.
-Specifically, within any single fiscal year we treat a given CPA's signing mechanism as uniform: a CPA who reproduces one signature image in that year is assumed to do so for every report, and a CPA who hand-signs in that year is assumed to hand-sign every report in that year.
-Domain-knowledge from industry practice at Firm A is consistent with this assumption for that firm during the sample period.
-Under the assumption, per-auditor-year summary statistics are well defined and robust to outliers: if even one pair of same-CPA signatures in the year is near-identical, the max/min captures it.
-The intra-report consistency analysis in Section IV-H.3 is a related but distinct check: it tests whether the *two co-signing CPAs on the same report* receive the same signature-level label (firm-level signing-practice homogeneity) rather than testing whether a single CPA mixes mechanisms within a fiscal year.
-A direct empirical check of the within-auditor-year assumption at the same-CPA level would require labeling multiple reports of the same CPA in the same year and is left to future work; in this paper we maintain the assumption as an identification convention motivated by industry practice and bounded by the worst-case aggregation rule of Section III-L.
+We distinguish two stipulations by the role each plays, in order to avoid overstating the paper's reliance on them.
+
+**(A1) Pair-detectability** is a statistical assumption scoped to the same-CPA pool (pooled across fiscal years, matching the max/min computation above): if a CPA uses image replication anywhere in the corpus, at least one pair of same-CPA signatures is near-identical after reproduction noise, so that max cosine / min dHash detects the replication.
+This is plausible for high-volume stamping or firm-level electronic-signing workflows---where a stored image is typically reused many times under similar scan and compression conditions---but is not guaranteed in sparse CPA-corpora with only one observed replicated report, when multiple template variants are in use, or when scan-stage noise pushes a replicated pair outside the detection regime.
+A1 is what the per-signature detector requires to be sensitive to replication; it is a cross-year pair-existence property, not a within-year uniformity claim.
+
+**(A2) Within-year label uniformity** is an interpretive convention used when a signature-level label is *read as* "this CPA's signing mechanism for that fiscal year": within any single fiscal year we treat the CPA's mechanism as uniform, i.e., a CPA who replicates any signature image in that year is treated as doing so for every report in that year, and a hand-signer is treated as hand-signing every report in that year.
+A2 is consistent with industry practice at Firm A during the sample period, but may weaken at other Big-4 firms during the 2019--2021 digitalization-transition years, in which a CPA's mechanism could in principle shift mid-year as firm-level electronic-signing systems were rolled out.
+We therefore read A2 as a domain-motivated default rather than a universally validated empirical claim.
+The arithmetic statistics reported in this paper do not require A2 for their definition or computation: the per-signature classifier (Section III-L) operates at signature level, the accountant-level mixture (Section III-J) uses mean statistics over the full same-CPA pool, and the partner-level ranking (Section IV-H.2) uses a per-auditor-year mean---none of which require within-year uniformity to be well-defined.
+A2 does, however, underwrite certain *interpretive* readings---most notably, the framing in Section IV-H.1 of Firm A's yearly left-tail share as a partner-level "minority of hand-signers" rather than a bare signature-level rate---and the downstream use of per-signature or per-auditor-year labels as regime labels for auditor-behavior research.
+
+We explicitly *do not* assume across-year homogeneity.
+A CPA's mechanism may change across fiscal years---the 2019--2021 Big-4 digitalization trends documented in Section IV-H are consistent with such changes---and accountant-level summary statistics (Section III-J) therefore represent a cross-year pooled summary that may blend multiple regimes for the same CPA.
+We treat this as a design choice: the accountant-level aggregates characterize each CPA's overall distribution over the full sample period, not a single time-invariant regime.
+
+The intra-report consistency analysis in Section IV-H.3 is a related but distinct check: it tests whether the *two co-signing CPAs on the same report* receive the same signature-level label (firm-level signing-practice homogeneity) rather than testing A2 at the same-CPA level.
+A direct empirical check of A2 would require labeling multiple reports of the same CPA in the same year and is left to future work; as noted above, no reported statistic relies on A2, and A2's interpretive scope is further bounded by the worst-case aggregation rule of Section III-L.

 For accountant-level analysis we additionally aggregate these per-signature statistics to the CPA level by computing the mean best-match cosine and the mean *independent minimum dHash* across all signatures of that CPA.
 The *independent minimum dHash* of a signature is defined as the minimum Hamming distance to *any* other signature of the same CPA (over the full same-CPA set).
@@ -219,7 +232,8 @@ The accountant-level estimates from the two threshold estimators (together with
 ## J. Accountant-Level Mixture Model

 In addition to the signature-level analysis, we fit a Gaussian mixture model in two dimensions to the per-accountant aggregates (mean best-match cosine, mean independent minimum dHash).
-The motivation is the expectation---consistent with industry-practice knowledge at Firm A---that an individual CPA's signing *practice* is clustered (typically consistent adoption of non-hand-signing or consistent hand-signing within a given year) even when the output pixel-level *quality* lies on a continuous spectrum.
+The motivation is that an individual CPA's cross-year-pooled signing *tendency*---their full-sample distribution of best-match statistics---is expected to cluster with other CPAs of similar tendency, even when the output pixel-level *quality* at the signature level lies on a continuous spectrum.
+Cluster membership in the accountant-level fit is accordingly best read as a *pooled observed tendency* over the CPA's full sample-period signature set rather than as a time-invariant signing regime; where a CPA switched mechanisms during the sample period, their accountant-level coordinates reflect a weighted mix of the corresponding regimes.

 We fit mixtures with $K \in \{1, 2, 3, 4, 5\}$ components under full covariance, selecting $K^*$ by BIC with 15 random initializations per $K$.
 For the selected $K^*$ we report component means, weights, per-component firm composition, and the marginal-density crossing points from the two-component fit, which serve as the natural per-accountant thresholds.
@@ -229,14 +243,14 @@ For the selected $K^*$ we report component means, weights, per-component firm co
 Rather than construct a stratified manual-annotation validation set, we validate the classifier using four naturally occurring reference populations that require no human labeling:

 1. **Pixel-identical anchor (gold positive, conservative subset):** signatures whose nearest same-CPA match is byte-identical after crop and normalization.
-Handwriting physics makes byte-identity impossible under independent signing events, so this anchor is absolute ground truth *for the byte-identical subset* of non-hand-signed signatures.
-We emphasize that this anchor is a *subset* of the true positive class---only those non-hand-signed signatures whose nearest match happens to be byte-identical---and perfect recall against this anchor therefore does not establish recall against the full non-hand-signed population (Section V-G discusses this further).
+Handwriting physics makes byte-identity impossible under independent signing events, so a byte-identical same-CPA pair is pair-level proof of image reuse and---for the byte-identical subset---conservative ground truth for non-hand-signed signatures; the narrow exception, in which a genuinely hand-signed exemplar was subsequently reused as the stamping or e-signature template, is discussed as a Limitation in Section V-G.
+We further emphasize that this anchor is a *subset* of the true positive class---only those non-hand-signed signatures whose nearest match happens to be byte-identical---and perfect recall against this anchor therefore does not establish recall against the full non-hand-signed population (Section V-G discusses this further).

 2. **Inter-CPA negative anchor (large gold negative):** $\sim$50,000 pairs of signatures randomly sampled from *different* CPAs.
 Inter-CPA pairs cannot arise from reuse of a single signer's stored signature image, so this population is a reliable negative class for threshold sweeps.
 This anchor is substantially larger than a simple low-similarity-same-CPA negative and yields tight Wilson 95% confidence intervals on FAR at each candidate threshold.

-3. **Firm A anchor (replication-dominated prior positive):** Firm A signatures, treated as a majority-positive reference whose left tail contains a minority of hand-signers, as directly evidenced by the 32/171 middle-band share in the accountant-level mixture (Section III-H).
+3. **Firm A anchor (replication-dominated prior positive):** Firm A signatures, treated as a majority-positive reference with within-firm heterogeneity in the left tail (consistent with a minority of hand-signers), as evidenced by the 32/171 middle-band share in the accountant-level mixture (Section III-H).
 Because Firm A is both used for empirical percentile calibration in Section III-H and as a validation anchor, we make the within-Firm-A sampling variance visible by splitting Firm A CPAs randomly (at the CPA level, not the signature level) into a 70% *calibration* fold and a 30% *heldout* fold.
 Median, 1st percentile, and 95th percentile of signature-level cosine/dHash distributions are derived from the calibration fold only.
 The heldout fold is used exclusively to report post-hoc capture rates with Wilson 95% confidence intervals.
@@ -69,7 +69,7 @@ The $N = 168{,}740$ count used in Table V and in the downstream same-CPA per-sig
 | Per-accountant dHash mean | 686 | 0.0277 | <0.001 | Multimodal |
 -->

-Firm A's per-signature cosine distribution is *unimodal* ($p = 0.17$), reflecting a single dominant generative mechanism (non-hand-signing) with a long left tail attributable to the minority of hand-signing Firm A partners identified in the accountant-level mixture (Section IV-E).
+Firm A's per-signature cosine distribution is *unimodal* ($p = 0.17$), reflecting a single dominant generative mechanism (non-hand-signing) with a long left tail attributable to within-firm heterogeneity---consistent with a minority of hand-signing Firm A partners---as identified in the accountant-level mixture (Section IV-E).
 The all-CPA cosine distribution, which mixes many firms with heterogeneous signing practices, is *multimodal* ($p < 0.001$).
 At the per-accountant aggregate level both cosine and dHash means are strongly multimodal, foreshadowing the mixture structure analyzed in Section IV-E.

@@ -184,7 +184,7 @@ We report three validation analyses corresponding to the anchors of Section III-

 ### 1) Pixel-Identity Positive Anchor with Inter-CPA Negative Anchor

-Of the 182,328 extracted signatures, 310 have a same-CPA nearest match that is byte-identical after crop and normalization (pixel-identical-to-closest = 1); these form the gold-positive anchor.
+Of the 182,328 extracted signatures, 310 have a same-CPA nearest match that is byte-identical after crop and normalization (pixel-identical-to-closest = 1); these form the byte-identity positive anchor---a pair-level proof of image reuse that serves as conservative ground truth for non-hand-signed signatures, subject to the source-template edge case discussed in Section V-G.
 As the gold-negative anchor we sample 50,000 random cross-CPA signature pairs (inter-CPA cosine: mean $= 0.762$, $P_{95} = 0.884$, $P_{99} = 0.913$, max $= 0.988$).
 Because the positive and negative anchor populations are constructed from different sampling units (byte-identical same-CPA pairs vs random inter-CPA pairs), their relative prevalence in the combined anchor set is arbitrary, and precision / $F_1$ / recall therefore have no meaningful population interpretation.
 We accordingly report FAR with Wilson 95% confidence intervals against the large inter-CPA negative anchor in Table X.
@@ -314,6 +314,7 @@ We test this prediction directly.
 For each auditor-year (CPA $\times$ fiscal year) with at least 5 signatures we compute the mean best-match cosine similarity across the year's signatures, yielding 4,629 auditor-years across 2013-2023.
 Firm A accounts for 1,287 of these (27.8% baseline share).
 Table XIV reports per-firm occupancy of the top $K\%$ of the ranked distribution.
+The per-signature best-match cosine underlying each auditor-year mean is taken over the full same-CPA pool (Section III-G) and may match against signatures from other fiscal years, so the auditor-year mean reflects the year's signatures' position within the CPA's full-sample similarity structure rather than purely within-year similarity; a within-year-restricted sensitivity replication is a natural robustness check and is left to future work.

 <!-- TABLE XIV: Top-K Similarity Rank Occupancy by Firm (pooled 2013-2023)
 | Top-K | k in bucket | Firm A | Firm B | Firm C | Firm D | Non-Big-4 | Firm A share |
@@ -342,7 +343,7 @@ Year-by-year (Table XV), the top-10% Firm A share ranges from 88.4% (2020) to 10
 | 2023 | 474 | 47 | 46 | 97.9% | 27.4% |
 -->

-This over-representation is a direct consequence of firm-wide non-hand-signing practice and is not derived from any threshold we subsequently calibrate.
+This over-representation is consistent with firm-wide non-hand-signing practice at Firm A and is not derived from any threshold we subsequently calibrate.
 It therefore constitutes genuine cross-firm evidence for Firm A's benchmark status.

 ### 3) Intra-Report Consistency