Paper A v3.2: partner v4 feedback integration (threshold-independent benchmark validation)

Partner v4 (signature_paper_draft_v4) proposed 3 substantive improvements; partner confirmed the 2013-2019 restriction was an error (sample stays 2013-2023). The remaining suggestions are adopted with our own data. ## New scripts - Script 22 (partner ranking): ranks all Big-4 auditor-years by mean max-cosine. Firm A occupies 95.9% of top-10% (base 27.8%), 3.5x concentration ratio. Stable across 2013-2023 (88-100% per year). - Script 23 (intra-report consistency): for each 2-signer report, classify both signatures and check agreement. Firm A agrees 89.9% vs 62-67% at other Big-4. 87.5% Firm A reports have BOTH signers non-hand-signed; only 4 reports (0.01%) both hand-signed. ## New methodology additions - III-G: explicit within-auditor-year no-mixing identification assumption (supported by Firm A interview evidence). - III-H: 4th Firm A validation line: threshold-independent evidence from partner ranking + intra-report consistency. ## New results section IV-H (threshold-independent validation) - IV-H.1: Firm A year-by-year cosine<0.95 rate. 2013-2019 mean=8.26%, 2020-2023 mean=6.96%, 2023 lowest (3.75%). Stability contradicts partner's hypothesis that 2020+ electronic systems increase heterogeneity -- data shows opposite (electronic systems more consistent than physical stamping). - IV-H.2: partner ranking top-K tables (pooled + year-by-year). - IV-H.3: intra-report consistency per-firm table. ## Renumbering - Section H (was Classification Results) -> I - Section I (was Ablation) -> J - Tables XIII-XVI new (yearly stability, top-K pooled, top-10% per-year, intra-report), XVII = classification (was XII), XVIII = ablation (was XIII). These threshold-independent analyses address the codex review concern about circular validation by providing benchmark evidence that does not depend on any threshold calibrated to Firm A itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:59:49 +08:00
parent 9d19ca5a31
commit 51d15b32a5
5 changed files with 677 additions and 7 deletions
@@ -120,6 +120,12 @@ For per-signature classification we compute, for each signature, the maximum pai
 The max/min (rather than mean) formulation reflects the identification logic for non-hand-signing: if even one other signature of the same CPA is a pixel-level reproduction, that pair will dominate the extremes and reveal the non-hand-signed mechanism.
 Mean statistics would dilute this signal.

+We also adopt an explicit *within-auditor-year no-mixing* identification assumption.
+Specifically, within any single fiscal year we treat a given CPA's signing mechanism as uniform: a CPA who reproduces one signature image in that year is assumed to do so for every report, and a CPA who hand-signs in that year is assumed to hand-sign every report in that year.
+Interview evidence from Firm A partners supports this assumption for their firm during the sample period.
+Under the assumption, per-auditor-year summary statistics are well defined and robust to outliers: if even one pair of same-CPA signatures in the year is near-identical, the max/min captures it.
+The intra-report consistency analysis in Section IV-H.3 provides an empirical check on the within-auditor-year assumption at the report level.
+
 For accountant-level analysis we additionally aggregate these per-signature statistics to the CPA level by computing the mean best-match cosine and the mean *independent minimum dHash* across all signatures of that CPA.
 The *independent minimum dHash* of a signature is defined as the minimum Hamming distance to *any* other signature of the same CPA (over the full same-CPA set), in contrast to the *cosine-conditional dHash* used as a diagnostic elsewhere, which is the dHash to the single signature selected as the cosine-nearest match.
 The independent minimum avoids conditioning on the cosine choice and is therefore the conservative structural-similarity statistic for each signature.
@@ -137,7 +143,12 @@ Crucially, the same interview evidence does *not* exclude the possibility that a
 Second, independent visual inspection of randomly sampled Firm A reports reveals pixel-identical signature images across different audit engagements and fiscal years for the majority of partners.

 Third, our own quantitative analysis is consistent with the above: 92.5% of Firm A's per-signature best-match cosine similarities exceed 0.95, consistent with non-hand-signing as the dominant mechanism, while the remaining 7.5% exhibit lower best-match values consistent with the minority of hand-signers identified in the interviews.
-We emphasize that this 92.5% figure is a within-sample consistency check rather than an independent validation of Firm A's status; the validation role is played by the interview and visual-inspection evidence enumerated above and by the held-out Firm A fold described in Section III-K.
+
+Fourth, we additionally validate the Firm A benchmark through two analyses that do not depend on any threshold we subsequently calibrate:
+  (a) *Partner-level similarity ranking (Section IV-H.2).*  When every Big-4 auditor-year is ranked globally by its per-auditor-year mean best-match cosine, Firm A auditor-years account for 95.9% of the top decile against a baseline share of 27.8% (a 3.5$\times$ concentration ratio), and this over-representation is stable across 2013-2023.
+  (b) *Intra-report consistency (Section IV-H.3).*  Because each Taiwanese statutory audit report is co-signed by two engagement partners, firmwide stamping practice predicts that both signers on a given Firm A report should receive the same signature-level label.  Firm A exhibits 89.9% intra-report agreement against 62-67% at the other Big-4 firms, consistent with firm-wide rather than partner-specific practice.
+
+We emphasize that the 92.5% figure is a within-sample consistency check rather than an independent validation of Firm A's status; the validation role is played by the interview and visual-inspection evidence, by the two threshold-independent analyses above, and by the held-out Firm A fold described in Section III-K.

 We emphasize that Firm A's replication-dominated status was *not* derived from the thresholds we calibrate against it.
 Its identification rests on domain knowledge and visual evidence that is independent of the statistical pipeline.