diff --git a/paper/paper_a_abstract_v3.md b/paper/paper_a_abstract_v3.md index b4fd5af..baaa449 100644 --- a/paper/paper_a_abstract_v3.md +++ b/paper/paper_a_abstract_v3.md @@ -2,6 +2,6 @@ -Regulations require Certified Public Accountants (CPAs) to attest to each audit report by affixing a signature. Digitization makes reusing a stored signature image across reports trivial---through administrative stamping or firm-level electronic signing---potentially undermining individualized attestation. Unlike forgery, *non-hand-signed* reproduction reuses the legitimate signer's own stored image, making it visually invisible to report users and infeasible to audit at scale manually. We present a pipeline integrating a Vision-Language Model for signature-page identification, YOLOv11 for signature detection, and ResNet-50 for feature extraction, followed by dual-descriptor verification combining cosine similarity and difference hashing. For threshold determination we apply two estimators---kernel-density antimode with a Hartigan unimodality test and an EM-fitted Beta mixture with a logit-Gaussian robustness check---plus a Burgstahler-Dichev/McCrary density-smoothness diagnostic, at the signature and accountant levels. Applied to 90,282 audit reports filed in Taiwan over 2013-2023 (182,328 signatures from 758 CPAs), the methods reveal a level asymmetry: signature-level similarity is a continuous quality spectrum that no two-component mixture separates, while accountant-level aggregates cluster into three smoothly-mixed groups with the antimode and two mixture estimators converging within $\sim$0.006 at cosine $\approx 0.975$. A major Big-4 firm is used as a *replication-dominated* (not pure) calibration anchor, with visual inspection and accountant-level mixture evidence supporting majority non-hand-signing alongside residual within-firm heterogeneity; capture rates on both 70/30 calibration and held-out folds are reported with Wilson 95% intervals to make fold-level variance visible. Validation against 310 byte-identical positives and a $\sim$50,000-pair inter-CPA negative anchor yields FAR $\leq$ 0.001 at all accountant-level thresholds. +Regulations require Certified Public Accountants (CPAs) to attest to each audit report by affixing a signature. Digitization makes reusing a stored signature image across reports trivial---through administrative stamping or firm-level electronic signing---potentially undermining individualized attestation. Unlike forgery, *non-hand-signed* reproduction reuses the legitimate signer's own stored image, making it visually invisible to report users and infeasible to audit at scale manually. We present a pipeline integrating a Vision-Language Model for signature-page identification, YOLOv11 for signature detection, and ResNet-50 for feature extraction, followed by dual-descriptor verification combining cosine similarity and difference hashing. For threshold determination we apply two estimators---kernel-density antimode with a Hartigan unimodality test and an EM-fitted Beta mixture with a logit-Gaussian robustness check---plus a Burgstahler-Dichev/McCrary density-smoothness diagnostic, at the signature and accountant levels. Applied to 90,282 audit reports filed in Taiwan over 2013-2023 (182,328 signatures from 758 CPAs), the methods reveal a level asymmetry: signature-level similarity is a continuous quality spectrum that no two-component mixture separates, while accountant-level aggregates cluster into three smoothly-mixed groups with the antimode and two mixture estimators converging within $\sim$0.006 at cosine $\approx 0.975$. A major Big-4 firm is used as a *replication-dominated* (not pure) calibration anchor, with byte-level pixel-identity evidence (145 signatures across 50 partners) and accountant-level mixture evidence supporting majority non-hand-signing alongside residual within-firm heterogeneity; capture rates on both 70/30 calibration and held-out folds are reported with Wilson 95% intervals to make fold-level variance visible. Validation against 310 byte-identical positives and a $\sim$50,000-pair inter-CPA negative anchor yields FAR $\leq$ 0.001 at all accountant-level thresholds. diff --git a/paper/paper_a_conclusion_v3.md b/paper/paper_a_conclusion_v3.md index 2becaaa..9da4895 100644 --- a/paper/paper_a_conclusion_v3.md +++ b/paper/paper_a_conclusion_v3.md @@ -18,7 +18,7 @@ The substantive reading is therefore narrower than "discrete behavior": *pixel-l Fourth, we introduced a *replication-dominated* calibration methodology---explicitly distinguishing replication-dominated from replication-pure calibration anchors and validating classification against a byte-level pixel-identity anchor (310 byte-identical signatures) paired with a $\sim$50,000-pair inter-CPA negative anchor. To document the within-firm sampling variance of using the calibration firm as its own validation reference, we split the firm's CPAs 70/30 at the CPA level and report capture rates on both folds with Wilson 95% confidence intervals; extreme rules agree across folds while rules in the operational 85-95% capture band differ by 1-5 percentage points, reflecting within-firm heterogeneity in replication intensity rather than generalization failure. -This framing is internally consistent with all available evidence: the visual-inspection observation of pixel-identical signatures across unrelated audit engagements for many of the sampled calibration-firm partners; the 92.5% / 7.5% split in signature-level cosine thresholds; and, among the 171 calibration-firm CPAs with enough signatures to enter the accountant-level GMM (of 180 registered CPAs; 178 after excluding two with disambiguation ties, Section IV-G.2), the 139 / 32 split between the high-replication and middle-band clusters. +This framing is internally consistent with all available evidence: the byte-level pair analysis finding of 145 pixel-identical calibration-firm signatures across 50 distinct partners of 180 registered (Section IV-G.1); the 92.5% / 7.5% split in signature-level cosine thresholds; and, among the 171 calibration-firm CPAs with enough signatures to enter the accountant-level GMM (of 180 registered CPAs; 178 after excluding two with disambiguation ties, Section IV-G.2), the 139 / 32 split between the high-replication and middle-band clusters. An ablation study comparing ResNet-50, VGG-16 and EfficientNet-B0 confirmed that ResNet-50 offers the best balance of discriminative power, classification stability, and computational efficiency for this task. diff --git a/paper/paper_a_discussion_v3.md b/paper/paper_a_discussion_v3.md index 4699313..41f2c77 100644 --- a/paper/paper_a_discussion_v3.md +++ b/paper/paper_a_discussion_v3.md @@ -38,7 +38,7 @@ A recurring theme in prior work that treats Firm A or an analogous reference gro Our evidence across multiple analyses rules out that assumption for Firm A while affirming its utility as a calibration reference. Three convergent strands of evidence support the replication-dominated framing. -First, the visual-inspection evidence: randomly sampled Firm A reports exhibit pixel-identical signature images across different audit engagements and fiscal years for many of the sampled partners---a physical impossibility under independent hand-signing events. +First, the byte-level pair evidence: 145 Firm A signatures (from 50 distinct partners of 180 registered) have a byte-identical same-CPA match in a different audit report, with 35 of these matches spanning different fiscal years. Independent hand-signing cannot produce byte-identical images across distinct reports, so these pairs directly establish image reuse within Firm A. Second, the signature-level statistical evidence: Firm A's per-signature cosine distribution is unimodal long-tail rather than a tight single peak; 92.5% of Firm A signatures exceed cosine 0.95, with the remaining 7.5% forming the left tail. Third, the accountant-level evidence: of the 171 Firm A CPAs with enough signatures ($\geq 10$) to enter the accountant-level GMM, 32 (19%) fall into the middle-band C2 cluster rather than the high-replication C1 cluster---consistent with within-firm heterogeneity in signing output (potentially spanning hand-signing partners, multi-template replication workflows, CPAs undergoing mid-sample mechanism transitions, and CPAs whose pooled coordinates reflect mixed-quality replication; we do not disaggregate these mechanisms---see Section III-G for the scope of claims) rather than a pure replication population. Of the 178 valid Firm A CPAs (the 180 registered CPAs minus two excluded for disambiguation ties in the registry; Section IV-G.2), seven are outside the GMM for having fewer than 10 signatures, so we cannot place them in a cluster from the cross-sectional analysis alone. diff --git a/paper/paper_a_introduction_v3.md b/paper/paper_a_introduction_v3.md index 63abff2..3771637 100644 --- a/paper/paper_a_introduction_v3.md +++ b/paper/paper_a_introduction_v3.md @@ -52,7 +52,7 @@ A second distinctive feature is our framing of the calibration reference. One major Big-4 accounting firm in Taiwan (hereafter "Firm A") is widely recognized within the audit profession as making substantial use of non-hand-signing for the majority of its certifying partners, while not ruling out that a minority may continue to hand-sign some reports. We therefore treat Firm A as a *replication-dominated* calibration reference rather than a pure positive class. This framing is important because the statistical signature of a replication-dominated population is visible in our data: Firm A's per-signature cosine distribution is unimodal with a long left tail, 92.5% of Firm A signatures exceed cosine 0.95 but 7.5% fall below, and 32 of the 171 Firm A CPAs with enough signatures to enter our accountant-level analysis (of 180 Firm A CPAs in the registry; 178 after excluding two with disambiguation ties, see Section IV-G.2) cluster into an accountant-level "middle band" rather than the high-replication mode. -Adopting the replication-dominated framing---rather than a near-universal framing that would have to absorb these residuals as noise---ensures internal coherence among the visual-inspection evidence, the signature-level statistics, and the accountant-level mixture. +Adopting the replication-dominated framing---rather than a near-universal framing that would have to absorb these residuals as noise---ensures internal coherence among the byte-level pixel-identity evidence, the signature-level statistics, and the accountant-level mixture. A third distinctive feature is our unit-of-analysis treatment. Our threshold-framework analysis reveals an informative asymmetry between the signature level and the accountant level: per-signature similarity forms a continuous quality spectrum for which no two-mechanism mixture provides a good fit, whereas per-accountant aggregates are clustered into three recognizable groups (BIC-best $K = 3$). diff --git a/paper/paper_a_methodology_v3.md b/paper/paper_a_methodology_v3.md index 7c0ac02..fcc34ad 100644 --- a/paper/paper_a_methodology_v3.md +++ b/paper/paper_a_methodology_v3.md @@ -149,7 +149,8 @@ We use this only as background context for why Firm A is a plausible calibration We establish Firm A's replication-dominated status through three primary independent quantitative analyses plus a fourth strand comprising three complementary checks, each of which can be reproduced from the public audit-report corpus alone: -First, *independent visual inspection* of randomly sampled Firm A reports reveals pixel-identical signature images across different audit engagements and fiscal years for many of the sampled partners---a physical impossibility under independent hand-signing events. +First, *automated byte-level pair analysis* (Section IV-G.1) identifies 145 Firm A signatures that are byte-identical to at least one other same-CPA signature from a different audit report, distributed across 50 distinct Firm A partners (of 180 registered); 35 of these byte-identical matches span different fiscal years. +Byte-identity implies pixel-identity by construction, and independent hand-signing cannot produce pixel-identical images across distinct reports---these pairs therefore establish image reuse as a concrete, threshold-free phenomenon within Firm A. Second, *whole-sample signature-level rates*: 92.5% of Firm A's per-signature best-match cosine similarities exceed 0.95, consistent with non-hand-signing as the dominant mechanism, while the remaining 7.5% form a long left tail reflecting within-firm heterogeneity in signing output (we do not disaggregate partner-level mechanism here; see Section III-G for the scope of claims). @@ -160,7 +161,7 @@ Fourth, we additionally validate the Firm A benchmark through three complementar (b) *Partner-level similarity ranking (Section IV-H.2).* When every auditor-year is ranked globally by its per-auditor-year mean best-match cosine (across all firms: Big-4 and Non-Big-4), Firm A auditor-years account for 95.9% of the top decile against a baseline share of 27.8% (a 3.5$\times$ concentration ratio), and this over-representation is stable across 2013-2023. This analysis uses only the ordinal ranking and is independent of any absolute cutoff. (c) *Intra-report consistency (Section IV-H.3).* Because each Taiwanese statutory audit report is co-signed by two engagement partners, firm-wide stamping practice predicts that both signers on a given Firm A report should receive the same signature-level label under the classifier. Firm A exhibits 89.9% intra-report agreement against 62-67% at the other Big-4 firms. This test uses the operational classifier and is therefore a *consistency* check on the classifier's firm-level output rather than a threshold-free test; the cross-firm gap (not the absolute rate) is the substantive finding. -We emphasize that the 92.5% figure is a within-sample consistency check rather than an independent validation of Firm A's status; the validation role is played by the visual inspection, the accountant-level mixture, the three complementary analyses above, and the held-out Firm A fold (which confirms the qualitative replication-dominated framing; fold-level rate differences are disclosed in Section IV-G.2) described in Section III-K. +We emphasize that the 92.5% figure is a within-sample consistency check rather than an independent validation of Firm A's status; the validation role is played by the byte-level pixel-identity evidence, the accountant-level mixture, the three complementary analyses above, and the held-out Firm A fold (which confirms the qualitative replication-dominated framing; fold-level rate differences are disclosed in Section IV-G.2) described in Section III-K. We emphasize that Firm A's replication-dominated status was *not* derived from the thresholds we calibrate against it. Its identification rests on visual evidence and accountant-level clustering that is independent of the statistical pipeline. @@ -256,7 +257,6 @@ From these anchors we report FAR with Wilson 95% confidence intervals against th We do not report an Equal Error Rate or FRR column against the byte-identical positive anchor, because byte-identical pairs have cosine $\approx 1$ by construction and any FRR computed against that subset is trivially $0$ at every threshold below $1$; the conservative-subset role of the byte-identical anchor is instead discussed qualitatively in Section V-F. Precision and $F_1$ are not meaningful in this anchor-based evaluation because the positive and negative anchors are constructed from different sampling units (intra-CPA byte-identical pairs vs random inter-CPA pairs), so their relative prevalence in the combined set is an arbitrary construction rather than a population parameter; we therefore omit precision and $F_1$ from Table X. The 70/30 held-out Firm A fold of Section IV-G.2 additionally reports capture rates with Wilson 95% confidence intervals computed within the held-out fold, which is a valid population for rate inference. -We additionally draw a small stratified sample (30 signatures across high-confidence replication, borderline, style-only, pixel-identical, and likely-genuine strata) for manual visual sanity inspection; this sample is used only for spot-check and does not contribute to reported metrics. ## L. Per-Document Classification diff --git a/paper/paper_a_results_v3.md b/paper/paper_a_results_v3.md index e555a2c..908229c 100644 --- a/paper/paper_a_results_v3.md +++ b/paper/paper_a_results_v3.md @@ -268,10 +268,6 @@ The High-confidence non-hand-signed share grows from 45.62% to 46.98%. We interpret this sensitivity pattern as indicating that the classifier's aggregate and high-confidence output is robust to the choice of operational cut within the accountant-level convergence band, and that the movement is concentrated at the Uncertain/Moderate-confidence boundary. The paper therefore retains cos $> 0.95$ as the primary operational cut for transparency and reports the 0.945 results as a sensitivity check rather than as a deployed alternative; a future deployment requiring tighter accountant-level alignment could substitute cos $> 0.945$ without altering the substantive firm-level conclusions. -### 4) Sanity Sample - -A 30-signature stratified visual sanity sample (six signatures each from pixel-identical, high-cos/low-dh, borderline, style-only, and likely-genuine strata) yielded full human--classifier agreement (30/30); this sample contributed only to spot-check and is not used to compute reported metrics. - ## H. Additional Firm A Benchmark Validation The capture rates of Section IV-F are a within-sample consistency check: they evaluate how well a threshold captures Firm A, but the thresholds themselves are anchored to Firm A's percentiles.