From d3b63fc0b7e31586343885c4417b21bdf07e03e0 Mon Sep 17 00:00:00 2001 From: gbanyan Date: Fri, 24 Apr 2026 22:06:22 +0800 Subject: [PATCH] Paper A v3.14: remove A2 assumption + soften all partner-level claims The within-auditor-year uniformity assumption (A2) introduced in v3.11 Section III-G was empirically tested via a new within-year uniformity check (signature_analysis/27_within_year_uniformity.py; output in reports/within_year_uniformity/). The check found that within-year pairwise cosine distributions even at the calibration firm show substantial heterogeneity inconsistent with strict single-mechanism uniformity (Firm A 2023 CPAs typically have median pairwise cosine around 0.85 with 20-70% of pairs below the all-pairs KDE crossover 0.837). A2 as stated ("a CPA who replicates any signature image in that year is treated as doing so for every report") is therefore falsified empirically. Three explanations are compatible with the data and cannot be disambiguated without manual inspection: (i) true within-year mechanism mixing, (ii) multi-template replication workflows at the same firm within a year, (iii) feature-extraction noise on repeatedly scanned stamped images. Since A2 is falsified and its implications cannot be restored under any of the three explanations, we remove A2 entirely rather than downgrading it to an "approximation" or "interpretive convention." Changes applied: 1. Methodology Section III-G: A2 block deleted. Section now has only A1 (pair-detectability, cross-year pair-existence). Replaced A2 with an explicit statement that we make no within-year or across-year uniformity assumption, that per-signature labels are signature-level quantities throughout, and that we abstain from partner-level frequency inferences. Three candidate explanations for within-year signature heterogeneity are listed (single-template replication, multi-template replication in parallel, within-year mixing, or combinations) without attempting disaggregation. 2. 
Methodology III-H strand 2 (L154) softened: "7.5% form a long left tail consistent with a minority of hand-signers" rewritten as reflecting "within-firm heterogeneity in signing output (we do not disaggregate partner-level mechanism here; see Section III-G)." 3. Methodology III-H visual-inspection strand (L152) and the corresponding Discussion V-C first strand (L41) and Conclusion L21 softened: "for the majority of partners" changed to "for many of the sampled partners" (Codex round-14 MAJOR: "majority of partners" is itself a partner-level frequency claim under the new scope-of-claims regime). 4. Methodology III-K.3 Firm A anchor (L247): dropped "(consistent with a minority of hand-signers)" parenthetical. 5. Results IV-D cosine distribution narrative (L72): softened to "within-firm heterogeneity in signing outputs (see Section IV-E and Section III-G for the scope of partner-level claims)." 6. Results IV-E cluster split framing (L128): "minority-hand-signers framing of Section III-H" renamed to "within-firm heterogeneity framing of Section III-H" (matches the new III-H text). 7. Results IV-H.1 partner-level reading (L286): removed entirely. The v3.13 text "Under the within-year label-uniformity convention A2, this left-tail share is read as a partner-level minority of hand-signing CPAs" is replaced by a signature-level statement that explicitly lists hand-signing partners, multi-template replication, or a combination as possibilities without attempting attribution. 8. Results IV-H.1 stability argument (L308): softened from "persistent minority of hand-signing Firm A partners" to "persistent within-firm heterogeneity component," preserving the substantive argument that stability across production technologies is inconsistent with a noise-only explanation. 9. 
Results IV-I Firm A Capture Profile (L407): rewrote the "Firm A's minority hand-signers have not been captured" phrasing as a signature-level framing about the 7.5% left tail not projecting into the lowest-cosine document-level category under the dual-descriptor rules. 10. Abstract (L5): softened "alongside within-firm heterogeneity consistent with a minority of hand-signers" to "alongside residual within-firm heterogeneity." Abstract at 244/250 words. 11. Discussion V-C third strand (L43): added "multi-template replication workflows" to the list of possibilities and added a local "we do not disaggregate these mechanisms; see Section III-G for the scope of claims" disclaimer (Codex round-14 MINOR 5). 12. Discussion Limitations: added an Eighth limitation explicitly stating that partner-level frequency inferences are not made and why (no within-year uniformity assumption is adopted). 13. Methodology L124 opening: "We make one stipulation about within-auditor-year structure" fixed to "same-CPA pair detectability," since A1 is a cross-year pair-existence property, not a within-year claim (Codex round-14 MINOR 3). 14. Two broken cross-references fixed (Codex round-14 MINOR 6): methodology L86 Section V-D -> V-G (Limitations is G, not D which is Style-Replication Gap); methodology L167 Section III-I -> Section IV-D (the empirical cosine distribution is in IV-D, not III-I). Script 27 and its output (reports/within_year_uniformity/*) remain in the repository as internal due-diligence evidence but are not cited from the paper. The paper's substantive claims at signature-level and accountant (cross-year pooled) level are unchanged; only the partner-level interpretive overlay is removed. All tables (IV-XVIII), Appendix A (BD/McCrary sensitivity), and all reported numbers are unchanged. 
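For reviewers who have not run script 27: the within-year uniformity check it implements amounts to the following sketch (function name and data layout are illustrative, not the script's actual interface; it assumes per-signature L2-normalized feature vectors grouped by CPA and fiscal year, so cosine similarity is the dot product):

```python
import numpy as np

def within_year_uniformity_check(features_by_cpa_year, crossover=0.837):
    """For each (CPA, fiscal year) group of L2-normalized feature vectors,
    summarize the pairwise-cosine distribution: the median pairwise cosine
    and the share of pairs below the all-pairs KDE crossover threshold."""
    results = {}
    for key, feats in features_by_cpa_year.items():
        X = np.asarray(feats, dtype=float)
        if len(X) < 2:
            continue  # need at least one same-CPA pair within the year
        # L2-normalized rows: cosine similarity equals the dot product
        sims = X @ X.T
        iu = np.triu_indices(len(X), k=1)  # upper triangle = unique pairs
        pair_cos = sims[iu]
        results[key] = {
            "n_pairs": int(pair_cos.size),
            "median_cos": float(np.median(pair_cos)),
            "share_below_crossover": float(np.mean(pair_cos < crossover)),
        }
    return results
```

Under strict single-mechanism uniformity, a replicating CPA-year should show median pairwise cosine near 1 and a near-zero share below the crossover; the observed Firm A 2023 pattern (median around 0.85, 20-70% of pairs below 0.837) is what falsifies A2.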
Codex round-14 (gpt-5.5 xhigh) verification: Major Revision caused by one BLOCKER (stale DOCX artifact, not part of this commit) plus one MAJOR ("majority of partners" partner-frequency claim) plus four MINOR findings. All five markdown findings addressed in this commit. DOCX regeneration deferred to pre-submission packaging. Co-Authored-By: Claude Opus 4.7 (1M context) --- paper/paper_a_abstract_v3.md | 2 +- paper/paper_a_conclusion_v3.md | 2 +- paper/paper_a_discussion_v3.md | 8 ++++++-- paper/paper_a_methodology_v3.md | 28 +++++++++++----------------- paper/paper_a_results_v3.md | 11 ++++++----- 5 files changed, 25 insertions(+), 26 deletions(-) diff --git a/paper/paper_a_abstract_v3.md b/paper/paper_a_abstract_v3.md index 8aae155..b4fd5af 100644 --- a/paper/paper_a_abstract_v3.md +++ b/paper/paper_a_abstract_v3.md @@ -2,6 +2,6 @@ -Regulations require Certified Public Accountants (CPAs) to attest to each audit report by affixing a signature. Digitization makes reusing a stored signature image across reports trivial---through administrative stamping or firm-level electronic signing---potentially undermining individualized attestation. Unlike forgery, *non-hand-signed* reproduction reuses the legitimate signer's own stored image, making it visually invisible to report users and infeasible to audit at scale manually. We present a pipeline integrating a Vision-Language Model for signature-page identification, YOLOv11 for signature detection, and ResNet-50 for feature extraction, followed by dual-descriptor verification combining cosine similarity and difference hashing. For threshold determination we apply two estimators---kernel-density antimode with a Hartigan unimodality test and an EM-fitted Beta mixture with a logit-Gaussian robustness check---plus a Burgstahler-Dichev/McCrary density-smoothness diagnostic, at the signature and accountant levels. 
Applied to 90,282 audit reports filed in Taiwan over 2013-2023 (182,328 signatures from 758 CPAs), the methods reveal a level asymmetry: signature-level similarity is a continuous quality spectrum that no two-component mixture separates, while accountant-level aggregates cluster into three smoothly-mixed groups with the antimode and two mixture estimators converging within $\sim$0.006 at cosine $\approx 0.975$. A major Big-4 firm is used as a *replication-dominated* (not pure) calibration anchor, with visual inspection and accountant-level mixture evidence supporting majority non-hand-signing alongside within-firm heterogeneity consistent with a minority of hand-signers; capture rates on both 70/30 calibration and held-out folds are reported with Wilson 95% intervals to make fold-level variance visible. Validation against 310 byte-identical positives and a $\sim$50,000-pair inter-CPA negative anchor yields FAR $\leq$ 0.001 at all accountant-level thresholds. +Regulations require Certified Public Accountants (CPAs) to attest to each audit report by affixing a signature. Digitization makes reusing a stored signature image across reports trivial---through administrative stamping or firm-level electronic signing---potentially undermining individualized attestation. Unlike forgery, *non-hand-signed* reproduction reuses the legitimate signer's own stored image, making it visually invisible to report users and infeasible to audit at scale manually. We present a pipeline integrating a Vision-Language Model for signature-page identification, YOLOv11 for signature detection, and ResNet-50 for feature extraction, followed by dual-descriptor verification combining cosine similarity and difference hashing. 
For threshold determination we apply two estimators---kernel-density antimode with a Hartigan unimodality test and an EM-fitted Beta mixture with a logit-Gaussian robustness check---plus a Burgstahler-Dichev/McCrary density-smoothness diagnostic, at the signature and accountant levels. Applied to 90,282 audit reports filed in Taiwan over 2013-2023 (182,328 signatures from 758 CPAs), the methods reveal a level asymmetry: signature-level similarity is a continuous quality spectrum that no two-component mixture separates, while accountant-level aggregates cluster into three smoothly-mixed groups with the antimode and two mixture estimators converging within $\sim$0.006 at cosine $\approx 0.975$. A major Big-4 firm is used as a *replication-dominated* (not pure) calibration anchor, with visual inspection and accountant-level mixture evidence supporting majority non-hand-signing alongside residual within-firm heterogeneity; capture rates on both 70/30 calibration and held-out folds are reported with Wilson 95% intervals to make fold-level variance visible. Validation against 310 byte-identical positives and a $\sim$50,000-pair inter-CPA negative anchor yields FAR $\leq$ 0.001 at all accountant-level thresholds. diff --git a/paper/paper_a_conclusion_v3.md b/paper/paper_a_conclusion_v3.md index 13334fb..2becaaa 100644 --- a/paper/paper_a_conclusion_v3.md +++ b/paper/paper_a_conclusion_v3.md @@ -18,7 +18,7 @@ The substantive reading is therefore narrower than "discrete behavior": *pixel-l Fourth, we introduced a *replication-dominated* calibration methodology---explicitly distinguishing replication-dominated from replication-pure calibration anchors and validating classification against a byte-level pixel-identity anchor (310 byte-identical signatures) paired with a $\sim$50,000-pair inter-CPA negative anchor. 
To document the within-firm sampling variance of using the calibration firm as its own validation reference, we split the firm's CPAs 70/30 at the CPA level and report capture rates on both folds with Wilson 95% confidence intervals; extreme rules agree across folds while rules in the operational 85-95% capture band differ by 1-5 percentage points, reflecting within-firm heterogeneity in replication intensity rather than generalization failure. -This framing is internally consistent with all available evidence: the visual-inspection observation of pixel-identical signatures across unrelated audit engagements for the majority of calibration-firm partners; the 92.5% / 7.5% split in signature-level cosine thresholds; and, among the 171 calibration-firm CPAs with enough signatures to enter the accountant-level GMM (of 180 registered CPAs; 178 after excluding two with disambiguation ties, Section IV-G.2), the 139 / 32 split between the high-replication and middle-band clusters. +This framing is internally consistent with all available evidence: the visual-inspection observation of pixel-identical signatures across unrelated audit engagements for many of the sampled calibration-firm partners; the 92.5% / 7.5% split in signature-level cosine thresholds; and, among the 171 calibration-firm CPAs with enough signatures to enter the accountant-level GMM (of 180 registered CPAs; 178 after excluding two with disambiguation ties, Section IV-G.2), the 139 / 32 split between the high-replication and middle-band clusters. An ablation study comparing ResNet-50, VGG-16 and EfficientNet-B0 confirmed that ResNet-50 offers the best balance of discriminative power, classification stability, and computational efficiency for this task. 
diff --git a/paper/paper_a_discussion_v3.md b/paper/paper_a_discussion_v3.md index 555f3ab..4699313 100644 --- a/paper/paper_a_discussion_v3.md +++ b/paper/paper_a_discussion_v3.md @@ -38,9 +38,9 @@ A recurring theme in prior work that treats Firm A or an analogous reference gro Our evidence across multiple analyses rules out that assumption for Firm A while affirming its utility as a calibration reference. Three convergent strands of evidence support the replication-dominated framing. -First, the visual-inspection evidence: randomly sampled Firm A reports exhibit pixel-identical signature images across different audit engagements and fiscal years for the majority of partners---a physical impossibility under independent hand-signing events. +First, the visual-inspection evidence: randomly sampled Firm A reports exhibit pixel-identical signature images across different audit engagements and fiscal years for many of the sampled partners---a physical impossibility under independent hand-signing events. Second, the signature-level statistical evidence: Firm A's per-signature cosine distribution is unimodal long-tail rather than a tight single peak; 92.5% of Firm A signatures exceed cosine 0.95, with the remaining 7.5% forming the left tail. -Third, the accountant-level evidence: of the 171 Firm A CPAs with enough signatures ($\geq 10$) to enter the accountant-level GMM, 32 (19%) fall into the middle-band C2 cluster rather than the high-replication C1 cluster---consistent with within-firm heterogeneity in signing practice (spanning a minority of hand-signers, CPAs undergoing mid-sample mechanism transitions, and CPAs whose pooled coordinates reflect mixed-quality replication) rather than a pure replication population. 
+Third, the accountant-level evidence: of the 171 Firm A CPAs with enough signatures ($\geq 10$) to enter the accountant-level GMM, 32 (19%) fall into the middle-band C2 cluster rather than the high-replication C1 cluster---consistent with within-firm heterogeneity in signing output (potentially spanning hand-signing partners, multi-template replication workflows, CPAs undergoing mid-sample mechanism transitions, and CPAs whose pooled coordinates reflect mixed-quality replication; we do not disaggregate these mechanisms---see Section III-G for the scope of claims) rather than a pure replication population. Of the 178 valid Firm A CPAs (the 180 registered CPAs minus two excluded for disambiguation ties in the registry; Section IV-G.2), seven are outside the GMM for having fewer than 10 signatures, so we cannot place them in a cluster from the cross-sectional analysis alone. The held-out Firm A 70/30 validation (Section IV-G.2) gives capture rates on a non-calibration Firm A subset that sit in the same replication-dominated regime as the calibration fold across the full range of operating rules (extreme rules are statistically indistinguishable; operational rules in the 85–95% band differ between folds by 1–5 percentage points, reflecting within-Firm-A heterogeneity in replication intensity rather than a generalization failure). The accountant-level GMM (Section IV-E) and the threshold-independent partner-ranking analysis (Section IV-H.2) are the cross-checks that are robust to fold-level sampling variance. @@ -111,5 +111,9 @@ Seventh, the max/min detection logic treats both ends of a near-identical same-C In the rare case that one of the two documents contains a genuinely hand-signed exemplar that was subsequently reused as the stamping or e-signature template, the pair correctly identifies image reuse but misattributes the non-hand-signed status to the source exemplar. 
This misattribution affects at most one source document per template variant per CPA (the exemplar from which the template was produced), is not expected to be common given that stored signature templates are typically generated in a separate acquisition step rather than extracted from submitted audit reports, and does not materially affect aggregate capture rates at the firm level. +Eighth, our analyses remain at the signature level and the accountant (cross-year pooled) level; we abstain from partner-level frequency inferences such as "X% of CPAs hand-sign in a given year." +Per-signature labels in this paper are not translated to per-report or per-partner mechanism assignments, because making such a translation would require an assumption of within-year uniformity of signing mechanisms that we do not adopt: a CPA's signatures within a single fiscal year may reflect a single replication template, multiple templates used in parallel (e.g., for different engagement positions or reporting pipelines), within-year mechanism mixing, or a combination, and the data at hand do not disambiguate these possibilities (Section III-G). +The signature-level rates we report, including the 92.5% / 7.5% Firm A split and the year-by-year left-tail share of Section IV-H.1, should accordingly be read as signature-level quantities rather than partner-level frequencies. + Finally, the legal and regulatory implications of our findings depend on jurisdictional definitions of "signature" and "signing." Whether non-hand-signing of a CPA's own stored signature constitutes a violation of signing requirements is a legal question that our technical analysis can inform but cannot resolve. 
diff --git a/paper/paper_a_methodology_v3.md b/paper/paper_a_methodology_v3.md index 3e1c9e9..e4938b8 100644 --- a/paper/paper_a_methodology_v3.md +++ b/paper/paper_a_methodology_v3.md @@ -83,7 +83,7 @@ The final classification layer was removed, yielding the 2048-dimensional output Preprocessing consisted of resizing to 224×224 pixels with aspect-ratio preservation and white padding, followed by ImageNet channel normalization. All feature vectors were L2-normalized, ensuring that cosine similarity equals the dot product. -The choice of ResNet-50 without fine-tuning was motivated by three considerations: (1) the task is similarity comparison rather than classification, making general-purpose discriminative features sufficient; (2) ImageNet features have been shown to transfer effectively to document analysis tasks [20], [21]; and (3) avoiding domain-specific fine-tuning reduces the risk of overfitting to dataset-specific artifacts, though we note that a fine-tuned model could potentially improve discriminative performance (see Section V-D). +The choice of ResNet-50 without fine-tuning was motivated by three considerations: (1) the task is similarity comparison rather than classification, making general-purpose discriminative features sufficient; (2) ImageNet features have been shown to transfer effectively to document analysis tasks [20], [21]; and (3) avoiding domain-specific fine-tuning reduces the risk of overfitting to dataset-specific artifacts, though we note that a fine-tuned model could potentially improve discriminative performance (see Section V-G). This design choice is validated by an ablation study (Section IV-J) comparing ResNet-50 against VGG-16 and EfficientNet-B0. ## F. 
Dual-Method Similarity Descriptors @@ -121,24 +121,18 @@ For per-signature classification we compute, for each signature, the maximum pai The max/min (rather than mean) formulation reflects the identification logic for non-hand-signing: if even one other signature of the same CPA is a pixel-level reproduction, that pair will dominate the extremes and reveal the non-hand-signed mechanism. Mean statistics would dilute this signal. -We distinguish two stipulations by the role each plays, in order to avoid overstating the paper's reliance on them. +We make one stipulation about same-CPA pair detectability. **(A1) Pair-detectability** is a statistical assumption scoped to the same-CPA pool (pooled across fiscal years, matching the max/min computation above): if a CPA uses image replication anywhere in the corpus, at least one pair of same-CPA signatures is near-identical after reproduction noise, so that max cosine / min dHash detects the replication. This is plausible for high-volume stamping or firm-level electronic-signing workflows---where a stored image is typically reused many times under similar scan and compression conditions---but is not guaranteed in sparse CPA-corpora with only one observed replicated report, when multiple template variants are in use, or when scan-stage noise pushes a replicated pair outside the detection regime. A1 is what the per-signature detector requires to be sensitive to replication; it is a cross-year pair-existence property, not a within-year uniformity claim. -**(A2) Within-year label uniformity** is an interpretive convention used when a signature-level label is *read as* "this CPA's signing mechanism for that fiscal year": within any single fiscal year we treat the CPA's mechanism as uniform, i.e., a CPA who replicates any signature image in that year is treated as doing so for every report in that year, and a hand-signer is treated as hand-signing every report in that year. 
-A2 is consistent with industry practice at Firm A during the sample period, but may weaken at other Big-4 firms during the 2019--2021 digitalization-transition years, in which a CPA's mechanism could in principle shift mid-year as firm-level electronic-signing systems were rolled out. -We therefore read A2 as a domain-motivated default rather than a universally validated empirical claim. -The arithmetic statistics reported in this paper do not require A2 for their definition or computation: the per-signature classifier (Section III-L) operates at signature level, the accountant-level mixture (Section III-J) uses mean statistics over the full same-CPA pool, and the partner-level ranking (Section IV-H.2) uses a per-auditor-year mean---none of which require within-year uniformity to be well-defined. -A2 does, however, underwrite certain *interpretive* readings---most notably, the framing in Section IV-H.1 of Firm A's yearly left-tail share as a partner-level "minority of hand-signers" rather than a bare signature-level rate---and the downstream use of per-signature or per-auditor-year labels as regime labels for auditor-behavior research. +We make *no* within-year or across-year uniformity assumption about CPA signing mechanisms. +Per-signature labels are signature-level quantities throughout this paper; we do not translate them to per-report or per-partner mechanism assignments, and we abstain from partner-level frequency inferences (such as "X% of CPAs hand-sign") that would require such a translation. +A CPA's signing output within a single fiscal year may reflect a single replication template, multiple templates used in parallel (e.g., different stored images for different engagement positions or reporting pipelines), within-year mechanism mixing, or a combination; our signature-level analyses remain valid under all of these regimes, since they do not attempt mechanism attribution at the partner or report level. 
+The accountant-level summary statistics of Section III-J are likewise cross-year pooled quantities by construction, and may blend distinct signing-mechanism regimes when a CPA's practice changes over the sample period; we treat this as a design choice, not an identification assumption, and the accountant-level aggregates are to be read as characterizing each CPA's pooled observed tendency over the full sample period rather than a single time-invariant regime. -We explicitly *do not* assume across-year homogeneity. -A CPA's mechanism may change across fiscal years---the 2019--2021 Big-4 digitalization trends documented in Section IV-H are consistent with such changes---and accountant-level summary statistics (Section III-J) therefore represent a cross-year pooled summary that may blend multiple regimes for the same CPA. -We treat this as a design choice: the accountant-level aggregates characterize each CPA's overall distribution over the full sample period, not a single time-invariant regime. - -The intra-report consistency analysis in Section IV-H.3 is a related but distinct check: it tests whether the *two co-signing CPAs on the same report* receive the same signature-level label (firm-level signing-practice homogeneity) rather than testing A2 at the same-CPA level. -A direct empirical check of A2 would require labeling multiple reports of the same CPA in the same year and is left to future work; as noted above, no reported statistic relies on A2, and A2's interpretive scope is further bounded by the worst-case aggregation rule of Section III-L. +The intra-report consistency analysis in Section IV-H.3 is a firm-level homogeneity check---whether the *two co-signing CPAs on the same report* receive the same signature-level label under the operational classifier---rather than a test of within-partner or within-year uniformity. 
For accountant-level analysis we additionally aggregate these per-signature statistics to the CPA level by computing the mean best-match cosine and the mean *independent minimum dHash* across all signatures of that CPA. The *independent minimum dHash* of a signature is defined as the minimum Hamming distance to *any* other signature of the same CPA (over the full same-CPA set). @@ -155,9 +149,9 @@ We use this only as background context for why Firm A is a plausible calibration We establish Firm A's replication-dominated status through three primary independent quantitative analyses plus a fourth strand comprising three complementary checks, each of which can be reproduced from the public audit-report corpus alone: -First, *independent visual inspection* of randomly sampled Firm A reports reveals pixel-identical signature images across different audit engagements and fiscal years for the majority of partners---a physical impossibility under independent hand-signing events. +First, *independent visual inspection* of randomly sampled Firm A reports reveals pixel-identical signature images across different audit engagements and fiscal years for many of the sampled partners---a physical impossibility under independent hand-signing events. -Second, *whole-sample signature-level rates*: 92.5% of Firm A's per-signature best-match cosine similarities exceed 0.95, consistent with non-hand-signing as the dominant mechanism, while the remaining 7.5% form a long left tail consistent with a minority of hand-signers. +Second, *whole-sample signature-level rates*: 92.5% of Firm A's per-signature best-match cosine similarities exceed 0.95, consistent with non-hand-signing as the dominant mechanism, while the remaining 7.5% form a long left tail reflecting within-firm heterogeneity in signing output (we do not disaggregate partner-level mechanism here; see Section III-G for the scope of claims). 
Third, *accountant-level mixture analysis* (Section IV-E): a BIC-selected three-component Gaussian mixture over per-accountant mean cosine and mean dHash places 139 of the 171 Firm A CPAs (with $\geq 10$ signatures) in the high-replication C1 cluster and 32 in the middle-band C2 cluster, directly quantifying the within-firm heterogeneity. @@ -170,7 +164,7 @@ We emphasize that the 92.5% figure is a within-sample consistency check rather t We emphasize that Firm A's replication-dominated status was *not* derived from the thresholds we calibrate against it. Its identification rests on visual evidence and accountant-level clustering that is independent of the statistical pipeline. -The "replication-dominated, not pure" framing is important both for internal consistency---it predicts and explains the long left tail observed in Firm A's cosine distribution (Section III-I below)---and for avoiding overclaim in downstream inference. +The "replication-dominated, not pure" framing is important both for internal consistency---it predicts and explains the long left tail observed in Firm A's cosine distribution (Section IV-D)---and for avoiding overclaim in downstream inference. ## I. Convergent Threshold Determination with a Density-Smoothness Diagnostic @@ -250,7 +244,7 @@ We further emphasize that this anchor is a *subset* of the true positive class-- Inter-CPA pairs cannot arise from reuse of a single signer's stored signature image, so this population is a reliable negative class for threshold sweeps. This anchor is substantially larger than a simple low-similarity-same-CPA negative and yields tight Wilson 95% confidence intervals on FAR at each candidate threshold. -3. **Firm A anchor (replication-dominated prior positive):** Firm A signatures, treated as a majority-positive reference with within-firm heterogeneity in the left tail (consistent with a minority of hand-signers), as evidenced by the 32/171 middle-band share in the accountant-level mixture (Section III-H). 
+3. **Firm A anchor (replication-dominated prior positive):** Firm A signatures, treated as a majority-positive reference with within-firm heterogeneity in the left tail, as evidenced by the 32/171 middle-band share in the accountant-level mixture (Section III-H). Because Firm A is both used for empirical percentile calibration in Section III-H and as a validation anchor, we make the within-Firm-A sampling variance visible by splitting Firm A CPAs randomly (at the CPA level, not the signature level) into a 70% *calibration* fold and a 30% *heldout* fold. The calibration-fold percentiles used in thresholding---cosine median, P1, and P5 (lower-tail, since higher cosine indicates greater similarity), and dHash_indep median and P95 (upper-tail, since lower dHash indicates greater similarity)---are derived from the 70% calibration fold only. The heldout fold is used exclusively to report post-hoc capture rates with Wilson 95% confidence intervals. diff --git a/paper/paper_a_results_v3.md b/paper/paper_a_results_v3.md index 8406858..178cff9 100644 --- a/paper/paper_a_results_v3.md +++ b/paper/paper_a_results_v3.md @@ -69,7 +69,7 @@ The $N = 168{,}740$ count used in Table V and in the downstream same-CPA per-sig | Per-accountant dHash mean | 686 | 0.0277 | <0.001 | Multimodal | --> -Firm A's per-signature cosine distribution is *unimodal* ($p = 0.17$), reflecting a single dominant generative mechanism (non-hand-signing) with a long left tail attributable to within-firm heterogeneity---consistent with a minority of hand-signing Firm A partners---as identified in the accountant-level mixture (Section IV-E). +Firm A's per-signature cosine distribution is *unimodal* ($p = 0.17$), reflecting a single dominant generative mechanism (non-hand-signing) with a long left tail attributable to within-firm heterogeneity in signing outputs (see Section IV-E for the accountant-level mixture evidence and Section III-G for the scope of partner-level claims). 
The all-CPA cosine distribution, which mixes many firms with heterogeneous signing practices, is *multimodal* ($p < 0.001$). At the per-accountant aggregate level both cosine and dHash means are strongly multimodal, foreshadowing the mixture structure analyzed in Section IV-E. @@ -125,7 +125,7 @@ Table VII reports the three-component composition, and Fig. 4 visualizes the acc Three empirical findings stand out. First, of the 180 CPAs in the Firm A registry, 171 have $\geq 10$ signatures and therefore enter the accountant-level GMM (the remaining 9 have too few signatures for reliable aggregates and are excluded from this analysis only). Component C1 captures 139 of these 171 Firm A CPAs (81%) in a tight high-cosine / low-dHash cluster; the remaining 32 Firm A CPAs fall into C2. -This split is consistent with the minority-hand-signers framing of Section III-H and with the unimodal-long-tail observation of Section IV-D. +This split is consistent with the within-firm heterogeneity framing of Section III-H and with the unimodal-long-tail observation of Section IV-D. Second, the three-component partition is *not* a firm-identity partition: three of the four Big-4 firms dominate C2 together, and smaller domestic firms cluster into C3. Third, applying the threshold framework of Section III-I to the accountant-level cosine-mean distribution yields the estimates summarized in the accountant-level rows of Table VIII (below): KDE antimode $= 0.973$, Beta-2 crossing $= 0.979$, and the logit-GMM-2 crossing $= 0.976$ converge within $\sim 0.006$ of each other, while the BD/McCrary density-smoothness diagnostic is largely null at the accountant level---no significant transition at two of three cosine bin widths and two of three dHash bin widths, with the one cosine transition at bin 0.005 sitting at cosine 0.980 on the upper edge of the convergence band (Appendix A). 
For completeness we also report the marginal crossings of a *separately fit* two-component 2D GMM (reported as a cross-check on the 1D accountant-level crossings) at cosine $= 0.945$ and dHash $= 8.10$; these differ from the 1D crossings because they are derived from the joint (cosine, dHash) covariance structure rather than from each 1D marginal in isolation. @@ -283,7 +283,8 @@ Subsection H.3 applies the calibrated classifier and is therefore a consistency ### 1) Year-by-Year Stability of the Firm A Left Tail Table XIII reports the proportion of Firm A signatures with per-signature best-match cosine below 0.95, disaggregated by fiscal year. -Under the replication-dominated interpretation (Section III-H) and the within-year label-uniformity convention A2 (Section III-G), this left-tail share is read as a partner-level minority of Firm A CPAs who continue to hand-sign rather than as a bare signature-level rate. +Under the replication-dominated interpretation (Section III-H), this signature-level left-tail rate reflects within-firm heterogeneity in signing outputs at Firm A. +Consistent with the scope-of-claims framing in Section III-G, we report the rate as a signature-level quantity without disaggregating the underlying mechanism (which may span a minority of hand-signing partners, multi-template replication workflows within the firm, or a combination); partner-level mechanism attribution is not attempted. Under the alternative hypothesis that the left tail is an artifact of scan or compression noise, the share should shrink as scanning and PDF-compression technology improved over 2013-2023.