diff --git a/paper/v4/paper_a_methodology_v4_section_iii.md b/paper/v4/paper_a_methodology_v4_section_iii.md index c906843..1b20549 100644 --- a/paper/v4/paper_a_methodology_v4_section_iii.md +++ b/paper/v4/paper_a_methodology_v4_section_iii.md @@ -96,7 +96,7 @@ $\text{BIC}(K{=}3) = -1111.93$, lower than $K{=}2$ by $3.48$ (mild numerical pre - Firm C: $23.5\%$ C1, $75.5\%$ C2, $1.0\%$ C3 - Firm D: $11.5\%$ C1, $\sim 84\%$ C2, $\sim 4.5\%$ C3 -Firm A accounts for $141$ of the $143$ C3-assigned CPAs; Firm C accounts for $24$ of the $40$ C1-assigned CPAs. The K=3 partition is therefore well-described as a firm-compositional decomposition: C3 is essentially "Firm A and any non-Firm-A CPA whose mean descriptors happen to land in the high-cos / low-dHash corner"; C1 is essentially "non-Firm-A CPAs whose mean descriptors land in the low-cos / high-dHash corner." The composition contrast that K=3 captures at the accountant level reappears at the deployment level in the cross-firm hit matrix of §III-L.4 (Script 44): nearly all (98%) of the inter-CPA-anchor hits for a Firm A source signature have a Firm A candidate, and the same within-firm concentration holds for Firms B, C, D individually. The K=3 partition and the cross-firm hit matrix therefore describe the same underlying firm-compositional structure at two different units of analysis. +Firm A accounts for $141$ of the $143$ C3-assigned CPAs; Firm C accounts for $24$ of the $40$ C1-assigned CPAs. The K=3 partition is therefore well-described as a firm-compositional decomposition: C3 is essentially "Firm A and any non-Firm-A CPA whose mean descriptors happen to land in the high-cos / low-dHash corner"; C1 is essentially "non-Firm-A CPAs whose mean descriptors land in the low-cos / high-dHash corner." The composition contrast that K=3 captures at the accountant level reappears at the deployment level in the cross-firm hit matrix of §III-L.4 (Script 44): under the deployed any-pair rule, within-firm collision concentration is $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms). The K=3 partition and the cross-firm hit matrix therefore describe the same underlying firm-compositional structure at two different units of analysis. **Leave-one-firm-out stability (Scripts 36, 37).** Leave-one-firm-out cross-validation shows that K=2 is unstable across folds: holding Firm A out gives a fold rule cos $> 0.938$ AND dHash $\leq 8.79$, while holding any single non-Firm-A Big-4 firm out gives a fold rule near cos $> 0.975$ AND dHash $\leq 3.76$ (Script 36). The maximum absolute deviation of the four fold cosine crossings from their across-fold mean is $0.028$ (the corresponding pairwise across-fold range is $0.0376$, from $0.9380$ for the held-out-Firm-A fold to $0.9756$ for the held-out-Firm-D fold; Script 36 stability summary). The $0.028$ value is $5.6\times$ the report's $0.005$ across-fold stability tolerance. K=3 in contrast has a *reproducible component shape*: across the four folds the C1 cosine mean varies by at most $0.005$, the C1 dHash mean by at most $0.96$, and the C1 weight by at most $0.023$ (Script 37). 
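To make the fold mechanics concrete, the sketch below reproduces one leave-one-firm-out fold in the spirit of Scripts 36 and 37, assuming a per-CPA table with a firm label and mean (cos, dHash) descriptors; the column names, the scikit-learn mixture configuration, and the cosine-ordered relabelling are illustrative assumptions, not the scripts' actual code.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

def lofo_fold(df: pd.DataFrame, held_out_firm: str, k: int = 3, seed: int = 0):
    """Fit a K-component GMM on all Big-4 CPAs except `held_out_firm`,
    then assign hard-posterior labels to the held-out firm's CPAs.

    `df` is assumed to carry one row per CPA with columns
    ['firm', 'mean_cos', 'mean_dhash'] (hypothetical names).
    """
    cols = ["mean_cos", "mean_dhash"]
    X_train = df.loc[df["firm"] != held_out_firm, cols].to_numpy()
    X_held = df.loc[df["firm"] == held_out_firm, cols].to_numpy()

    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          random_state=seed).fit(X_train)

    # Component indices are arbitrary across refits, so relabel by the
    # cosine centre: lowest mean cosine -> C1 (low-cos / high-dHash),
    # highest -> C3 (high-cos / low-dHash), matching the paper's reading.
    order = np.argsort(gmm.means_[:, 0])
    hard = gmm.predict(X_held)
    held_out_rates = {f"C{rank + 1}": float(np.mean(hard == comp))
                      for rank, comp in enumerate(order)}
    return gmm.bic(X_train), held_out_rates
```

The cosine-ordered relabelling is what makes fold-to-fold comparisons of the C1 centre, weight, and held-out membership rate meaningful; raw component indices need not align across refits.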
K=3 hard-posterior membership for the held-out firm is composition-sensitive — for Firm C the held-out C1 rate is $36.3\%$ vs the full-Big-4 baseline of $23.5\%$, an absolute difference of $12.8$ pp; for Firm A the held-out C1 rate is $4.7\%$ vs baseline $0.0\%$; the report's own legend classifies this pattern as `P2_PARTIAL` ("the C1 cluster exists but membership is not well-predicted by the held-out fit"). We accordingly do not use K=3 hard-posterior membership as an operational label. @@ -280,9 +280,9 @@ The per-decile per-firm breakdown (Script 44) confirms the pattern: within every For the same-pair joint event (a single candidate satisfying both $\text{cos} > 0.95$ and $\text{dHash} \leq 5$), the candidate firm is even more strongly concentrated within the source firm: Firm A source $\to$ Firm A candidate in $11{,}314$ of $11{,}319$ same-pair hits ($99.96\%$); Firm B source $\to$ Firm B candidate in $85$ of $87$ ($97.7\%$); Firm C source $\to$ Firm C candidate in $54$ of $55$ ($98.2\%$); Firm D source $\to$ Firm D candidate in $64$ of $66$ ($97.0\%$). -**Interpretation.** The cross-firm hit matrix shows that nearly all inter-CPA collisions under the deployed rule originate from candidates within the source firm (different CPA, same firm). This pattern is consistent with — but not by itself diagnostic of — firm-specific template, stamp, or document-production reuse: within-firm scanning workflows, common form templates, and shared report-generation infrastructure could produce visually similar signature crops across different CPAs within the same firm. The byte-level evidence of v3.x §IV-F.1 (Firm A's $145$ pixel-identical signatures across $\sim 50$ distinct certifying partners) provides direct evidence that firm-level template reuse does occur at Firm A; the broader inter-CPA collision pattern in §III-L.4 is consistent with that mechanism extending in milder form to Firms B/C/D. We report this as "inter-CPA collision concentration is within-firm" — a descriptive observation about deployed-rule behaviour — and refrain from inferring that the within-firm hits constitute deliberate or systematic template sharing. +**Interpretation.** Under the deployed any-pair rule, the within-firm collision concentration is $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D — Firm A's pattern is markedly more within-firm-concentrated than the other three firms', though every Big-4 firm still has more than three quarters of its any-pair collisions falling on candidates within the same firm. The stricter same-pair joint event — a single candidate satisfying both cos $> 0.95$ and dHash $\leq 5$ — saturates at $97.0$–$99.96\%$ within-firm across all four firms. This pattern is consistent with — but not by itself diagnostic of — firm-specific template, stamp, or document-production reuse: within-firm scanning workflows, common form templates, and shared report-generation infrastructure could produce visually similar signature crops across different CPAs within the same firm. The byte-level evidence of v3.x §IV-F.1 (Firm A's $145$ pixel-identical signatures across $\sim 50$ distinct certifying partners) provides direct evidence that firm-level template reuse does occur at Firm A; the broader inter-CPA collision pattern in §III-L.4 is consistent with that mechanism extending in milder form to Firms B/C/D. 
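Because the any-pair / same-pair distinction recurs throughout §III-L and §IV-M, a minimal sketch of the two hit semantics may help; it assumes per-candidate cosine and dHash arrays for one source signature against its inter-CPA candidate pool (input names hypothetical, thresholds as deployed).

```python
import numpy as np

def hit_semantics(cos: np.ndarray, dhash: np.ndarray,
                  cos_thr: float = 0.95, dhash_thr: int = 5):
    """cos[i], dhash[i]: descriptors of one source signature versus
    candidate i in its inter-CPA pool (hypothetical inputs).

    Any-pair (deployed extrema rule): the max-cosine and the min-dHash
    may come from different candidates.
    Same-pair (stricter joint event): a single candidate must clear
    both cuts simultaneously.
    """
    any_pair = bool(cos.max() > cos_thr and dhash.min() <= dhash_thr)
    same_pair = bool(np.any((cos > cos_thr) & (dhash <= dhash_thr)))
    return any_pair, same_pair

# Extrema from two different candidates: the any-pair rule fires,
# the same-pair event does not.
print(hit_semantics(np.array([0.97, 0.60]), np.array([20, 3])))  # (True, False)
```

A same-pair hit always implies an any-pair hit, never the reverse, so the two within-firm concentrations are computed on nested hit sets and need not agree.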
We report this as "inter-CPA collision concentration is within-firm" — a descriptive observation about deployed-rule behaviour — and refrain from inferring that the within-firm hits constitute deliberate or systematic template sharing. -This connects back to §III-J: the K=3 firm-composition contrast at the accountant level (Firm A dominating C3; Firm C dominating C1) reappears at the deployment level in the cross-firm hit matrix, where nearly all collisions are within-firm. The K=3 partition and the cross-firm hit matrix describe the same underlying firm-compositional structure at two different units of analysis. +This connects back to §III-J: the K=3 firm-composition contrast at the accountant level (Firm A dominating C3; Firm C dominating C1) reappears at the deployment level in the cross-firm hit matrix, where the within-firm collision concentration is the dominant pattern at all four Big-4 firms — most strongly at Firm A ($98.8\%$ any-pair, $99.96\%$ same-pair) and at materially lower but still majority levels at Firms B/C/D ($76.7$–$83.7\%$ any-pair; $97.0$–$98.2\%$ same-pair). The K=3 partition and the cross-firm hit matrix describe the same underlying firm-compositional structure at two different units of analysis. ### L.5. Alert-rate sensitivity around inherited thresholds (Script 46) diff --git a/paper/v4/paper_a_prose_v4_phase4.md b/paper/v4/paper_a_prose_v4_phase4.md index 580e551..f969450 100644 --- a/paper/v4/paper_a_prose_v4_phase4.md +++ b/paper/v4/paper_a_prose_v4_phase4.md @@ -8,7 +8,7 @@ > *IEEE Access target: <= 250 words, single paragraph.* -Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization makes reusing a stored signature image across reports — through administrative stamping or firm-level electronic signing — technically trivial and visually invisible, undermining individualized attestation. We build an end-to-end pipeline detecting such *non-hand-signed* signatures at scale: a Vision-Language Model identifies signature pages, YOLOv11 localizes signatures, ResNet-50 supplies deep features, and a dual-descriptor layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate *style consistency* from *image reproduction*. Applied to 90,282 Taiwan audit reports (2013–2023), the pipeline yields 182,328 signatures from 758 CPAs; primary analyses are scoped to the Big-4 sub-corpus (437 CPAs; 150,442 signatures). Distributional diagnostics show that the apparent multimodality of the descriptor distribution dissolves under joint firm-mean centring and integer-tie jitter ($p$ rises to $0.35$), so no within-population bimodal antimode anchors the operational thresholds. We instead adopt an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units: per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ under the deployed any-pair high-confidence rule), and per-document ($0.34$ for the operational HC+MC alarm). Firm heterogeneity is decisive: Firm A's per-document HC+MC alarm rate is $0.62$ versus $0.09$–$0.16$ at Firms B/C/D after pool-size adjustment, with $98$–$100\%$ of inter-CPA collisions concentrated within the source firm — consistent with firm-level template-like reuse. 
We position the system as a specificity-proxy-anchored screening framework with human-in-the-loop review, not as a validated forensic detector; no calibrated error rates are reportable without signature-level ground truth. +Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization makes reusing a stored signature image across reports — through administrative stamping or firm-level electronic signing — technically trivial and visually invisible, undermining individualized attestation. We build an end-to-end pipeline detecting such *non-hand-signed* signatures at scale: a Vision-Language Model identifies signature pages, YOLOv11 localizes signatures, ResNet-50 supplies deep features, and a dual-descriptor layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate *style consistency* from *image reproduction*. Applied to 90,282 Taiwan audit reports (2013–2023), the pipeline yields 182,328 signatures from 758 CPAs; primary analyses are scoped to the Big-4 sub-corpus (437 CPAs; 150,442 signatures). Distributional diagnostics show that the apparent multimodality of the descriptor distribution dissolves under joint firm-mean centring and integer-tie jitter ($p$ rises to $0.35$), so no within-population bimodal antimode anchors the operational thresholds. We instead adopt an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units: per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ under the deployed any-pair high-confidence rule), and per-document ($0.34$ for the operational HC+MC alarm). Firm heterogeneity is decisive: Firm A's per-document HC+MC alarm rate is $0.62$ versus $0.09$–$0.16$ at Firms B/C/D after pool-size adjustment, and, under the deployed any-pair rule, between $76.7\%$ and $98.8\%$ of inter-CPA collisions concentrate within the source firm (Firm A $98.8\%$; Firms B/C/D $76.7$–$83.7\%$) — consistent with firm-level template-like reuse. We position the system as a specificity-proxy-anchored screening framework with human-in-the-loop review, not as a validated forensic detector; no calibrated error rates are reportable without signature-level ground truth. --- @@ -32,7 +32,7 @@ The methodological reframing relative to earlier versions of this work is centra In place of distributional anchoring, v4.0 adopts an anchor-based inter-CPA coincidence-rate (ICCR) calibration. At the per-comparison unit, the inherited cos$>0.95$ operating point yields ICCR $= 0.00060$ on a $5 \times 10^5$-pair Big-4 sample (replicating v3.x's reported per-comparison rate of $0.0005$ under prior "FAR" terminology); the dHash$\leq 5$ structural cutoff yields ICCR $= 0.00129$ (v4 new); the joint rule cos$>0.95$ AND dHash$\leq 5$ yields joint ICCR $= 0.00014$ (any-pair semantics, matching the deployed extrema rule). At the pool-normalised per-signature unit, the same rule's effective coincidence rate is materially higher because the deployed classifier takes max-cosine and min-dHash over a same-CPA pool: pooled Big-4 any-pair ICCR is $0.1102$ (Wilson 95% CI $[0.1086, 0.1118]$; CPA-block bootstrap 95% $[0.0908, 0.1330]$). At the per-document unit, the operational HC$+$MC alarm fires on $33.75\%$ of Big-4 documents under the inter-CPA candidate-pool counterfactual. -The pooled per-signature and per-document rates conceal striking firm heterogeneity.
A logistic regression of the per-signature hit indicator on firm dummies (Firm A reference) and centred log pool size yields odds ratios of $0.053$ (Firm B), $0.010$ (Firm C), and $0.027$ (Firm D) — Firms B/C/D are an order of magnitude below Firm A even after controlling for the pool-size confound (Script 44). Cross-firm hit matrix analysis shows that $98$–$100\%$ of inter-CPA collisions originate from candidates within the source firm (different CPA, same firm), consistent with firm-specific template, stamp, or document-production reuse mechanisms — though not by itself diagnostic of deliberate sharing. We retain the inherited Paper A v3.x five-way box rule as the operational classifier; v4.0's contribution is to characterise its multi-level coincidence behaviour against the inter-CPA negative anchor rather than to derive new thresholds. +The pooled per-signature and per-document rates conceal striking firm heterogeneity. A logistic regression of the per-signature hit indicator on firm dummies (Firm A reference) and centred log pool size yields odds ratios of $0.053$ (Firm B), $0.010$ (Firm C), and $0.027$ (Firm D) — Firms B/C/D are an order of magnitude below Firm A even after controlling for the pool-size confound (Script 44). Cross-firm hit matrix analysis under the deployed any-pair rule shows within-firm collision concentrations of $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D (Table XXV; the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms). The pattern is consistent with firm-specific template, stamp, or document-production reuse mechanisms — though not by itself diagnostic of deliberate sharing. We retain the inherited Paper A v3.x five-way box rule as the operational classifier; v4.0's contribution is to characterise its multi-level coincidence behaviour against the inter-CPA negative anchor rather than to derive new thresholds. Three feature-derived scores converge on the per-CPA descriptor-position ranking with Spearman $\rho \geq 0.879$ (Script 38): the K=3 mixture posterior (now interpreted as a firm-compositional position score, not a mechanism cluster posterior; §III-J), a reverse-anchor cosine percentile relative to a strictly-out-of-target non-Big-4 reference, and the inherited box-rule less-replication-dominated rate. The three scores are deterministic functions of the same per-CPA descriptor pair, so the convergence is documented as internal consistency among feature-derived ranks rather than external validation. Hard ground truth for the *replicated* class is provided by 262 byte-identical signatures in the Big-4 subset (Firm A 145, Firm B 8, Firm C 107, Firm D 2), against which all three candidate checks achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$). For the box rule this result is close to tautological at byte-identity; we discuss the conservative-subset caveat in §V-G. @@ -50,7 +50,7 @@ The contributions of this paper are: 5. **Anchor-based multi-level inter-CPA coincidence-rate calibration.** We characterise the deployed five-way classifier at three units of analysis: per-comparison ICCR (cos$>0.95$: $0.0006$; dHash$\leq 5$: $0.0013$; joint: $0.00014$), pool-normalised per-signature ICCR ($0.11$ for the deployed any-pair high-confidence rule), and per-document ICCR ($0.34$ for the operational HC$+$MC alarm). 
We adopt "inter-CPA coincidence rate" as the metric name throughout and reserve "False Acceptance Rate" for terminology that requires ground-truth negative labels, which the corpus does not provide. -6. **Firm heterogeneity quantification and within-firm cross-CPA collision concentration.** Per-firm rates differ by an order of magnitude after pool-size adjustment (Firm A's per-document HC$+$MC alarm at $0.62$ versus Firms B/C/D at $0.09$–$0.16$). Cross-firm hit matrix analysis shows that $98$–$100\%$ of inter-CPA collisions originate from candidates within the source firm, consistent with firm-specific template, stamp, or document-production reuse mechanisms — a descriptive finding about deployed-rule behaviour, not a claim of deliberate template sharing. +6. **Firm heterogeneity quantification and within-firm cross-CPA collision concentration.** Per-firm rates differ by an order of magnitude after pool-size adjustment (Firm A's per-document HC$+$MC alarm at $0.62$ versus Firms B/C/D at $0.09$–$0.16$). Cross-firm hit matrix analysis shows within-firm collision concentrations of $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D under the deployed any-pair rule (the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms); the pattern is consistent with firm-specific template, stamp, or document-production reuse mechanisms — a descriptive finding about deployed-rule behaviour, not a claim of deliberate template sharing. 7. **K=3 as descriptive firm-compositional partition; three-score convergent internal consistency.** We fit a K=3 Gaussian mixture as a descriptive partition of the Big-4 accountant-level distribution (no longer interpreted as three mechanism clusters). Three feature-derived scores agree on the per-CPA descriptor-position ranking at Spearman $\rho \geq 0.879$; we report this as internal consistency rather than external validation, given that the scores share the underlying descriptor pair. @@ -84,7 +84,7 @@ The Big-4 accountant-level descriptor distribution does reject unimodality on bo Firm A is empirically the firm whose CPAs are most concentrated in the high-cosine, low-dHash corner of the Big-4 descriptor plane. In the Big-4 K=3 hard-posterior assignment (now interpreted as a firm-compositional position assignment; §III-J), Firm A accounts for $0\%$ of C1 (low-cos / high-dHash position) and $82.5\%$ of C3 (high-cos / low-dHash position); the opposite pattern holds at Firm C, which has the highest C1 concentration at $23.5\%$. Firm A also accounts for 145 of the 262 byte-identical signatures in the Big-4 byte-identical anchor of §IV-H (with Firm B 8, Firm C 107, Firm D 2). The additional v3.x finding that the 145 Firm A pixel-identical signatures span 50 distinct Firm A partners (of 180 registered), with 35 byte-identical matches across different fiscal years, is inherited from v3.20.0 §IV-F.1 / Script 28 / Appendix B byte-decomposition output and was not regenerated in v4.0's spike scripts; we retain those numbers by reference. -In v4.0 we treat Firm A as a *templated-end case study* rather than as the calibration anchor for the operational threshold. Firm A enters the Big-4 anchor-based ICCR calibration on equal footing with the other three Big-4 firms (§III-L). The cross-firm hit matrix of §III-L.4 strengthens this framing: $98$–$100\%$ of inter-CPA collisions originate from candidates within the source firm, regardless of which Big-4 firm is the source. 
Firm A's high per-document HC$+$MC alarm rate of $0.62$ (versus Firms B/C/D's $0.09$–$0.16$) reflects high inter-CPA collision concentration under the deployed rule on real same-CPA pools, consistent with firm-specific template, stamp, or document-production reuse — though the inter-CPA-anchor analysis alone is not diagnostic of deliberate template sharing. The byte-level evidence of v3.x §IV-F.1 (Firm A's 145 pixel-identical signatures across $\sim 50$ distinct partners) provides direct evidence that firm-level template reuse does occur at Firm A; the within-firm collision pattern at all four Big-4 firms is consistent with that mechanism extending in milder form to Firms B/C/D. +In v4.0 we treat Firm A as a *templated-end case study* rather than as the calibration anchor for the operational threshold. Firm A enters the Big-4 anchor-based ICCR calibration on equal footing with the other three Big-4 firms (§III-L). The cross-firm hit matrix of §III-L.4 strengthens this framing: under the deployed any-pair rule, within-firm collision concentration is $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms). Firm A's high per-document HC$+$MC alarm rate of $0.62$ (versus Firms B/C/D's $0.09$–$0.16$) reflects high inter-CPA collision concentration under the deployed rule on real same-CPA pools, consistent with firm-specific template, stamp, or document-production reuse — though the inter-CPA-anchor analysis alone is not diagnostic of deliberate template sharing. The byte-level evidence of v3.x §IV-F.1 (Firm A's 145 pixel-identical signatures across $\sim 50$ distinct partners) provides direct evidence that firm-level template reuse does occur at Firm A; the within-firm collision pattern at all four Big-4 firms is consistent with that mechanism extending in milder form to Firms B/C/D. ## D. K=2 / K=3 as Descriptive Firm-Compositional Partitions @@ -106,13 +106,13 @@ Two additional findings refine the calibration story. First, the per-pair condit The only hard ground-truth subset in the corpus is pixel-identical signatures: those whose nearest same-CPA match is byte-identical after crop and normalisation. Independent hand-signing cannot produce byte-identical images, so these signatures are conservative-subset ground truth for the *replicated* class. On the Big-4 subset ($n = 262$ pixel-identical signatures), all three candidate checks — the inherited box rule, the K=3 hard label, and the reverse-anchor metric with a prevalence-calibrated cut — achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$). We caution that this result is necessary but not sufficient: for the box rule it is close to tautological, because byte-identical neighbours have cosine $\approx 1$ and dHash $\approx 0$, well inside the rule's high-confidence region. The corresponding signature-level *negative* anchor evidence is developed in §III-L.1 above (v4 spike: cos$>0.95$ per-comparison ICCR $= 0.00060$, replicating v3.20.0's reported $0.0005$ under prior "FAR" terminology). We frame the per-comparison rate as a specificity proxy under the assumption that inter-CPA pairs constitute a clean negative anchor, and we document in §III-L.4 that this assumption is partially violated by within-firm cross-CPA template-like collision structures. -## G. Limitations +## H. Limitations Several limitations should be transparent. 
The first nine are v4.0-specific; the last five are inherited from v3.20.0 §V-G and still apply to the v4.0 pipeline. *No signature-level ground truth; no true error rates reportable.* The corpus does not contain labelled hand-signed or replicated classes at the signature level. We therefore cannot report False Rejection Rate, sensitivity, recall, Equal Error Rate, ROC-AUC, precision, or positive predictive value against ground truth. All quantitative rates reported in §III-L are inter-CPA negative-anchor coincidence rates (ICCRs) under the assumption that inter-CPA pairs constitute a clean negative anchor; this is a specificity proxy, not a calibrated specificity (§III-M). -*Inter-CPA negative-anchor assumption is partially violated.* The cross-firm hit matrix of §III-L.4 shows that $98$–$100\%$ of inter-CPA collisions under the deployed rule originate from candidates within the source firm, consistent with firm-specific template, stamp, or document-production reuse. The inter-CPA-as-negative assumption is therefore not exactly satisfied — some inter-CPA pairs may share firm-level templates rather than being independent random matches. Our reported per-comparison ICCRs are best read as specificity-proxy rates under a partially-violated assumption, not as calibrated FARs. +*Inter-CPA negative-anchor assumption is partially violated.* The cross-firm hit matrix of §III-L.4 shows that under the deployed any-pair rule, within-firm collision concentration is $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms), consistent with firm-specific template, stamp, or document-production reuse. The inter-CPA-as-negative assumption is therefore not exactly satisfied — some inter-CPA pairs may share firm-level templates rather than being independent random matches. Our reported per-comparison ICCRs are best read as specificity-proxy rates under a partially-violated assumption, not as calibrated FARs. *Scope.* The v4.0 primary analyses are scoped to the Big-4 sub-corpus. We did not perform the full per-signature pool-normalised ICCR analysis at the full $n = 686$ scope; the §IV-K full-dataset Spearman re-run shows the K=3 $+$ box-rule rank-convergence is preserved at $n = 686$ but does not validate the Big-4 operational ICCRs, the LOOO firm-fold structure, or the five-way operational classifier at the broader scope. @@ -144,9 +144,9 @@ Several limitations should be transparent. The first nine are v4.0-specific; the We present a fully automated pipeline for detecting non-hand-signed CPA signatures in Taiwan-listed financial audit reports and a multi-tool framework for characterising and disclosing its operational behaviour at the Big-4 sub-corpus scope. The pipeline processes raw PDFs through VLM-based page identification, YOLO-based signature detection, ResNet-50 feature extraction, and dual-descriptor (cosine + independent-minimum dHash) similarity computation. The operational output is an inherited Paper A five-way per-signature classifier with worst-case document-level aggregation (§III-L). Applied to 90,282 audit reports filed between 2013 and 2023, the pipeline extracts 182,328 signatures from 758 CPAs, with the Big-4 sub-corpus (437 CPAs at accountant level; 150,442–150,453 signatures at signature level) as the primary analytical population. 
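As a reference point for the dual-descriptor recap above, the following is a minimal dHash sketch under common conventions (9×8 grayscale resize, horizontal-gradient bits, Hamming distance); the pipeline's actual crop normalisation and hash parameters are inherited from v3.x and not restated here, so treat the sketch as illustrative rather than as the deployed implementation.

```python
import numpy as np
from PIL import Image

def dhash_bits(path: str, hash_size: int = 8) -> np.ndarray:
    """Difference hash: resize to (hash_size+1) x hash_size grayscale,
    then keep one bit per adjacent horizontal pixel comparison
    (64 bits at the default size)."""
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    px = np.asarray(img, dtype=np.int16)
    return (px[:, 1:] > px[:, :-1]).flatten()

def dhash_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two dHash bit arrays; the deployed
    structural cutoff dHash <= 5 is read on this 0-64 scale."""
    return int(np.count_nonzero(a != b))
```

The cosine leg operates on ResNet-50 embeddings of the same crops; the "independent-minimum" qualifier refers to taking the pool minimum of this distance independently of the pool maximum of the cosine, i.e., the any-pair extrema semantics of §III-L.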
-Our central methodological contributions are: (1) a composition decomposition (Scripts 39b–39e) that establishes the absence of a within-population bimodal antimode in the Big-4 descriptor distribution: the apparent multimodality dissolves under joint firm-mean centring and integer-tie jitter ($p_{\text{median}} = 0.35$), so distributional "natural-threshold" framings of the inherited operating points are not empirically supported; (2) an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units of analysis — per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ for the deployed any-pair HC rule), and per-document ($0.34$ for the operational HC$+$MC alarm) — with explicit terminological replacement of "FAR" by "ICCR" given the unsupervised setting; (3) firm heterogeneity quantification: logistic regression with pool-size adjustment gives odds ratios $0.053$, $0.010$, $0.027$ for Firms B/C/D relative to Firm A reference, indicating a large multiplicative effect that pool-size differences do not explain; (4) cross-firm hit matrix evidence that $98$–$100\%$ of inter-CPA collisions under the deployed rule originate from candidates within the source firm, consistent with firm-specific template, stamp, or document-production reuse mechanisms; (5) K=3 mixture demoted from "three mechanism clusters" to a descriptive firm-compositional partition; (6) three feature-derived scores converging on the per-CPA descriptor-position ranking at Spearman $\rho \geq 0.879$, reported as internal consistency rather than external validation; (7) $0\%$ positive-anchor miss rate on 262 byte-identical Big-4 signatures with the conservative-subset caveat; and (8) a nine-tool unsupervised-validation collection (§III-M) that explicitly discloses each tool's untested assumption and positions the system as an anchor-calibrated screening framework with human-in-the-loop review, not as a validated forensic detector. 
+Our central methodological contributions are: (1) a composition decomposition (Scripts 39b–39e) that establishes the absence of a within-population bimodal antimode in the Big-4 descriptor distribution: the apparent multimodality dissolves under joint firm-mean centring and integer-tie jitter ($p_{\text{median}} = 0.35$), so distributional "natural-threshold" framings of the inherited operating points are not empirically supported; (2) an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units of analysis — per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ for the deployed any-pair HC rule), and per-document ($0.34$ for the operational HC$+$MC alarm) — with explicit terminological replacement of "FAR" by "ICCR" given the unsupervised setting; (3) firm heterogeneity quantification: logistic regression with pool-size adjustment gives odds ratios $0.053$, $0.010$, $0.027$ for Firms B/C/D relative to Firm A reference, indicating a large multiplicative effect that pool-size differences do not explain; (4) cross-firm hit matrix evidence that under the deployed any-pair rule, within-firm collision concentration is $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms), consistent with firm-specific template, stamp, or document-production reuse mechanisms; (5) K=3 mixture demoted from "three mechanism clusters" to a descriptive firm-compositional partition; (6) three feature-derived scores converging on the per-CPA descriptor-position ranking at Spearman $\rho \geq 0.879$, reported as internal consistency rather than external validation; (7) $0\%$ positive-anchor miss rate on 262 byte-identical Big-4 signatures with the conservative-subset caveat; and (8) a nine-tool unsupervised-validation collection (§III-M) that explicitly discloses each tool's untested assumption and positions the system as an anchor-calibrated screening framework with human-in-the-loop review, not as a validated forensic detector. -Future work falls in four directions. *First*, a small-scale human-rated validation set would enable direct ROC optimisation and provide signature-level ground truth that v4.0 fundamentally lacks; without such ground truth, no true error rates can be reported. *Second*, the within-firm collision concentration documented in §III-L.4 (98–100% same-firm partners) invites a separate study to distinguish deliberate template sharing from passive firm-level production artefacts (shared scanners, common form templates, identical report-generation infrastructure) — a question the inter-CPA-anchor analysis alone cannot resolve. *Third*, the descriptive Firm A versus Firms B/C/D contrast (per-document HC$+$MC alarm $0.62$ vs $0.09$–$0.16$) — together with v3.x's byte-level evidence of 145 pixel-identical signatures across $\sim 50$ distinct Firm A partners — invites a companion analysis examining whether such firm-level signing patterns correlate with established audit-quality measures. *Fourth*, generalisation to mid- and small-firm contexts requires extending the anchor-based ICCR framework to scopes where firm-level LOOO folds are not available; the §III-I.4 composition diagnostics already document that the absence of within-population bimodality is corpus-universal, so the v4.0 calibration approach in principle generalises, but a full extension with cluster-robust uncertainty quantification is left as future work. 
+Future work falls in four directions. *First*, a small-scale human-rated validation set would enable direct ROC optimisation and provide signature-level ground truth that v4.0 fundamentally lacks; without such ground truth, no true error rates can be reported. *Second*, the within-firm collision concentration documented in §III-L.4 (any-pair $76.7$–$98.8\%$ across Big-4; same-pair joint $97.0$–$99.96\%$) invites a separate study to distinguish deliberate template sharing from passive firm-level production artefacts (shared scanners, common form templates, identical report-generation infrastructure) — a question the inter-CPA-anchor analysis alone cannot resolve. *Third*, the descriptive Firm A versus Firms B/C/D contrast (per-document HC$+$MC alarm $0.62$ vs $0.09$–$0.16$) — together with v3.x's byte-level evidence of 145 pixel-identical signatures across $\sim 50$ distinct Firm A partners — invites a companion analysis examining whether such firm-level signing patterns correlate with established audit-quality measures. *Fourth*, generalisation to mid- and small-firm contexts requires extending the anchor-based ICCR framework to scopes where firm-level LOOO folds are not available; the §III-I.4 composition diagnostics already document that the absence of within-population bimodality is corpus-universal, so the v4.0 calibration approach in principle generalises, but a full extension with cluster-robust uncertainty quantification is left as future work. --- diff --git a/paper/v4/paper_a_results_v4_section_iv.md b/paper/v4/paper_a_results_v4_section_iv.md index 7a13332..8f4b129 100644 --- a/paper/v4/paper_a_results_v4_section_iv.md +++ b/paper/v4/paper_a_results_v4_section_iv.md @@ -82,24 +82,24 @@ This section reports the empirical evidence for §III-K's three-score internal-c | Score pair | Spearman $\rho$ | $p$-value | |---|---|---| -| K=3 P(C1) vs Paper A box-rule hand-leaning rate | $+0.9627$ | $< 10^{-248}$ | -| Reverse-anchor cosine percentile vs Paper A box-rule hand-leaning rate | $+0.8890$ | $< 10^{-149}$ | +| K=3 P(C1) vs Paper A box-rule less-replication-dominated rate | $+0.9627$ | $< 10^{-248}$ | +| Reverse-anchor cosine percentile vs Paper A box-rule less-replication-dominated rate | $+0.8890$ | $< 10^{-149}$ | | K=3 P(C1) vs Reverse-anchor cosine percentile | $+0.8794$ | $< 10^{-142}$ | (Source: Script 38.) Reverse-anchor reference: 2D Gaussian fit by MCD (support fraction 0.85) on $n = 249$ non-Big-4 CPAs; reference centre $\overline{\text{cos}} = 0.935$, $\overline{\text{dHash}} = 9.77$. **Table X.** Per-firm summary across the three feature-derived scores, Big-4. -| Firm | $n$ CPAs | mean $P(\text{C1})$ | mean reverse-anchor score | mean Paper A hand-leaning rate | +| Firm | $n$ CPAs | mean $P(\text{C1})$ | mean reverse-anchor score | mean Paper A less-replication-dominated rate | |---|---|---|---|---| | Firm A | 171 | 0.0072 | $-0.9726$ | 0.1935 | | Firm B | 112 | 0.1410 | $-0.8201$ | 0.6962 | | Firm C | 102 | 0.3110 | $-0.7672$ | 0.7896 | | Firm D | 52 | 0.2406 | $-0.7125$ | 0.7608 | -(Source: Script 38 per-firm summary; reverse-anchor score is sign-flipped so that *higher* values indicate deeper into the reference left tail = more hand-leaning relative to the non-Big-4 reference.) +(Source: Script 38 per-firm summary; reverse-anchor score is sign-flipped so that *higher* values indicate deeper into the reference left tail = less replication-dominated relative to the non-Big-4 reference.) 
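The Table IX figures are plain rank correlations over the $n = 437$ per-CPA score vectors; the sketch below shows the computation, assuming a per-CPA score table whose file and column names are illustrative rather than Script 38's actual I/O.

```python
from itertools import combinations
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical per-CPA table: one row per Big-4 CPA (n = 437) carrying
# the three feature-derived scores of §III-K.
df = pd.read_csv("big4_per_cpa_scores.csv")
cols = ["p_c1", "rev_anchor_pctile", "box_rule_rate"]

for a, b in combinations(cols, 2):
    rho, p = spearmanr(df[a], df[b])  # rank correlation, average-rank ties
    print(f"{a} vs {b}: rho = {rho:+.4f}, p = {p:.3g}")
```

Because all three columns are deterministic functions of the same per-CPA (cos, dHash) pair, a high $\rho$ here evidences internal consistency among feature-derived ranks rather than external validation; that is the reading adopted in §III-K.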
-The three scores agree on placing Firm A as the most replication-dominated and the three non-Firm-A firms as more hand-leaning. The K=3 posterior P(C1) and the box-rule hand-leaning rate (Score 1 and Score 3) place Firm C at the most-hand-leaning end of Big-4; the reverse-anchor cosine percentile (Score 2) ranks Firm D fractionally above Firm C. This residual within-Big-4-non-A disagreement is a design feature of the reverse-anchor metric: Score 2 measures only the marginal cosine percentile under the non-Big-4 reference, so a firm with a slightly higher cosine but a markedly different dHash distribution (Firm D vs Firm C) can score higher on Score 2 while scoring lower on Scores 1 and 3, both of which use both descriptors. +The three scores agree on placing Firm A as the most replication-dominated and the three non-Firm-A firms as less replication-dominated. The K=3 posterior P(C1) and the box-rule less-replication-dominated rate (Score 1 and Score 3) place Firm C at the least-replication-dominated end of Big-4; the reverse-anchor cosine percentile (Score 2) ranks Firm D fractionally above Firm C. This residual within-Big-4-non-A disagreement is a design feature of the reverse-anchor metric: Score 2 measures only the marginal cosine percentile under the non-Big-4 reference, so a firm with a slightly higher cosine but a markedly different dHash distribution (Firm D vs Firm C) can score higher on Score 2 while scoring lower on Scores 1 and 3, both of which use both descriptors. **Table XI.** Per-signature Cohen $\kappa$ (binary collapse, replicated vs not-replicated), $n = 150{,}442$ Big-4 signatures. @@ -144,10 +144,10 @@ This section reports the only hard-ground-truth subset analysis available in the **Table XIV.** Positive-anchor miss rate, $n = 262$ Big-4 byte-identical signatures. -| Classifier | Misclassified as hand-leaning | Miss rate | Wilson 95% CI | +| Classifier | Misclassified as less-replication-dominated | Miss rate | Wilson 95% CI | |---|---|---|---| | Paper A binary high-confidence box rule (cos $> 0.95$ AND dHash $\leq 5$) | $0 / 262$ | $0\%$ | $[0\%, 1.45\%]$ | -| K=3 per-CPA hard label (C3 = replicated; descriptive) | $0 / 262$ | $0\%$ | $[0\%, 1.45\%]$ | +| K=3 per-CPA hard label (C3 = high-cos / low-dHash; descriptive) | $0 / 262$ | $0\%$ | $[0\%, 1.45\%]$ | | Reverse-anchor (prevalence-calibrated cut) | $0 / 262$ | $0\%$ | $[0\%, 1.45\%]$ | (Source: Script 40.) Per-firm breakdown of the byte-identical subset: Firm A 145; Firm B 8; Firm C 107; Firm D 2. All three candidate scores correctly assign every byte-identical signature to the replicated class. @@ -174,7 +174,7 @@ This section reports the §III-L five-way per-signature + document-level worst-c | UN | Uncertain | 35,480 | 23.58% | | LH | Likely hand-signed | 238 | 0.16% | -(Source: Script 42; 11 of 150,453 loaded Big-4 signatures lacked one or both descriptors and were excluded.) +(Source: Script 42; 11 of 150,453 loaded Big-4 signatures lacked one or both descriptors and were excluded. The $150{,}442$ vs $150{,}453$ distinction — descriptor-complete vs vector-complete — recurs across §IV: §IV-D through §IV-J and §IV-M.5 use $n = 150{,}442$; §IV-M.3 Table XXII uses $n = 150{,}453$ because the pool-normalised ICCR analysis loads all vector-complete signatures including those failing the descriptor-complete filter. See §III-G for the sample-size reconciliation.) 
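The Wilson intervals quoted throughout §IV (e.g., Table XIV's $[0\%, 1.45\%]$ and the CI columns of Tables XXI–XXIII) are standard score intervals; a minimal sketch sufficient to reproduce the Table XIV bound for $0/262$ follows.

```python
from math import sqrt

def wilson_ci(k: int, n: int, z: float = 1.96):
    """Two-sided Wilson score interval for a binomial proportion k/n."""
    p_hat = k / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Positive-anchor miss rate of Table XIV: 0 misses out of 262
# byte-identical Big-4 signatures.
lo, hi = wilson_ci(0, 262)
print(f"[{lo:.2%}, {hi:.2%}]")  # -> [0.00%, 1.45%]
```

At $k = 0$ the centre and half-width coincide, so the lower limit is exactly $0\%$; this is why the miss-rate rows read $[0\%, 1.45\%]$ rather than a symmetric interval.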
**Per-firm five-way breakdown (% within firm).** @@ -185,11 +185,11 @@ This section reports the §III-L five-way per-signature + document-level worst-c | Firm C | 23.75% | 41.44% | 0.38% | 34.21% | 0.22% | 38,613 | | Firm D | 24.51% | 29.33% | 0.22% | 45.65% | 0.29% | 17,133 | -(Source: Script 42 per-firm cross-tab.) The per-firm pattern qualitatively aligns with the K=3 cluster cross-tab of Table XVI: Firm A's signatures concentrate in the HC band (81.70%) while its CPAs concentrate at the accountant level in the K=3 C3-replicated component (82.46%; Table XVI). These two figures address different units (per-signature classification vs per-CPA hard cluster assignment) and are not directly comparable as a like-for-like consistency check; we report the qualitative alignment but do not infer a numerical equivalence. The three non-Firm-A Big-4 firms have markedly lower HC rates than Firm A and substantially higher Uncertain rates, with Firm D having the highest Uncertain rate (45.65%). +(Source: Script 42 per-firm cross-tab.) The per-firm pattern qualitatively aligns with the K=3 cluster cross-tab of Table XVI: Firm A's signatures concentrate in the HC band (81.70%) while its CPAs concentrate at the accountant level in the K=3 C3 (high-cos / low-dHash) component (82.46%; Table XVI). These two figures address different units (per-signature classification vs per-CPA hard cluster assignment) and are not directly comparable as a like-for-like consistency check; we report the qualitative alignment but do not infer a numerical equivalence. The three non-Firm-A Big-4 firms have markedly lower HC rates than Firm A and substantially higher Uncertain rates, with Firm D having the highest Uncertain rate (45.65%). **Document-level worst-case aggregation.** Each audit report typically carries two certifying-CPA signatures. We aggregate signature-level outcomes to document-level labels using the v3.20.0 worst-case rule (HC > MC > HSC > UN > LH; §III-L). v4.0 does not change this aggregation rule; only the population over which it is computed changes (Big-4 subset). -**Table XV-B.** Document-level worst-case category counts, Big-4 sub-corpus, $n = 75{,}233$ unique PDFs. +**Table XIX.** Document-level worst-case category counts, Big-4 sub-corpus, $n = 75{,}233$ unique PDFs. | Category | Long name | $n$ documents | % | |---|---|---|---| @@ -212,38 +212,38 @@ This section reports the §III-L five-way per-signature + document-level worst-c (Source: Script 42; mixed-firm PDFs $n = 379$ excluded from the per-firm rows but included in the overall counts above.) -The five-way **moderate-confidence non-hand-signed** band (cos $> 0.95$ AND $5 < \text{dHash} \leq 15$) inherits its v3.x calibration; it is **not separately validated by Scripts 38–40**, which evaluated only the binary high-confidence rule (cos $> 0.95$ AND dHash $\leq 5$). v4.0 does not re-derive the moderate-band cuts on the Big-4 subset; we report the Table XV per-firm MC proportions (10.76% / 35.88% / 41.44% / 29.33% across Firms A through D) descriptively. The v3.20.0 capture-rate calibration evidence for the moderate band (v3.20.0 Tables IX, XI, XII, XII-B) is carried into v4.0 by reference and not regenerated on the Big-4 subset. 
We do not claim that the MC-band per-firm ordering above is a separate validation of the §III-K Spearman convergence, since MC occupancy is not a monotone function of the per-CPA hand-leaning ranking (e.g., Firm D's MC fraction is lower than Firm B's while Firm D's reverse-anchor score ranks it as more hand-leaning than Firm B). +The five-way **moderate-confidence non-hand-signed** band (cos $> 0.95$ AND $5 < \text{dHash} \leq 15$) inherits its v3.x calibration; it is **not separately validated by Scripts 38–40**, which evaluated only the binary high-confidence rule (cos $> 0.95$ AND dHash $\leq 5$). v4.0 does not re-derive the moderate-band cuts on the Big-4 subset; we report the Table XV per-firm MC proportions (10.76% / 35.88% / 41.44% / 29.33% across Firms A through D) descriptively. The v3.20.0 capture-rate calibration evidence for the moderate band (v3.20.0 Tables IX, XI, XII, XII-B) is carried into v4.0 by reference and not regenerated on the Big-4 subset. We do not claim that the MC-band per-firm ordering above is a separate validation of the §III-K Spearman convergence, since MC occupancy is not a monotone function of the per-CPA less-replication-dominated ranking (e.g., Firm D's MC fraction is lower than Firm B's while Firm D's reverse-anchor score ranks it as less replication-dominated than Firm B). **Table XVI.** Firm × K=3 cluster cross-tabulation, Big-4 sub-corpus. -| Firm | $n$ | C1 (hand-leaning) | C2 (mixed) | C3 (replicated) | C1 % | C3 % | +| Firm | $n$ | C1 (low-cos / high-dHash) | C2 (central) | C3 (high-cos / low-dHash) | C1 % | C3 % | |---|---|---|---|---|---|---| | Firm A | 171 | 0 | 30 | 141 | $0.00\%$ | $82.46\%$ | | Firm B | 112 | 10 | 102 | 0 | $8.93\%$ | $0.00\%$ | | Firm C | 102 | 24 | 77 | 1 | $23.53\%$ | $0.98\%$ | | Firm D | 52 | 6 | 45 | 1 | $11.54\%$ | $1.92\%$ | -(Source: Script 35.) The cross-tab is the accountant-level descriptive output of the K=3 mixture (§III-J / §IV-E). It is reported here as a complement to the five-way per-signature classifier (Table XV), not as an operational classifier output. Reading: Firm A's CPAs are concentrated in the C3 replicated component (no Firm A CPAs in C1); Firm C has the highest hand-leaning concentration of the Big-4 (C1 fraction $23.5\%$); Firms B and D sit between A and C on the K=3 hard-label ordering, broadly consistent with the per-firm Spearman ordering of Table X (with the within-Big-4-non-A reverse-anchor disagreement noted there). +(Source: Script 35.) The cross-tab is the accountant-level descriptive output of the K=3 mixture (§III-J / §IV-E). It is reported here as a complement to the five-way per-signature classifier (Table XV), not as an operational classifier output. Reading: Firm A's CPAs are concentrated in the C3 (high-cos / low-dHash) component (no Firm A CPAs in C1); Firm C has the highest C1 (low-cos / high-dHash) concentration of the Big-4 (C1 fraction $23.5\%$); Firms B and D sit between A and C on the K=3 hard-label ordering, broadly consistent with the per-firm Spearman ordering of Table X (with the within-Big-4-non-A reverse-anchor disagreement noted there). -**Document-level worst-case aggregation outputs are reported in Table XV-B above.** +**Document-level worst-case aggregation outputs are reported in Table XIX above.** ## K. Full-Dataset Robustness (light scope) -This section reports the v4.0 reproducibility cross-check at the full accountant scope ($n = 686$ CPAs, Big-4 plus mid/small firms). 
The scope of §IV-K is deliberately narrow: we re-run only the K=3 mixture + Paper A operational-rule per-CPA hand-leaning rate analysis, sufficient to demonstrate that the v4.0 K=3 + Paper A convergence reproduces at the wider scope. The §III-L five-way classifier and the §IV-G LOOO analyses are not re-run at the full scope. The five-way moderate-confidence band is documented as inherited from v3.x calibration in §IV-J. +This section reports the v4.0 reproducibility cross-check at the full accountant scope ($n = 686$ CPAs, Big-4 plus mid/small firms). The scope of §IV-K is deliberately narrow: we re-run only the K=3 mixture + Paper A operational-rule per-CPA less-replication-dominated rate analysis, sufficient to demonstrate that the v4.0 K=3 + Paper A convergence reproduces at the wider scope. The §III-L five-way classifier and the §IV-G LOOO analyses are not re-run at the full scope. The five-way moderate-confidence band is documented as inherited from v3.x calibration in §IV-J. **Table XVII.** K=3 component comparison, Big-4 sub-corpus vs full dataset. | K=3 component | Big-4 (n=437) cos / dHash / weight | Full (n=686) cos / dHash / weight | Drift Big-4 → Full | |---|---|---|---| -| C1 hand-leaning | 0.9457 / 9.17 / 0.143 | 0.9278 / 11.17 / 0.284 | $\lvert\Delta\rvert$ cos 0.018, dHash 1.99, wt 0.141 | -| C2 mixed | 0.9558 / 6.66 / 0.536 | 0.9535 / 6.99 / 0.512 | $\lvert\Delta\rvert$ cos 0.002, dHash 0.33, wt 0.024 | -| C3 replicated | 0.9826 / 2.41 / 0.321 | 0.9826 / 2.40 / 0.205 | $\lvert\Delta\rvert$ cos 0.000, dHash 0.01, wt 0.117 | +| C1 (low-cos / high-dHash) | 0.9457 / 9.17 / 0.143 | 0.9278 / 11.17 / 0.284 | $\lvert\Delta\rvert$ cos 0.018, dHash 1.99, wt 0.141 | +| C2 (central) | 0.9558 / 6.66 / 0.536 | 0.9535 / 6.99 / 0.512 | $\lvert\Delta\rvert$ cos 0.002, dHash 0.33, wt 0.024 | +| C3 (high-cos / low-dHash) | 0.9826 / 2.41 / 0.321 | 0.9826 / 2.40 / 0.205 | $\lvert\Delta\rvert$ cos 0.000, dHash 0.01, wt 0.117 | (Source: Script 41; full-dataset $\text{BIC}(K{=}3) = -792.31$ vs Big-4 $\text{BIC}(K{=}3) = -1111.93$; BIC values are not directly comparable across different $n$ and are reported only for completeness.) -**Table XVIII.** Spearman rank correlation between K=3 P(C1) and Paper A operational hand-leaning rate, Big-4 sub-corpus vs full dataset. +**Table XVIII.** Spearman rank correlation between K=3 P(C1) and Paper A operational less-replication-dominated rate, Big-4 sub-corpus vs full dataset. -| Scope | $n$ CPAs | Spearman $\rho$ (P(C1) vs Paper A hand-leaning rate) | $p$-value | +| Scope | $n$ CPAs | Spearman $\rho$ (P(C1) vs Paper A less-replication-dominated rate) | $p$-value | |---|---|---|---| | Big-4 (primary) | 437 | $+0.9627$ | $< 10^{-248}$ | | Full dataset | 686 | $+0.9558$ | $< 10^{-300}$ | @@ -251,7 +251,7 @@ This section reports the v4.0 reproducibility cross-check at the full accountant (Source: Script 41.) -**Reading.** The K=3 component ordering and the strong Spearman convergence between K=3 P(C1) and the Paper A box-rule hand-leaning rate are preserved at the full scope. Component centres shift modestly: C3 (replicated) is essentially unchanged in centre but loses weight $0.117$ as the full population includes more non-templated CPAs (mid/small firms); C1 (hand-leaning) gains weight $0.141$ and shifts to lower cosine and higher dHash (centre $(0.928, 11.17)$ vs Big-4 $(0.946, 9.17)$) as the broader population includes mid/small-firm hand-leaning CPAs that the Big-4-primary scope deliberately excludes. 
We read this as evidence that the Big-4-primary K=3 + Paper A convergence is not a Big-4-specific artefact; we do **not** read it as an endorsement of using full-dataset K=3 component centres or operational thresholds in place of the Big-4-primary analysis. Mid/small-firm composition shifts the component centres meaningfully and the v4.0 primary methodology is restricted to Big-4 by design (§III-G item 4). +**Reading.** The K=3 component ordering and the strong Spearman convergence between K=3 P(C1) and the Paper A box-rule less-replication-dominated rate are preserved at the full scope. Component centres shift modestly: C3 (high-cos / low-dHash) is essentially unchanged in centre but loses weight $0.117$ as the full population includes more non-templated CPAs (mid/small firms); C1 (low-cos / high-dHash) gains weight $0.141$ and shifts to lower cosine and higher dHash (centre $(0.928, 11.17)$ vs Big-4 $(0.946, 9.17)$) as the broader population includes mid/small-firm CPAs landing toward the low-cos / high-dHash region that the Big-4-primary scope deliberately excludes. We read this as evidence that the Big-4-primary K=3 + Paper A convergence is not a Big-4-specific artefact; we do **not** read it as an endorsement of using full-dataset K=3 component centres or operational thresholds in place of the Big-4-primary analysis. Mid/small-firm composition shifts the component centres meaningfully and the v4.0 primary methodology is restricted to Big-4 by design (§III-G item 4). ## L. Feature Backbone Ablation (inherited from v3.20.0 §IV-I) @@ -263,7 +263,7 @@ This section consolidates the v4-new empirical results that support the §III-L ### M.1 Composition decomposition (Scripts 39b–39e) -**Table XIX.** Within-firm and between-firm decomposition of the Big-4 accountant-level dip-test rejection. +**Table XX.** Within-firm and between-firm decomposition of the Big-4 accountant-level dip-test rejection. | Diagnostic | Scope | Statistic | Implication | |---|---|---|---| @@ -277,7 +277,7 @@ This section consolidates the v4-new empirical results that support the §III-L ### M.2 Anchor-based inter-CPA pair-level ICCR (Script 40b) -**Table XX.** Big-4 inter-CPA per-comparison ICCR sweep, $n = 5 \times 10^5$ pairs (Big-4 scope; v4 new). +**Table XXI.** Big-4 inter-CPA per-comparison ICCR sweep, $n = 5 \times 10^5$ pairs (Big-4 scope; v4 new). | Threshold | Per-comparison ICCR | 95% Wilson CI | |---|---|---| @@ -297,7 +297,7 @@ The cos $> 0.95$ row replicates v3.20.0 §IV-F.1 Table X (v3 reported $0.0005$ u ### M.3 Pool-normalised per-signature ICCR (Script 43) -**Table XXI.** Pool-normalised per-signature ICCR under the deployed any-pair HC rule (cos $> 0.95$ AND dHash $\leq 5$); $n_{\text{sig}} = 150{,}453$ (vector-complete Big-4); CPA-block bootstrap $n_{\text{boot}} = 1000$. +**Table XXII.** Pool-normalised per-signature ICCR under the deployed any-pair HC rule (cos $> 0.95$ AND dHash $\leq 5$); $n_{\text{sig}} = 150{,}453$ (vector-complete Big-4); CPA-block bootstrap $n_{\text{boot}} = 1000$. | Scope | Per-signature ICCR | Wilson 95% CI | CPA-bootstrap 95% CI | |---|---|---|---| @@ -314,7 +314,7 @@ Decile trend is broadly monotone in pool size with two minor reversals (decile 5 ### M.4 Document-level ICCR under three alarm definitions (Script 45) -**Table XXII.** Document-level inter-CPA ICCR by alarm definition; $n_{\text{docs}} = 75{,}233$. +**Table XXIII.** Document-level inter-CPA ICCR by alarm definition; $n_{\text{docs}} = 75{,}233$. 
| Alarm definition | Alarm set | Document-level ICCR | Wilson 95% CI | |---|---|---|---| @@ -326,7 +326,7 @@ Per-firm D2 document-level ICCR: Firm A $0.6201$ ($n = 30{,}226$); Firm B $0.160 ### M.5 Firm heterogeneity logistic regression and cross-firm hit matrix (Script 44) -**Table XXIII.** Logistic regression of per-signature any-pair HC hit indicator on firm dummies and centred log pool size (Firm A reference). +**Table XXIV.** Logistic regression of per-signature any-pair HC hit indicator on firm dummies and centred log pool size (Firm A reference). | Term | Odds ratio (vs Firm A) | Direction | |---|---|---| @@ -337,7 +337,7 @@ Per-firm D2 document-level ICCR: Firm A $0.6201$ ($n = 30{,}226$); Firm B $0.160 Per-decile per-firm rates (Table not duplicated here; Script 44 decile table available in the supplementary report): within every pool-size decile, Firms B/C/D show rates of $0.0006$–$0.0358$ while Firm A ranges $0.0541$–$0.5958$. The firm gap survives within matched pool sizes. -**Table XXIV.** Cross-firm hit matrix among Big-4 source signatures with any-pair HC hit; max-cosine partner firm (counts). +**Table XXV.** Cross-firm hit matrix among Big-4 source signatures with any-pair HC hit; max-cosine partner firm (counts). | Source firm | Firm A cand. | Firm B | Firm C | Firm D | non-Big-4 | n hits | |---|---|---|---|---|---|---| @@ -350,7 +350,7 @@ Same-pair joint hits (single candidate satisfying both cos $> 0.95$ AND dHash $\ ### M.6 Alert-rate sensitivity around inherited HC threshold (Script 46) -**Table XXV.** Local-gradient / median-gradient ratio at inherited thresholds (descriptive plateau diagnostic). +**Table XXVI.** Local-gradient / median-gradient ratio at inherited thresholds (descriptive plateau diagnostic). | Threshold | Local / median gradient ratio | Interpretation | |---|---|---|