Apply Phase 5 round-3 splice-blocker fixes from codex round-8
Closes the three concrete splice blockers codex round-8 surfaced in the post-round-2 drafts, plus the binary-collapse terminology residue. No empirical changes.

- Abstract trimmed 261 -> 247 words (3 under the IEEE Access <=250 target). Cut "technically trivial and visually invisible," (S1 motivational redundancy) and the within-firm-rate parenthetical "(Firm A 98.8%; Firms B/C/D 76.7-83.7%)" plus the "between" connector; preserved the corrected 77-99% any-pair headline so the M3 substance survives.
- §IV-J Table XV sample-size footnote (line 177) corrected: round-2 misclassified §IV-M.5 as descriptor-complete n=150,442; Script 44 / Tables XXIV-XXV actually use vector-complete n=150,453, the same as §IV-M.2 Table XXI (Script 40b) and §IV-M.3 Table XXII (Script 43). The new footnote distinguishes descriptor-complete (§IV-D through §IV-J) from vector/pair-recomputed (§IV-M.2/M.3/M.5; Scripts 40b/43/44).
- §IV-I (line 161) stale cross-reference: "§IV-M Table XVI" was the K=3 firm cross-tab (descriptive), not the v4-new ICCR calibration. Replaced with "§IV-M Tables XXI-XXVI", the full ICCR calibration block. Pre-existing error exposed by the round-2 cascade.
- §III line 131 + §IV Table XI line 104 binary-collapse label: "replicated vs not-replicated" -> "replication-dominated vs less-replication-dominated", for consistency with the K=3 descriptor-position framing. "Replicated class" is preserved where it refers to byte-identical positive-anchor ground truth (§III-K.4, §IV-H lines 143/153/155).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -128,7 +128,7 @@ Pairwise Spearman rank correlations among the three scores, $n = 437$ Big-4 CPAs
We read this as the strongest internal-consistency signal in v4.0: three different summarisations of the same descriptor pair agree on the per-CPA descriptor-position ranking with $\rho > 0.87$. The three scores agree on placing Firm A as the most replication-dominated descriptor position and the three non-Firm-A Big-4 firms further from the templated end, but they do not all rank the non-Firm-A firms identically: the K=3 posterior P(C1) and the box-rule less-replication-dominated rate (Scores 1 and 3) place Firm C at the less-replication-dominated end of Big-4 (mean P(C1) $= 0.311$; mean box-rule less-replication-dominated rate $= 0.790$), while the reverse-anchor cosine percentile (Score 2) places Firm D fractionally higher than Firm C (mean reverse-anchor score $-0.7125$ vs Firm C $-0.7672$, with higher value indicating deeper into the reference left tail). The mean values for Firms B and D sit between Firms A and C on Scores 1 and 3 (Script 38 per-firm summary). We do not claim this constitutes external validation of any operational classifier; the inherited box rule is calibrated separately (§III-L), and the convergence above shows that a mixture-derived score and a reverse-anchor score concur with the box rule's per-CPA-aggregated outputs on the directional ordering, with a modest disagreement at the less-replication-dominated end between the three non-A Big-4 firms.
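The pairwise-Spearman check behind the $\rho > 0.87$ claim is straightforward to sketch. A minimal stand-in (synthetic per-CPA scores, not the Script 38 outputs; the latent-ranking construction and noise level are illustrative assumptions):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-ins for the three per-CPA summaries (n = 437 CPAs):
# a shared latent descriptor-position ranking plus independent noise per score.
latent = rng.normal(size=437)
score1 = latent + 0.3 * rng.normal(size=437)  # K=3 posterior P(C1) analogue
score2 = latent + 0.3 * rng.normal(size=437)  # reverse-anchor percentile analogue
score3 = latent + 0.3 * rng.normal(size=437)  # box-rule rate analogue

# Pairwise Spearman rank correlations among the three scores.
for a, b, name in [(score1, score2, "1-2"),
                   (score1, score3, "1-3"),
                   (score2, score3, "2-3")]:
    rho, _ = spearmanr(a, b)
    print(f"rho({name}) = {rho:.3f}")
```

Because Spearman operates on ranks, any monotone rescaling of an individual score (posterior probability, percentile, or rate) leaves these correlations unchanged, which is why three differently scaled summaries can be compared directly.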
-**2. Per-signature consistency (Script 39).** Per-CPA aggregation could in principle reflect averaging across within-CPA heterogeneity rather than coherent within-CPA behaviour. We test this by repeating the K=3 fit at the signature level — fitting a fresh K=3 GMM to the 150,442 Big-4 signature-level $(\text{cos}, \text{dHash}_{\text{indep}})$ points (Script 39) — and comparing labels. The per-CPA and per-signature K=3 fits recover a broadly similar three-component ordering; per-CPA C1 is at $\overline{\text{cos}} = 0.946$, $\overline{\text{dHash}} = 9.17$ vs per-signature C1 at $\overline{\text{cos}} = 0.928$, $\overline{\text{dHash}} = 9.75$ (an absolute cosine drift of $0.018$). Cohen $\kappa$ on the binary collapse (replicated vs not-replicated):
+**2. Per-signature consistency (Script 39).** Per-CPA aggregation could in principle reflect averaging across within-CPA heterogeneity rather than coherent within-CPA behaviour. We test this by repeating the K=3 fit at the signature level — fitting a fresh K=3 GMM to the 150,442 Big-4 signature-level $(\text{cos}, \text{dHash}_{\text{indep}})$ points (Script 39) — and comparing labels. The per-CPA and per-signature K=3 fits recover a broadly similar three-component ordering; per-CPA C1 is at $\overline{\text{cos}} = 0.946$, $\overline{\text{dHash}} = 9.17$ vs per-signature C1 at $\overline{\text{cos}} = 0.928$, $\overline{\text{dHash}} = 9.75$ (an absolute cosine drift of $0.018$). Cohen $\kappa$ on the binary collapse (replication-dominated vs less-replication-dominated):
| Pair | Cohen $\kappa$ |
|---|---|
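The binary-collapse agreement check can be sketched as follows. This is a toy reconstruction on synthetic well-separated $(\text{cos}, \text{dHash})$ clouds, not the paper's Script 39, and the component-matching rule (highest mean cosine = replication-dominated) is an assumption:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)

# Synthetic (cos, dHash) points standing in for signature-level descriptors:
# a high-cos/low-dHash "replication-dominated" cloud plus two diffuse clouds.
X = np.vstack([
    rng.normal([0.95, 3.0], [0.02, 1.5], size=(2000, 2)),
    rng.normal([0.80, 12.0], [0.05, 3.0], size=(2000, 2)),
    rng.normal([0.60, 20.0], [0.08, 4.0], size=(2000, 2)),
])

def binary_collapse(points, seed):
    """Fit a K=3 GMM and collapse to a binary label.

    Component indices are arbitrary across fits, so we identify the
    replication-dominated component as the one with the highest mean cosine.
    """
    gmm = GaussianMixture(n_components=3, random_state=seed).fit(points)
    labels = gmm.predict(points)
    rep = np.argmax(gmm.means_[:, 0])        # component with highest mean cos
    return (labels == rep).astype(int)       # 1 = replication-dominated

# Two independent fits (different seeds stand in for the per-CPA vs
# per-signature fits); agreement is scored on the collapsed labels.
kappa = cohen_kappa_score(binary_collapse(X, 0), binary_collapse(X, 1))
print(f"Cohen kappa on the binary collapse: {kappa:.3f}")
```

Collapsing before scoring matters: Cohen $\kappa$ on the raw three-way labels would be confounded by arbitrary component permutations, whereas the max-mean-cosine anchor makes the binary labels comparable across fits.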
@@ -8,7 +8,7 @@
> *IEEE Access target: <= 250 words, single paragraph.*
-Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization makes reusing a stored signature image across reports — through administrative stamping or firm-level electronic signing — technically trivial and visually invisible, undermining individualized attestation. We build an end-to-end pipeline detecting such *non-hand-signed* signatures at scale: a Vision-Language Model identifies signature pages, YOLOv11 localizes signatures, ResNet-50 supplies deep features, and a dual-descriptor layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate *style consistency* from *image reproduction*. Applied to 90,282 Taiwan audit reports (2013–2023), the pipeline yields 182,328 signatures from 758 CPAs; primary analyses are scoped to the Big-4 sub-corpus (437 CPAs; 150,442 signatures). Distributional diagnostics show that the apparent multimodality of the descriptor distribution dissolves under joint firm-mean centring and integer-tie jitter ($p$ rises to $0.35$), so no within-population bimodal antimode anchors the operational thresholds. We instead adopt an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units: per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ under the deployed any-pair high-confidence rule), and per-document ($0.34$ for the operational HC+MC alarm). Firm heterogeneity is decisive: Firm A's per-document HC+MC alarm rate is $0.62$ versus $0.09$–$0.16$ at Firms B/C/D after pool-size adjustment, and under the deployed any-pair rule between $77\%$ and $99\%$ of inter-CPA collisions concentrate within the source firm (Firm A $98.8\%$; Firms B/C/D $76.7$–$83.7\%$) — consistent with firm-level template-like reuse.
-We position the system as a specificity-proxy-anchored screening framework with human-in-the-loop review, not as a validated forensic detector; no calibrated error rates are reportable without signature-level ground truth.
+Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization makes it easy to reuse a stored signature image across reports — through administrative stamping or firm-level electronic signing — undermining individualized attestation. We build an end-to-end pipeline detecting such *non-hand-signed* signatures at scale: a Vision-Language Model identifies signature pages, YOLOv11 localizes signatures, ResNet-50 supplies deep features, and a dual-descriptor layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate *style consistency* from *image reproduction*. Applied to 90,282 Taiwan audit reports (2013–2023), the pipeline yields 182,328 signatures from 758 CPAs; primary analyses are scoped to the Big-4 sub-corpus (437 CPAs; 150,442 signatures). Distributional diagnostics show that the apparent multimodality of the descriptor distribution dissolves under joint firm-mean centring and integer-tie jitter ($p$ rises to $0.35$), so no within-population bimodal antimode anchors the operational thresholds. We instead adopt an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units: per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ under the deployed any-pair high-confidence rule), and per-document ($0.34$ for the operational HC+MC alarm). Firm heterogeneity is decisive: Firm A's per-document HC+MC alarm rate is $0.62$ versus $0.09$–$0.16$ at Firms B/C/D after pool-size adjustment, and under the deployed any-pair rule $77$–$99\%$ of inter-CPA collisions concentrate within the source firm — consistent with firm-level template-like reuse. We position the system as a specificity-proxy-anchored screening framework with human-in-the-loop review, not as a validated forensic detector; no calibrated error rates are reportable without signature-level ground truth.
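The dHash side of the dual-descriptor layer can be illustrated with a generic difference hash. This sketch is not the paper's independent-minimum variant (whose pairing rule is not specified in this excerpt), and it uses a crude numpy block-mean downsample in place of proper image resizing:

```python
import numpy as np

def dhash(gray, hash_size=8):
    """Difference hash: downsample to hash_size x (hash_size + 1) by block
    means, then record whether each cell is brighter than its right
    neighbour. Yields hash_size**2 bits (64 for the default)."""
    h, w = hash_size, hash_size + 1
    H, W = gray.shape
    # Crude block-mean downsample; a real pipeline would resize properly.
    small = gray[: H - H % h, : W - W % w]
    small = small.reshape(h, small.shape[0] // h,
                          w, small.shape[1] // w).mean(axis=(1, 3))
    return (small[:, 1:] > small[:, :-1]).flatten()

def hamming(a, b):
    """Hamming distance between two bit arrays."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(2)
img = rng.random((64, 72))                                    # stand-in "signature"
copy = np.clip(img + rng.normal(0, 0.01, img.shape), 0, 1)    # near-identical copy
other = rng.random((64, 72))                                  # unrelated image

print(hamming(dhash(img), dhash(copy)))    # small: the dHash<=5 regime
print(hamming(dhash(img), dhash(other)))   # near 32, the 64-bit midpoint
```

A near-identical reproduction lands at a small Hamming distance (the dHash$\leq 5$ regime quoted in the abstract), while unrelated images sit near the midpoint of the 64-bit hash, which is what lets the structural descriptor separate image reproduction from mere style consistency.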
---
@@ -101,7 +101,7 @@ This section reports the empirical evidence for §III-K's three-score internal-c
The three scores agree on placing Firm A as the most replication-dominated and the three non-Firm-A firms as less replication-dominated. The K=3 posterior P(C1) and the box-rule less-replication-dominated rate (Score 1 and Score 3) place Firm C at the least-replication-dominated end of Big-4; the reverse-anchor cosine percentile (Score 2) ranks Firm D fractionally above Firm C. This residual within-Big-4-non-A disagreement is a design feature of the reverse-anchor metric: Score 2 measures only the marginal cosine percentile under the non-Big-4 reference, so a firm with a slightly higher cosine but a markedly different dHash distribution (Firm D vs Firm C) can score higher on Score 2 while scoring lower on Scores 1 and 3, both of which use both descriptors.
-**Table XI.** Per-signature Cohen $\kappa$ (binary collapse, replicated vs not-replicated), $n = 150{,}442$ Big-4 signatures.
+**Table XI.** Per-signature Cohen $\kappa$ (binary collapse, replication-dominated vs less-replication-dominated), $n = 150{,}442$ Big-4 signatures.
| Pair | Cohen $\kappa$ |
|---|---|
@@ -158,7 +158,7 @@ We caution that for the Paper A box rule this result is close to tautological (b
The signature-level inter-CPA pair-level coincidence-rate analysis (reported in v3.20.0 §IV-F.1, Table X as "FAR") is inherited and extended in v4.0. v4.0 retroactively reframes the metric as **inter-CPA pair-level coincidence rate (ICCR)** rather than "False Acceptance Rate" because the corpus does not provide signature-level ground-truth negative labels; the inter-CPA negative-anchor assumption underpinning the metric is itself partially violated by within-firm cross-CPA template-like collision structures (§III-L.4). The v3.20.0 corpus-wide spike on $\sim 50{,}000$ inter-CPA pairs reported a per-comparison rate of $0.0005$ (Wilson 95% CI $[0.0003, 0.0007]$) at the cosine cut $0.95$.
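The quoted Wilson interval is straightforward to reproduce. A minimal sketch, assuming roughly 25 collisions among 50,000 inter-CPA comparisons (consistent with the 0.0005 rate; the exact counts are not given in this excerpt):

```python
import math

def wilson_ci(k, n, z=1.959964):
    """Wilson score interval for a binomial proportion k/n at ~95% coverage."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Assumed counts: ~25 hits in ~50,000 comparisons (an illustration, not
# the paper's tabulated values).
lo, hi = wilson_ci(25, 50_000)
print(f"rate = {25 / 50_000:.4f}, Wilson 95% CI = [{lo:.4f}, {hi:.4f}]")
```

With those assumed counts the interval evaluates to roughly $[0.0003, 0.0007]$, matching the quoted CI; the Wilson form is preferred over the normal approximation here because the proportion is tiny and the normal interval's lower bound can go negative.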
-v4.0 additionally reports the §III-L.1 Big-4-scope spike at higher sample size ($5 \times 10^5$ inter-CPA pairs; Script 40b), which replicates and extends the v3 result and adds the structural dimension (dHash) and joint-rule rates. The §III-L.1 numbers are referenced rather than duplicated here; the consolidated v4-new ICCR calibration appears in §IV-M Table XVI.
+v4.0 additionally reports the §III-L.1 Big-4-scope spike at higher sample size ($5 \times 10^5$ inter-CPA pairs; Script 40b), which replicates and extends the v3 result and adds the structural dimension (dHash) and joint-rule rates. The §III-L.1 numbers are referenced rather than duplicated here; the consolidated v4-new ICCR calibration appears in §IV-M Tables XXI–XXVI.
## J. Five-Way Per-Signature + Document-Level Classification Output
@@ -174,7 +174,7 @@ This section reports the §III-L five-way per-signature + document-level worst-c
| UN | Uncertain | 35,480 | 23.58% |
| LH | Likely hand-signed | 238 | 0.16% |
-(Source: Script 42; 11 of 150,453 loaded Big-4 signatures lacked one or both descriptors and were excluded. The $150{,}442$ vs $150{,}453$ distinction — descriptor-complete vs vector-complete — recurs across §IV: §IV-D through §IV-J and §IV-M.5 use $n = 150{,}442$; §IV-M.3 Table XXII uses $n = 150{,}453$ because the pool-normalised ICCR analysis loads all vector-complete signatures including those failing the descriptor-complete filter. See §III-G for the sample-size reconciliation.)
+(Source: Script 42; 11 of 150,453 loaded Big-4 signatures lacked one or both descriptors and were excluded. The $150{,}442$ vs $150{,}453$ distinction — descriptor-complete vs vector-complete — recurs across §IV: descriptor-complete analyses (§IV-D through §IV-J, all using accountant-level aggregates or per-signature category counts derived from the same 150,442-signature substrate) use $n = 150{,}442$; vector- or pair-recomputed analyses (§IV-M.2 Table XXI, §IV-M.3 Table XXII, §IV-M.5 Tables XXIV–XXV; Scripts 40b, 43, 44) use $n = 150{,}453$ because their pair- or pool-level computations load all vector-complete signatures including those failing the descriptor-complete filter. See §III-G for the sample-size reconciliation.)
**Per-firm five-way breakdown (% within firm).**