Update §IV v3.3: soften §IV-D/E framing + rename §IV-I + add §IV-M

- §IV-D opening: note that the accountant-level dip rejection is fully explained by between-firm composition + integer ties per §III-I.4 (Scripts 39b-e), no longer "the empirical justification for fitting a mixture model" - §IV-E Tables VII/VIII: K=2/K=3 component labels changed from "hand-leaning / mixed / replicated" to position-on-plane labels per §III-J recasting - §IV-I retitled "Inter-CPA Pair-Level Coincidence Rate"; v3.x's "FAR" terminology retroactively reframed; references §IV-M for the v4 Big-4 spike (Script 40b) - New §IV-M (7 tables XIX-XXV): v4-new anchor-based ICCR calibration results consolidated — composition decomposition (Scripts 39b-e), pair-level ICCR sweep (Script 40b), pool- normalised per-signature ICCR (Script 43), document-level ICCR by alarm definition (Script 45), firm-heterogeneity logistic regression + cross-firm hit matrix (Script 44), alert-rate sensitivity (Script 46) - Header bumped to v3.3 (post codex rounds 21-34) Companion to §III v7 commit 723a3f6 and Phase 4 prose v3 commit b33e20d. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:18:59 +08:00
parent b33e20d479
commit 980295d5bd
1 changed files with 117 additions and 12 deletions
@@ -1,4 +1,4 @@
-# Section IV. Results — v4.0 Draft v3.2 (post codex rounds 21–25)
+# Section IV. Results — v4.0 Draft v3.3 (post codex rounds 21–34)

 > **Draft note (2026-05-12, v3.2; internal — remove before submission).** This file replaces the §IV-A through §IV-H block of `paper/paper_a_results_v3.md` (v3.20.0) with the Big-4 reframed structure. Section IV expands from 8 sub-sections in v3.20.0 to 12 sub-sections in v4.0 (A through L) to mirror the §III-G..L lineage. **Table-numbering scheme**: the v4 manuscript uses Tables V through XVIII (plus Table XV-B for document-level worst-case counts) for the new v4 Big-4 results; inherited v3.x tables are cited only as "v3.20.0 Table N" with their original v3 number and are *not* renumbered into the v4 sequence. No v4 Table IV is printed; the inherited v3.20.0 Table IV (per-firm detection counts) remains a v3.x reference rather than a v4 table. **Anonymisation**: the Big-4 firms are pseudonymously labelled Firm A through Firm D throughout the manuscript body; real names are not printed in v4 tables or prose. The v3 → v3.1 → v3.2 revision history is: v3 (post round 23) made the table-numbering scheme and anonymisation policy decisions and applied 14 presentation fixes; v3.1 (post round 24) tightened the close-out checklist; v3.2 (post round 25) finalises this draft note. Empirical anchors trace to Scripts 32–42 on branch `paper-a-v4-big4`; the §III provenance table covers the methodology-side citations and §IV adds new tables for the v4.0-specific results.

@@ -20,7 +20,7 @@ The all-pairs intra-vs-inter class distribution analysis (KDE crossover at $\ove

 ## D. Big-4 Accountant-Level Distributional Characterisation

-This section reports the empirical evidence for §III-I's three-diagnostic distributional characterisation at the Big-4 accountant level. All numbers below are direct re-statements from Scripts 32 / 34; cross-citations to the v3.x (signature-level) analysis are noted where the v4.0 result differs structurally from the v3.x result.
+This section reports the empirical evidence for §III-I's distributional diagnostics at the Big-4 accountant level. All numbers below are direct re-statements from Scripts 32 / 34. The accountant-level dip-test rejection reported in Table V is, per §III-I.4 (Scripts 39b–39e), fully attributable to between-firm location shifts and integer mass-point artefacts rather than to within-population bimodality; the v4-new composition-decomposition diagnostics that establish this finding are tabulated in §IV-M below alongside the anchor-based ICCR calibration.

 **Table V.** Hartigan dip-test results, accountant-level marginals (Big-4 primary; comparison scopes from Script 32).

@@ -48,12 +48,12 @@ The Big-4-scope null on both axes is consistent with the §IV-E mixture evidence

 This section reports the K=2 and K=3 2D Gaussian mixture fits to the Big-4 accountant-level distribution and the bootstrap stability of their marginal crossings.

-**Table VII.** Big-4 K=2 mixture components and marginal-crossing bootstrap 95% confidence intervals.
+**Table VII.** Big-4 K=2 mixture components (descriptive partition; not mechanism clusters per §III-J) and marginal-crossing bootstrap 95% confidence intervals.

 | K=2 component | $\overline{\text{cos}}$ | $\overline{\text{dHash}}$ | weight |
 |---|---|---|---|
-| Hand-leaning | 0.954 | 7.14 | 0.689 |
-| Replicated | 0.983 | 2.41 | 0.311 |
+| K=2-a (low-cos / high-dHash position) | 0.954 | 7.14 | 0.689 |
+| K=2-b (high-cos / low-dHash position) | 0.983 | 2.41 | 0.311 |

 Marginal crossings (point + bootstrap 95% CI, $n_{\text{boot}} = 500$):

@@ -64,13 +64,13 @@ Marginal crossings (point + bootstrap 95% CI, $n_{\text{boot}} = 500$):

 $\text{BIC}(K{=}2) = -1108.45$ (Script 34).

-**Table VIII.** Big-4 K=3 mixture components.
+**Table VIII.** Big-4 K=3 mixture components (descriptive firm-compositional partition per §III-J; not mechanism clusters).

-| K=3 component | $\overline{\text{cos}}$ | $\overline{\text{dHash}}$ | weight | descriptive label |
+| K=3 component | $\overline{\text{cos}}$ | $\overline{\text{dHash}}$ | weight | descriptive position |
 |---|---|---|---|---|
-| C1 | 0.9457 | 9.17 | 0.143 | hand-leaning |
-| C2 | 0.9558 | 6.66 | 0.536 | mixed |
-| C3 | 0.9826 | 2.41 | 0.321 | replicated |
+| C1 | 0.9457 | 9.17 | 0.143 | low-cos / high-dHash corner |
+| C2 | 0.9558 | 6.66 | 0.536 | central region |
+| C3 | 0.9826 | 2.41 | 0.321 | high-cos / low-dHash corner |

 $\text{BIC}(K{=}3) = -1111.93$, lower than $K{=}2$ by $3.48$ (mild support; not by itself decisive). The full-fit K=3 baseline above is reproduced in Scripts 35, 37, and 38 with identical hyperparameters; Script 37 additionally fits K=3 on each leave-one-firm-out training set (those fold-specific components differ from the full-fit baseline by design and are reported separately in §IV-G Table XIII). Operational use of the K=2 / K=3 fits is governed by §III-J and §III-L; §IV-G reports the LOOO reproducibility evidence that motivates reporting both fits descriptively.

@@ -154,9 +154,11 @@ This section reports the only hard-ground-truth subset analysis available in the

 We caution that for the Paper A box rule this result is close to tautological (byte-identical nearest-neighbour signatures have cosine $\approx 1$ and dHash $\approx 0$, well inside the rule's high-confidence region); v3.20.0 §V-F discusses this conservative-subset caveat at length and we retain that discussion. The reverse-anchor cut is chosen by *prevalence calibration* against the inherited box rule's overall replicated rate of $49.58\%$ across Big-4 signatures; this is a documented v4.0 limitation since no signature-level hand-signed ground truth exists to permit direct ROC optimisation.

-## I. Inter-CPA Negative-Anchor FAR (inherited from v3.x)
+## I. Inter-CPA Pair-Level Coincidence Rate (Big-4 spike + inherited corpus-wide)

-The signature-level inter-CPA negative-anchor FAR analysis (~50,000 random pairs from different CPAs; v3.20.0 §IV-F.1, Table X) is inherited unchanged. The v3.x result, reproduced here for reference: at the operational cosine cut of $0.95$, the inter-CPA FAR is $0.0005$ (Wilson 95% CI $[0.0003, 0.0007]$). v4.0 does not regenerate this analysis on the Big-4 subset; the inter-CPA negative-anchor logic is corpus-wide and the v3.x FAR remains the operational specificity reference for the §III-L operational rule.
+The signature-level inter-CPA pair-level coincidence-rate analysis (reported in v3.20.0 §IV-F.1, Table X as "FAR") is inherited and extended in v4.0. v4.0 retroactively reframes the metric as **inter-CPA pair-level coincidence rate (ICCR)** rather than "False Acceptance Rate" because the corpus does not provide signature-level ground-truth negative labels; the inter-CPA negative-anchor assumption underpinning the metric is itself partially violated by within-firm cross-CPA template-like collision structures (§III-L.4). The v3.20.0 corpus-wide spike on $\sim 50{,}000$ inter-CPA pairs reported a per-comparison rate of $0.0005$ (Wilson 95% CI $[0.0003, 0.0007]$) at the cosine cut $0.95$.
+
+v4.0 additionally reports the §III-L.1 Big-4-scope spike at higher sample size ($5 \times 10^5$ inter-CPA pairs; Script 40b), which replicates and extends the v3 result and adds the structural dimension (dHash) and joint-rule rates. The §III-L.1 numbers are referenced rather than duplicated here; the consolidated v4-new ICCR calibration appears in §IV-M Table XVI.

 ## J. Five-Way Per-Signature + Document-Level Classification Output

@@ -255,6 +257,109 @@ This section reports the v4.0 reproducibility cross-check at the full accountant

 The feature-backbone ablation (v3.20.0 Table XVIII; backbone replacement of ResNet-50 with alternative ImageNet-pretrained backbones to verify that the §III-E embedding choice is not load-bearing) is inherited unchanged. v3.20.0 Table XVIII is cited by its original v3 number and is **not** the same table as the v4 Table XVIII (which reports the Big-4 vs full-dataset Spearman drift in §IV-K). v4.0 makes no scope-specific re-derivation of the ablation; the analysis is a methodological-stability check on the embedding stage and is corpus-wide rather than Big-4-restricted.

+## M. v4-New Anchor-Based ICCR Calibration Results
+
+This section consolidates the v4-new empirical results that support the §III-L anchor-based threshold calibration framework. Numbers below are direct re-statements from the spike scripts cited per row; the corresponding §III provenance table entries appear in §III's provenance table.
+
+### M.1 Composition decomposition (Scripts 39b–39e)
+
+**Table XIX.** Within-firm and between-firm decomposition of the Big-4 accountant-level dip-test rejection.
+
+| Diagnostic | Scope | Statistic | Implication |
+|---|---|---|---|
+| Within-firm signature-level cosine dip | Big-4 (4 firms) | $p_{\text{cos}} \in \{0.176, 0.991, 0.551, 0.976\}$ | 0/4 firms reject; cosine within-firm unimodal |
+| Within-firm signature-level cosine dip | non-Big-4 (10 firms $\geq 500$ sigs) | $p_{\text{cos}} \in [0.59, 0.99]$ | 0/10 firms reject; cosine within-firm unimodal |
+| Within-firm jittered-dHash dip (5 seeds, median) | Big-4 (4 firms) | $p_{\text{med}} \in \{0.999, 0.996, 0.999, 0.9995\}$ | 0/4 firms reject after integer-jitter; raw rejection was integer-tie artefact |
+| Big-4 pooled dHash: 2×2 factorial | firm-centred + jittered (5 seeds) | $p_{\text{med}} = 0.35$, 0/5 seeds reject | combined corrections eliminate rejection; multimodality is composition + integer artefact |
+| Integer-histogram valley near $\text{dHash} \approx 5$ | within each Big-4 firm | none (0/4 firms) | no within-firm dHash antimode at the inherited HC cutoff |
+
+(Source: Scripts 39b, 39c, 39d, 39e; bootstrap $n_{\text{boot}} = 2000$; jitter $\sim \mathrm{U}[-0.5, +0.5]$.)
+
+### M.2 Anchor-based inter-CPA pair-level ICCR (Script 40b)
+
+**Table XX.** Big-4 inter-CPA per-comparison ICCR sweep, $n = 5 \times 10^5$ pairs (Big-4 scope; v4 new).
+
+| Threshold | Per-comparison ICCR | 95% Wilson CI |
+|---|---|---|
+| cos $> 0.945$ (v3.x published "natural threshold") | $0.00081$ | $[0.00073, 0.00089]$ |
+| cos $> 0.95$ (inherited operating point) | $0.00060$ | $[0.00053, 0.00067]$ |
+| cos $> 0.97$ | $0.00024$ | $[0.00020, 0.00029]$ |
+| cos $> 0.98$ | $0.00009$ | $[0.00007, 0.00012]$ |
+| dHash $\leq 5$ (inherited operating point) | $0.00129$ | $[0.00120, 0.00140]$ |
+| dHash $\leq 4$ | $0.00050$ | $[0.00044, 0.00057]$ |
+| dHash $\leq 3$ | $0.00019$ | $[0.00015, 0.00023]$ |
+| Joint: cos $> 0.95$ AND dHash $\leq 5$ (any-pair semantics) | $0.00014$ | — |
+| Joint: cos $> 0.95$ AND dHash $\leq 4$ (any-pair) | $0.00011$ | — |
+
+Conditional ICCR(dHash $\leq 5$ | cos $> 0.95$) $= 0.234$ (Wilson 95% $[0.190, 0.285]$; $70$ of $299$ pairs).
+
+The cos $> 0.95$ row replicates v3.20.0 §IV-F.1 Table X (v3 reported $0.0005$ under prior "FAR" terminology). The dHash row and joint row are v4 new.
+
+### M.3 Pool-normalised per-signature ICCR (Script 43)
+
+**Table XXI.** Pool-normalised per-signature ICCR under the deployed any-pair HC rule (cos $> 0.95$ AND dHash $\leq 5$); $n_{\text{sig}} = 150{,}453$ (vector-complete Big-4); CPA-block bootstrap $n_{\text{boot}} = 1000$.
+
+| Scope | Per-signature ICCR | Wilson 95% CI | CPA-bootstrap 95% CI |
+|---|---|---|---|
+| Big-4 pooled (any-pair, deployed) | $0.1102$ | $[0.1086, 0.1118]$ | $[0.0908, 0.1330]$ |
+| Big-4 pooled (same-pair, stricter alternative) | $0.0827$ | $[0.0813, 0.0841]$ | $[0.0668, 0.1021]$ |
+| Firm A (any-pair) | $0.2594$ | — | — |
+| Firm B (any-pair) | $0.0147$ | — | — |
+| Firm C (any-pair) | $0.0053$ | — | — |
+| Firm D (any-pair) | $0.0110$ | — | — |
+| Pool-size decile 1 (smallest pools) any-pair | $0.0249$ | — | — |
+| Pool-size decile 10 (largest pools) any-pair | $0.1905$ | — | — |
+
+Decile trend is broadly monotone in pool size with two minor reversals (decile 5 and decile 9 dip below their predecessors). Stricter operating point cos $> 0.95$ AND dHash $\leq 3$ (same-pair) gives per-signature ICCR $0.0449$.
+
+### M.4 Document-level ICCR under three alarm definitions (Script 45)
+
+**Table XXII.** Document-level inter-CPA ICCR by alarm definition; $n_{\text{docs}} = 75{,}233$.
+
+| Alarm definition | Alarm set | Document-level ICCR | Wilson 95% CI |
+|---|---|---|---|
+| D1 | HC only | $0.1797$ | $[0.1770, 0.1825]$ |
+| D2 (operational) | HC + MC | $0.3375$ | $[0.3342, 0.3409]$ |
+| D3 | HC + MC + HSC | $0.3384$ | $[0.3351, 0.3418]$ |
+
+Per-firm D2 document-level ICCR: Firm A $0.6201$ ($n = 30{,}226$); Firm B $0.1600$ ($n = 17{,}127$); Firm C $0.1635$ ($n = 19{,}501$); Firm D $0.0863$ ($n = 8{,}379$).
+
+### M.5 Firm heterogeneity logistic regression and cross-firm hit matrix (Script 44)
+
+**Table XXIII.** Logistic regression of per-signature any-pair HC hit indicator on firm dummies and centred log pool size (Firm A reference).
+
+| Term | Odds ratio (vs Firm A) | Direction |
+|---|---|---|
+| Firm B | $0.053$ | $\sim 19\times$ lower odds than Firm A |
+| Firm C | $0.010$ | $\sim 100\times$ lower odds than Firm A |
+| Firm D | $0.027$ | $\sim 37\times$ lower odds than Firm A |
+| log(pool size, centred) | $4.01$ | $\sim 4\times$ higher odds per log unit pool size |
+
+Per-decile per-firm rates (Table not duplicated here; Script 44 decile table available in the supplementary report): within every pool-size decile, Firms B/C/D show rates of $0.0006$–$0.0358$ while Firm A ranges $0.0541$–$0.5958$. The firm gap survives within matched pool sizes.
+
+**Table XXIV.** Cross-firm hit matrix among Big-4 source signatures with any-pair HC hit; max-cosine partner firm (counts).
+
+| Source firm | Firm A cand. | Firm B | Firm C | Firm D | non-Big-4 | n hits |
+|---|---|---|---|---|---|---|
+| Firm A | $14{,}447$ | $95$ | $44$ | $19$ | $17$ | $14{,}622$ |
+| Firm B | $92$ | $371$ | $8$ | $4$ | $9$ | $484$ |
+| Firm C | $16$ | $7$ | $149$ | $5$ | $1$ | $178$ |
+| Firm D | $22$ | $2$ | $6$ | $106$ | $1$ | $137$ |
+
+Same-pair joint hits (single candidate satisfying both cos $> 0.95$ AND dHash $\leq 5$) are within-firm at rates $99.96\%$ / $97.7\%$ / $98.2\%$ / $97.0\%$ for Firms A/B/C/D respectively.
+
+### M.6 Alert-rate sensitivity around inherited HC threshold (Script 46)
+
+**Table XXV.** Local-gradient / median-gradient ratio at inherited thresholds (descriptive plateau diagnostic).
+
+| Threshold | Local / median gradient ratio | Interpretation |
+|---|---|---|
+| cos $= 0.95$ (HC) | $\approx 25\times$ | locally sensitive (not plateau-stable) |
+| dHash $= 5$ (HC) | $\approx 3.8\times$ | locally sensitive (not plateau-stable) |
+| dHash $= 15$ (MC/HSC boundary) | $\approx 0.08$ | plateau-like (saturating tail) |
+
+Big-4 observed deployed alert rate on actual same-CPA pools: per-signature HC $= 0.4958$; per-document HC $= 0.6228$. The deployed-rate excess over the inter-CPA proxy is $0.3856$ pp per-signature and $0.4431$ pp per-document; this excess is interpreted as a same-CPA repeatability signal under the §III-M caveats, not as a presumed true-positive rate.
+
 ---

 ## Phase 3 close-out checklist