From 980295d5bd1c1919a047da1de4b4b38306968c69 Mon Sep 17 00:00:00 2001 From: gbanyan Date: Wed, 13 May 2026 18:18:59 +0800 Subject: [PATCH] =?UTF-8?q?Update=20=C2=A7IV=20v3.3:=20soften=20=C2=A7IV-D?= =?UTF-8?q?/E=20framing=20+=20rename=20=C2=A7IV-I=20+=20add=20=C2=A7IV-M?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - §IV-D opening: note that the accountant-level dip rejection is fully explained by between-firm composition + integer ties per §III-I.4 (Scripts 39b-e), no longer "the empirical justification for fitting a mixture model" - §IV-E Tables VII/VIII: K=2/K=3 component labels changed from "hand-leaning / mixed / replicated" to position-on-plane labels per §III-J recasting - §IV-I retitled "Inter-CPA Pair-Level Coincidence Rate"; v3.x's "FAR" terminology retroactively reframed; references §IV-M for the v4 Big-4 spike (Script 40b) - New §IV-M (7 tables XIX-XXV): v4-new anchor-based ICCR calibration results consolidated — composition decomposition (Scripts 39b-e), pair-level ICCR sweep (Script 40b), pool- normalised per-signature ICCR (Script 43), document-level ICCR by alarm definition (Script 45), firm-heterogeneity logistic regression + cross-firm hit matrix (Script 44), alert-rate sensitivity (Script 46) - Header bumped to v3.3 (post codex rounds 21-34) Companion to §III v7 commit 723a3f6 and Phase 4 prose v3 commit b33e20d. Co-Authored-By: Claude Opus 4.7 (1M context) --- paper/v4/paper_a_results_v4_section_iv.md | 129 ++++++++++++++++++++-- 1 file changed, 117 insertions(+), 12 deletions(-) diff --git a/paper/v4/paper_a_results_v4_section_iv.md b/paper/v4/paper_a_results_v4_section_iv.md index a602c24..7a13332 100644 --- a/paper/v4/paper_a_results_v4_section_iv.md +++ b/paper/v4/paper_a_results_v4_section_iv.md @@ -1,4 +1,4 @@ -# Section IV. Results — v4.0 Draft v3.2 (post codex rounds 21–25) +# Section IV. Results — v4.0 Draft v3.3 (post codex rounds 21–34) > **Draft note (2026-05-12, v3.2; internal — remove before submission).** This file replaces the §IV-A through §IV-H block of `paper/paper_a_results_v3.md` (v3.20.0) with the Big-4 reframed structure. Section IV expands from 8 sub-sections in v3.20.0 to 12 sub-sections in v4.0 (A through L) to mirror the §III-G..L lineage. **Table-numbering scheme**: the v4 manuscript uses Tables V through XVIII (plus Table XV-B for document-level worst-case counts) for the new v4 Big-4 results; inherited v3.x tables are cited only as "v3.20.0 Table N" with their original v3 number and are *not* renumbered into the v4 sequence. No v4 Table IV is printed; the inherited v3.20.0 Table IV (per-firm detection counts) remains a v3.x reference rather than a v4 table. **Anonymisation**: the Big-4 firms are pseudonymously labelled Firm A through Firm D throughout the manuscript body; real names are not printed in v4 tables or prose. The v3 → v3.1 → v3.2 revision history is: v3 (post round 23) made the table-numbering scheme and anonymisation policy decisions and applied 14 presentation fixes; v3.1 (post round 24) tightened the close-out checklist; v3.2 (post round 25) finalises this draft note. Empirical anchors trace to Scripts 32–42 on branch `paper-a-v4-big4`; the §III provenance table covers the methodology-side citations and §IV adds new tables for the v4.0-specific results. @@ -20,7 +20,7 @@ The all-pairs intra-vs-inter class distribution analysis (KDE crossover at $\ove ## D. Big-4 Accountant-Level Distributional Characterisation -This section reports the empirical evidence for §III-I's three-diagnostic distributional characterisation at the Big-4 accountant level. All numbers below are direct re-statements from Scripts 32 / 34; cross-citations to the v3.x (signature-level) analysis are noted where the v4.0 result differs structurally from the v3.x result. +This section reports the empirical evidence for §III-I's distributional diagnostics at the Big-4 accountant level. All numbers below are direct re-statements from Scripts 32 / 34. The accountant-level dip-test rejection reported in Table V is, per §III-I.4 (Scripts 39b–39e), fully attributable to between-firm location shifts and integer mass-point artefacts rather than to within-population bimodality; the v4-new composition-decomposition diagnostics that establish this finding are tabulated in §IV-M below alongside the anchor-based ICCR calibration. **Table V.** Hartigan dip-test results, accountant-level marginals (Big-4 primary; comparison scopes from Script 32). @@ -48,12 +48,12 @@ The Big-4-scope null on both axes is consistent with the §IV-E mixture evidence This section reports the K=2 and K=3 2D Gaussian mixture fits to the Big-4 accountant-level distribution and the bootstrap stability of their marginal crossings. -**Table VII.** Big-4 K=2 mixture components and marginal-crossing bootstrap 95% confidence intervals. +**Table VII.** Big-4 K=2 mixture components (descriptive partition; not mechanism clusters per §III-J) and marginal-crossing bootstrap 95% confidence intervals. | K=2 component | $\overline{\text{cos}}$ | $\overline{\text{dHash}}$ | weight | |---|---|---|---| -| Hand-leaning | 0.954 | 7.14 | 0.689 | -| Replicated | 0.983 | 2.41 | 0.311 | +| K=2-a (low-cos / high-dHash position) | 0.954 | 7.14 | 0.689 | +| K=2-b (high-cos / low-dHash position) | 0.983 | 2.41 | 0.311 | Marginal crossings (point + bootstrap 95% CI, $n_{\text{boot}} = 500$): @@ -64,13 +64,13 @@ Marginal crossings (point + bootstrap 95% CI, $n_{\text{boot}} = 500$): $\text{BIC}(K{=}2) = -1108.45$ (Script 34). -**Table VIII.** Big-4 K=3 mixture components. +**Table VIII.** Big-4 K=3 mixture components (descriptive firm-compositional partition per §III-J; not mechanism clusters). -| K=3 component | $\overline{\text{cos}}$ | $\overline{\text{dHash}}$ | weight | descriptive label | +| K=3 component | $\overline{\text{cos}}$ | $\overline{\text{dHash}}$ | weight | descriptive position | |---|---|---|---|---| -| C1 | 0.9457 | 9.17 | 0.143 | hand-leaning | -| C2 | 0.9558 | 6.66 | 0.536 | mixed | -| C3 | 0.9826 | 2.41 | 0.321 | replicated | +| C1 | 0.9457 | 9.17 | 0.143 | low-cos / high-dHash corner | +| C2 | 0.9558 | 6.66 | 0.536 | central region | +| C3 | 0.9826 | 2.41 | 0.321 | high-cos / low-dHash corner | $\text{BIC}(K{=}3) = -1111.93$, lower than $K{=}2$ by $3.48$ (mild support; not by itself decisive). The full-fit K=3 baseline above is reproduced in Scripts 35, 37, and 38 with identical hyperparameters; Script 37 additionally fits K=3 on each leave-one-firm-out training set (those fold-specific components differ from the full-fit baseline by design and are reported separately in §IV-G Table XIII). Operational use of the K=2 / K=3 fits is governed by §III-J and §III-L; §IV-G reports the LOOO reproducibility evidence that motivates reporting both fits descriptively. @@ -154,9 +154,11 @@ This section reports the only hard-ground-truth subset analysis available in the We caution that for the Paper A box rule this result is close to tautological (byte-identical nearest-neighbour signatures have cosine $\approx 1$ and dHash $\approx 0$, well inside the rule's high-confidence region); v3.20.0 §V-F discusses this conservative-subset caveat at length and we retain that discussion. The reverse-anchor cut is chosen by *prevalence calibration* against the inherited box rule's overall replicated rate of $49.58\%$ across Big-4 signatures; this is a documented v4.0 limitation since no signature-level hand-signed ground truth exists to permit direct ROC optimisation. -## I. Inter-CPA Negative-Anchor FAR (inherited from v3.x) +## I. Inter-CPA Pair-Level Coincidence Rate (Big-4 spike + inherited corpus-wide) -The signature-level inter-CPA negative-anchor FAR analysis (~50,000 random pairs from different CPAs; v3.20.0 §IV-F.1, Table X) is inherited unchanged. The v3.x result, reproduced here for reference: at the operational cosine cut of $0.95$, the inter-CPA FAR is $0.0005$ (Wilson 95% CI $[0.0003, 0.0007]$). v4.0 does not regenerate this analysis on the Big-4 subset; the inter-CPA negative-anchor logic is corpus-wide and the v3.x FAR remains the operational specificity reference for the §III-L operational rule. +The signature-level inter-CPA pair-level coincidence-rate analysis (reported in v3.20.0 §IV-F.1, Table X as "FAR") is inherited and extended in v4.0. v4.0 retroactively reframes the metric as **inter-CPA pair-level coincidence rate (ICCR)** rather than "False Acceptance Rate" because the corpus does not provide signature-level ground-truth negative labels; the inter-CPA negative-anchor assumption underpinning the metric is itself partially violated by within-firm cross-CPA template-like collision structures (§III-L.4). The v3.20.0 corpus-wide spike on $\sim 50{,}000$ inter-CPA pairs reported a per-comparison rate of $0.0005$ (Wilson 95% CI $[0.0003, 0.0007]$) at the cosine cut $0.95$. + +v4.0 additionally reports the §III-L.1 Big-4-scope spike at higher sample size ($5 \times 10^5$ inter-CPA pairs; Script 40b), which replicates and extends the v3 result and adds the structural dimension (dHash) and joint-rule rates. The §III-L.1 numbers are referenced rather than duplicated here; the consolidated v4-new ICCR calibration appears in §IV-M Table XVI. ## J. Five-Way Per-Signature + Document-Level Classification Output @@ -255,6 +257,109 @@ This section reports the v4.0 reproducibility cross-check at the full accountant The feature-backbone ablation (v3.20.0 Table XVIII; backbone replacement of ResNet-50 with alternative ImageNet-pretrained backbones to verify that the §III-E embedding choice is not load-bearing) is inherited unchanged. v3.20.0 Table XVIII is cited by its original v3 number and is **not** the same table as the v4 Table XVIII (which reports the Big-4 vs full-dataset Spearman drift in §IV-K). v4.0 makes no scope-specific re-derivation of the ablation; the analysis is a methodological-stability check on the embedding stage and is corpus-wide rather than Big-4-restricted. +## M. v4-New Anchor-Based ICCR Calibration Results + +This section consolidates the v4-new empirical results that support the §III-L anchor-based threshold calibration framework. Numbers below are direct re-statements from the spike scripts cited per row; the corresponding §III provenance table entries appear in §III's provenance table. + +### M.1 Composition decomposition (Scripts 39b–39e) + +**Table XIX.** Within-firm and between-firm decomposition of the Big-4 accountant-level dip-test rejection. + +| Diagnostic | Scope | Statistic | Implication | +|---|---|---|---| +| Within-firm signature-level cosine dip | Big-4 (4 firms) | $p_{\text{cos}} \in \{0.176, 0.991, 0.551, 0.976\}$ | 0/4 firms reject; cosine within-firm unimodal | +| Within-firm signature-level cosine dip | non-Big-4 (10 firms $\geq 500$ sigs) | $p_{\text{cos}} \in [0.59, 0.99]$ | 0/10 firms reject; cosine within-firm unimodal | +| Within-firm jittered-dHash dip (5 seeds, median) | Big-4 (4 firms) | $p_{\text{med}} \in \{0.999, 0.996, 0.999, 0.9995\}$ | 0/4 firms reject after integer-jitter; raw rejection was integer-tie artefact | +| Big-4 pooled dHash: 2×2 factorial | firm-centred + jittered (5 seeds) | $p_{\text{med}} = 0.35$, 0/5 seeds reject | combined corrections eliminate rejection; multimodality is composition + integer artefact | +| Integer-histogram valley near $\text{dHash} \approx 5$ | within each Big-4 firm | none (0/4 firms) | no within-firm dHash antimode at the inherited HC cutoff | + +(Source: Scripts 39b, 39c, 39d, 39e; bootstrap $n_{\text{boot}} = 2000$; jitter $\sim \mathrm{U}[-0.5, +0.5]$.) + +### M.2 Anchor-based inter-CPA pair-level ICCR (Script 40b) + +**Table XX.** Big-4 inter-CPA per-comparison ICCR sweep, $n = 5 \times 10^5$ pairs (Big-4 scope; v4 new). + +| Threshold | Per-comparison ICCR | 95% Wilson CI | +|---|---|---| +| cos $> 0.945$ (v3.x published "natural threshold") | $0.00081$ | $[0.00073, 0.00089]$ | +| cos $> 0.95$ (inherited operating point) | $0.00060$ | $[0.00053, 0.00067]$ | +| cos $> 0.97$ | $0.00024$ | $[0.00020, 0.00029]$ | +| cos $> 0.98$ | $0.00009$ | $[0.00007, 0.00012]$ | +| dHash $\leq 5$ (inherited operating point) | $0.00129$ | $[0.00120, 0.00140]$ | +| dHash $\leq 4$ | $0.00050$ | $[0.00044, 0.00057]$ | +| dHash $\leq 3$ | $0.00019$ | $[0.00015, 0.00023]$ | +| Joint: cos $> 0.95$ AND dHash $\leq 5$ (any-pair semantics) | $0.00014$ | — | +| Joint: cos $> 0.95$ AND dHash $\leq 4$ (any-pair) | $0.00011$ | — | + +Conditional ICCR(dHash $\leq 5$ | cos $> 0.95$) $= 0.234$ (Wilson 95% $[0.190, 0.285]$; $70$ of $299$ pairs). + +The cos $> 0.95$ row replicates v3.20.0 §IV-F.1 Table X (v3 reported $0.0005$ under prior "FAR" terminology). The dHash row and joint row are v4 new. + +### M.3 Pool-normalised per-signature ICCR (Script 43) + +**Table XXI.** Pool-normalised per-signature ICCR under the deployed any-pair HC rule (cos $> 0.95$ AND dHash $\leq 5$); $n_{\text{sig}} = 150{,}453$ (vector-complete Big-4); CPA-block bootstrap $n_{\text{boot}} = 1000$. + +| Scope | Per-signature ICCR | Wilson 95% CI | CPA-bootstrap 95% CI | +|---|---|---|---| +| Big-4 pooled (any-pair, deployed) | $0.1102$ | $[0.1086, 0.1118]$ | $[0.0908, 0.1330]$ | +| Big-4 pooled (same-pair, stricter alternative) | $0.0827$ | $[0.0813, 0.0841]$ | $[0.0668, 0.1021]$ | +| Firm A (any-pair) | $0.2594$ | — | — | +| Firm B (any-pair) | $0.0147$ | — | — | +| Firm C (any-pair) | $0.0053$ | — | — | +| Firm D (any-pair) | $0.0110$ | — | — | +| Pool-size decile 1 (smallest pools) any-pair | $0.0249$ | — | — | +| Pool-size decile 10 (largest pools) any-pair | $0.1905$ | — | — | + +Decile trend is broadly monotone in pool size with two minor reversals (decile 5 and decile 9 dip below their predecessors). Stricter operating point cos $> 0.95$ AND dHash $\leq 3$ (same-pair) gives per-signature ICCR $0.0449$. + +### M.4 Document-level ICCR under three alarm definitions (Script 45) + +**Table XXII.** Document-level inter-CPA ICCR by alarm definition; $n_{\text{docs}} = 75{,}233$. + +| Alarm definition | Alarm set | Document-level ICCR | Wilson 95% CI | +|---|---|---|---| +| D1 | HC only | $0.1797$ | $[0.1770, 0.1825]$ | +| D2 (operational) | HC + MC | $0.3375$ | $[0.3342, 0.3409]$ | +| D3 | HC + MC + HSC | $0.3384$ | $[0.3351, 0.3418]$ | + +Per-firm D2 document-level ICCR: Firm A $0.6201$ ($n = 30{,}226$); Firm B $0.1600$ ($n = 17{,}127$); Firm C $0.1635$ ($n = 19{,}501$); Firm D $0.0863$ ($n = 8{,}379$). + +### M.5 Firm heterogeneity logistic regression and cross-firm hit matrix (Script 44) + +**Table XXIII.** Logistic regression of per-signature any-pair HC hit indicator on firm dummies and centred log pool size (Firm A reference). + +| Term | Odds ratio (vs Firm A) | Direction | +|---|---|---| +| Firm B | $0.053$ | $\sim 19\times$ lower odds than Firm A | +| Firm C | $0.010$ | $\sim 100\times$ lower odds than Firm A | +| Firm D | $0.027$ | $\sim 37\times$ lower odds than Firm A | +| log(pool size, centred) | $4.01$ | $\sim 4\times$ higher odds per log unit pool size | + +Per-decile per-firm rates (Table not duplicated here; Script 44 decile table available in the supplementary report): within every pool-size decile, Firms B/C/D show rates of $0.0006$–$0.0358$ while Firm A ranges $0.0541$–$0.5958$. The firm gap survives within matched pool sizes. + +**Table XXIV.** Cross-firm hit matrix among Big-4 source signatures with any-pair HC hit; max-cosine partner firm (counts). + +| Source firm | Firm A cand. | Firm B | Firm C | Firm D | non-Big-4 | n hits | +|---|---|---|---|---|---|---| +| Firm A | $14{,}447$ | $95$ | $44$ | $19$ | $17$ | $14{,}622$ | +| Firm B | $92$ | $371$ | $8$ | $4$ | $9$ | $484$ | +| Firm C | $16$ | $7$ | $149$ | $5$ | $1$ | $178$ | +| Firm D | $22$ | $2$ | $6$ | $106$ | $1$ | $137$ | + +Same-pair joint hits (single candidate satisfying both cos $> 0.95$ AND dHash $\leq 5$) are within-firm at rates $99.96\%$ / $97.7\%$ / $98.2\%$ / $97.0\%$ for Firms A/B/C/D respectively. + +### M.6 Alert-rate sensitivity around inherited HC threshold (Script 46) + +**Table XXV.** Local-gradient / median-gradient ratio at inherited thresholds (descriptive plateau diagnostic). + +| Threshold | Local / median gradient ratio | Interpretation | +|---|---|---| +| cos $= 0.95$ (HC) | $\approx 25\times$ | locally sensitive (not plateau-stable) | +| dHash $= 5$ (HC) | $\approx 3.8\times$ | locally sensitive (not plateau-stable) | +| dHash $= 15$ (MC/HSC boundary) | $\approx 0.08$ | plateau-like (saturating tail) | + +Big-4 observed deployed alert rate on actual same-CPA pools: per-signature HC $= 0.4958$; per-document HC $= 0.6228$. The deployed-rate excess over the inter-CPA proxy is $0.3856$ pp per-signature and $0.4431$ pp per-document; this excess is interpreted as a same-CPA repeatability signal under the §III-M caveats, not as a presumed true-positive rate. + --- ## Phase 3 close-out checklist