Paper A v3.7: demote BD/McCrary to density-smoothness diagnostic; add Appendix A

Implements codex gpt-5.4 recommendation (paper/codex_bd_mccrary_opinion.md, "option (c) hybrid"): demote BD/McCrary in the main text from a co-equal threshold estimator to a density-smoothness diagnostic, and add a bin-width sensitivity appendix as an audit trail.

Why: the bin-width sweep (Script 25) confirms that at the signature level the BD transition drifts monotonically with bin width (Firm A cosine: 0.987 -> 0.985 -> 0.980 -> 0.975 as bin width widens 0.003 -> 0.015; full-sample dHash transitions drift from 2 to 10 to 9 across bin widths 1 / 2 / 3) and Z statistics inflate superlinearly with bin width, both characteristic of a histogram-resolution artifact. At the accountant level the BD null is robust across the sweep. The paper's earlier "three methodologically distinct estimators" framing therefore could not be defended to an IEEE Access reviewer once the sweep was run.

Added
- signature_analysis/25_bd_mccrary_sensitivity.py: bin-width sweep across 6 variants (Firm A / full-sample / accountant-level, each cosine + dHash_indep) and 3-4 bin widths per variant. Reports Z_below, Z_above, p-values, and number of significant transitions per cell. Writes reports/bd_sensitivity/bd_sensitivity.{json,md}.
- paper/paper_a_appendix_v3.md: new "Appendix A. BD/McCrary Bin-Width Sensitivity" with Table A.I (all 20 sensitivity cells) and interpretation linking the empirical pattern to the main-text framing decision.
- export_v3.py: appendix inserted into SECTIONS between conclusion and references.
- paper/codex_bd_mccrary_opinion.md: codex gpt-5.4 recommendation captured verbatim for audit trail.

Main-text reframing
- Abstract: "three methodologically distinct estimators" -> "two estimators plus a Burgstahler-Dichev/McCrary density-smoothness diagnostic". Trimmed to 243 words.
- Introduction: related-work summary, pipeline step 5, accountant-level convergence sentence, contribution 4, and section-outline line all updated. Contribution 4 renamed to "Convergent threshold framework with a smoothness diagnostic".
- Methodology III-I: section renamed to "Convergent Threshold Determination with a Density-Smoothness Diagnostic". "Method 2: BD/McCrary Discontinuity" converted to "Density-Smoothness Diagnostic" in a new subsection; Method 3 (Beta mixture) renumbered to Method 2. Subsections 4 and 5 updated to refer to "two threshold estimators" with BD as diagnostic.
- Methodology III-A pipeline overview: "three methodologically distinct statistical methods" -> "two methodologically distinct threshold estimators complemented by a density-smoothness diagnostic".
- Methodology III-L: "three-method analysis" -> "accountant-level threshold analysis (KDE antimode, Beta-2 crossing, logit-Gaussian robustness crossing)".
- Results IV-D.1 heading: "BD/McCrary Discontinuity" -> "BD/McCrary Density-Smoothness Diagnostic". Prose now notes the Appendix-A bin-width instability explicitly.
- Results IV-E: Table VIII restructured to label BD rows "(diagnostic only; bin-unstable)" and "(diagnostic; null across Appendix A)". Summary sentence rewritten to frame BD null as evidence for clustered-but-smoothly-mixed rather than as a convergence failure. Table cosine P5 row corrected from 0.941 to 0.9407 to match III-K.
- Results IV-G.3 and IV-I.2: "three-method convergence/thresholds" -> "accountant-level convergent thresholds" (clarifies the 3 converging estimates are KDE antimode, Beta-2, logit-Gaussian, not KDE/BD/Beta).
- Discussion V-B: "three-method framework" -> "convergent threshold framework".
- Conclusion: "three methodologically distinct methods" -> "two threshold estimators and a density-smoothness diagnostic"; contribution 3 restated; future-work sentence updated.
- Impact Statement (archived): "three methodologically distinct threshold-selection methods" -> "two methodologically distinct threshold estimators plus a density-smoothness diagnostic" so the archived text is internally consistent if reused.

Discussion V-B / V-G already framed BD as a diagnostic in v3.5 (unchanged in this commit). The reframing therefore brings Abstract / Introduction / Methodology / Results / Conclusion into alignment with the Discussion framing that codex had already endorsed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:32:50 +08:00
parent 6946baa096
commit 552b6b80d4
11 changed files with 458 additions and 63 deletions
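
For readers of this commit, the bin-width sweep described in the message can be illustrated with a minimal sketch. This is not the contents of `signature_analysis/25_bd_mccrary_sensitivity.py`. It assumes the standard Burgstahler-Dichev standardized-difference statistic (each bin count compared with the mean of its two neighbours) and a hypothetical transition rule in which a bin with Z below `-z_crit` is immediately followed by a bin with Z above `+z_crit`. The function names, the `z_crit` default of 1.96, and the `lo` / `hi` range arguments are illustrative assumptions.

```python
import numpy as np


def bd_z(counts):
    """Burgstahler-Dichev standardized difference for each interior bin.

    Each bin count is compared with the mean of its two neighbours.
    Edge bins have no two-sided neighbourhood and are returned as NaN.
    """
    n = np.asarray(counts, dtype=float)
    z = np.full(n.shape, np.nan)
    with np.errstate(divide="ignore", invalid="ignore"):
        total = n.sum()
        p = n / total
        p_left, p_mid, p_right = p[:-2], p[1:-1], p[2:]
        expected = 0.5 * (n[:-2] + n[2:])
        # Approximate variance of (observed - expected) under a smooth density.
        var = (total * p_mid * (1.0 - p_mid)
               + 0.25 * total * (p_left + p_right) * (1.0 - p_left - p_right))
        z[1:-1] = (n[1:-1] - expected) / np.sqrt(var)
    return z


def transitions_from_counts(counts, z_crit=1.96):
    """Indices of bins on the positive side of a Z- -> Z+ transition.

    Assumed rule (hypothetical): bin i-1 has Z < -z_crit and bin i has Z > +z_crit.
    """
    z = bd_z(counts)
    with np.errstate(invalid="ignore"):
        hit = (z[:-1] < -z_crit) & (z[1:] > z_crit)
    return [int(i) + 1 for i in np.flatnonzero(hit)]


def sweep(values, bin_widths, lo, hi, z_crit=1.96):
    """Map each bin width to the left edges of bins where a transition is flagged."""
    out = {}
    for width in bin_widths:
        n_bins = int(round((hi - lo) / width))
        edges = lo + width * np.arange(n_bins + 1)
        counts, _ = np.histogram(values, bins=edges)
        idx = transitions_from_counts(counts, z_crit)
        out[width] = [float(edges[i]) for i in idx]
    return out
```

Passing the same sample through `sweep` with several bin widths shows whether the flagged location stays put or drifts. Drift of the kind reported in the commit message is what motivates treating the transition as a histogram-resolution artifact rather than a threshold.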
@@ -75,12 +75,14 @@ At the per-accountant aggregate level both cosine and dHash means are strongly m
 This asymmetry between signature level and accountant level is itself an empirical finding.
 It predicts that a two-component mixture fit to per-signature cosine will be a forced fit (Section IV-D.2 below), while the same fit at the accountant level will succeed---a prediction borne out in the subsequent analyses.

-### 1) Burgstahler-Dichev / McCrary Discontinuity
+### 1) Burgstahler-Dichev / McCrary Density-Smoothness Diagnostic

-Applying the BD/McCrary test (Section III-I.2) to the per-signature cosine distribution yields a single significant transition at 0.985 for Firm A and 0.985 for the full sample; the min-dHash distributions exhibit a transition at Hamming distance 2 for both Firm A and the full sample.
-We note that the cosine transition at 0.985 lies *inside* the non-hand-signed mode rather than at the separation between two mechanisms, consistent with the dip-test finding that per-signature cosine is not cleanly bimodal.
-In contrast, the dHash transition at distance 2 is a substantively meaningful structural boundary that corresponds to the natural separation between pixel-near-identical replication and scan-noise-perturbed replication.
-At the accountant level the test does not produce a significant $Z^- \rightarrow Z^+$ transition in either the cosine-mean or the dHash-mean distribution (Section IV-E), reflecting that accountant aggregates are smooth at the bin resolution the test requires rather than exhibiting a sharp density discontinuity.
+Applying the BD/McCrary procedure (Section III-I.3) to the per-signature cosine distribution yields a nominally significant $Z^- \rightarrow Z^+$ transition at cosine 0.985 for Firm A and 0.985 for the full sample; the min-dHash distributions exhibit a transition at Hamming distance 2 for both Firm A and the full sample under the bin width ($0.005$ / $1$) used here.
+Two cautions, however, prevent us from treating these signature-level transitions as thresholds.
+First, the cosine transition at 0.985 lies *inside* the non-hand-signed mode rather than at the separation between two mechanisms, consistent with the dip-test finding that per-signature cosine is not cleanly bimodal.
+Second, Appendix A documents that the signature-level transition locations are not bin-width-stable (Firm A cosine drifts across 0.987, 0.985, 0.980, 0.975 as the bin width is widened from 0.003 to 0.015, and full-sample dHash transitions drift across 2, 10, 9 as bin width grows from 1 to 3), which is characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity between two mechanisms.
+At the accountant level the test does not produce a significant transition in either the cosine-mean or the dHash-mean distribution, and this null is robust across the Appendix-A bin-width sweep.
+We therefore read the BD/McCrary pattern as evidence that accountant-level aggregates are clustered-but-smoothly-mixed rather than sharply discontinuous, and we use BD/McCrary as a density-smoothness diagnostic rather than as an independent threshold estimator.

 ### 2) Beta Mixture at Signature Level: A Forced Fit

@@ -123,29 +125,29 @@ First, of the 180 CPAs in the Firm A registry, 171 have $\geq 10$ signatures and
 Component C1 captures 139 of these 171 Firm A CPAs (81%) in a tight high-cosine / low-dHash cluster; the remaining 32 Firm A CPAs fall into C2.
 This split is consistent with the minority-hand-signers framing of Section III-H and with the unimodal-long-tail observation of Section IV-D.
 Second, the three-component partition is *not* a firm-identity partition: three of the four Big-4 firms dominate C2 together, and smaller domestic firms cluster into C3.
-Third, applying the three-method framework of Section III-I to the accountant-level cosine-mean distribution yields the estimates summarized in the accountant-level rows of Table VIII (below): KDE antimode $= 0.973$, Beta-2 crossing $= 0.979$, and the logit-GMM-2 crossing $= 0.976$ converge within $\sim 0.006$ of each other, while the BD/McCrary test does not produce a significant transition at the accountant level.
+Third, applying the threshold framework of Section III-I to the accountant-level cosine-mean distribution yields the estimates summarized in the accountant-level rows of Table VIII (below): KDE antimode $= 0.973$, Beta-2 crossing $= 0.979$, and the logit-GMM-2 crossing $= 0.976$ converge within $\sim 0.006$ of each other, while the BD/McCrary density-smoothness diagnostic does not produce a significant transition at the accountant level (robust across the bin-width sweep in Appendix A).
 For completeness we also report the two-dimensional two-component GMM's marginal crossings at cosine $= 0.945$ and dHash $= 8.10$; these differ from the 1D crossings because they are derived from the joint (cosine, dHash) covariance structure rather than from each 1D marginal in isolation.

-Table VIII summarizes all threshold estimates produced by the three methods across the two analysis levels for a compact cross-level comparison.
+Table VIII summarizes the threshold estimates produced by the two threshold estimators and the BD/McCrary smoothness diagnostic across the two analysis levels for a compact cross-level comparison.

 <!-- TABLE VIII: Threshold Convergence Summary Across Levels
 | Level / method | Cosine threshold | dHash threshold |
 |----------------|-------------------|------------------|
-| Signature-level, all-pairs KDE crossover | 0.837 | — |
-| Signature-level, BD/McCrary transition | 0.985 | 2.0 |
-| Signature-level, Beta-2 EM crossing (Firm A) | 0.977 | — |
-| Signature-level, logit-GMM-2 crossing (Full) | 0.980 | — |
-| Accountant-level, KDE antimode | **0.973** | **4.07** |
-| Accountant-level, BD/McCrary transition | no transition | no transition |
-| Accountant-level, Beta-2 EM crossing | **0.979** | **3.41** |
-| Accountant-level, logit-GMM-2 crossing | **0.976** | **3.93** |
-| Accountant-level, 2D-GMM 2-comp marginal crossing | 0.945 | 8.10 |
-| Firm A calibration-fold cosine P5 | 0.941 | — |
-| Firm A calibration-fold dHash P95 | — | 9 |
-| Firm A calibration-fold dHash median | — | 2 |
+| Signature-level, all-pairs KDE crossover                                 | 0.837             | —                 |
+| Signature-level, Beta-2 EM crossing (Firm A)                              | 0.977             | —                 |
+| Signature-level, logit-GMM-2 crossing (Full)                              | 0.980             | —                 |
+| Signature-level, BD/McCrary transition (diagnostic only; bin-unstable, Appendix A) | 0.985     | 2.0               |
+| Accountant-level, KDE antimode (threshold estimator)                      | **0.973**         | **4.07**          |
+| Accountant-level, Beta-2 EM crossing (threshold estimator)                | **0.979**         | **3.41**          |
+| Accountant-level, logit-GMM-2 crossing (robustness)                       | **0.976**         | **3.93**          |
+| Accountant-level, BD/McCrary transition (diagnostic; null across Appendix A) | no transition  | no transition     |
+| Accountant-level, 2D-GMM 2-comp marginal crossing (secondary)             | 0.945             | 8.10              |
+| Firm A calibration-fold cosine P5                                         | 0.9407            | —                 |
+| Firm A calibration-fold dHash_indep P95                                   | —                 | 9                 |
+| Firm A calibration-fold dHash_indep median                                | —                 | 2                 |
 -->

-Methods 1 and 3 (KDE antimode, Beta-2 crossing, and its logit-GMM robustness check) converge at the accountant level to a cosine threshold of $\approx 0.975 \pm 0.003$ and a dHash threshold of $\approx 3.8 \pm 0.4$, while Method 2 (BD/McCrary) does not produce a significant discontinuity.
+At the accountant level the two threshold estimators (KDE antimode and Beta-2 crossing) together with the logit-Gaussian robustness crossing converge to a cosine threshold of $\approx 0.975 \pm 0.003$ and a dHash threshold of $\approx 3.8 \pm 0.4$; the BD/McCrary density-smoothness diagnostic produces no significant transition at the same level (and this null is robust across Appendix A's bin-width sweep), consistent with clustered-but-smoothly-mixed accountant-level aggregates.
 This is the accountant-level convergence we rely on for the primary threshold interpretation; the two-dimensional GMM marginal crossings (cosine $= 0.945$, dHash $= 8.10$) differ because they reflect joint (cosine, dHash) covariance structure, and we report them as a secondary cross-check.
 The signature-level estimates are reported for completeness and as diagnostic evidence of the continuous-spectrum asymmetry (Section IV-D.2) rather than as primary classification boundaries.

@@ -238,8 +240,8 @@ We therefore interpret the held-out fold as confirming the qualitative finding (
 ### 3) Operational-Threshold Sensitivity: cos $> 0.95$ vs cos $> 0.945$

 The per-signature classifier (Section III-L) uses cos $> 0.95$ as its operational cosine cut, anchored on the whole-sample Firm A P95 heuristic.
-The accountant-level three-method convergence (Section IV-E) places the primary accountant-level reference between $0.973$ and $0.979$, and the accountant-level 2D-GMM marginal at $0.945$.
-Because the classifier operates at the signature level while the three-method convergence estimates are at the accountant level, they are formally non-substitutable.
+The accountant-level convergent threshold analysis (Section IV-E) places the primary accountant-level reference between $0.973$ and $0.979$ (KDE antimode, Beta-2 crossing, logit-Gaussian robustness crossing), and the accountant-level 2D-GMM marginal at $0.945$.
+Because the classifier operates at the signature level while these convergent accountant-level estimates are at the accountant level, they are formally non-substitutable.
 We report a sensitivity check in which the classifier's operational cut cos $> 0.95$ is replaced by the nearest accountant-level reference, cos $> 0.945$.

 <!-- TABLE XII: Classifier Sensitivity to the Operational Cosine Cut (All-Sample Five-Way Output, N = 168,740 signatures)
@@ -398,7 +400,7 @@ We note that because the non-hand-signed thresholds are themselves calibrated to
 ### 2) Cross-Method Agreement

 Among non-Firm-A CPAs with cosine $> 0.95$, only 11.3% exhibit dHash $\leq 5$, compared to 58.7% for Firm A---a five-fold difference that demonstrates the discriminative power of the structural verification layer.
-This is consistent with the three-method thresholds (Section IV-E, Table VIII) and with the cross-firm compositional pattern of the accountant-level GMM (Table VII).
+This is consistent with the accountant-level convergent thresholds (Section IV-E, Table VIII) and with the cross-firm compositional pattern of the accountant-level GMM (Table VII).

 ## J. Ablation Study: Feature Backbone Comparison