pdf_signature_extraction/paper/paper_a_appendix_v3.md

# Appendix A. BD/McCrary Bin-Width Sensitivity

The main text (Sections III-I and IV-E) treats the Burgstahler-Dichev / McCrary discontinuity procedure [38], [39] as a *density-smoothness diagnostic* rather than as one of the threshold estimators whose convergence anchors the accountant-level threshold band.
This appendix documents the empirical basis for that framing by sweeping the bin width across six (variant, bin-width) panels: Firm A / full-sample / accountant-level, each in the cosine and $\text{dHash}_\text{indep}$ direction.

<!-- TABLE A.I: BD/McCrary Bin-Width Sensitivity (two-sided alpha = 0.05, |Z| > 1.96)
| Variant | n | Bin width | Best transition | z_below | z_above |
|---------|---|-----------|-----------------|---------|---------|
| Firm A cosine (sig-level)        | 60,448  | 0.003  | 0.9870 | -2.81  | +9.42   |
| Firm A cosine (sig-level)        | 60,448  | 0.005  | 0.9850 | -9.57  | +19.07  |
| Firm A cosine (sig-level)        | 60,448  | 0.010  | 0.9800 | -54.64 | +69.96  |
| Firm A cosine (sig-level)        | 60,448  | 0.015  | 0.9750 | -85.86 | +106.17 |
| Firm A dHash_indep (sig-level)   | 60,448  | 1      | 2.0    | -4.69  | +10.01  |
| Firm A dHash_indep (sig-level)   | 60,448  | 2      | no transition | — | — |
| Firm A dHash_indep (sig-level)   | 60,448  | 3      | no transition | — | — |
| Full-sample cosine (sig-level)   | 168,740 | 0.003  | 0.9870 | -3.21  | +8.17   |
| Full-sample cosine (sig-level)   | 168,740 | 0.005  | 0.9850 | -8.80  | +14.32  |
| Full-sample cosine (sig-level)   | 168,740 | 0.010  | 0.9800 | -29.69 | +44.91  |
| Full-sample cosine (sig-level)   | 168,740 | 0.015  | 0.9450 | -11.35 | +14.85  |
| Full-sample dHash_indep (sig-l.) | 168,740 | 1      | 2.0    | -6.22  | +4.89   |
| Full-sample dHash_indep (sig-l.) | 168,740 | 2      | 10.0   | -7.35  | +3.83   |
| Full-sample dHash_indep (sig-l.) | 168,740 | 3      | 9.0    | -11.05 | +45.39  |
| Accountant-level cosine_mean     | 686     | 0.002  | no transition | — | — |
| Accountant-level cosine_mean     | 686     | 0.005  | 0.9800 | -3.23  | +5.18   |
| Accountant-level cosine_mean     | 686     | 0.010  | no transition | — | — |
| Accountant-level dHash_indep_mean| 686     | 0.2    | no transition | — | — |
| Accountant-level dHash_indep_mean| 686     | 0.5    | no transition | — | — |
| Accountant-level dHash_indep_mean| 686     | 1.0    | 3.0    | -2.00  | +3.24   |
-->

Two patterns are visible in Table A.I.
First, at the signature level the procedure consistently identifies a "transition" under every bin width, but the *location* of that transition drifts monotonically with bin width (Firm A cosine: 0.987 → 0.985 → 0.980 → 0.975 as bin width grows from 0.003 to 0.015; full-sample dHash: 2 → 10 → 9 as the bin width grows from 1 to 3).
The $Z$ statistics also inflate superlinearly with the bin width (Firm A cosine $|Z|$ rises from $\sim 9$ at bin 0.003 to $\sim 106$ at bin 0.015) because wider bins aggregate more mass per bin and therefore shrink the per-bin standard error on a very large sample.
Both features are characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity.

Second, at the accountant level---the unit we rely on for primary threshold inference (Sections III-H, III-J, IV-E)---the procedure produces no significant transition at two of three cosine bin widths and two of three dHash bin widths, and the one marginal transition it does produce ($Z_\text{below} = -2.00$ in the dHash sweep at bin width $1.0$) sits exactly at the critical value for $\alpha = 0.05$.
We stress the inferential asymmetry here: *consistency* with smoothly-mixed clustering is what the BD null delivers, not *affirmative proof* of smoothness.
At $N = 686$ accountants the BD/McCrary test has limited statistical power and can typically reject only sharp cliff-type discontinuities; failure to reject the smoothness null therefore constrains the data only to distributions whose between-cluster transitions are gradual *enough* to escape the test's sensitivity at that sample size.
We read this as reinforcing---not establishing---the clustered-but-smoothly-mixed interpretation derived from the GMM fit and the dip-test evidence.

Taken together, Table A.I shows (i) that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located *inside* the non-hand-signed mode rather than between modes---and (ii) that the accountant-level BD/McCrary null persists across the bin-width sweep, consistent with but not alone sufficient to establish the clustered-but-smoothly-mixed interpretation discussed in Section V-B and limitation-caveated in Section V-G.
Both observations support the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator.
The accountant-level threshold band reported in Table VIII ($\text{cosine} \approx 0.975$ from the convergence of the KDE antimode, the Beta-2 crossing, and the logit-GMM-2 crossing) is therefore not adjusted to include any BD/McCrary location.

Raw per-bin $Z$ sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials (`reports/bd_sensitivity/bd_sensitivity.json`) produced by `signature_analysis/25_bd_mccrary_sensitivity.py`.