pdf_signature_extraction/paper/paper_a_appendix_v3.md

# Appendix A. BD/McCrary Bin-Width Sensitivity (Signature Level)

The main text (Section III-I, Section IV-D.2) treats the Burgstahler-Dichev / McCrary discontinuity procedure [38], [39] as a *density-smoothness diagnostic* rather than as a threshold estimator.
This appendix documents the empirical basis for that framing by sweeping the bin width across four (variant, bin-width) panels: Firm A and full-sample, each in the cosine and $\text{dHash}_\text{indep}$ direction.

<!-- TABLE A.I: BD/McCrary Bin-Width Sensitivity (two-sided alpha = 0.05, |Z| > 1.96)
| Variant | n | Bin width | Best transition | z_below | z_above |
|---------|---|-----------|-----------------|---------|---------|
| Firm A cosine (sig-level)        | 60,448  | 0.003  | 0.9870 | -2.81  | +9.42   |
| Firm A cosine (sig-level)        | 60,448  | 0.005  | 0.9850 | -9.57  | +19.07  |
| Firm A cosine (sig-level)        | 60,448  | 0.010  | 0.9800 | -54.64 | +69.96  |
| Firm A cosine (sig-level)        | 60,448  | 0.015  | 0.9750 | -85.86 | +106.17 |
| Firm A dHash_indep (sig-level)   | 60,448  | 1      | 2.0    | -4.69  | +10.01  |
| Firm A dHash_indep (sig-level)   | 60,448  | 2      | no transition | — | — |
| Firm A dHash_indep (sig-level)   | 60,448  | 3      | no transition | — | — |
| Full-sample cosine (sig-level)   | 168,740 | 0.003  | 0.9870 | -3.21  | +8.17   |
| Full-sample cosine (sig-level)   | 168,740 | 0.005  | 0.9850 | -8.80  | +14.32  |
| Full-sample cosine (sig-level)   | 168,740 | 0.010  | 0.9800 | -29.69 | +44.91  |
| Full-sample cosine (sig-level)   | 168,740 | 0.015  | 0.9450 | -11.35 | +14.85  |
| Full-sample dHash_indep (sig-l.) | 168,740 | 1      | 2.0    | -6.22  | +4.89   |
| Full-sample dHash_indep (sig-l.) | 168,740 | 2      | 10.0   | -7.35  | +3.83   |
| Full-sample dHash_indep (sig-l.) | 168,740 | 3      | 9.0    | -11.05 | +45.39  |
-->

Two patterns are visible in Table A.I.
First, the procedure consistently identifies a "transition" under every bin width, but the *location* of that transition drifts monotonically with bin width (Firm A cosine: 0.987 → 0.985 → 0.980 → 0.975 as bin width grows from 0.003 to 0.015; full-sample dHash: 2 → 10 → 9 as the bin width grows from 1 to 3).
The $Z$ statistics also inflate superlinearly with the bin width (Firm A cosine $|Z|$ rises from $\sim 9$ at bin 0.003 to $\sim 106$ at bin 0.015) because wider bins aggregate more mass per bin and therefore shrink the per-bin standard error on a very large sample.
Both features are characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity.

Second, the candidate transitions all locate *inside* the non-hand-signed mode (cosine $\geq 0.975$, dHash $\leq 10$) rather than between modes, which is the location pattern we would expect of a clean two-mechanism boundary.

Taken together, Table A.I shows that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located *inside* the non-hand-signed mode rather than between modes.
This observation supports the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator and reinforces the joint reading of Section IV-D that per-signature similarity does not form a clean two-mechanism mixture.

Raw per-bin $Z$ sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials.