16e90bab20
Major changes (per partner red-pen + user decision): - Delete entire accountant-level analysis (III.J, IV.E, Tables VI/VII/VIII, Fig 4) -- cross-year pooling assumption unjustified, removes the implicit "habitually stamps = always stamps" reading. - Renumber sections III.J/K/L (was K/L/M) and IV.E/F/G/H/I (was F/G/H/I/J). - Title: "Three-Method Convergent Thresholding" -> "Replication-Dominated Calibration" (the three diagnostics do NOT converge at signature level). - Operational cosine cut anchored on whole-sample Firm A P7.5 (cos > 0.95). - Three statistical diagnostics (Hartigan/Beta/BD-McCrary) reframed as descriptive characterisation, not threshold estimators. - Firm A replication-dominated framing: 3 evidence strands -> 2. - Discussion limitation list: drop accountant-level cross-year pooling and BD/McCrary diagnostic; add auditor-year longitudinal tracking as future work. - Tone-shift: "we do not claim / do not derive" -> "we find / motivates". Reference verification (independent web-search audit of all 41 refs): - Fix [5] author hallucination: Hadjadj et al. -> Kao & Wen (real authors of Appl. Sci. 10:11:3716; report at paper/reference_verification_v3.md). - Polish [16] [21] [22] [25] (year/volume/page-range/model-name). Gemini 2.5 Pro peer review (Minor Revision verdict, A-F all positive): - Neutralize script-path references in tables/appendix -> "supplementary materials". - Move conflict-of-interest declaration from III-L to new Declarations section before References (paper_a_declarations_v3.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
36 lines
3.4 KiB
Markdown
36 lines
3.4 KiB
Markdown
# Appendix A. BD/McCrary Bin-Width Sensitivity (Signature Level)
|
|
|
|
The main text (Section III-I, Section IV-D.2) treats the Burgstahler-Dichev / McCrary discontinuity procedure [38], [39] as a *density-smoothness diagnostic* rather than as a threshold estimator.
|
|
This appendix documents the empirical basis for that framing by sweeping the bin width across four (variant, bin-width) panels: Firm A and full-sample, each in the cosine and $\text{dHash}_\text{indep}$ direction.
|
|
|
|
<!-- TABLE A.I: BD/McCrary Bin-Width Sensitivity (two-sided alpha = 0.05, |Z| > 1.96)
|
|
| Variant | n | Bin width | Best transition | z_below | z_above |
|
|
|---------|---|-----------|-----------------|---------|---------|
|
|
| Firm A cosine (sig-level) | 60,448 | 0.003 | 0.9870 | -2.81 | +9.42 |
|
|
| Firm A cosine (sig-level) | 60,448 | 0.005 | 0.9850 | -9.57 | +19.07 |
|
|
| Firm A cosine (sig-level) | 60,448 | 0.010 | 0.9800 | -54.64 | +69.96 |
|
|
| Firm A cosine (sig-level) | 60,448 | 0.015 | 0.9750 | -85.86 | +106.17 |
|
|
| Firm A dHash_indep (sig-level) | 60,448 | 1 | 2.0 | -4.69 | +10.01 |
|
|
| Firm A dHash_indep (sig-level) | 60,448 | 2 | no transition | — | — |
|
|
| Firm A dHash_indep (sig-level) | 60,448 | 3 | no transition | — | — |
|
|
| Full-sample cosine (sig-level) | 168,740 | 0.003 | 0.9870 | -3.21 | +8.17 |
|
|
| Full-sample cosine (sig-level) | 168,740 | 0.005 | 0.9850 | -8.80 | +14.32 |
|
|
| Full-sample cosine (sig-level) | 168,740 | 0.010 | 0.9800 | -29.69 | +44.91 |
|
|
| Full-sample cosine (sig-level) | 168,740 | 0.015 | 0.9450 | -11.35 | +14.85 |
|
|
| Full-sample dHash_indep (sig-l.) | 168,740 | 1 | 2.0 | -6.22 | +4.89 |
|
|
| Full-sample dHash_indep (sig-l.) | 168,740 | 2 | 10.0 | -7.35 | +3.83 |
|
|
| Full-sample dHash_indep (sig-l.) | 168,740 | 3 | 9.0 | -11.05 | +45.39 |
|
|
-->
|
|
|
|
Two patterns are visible in Table A.I.
|
|
First, the procedure consistently identifies a "transition" under every bin width, but the *location* of that transition drifts monotonically with bin width (Firm A cosine: 0.987 → 0.985 → 0.980 → 0.975 as bin width grows from 0.003 to 0.015; full-sample dHash: 2 → 10 → 9 as the bin width grows from 1 to 3).
|
|
The $Z$ statistics also inflate superlinearly with the bin width (Firm A cosine $|Z|$ rises from $\sim 9$ at bin 0.003 to $\sim 106$ at bin 0.015) because wider bins aggregate more mass per bin and therefore shrink the per-bin standard error on a very large sample.
|
|
Both features are characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity.
|
|
|
|
Second, the candidate transitions all locate *inside* the non-hand-signed mode (cosine $\geq 0.975$, dHash $\leq 10$) rather than between modes, which is the location pattern we would expect of a clean two-mechanism boundary.
|
|
|
|
Taken together, Table A.I shows that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located *inside* the non-hand-signed mode rather than between modes.
|
|
This observation supports the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator and reinforces the joint reading of Section IV-D that per-signature similarity does not form a clean two-mechanism mixture.
|
|
|
|
Raw per-bin $Z$ sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials.
|