Paper A v3.18: remove accountant-level + replication-dominated calibration + Gemini 2.5 Pro review minor fixes

Major changes (per partner red-pen + user decision): - Delete entire accountant-level analysis (III.J, IV.E, Tables VI/VII/VIII, Fig 4) -- cross-year pooling assumption unjustified, removes the implicit "habitually stamps = always stamps" reading. - Renumber sections III.J/K/L (was K/L/M) and IV.E/F/G/H/I (was F/G/H/I/J). - Title: "Three-Method Convergent Thresholding" -> "Replication-Dominated Calibration" (the three diagnostics do NOT converge at signature level). - Operational cosine cut anchored on whole-sample Firm A P7.5 (cos > 0.95). - Three statistical diagnostics (Hartigan/Beta/BD-McCrary) reframed as descriptive characterisation, not threshold estimators. - Firm A replication-dominated framing: 3 evidence strands -> 2. - Discussion limitation list: drop accountant-level cross-year pooling and BD/McCrary diagnostic; add auditor-year longitudinal tracking as future work. - Tone-shift: "we do not claim / do not derive" -> "we find / motivates". Reference verification (independent web-search audit of all 41 refs): - Fix [5] author hallucination: Hadjadj et al. -> Kao & Wen (real authors of Appl. Sci. 10:11:3716; report at paper/reference_verification_v3.md). - Polish [16] [21] [22] [25] (year/volume/page-range/model-name). Gemini 2.5 Pro peer review (Minor Revision verdict, A-F all positive): - Neutralize script-path references in tables/appendix -> "supplementary materials". - Move conflict-of-interest declaration from III-L to new Declarations section before References (paper_a_declarations_v3.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:43:09 +08:00
parent 6ab6e19137
commit 16e90bab20
13 changed files with 430 additions and 264 deletions
@@ -1,7 +1,7 @@
-# Appendix A. BD/McCrary Bin-Width Sensitivity
+# Appendix A. BD/McCrary Bin-Width Sensitivity (Signature Level)

-The main text (Sections III-I and IV-E) treats the Burgstahler-Dichev / McCrary discontinuity procedure [38], [39] as a *density-smoothness diagnostic* rather than as one of the threshold estimators whose convergence anchors the accountant-level threshold band.
-This appendix documents the empirical basis for that framing by sweeping the bin width across six (variant, bin-width) panels: Firm A / full-sample / accountant-level, each in the cosine and $\text{dHash}_\text{indep}$ direction.
+The main text (Section III-I, Section IV-D.2) treats the Burgstahler-Dichev / McCrary discontinuity procedure [38], [39] as a *density-smoothness diagnostic* rather than as a threshold estimator.
+This appendix documents the empirical basis for that framing by sweeping the bin width across four (variant, bin-width) panels: Firm A and full-sample, each in the cosine and $\text{dHash}_\text{indep}$ direction.

 <!-- TABLE A.I: BD/McCrary Bin-Width Sensitivity (two-sided alpha = 0.05, |Z| > 1.96)
 | Variant | n | Bin width | Best transition | z_below | z_above |
@@ -20,26 +20,16 @@ This appendix documents the empirical basis for that framing by sweeping the bin
 | Full-sample dHash_indep (sig-l.) | 168,740 | 1      | 2.0    | -6.22  | +4.89   |
 | Full-sample dHash_indep (sig-l.) | 168,740 | 2      | 10.0   | -7.35  | +3.83   |
 | Full-sample dHash_indep (sig-l.) | 168,740 | 3      | 9.0    | -11.05 | +45.39  |
-| Accountant-level cosine_mean     | 686     | 0.002  | no transition | — | — |
-| Accountant-level cosine_mean     | 686     | 0.005  | 0.9800 | -3.23  | +5.18   |
-| Accountant-level cosine_mean     | 686     | 0.010  | no transition | — | — |
-| Accountant-level dHash_indep_mean| 686     | 0.2    | no transition | — | — |
-| Accountant-level dHash_indep_mean| 686     | 0.5    | no transition | — | — |
-| Accountant-level dHash_indep_mean| 686     | 1.0    | 3.0    | -2.00  | +3.24   |
 -->

 Two patterns are visible in Table A.I.
-First, at the signature level the procedure consistently identifies a "transition" under every bin width, but the *location* of that transition drifts monotonically with bin width (Firm A cosine: 0.987 → 0.985 → 0.980 → 0.975 as bin width grows from 0.003 to 0.015; full-sample dHash: 2 → 10 → 9 as the bin width grows from 1 to 3).
+First, the procedure consistently identifies a "transition" under every bin width, but the *location* of that transition drifts monotonically with bin width (Firm A cosine: 0.987 → 0.985 → 0.980 → 0.975 as bin width grows from 0.003 to 0.015; full-sample dHash: 2 → 10 → 9 as the bin width grows from 1 to 3).
 The $Z$ statistics also inflate superlinearly with the bin width (Firm A cosine $|Z|$ rises from $\sim 9$ at bin 0.003 to $\sim 106$ at bin 0.015) because wider bins aggregate more mass per bin and therefore shrink the per-bin standard error on a very large sample.
 Both features are characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity.

-Second, at the accountant level---the unit we rely on for primary threshold inference (Sections III-H, III-J, IV-E)---the procedure produces no significant transition at two of three cosine bin widths and two of three dHash bin widths, and the one marginal transition it does produce ($Z_\text{below} = -2.00$ in the dHash sweep at bin width $1.0$) sits exactly at the critical value for $\alpha = 0.05$.
-We stress the inferential asymmetry here: *consistency* with smoothly-mixed clustering is what the BD null delivers, not *affirmative proof* of smoothness.
-At $N = 686$ accountants the BD/McCrary test has limited statistical power and can typically reject only sharp cliff-type discontinuities; failure to reject the smoothness null therefore constrains the data only to distributions whose between-cluster transitions are gradual *enough* to escape the test's sensitivity at that sample size.
-We read this as reinforcing---not establishing---the clustered-but-smoothly-mixed interpretation derived from the GMM fit and the dip-test evidence.
+Second, the candidate transitions all locate *inside* the non-hand-signed mode (cosine $\geq 0.975$, dHash $\leq 10$) rather than between modes, which is the location pattern we would expect of a clean two-mechanism boundary.

-Taken together, Table A.I shows (i) that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located *inside* the non-hand-signed mode rather than between modes---and (ii) that the accountant-level BD/McCrary null persists across the bin-width sweep, consistent with but not alone sufficient to establish the clustered-but-smoothly-mixed interpretation discussed in Section V-B and limitation-caveated in Section V-G.
-Both observations support the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator.
-The accountant-level threshold band reported in Table VIII ($\text{cosine} \approx 0.975$ from the convergence of the KDE antimode, the Beta-2 crossing, and the logit-GMM-2 crossing) is therefore not adjusted to include any BD/McCrary location.
+Taken together, Table A.I shows that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located *inside* the non-hand-signed mode rather than between modes.
+This observation supports the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator and reinforces the joint reading of Section IV-D that per-signature similarity does not form a clean two-mechanism mixture.

-Raw per-bin $Z$ sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials (`reports/bd_sensitivity/bd_sensitivity.json`) produced by `signature_analysis/25_bd_mccrary_sensitivity.py`.
+Raw per-bin $Z$ sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials.