fcce58aff0
Gemini round-6 (paper/gemini_review_v3_7.md) gave Minor Revision but
flagged three issues that five rounds of codex review had missed.
This commit addresses all three.
BLOCKER: Accountant-level BD/McCrary null is a power artifact, not
proof of smoothness (Gemini Issue 1)
- At N=686 accountants the BD/McCrary test has limited statistical
power; reading a failure-to-reject as affirmative proof of
smoothness ignores the risk of a Type II error.
- Discussion V-B: "itself diagnostic of smoothness" replaced with
"failure-to-reject rather than a failure of the method ---
informative alongside the other evidence but subject to the power
caveat in Section V-G".
- Discussion V-G (Sixth limitation): added a power-aware paragraph
naming N=686 explicitly and clarifying that the substantive claim
of smoothly-mixed clustering rests on the JOINT weight of dip
test + BIC-selected GMM + BD null, not on BD alone.
- Results IV-D.1 and IV-E: reframe accountant-level null as
"consistent with --- not affirmative proof of" clustered-but-
smoothly-mixed, citing V-G for the power caveat.
- Appendix A interpretation paragraph: explicit inferential-asymmetry
sentence ("consistency is what the BD null delivers, not
affirmative proof"); "itself evidence for" removed.
- Conclusion: "consistent with clustered but smoothly mixed"
rephrased with explicit power caveat ("at N = 686 the test has
limited power and cannot affirmatively establish smoothness").
MAJOR: Table X FRR / EER was tautological reviewer-bait
(Gemini Issue 2)
- Byte-identical positive anchor has cosine approx 1 by construction,
so FRR against that subset is trivially 0 at every threshold
below 1 and any EER calculation is arithmetic tautology, not
biometric performance.
- Results IV-G.1: removed EER row; dropped FRR column from Table X;
added a table note explaining the omission and directing readers
to Section V-F for the conservative-subset discussion.
- Methodology III-K: removed the EER / FRR-against-byte-identical
reporting clause; clarified that FAR against inter-CPA negatives
is the primary reported quantity.
- Table X is now FAR + Wilson 95% CI only, which is the quantity
that actually carries empirical content on this anchor design.
MINOR: Document-level worst-case aggregation narrative (Gemini
Issue 3) + 15-signature delta (Gemini spot-check)
- Results IV-I: added two sentences explicitly noting that the
document-level percentages reflect the Section III-L worst-case
aggregation rule (a report with one stamped + one hand-signed
signature inherits the most-replication-consistent label), and
cross-referencing Section IV-H.3 / Table XVI for the mixed-report
composition that qualifies the headline percentages.
- Results IV-D: added a one-sentence footnote explaining that the
15-signature delta between the Table III CPA-matched count
(168,755) and the all-pairs analyzed count (168,740) is due to
CPAs with exactly one signature, for whom no same-CPA pairwise
best-match statistic exists.
Abstract remains 243 words, comfortably under the IEEE Access
250-word cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
46 lines
5.3 KiB
Markdown
# Appendix A. BD/McCrary Bin-Width Sensitivity
The main text (Sections III-I and IV-E) treats the Burgstahler-Dichev / McCrary discontinuity procedure [38], [39] as a *density-smoothness diagnostic* rather than as one of the threshold estimators whose convergence anchors the accountant-level threshold band.
This appendix documents the empirical basis for that framing by sweeping the bin width across six variants (Firm A, full-sample, and accountant-level, each in the cosine and $\text{dHash}_\text{indep}$ direction), giving 20 (variant, bin-width) panels in total.
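For orientation, the standardized difference that the procedure evaluates in each histogram bin can be sketched as follows. This is a minimal illustration of the textbook Burgstahler-Dichev form, not the code in `signature_analysis/25_bd_mccrary_sensitivity.py`; the function names and the rule used to flag a transition (a significantly low bin followed by a significantly high one, matching the sign pattern of `z_below` and `z_above` in Table A.I) are assumptions made for this example.

```python
import math


def bd_z_scores(counts):
    """Standardized difference per interior bin (Burgstahler-Dichev form).

    Each bin count is compared with the mean of its two neighbours, which is
    what a locally linear (smooth) density would predict.
    """
    total = sum(counts)
    z = [None] * len(counts)  # edge bins have only one neighbour
    for i in range(1, len(counts) - 1):
        p_prev, p_i, p_next = (counts[j] / total for j in (i - 1, i, i + 1))
        expected = (counts[i - 1] + counts[i + 1]) / 2
        variance = (total * p_i * (1 - p_i)
                    + 0.25 * total * (p_prev + p_next) * (1 - p_prev - p_next))
        if variance > 0:
            z[i] = (counts[i] - expected) / math.sqrt(variance)
    return z


def transitions(z, crit=1.96):
    """Indices i where bin i is significantly low and bin i + 1 significantly high.

    The pairing rule is an assumption for illustration only.
    """
    return [i for i in range(len(z) - 1)
            if z[i] is not None and z[i + 1] is not None
            and z[i] < -crit and z[i + 1] > crit]


# Toy histogram with a dip in bin 2 followed by a spike in bin 3.
toy_counts = [100, 100, 60, 140, 100, 100]
toy_z = bd_z_scores(toy_counts)
print(transitions(toy_z))
```

Because the statistic depends only on three adjacent counts, re-binning the same data changes both where a flagged pair can sit and how large the standardized difference becomes, which is the sensitivity the sweep is designed to expose.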
<!-- TABLE A.I: BD/McCrary Bin-Width Sensitivity (two-sided alpha = 0.05, |Z| > 1.96)
| Variant | n | Bin width | Best transition | z_below | z_above |
|---------|---|-----------|-----------------|---------|---------|
| Firm A cosine (sig-level) | 60,448 | 0.003 | 0.9870 | -2.81 | +9.42 |
| Firm A cosine (sig-level) | 60,448 | 0.005 | 0.9850 | -9.57 | +19.07 |
| Firm A cosine (sig-level) | 60,448 | 0.010 | 0.9800 | -54.64 | +69.96 |
| Firm A cosine (sig-level) | 60,448 | 0.015 | 0.9750 | -85.86 | +106.17 |
| Firm A dHash_indep (sig-level) | 60,448 | 1 | 2.0 | -4.69 | +10.01 |
| Firm A dHash_indep (sig-level) | 60,448 | 2 | no transition | n/a | n/a |
| Firm A dHash_indep (sig-level) | 60,448 | 3 | no transition | n/a | n/a |
| Full-sample cosine (sig-level) | 168,740 | 0.003 | 0.9870 | -3.21 | +8.17 |
| Full-sample cosine (sig-level) | 168,740 | 0.005 | 0.9850 | -8.80 | +14.32 |
| Full-sample cosine (sig-level) | 168,740 | 0.010 | 0.9800 | -29.69 | +44.91 |
| Full-sample cosine (sig-level) | 168,740 | 0.015 | 0.9450 | -11.35 | +14.85 |
| Full-sample dHash_indep (sig-level) | 168,740 | 1 | 2.0 | -6.22 | +4.89 |
| Full-sample dHash_indep (sig-level) | 168,740 | 2 | 10.0 | -7.35 | +3.83 |
| Full-sample dHash_indep (sig-level) | 168,740 | 3 | 9.0 | -11.05 | +45.39 |
| Accountant-level cosine_mean | 686 | 0.002 | no transition | n/a | n/a |
| Accountant-level cosine_mean | 686 | 0.005 | 0.9800 | -3.23 | +5.18 |
| Accountant-level cosine_mean | 686 | 0.010 | no transition | n/a | n/a |
| Accountant-level dHash_indep_mean | 686 | 0.2 | no transition | n/a | n/a |
| Accountant-level dHash_indep_mean | 686 | 0.5 | no transition | n/a | n/a |
| Accountant-level dHash_indep_mean | 686 | 1.0 | 3.0 | -2.00 | +3.24 |
-->
Two patterns are visible in Table A.I.
First, at the signature level the procedure reports a "transition" at nearly every bin width (Firm A dHash at bin widths 2 and 3 is the exception), but the *location* of that transition drifts with the bin width: monotonically downward for cosine (Firm A: 0.987 → 0.985 → 0.980 → 0.975 as the bin width grows from 0.003 to 0.015) and erratically for full-sample dHash (2 → 10 → 9 as the bin width grows from 1 to 3).
The $Z$ statistics also inflate superlinearly with the bin width (the largest Firm A cosine $|Z|$ rises from $\sim 9$ at bin width 0.003 to $\sim 106$ at bin width 0.015, a roughly 11-fold increase for a 5-fold wider bin) because wider bins aggregate more mass per bin and therefore shrink the per-bin relative standard error on a very large sample.
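The inflation mechanism can be reproduced without any discontinuity at all. The sketch below is an illustration under stated assumptions, not the paper's analysis: it sets each bin count to its exact expectation under a smooth standard normal density (a stand-in for the real score distribution), borrows $n = 60{,}448$ from the Firm A rows, and uses bin widths chosen only for the example. The curvature of the smooth density alone pushes the standardized difference past 1.96 once the bins are wide enough.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def max_abs_z(n, width, lo=-3.0, hi=3.0):
    """Largest |Z| over interior bins when every count equals its expectation
    under a smooth normal density, i.e. with no sampling noise and no jump."""
    n_bins = round((hi - lo) / width)
    edges = [lo + k * width for k in range(n_bins + 1)]
    p = [normal_cdf(b) - normal_cdf(a) for a, b in zip(edges, edges[1:])]
    best = 0.0
    for i in range(1, n_bins - 1):
        diff = n * (p[i] - (p[i - 1] + p[i + 1]) / 2)
        var = (n * p[i] * (1 - p[i])
               + 0.25 * n * (p[i - 1] + p[i + 1]) * (1 - p[i - 1] - p[i + 1]))
        best = max(best, abs(diff) / math.sqrt(var))
    return best


# Illustrative widths (in units of the normal's standard deviation).
for width in (0.1, 0.2, 0.4):
    print(width, round(max_abs_z(60_448, width), 2))
```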
Both features are characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity.

Second, at the accountant level, the unit we rely on for primary threshold inference (Sections III-H, III-J, IV-E), the procedure produces no significant transition at two of three cosine bin widths and two of three dHash bin widths. Of the two transitions it does produce, the dHash one ($Z_\text{below} = -2.00$ at bin width $1.0$) sits only marginally beyond the two-sided critical value $|Z| = 1.96$ for $\alpha = 0.05$, and the cosine one (bin width $0.005$) does not recur at the neighbouring bin widths $0.002$ and $0.010$.
We stress the inferential asymmetry here: *consistency* with smoothly-mixed clustering is what the BD null delivers, not *affirmative proof* of smoothness.
At $N = 686$ accountants the BD/McCrary test has limited statistical power and can typically reject only sharp cliff-type discontinuities; failure to reject the smoothness null therefore constrains the data only to distributions whose between-cluster transitions are gradual *enough* to escape the test's sensitivity at that sample size.
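The size of the power gap can be gauged with a back-of-the-envelope calculation. The sketch below is an assumption-laden illustration rather than a power analysis from the paper: the three bin probabilities are hypothetical, and $Z$ is treated as approximately normal with unit variance around its mean. Since the mean of the standardized difference scales with $\sqrt{N}$, the same density dip that is detected almost surely at $N = 60{,}448$ is missed more often than not at $N = 686$.

```python
import math


def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def expected_z(n, p_prev, p_mid, p_next):
    """Mean of the standardized difference for the middle bin, given the
    true bin probabilities and the sample size n."""
    diff = n * (p_mid - (p_prev + p_next) / 2)
    var = (n * p_mid * (1 - p_mid)
           + 0.25 * n * (p_prev + p_next) * (1 - p_prev - p_next))
    return diff / math.sqrt(var)


def approx_power(n, p_prev, p_mid, p_next, crit=1.96):
    """Normal approximation to P(|Z| > crit) for a two-sided test."""
    mu = expected_z(n, p_prev, p_mid, p_next)
    return normal_cdf(mu - crit) + normal_cdf(-mu - crit)


# Hypothetical dip: the middle bin holds 6% of the mass, its neighbours 8% each.
for n in (686, 60_448):
    print(n, round(approx_power(n, 0.08, 0.06, 0.08), 3))
```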
We read this as reinforcing, not establishing, the clustered-but-smoothly-mixed interpretation derived from the GMM fit and the dip-test evidence.

Taken together, Table A.I shows (i) that the signature-level BD/McCrary transitions are not thresholds in the usual sense (they are histogram-resolution-dependent local density anomalies located *inside* the non-hand-signed mode rather than between modes), and (ii) that the accountant-level BD/McCrary null holds at four of the six (variant, bin-width) panels in the sweep, which is consistent with, but not sufficient on its own to establish, the clustered-but-smoothly-mixed interpretation discussed in Section V-B and qualified by the power caveat in Section V-G.
Both observations support the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator.
The accountant-level threshold band reported in Table VIII ($\text{cosine} \approx 0.975$ from the convergence of the KDE antimode, the Beta-2 crossing, and the logit-GMM-2 crossing) is therefore not adjusted to include any BD/McCrary location.

Raw per-bin $Z$ sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials (`reports/bd_sensitivity/bd_sensitivity.json`) produced by `signature_analysis/25_bd_mccrary_sensitivity.py`.