Files

T

gbanyan fcce58aff0 Paper A v3.8: resolve Gemini 3.1 Pro round-6 independent-review findings

Gemini round-6 (paper/gemini_review_v3_7.md) gave Minor Revision but
flagged three issues that five rounds of codex review had missed.
This commit addresses all three.

BLOCKER: Accountant-level BD/McCrary null is a power artifact, not
proof of smoothness (Gemini Issue 1)
- At N=686 accountants the BD/McCrary test has limited statistical
  power; interpreting a failure-to-reject as affirmative proof of
  smoothness is a Type II error risk.
- Discussion V-B: "itself diagnostic of smoothness" replaced with
  "failure-to-reject rather than a failure of the method ---
  informative alongside the other evidence but subject to the power
  caveat in Section V-G".
- Discussion V-G (Sixth limitation): added a power-aware paragraph
  naming N=686 explicitly and clarifying that the substantive claim
  of smoothly-mixed clustering rests on the JOINT weight of dip
  test + BIC-selected GMM + BD null, not on BD alone.
- Results IV-D.1 and IV-E: reframe accountant-level null as
  "consistent with --- not affirmative proof of" clustered-but-
  smoothly-mixed, citing V-G for the power caveat.
- Appendix A interpretation paragraph: explicit inferential-asymmetry
  sentence ("consistency is what the BD null delivers, not
  affirmative proof"); "itself evidence for" removed.
- Conclusion: "consistent with clustered but smoothly mixed"
  rephrased with explicit power caveat ("at N = 686 the test has
  limited power and cannot affirmatively establish smoothness").

MAJOR: Table X FRR / EER was tautological reviewer-bait
(Gemini Issue 2)
- Byte-identical positive anchor has cosine approx 1 by construction,
  so FRR against that subset is trivially 0 at every threshold
  below 1 and any EER calculation is arithmetic tautology, not
  biometric performance.
- Results IV-G.1: removed EER row; dropped FRR column from Table X;
  added a table note explaining the omission and directing readers
  to Section V-F for the conservative-subset discussion.
- Methodology III-K: removed the EER / FRR-against-byte-identical
  reporting clause; clarified that FAR against inter-CPA negatives
  is the primary reported quantity.
- Table X is now FAR + Wilson 95% CI only, which is the quantity
  that actually carries empirical content on this anchor design.

MINOR: Document-level worst-case aggregation narrative (Gemini
Issue 3) + 15-signature delta (Gemini spot-check)
- Results IV-I: added two sentences explicitly noting that the
  document-level percentages reflect the Section III-L worst-case
  aggregation rule (a report with one stamped + one hand-signed
  signature inherits the most-replication-consistent label), and
  cross-referencing Section IV-H.3 / Table XVI for the mixed-report
  composition that qualifies the headline percentages.
- Results IV-D: added a one-sentence footnote explaining that the
  15-signature delta between the Table III CPA-matched count
  (168,755) and the all-pairs analyzed count (168,740) is due to
  CPAs with exactly one signature, for whom no same-CPA pairwise
  best-match statistic exists.

Abstract remains 243 words, comfortably under the IEEE Access
250-word cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-21 14:47:48 +08:00

5.3 KiB

Raw Blame History

Appendix A. BD/McCrary Bin-Width Sensitivity

The main text (Sections III-I and IV-E) treats the Burgstahler-Dichev / McCrary discontinuity procedure [38], [39] as a density-smoothness diagnostic rather than as one of the threshold estimators whose convergence anchors the accountant-level threshold band. This appendix documents the empirical basis for that framing by sweeping the bin width across six (variant, bin-width) panels: Firm A / full-sample / accountant-level, each in the cosine and \text{dHash}_\text{indep} direction.

Two patterns are visible in Table A.I. First, at the signature level the procedure consistently identifies a "transition" under every bin width, but the location of that transition drifts monotonically with bin width (Firm A cosine: 0.987 → 0.985 → 0.980 → 0.975 as bin width grows from 0.003 to 0.015; full-sample dHash: 2 → 10 → 9 as the bin width grows from 1 to 3). The Z statistics also inflate superlinearly with the bin width (Firm A cosine |Z| rises from \sim 9 at bin 0.003 to \sim 106 at bin 0.015) because wider bins aggregate more mass per bin and therefore shrink the per-bin standard error on a very large sample. Both features are characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity.

Second, at the accountant level---the unit we rely on for primary threshold inference (Sections III-H, III-J, IV-E)---the procedure produces no significant transition at two of three cosine bin widths and two of three dHash bin widths, and the one marginal transition it does produce (Z_\text{below} = -2.00 in the dHash sweep at bin width 1.0) sits exactly at the critical value for \alpha = 0.05. We stress the inferential asymmetry here: consistency with smoothly-mixed clustering is what the BD null delivers, not affirmative proof of smoothness. At N = 686 accountants the BD/McCrary test has limited statistical power and can typically reject only sharp cliff-type discontinuities; failure to reject the smoothness null therefore constrains the data only to distributions whose between-cluster transitions are gradual enough to escape the test's sensitivity at that sample size. We read this as reinforcing---not establishing---the clustered-but-smoothly-mixed interpretation derived from the GMM fit and the dip-test evidence.

Taken together, Table A.I shows (i) that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located inside the non-hand-signed mode rather than between modes---and (ii) that the accountant-level BD/McCrary null persists across the bin-width sweep, consistent with but not alone sufficient to establish the clustered-but-smoothly-mixed interpretation discussed in Section V-B and limitation-caveated in Section V-G. Both observations support the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator. The accountant-level threshold band reported in Table VIII (\text{cosine} \approx 0.975 from the convergence of the KDE antimode, the Beta-2 crossing, and the logit-GMM-2 crossing) is therefore not adjusted to include any BD/McCrary location.

Raw per-bin Z sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials (reports/bd_sensitivity/bd_sensitivity.json) produced by signature_analysis/25_bd_mccrary_sensitivity.py.

5.3 KiB Raw Blame History

Appendix A. BD/McCrary Bin-Width Sensitivity

5.3 KiB

Raw Blame History