Files
pdf_signature_extraction/paper/paper_a_appendix_v3.md
T
gbanyan af08391a68 Paper A v3.19.0: address Gemini 3.1 Pro round-19 Major Revision findings
Gemini 3.1 Pro round-19 (paper/gemini_review_v3_18_4.md) caught FOUR
serious issues that all 18 prior AI review rounds missed, including
fabricated rationalizations and a real statistical flaw. All four
verified by direct DB / script inspection. Verdict: Major Revision; this
commit closes every flagged item.

Fabricated rationalization corrections (text only, numbers unchanged):

- Section IV-H "656 documents excluded" rewritten. Previous text claimed
  the exclusion was because "single-signature documents have no same-CPA
  pairwise comparison" -- a fabricated explanation that contradicts the
  paper's cross-document matching methodology. The truth, verified
  against signature_analysis/09_pdf_signature_verdict.py L44 (WHERE
  s.is_valid = 1 AND s.assigned_accountant IS NOT NULL): the 656
  documents are excluded because none of their detected signatures could
  be matched to a registered CPA name (assigned_accountant IS NULL).
- Section IV-F.2 "two CPAs excluded for disambiguation ties" rewritten.
  No disambiguation logic exists in script 24; the 178 vs 180 difference
  comes from two registered Firm A partners being singletons in the
  corpus (one signature each, so per-signature best-match cosine is
  undefined and they do not appear in the matched-signature table that
  feeds the 70/30 split).
- Appendix B Table XIII provenance corrected. The previous attribution
  to 13_deloitte_distribution_analysis.py / accountant_similarity_analysis.json
  was wrong: neither artifact has year_month grouping. New script
  29_firm_a_yearly_distribution.py reproduces Table XIII exactly from
  the database via accountants.firm + signatures.year_month grouping.

Statistical flaw corrections (numbers updated):

- Inter-CPA negative anchor rewritten in 21_expanded_validation.py. The
  prior implementation drew 50,000 random cross-CPA pairs from a
  LIMIT-3000 random subsample, reusing each signature ~33 times and
  artificially tightening Wilson FAR confidence intervals on Table X.
  The corrected implementation samples 50,000 i.i.d. pairs uniformly
  across the full 168,755-signature matched corpus.
- Re-run script 21. Table X numbers are close to v3.18.4 but no longer
  rest on the inflated-precision artifact:
    cos > 0.837: FAR 0.2101 (was 0.2062), CI [0.2066, 0.2137]
    cos > 0.900: FAR 0.0250 (was 0.0233), CI [0.0237, 0.0264]
    cos > 0.945: FAR 0.0008 (unchanged at this resolution)
    cos > 0.950: FAR 0.0005 (was 0.0007), CI [0.0003, 0.0007]
    cos > 0.973: FAR 0.0002 (was 0.0003), CI [0.0001, 0.0004]
    cos > 0.979: FAR 0.0001 (was 0.0002), CI [0.0001, 0.0003]
- Inter-CPA cosine summary stats also updated:
    mean 0.763 (was 0.762)
    P95 0.886 (was 0.884)
    P99 0.915 (was 0.913)
    max 0.992 (was 0.988)
- Manuscript IV-F.1 prose updated to reflect the i.i.d. full-corpus
  sampling.

Rebuild Paper_A_IEEE_Access_Draft_v3.docx.

Note: this is v3.19.0 because v3.19 closes both fabrication and a
genuine statistical flaw, not just provenance polish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:40:42 +08:00

7.1 KiB

Appendix A. BD/McCrary Bin-Width Sensitivity (Signature Level)

The main text (Section III-I, Section IV-D.2) treats the Burgstahler-Dichev / McCrary discontinuity procedure [38], [39] as a density-smoothness diagnostic rather than as a threshold estimator. This appendix documents the empirical basis for that framing by sweeping the bin width across four (variant, bin-width) panels: Firm A and full-sample, each in the cosine and \text{dHash}_\text{indep} direction.

Two patterns are visible in Table A.I. First, the procedure consistently identifies a "transition" under every bin width, but the location of that transition drifts monotonically with bin width (Firm A cosine: 0.987 → 0.985 → 0.980 → 0.975 as bin width grows from 0.003 to 0.015; full-sample dHash: 2 → 10 → 9 as the bin width grows from 1 to 3). The Z statistics also inflate superlinearly with the bin width (Firm A cosine |Z| rises from \sim 9 at bin 0.003 to \sim 106 at bin 0.015) because wider bins aggregate more mass per bin and therefore shrink the per-bin standard error on a very large sample. Both features are characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity.

Second, the candidate transitions all locate inside the non-hand-signed mode (cosine \geq 0.975, dHash \leq 10) rather than between modes, which is the location pattern we would expect of a clean two-mechanism boundary.

Taken together, Table A.I shows that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located inside the non-hand-signed mode rather than between modes. This observation supports the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator and reinforces the joint reading of Section IV-D that per-signature similarity does not form a clean two-mechanism mixture.

Raw per-bin Z sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials.

Appendix B. Table-to-Script Provenance

For reproducibility, the following table maps each numerical table in Section IV to the analysis script that produces its underlying values and to the report file emitted by that script. Scripts are under signature_analysis/. Report artifact paths below are listed relative to the project's analysis report root, which is /Volumes/NV2/PDF-Processing/signature-analysis/ in our local deployment; replicators should rebase the paths to whatever report root they configure when invoking the scripts.

The table-to-script mapping above is intended as a navigation aid for replicators. All scripts run deterministically under the fixed random seeds documented in the supplementary materials; the artifact paths above were verified against the local deployment at the time of submission, and any reviewer reproduction step should re-emit the artifacts from the listed scripts rather than depend on the absolute path layout.