af08391a68
Gemini 3.1 Pro round-19 (paper/gemini_review_v3_18_4.md) caught FOUR
serious issues that all 18 prior AI review rounds missed, including
fabricated rationalizations and a real statistical flaw. All four
verified by direct DB / script inspection. Verdict: Major Revision; this
commit closes every flagged item.
Fabricated rationalization corrections (text only, numbers unchanged):
- Section IV-H "656 documents excluded" rewritten. Previous text claimed
the exclusion was because "single-signature documents have no same-CPA
pairwise comparison" -- a fabricated explanation that contradicts the
paper's cross-document matching methodology. The truth, verified
against signature_analysis/09_pdf_signature_verdict.py L44 (WHERE
s.is_valid = 1 AND s.assigned_accountant IS NOT NULL): the 656
documents are excluded because none of their detected signatures could
be matched to a registered CPA name (assigned_accountant IS NULL).
- Section IV-F.2 "two CPAs excluded for disambiguation ties" rewritten.
No disambiguation logic exists in script 24; the 178 vs 180 difference
comes from two registered Firm A partners being singletons in the
corpus (one signature each, so per-signature best-match cosine is
undefined and they do not appear in the matched-signature table that
feeds the 70/30 split).
- Appendix B Table XIII provenance corrected. The previous attribution
to 13_deloitte_distribution_analysis.py / accountant_similarity_analysis.json
was wrong: neither artifact has year_month grouping. New script
29_firm_a_yearly_distribution.py reproduces Table XIII exactly from
the database via accountants.firm + signatures.year_month grouping.
Statistical flaw corrections (numbers updated):
- Inter-CPA negative anchor rewritten in 21_expanded_validation.py. The
prior implementation drew 50,000 random cross-CPA pairs from a
LIMIT-3000 random subsample, reusing each signature ~33 times and
artificially tightening Wilson FAR confidence intervals on Table X.
The corrected implementation samples 50,000 i.i.d. pairs uniformly
across the full 168,755-signature matched corpus.
- Re-run script 21. Table X numbers are close to v3.18.4 but no longer
rest on the inflated-precision artifact:
cos > 0.837: FAR 0.2101 (was 0.2062), CI [0.2066, 0.2137]
cos > 0.900: FAR 0.0250 (was 0.0233), CI [0.0237, 0.0264]
cos > 0.945: FAR 0.0008 (unchanged at this resolution)
cos > 0.950: FAR 0.0005 (was 0.0007), CI [0.0003, 0.0007]
cos > 0.973: FAR 0.0002 (was 0.0003), CI [0.0001, 0.0004]
cos > 0.979: FAR 0.0001 (was 0.0002), CI [0.0001, 0.0003]
- Inter-CPA cosine summary stats also updated:
mean 0.763 (was 0.762)
P95 0.886 (was 0.884)
P99 0.915 (was 0.913)
max 0.992 (was 0.988)
- Manuscript IV-F.1 prose updated to reflect the i.i.d. full-corpus
sampling.
Rebuild Paper_A_IEEE_Access_Draft_v3.docx.
Note: this is v3.19.0 because v3.19 closes both fabrication and a
genuine statistical flaw, not just provenance polish.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
63 lines
7.1 KiB
Markdown
63 lines
7.1 KiB
Markdown
# Appendix A. BD/McCrary Bin-Width Sensitivity (Signature Level)
|
|
|
|
The main text (Section III-I, Section IV-D.2) treats the Burgstahler-Dichev / McCrary discontinuity procedure [38], [39] as a *density-smoothness diagnostic* rather than as a threshold estimator.
|
|
This appendix documents the empirical basis for that framing by sweeping the bin width across four (variant, bin-width) panels: Firm A and full-sample, each in the cosine and $\text{dHash}_\text{indep}$ direction.
|
|
|
|
<!-- TABLE A.I: BD/McCrary Bin-Width Sensitivity (two-sided alpha = 0.05, |Z| > 1.96)
|
|
| Variant | n | Bin width | Best transition | z_below | z_above |
|
|
|---------|---|-----------|-----------------|---------|---------|
|
|
| Firm A cosine (sig-level) | 60,448 | 0.003 | 0.9870 | -2.81 | +9.42 |
|
|
| Firm A cosine (sig-level) | 60,448 | 0.005 | 0.9850 | -9.57 | +19.07 |
|
|
| Firm A cosine (sig-level) | 60,448 | 0.010 | 0.9800 | -54.64 | +69.96 |
|
|
| Firm A cosine (sig-level) | 60,448 | 0.015 | 0.9750 | -85.86 | +106.17 |
|
|
| Firm A dHash_indep (sig-level) | 60,448 | 1 | 2.0 | -4.69 | +10.01 |
|
|
| Firm A dHash_indep (sig-level) | 60,448 | 2 | no transition | — | — |
|
|
| Firm A dHash_indep (sig-level) | 60,448 | 3 | no transition | — | — |
|
|
| Full-sample cosine (sig-level) | 168,740 | 0.003 | 0.9870 | -3.21 | +8.17 |
|
|
| Full-sample cosine (sig-level) | 168,740 | 0.005 | 0.9850 | -8.80 | +14.32 |
|
|
| Full-sample cosine (sig-level) | 168,740 | 0.010 | 0.9800 | -29.69 | +44.91 |
|
|
| Full-sample cosine (sig-level) | 168,740 | 0.015 | 0.9450 | -11.35 | +14.85 |
|
|
| Full-sample dHash_indep (sig-l.) | 168,740 | 1 | 2.0 | -6.22 | +4.89 |
|
|
| Full-sample dHash_indep (sig-l.) | 168,740 | 2 | 10.0 | -7.35 | +3.83 |
|
|
| Full-sample dHash_indep (sig-l.) | 168,740 | 3 | 9.0 | -11.05 | +45.39 |
|
|
-->
|
|
|
|
Two patterns are visible in Table A.I.
|
|
First, the procedure consistently identifies a "transition" under every bin width, but the *location* of that transition drifts monotonically with bin width (Firm A cosine: 0.987 → 0.985 → 0.980 → 0.975 as bin width grows from 0.003 to 0.015; full-sample dHash: 2 → 10 → 9 as the bin width grows from 1 to 3).
|
|
The $Z$ statistics also inflate superlinearly with the bin width (Firm A cosine $|Z|$ rises from $\sim 9$ at bin 0.003 to $\sim 106$ at bin 0.015) because wider bins aggregate more mass per bin and therefore shrink the per-bin standard error on a very large sample.
|
|
Both features are characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity.
|
|
|
|
Second, the candidate transitions all locate *inside* the non-hand-signed mode (cosine $\geq 0.975$, dHash $\leq 10$) rather than between modes, which is the location pattern we would expect of a clean two-mechanism boundary.
|
|
|
|
Taken together, Table A.I shows that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located *inside* the non-hand-signed mode rather than between modes.
|
|
This observation supports the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator and reinforces the joint reading of Section IV-D that per-signature similarity does not form a clean two-mechanism mixture.
|
|
|
|
Raw per-bin $Z$ sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials.
|
|
|
|
# Appendix B. Table-to-Script Provenance
|
|
|
|
For reproducibility, the following table maps each numerical table in Section IV to the analysis script that produces its underlying values and to the report file emitted by that script. Scripts are under `signature_analysis/`. Report artifact paths below are listed relative to the project's analysis report root, which is `/Volumes/NV2/PDF-Processing/signature-analysis/` in our local deployment; replicators should rebase the paths to whatever report root they configure when invoking the scripts.
|
|
|
|
<!-- TABLE B.I: Manuscript table → reproduction artifact
|
|
| Manuscript table | Generating script | Report artifact |
|
|
|------------------|-------------------|-----------------|
|
|
| Table III (extraction results) | `02_extract_features.py`; `09_pdf_signature_verdict.py` | `reports/extraction_methodology.md`; `reports/pdf_signature_verdicts.json` |
|
|
| Table IV (intra/inter all-pairs cosine statistics) | `10_formal_statistical_analysis.py` | `reports/formal_statistical_data.json`; `reports/formal_statistical_report.md` |
|
|
| Table V (Hartigan dip test) | `15_hartigan_dip_test.py` | `reports/dip_test/dip_test_results.json` |
|
|
| Table VI (signature-level threshold-estimator summary) | `17_beta_mixture_em.py`; `25_bd_mccrary_sensitivity.py` | `reports/beta_mixture/beta_mixture_results.json`; `reports/bd_sensitivity/bd_sensitivity.json` |
|
|
| Table IX (Firm A whole-sample capture rates) | `19_pixel_identity_validation.py`; `24_validation_recalibration.py` | `reports/pixel_validation/pixel_validation_results.json`; `reports/validation_recalibration/validation_recalibration.json` |
|
|
| Table X (cosine threshold sweep, FAR vs inter-CPA negatives) | `21_expanded_validation.py` | `reports/expanded_validation/expanded_validation_results.json` |
|
|
| Table XI (held-out vs calibration Firm A capture rates) | `24_validation_recalibration.py` | `reports/validation_recalibration/validation_recalibration.json` |
|
|
| Table XII (operational-cut sensitivity 0.95 vs 0.945) | `24_validation_recalibration.py` | `reports/validation_recalibration/validation_recalibration.json` |
|
|
| Table XIII (Firm A per-year cosine distribution) | `29_firm_a_yearly_distribution.py` | `reports/firm_a_yearly/firm_a_yearly_distribution.json` |
|
|
| Tables XIV / XV (partner-level similarity ranking) | `22_partner_ranking.py` | `reports/partner_ranking/partner_ranking_results.json` |
|
|
| Table XVI (intra-report classification agreement) | `23_intra_report_consistency.py` | `reports/intra_report/intra_report_results.json` |
|
|
| Table XVII (document-level five-way classification) | `09_pdf_signature_verdict.py`; `12_generate_pdf_level_report.py` | `reports/pdf_signature_verdicts.json`; `reports/pdf_signature_verdict_report.md` (CSV / XLSX bulk reports also at `reports/`) |
|
|
| Table XVIII (backbone ablation) | `paper/ablation_backbone_comparison.py` | `ablation/ablation_results.json` (sibling of `reports/`) |
|
|
| Table A.I (BD/McCrary bin-width sensitivity) | `25_bd_mccrary_sensitivity.py` | `reports/bd_sensitivity/bd_sensitivity.json` |
|
|
| Byte-identity decomposition (145 / 50 / 180 / 35; Section IV-F.1) | `28_byte_identity_decomposition.py` | `reports/byte_identity_decomp/byte_identity_decomposition.json` |
|
|
| Cross-firm dual-descriptor convergence (Section IV-H.2) | `28_byte_identity_decomposition.py` | `reports/byte_identity_decomp/byte_identity_decomposition.json` |
|
|
-->
|
|
|
|
The table-to-script mapping above is intended as a navigation aid for replicators. All scripts run deterministically under the fixed random seeds documented in the supplementary materials; the artifact paths above were verified against the local deployment at the time of submission, and any reviewer reproduction step should re-emit the artifacts from the listed scripts rather than depend on the absolute path layout.
|