552b6b80d4
Implements codex gpt-5.4 recommendation (paper/codex_bd_mccrary_opinion.md,
"option (c) hybrid"): demote BD/McCrary in the main text from a co-equal
threshold estimator to a density-smoothness diagnostic, and add a
bin-width sensitivity appendix as an audit trail.
Why: the bin-width sweep (Script 25) confirms that at the signature
level the BD transition drifts monotonically with bin width (Firm A
cosine: 0.987 -> 0.985 -> 0.980 -> 0.975 as bin width widens 0.003 ->
0.015; full-sample dHash transitions drift from 2 to 10 to 9 across
bin widths 1 / 2 / 3) and Z statistics inflate superlinearly with bin
width, both characteristic of a histogram-resolution artifact. At the
accountant level the BD null is robust across the sweep. The paper's
earlier "three methodologically distinct estimators" framing therefore
could not be defended to an IEEE Access reviewer once the sweep was
run.
Added
- signature_analysis/25_bd_mccrary_sensitivity.py: bin-width sweep
across 6 variants (Firm A / full-sample / accountant-level, each
cosine + dHash_indep) and 3-4 bin widths per variant. Reports
Z_below, Z_above, p-values, and number of significant transitions
per cell. Writes reports/bd_sensitivity/bd_sensitivity.{json,md}.
- paper/paper_a_appendix_v3.md: new "Appendix A. BD/McCrary Bin-Width
Sensitivity" with Table A.I (all 20 sensitivity cells) and
interpretation linking the empirical pattern to the main-text
framing decision.
- export_v3.py: appendix inserted into SECTIONS between conclusion
and references.
- paper/codex_bd_mccrary_opinion.md: codex gpt-5.4 recommendation
captured verbatim for audit trail.
Main-text reframing
- Abstract: "three methodologically distinct estimators" ->
"two estimators plus a Burgstahler-Dichev/McCrary density-
smoothness diagnostic". Trimmed to 243 words.
- Introduction: related-work summary, pipeline step 5, accountant-
level convergence sentence, contribution 4, and section-outline
line all updated. Contribution 4 renamed to "Convergent threshold
framework with a smoothness diagnostic".
- Methodology III-I: section renamed to "Convergent Threshold
Determination with a Density-Smoothness Diagnostic". "Method 2:
BD/McCrary Discontinuity" converted to "Density-Smoothness
Diagnostic" in a new subsection; Method 3 (Beta mixture) renumbered
to Method 2. Subsections 4 and 5 updated to refer to "two threshold
estimators" with BD as diagnostic.
- Methodology III-A pipeline overview: "three methodologically
distinct statistical methods" -> "two methodologically distinct
threshold estimators complemented by a density-smoothness
diagnostic".
- Methodology III-L: "three-method analysis" -> "accountant-level
threshold analysis (KDE antimode, Beta-2 crossing, logit-Gaussian
robustness crossing)".
- Results IV-D.1 heading: "BD/McCrary Discontinuity" ->
"BD/McCrary Density-Smoothness Diagnostic". Prose now notes the
Appendix-A bin-width instability explicitly.
- Results IV-E: Table VIII restructured to label BD rows
"(diagnostic only; bin-unstable)" and "(diagnostic; null across
Appendix A)". Summary sentence rewritten to frame BD null as
evidence for clustered-but-smoothly-mixed rather than as a
convergence failure. Table cosine P5 row corrected from 0.941 to
0.9407 to match III-K.
- Results IV-G.3 and IV-I.2: "three-method convergence/thresholds"
-> "accountant-level convergent thresholds" (clarifies the 3
converging estimates are KDE antimode, Beta-2, logit-Gaussian,
not KDE/BD/Beta).
- Discussion V-B: "three-method framework" -> "convergent threshold
framework".
- Conclusion: "three methodologically distinct methods" -> "two
threshold estimators and a density-smoothness diagnostic";
contribution 3 restated; future-work sentence updated.
- Impact Statement (archived): "three methodologically distinct
threshold-selection methods" -> "two methodologically distinct
threshold estimators plus a density-smoothness diagnostic" so the
archived text is internally consistent if reused.
Discussion V-B / V-G already framed BD as a diagnostic in v3.5
(unchanged in this commit). The reframing therefore brings Abstract /
Introduction / Methodology / Results / Conclusion into alignment with
the Discussion framing that codex had already endorsed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8 lines
2.0 KiB
Markdown
8 lines
2.0 KiB
Markdown
# Abstract
|
|
|
|
<!-- IEEE Access target: <= 250 words, single paragraph -->
|
|
|
|
Regulations require Certified Public Accountants (CPAs) to attest to each audit report by affixing a signature. Digitization makes reusing a stored signature image across reports trivial---through administrative stamping or firm-level electronic signing---potentially undermining individualized attestation. Unlike forgery, *non-hand-signed* reproduction reuses the legitimate signer's own stored image, making it visually invisible to report users and infeasible to audit at scale manually. We present a pipeline integrating a Vision-Language Model for signature-page identification, YOLOv11 for signature detection, and ResNet-50 for feature extraction, followed by dual-descriptor verification combining cosine similarity and difference hashing. For threshold determination we apply two estimators---kernel-density antimode with a Hartigan unimodality test and an EM-fitted Beta mixture with a logit-Gaussian robustness check---plus a Burgstahler-Dichev/McCrary density-smoothness diagnostic, at the signature and accountant levels. Applied to 90,282 audit reports filed in Taiwan over 2013-2023 (182,328 signatures from 758 CPAs), the methods reveal a level asymmetry: signature-level similarity is a continuous quality spectrum that no two-component mixture separates, while accountant-level aggregates cluster into three groups with the antimode and two mixture estimators converging within $\sim$0.006 at cosine $\approx 0.975$. A major Big-4 firm is used as a *replication-dominated* (not pure) calibration anchor, with visual inspection and accountant-level mixture evidence supporting majority non-hand-signing and a minority of hand-signers; capture rates on both 70/30 calibration and held-out folds are reported with Wilson 95% intervals to make fold-level variance visible. Validation against 310 byte-identical positives and a $\sim$50,000-pair inter-CPA negative anchor yields FAR $\leq$ 0.001 at all accountant-level thresholds.
|
|
|
|
<!-- Target word count: 240 -->
|