Files
pdf_signature_extraction/paper/paper_a_abstract_v3.md
T
gbanyan 0ff1845b22 Paper A v3.4: resolve codex round-3 major-revision blockers
Three blockers from codex gpt-5.4 round-3 review (codex_review_gpt54_v3_3.md):

B1 Classifier vs three-method threshold mismatch
  - Methodology III-L rewritten to make explicit that the per-signature
    classifier and the accountant-level three-method convergence operate
    at different units (signature vs accountant) and are complementary
    rather than substitutable.
  - Add Results IV-G.3 + Table XII operational-threshold sensitivity:
    cos>0.95 vs cos>0.945 shifts dual-rule capture by 1.19 pp on whole
    Firm A; ~5% of signatures flip at the Uncertain/Moderate boundary.

B2 Held-out validation false "within Wilson CI" claim
  - Script 24 recomputes both calibration-fold and held-out-fold rates
    with Wilson 95% CIs and a two-proportion z-test on each rule.
  - Table XI replaced with the proper fold-vs-fold comparison; prose
    in Results IV-G.2 and Discussion V-C corrected: extreme rules agree
    across folds (p>0.7); operational rules in the 85-95% band differ
    by 1-5 pp due to within-Firm-A heterogeneity (random 30% sample
    contained more high-replication C1 accountants), not generalization
    failure.

B3 Interview evidence reframed as practitioner knowledge
  - The Firm A "interviews" referenced throughout v3.3 are private,
    informal professional conversations, not structured research
    interviews. Reframed accordingly: all "interview*" references in
    abstract / intro / methodology / results / discussion / conclusion
    are replaced with "domain knowledge / industry-practice knowledge".
  - This avoids overclaiming methodological formality and removes the
    human-subjects research framing that triggered the ethics-statement
    requirement.
  - Section III-H four-pillar Firm A validation now stands on visual
    inspection, signature-level statistics, accountant-level GMM, and
    the three Section IV-H analyses, with practitioner knowledge as
    background context only.
  - New Section III-M ("Data Source and Firm Anonymization") covers
    MOPS public-data provenance, Firm A/B/C/D pseudonymization, and
    conflict-of-interest declaration.

Add signature_analysis/24_validation_recalibration.py for the recomputed
calib-vs-held-out z-tests and the classifier sensitivity analysis;
output in reports/validation_recalibration/.

Pending (not in this commit): abstract length (368 -> 250 words),
Impact Statement removal, BD/McCrary sensitivity reporting, full
reproducibility appendix, references cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 11:45:24 +08:00

17 lines
2.8 KiB
Markdown

# Abstract
<!-- 200-270 words -->
Regulations in many jurisdictions require Certified Public Accountants (CPAs) to attest to each audit report they certify, typically by affixing a signature or seal.
However, the digitization of financial reporting makes it straightforward to reuse a stored signature image across multiple reports---whether by administrative stamping or firm-level electronic signing systems---potentially undermining the intent of individualized attestation.
Unlike signature forgery, where an impostor imitates another person's handwriting, *non-hand-signed* reproduction involves the legitimate signer's own stored signature image being reproduced on each report, a practice that is visually invisible to report users and infeasible to audit at scale through manual inspection.
We present an end-to-end AI pipeline that automatically detects non-hand-signed auditor signatures in financial audit reports.
The pipeline integrates a Vision-Language Model for signature page identification, YOLOv11 for signature region detection, and ResNet-50 for deep feature extraction, followed by a dual-descriptor verification combining cosine similarity of deep embeddings with difference hashing (dHash).
For threshold determination we apply three methodologically distinct methods---Kernel Density antimode with a Hartigan unimodality test, Burgstahler-Dichev/McCrary discontinuity, and EM-fitted Beta mixtures with a logit-Gaussian robustness check---at both the signature level and the accountant level.
Applied to 90,282 audit reports filed in Taiwan over 2013--2023 (182,328 signatures from 758 CPAs) the methods reveal an informative asymmetry: signature-level similarity forms a continuous quality spectrum that no two-component mixture cleanly separates, while accountant-level aggregates are clustered into three recognizable groups (BIC-best $K = 3$) with the KDE antimode and the two mixture-based estimators converging within $\sim$0.006 of each other at cosine $\approx 0.975$; the Burgstahler-Dichev / McCrary test produces no significant discontinuity at the accountant level, consistent with clustered-but-smooth rather than sharply discrete accountant-level heterogeneity.
A major Big-4 firm is used as a *replication-dominated* (not pure) calibration anchor, with visual-inspection and accountant-level mixture evidence supporting majority non-hand-signing and a minority of hand-signers; we break the circularity of using the same firm for calibration and validation by a 70/30 CPA-level held-out fold.
Validation against 310 byte-identical positive signatures and a $\sim$50,000-pair inter-CPA negative anchor yields FAR $\leq$ 0.001 with Wilson 95% confidence intervals at all accountant-level thresholds.
To our knowledge, this represents the largest-scale forensic analysis of auditor signature authenticity reported in the literature.
<!-- Word count: ~290 -->