pdf_signature_extraction/paper/paper_a_abstract_v3.md

# Abstract

<!-- 200-270 words -->

Regulations in many jurisdictions require Certified Public Accountants (CPAs) to attest to each audit report they certify, typically by affixing a signature or seal.
However, the digitization of financial reporting makes it straightforward to reuse a stored signature image across multiple reports---whether by administrative stamping or firm-level electronic signing systems---potentially undermining the intent of individualized attestation.
Unlike signature forgery, where an impostor imitates another person's handwriting, *non-hand-signed* reproduction involves the legitimate signer's own stored signature image being reproduced on each report, a practice that is visually invisible to report users and infeasible to audit at scale through manual inspection.
We present an end-to-end AI pipeline that automatically detects non-hand-signed auditor signatures in financial audit reports.
The pipeline integrates a Vision-Language Model for signature page identification, YOLOv11 for signature region detection, and ResNet-50 for deep feature extraction, followed by a dual-descriptor verification combining cosine similarity of deep embeddings with difference hashing (dHash).
For threshold determination we apply three methodologically distinct methods---Kernel Density antimode with a Hartigan unimodality test, Burgstahler-Dichev/McCrary discontinuity, and EM-fitted Beta mixtures with a logit-Gaussian robustness check---at both the signature level and the accountant level.
Applied to 90,282 audit reports filed in Taiwan over 2013--2023 (182,328 signatures from 758 CPAs) the methods reveal an informative asymmetry: signature-level similarity forms a continuous quality spectrum that no two-component mixture cleanly separates, while accountant-level aggregates are clustered into three recognizable groups (BIC-best $K = 3$) with the KDE antimode and the two mixture-based estimators converging within $\sim$0.006 of each other at cosine $\approx 0.975$; the Burgstahler-Dichev / McCrary test produces no significant discontinuity at the accountant level, consistent with clustered-but-smooth rather than sharply discrete accountant-level heterogeneity.
A major Big-4 firm is used as a *replication-dominated* (not pure) calibration anchor, with visual-inspection and accountant-level mixture evidence supporting majority non-hand-signing and a minority of hand-signers; we break the circularity of using the same firm for calibration and validation by a 70/30 CPA-level held-out fold.
Validation against 310 byte-identical positive signatures and a $\sim$50,000-pair inter-CPA negative anchor yields FAR $\leq$ 0.001 with Wilson 95% confidence intervals at all accountant-level thresholds.
To our knowledge, this represents the largest-scale forensic analysis of auditor signature authenticity reported in the literature.

<!-- Word count: ~290 -->