
# Abstract

<!-- IEEE Access target: <= 250 words, single paragraph -->

Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitisation makes it feasible to reuse a stored signature image across reports (through administrative stamping or firm-level electronic signing), thereby undermining individualised attestation. We build an end-to-end pipeline for screening such *non-hand-signed* signatures at scale: a Vision-Language Model identifies signature pages, YOLOv11 localizes signatures, ResNet-50 supplies deep features, and a dual-descriptor layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate *style consistency* from *image reproduction*. Applied to 90,282 Taiwan audit reports (2013–2023), the pipeline yields 182,328 signatures from 758 CPAs; primary analyses are scoped to the Big-4 sub-corpus (437 CPAs; 150,442 signatures). Distributional diagnostics show that the apparent multimodality of the descriptor distribution dissolves under joint firm-mean centring and integer-tie jitter ($p$ rises to $0.35$), so no within-population bimodal antimode anchors the operational thresholds. We instead adopt an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units: per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ under the deployed any-pair high-confidence rule), and per-document ($0.34$ for the operational HC+MC alarm). The framework surfaces pronounced firm-level heterogeneity: Firm A's per-document HC+MC ICCR is $0.62$ versus $0.09$–$0.16$ at Firms B/C/D (the gap persists after pool-size adjustment), and $77$–$99\%$ of inter-CPA collisions concentrate within the source firm. This contrast is consistent with firm-level template-like reuse but is not independently diagnostic, since descriptor-only data cannot separate reuse from digitisation-pipeline or signing-style homogeneity within a firm; we report it as a scope limitation rather than a mechanism finding. We position the system as a specificity-proxy-anchored screening framework with human-in-the-loop review, not as a validated forensic detector; no calibrated error rates are reportable without signature-level ground truth.

<!-- Word count: 281 -->
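
As a minimal sketch of the dual-descriptor screen and the per-comparison ICCR described in the abstract, the following Python computes cosine similarity on feature vectors, a difference hash on a pre-resized grayscale grid, and the fraction of cross-CPA pairs that clear both thresholds. The function names, the record layout (`cpa`, `feat`, `dhash`), and the toy data are hypothetical; only the thresholds (cosine $>0.95$, dHash $\leq 5$) come from the abstract. The independent-minimum aggregation, pool normalisation, and per-document roll-up are not reproduced here.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dhash(pixels):
    """Difference hash of a grayscale grid (list of rows of intensities).

    Assumes the image is already resized; each row of width w yields
    w - 1 bits, set to 1 where a pixel is brighter than its right neighbour.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def per_comparison_iccr(signatures, cos_min=0.95, dhash_max=5):
    """Fraction of cross-CPA pairs with cosine > cos_min and dHash <= dhash_max.

    `signatures` is a hypothetical list of dicts with keys
    'cpa' (signer id), 'feat' (feature vector), and 'dhash' (integer hash).
    """
    hits = total = 0
    for s, t in combinations(signatures, 2):
        if s["cpa"] == t["cpa"]:
            continue  # only inter-CPA comparisons count
        total += 1
        close_style = cosine_similarity(s["feat"], t["feat"]) > cos_min
        close_image = hamming(s["dhash"], t["dhash"]) <= dhash_max
        if close_style and close_image:
            hits += 1
    return hits / total if total else float("nan")


# Toy data, purely illustrative (not drawn from the corpus).
toy = [
    {"cpa": "A1", "feat": [1.0, 0.0], "dhash": 0b1111},
    {"cpa": "A2", "feat": [1.0, 0.01], "dhash": 0b1110},
    {"cpa": "A3", "feat": [0.0, 1.0], "dhash": 0b0000},
    {"cpa": "A1", "feat": [1.0, 0.0], "dhash": 0b1111},
]
rate = per_comparison_iccr(toy)
```

Because the rate is computed only over pairs from different CPAs, any coincidence it counts cannot be explained by one signer's own consistency, which is what makes it usable as a specificity proxy in the absence of signature-level ground truth.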