pdf_signature_extraction/paper/paper_a_abstract.md
gbanyan 939a348da4 Add Paper A (IEEE TAI) complete draft with Firm A-calibrated dual-method classification
Paper draft includes all sections (Abstract through Conclusion), 36 references,
and supporting scripts. Key methodology: Cosine similarity + dHash dual-method
verification with thresholds calibrated against known-replication firm (Firm A).

Includes:
- 8 section markdown files (paper_a_*.md)
- Ablation study script (ResNet-50 vs VGG-16 vs EfficientNet-B0)
- Recalibrated classification script (84,386 PDFs, 5-tier system)
- Figure generation and Word export scripts
- Citation renumbering script ([1]-[36])
- Signature analysis pipeline (12 steps)
- YOLO extraction scripts

Three rounds of AI review completed (GPT-5.4, Claude Opus 4.6, Gemini 3 Pro).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:05:33 +08:00


Abstract

Regulations in many jurisdictions require Certified Public Accountants (CPAs) to attest to each audit report they certify, typically by affixing a signature or seal. However, the digitization of financial reporting makes it straightforward to reuse a scanned signature image across multiple reports, potentially undermining the intent of individualized attestation. Unlike signature forgery, where an impostor imitates another person's handwriting, signature replication involves a legitimate signer reusing a digital copy of their own genuine signature, a practice that is difficult to detect through manual inspection at scale. We present an end-to-end AI pipeline that automatically detects signature replication in financial audit reports. The pipeline employs a Vision-Language Model for signature page identification, YOLOv11 for signature region detection, and ResNet-50 for deep feature extraction, followed by a dual-method verification step that combines cosine similarity with difference hashing (dHash). This dual-method design distinguishes consistent handwriting style (high feature similarity but divergent perceptual hashes) from digital replication (convergent evidence across both methods), addressing an ambiguity that single-metric approaches cannot resolve. We apply this pipeline to 90,282 audit reports filed by publicly listed companies in Taiwan between 2013 and 2023, analyzing 182,328 signatures from 758 CPAs. Using an accounting firm independently identified as employing digital replication as a calibration reference, we establish empirically grounded detection thresholds. 
Our analysis reveals that, among documents with high feature-level similarity (cosine > 0.95), the structural verification layer separates three distinct populations: 41% with converging replication evidence, 52% with partial structural similarity, and 7% with no structural corroboration despite near-identical features. This stratification demonstrates that single-metric approaches conflate style consistency with digital duplication. To our knowledge, this represents the largest-scale analysis of signature authenticity in financial audit documents to date.
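The dual-method verification described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the category labels, and the two dHash Hamming-distance cutoffs (`DHASH_REPLICATION_MAX`, `DHASH_PARTIAL_MAX`) are hypothetical placeholders chosen for illustration, and only the cosine cutoff of 0.95 comes from the text. The sketch assumes each signature crop has already been reduced to a feature vector and to a small grayscale grid with one more column than the number of hash bits per row.

```python
import math

COSINE_THRESHOLD = 0.95       # feature-level cutoff quoted in the abstract
DHASH_REPLICATION_MAX = 5     # hypothetical cutoff, not taken from the paper
DHASH_PARTIAL_MAX = 15        # hypothetical cutoff, not taken from the paper


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dhash_bits(gray):
    """Difference hash: one bit per horizontally adjacent pixel pair.

    `gray` is a list of rows of grayscale values, e.g. 8 rows of 9
    pixels, which yields a 64-bit hash.
    """
    bits = []
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


def hamming_distance(bits_a, bits_b):
    """Number of positions at which two equal-length hashes differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(bits_a, bits_b))


def classify_pair(feat_a, feat_b, gray_a, gray_b):
    """Combine feature similarity with structural (dHash) evidence."""
    if cosine_similarity(feat_a, feat_b) <= COSINE_THRESHOLD:
        return "below_feature_threshold"
    dist = hamming_distance(dhash_bits(gray_a), dhash_bits(gray_b))
    if dist <= DHASH_REPLICATION_MAX:
        return "converging_replication_evidence"
    if dist <= DHASH_PARTIAL_MAX:
        return "partial_structural_similarity"
    return "no_structural_corroboration"
```

In this sketch the hash comparison runs only for pairs that already exceed the cosine cutoff, mirroring the stratification of high-similarity documents reported above. A pair with near-identical features but a large Hamming distance falls into the "no structural corroboration" group, which is the case a cosine-only test cannot separate from replication.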