pdf_signature_extraction/paper/paper_a_discussion.md
gbanyan 939a348da4 Add Paper A (IEEE TAI) complete draft with Firm A-calibrated dual-method classification
Paper draft includes all sections (Abstract through Conclusion), 36 references,
and supporting scripts. Key methodology: Cosine similarity + dHash dual-method
verification with thresholds calibrated against known-replication firm (Firm A).

Includes:
- 8 section markdown files (paper_a_*.md)
- Ablation study script (ResNet-50 vs VGG-16 vs EfficientNet-B0)
- Recalibrated classification script (84,386 PDFs, 5-tier system)
- Figure generation and Word export scripts
- Citation renumbering script ([1]-[36])
- Signature analysis pipeline (12 steps)
- YOLO extraction scripts

Three rounds of AI review completed (GPT-5.4, Claude Opus 4.6, Gemini 3 Pro).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:05:33 +08:00


V. Discussion

A. Replication Detection as a Distinct Problem

Our results highlight the importance of distinguishing signature replication detection from the well-studied signature forgery detection problem. In forgery detection, the challenge lies in modeling the variability of skilled forgers who produce plausible imitations of a target signature. In replication detection, the signer's identity is not in question; the challenge is distinguishing between legitimate intra-signer consistency (a CPA who signs similarly each time) and digital duplication (a CPA who reuses a scanned image).

This distinction has direct methodological consequences. Forgery detection systems optimize for inter-class discriminability, that is, for maximizing the gap between genuine and forged signatures. Replication detection, by contrast, requires sensitivity to the upper tail of the intra-class similarity distribution, where the boundary between consistent handwriting and digital copies becomes ambiguous. The dual-method framework we propose, which combines semantic-level features (cosine similarity) with structural-level features (dHash), addresses this ambiguity in a way that single-method approaches cannot.
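As a rough illustration of the two measurements, the sketch below computes cosine similarity between two feature vectors and the Hamming distance between two difference hashes. It is a minimal standard-library sketch, not the pipeline used in this study: the function names are hypothetical, the grayscale grid is assumed to be already resized to 8 rows by 9 columns, and the bit convention (left pixel brighter than right) is one common choice.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def dhash_bits(gray_rows):
    """Difference hash of a grayscale grid.

    Assumption: gray_rows is already resized to 8 rows of 9 values,
    which yields 8 * 8 = 64 bits. Each bit records whether a pixel is
    brighter than its right-hand neighbour.
    """
    return [
        int(left > right)
        for row in gray_rows
        for left, right in zip(row, row[1:])
    ]


def hamming_distance(bits_a, bits_b):
    """Number of positions at which two equal-length hashes differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))
```

The two scores answer different questions: cosine similarity compares learned feature vectors, while the hash distance compares coarse pixel-gradient structure, so a pair can score high on the first and still differ on the second.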

B. The Style-Replication Gap

Perhaps the most important empirical finding is the stratification that the dual-method framework reveals within the high-cosine population. Of 71,656 documents with cosine similarity exceeding 0.95, the dHash dimension partitions them into three distinct groups: 29,529 (41.2%) with high-confidence structural evidence of replication, 36,994 (51.6%) with moderate structural similarity, and 5,133 (7.2%) with no structural corroboration despite near-identical feature-level appearance. A cosine-only approach would treat all 71,656 identically; the dual-method framework separates them into populations with fundamentally different interpretations.
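The group shares follow directly from the three counts reported above. A minimal check, with the dictionary keys chosen here only as shorthand labels:

```python
# Counts of high-cosine documents per dHash group, as reported in the text.
counts = {
    "high_confidence_replication": 29529,
    "moderate_structural_similarity": 36994,
    "no_structural_corroboration": 5133,
}

total = sum(counts.values())

# Percentage share of each group, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} of {total} ({share}%)")
```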

The 7.2% classified as "high style consistency" (cosine > 0.95 but dHash > 15) are particularly informative. Several plausible explanations may account for their high feature similarity without structural identity, though we lack direct evidence to confirm their relative contributions. Many accountants may develop highly consistent signing habits (similar pen pressure, stroke order, and spatial layout), resulting in signatures that appear nearly identical at the feature level while retaining the microscopic variations inherent to handwriting. Some may use signing pads or templates that further constrain variability without constituting digital replication. The dual-method framework separates these cases from digitally replicated signatures by detecting the absence of structural-level convergence.
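The partition of the high-cosine population can be sketched as a simple decision rule. The cosine cutoff (0.95) and the style-only cutoff (dHash > 15) are taken from the text; the boundary between the high-confidence and moderate groups is not stated in this section, so `HIGH_CONF_MAX_DHASH` below is a hypothetical placeholder, as are the function name and labels. The sketch also assumes the dHash value is a Hamming distance between hashes.

```python
COSINE_MIN = 0.95           # from the text: cosine similarity must exceed 0.95
STYLE_ONLY_MIN_DHASH = 15   # from the text: dHash > 15 means no structural corroboration
HIGH_CONF_MAX_DHASH = 5     # HYPOTHETICAL placeholder; not given in this section


def stratify(cosine, dhash_distance, high_conf_max=HIGH_CONF_MAX_DHASH):
    """Assign a document pair to one of the groups discussed above."""
    if cosine <= COSINE_MIN:
        return "outside high-cosine population"
    if dhash_distance > STYLE_ONLY_MIN_DHASH:
        # High feature similarity without structural agreement.
        return "high style consistency"
    if dhash_distance <= high_conf_max:
        return "high-confidence replication"
    return "moderate structural similarity"
```

A cosine-only rule would collapse the last three return values into a single label, which is the loss of information the section describes.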

C. Value of Known-Replication Calibration

The use of Firm A as a calibration reference addresses a fundamental challenge in document forensics: the scarcity of ground truth labels. In most forensic applications, establishing ground truth requires expensive manual verification or access to privileged information about document provenance. Our approach leverages domain knowledge---the established practice of digital signature replication at a specific firm---to create a naturally occurring positive control group within the dataset.

This calibration strategy has broader applicability beyond signature analysis. Any forensic detection system operating on real-world corpora can benefit from identifying subpopulations with known characteristics (positive or negative) to anchor threshold selection, particularly when the distributions of interest are non-normal and percentile-based thresholds are preferred over parametric alternatives.
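One way to anchor a threshold on a known-positive subpopulation is to read it off an empirical percentile of that group's scores. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, and the percentile levels in the usage comments are illustrative choices, not the values used in this study.

```python
import statistics


def percentile_threshold(reference_scores, pct):
    """Empirical pct-th percentile (1..99) of a reference group's scores.

    Uses the standard-library 'inclusive' interpolation, which treats the
    data as spanning the population minimum and maximum. No distributional
    form is assumed.
    """
    if not 1 <= pct <= 99:
        raise ValueError("pct must be an integer between 1 and 99")
    cuts = statistics.quantiles(reference_scores, n=100, method="inclusive")
    return cuts[pct - 1]


# Illustrative usage with hypothetical percentile levels:
#   cosine_cutoff = percentile_threshold(reference_cosines, 5)
#       -> about 95% of the known-replication group scores at or above it
#   dhash_cutoff = percentile_threshold(reference_dhash_distances, 95)
#       -> about 95% of the known-replication group scores at or below it
```

Because the cutoff is a rank statistic of the reference group, it does not depend on the scores being normally distributed, which is the motivation given above for preferring percentiles.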

D. Limitations

Several limitations should be acknowledged.

First, comprehensive ground truth labels are not available for the full dataset. While Firm A provides a known-replication reference and the dual-method framework produces internally consistent results, the classification of non-Firm-A documents relies on statistical inference without independent per-document ground truth. A small-scale manual verification study (e.g., 100--200 documents sampled across classification categories) would strengthen confidence in the classification boundaries.
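The manual verification study suggested above could draw its documents with a stratified random sample, so that every classification category is represented. This is a minimal sketch only: the function name, the input mapping, and the per-category quota are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(doc_categories, per_category, seed=0):
    """Draw up to per_category document IDs from each category.

    doc_categories: mapping of document ID -> assigned category label.
    A fixed seed keeps the draw reproducible for later auditing.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for doc_id, category in doc_categories.items():
        by_category[category].append(doc_id)

    sample = {}
    for category in sorted(by_category):
        doc_ids = sorted(by_category[category])  # stable order before sampling
        k = min(per_category, len(doc_ids))
        sample[category] = rng.sample(doc_ids, k)
    return sample
```

With five categories and a quota of 30 per category, for example, the draw yields 150 documents, which falls within the 100 to 200 range mentioned above.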

Second, the ResNet-50 feature extractor was used with pre-trained ImageNet weights without domain-specific fine-tuning. While our ablation study and prior literature [20]--[22] support the effectiveness of transferred ImageNet features for signature comparison, a signature-specific feature extractor trained on a curated dataset could improve discriminative performance.

Third, the red stamp removal preprocessing uses simple HSV color space filtering, which may introduce artifacts where handwritten strokes overlap with red seal impressions. In these overlap regions, blended pixels are replaced with white, potentially creating small gaps in the signature strokes that could reduce dHash similarity. This effect would make replication harder to detect (biasing toward false negatives) rather than easier, but the magnitude of the impact has not been quantified.
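The overlap artifact can be seen in a minimal pixel-level sketch of HSV filtering. The hue, saturation, and value cutoffs below are hypothetical and are not the parameters used in this study; the function names are likewise illustrative. Pixels are plain 8-bit RGB tuples so that the example needs only the standard library.

```python
import colorsys

WHITE = (255, 255, 255)


def is_red(r, g, b, hue_margin=20 / 360, min_saturation=0.4, min_value=0.3):
    """True if an 8-bit RGB pixel falls in the red band of HSV space.

    All three cutoffs are HYPOTHETICAL illustration values. Red wraps
    around hue 0, so both ends of the hue circle are checked.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    near_red_hue = h <= hue_margin or h >= 1 - hue_margin
    return near_red_hue and s >= min_saturation and v >= min_value


def remove_red(pixels):
    """Replace every red-band pixel in a 2D grid of RGB tuples with white."""
    return [
        [WHITE if is_red(*px) else px for px in row]
        for row in pixels
    ]


# A seal pixel, a dark ink pixel, and a pixel where ink and seal blend.
row = [(220, 30, 30), (20, 20, 20), (120, 30, 40)]
cleaned = remove_red([row])[0]
```

In this sketch the blended pixel `(120, 30, 40)` still falls in the red band and is whitened along with the seal pixel, which is how small gaps can appear in strokes that cross a seal impression.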

Fourth, scanning equipment, PDF generation software, and compression algorithms may have changed over the 10-year study period (2013--2023), potentially affecting similarity measurements. While cosine similarity and dHash are designed to be robust to such variations, longitudinal confounds cannot be entirely excluded.

Fifth, the classification framework treats all signatures from a CPA as belonging to a single class, not accounting for potential changes in signing practice over time (e.g., a CPA who signed genuinely in early years but adopted digital replication later). Temporal segmentation of signature similarity could reveal such transitions but is beyond the scope of this study.

Finally, the legal and regulatory implications of our findings depend on jurisdictional definitions of "signature" and "signing." Whether digital replication of a CPA's own genuine signature constitutes a violation of signing requirements is a legal question that our technical analysis can inform but cannot resolve.