# V. Discussion

## A. Replication Detection as a Distinct Problem

Our results highlight the importance of distinguishing signature replication detection from the well-studied signature forgery detection problem.
In forgery detection, the challenge lies in modeling the variability of skilled forgers who produce plausible imitations of a target signature.
In replication detection, the signer's identity is not in question; the challenge is distinguishing between legitimate intra-signer consistency (a CPA who signs similarly each time) and digital duplication (a CPA who reuses a scanned image).

This distinction has direct methodological consequences.
Forgery detection systems optimize for inter-class discriminability---maximizing the gap between genuine and forged signatures.
Replication detection, by contrast, requires sensitivity to the *upper tail* of the intra-class similarity distribution, where the boundary between consistent handwriting and digital copies becomes ambiguous.
The dual-method framework we propose, which combines semantic-level features (cosine similarity) with structural-level features (dHash), addresses this ambiguity in a way that single-method approaches cannot.
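
A minimal Python sketch of how the two kinds of evidence can be computed for one signature pair is given below.
It is illustrative only and is not the pipeline used in this study: all function names are hypothetical, the feature vectors are assumed to be precomputed embeddings, and `dhash_bits` assumes its input is a grayscale grid that has already been downsampled (for example to 8 rows by 9 columns, which yields a 64-bit hash).

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (semantic-level evidence)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm feature vector")
    return dot / (norm_a * norm_b)


def dhash_bits(gray: Sequence[Sequence[int]]) -> int:
    """Difference hash of a grayscale grid that is ALREADY downsampled (assumption).

    Each row contributes len(row) - 1 bits; a bit is 1 when a pixel is
    brighter than its right-hand neighbour.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(h1: int, h2: int) -> int:
    """Number of differing bits between two hashes (structural-level evidence)."""
    return bin(h1 ^ h2).count("1")


def dual_evidence(
    feat_a: Sequence[float],
    feat_b: Sequence[float],
    gray_a: Sequence[Sequence[int]],
    gray_b: Sequence[Sequence[int]],
) -> Tuple[float, int]:
    """Return (cosine similarity, dHash Hamming distance) for one signature pair."""
    return (
        cosine_similarity(feat_a, feat_b),
        hamming(dhash_bits(gray_a), dhash_bits(gray_b)),
    )
```

A pair is then judged on both values together: high cosine similarity with a small Hamming distance points to structural convergence, whereas high cosine similarity with a large distance does not.
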

## B. The Style-Replication Gap

Perhaps the most important empirical finding is the stratification that the dual-method framework reveals within the high-cosine population.
Of 71,656 documents with cosine similarity exceeding 0.95, the dHash dimension partitions them into three distinct groups: 29,529 (41.2%) with high-confidence structural evidence of replication, 36,994 (51.6%) with moderate structural similarity, and 5,133 (7.2%) with no structural corroboration despite near-identical feature-level appearance.
A cosine-only approach would treat all 71,656 identically; the dual-method framework separates them into populations with fundamentally different interpretations.

The 7.2% classified as "high style consistency" (cosine > 0.95 but dHash > 15) are particularly informative.
Several plausible explanations may account for their high feature similarity without structural identity, though we lack direct evidence to confirm their relative contributions.
Many accountants may develop highly consistent signing habits---using similar pen pressure, stroke order, and spatial layout---resulting in signatures that appear nearly identical at the feature level while retaining the microscopic variations inherent to handwriting.
Some may use signing pads or templates that further constrain variability without constituting digital replication.
The dual-method framework separates these cases from digitally replicated signatures by detecting the absence of structural-level convergence.
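
The stratification of the high-cosine population can be sketched in a few lines of Python.
The cosine cutoff (0.95) and the dHash cutoff for "high style consistency" (15) come from the text above; the boundary between the high-confidence and moderate groups is not stated in this section, so `DHASH_HIGH_CONF_MAX` is a hypothetical placeholder, as are the function and label names.
The second helper simply recomputes each group's share from the reported counts.

```python
from typing import Dict

COSINE_MIN = 0.95         # from the text: cosine similarity must exceed 0.95
DHASH_STYLE_MIN = 15      # from the text: dHash > 15 means no structural corroboration
DHASH_HIGH_CONF_MAX = 5   # HYPOTHETICAL boundary, not given in this section


def stratify(cosine: float, dhash: int) -> str:
    """Assign one document to a group using both similarity measures."""
    if cosine <= COSINE_MIN:
        return "below_cosine_threshold"
    if dhash <= DHASH_HIGH_CONF_MAX:
        return "high_confidence_replication"
    if dhash <= DHASH_STYLE_MIN:
        return "moderate_structural_similarity"
    return "high_style_consistency"


# Group sizes reported for the 71,656 documents with cosine > 0.95.
REPORTED_COUNTS: Dict[str, int] = {
    "high_confidence_replication": 29_529,
    "moderate_structural_similarity": 36_994,
    "high_style_consistency": 5_133,
}


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the total held by each group, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}
```

Calling `shares(REPORTED_COUNTS)` gives a quick consistency check that the three groups partition the high-cosine population.
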

## C. Value of Known-Replication Calibration

The use of Firm A as a calibration reference addresses a fundamental challenge in document forensics: the scarcity of ground truth labels.
In most forensic applications, establishing ground truth requires expensive manual verification or access to privileged information about document provenance.
Our approach leverages domain knowledge---the established practice of digital signature replication at a specific firm---to create a naturally occurring positive control group within the dataset.

This calibration strategy has broader applicability beyond signature analysis.
Any forensic detection system operating on real-world corpora can benefit from identifying subpopulations with known characteristics (positive or negative) to anchor threshold selection, particularly when the distributions of interest are non-normal and percentile-based thresholds are preferred over parametric alternatives.
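
As a rough illustration of percentile-based anchoring, the Python sketch below derives two thresholds from a known-positive reference group.
It is a sketch under stated assumptions rather than the calibration procedure of this study: the helper names, the `tail_pct` parameter, and the choice of which tail to use for each measure are all hypothetical.

```python
import math
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Empirical percentile (q in [0, 100]) with linear interpolation.

    No distributional assumption is made, which suits non-normal data.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must lie in [0, 100]")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def calibrate(
    ref_cosines: Sequence[float],
    ref_dhashes: Sequence[float],
    tail_pct: float = 5.0,  # HYPOTHETICAL: share of the reference group allowed to miss
) -> Tuple[float, float]:
    """Derive (cosine threshold, dHash threshold) from a known-positive group.

    The cosine threshold is a lower-tail percentile (most reference pairs
    score above it); the dHash threshold is an upper-tail percentile
    (most reference pairs fall below it).
    """
    return (
        percentile(ref_cosines, tail_pct),
        percentile(ref_dhashes, 100.0 - tail_pct),
    )
```

The same pattern applies to a known-negative subpopulation, with the tails reversed.
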

## D. Limitations

Several limitations should be acknowledged.

First, comprehensive ground truth labels are not available for the full dataset.
While Firm A provides a known-replication reference and the dual-method framework produces internally consistent results, the classification of non-Firm-A documents relies on statistical inference without independent per-document ground truth.
A small-scale manual verification study (e.g., 100--200 documents sampled across classification categories) would strengthen confidence in the classification boundaries.

Second, the ResNet-50 feature extractor was used with pre-trained ImageNet weights without domain-specific fine-tuning.
While our ablation study and prior literature [20]--[22] support the effectiveness of transferred ImageNet features for signature comparison, a signature-specific feature extractor trained on a curated dataset could improve discriminative performance.

Third, the red stamp removal preprocessing uses simple HSV color space filtering, which may introduce artifacts where handwritten strokes overlap with red seal impressions.
In these overlap regions, blended pixels are replaced with white, potentially creating small gaps in the signature strokes that could reduce dHash similarity.
This effect would make replication harder to detect (biasing toward false negatives) rather than easier, but the magnitude of the impact has not been quantified.
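
The overlap artifact can be illustrated with a minimal HSV filter written against Python's standard `colorsys` module.
This is a sketch, not the preprocessing code used in the study: the function name and the three tolerance parameters are hypothetical values chosen only for illustration.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB
WHITE: Pixel = (255, 255, 255)


def remove_red(
    pixels: List[List[Pixel]],
    hue_tol: float = 0.05,   # HYPOTHETICAL: accepted hue distance from pure red
    min_sat: float = 0.4,    # HYPOTHETICAL: minimum saturation to count as seal ink
    min_val: float = 0.3,    # HYPOTHETICAL: minimum brightness to count as seal ink
) -> List[List[Pixel]]:
    """Replace red-looking pixels with white and leave all others untouched."""
    cleaned = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            # colorsys expects channels in [0, 1] and returns hue in [0, 1).
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            near_red = h <= hue_tol or h >= 1.0 - hue_tol
            is_seal = near_red and s >= min_sat and v >= min_val
            new_row.append(WHITE if is_seal else (r, g, b))
        cleaned.append(new_row)
    return cleaned
```

With these illustrative settings, a dark-red blended pixel such as `(150, 30, 30)`, where a pen stroke crosses the seal, is whitened along with the seal itself, which is how a gap in the stroke can arise.
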

Fourth, scanning equipment, PDF generation software, and compression algorithms may have changed over the 10-year study period (2013--2023), potentially affecting similarity measurements.
While cosine similarity and dHash are designed to be robust to such variations, longitudinal confounds cannot be entirely excluded.

Fifth, the classification framework treats all signatures from a CPA as belonging to a single class, not accounting for potential changes in signing practice over time (e.g., a CPA who signed genuinely in early years but adopted digital replication later).
Temporal segmentation of signature similarity could reveal such transitions but is beyond the scope of this study.

Finally, the legal and regulatory implications of our findings depend on jurisdictional definitions of "signature" and "signing."
Whether digital replication of a CPA's own genuine signature constitutes a violation of signing requirements is a legal question that our technical analysis can inform but cannot resolve.