# II. Related Work

## A. Offline Signature Verification

Offline signature verification---determining whether a static signature image is genuine or forged---has been studied extensively using deep learning. Bromley et al. [3] introduced the Siamese neural network architecture for signature verification, establishing the pairwise comparison paradigm that remains dominant. Hafemann et al. [14] demonstrated that deep CNN features learned from signature images provide strong discriminative representations for writer-independent verification, establishing the foundational baseline for subsequent work. Dey et al. [4] proposed SigNet, a convolutional Siamese network for writer-independent offline verification, extending this paradigm to generalize across signers without per-writer retraining. Kao and Wen [5] addressed offline verification and forgery detection with an explainable deep-learning approach that requires only a single known genuine signature per writer. More recently, Li et al. [6] introduced TransOSV, the first Vision Transformer-based approach, achieving state-of-the-art results. Tehsin et al. [7] evaluated distance metrics for triplet Siamese networks, finding that Manhattan distance outperformed cosine and Euclidean alternatives. Zois et al. [15] proposed similarity distance learning on SPD manifolds for writer-independent verification, achieving robust cross-dataset transfer. Hafemann et al. [16] further addressed the practical challenge of adapting to new users through meta-learning, reducing the enrollment burden for signature verification systems.

A common thread in this literature is the assumption that the primary threat is *identity fraud*: a forger attempting to produce a convincing imitation of another person's signature. Our work addresses a fundamentally different problem---detecting whether the *legitimate signer's* stored signature image has been reproduced across many documents---which requires analyzing the upper tail of the intra-signer similarity distribution rather than modeling inter-signer discriminability. Brimoh and Olisah [8] are closest in spirit in using reference evidence to discipline threshold choice. Their setting, however, uses standard verification benchmarks with known genuine references, whereas our archival setting lacks signature-level labels and therefore characterises a fixed deployed screening rule through inter-CPA coincidence-rate anchors.

## B. Document Forensics and Copy Detection

Image forensics encompasses a broad range of techniques for detecting manipulated visual content [17], with recent surveys highlighting the growing role of deep learning in forgery detection [18]. Copy-move forgery detection (CMFD) identifies duplicated regions within or across images, typically targeting manipulated photographs [11]. Abramova and Böhme [10] adapted block-based CMFD to scanned text documents, noting that standard methods perform poorly in this domain because legitimate character repetitions produce high similarity scores that confound duplicate detection.

Woodruff et al. [9] developed the work most closely related to ours: a fully automated pipeline for extracting and analyzing signatures from corporate filings in the context of anti-money-laundering investigations. Their system uses connected-component analysis for signature detection, GANs for noise removal, and Siamese networks for author clustering. While their pipeline shares our goal of large-scale automated signature analysis on real regulatory documents, their objective---grouping signatures by authorship---differs fundamentally from ours: detecting image-level reproduction within a single author's signatures across documents.

In the domain of image copy detection, Pizzi et al. [13] proposed SSCD, a self-supervised descriptor using ResNet-50 with contrastive learning for large-scale copy detection on natural images. Their work demonstrates that pre-trained CNN features with cosine similarity provide a strong baseline for identifying near-duplicate images, a finding that supports our feature-extraction approach.

## C. Perceptual Hashing

Perceptual hashing algorithms generate compact fingerprints that are robust to minor image transformations while remaining sensitive to substantive content changes [19]. Unlike cryptographic hashes, which change entirely with any pixel modification, perceptual hashes produce similar outputs for visually similar inputs, making them suitable for near-duplicate detection in scanned documents, where minor variations arise from the scanning process. Jakhar and Borah [12] demonstrated that combining perceptual hashing with deep-learning features significantly outperforms either approach alone for near-duplicate image detection, achieving an AUROC of 0.99 on standard benchmarks. Their two-stage architecture---pHash for fast structural comparison followed by deep features for semantic verification---provides methodological precedent for our dual-descriptor approach, though it was applied to natural images rather than document signatures.

Our work differs from prior perceptual-hashing studies in its application context and in the specific challenge it addresses: distinguishing legitimate high visual consistency (a careful signer producing similar-looking signatures) from image-level reproduction in scanned financial documents.
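To make the hash-based descriptor concrete, the sketch below computes a difference hash (dHash) and the normalized Hamming similarity that §E refers to. This is a minimal illustration, not our deployed implementation: the 8x8 gradient grid, the Lanczos resampling choice, and the function names are assumptions made for the example.

```python
import numpy as np
from PIL import Image

def dhash_bits(path: str, hash_size: int = 8) -> np.ndarray:
    """Difference hash: downscale to grayscale, then compare each pixel
    with its right-hand neighbour, yielding hash_size**2 gradient bits."""
    img = Image.open(path).convert("L").resize(
        (hash_size + 1, hash_size), Image.LANCZOS)
    px = np.asarray(img)
    return (px[:, 1:] > px[:, :-1]).flatten()  # 64 bits for hash_size=8

def dhash_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized Hamming similarity in [0, 1]; 1.0 means identical hashes."""
    return 1.0 - np.count_nonzero(a != b) / a.size

# Example: sim = dhash_similarity(dhash_bits("sig1.png"), dhash_bits("sig2.png"))
```

Because the hash encodes only coarse luminance gradients, small scan-to-scan variations perturb few bits, while a pixel-identical reproduction yields a similarity at or near 1.0.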
## D. Deep Feature Extraction for Signature Analysis

Several studies have explored pre-trained CNN features for signature comparison without metric learning or Siamese architectures. Engin et al. [20] used ResNet-50 features with cosine similarity for offline signature verification on real-world scanned documents, incorporating CycleGAN-based stamp removal as preprocessing---a pipeline design that closely parallels our approach. Tsourounis et al. [21] demonstrated successful transfer from handwritten text recognition to signature verification, showing that CNN features trained on related but distinct handwriting tasks generalize effectively to signature comparison. Chamakh and Bounouh [22] confirmed that a simple ResNet backbone with cosine similarity achieves competitive verification accuracy across multilingual signature datasets without fine-tuning, supporting the viability of our off-the-shelf feature-extraction approach. Babenko et al. [23] established that CNN-extracted neural codes with cosine similarity provide an effective framework for image retrieval and matching, a finding that underpins our feature-comparison approach.

These findings collectively suggest that pre-trained CNN features, when L2-normalized and compared via cosine similarity, provide a robust and computationally efficient representation for signature comparison---particularly suitable for large-scale applications where the computational overhead of Siamese training or metric learning is impractical.
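A minimal sketch of this off-the-shelf recipe follows, assuming an ImageNet-pretrained torchvision ResNet-50 with its classification head removed; the 224x224 resize and ImageNet normalization constants are standard illustrative choices and stand in for whatever preprocessing a given pipeline uses.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Pre-trained backbone with the classifier replaced by identity, so the
# forward pass returns the 2048-d pooled feature vector ("neural codes").
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    # L2-normalize so that the dot product of two embeddings is exactly
    # their cosine similarity.
    return F.normalize(backbone(x), dim=1)

# Example: cos_sim = (embed("sig1.png") * embed("sig2.png")).sum().item()
```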
## E. Statistical Methods for Threshold Characterisation and Calibration

Our threshold-characterisation and calibration framework combines three families of methods developed in statistics and accounting-econometrics.

*Non-parametric density estimation.* Kernel density estimation [28] provides a smooth estimate of a similarity distribution without parametric assumptions. In idealized two-class mixture settings with equal priors and equal misclassification costs, the local density minimum (antimode) between the two modes coincides with the Bayes-optimal decision boundary. The unimodality-vs-multimodality dichotomy can be assessed with the Hartigan and Hartigan dip test [37], which takes unimodality as its null; we use rejection of this null as evidence consistent with (though not a direct test for) bimodality.

*Discontinuity tests on empirical distributions.* Burgstahler and Dichev [38], working in the accounting-disclosure literature, proposed a test for smoothness violations in empirical frequency distributions. Under the null that the distribution is generated by a single smooth process, the expected count in any histogram bin equals the average of its two neighbours, and the standardized deviation from this expectation is approximately $N(0,1)$. The test was placed on rigorous asymptotic footing by McCrary [39], whose density-discontinuity test provides full asymptotic distribution theory, bandwidth-selection rules, and power analysis. The BD/McCrary pairing provides a local density-discontinuity diagnostic that is informative about distributional smoothness under minimal assumptions; we use it in that diagnostic role (rather than as a threshold estimator) because its transitions in our corpus are bin-width-sensitive at the signature level and rarely significant at the accountant level (Appendix A).
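The sketch below instantiates the BD construction on a vector of similarity scores; the plug-in variance expression follows the form used in the accounting literature, and the bin width is an illustrative choice rather than our calibrated setting.

```python
import numpy as np

def bd_standardized_deviations(scores: np.ndarray,
                               bin_width: float = 0.01) -> np.ndarray:
    """Burgstahler-Dichev smoothness diagnostic: for each interior histogram
    bin, standardize the gap between the observed count and the average of
    its two neighbours. Under a single smooth generating process these
    statistics are approximately N(0, 1)."""
    n = scores.size
    edges = np.linspace(0.0, 1.0, round(1.0 / bin_width) + 1)
    counts, _ = np.histogram(scores, bins=edges)
    p = counts / n  # observed bin proportions as plug-in estimates

    z = np.full(counts.size, np.nan)
    for i in range(1, counts.size - 1):
        expected = (counts[i - 1] + counts[i + 1]) / 2.0
        var = (n * p[i] * (1 - p[i])
               + 0.25 * n * (p[i - 1] + p[i + 1])
                        * (1 - p[i - 1] - p[i + 1]))
        if var > 0:
            z[i] = (counts[i] - expected) / np.sqrt(var)
    return z  # |z| well above 2 flags a local smoothness violation
```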
*Finite mixture models.* When the empirical distribution is viewed as a weighted sum of two (or more) latent component distributions, the Expectation-Maximization algorithm [40] provides maximum-likelihood estimates of the component parameters. For observations bounded on $[0,1]$---such as cosine similarity and normalized Hamming-based dHash similarity---the Beta distribution is the natural parametric choice, with applications spanning bioinformatics and Bayesian estimation. Under mild regularity conditions, White's quasi-MLE result [41] supports interpreting maximum-likelihood estimates under a mis-specified parametric family as consistent estimators of the pseudo-true parameter that minimizes the Kullback-Leibler divergence to the data-generating distribution within that family; we use this result to justify the Beta-mixture fit as a principled approximation rather than as a guarantee that the true distribution is Beta. A minimal EM sketch for a two-component Beta mixture appears at the end of this subsection.

The present study uses these tools diagnostically: first to test whether the descriptor distribution supports a natural operating boundary, and then, when that support fails under composition decomposition, to motivate anchor-based ICCR calibration of a fixed deployed rule.

*Cross-validation in a small-cluster scope.* Cross-validation methodology in the leave-one-out tradition has been developed extensively in statistics since Stone [42] and Geisser [43], and modern surveys including Vehtari et al. [44] discuss its application to mixture models. In document-forensics calibration the technique has been used selectively, typically with the individual document or signature as the hold-out unit.

Our application in §III-K differs in two respects from the standard usage: (i) the hold-out unit is the *firm* (not the individual CPA or signature), so the analysis directly probes cross-firm reproducibility of the fitted mixture rather than within-firm sampling variance; and (ii) the held-out predictions are interpreted as a *composition-sensitivity band* on the candidate mixture boundary, not as a sufficiency claim for the deployed five-way operational classifier (§III-H.1; calibrated separately in §III-L). We treat LOOO drift as descriptive information about how the mixture characterisation moves when training composition changes, not as a pass/fail test for the operational classifier.
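To make the mixture machinery and the firm-level hold-out concrete, the sketch below fits a two-component Beta mixture by EM (weights in closed form, component shapes by numerical optimization) and then re-fits it with each firm held out, reporting how the candidate boundary drifts. The function names, the initialization, the Nelder-Mead M-step, and the posterior-crossing boundary search are all illustrative assumptions, not our deployed implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

def fit_beta_mixture(x: np.ndarray, n_iter: int = 200, tol: float = 1e-8):
    """EM for a two-component Beta mixture on (0, 1).
    Returns (weights, params) with params[k] = (alpha_k, beta_k)."""
    x = np.clip(x, 1e-6, 1 - 1e-6)        # keep logpdf finite at 0 and 1
    w = np.array([0.5, 0.5])
    params = [(2.0, 5.0), (5.0, 2.0)]     # low- and high-similarity starts
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior responsibilities in log space.
        logp = np.stack([np.log(w[k]) + beta.logpdf(x, *params[k])
                         for k in range(2)])
        norm = np.logaddexp(logp[0], logp[1])
        r = np.exp(logp - norm)
        # M-step: weights in closed form; shapes maximize the weighted
        # Beta log-likelihood numerically (log-parametrized so a, b > 0).
        w = r.mean(axis=1)
        for k in range(2):
            nll = lambda t, k=k: -np.sum(
                r[k] * beta.logpdf(x, np.exp(t[0]), np.exp(t[1])))
            t0 = np.log(params[k])
            params[k] = tuple(np.exp(minimize(nll, t0,
                                              method="Nelder-Mead").x))
        ll = norm.sum()
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return w, params

def boundary(w, params, grid=np.linspace(0.01, 0.99, 981)) -> float:
    """Candidate boundary: first grid point where the high-similarity
    component's posterior responsibility exceeds 1/2."""
    post = [w[k] * beta.pdf(grid, *params[k]) for k in range(2)]
    crossing = post[1] > post[0]
    return grid[np.argmax(crossing)] if crossing.any() else np.nan

def leave_one_firm_out(scores: np.ndarray, firms: np.ndarray) -> np.ndarray:
    """Re-fit the mixture with each firm held out of the training pool."""
    bounds = []
    for f in np.unique(firms):
        w, params = fit_beta_mixture(scores[firms != f])
        bounds.append(boundary(w, params))
    return np.array(bounds)
```

Under these assumptions, the spread of the returned boundaries plays the role of the composition-sensitivity band described above: it summarizes how far the candidate mixture boundary moves when the training composition changes, without making any claim about the deployed operational classifier.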