9b11f03548
Major changes from v2: Terminology: - "digitally replicated" -> "non-hand-signed" throughout (per partner v3 feedback and to avoid implicit accusation) - "Firm A near-universal non-hand-signing" -> "replication-dominated" (per interview nuance: most but not all Firm A partners use replication) Target journal: IEEE TAI -> IEEE Access (per NCKU CSIE list) New methodological sections (III.G-III.L + IV.D-IV.G): - Three convergent threshold methods (KDE antimode + Hartigan dip test / Burgstahler-Dichev McCrary / EM-fitted Beta mixture + logit-GMM robustness check) - Explicit unit-of-analysis discussion (signature vs accountant) - Accountant-level 2D Gaussian mixture (BIC-best K=3 found empirically) - Pixel-identity validation anchor (no manual annotation needed) - Low-similarity negative anchor + Firm A replication-dominated anchor New empirical findings integrated: - Firm A signature cosine UNIMODAL (dip p=0.17) - long left tail = minority hand-signers - Full-sample cosine MULTIMODAL but not cleanly bimodal (BIC prefers 3-comp mixture) - signature-level is continuous quality spectrum - Accountant-level mixture trimodal (C1 Deloitte-heavy 139/141, C2 other Big-4, C3 smaller firms). 2-comp crossings cos=0.945, dh=8.10 - Pixel-identity anchor (310 pairs) gives perfect recall at all cosine thresholds - Firm A anchor rates: cos>0.95=92.5%, dual-rule cos>0.95 AND dh<=8=89.95% New discussion section V.B: "Continuous-quality spectrum vs discrete- behavior regimes" - the core interpretive contribution of v3. References added: Hartigan & Hartigan 1985, Burgstahler & Dichev 1997, McCrary 2008, Dempster-Laird-Rubin 1977, White 1982 (refs 37-41). export_v3.py builds Paper_A_IEEE_Access_Draft_v3.docx (462 KB, +40% vs v2 from expanded methodology + results sections). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
102 lines
12 KiB
Markdown
102 lines
12 KiB
Markdown
# V. Discussion
|
|
|
|
## A. Non-Hand-Signing Detection as a Distinct Problem
|
|
|
|
Our results highlight the importance of distinguishing *non-hand-signing detection* from the well-studied *signature forgery detection* problem.
|
|
In forgery detection, the challenge lies in modeling the variability of skilled forgers who produce plausible imitations of a target signature.
|
|
In non-hand-signing detection the signer's identity is not in question; the challenge is distinguishing between legitimate intra-signer consistency (a CPA who signs similarly each time) and image-level reproduction of a stored signature (a CPA whose signature on each report is a byte-level or near-byte-level copy of a single source image).
|
|
|
|
This distinction has direct methodological consequences.
|
|
Forgery detection systems optimize for inter-class discriminability---maximizing the gap between genuine and forged signatures.
|
|
Non-hand-signing detection, by contrast, requires sensitivity to the *upper tail* of the intra-class similarity distribution, where the boundary between consistent handwriting and image reproduction becomes ambiguous.
|
|
The dual-descriptor framework we propose---combining semantic-level features (cosine similarity) with structural-level features (dHash)---addresses this ambiguity in a way that single-descriptor approaches cannot.
|
|
|
|
## B. Continuous-Quality Spectrum vs. Discrete-Behavior Regimes
|
|
|
|
The most consequential empirical finding of this study is the asymmetry between signature level and accountant level revealed by the three-method convergent analysis and the Hartigan dip test (Sections IV-D and IV-E).
|
|
|
|
At the per-signature level, the distribution of best-match cosine similarity is *not* cleanly bimodal.
|
|
Firm A's signature-level cosine is formally unimodal (dip test $p = 0.17$) with a long left tail.
|
|
The all-CPA signature-level cosine is multimodal ($p < 0.001$), but its structure is not well approximated by a two-component Beta mixture: BIC clearly prefers a three-component fit, and the 2-component Beta crossing and its logit-GMM counterpart disagree sharply on the candidate threshold (0.977 vs. 0.999 for Firm A).
|
|
The BD/McCrary discontinuity test locates its transition at 0.985---*inside* the non-hand-signed mode rather than at a boundary.
|
|
Taken together, these results indicate that non-hand-signed signatures form a continuous quality spectrum rather than a discrete class.
|
|
Replication quality varies continuously with scan equipment, PDF compression, stamp pressure, and firm-level e-signing system generation, producing a heavy-tailed distribution that no two-mechanism mixture explains.
|
|
|
|
At the per-accountant aggregate level the picture reverses.
|
|
The distribution of per-accountant mean cosine (and mean dHash) is unambiguously multimodal.
|
|
A BIC-selected three-component Gaussian mixture cleanly separates (C1) a high-replication cluster dominated by Firm A, (C2) a middle band shared by the other Big-4 firms, and (C3) a hand-signed-tendency cluster dominated by smaller domestic firms.
|
|
The two-component marginal crossings (cosine $= 0.945$, dHash $= 8.10$) are sharp and mutually consistent.
|
|
|
|
The substantive interpretation is simple: *pixel-level output quality* is continuous, but *individual signing behavior* is close to discrete.
|
|
A given CPA tends to be either a consistent user of non-hand-signing or a consistent hand-signer; it is the mixing of these discrete behavioral types at the firm and population levels that produces the quality spectrum observed at the signature level.
|
|
Methodologically, this implies that the three-method convergent framework (KDE antimode / BD-McCrary / Beta mixture) is meaningful at the level where the mixture structure is statistically supported---namely the accountant level---and serves as diagnostic evidence rather than a primary classification boundary at the signature level.
|
|
|
|
## C. Firm A as a Replication-Dominated, Not Pure, Population
|
|
|
|
A recurring theme in prior work that treats Firm A or an analogous reference group as a calibration anchor is the implicit assumption that the anchor is a pure positive class.
|
|
Our evidence across multiple analyses rules out that assumption for Firm A while affirming its utility as a calibration reference.
|
|
|
|
Three convergent strands of evidence support the replication-dominated framing.
|
|
First, the interview evidence itself: Firm A partners report that most certifying partners at the firm use non-hand-signing, without excluding the possibility that a minority continue to hand-sign.
|
|
Second, the statistical evidence: Firm A's per-signature cosine distribution is unimodal long-tail rather than a tight single peak; 92.5% of Firm A signatures exceed cosine 0.95, with the remaining 7.5% forming the left tail.
|
|
Third, the accountant-level evidence: 32 of the 180 Firm A CPAs (18%) fall into the middle-band C2 cluster rather than the high-replication C1 cluster of the accountant GMM---consistent with the interview-acknowledged minority of hand-signers.
|
|
|
|
The replication-dominated framing is internally coherent with all three pieces of evidence, and it predicts and explains the residuals that a "near-universal" framing would be forced to treat as noise.
|
|
We therefore recommend that future work building on this calibration strategy should explicitly distinguish replication-dominated from replication-pure calibration anchors.
|
|
|
|
## D. The Style-Replication Gap
|
|
|
|
Within the 71,656 documents exceeding cosine $0.95$, the dHash descriptor partitions them into three distinct populations: 29,529 (41.2%) with high-confidence structural evidence of non-hand-signing, 36,994 (51.7%) with moderate structural similarity, and 5,133 (7.2%) with no structural corroboration despite near-identical feature-level appearance.
|
|
A cosine-only classifier would treat all 71,656 documents identically; the dual-descriptor framework separates them into populations with fundamentally different interpretations.
|
|
|
|
The 7.2% classified as "high style consistency" (cosine $> 0.95$ but dHash $> 15$) are particularly informative.
|
|
Several plausible explanations may account for their high feature similarity without structural identity, though we lack direct evidence to confirm their relative contributions.
|
|
Many accountants may develop highly consistent signing habits---using similar pen pressure, stroke order, and spatial layout---resulting in signatures that appear nearly identical at the semantic feature level while retaining the microscopic variations inherent to handwriting.
|
|
Some may use signing pads or templates that further constrain variability without constituting image-level reproduction.
|
|
The dual-descriptor framework correctly identifies these cases as distinct from non-hand-signed signatures by detecting the absence of structural-level convergence.
|
|
|
|
## E. Value of a Replication-Dominated Calibration Group
|
|
|
|
The use of Firm A as a calibration reference addresses a fundamental challenge in document forensics: the scarcity of ground truth labels.
|
|
In most forensic applications, establishing ground truth requires expensive manual verification or access to privileged information about document provenance.
|
|
Our approach leverages domain knowledge---the established prevalence of non-hand-signing at a specific firm---to create a naturally occurring reference population within the dataset.
|
|
|
|
This calibration strategy has broader applicability beyond signature analysis.
|
|
Any forensic detection system operating on real-world corpora can benefit from identifying subpopulations with known dominant characteristics (positive or negative) to anchor threshold selection, particularly when the distributions of interest are non-normal and non-parametric or mixture-based thresholds are preferred over parametric alternatives.
|
|
The framing we adopt---replication-dominated rather than replication-pure---is an important refinement of this strategy: it prevents overclaim, accommodates interview-acknowledged heterogeneity, and yields classification rates that are internally consistent with the data.
|
|
|
|
## F. Pixel-Identity as Annotation-Free Ground Truth
|
|
|
|
A further methodological contribution is the use of byte-level pixel identity as an annotation-free gold positive.
|
|
Handwriting physics makes byte-identity impossible under independent signing events, so any pair of same-CPA signatures that are byte-identical after crop and normalization is an absolute positive for non-hand-signing, requiring no human review.
|
|
In our corpus 310 signatures satisfied this condition, and all candidate thresholds (KDE crossover 0.837; accountant-crossing 0.945; canonical 0.95) classify this population perfectly.
|
|
Paired with a conservative low-similarity anchor, this yields EER-style validation without the stratified manual annotation that would otherwise be the default methodological choice.
|
|
We regard this as a reusable pattern for other document-forensics settings in which the target mechanism leaves a byte-level physical signature in the artifact itself.
|
|
|
|
## G. Limitations
|
|
|
|
Several limitations should be acknowledged.
|
|
|
|
First, comprehensive per-document ground truth labels are not available.
|
|
The pixel-identity anchor is a strict *subset* of the true non-hand-signing positives (only those whose nearest same-CPA match happens to be byte-identical), and the low-similarity anchor ($n = 35$) is small because intra-CPA pairs rarely fall below cosine 0.70.
|
|
The reported FAR values should therefore be read as order-of-magnitude estimates rather than tight bounds, and a modest expansion of the negative anchor---for example by sampling inter-CPA pairs under explicit accountant-level blocking---would strengthen future work.
|
|
|
|
Second, the ResNet-50 feature extractor was used with pre-trained ImageNet weights without domain-specific fine-tuning.
|
|
While our ablation study and prior literature [20]--[22] support the effectiveness of transferred ImageNet features for signature comparison, a signature-specific feature extractor could improve discriminative performance.
|
|
|
|
Third, the red stamp removal preprocessing uses simple HSV color-space filtering, which may introduce artifacts where handwritten strokes overlap with red seal impressions.
|
|
In these overlap regions, blended pixels are replaced with white, potentially creating small gaps in the signature strokes that could reduce dHash similarity.
|
|
This effect would bias classification toward false negatives rather than false positives, but the magnitude has not been quantified.
|
|
|
|
Fourth, scanning equipment, PDF generation software, and compression algorithms may have changed over the 10-year study period (2013--2023), potentially affecting similarity measurements.
|
|
While cosine similarity and dHash are designed to be robust to such variations, longitudinal confounds cannot be entirely excluded, and we note that our accountant-level aggregates could mask within-accountant temporal transitions.
|
|
|
|
Fifth, the classification framework treats all signatures from a CPA as belonging to a single class, not accounting for potential changes in signing practice over time (e.g., a CPA who signed genuinely in early years but adopted non-hand-signing later).
|
|
Extending the accountant-level analysis to auditor-year units is a natural next step.
|
|
|
|
Sixth, the BD/McCrary transition estimates fall inside rather than between modes for the per-signature cosine distribution, reflecting that this test is a diagnostic of local density-smoothness violation rather than of two-mechanism separation.
|
|
This result is itself an informative diagnostic---consistent with the dip-test and Beta-mixture evidence that signature-level cosine is not cleanly bimodal---but it means BD/McCrary should be interpreted at the accountant level for threshold-setting purposes.
|
|
|
|
Finally, the legal and regulatory implications of our findings depend on jurisdictional definitions of "signature" and "signing."
|
|
Whether non-hand-signing of a CPA's own stored signature constitutes a violation of signing requirements is a legal question that our technical analysis can inform but cannot resolve.
|