pdf_signature_extraction/paper/paper_a_conclusion_v3.md

# VI. Conclusion and Future Work

## Conclusion

We have presented an end-to-end AI pipeline for detecting non-hand-signed auditor signatures in financial audit reports at scale.
Applied to 90,282 audit reports from Taiwanese publicly listed companies spanning 2013--2023, our system extracted and analyzed 182,328 CPA signatures using a combination of VLM-based page identification, YOLO-based signature detection, deep feature extraction, and dual-descriptor similarity verification, with the operational classifier's cosine cut anchored on a whole-sample Firm A percentile heuristic and the per-signature similarity distribution characterised through two threshold estimators and a density-smoothness diagnostic.

The seven numbered contributions listed in Section I can be grouped into four broader methodological themes, summarized below.

First, we argued that non-hand-signing detection is a distinct problem from signature forgery detection, requiring analytical tools focused on the upper tail of intra-signer similarity rather than inter-signer discriminability.

Second, we showed that combining cosine similarity of deep embeddings with difference hashing is essential for meaningful classification---among 71,656 documents with high feature-level similarity, the dual-descriptor framework revealed that only 41% exhibit converging structural evidence of non-hand-signing while 7% show no structural corroboration despite near-identical feature-level appearance, demonstrating that a single-descriptor approach conflates style consistency with image reproduction.

Third, we characterised the per-signature similarity distribution using three diagnostics---a Hartigan dip test, an EM-fitted Beta mixture (with logit-Gaussian robustness check), and a Burgstahler-Dichev / McCrary density-smoothness procedure---and showed that no two-mechanism mixture cleanly explains it: the dip test fails to reject unimodality for Firm A ($p = 0.17$), BIC strongly prefers a 3-component over a 2-component Beta fit ($\Delta\text{BIC} = 381$ for Firm A), and the BD/McCrary candidate transition lies inside the non-hand-signed mode rather than between modes (and is not bin-width-stable; Appendix A).
The substantive reading is that *pixel-level output quality* is a continuous spectrum produced by firm-specific reproduction technologies (administrative stamping in early years, firm-level e-signing later) and scan conditions, rather than a discrete class cleanly separated from hand-signing.
This reading motivates anchoring the operational classifier's cosine cut on a whole-sample Firm A P7.5 percentile heuristic (cos $> 0.95$) rather than on a mixture-fit crossing.

Fourth, we introduced a *replication-dominated* calibration methodology---explicitly distinguishing replication-dominated from replication-pure calibration anchors and validating classification against a byte-level pixel-identity anchor (310 byte-identical signatures) paired with a $\sim$50,000-pair inter-CPA negative anchor.
To document the within-firm sampling variance of using the calibration firm as its own validation reference, we split the firm's CPAs 70/30 at the CPA level and report capture rates on both folds with Wilson 95% confidence intervals; extreme rules agree across folds while rules in the operational 85--95% capture band differ by 1--5 percentage points, reflecting within-firm heterogeneity in replication intensity rather than generalization failure.
This framing is internally consistent with the available evidence: the byte-level pair analysis finding of 145 pixel-identical calibration-firm signatures across 50 distinct partners of 180 registered (Section IV-F.1); the 92.5% / 7.5% split in signature-level cosine thresholds and the dip-test-confirmed unimodal-long-tail shape of Firm A's per-signature cosine distribution (Section IV-D.1); and the 95.9% top-decile concentration of Firm A auditor-years in the threshold-independent partner-ranking analysis (Section IV-G.2).

An ablation study comparing ResNet-50, VGG-16 and EfficientNet-B0 confirmed that ResNet-50 offers the best balance of discriminative power, classification stability, and computational efficiency for this task.

## Future Work

Several directions merit further investigation.
Domain-adapted feature extractors, trained or fine-tuned on signature-specific datasets, may improve discriminative performance beyond the transferred ImageNet features used in this study.
Extending the analysis to auditor-year units---computing per-signature statistics within each fiscal year and tracking how individual CPAs move across years---could reveal within-CPA transitions between hand-signing and non-hand-signing over the decade and is the natural next step beyond the cross-sectional analysis reported here.
The pipeline's applicability to other jurisdictions and document types (e.g., corporate filings in other countries, legal documents, medical records) warrants exploration.
The replication-dominated calibration strategy and the pixel-identity anchor technique are both generalizable to settings in which (i) a reference subpopulation has a known dominant mechanism and (ii) the target mechanism leaves a byte-level signature in the artifact itself, conditional on the availability of analogous anchors in the new domain and on artifact-generation physics that preserve the byte-level trace.
Finally, integration with regulatory monitoring systems and a larger negative-anchor study---for example drawing from inter-CPA pairs under explicit accountant-level blocking---would strengthen the practical deployment potential of this approach.