pdf_signature_extraction/paper/paper_a_conclusion_v3.md
gbanyan 9b11f03548 Paper A v3: full rewrite for IEEE Access with three-method convergence
Major changes from v2:

Terminology:
- "digitally replicated" -> "non-hand-signed" throughout (per partner v3
  feedback and to avoid implicit accusation)
- "Firm A near-universal non-hand-signing" -> "replication-dominated"
  (per interview nuance: most but not all Firm A partners use replication)

Target journal: IEEE TAI -> IEEE Access (per NCKU CSIE list)

New methodological sections (III.G-III.L + IV.D-IV.G):
- Three convergent threshold methods (KDE antimode + Hartigan dip test /
  Burgstahler-Dichev McCrary / EM-fitted Beta mixture + logit-GMM
  robustness check)
- Explicit unit-of-analysis discussion (signature vs accountant)
- Accountant-level 2D Gaussian mixture (BIC-best K=3 found empirically)
- Pixel-identity validation anchor (no manual annotation needed)
- Low-similarity negative anchor + Firm A replication-dominated anchor

New empirical findings integrated:
- Firm A signature cosine UNIMODAL (dip p=0.17) - long left tail = minority
  hand-signers
- Full-sample cosine MULTIMODAL but not cleanly bimodal (BIC prefers 3-comp
  mixture) - signature-level is continuous quality spectrum
- Accountant-level mixture trimodal (C1 Deloitte-heavy 139/141,
  C2 other Big-4, C3 smaller firms). 2-comp crossings cos=0.945, dh=8.10
- Pixel-identity anchor (310 pairs) gives perfect recall at all cosine
  thresholds
- Firm A anchor rates: cos>0.95=92.5%, dual-rule cos>0.95 AND dh<=8=89.95%

New discussion section V.B: "Continuous-quality spectrum vs discrete-
behavior regimes" - the core interpretive contribution of v3.

References added: Hartigan & Hartigan 1985, Burgstahler & Dichev 1997,
McCrary 2008, Dempster-Laird-Rubin 1977, White 1982 (refs 37-41).

export_v3.py builds Paper_A_IEEE_Access_Draft_v3.docx (462 KB, +40% vs v2
from expanded methodology + results sections).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 00:14:47 +08:00


# VI. Conclusion and Future Work
## Conclusion
We have presented an end-to-end AI pipeline for detecting non-hand-signed auditor signatures in financial audit reports at scale.
Applied to 90,282 audit reports from Taiwanese publicly listed companies spanning 2013--2023, our system extracted and analyzed 182,328 CPA signatures using a combination of VLM-based page identification, YOLO-based signature detection, deep feature extraction, and dual-descriptor similarity verification.
Threshold selection was placed on a statistically principled footing through three independent methods applied at two analysis levels.
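The stages summarized above can be sketched as follows (pseudocode; all function names are illustrative placeholders, not the pipeline's actual API):

```
# Pseudocode sketch of the end-to-end pipeline (hypothetical names)
for report in audit_reports:                   # 90,282 PDF reports
    pages = locate_signature_pages(report)     # VLM-based page identification
    boxes = detect_signatures(pages)           # YOLO-based signature detection
    for crop in boxes:                         # 182,328 CPA signatures total
        emb = resnet50_embedding(crop)         # deep feature extraction
        dh  = difference_hash(crop)            # structural descriptor
        store(report, crop, emb, dh)           # for pairwise similarity analysis
```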
Our contributions are fourfold.
First, we argued that non-hand-signing detection is a distinct problem from signature forgery detection, requiring analytical tools focused on the upper tail of intra-signer similarity rather than inter-signer discriminability.
Second, we showed that combining cosine similarity of deep embeddings with difference hashing is essential for meaningful classification.
Among 71,656 documents with high feature-level similarity, the dual-descriptor framework revealed that only 41% exhibit converging structural evidence of non-hand-signing, while 7% show no structural corroboration despite near-identical feature-level appearance; a single-descriptor approach therefore conflates style consistency with image reproduction.
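A minimal sketch of the dual-descriptor comparison (using numpy; the 0.95 / 8 cut-offs echo the thresholds reported in the results, while the helper names are ours):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two deep-feature embeddings."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dhash_bits(gray):
    """Difference hash: compare horizontally adjacent pixels of a small
    grayscale crop (expected shape (n, n+1)); returns a flat bit array."""
    gray = np.asarray(gray, float)
    return (gray[:, 1:] > gray[:, :-1]).ravel()

def hamming(bits1, bits2):
    """Hamming distance between two dHash bit arrays."""
    return int(np.count_nonzero(bits1 != bits2))

def dual_rule(cos_sim, dh_dist, cos_thr=0.95, dh_thr=8):
    """Flag a pair as non-hand-signed only when BOTH descriptors agree."""
    return cos_sim > cos_thr and dh_dist <= dh_thr
```

Requiring both descriptors to agree is what separates style consistency (high cosine, large Hamming distance) from image reproduction (high cosine, small Hamming distance).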
Third, we introduced a three-method convergent threshold framework combining a KDE antimode (with the Hartigan dip test as a formal bimodality check), Burgstahler-Dichev / McCrary density-discontinuity analysis, and an EM-fitted Beta mixture (with a logit-Gaussian robustness check).
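The first method can be sketched as follows (bandwidth and grid size are illustrative choices, not the tuned values used in the paper):

```python
import numpy as np

def kde_antimode(samples, bandwidth=0.02, gridsize=512):
    """Gaussian kernel density estimate on a grid; returns the deepest
    interior local minimum (antimode), a natural threshold candidate
    between two modes.  Returns None if the KDE has no interior minimum,
    i.e., the distribution looks unimodal at this bandwidth."""
    x = np.asarray(samples, float)
    grid = np.linspace(x.min(), x.max(), gridsize)
    z = (grid[:, None] - x[None, :]) / bandwidth
    dens = np.exp(-0.5 * z**2).sum(axis=1) / (len(x) * bandwidth * np.sqrt(2 * np.pi))
    interior = [i for i in range(1, gridsize - 1)
                if dens[i] < dens[i - 1] and dens[i] < dens[i + 1]]
    if not interior:
        return None
    return float(grid[min(interior, key=lambda i: dens[i])])
```

The dip test then checks whether such an antimode is statistically real; an insignificant dip statistic (as for the Firm A subsample, p = 0.17) warns that any apparent antimode may be spurious.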
Applied at both the signature and accountant levels, this framework surfaced an informative structural asymmetry: at the per-signature level the distribution is a continuous quality spectrum for which no two-mechanism mixture provides a good fit, whereas at the per-accountant level BIC cleanly selects a three-component mixture whose two-component marginal crossings (cosine $= 0.945$, dHash $= 8.10$) are sharp and mutually consistent.
The substantive reading is that *pixel-level output quality* is continuous while *individual signing behavior* is close to discrete.
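The model-selection step behind this asymmetry can be illustrated with a minimal 1D Gaussian-mixture EM and a BIC comparison (synthetic data; the paper's accountant-level model is a 2D mixture, for which standard tooling would normally be used):

```python
import numpy as np

def gmm1d_bic(x, k, iters=300):
    """Fit a k-component 1D Gaussian mixture by EM (deterministic
    quantile initialization) and return its BIC; lower is better."""
    x = np.asarray(x, float)
    n = len(x)
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)

    def log_joint():
        # (n, k) matrix of log[ w_k * N(x_i | mu_k, var_k) ]
        return (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)

    for _ in range(iters):
        lj = log_joint()
        m = lj.max(axis=1, keepdims=True)
        r = np.exp(lj - m)
        r /= r.sum(axis=1, keepdims=True)     # E-step: responsibilities
        nk = r.sum(axis=0)                    # M-step: reweight parameters
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9

    lj = log_joint()
    m = lj.max(axis=1)
    ll = (m + np.log(np.exp(lj - m[:, None]).sum(axis=1))).sum()
    n_params = 3 * k - 1                      # k means, k variances, k-1 free weights
    return n_params * np.log(n) - 2.0 * ll
```

In the paper's setting, the same BIC comparison cleanly selects three components at the accountant level, while no small-K mixture fits the signature-level distribution, which is the asymmetry described above.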
Fourth, we introduced a *replication-dominated* calibration methodology---explicitly distinguishing replication-dominated from replication-pure calibration anchors and validating classification against a byte-level pixel-identity anchor that requires no manual annotation.
This framing is internally consistent with all available evidence: interview reports that the calibration firm uses non-hand-signing for most but not all partners; the 92.5% / 7.5% split at the signature-level cosine threshold; and the 139/32 split of the calibration firm's 180 CPAs across the accountant-level mixture's high-replication and middle-band clusters.
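The pixel-identity anchor reduces to grouping signature crops by a hash of their raw pixel bytes; any group of size greater than one yields byte-identical pairs that must be replications (a sketch using stdlib hashing; function and variable names are ours):

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def pixel_identity_pairs(crops):
    """crops: iterable of (signature_id, raw_pixel_bytes) tuples.
    Returns pairs of byte-identical signature crops, which serve as
    ground-truth positives without any manual annotation."""
    groups = defaultdict(list)
    for sig_id, pixels in crops:
        groups[hashlib.sha256(pixels).hexdigest()].append(sig_id)
    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(ids, 2))    # all pairs within a byte-identical group
    return pairs
```

Because byte identity implies maximal descriptor similarity, such pairs should be recalled perfectly at any cosine threshold, which is exactly what the 310-pair anchor confirms.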
An ablation study comparing ResNet-50, VGG-16, and EfficientNet-B0 confirmed that ResNet-50 offers the best balance of discriminative power, classification stability, and computational efficiency for this task.
## Future Work
Several directions merit further investigation.
Domain-adapted feature extractors, trained or fine-tuned on signature-specific datasets, may improve discriminative performance beyond the transferred ImageNet features used in this study.
Extending the accountant-level analysis to auditor-year units---using the same three-method convergent framework but at finer temporal resolution---could reveal within-accountant transitions between hand-signing and non-hand-signing over the decade.
The pipeline's applicability to other jurisdictions and document types (e.g., corporate filings in other countries, legal documents, medical records) warrants exploration.
The replication-dominated calibration strategy and the pixel-identity anchor technique are both directly generalizable to settings in which (i) a reference subpopulation has a known dominant mechanism and (ii) the target mechanism leaves a byte-level fingerprint in the artifact itself.
Finally, integration with regulatory monitoring systems and a larger negative-anchor study---for example drawing from inter-CPA pairs under explicit accountant-level blocking---would strengthen the practical deployment potential of this approach.