Files
pdf_signature_extraction/paper/paper_a_conclusion_v3.md
T
gbanyan d3b63fc0b7 Paper A v3.14: remove A2 assumption + soften all partner-level claims
The within-auditor-year uniformity assumption (A2) introduced in v3.11
Section III-G was empirically tested via a new within-year uniformity
check (signature_analysis/27_within_year_uniformity.py; output in
reports/within_year_uniformity/). The check found that within-year
pairwise cosine distributions even at the calibration firm show
substantial heterogeneity inconsistent with strict single-mechanism
uniformity (Firm A 2023 CPAs typically have median pairwise cosine
around 0.85 with 20-70% of pairs below the all-pairs KDE crossover
0.837). A2 as stated ("a CPA who replicates any signature image in
that year is treated as doing so for every report") is therefore
falsified empirically.

Three explanations are compatible with the data and cannot be
disambiguated without manual inspection: (i) true within-year
mechanism mixing, (ii) multi-template replication workflows at the
same firm within a year, (iii) feature-extraction noise on repeatedly
scanned stamped images. Since A2 is falsified and its implications
cannot be restored under any of the three explanations, we remove
A2 entirely rather than downgrading it to an "approximation" or
"interpretive convention."

Changes applied:

1. Methodology Section III-G: A2 block deleted. Section now has only
   A1 (pair-detectability, cross-year pair-existence). Replaced A2
   with an explicit statement that we make no within-year or
   across-year uniformity assumption, that per-signature labels are
   signature-level quantities throughout, and that we abstain from
   partner-level frequency inferences. Three candidate explanations
   for within-year signature heterogeneity are listed (single-template
   replication, multi-template replication in parallel, within-year
   mixing, or combinations) without attempting disaggregation.

2. Methodology III-H strand 2 (L154) softened: "7.5% form a long left
   tail consistent with a minority of hand-signers" rewritten as
   reflecting "within-firm heterogeneity in signing output (we do not
   disaggregate partner-level mechanism here; see Section III-G)."

3. Methodology III-H visual-inspection strand (L152) and the
   corresponding Discussion V-C first strand (L41) and Conclusion L21
   softened: "for the majority of partners" changed to "for many of
   the sampled partners" (Codex round-14 MAJOR: "majority of partners"
   is itself a partner-level frequency claim under the new scope-of-
   claims regime).

4. Methodology III-K.3 Firm A anchor (L247): dropped "(consistent
   with a minority of hand-signers)" parenthetical.

5. Results IV-D cosine distribution narrative (L72): softened to
   "within-firm heterogeneity in signing outputs (see Section IV-E
   and Section III-G for the scope of partner-level claims)."

6. Results IV-E cluster split framing (L128): "minority-hand-signers
   framing of Section III-H" renamed to "within-firm heterogeneity
   framing of Section III-H" (matches the new III-H text).

7. Results IV-H.1 partner-level reading (L286): removed entirely.
   The v3.13 text "Under the within-year label-uniformity convention
   A2, this left-tail share is read as a partner-level minority of
   hand-signing CPAs" is replaced by a signature-level statement
   that explicitly lists hand-signing partners, multi-template
   replication, or a combination as possibilities without attempting
   attribution.

8. Results IV-H.1 stability argument (L308): softened from "persistent
   minority of hand-signing Firm A partners" to "persistent within-
   firm heterogeneity component," preserving the substantive argument
   that stability across production technologies is inconsistent with
   a noise-only explanation.

9. Results IV-I Firm A Capture Profile (L407): rewrote the "Firm A's
   minority hand-signers have not been captured" phrasing as a
   signature-level framing about the 7.5% left tail not projecting
   into the lowest-cosine document-level category under the dual-
   descriptor rules.

10. Abstract (L5): softened "alongside within-firm heterogeneity
    consistent with a minority of hand-signers" to "alongside residual
    within-firm heterogeneity." Abstract at 244/250 words.

11. Discussion V-C third strand (L43): added "multi-template
    replication workflows" to the list of possibilities and added
    a local "we do not disaggregate these mechanisms; see Section
    III-G for the scope of claims" disclaimer (Codex round-14 MINOR 5).

12. Discussion Limitations: added an Eighth limitation explicitly
    stating that partner-level frequency inferences are not made and
    why (no within-year uniformity assumption is adopted).

13. Methodology L124 opening: "We make one stipulation about within-
    auditor-year structure" fixed to "same-CPA pair detectability,"
    since A1 is a cross-year pair-existence property, not a within-
    year claim (Codex round-14 MINOR 3).

14. Two broken cross-references fixed (Codex round-14 MINOR 6):
    methodology L86 Section V-D -> V-G (Limitations is G, not D which
    is Style-Replication Gap); methodology L167 Section III-I ->
    Section IV-D (the empirical cosine distribution is in IV-D, not
    III-I).

Script 27 and its output (reports/within_year_uniformity/*) remain
in the repository as internal due-diligence evidence but are not
cited from the paper. The paper's substantive claims at signature-
level and accountant (cross-year pooled) level are unchanged; only
the partner-level interpretive overlay is removed. All tables
(IV-XVIII), Appendix A (BD/McCrary sensitivity), and all reported
numbers are unchanged.

Codex round-14 (gpt-5.5 xhigh) verification: Major Revision caused
by one BLOCKER (stale DOCX artifact, not part of this commit) plus
one MAJOR ("majority of partners" partner-frequency claim) plus
four MINOR findings. All five markdown findings addressed in this
commit. DOCX regeneration deferred to pre-submission packaging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 22:06:22 +08:00

5.7 KiB

VI. Conclusion and Future Work

Conclusion

We have presented an end-to-end AI pipeline for detecting non-hand-signed auditor signatures in financial audit reports at scale. Applied to 90,282 audit reports from Taiwanese publicly listed companies spanning 2013--2023, our system extracted and analyzed 182,328 CPA signatures using a combination of VLM-based page identification, YOLO-based signature detection, deep feature extraction, and dual-descriptor similarity verification, with threshold selection placed on a statistically principled footing through two methodologically distinct threshold estimators and a density-smoothness diagnostic applied at two analysis levels.

The seven numbered contributions listed in Section I can be grouped into four broader methodological themes, summarized below.

First, we argued that non-hand-signing detection is a distinct problem from signature forgery detection, requiring analytical tools focused on the upper tail of intra-signer similarity rather than inter-signer discriminability.

Second, we showed that combining cosine similarity of deep embeddings with difference hashing is essential for meaningful classification---among 71,656 documents with high feature-level similarity, the dual-descriptor framework revealed that only 41% exhibit converging structural evidence of non-hand-signing while 7% show no structural corroboration despite near-identical feature-level appearance, demonstrating that a single-descriptor approach conflates style consistency with image reproduction.

Third, we introduced a convergent threshold framework combining two methodologically distinct estimators---KDE antimode (with a Hartigan unimodality test) and an EM-fitted Beta mixture (with a logit-Gaussian robustness check)---together with a Burgstahler-Dichev / McCrary density-smoothness diagnostic. Applied at both the signature and accountant levels, this framework surfaced an informative structural asymmetry: at the per-signature level the distribution is a continuous quality spectrum for which no two-mechanism mixture provides a good fit, whereas at the per-accountant level BIC cleanly selects a three-component mixture and the KDE antimode together with the Beta-mixture and logit-Gaussian estimators agree within \sim 0.006 at cosine \approx 0.975. The Burgstahler-Dichev / McCrary test, by contrast, is largely null at the accountant level (no significant transition at two of three cosine bin widths and two of three dHash bin widths, with the one cosine transition sitting on the upper edge of the convergence band; Appendix A); at N = 686 accountants the test has limited power and cannot affirmatively establish smoothness, but its largely-null pattern is consistent with the smoothly-mixed cluster boundaries implied by the accountant-level GMM. The substantive reading is therefore narrower than "discrete behavior": pixel-level output quality is continuous and heavy-tailed, and accountant-level aggregate behavior is clustered into three recognizable groups whose inter-cluster boundaries are gradual rather than sharp.

Fourth, we introduced a replication-dominated calibration methodology---explicitly distinguishing replication-dominated from replication-pure calibration anchors and validating classification against a byte-level pixel-identity anchor (310 byte-identical signatures) paired with a $\sim$50,000-pair inter-CPA negative anchor. To document the within-firm sampling variance of using the calibration firm as its own validation reference, we split the firm's CPAs 70/30 at the CPA level and report capture rates on both folds with Wilson 95% confidence intervals; extreme rules agree across folds while rules in the operational 85-95% capture band differ by 1-5 percentage points, reflecting within-firm heterogeneity in replication intensity rather than generalization failure. This framing is internally consistent with all available evidence: the visual-inspection observation of pixel-identical signatures across unrelated audit engagements for many of the sampled calibration-firm partners; the 92.5% / 7.5% split in signature-level cosine thresholds; and, among the 171 calibration-firm CPAs with enough signatures to enter the accountant-level GMM (of 180 registered CPAs; 178 after excluding two with disambiguation ties, Section IV-G.2), the 139 / 32 split between the high-replication and middle-band clusters.

An ablation study comparing ResNet-50, VGG-16 and EfficientNet-B0 confirmed that ResNet-50 offers the best balance of discriminative power, classification stability, and computational efficiency for this task.

Future Work

Several directions merit further investigation. Domain-adapted feature extractors, trained or fine-tuned on signature-specific datasets, may improve discriminative performance beyond the transferred ImageNet features used in this study. Extending the accountant-level analysis to auditor-year units---using the same convergent threshold framework at finer temporal resolution---could reveal within-accountant transitions between hand-signing and non-hand-signing over the decade. The pipeline's applicability to other jurisdictions and document types (e.g., corporate filings in other countries, legal documents, medical records) warrants exploration. The replication-dominated calibration strategy and the pixel-identity anchor technique are both directly generalizable to settings in which (i) a reference subpopulation has a known dominant mechanism and (ii) the target mechanism leaves a byte-level signature in the artifact itself. Finally, integration with regulatory monitoring systems and a larger negative-anchor study---for example drawing from inter-CPA pairs under explicit accountant-level blocking---would strengthen the practical deployment potential of this approach.