pdf_signature_extraction/paper/paper_a_abstract_v3.md
gbanyan c79329457a Phase 6 manuscript splice (1/2): Abstract / §I / §II / §III spliced
Splices v4 drafts into v3.20.0 master sub-files. Drops the
"paper/v4/" working drafts and lands the v4.0 content in the master
file structure. Internal draft notes / close-out checklists / open-
questions blocks stripped at splice (per round-1 through round-6
deferral).

Abstract (paper_a_abstract_v3.md):
- Replaced v3.20.0 abstract (240w) with v4.0 abstract (247w).

§I Introduction (paper_a_introduction_v3.md):
- Replaced v3.20.0 §I with v4.0 §I (16 paragraphs + 8-item
  contributions list).

§II Related Work (paper_a_related_work_v3.md):
- Inserted v4.0 LOOO addition paragraph after the existing
  finite-mixture paragraph; added refs [42]-[44] to the
  internal reference annotation list.

§III Methodology (paper_a_methodology_v3.md):
- §III-A..F (Pipeline / Data / Page ID / Detection / Features /
  Dual Descriptors): kept v3.20.0 content unchanged.
- §III-G..M: replaced v3.20.0 §III-G..K with v4.0 §III-G..M
  (Unit & Scope / Reference Populations / Distributional
  Diagnostics + composition decomposition / K=3 descriptive /
  Convergent internal-consistency / Anchor-based ICCR L.0-L.7 /
  Validation strategy + Table XXVII ten-tool collection).
- §III-N Data Source & Anonymization: kept v3.20.0 §III-L content,
  renumbered to §III-N (after v4 §III-M).
- §III-E ablation cross-reference: updated "§IV-I" -> "§IV-L" to
  match the renumbered §IV.
- §III-F pixel-identity cross-reference: updated "§III-J" ->
  "§III-K".

Gemini round-2 artifact paper/gemini_review_v4_round2.md also
added (was uncommitted from the parallel-review batch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:35:53 +08:00

# Abstract
<!-- IEEE Access target: <= 250 words, single paragraph -->
Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization enables reuse of a stored signature image across reports (through administrative stamping or firm-level electronic signing), undermining individualized attestation. We build an end-to-end pipeline that detects such *non-hand-signed* signatures at scale: a Vision-Language Model identifies signature pages, YOLOv11 localizes signatures, ResNet-50 supplies deep features, and a dual-descriptor layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate *style consistency* from *image reproduction*. Applied to 90,282 Taiwan audit reports (2013–2023), the pipeline yields 182,328 signatures from 758 CPAs; primary analyses are scoped to the Big-4 sub-corpus (437 CPAs; 150,442 signatures). Distributional diagnostics show that the apparent multimodality of the descriptor distribution dissolves under joint firm-mean centering and integer-tie jitter ($p$ rises to $0.35$), so no within-population bimodal antimode anchors the operational thresholds. We instead adopt an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units: per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalized per-signature ($0.11$ under the deployed any-pair high-confidence rule), and per-document ($0.34$ for the operational HC+MC alarm). Firm heterogeneity is decisive: Firm A's per-document HC+MC alarm rate is $0.62$ versus $0.09$–$0.16$ at Firms B/C/D after pool-size adjustment, and under the deployed any-pair rule $77$–$99\%$ of inter-CPA collisions concentrate within the source firm, consistent with firm-level template-like reuse. We position the system as a specificity-proxy-anchored screening framework with human-in-the-loop review, not as a validated forensic detector; no calibrated error rates are reportable without signature-level ground truth.
<!-- Word count: 246 -->
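The dual-descriptor comparison and the per-comparison ICCR summarized in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation. All function names are invented. It assumes that signature crops are already downscaled to the 8-row by 9-column grayscale grid used by a standard difference hash and that feature vectors (e.g., pooled ResNet-50 embeddings) are non-zero. It also reads "independent-minimum" as taking the best same-CPA cosine and the smallest same-CPA dHash distance separately, so the two may come from different partner signatures; this reading is an assumption. Only the thresholds (cosine $>0.95$, dHash $\leq 5$) come from the abstract.

```python
"""Hypothetical sketch: dual descriptors (cosine + dHash) and a per-comparison
inter-CPA coincidence rate. Names are illustrative; thresholds are from the abstract."""
from itertools import combinations
from math import sqrt

COS_THRESHOLD = 0.95  # a pair coincides on cosine if cos > 0.95
DHASH_THRESHOLD = 5   # a pair coincides on dHash if Hamming distance <= 5


def dhash(gray_rows):
    """64-bit difference hash of an image already downscaled to 8 rows x 9 columns.

    Each bit records whether a pixel is brighter than its right neighbour.
    Conventions differ on the comparison direction; Hamming distances between
    hashes computed the same way are unaffected. Resizing is assumed upstream.
    """
    if len(gray_rows) != 8 or any(len(row) != 9 for row in gray_rows):
        raise ValueError("expected an 8 x 9 grayscale grid")
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def cosine(u, v):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def best_same_cpa_match(index, signatures):
    """Max cosine and min dHash distance against the same CPA's other signatures.

    The two extremes are taken independently (assumed reading of
    "independent-minimum"), so they may come from different partners.
    Returns None when the CPA has no other signature.
    """
    cpa, feat, h = signatures[index]
    peers = [s for j, s in enumerate(signatures) if j != index and s[0] == cpa]
    if not peers:
        return None
    max_cos = max(cosine(feat, f) for _, f, _ in peers)
    min_dh = min(hamming(h, ph) for _, _, ph in peers)
    return max_cos, min_dh


def per_comparison_iccr(signatures):
    """Marginal coincidence rates over inter-CPA pairs.

    signatures: list of (cpa_id, feature_vector, dhash_int) tuples.
    Returns (cosine rate, dHash rate, joint rate).
    """
    n_pairs = n_cos = n_hash = n_joint = 0
    for (cpa_a, feat_a, hash_a), (cpa_b, feat_b, hash_b) in combinations(signatures, 2):
        if cpa_a == cpa_b:
            continue  # same-CPA pairs are excluded from the inter-CPA rate
        n_pairs += 1
        hit_cos = cosine(feat_a, feat_b) > COS_THRESHOLD
        hit_hash = hamming(hash_a, hash_b) <= DHASH_THRESHOLD
        n_cos += int(hit_cos)
        n_hash += int(hit_hash)
        n_joint += int(hit_cos and hit_hash)
    if n_pairs == 0:
        return 0.0, 0.0, 0.0
    return n_cos / n_pairs, n_hash / n_pairs, n_joint / n_pairs
```

The exhaustive pair loop is quadratic, so at the scale of the Big-4 sub-corpus a practical version would likely rely on blocking or vectorized nearest-neighbour search. The sketch only shows what is being counted, not how to count it efficiently.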