Files
pdf_signature_extraction/paper/paper_a_conclusion_v3.md
T
gbanyan 12637cd413 Phase 6 manuscript splice (2/2): §IV / §V / §VI spliced
Lands v4.0 §IV / §V / §VI content into v3.20.0 master sub-files.
Strips internal close-out checklists, draft notes, and open-questions
blocks at splice. Completes the Phase 6 manuscript-master file
assembly.

§IV Results (paper_a_results_v3.md):
- §IV-A..C: kept v3.20.0 inherited content (experimental setup,
  detection performance, all-pairs distribution); added v4 scope
  note (Big-4 primary) at the §IV header
- §IV-D..K: replaced v3.20.0 §IV-D..H with v4.0 §IV-D..K (Big-4
  distributional / mixture / convergence / LOOO / pixel-identity /
  inter-CPA reference / five-way classification / full-dataset
  robustness)
- §IV-L: renumbered v3.20.0 §IV-I (backbone ablation) content to
  match v4's "§IV-L inherited from v3.20.0 §IV-I" reframing
- §IV-M: appended v4.0 ICCR calibration tables (XX-XXVI):
  composition decomposition, per-comparison/per-signature/
  per-document ICCRs, firm heterogeneity + cross-firm hit matrix,
  alert-rate sensitivity
- §III-K ablation cross-ref updated to §IV-L (was §IV-I)
- Phase 3 close-out checklist (lines 365+) stripped

§V Discussion (paper_a_discussion_v3.md):
- Replaced v3.20.0 §V with v4.0 §V (8 sub-sections A-H):
  A. Distinct problem framing
  B. Continuous quality spectrum + composition-driven multimodality
  C. Firm A as templated end (case study, not anchor)
  D. K=2 / K=3 descriptive partitions
  E. Three-score convergent internal-consistency
  F. Anchor-based multi-level calibration
  G. Pixel-identity hard positive anchor + ICCR reframing
  H. Limitations (14 items: 9 v4-specific + 5 inherited from v3.x)

§VI Conclusion (paper_a_conclusion_v3.md):
- Replaced v3.20.0 §VI with v4.0 §VI (8 contribution items mirroring
  §I contributions; 4-direction future work).

Known splice-time issue (deferred to typesetting): §IV table numbering
is sequential by label (V, VI, ..., XXVI) but Table XIX (document-level
worst-case) appears physically before Tables XVI/XVII/XVIII in §IV-J
narrative flow. IEEE Access typesetters typically normalize table order
during typesetting; we accept the in-file ordering quirk to preserve
the §IV-J narrative arc (per-signature -> document-level worst-case ->
K=3 cross-tab). Renumbering to strictly-ascending physical order would
require renaming Tables XVI/XVII/XVIII -> XVII/XVIII/XIX with
downstream cross-reference updates; deferred unless partner Jimmy
review or IEEE Access submission portal flags it.

Manuscript splice complete. Working drafts in paper/v4/ retained as
archive of the round-by-round Phase 5 fix history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:43:41 +08:00

8 lines
4.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# VI. Conclusion and Future Work
We present a fully automated pipeline for detecting non-hand-signed CPA signatures in Taiwan-listed financial audit reports and a multi-tool framework for characterising and disclosing its operational behaviour at the Big-4 sub-corpus scope. The pipeline processes raw PDFs through VLM-based page identification, YOLO-based signature detection, ResNet-50 feature extraction, and dual-descriptor (cosine + independent-minimum dHash) similarity computation. The operational output is an inherited Paper A five-way per-signature classifier with worst-case document-level aggregation (§III-L). Applied to 90,282 audit reports filed between 2013 and 2023, the pipeline extracts 182,328 signatures from 758 CPAs, with the Big-4 sub-corpus (437 CPAs at accountant level; 150,442150,453 signatures at signature level) as the primary analytical population.
Our central methodological contributions are: (1) a composition decomposition (Scripts 39b39e) that establishes the absence of a within-population bimodal antimode in the Big-4 descriptor distribution: the apparent multimodality dissolves under joint firm-mean centring and integer-tie jitter ($p_{\text{median}} = 0.35$), so distributional "natural-threshold" framings of the inherited operating points are not empirically supported; (2) an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units of analysis — per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ for the deployed any-pair HC rule), and per-document ($0.34$ for the operational HC$+$MC alarm) — with explicit terminological replacement of "FAR" by "ICCR" given the unsupervised setting; (3) firm heterogeneity quantification: logistic regression with pool-size adjustment gives odds ratios $0.053$, $0.010$, $0.027$ for Firms B/C/D relative to Firm A reference, indicating a large multiplicative effect that pool-size differences do not explain; (4) cross-firm hit matrix evidence that under the deployed any-pair rule, within-firm collision concentration is $98.8\%$ at Firm A and $76.7$$83.7\%$ at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$$99.96\%$ within-firm across all four firms), consistent with firm-specific template, stamp, or document-production reuse mechanisms; (5) K=3 mixture demoted from "three mechanism clusters" to a descriptive firm-compositional partition; (6) three feature-derived scores converging on the per-CPA descriptor-position ranking at Spearman $\rho \geq 0.879$, reported as internal consistency rather than external validation; (7) $0\%$ positive-anchor miss rate on 262 byte-identical Big-4 signatures with the conservative-subset caveat; and (8) a ten-tool unsupervised-validation collection (§III-M Table XXVII) that explicitly discloses each tool's untested assumption and positions the system as an anchor-calibrated screening framework with human-in-the-loop review, not as a validated forensic detector.
Future work falls in four directions. *First*, a small-scale human-rated validation set would enable direct ROC optimisation and provide signature-level ground truth that v4.0 fundamentally lacks; without such ground truth, no true error rates can be reported. *Second*, the within-firm collision concentration documented in §III-L.4 (any-pair $76.7$$98.8\%$ across Big-4; same-pair joint $97.0$$99.96\%$) invites a separate study to distinguish deliberate template sharing from passive firm-level production artefacts (shared scanners, common form templates, identical report-generation infrastructure) — a question the inter-CPA-anchor analysis alone cannot resolve. *Third*, the descriptive Firm A versus Firms B/C/D contrast (per-document HC$+$MC alarm $0.62$ vs $0.09$$0.16$) — together with v3.x's byte-level evidence of 145 pixel-identical signatures across $\sim 50$ distinct Firm A partners — invites a companion analysis examining whether such firm-level signing patterns correlate with established audit-quality measures. *Fourth*, generalisation to mid- and small-firm contexts requires extending the anchor-based ICCR framework to scopes where firm-level LOOO folds are not available; the §III-I.4 composition diagnostics already document that the absence of within-population bimodality is corpus-universal, so the v4.0 calibration approach in principle generalises, but a full extension with cluster-robust uncertainty quantification is left as future work.