# VI. Conclusion and Future Work
We present a fully automated pipeline for detecting non-hand-signed CPA signatures in Taiwan-listed financial audit reports and a multi-tool framework for characterising and disclosing its operational behaviour at the Big-4 sub-corpus scope. The pipeline processes raw PDFs through VLM-based page identification, YOLO-based signature detection, ResNet-50 feature extraction, and dual-descriptor (cosine + independent-minimum dHash) similarity computation. The operational output is an inherited Paper A five-way per-signature classifier with worst-case document-level aggregation (§III-L). Applied to 90,282 audit reports filed between 2013 and 2023, the pipeline extracts 182,328 signatures from 758 CPAs, with the Big-4 sub-corpus (437 CPAs at accountant level; 150,442–150,453 signatures at signature level) as the primary analytical population.
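To make the dual-descriptor step concrete, the following is a minimal sketch assuming standard NumPy/Pillow primitives; the crop preprocessing, the 64-bit hash size, and the helper names are illustrative rather than taken from the pipeline code, and the "independent-minimum" rule is paraphrased here simply as the two channels being computed independently of one another, so either can flag a pair on its own.

```python
import numpy as np
from PIL import Image

HASH_SIZE = 8  # 64-bit difference hash (assumed size)

def dhash_bits(img: Image.Image, hash_size: int = HASH_SIZE) -> np.ndarray:
    """Difference hash: threshold horizontally adjacent pixels of a
    (hash_size + 1) x hash_size grayscale thumbnail."""
    small = np.asarray(
        img.convert("L").resize((hash_size + 1, hash_size), Image.LANCZOS),
        dtype=np.int16,
    )
    return (small[:, 1:] > small[:, :-1]).flatten()

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two ResNet-50 embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def pair_scores(feat_a: np.ndarray, feat_b: np.ndarray,
                crop_a: Image.Image, crop_b: Image.Image):
    """Dual-descriptor comparison for one signature pair: the cosine and
    dHash channels are scored independently of each other."""
    cos = cosine(feat_a, feat_b)
    ham = int(np.count_nonzero(dhash_bits(crop_a) != dhash_bits(crop_b)))
    return cos, ham  # operating points in the paper: cos > 0.95, dHash <= 5
```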
Our central methodological contributions are the following; a minimal sketch of the per-comparison calibration in item (2) follows the list.

1. A composition decomposition (Scripts 39b–39e) that establishes the absence of a within-population bimodal antimode in the Big-4 descriptor distribution: the apparent multimodality dissolves under joint firm-mean centring and integer-tie jitter ($p_{\text{median}} = 0.35$), so distributional "natural-threshold" framings of the inherited operating points are not empirically supported.
2. An anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units of analysis: per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ for the deployed any-pair HC rule), and per-document ($0.34$ for the operational HC$+$MC alarm), with explicit terminological replacement of "FAR" by "ICCR" given the unsupervised setting.
3. Firm-heterogeneity quantification: logistic regression with pool-size adjustment gives odds ratios of $0.053$, $0.010$, and $0.027$ for Firms B/C/D relative to the Firm A reference, indicating a large multiplicative effect that pool-size differences do not explain.
4. Cross-firm hit-matrix evidence that, under the deployed any-pair rule, within-firm collision concentration is $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms), consistent with firm-specific template, stamp, or document-production reuse mechanisms.
5. Demotion of the K=3 mixture from "three mechanism clusters" to a descriptive firm-compositional partition.
6. Three feature-derived scores that converge on the per-CPA descriptor-position ranking at Spearman $\rho \geq 0.879$, reported as internal consistency rather than external validation.
7. A $0\%$ positive-anchor miss rate on 262 byte-identical Big-4 signatures, with the conservative-subset caveat.
8. A ten-tool unsupervised-validation collection (§III-M, Table XXVII) that explicitly discloses each tool's untested assumption and positions the system as an anchor-calibrated screening framework with human-in-the-loop review, not as a validated forensic detector.
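For reference, here is a minimal sketch of the per-comparison ICCR estimate, assuming the inter-CPA anchor pairs arrive as precomputed arrays of cosine scores and dHash Hamming distances; the function name and inputs are illustrative, and the pool-normalised per-signature and per-document rates require additional normalisation and aggregation steps not shown here.

```python
import numpy as np

def per_comparison_iccr(cos_scores: np.ndarray, dhash_dists: np.ndarray,
                        cos_thr: float = 0.95, dhash_thr: int = 5) -> dict:
    """Fraction of inter-CPA anchor pairs that clear each operating point,
    i.e. how often signatures of *different* CPAs coincide by chance."""
    cos_hit = cos_scores > cos_thr      # cosine channel fires
    dh_hit = dhash_dists <= dhash_thr   # dHash channel fires
    return {
        "cos": float(cos_hit.mean()),               # reported: 0.0006
        "dhash": float(dh_hit.mean()),              # reported: 0.0013
        "joint": float((cos_hit & dh_hit).mean()),  # reported: 0.00014
    }

# Illustrative call on synthetic scores; the real inputs are the
# descriptor scores over all cross-CPA anchor pairs.
rng = np.random.default_rng(0)
print(per_comparison_iccr(rng.uniform(0, 1, 10_000),
                          rng.integers(0, 65, 10_000)))
```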
Future work falls into four directions.

1. *Human-rated validation.* A small-scale human-rated validation set would enable direct ROC optimisation and provide the signature-level ground truth that v4.0 fundamentally lacks; without such ground truth, no true error rates can be reported.
2. *Collision mechanisms.* The within-firm collision concentration documented in §III-L.4 (any-pair $76.7$–$98.8\%$ across the Big-4; same-pair joint $97.0$–$99.96\%$) invites a separate study to distinguish deliberate template sharing from passive firm-level production artefacts (shared scanners, common form templates, identical report-generation infrastructure), a question the inter-CPA-anchor analysis alone cannot resolve.
3. *Audit-quality correlates.* The descriptive Firm A versus Firms B/C/D contrast (per-document HC$+$MC alarm $0.62$ vs. $0.09$–$0.16$), together with v3.x's byte-level evidence of 145 pixel-identical signatures across $\sim 50$ distinct Firm A partners, invites a companion analysis of whether such firm-level signing patterns correlate with established audit-quality measures.
4. *Generalisation beyond the Big-4.* Extending the anchor-based ICCR framework to mid- and small-firm scopes, where firm-level LOOO folds are unavailable, remains open; the §III-I.4 composition diagnostics already document that the absence of within-population bimodality is corpus-universal, so the v4.0 calibration approach in principle generalises, but a full extension with cluster-robust uncertainty quantification is left as future work.