Files
pdf_signature_extraction/paper/paper_a_conclusion_v3.md
T
gbanyan 3672c9343e Phase 6 round-6: soften firm-heterogeneity framing, fix DOCX table render
Framing softening (per partner tone decision: own the limitation rather
than defend the strong claim).

Abstract: "Firm heterogeneity is decisive ... consistent with firm-level
template-like reuse" -> "The framework surfaces pronounced firm-level
heterogeneity ... consistent with firm-level template-like reuse but not
independently diagnostic, since descriptor-only data cannot separate
reuse from digitisation-pipeline or signing-style homogeneity within a
firm; we report it as a scope limitation rather than a mechanism finding."

S V-H Limitations: new bullet "Mechanism attribution for the firm-level
heterogeneity is not identifiable from descriptor-only data." enumerates
three non-mutually-exclusive firm-level mechanisms (template-like reuse /
digitisation-pipeline homogeneity / signing-style homogeneity), notes the
(cosine, dHash) descriptor pair is by construction indifferent to which
mechanism generated a near-identical pair, and lists what additional data
would be needed for attribution.

S VI Conclusion items (3) and (4): "firm heterogeneity quantification" ->
"firm-level heterogeneity surfaced by the framework ... reported as a
framework-discriminative observation rather than a mechanism finding";
item (4) expanded from template/stamp/document-production reuse alone to
the three-mechanism scope, with explicit "not independently establishing"
and S V-H cross-reference.

DOCX export fix (export_v3.py): add missing LaTeX-to-Unicode tokens
(\checkmark, \lvert/\rvert, \lVert/\rVert, \in, \notin, \max, \min,
\log, \ln, \exp, \bullet) that were silently dropping content from
Table III rows 2-4 (integer-jitter robustness check marks empty) and
Table XVIII drift column (|Delta| empty).

Rebuild Paper_A_IEEE_Access_Draft_v3.docx via export_v3.py and install
copy as Paper_A_IEEE_Access_Draft_v4.0_20260515.docx (replaces prior
pandoc-built v4 DOCX which had empty cells in every table header with
LaTeX math and inconsistent column widths). All 43 tables now have
non-empty cells with sub/superscript runs.

Mirrored in paper_a_v4_combined.md for consistency with the single-file
combined source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 02:16:05 +08:00

8 lines
4.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# VI. Conclusion and Future Work
We present a fully automated pipeline for screening non-hand-signed CPA signatures in Taiwan-listed financial audit reports, together with an anchor-calibrated screening framework that characterises the pipeline's operational behaviour at the Big-4 sub-corpus scope under explicit unsupervised assumptions. The pipeline processes raw PDFs through VLM-based page identification, YOLO-based signature detection, ResNet-50 feature extraction, and dual-descriptor (cosine + independent-minimum dHash) similarity computation. The operational output is the deployed five-way per-signature classifier with worst-case document-level aggregation (§III-H.1; calibrated in §III-L). Applied to 90,282 audit reports filed between 2013 and 2023, the pipeline extracts 182,328 signatures from 758 CPAs, with the Big-4 sub-corpus (437 CPAs at accountant level; 150,442150,453 signatures at signature level) as the primary analytical population.
Our central methodological contributions are: (1) a composition decomposition that establishes the absence of a within-population bimodal antimode in the Big-4 descriptor distribution: the apparent multimodality dissolves under joint firm-mean centring and integer-tie jitter ($p_{\text{median}} = 0.35$), so distributional "natural-threshold" framings of the deployed operating points are not empirically supported; (2) an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units of analysis — per-comparison ($0.0006$ at cos$>0.95$; $0.0013$ at dHash$\leq 5$; $0.00014$ jointly), pool-normalised per-signature ($0.11$ for the deployed any-pair HC rule), and per-document ($0.34$ for the operational HC$+$MC alarm) — with explicit terminological replacement of "FAR" by "ICCR" given the unsupervised setting; (3) firm-level heterogeneity surfaced by the framework: logistic regression with pool-size adjustment gives odds ratios $0.053$, $0.010$, $0.027$ for Firms B/C/D relative to Firm A reference, a contrast that pool-size differences do not explain and that we report as a framework-discriminative observation rather than a mechanism finding (§V-H); (4) cross-firm hit matrix evidence that under the deployed any-pair rule, within-firm collision concentration is $98.8\%$ at Firm A and $76.7$$83.7\%$ at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$$99.96\%$ within-firm across all four firms), consistent with — but not independently establishing — firm-level template-like reuse, digitisation-pipeline homogeneity, or signing-style similarity, which descriptor-only data cannot separate (§V-H); (5) K=3 mixture demoted from "three mechanism clusters" to a descriptive firm-compositional partition; (6) three feature-derived scores converging on the per-CPA descriptor-position ranking at Spearman $\rho \geq 0.879$, reported as internal consistency rather than external validation; (7) $0\%$ positive-anchor miss rate on 262 byte-identical Big-4 signatures with the conservative-subset caveat; and (8) explicit disclosure of each diagnostic's untested assumption (Appendix A Table A.II), positioning the system as an anchor-calibrated screening framework with human-in-the-loop review rather than as a validated forensic detector.
Future work falls in four directions. *First*, a small-scale human-rated labelled set would enable direct ROC optimisation and provide the signature-level ground truth that the present analysis fundamentally lacks; without such ground truth, no true error rates can be reported. *Second*, the within-firm collision concentration documented in §III-L.4 (any-pair $76.7$$98.8\%$ across Big-4; same-pair joint $97.0$$99.96\%$) invites a separate study to distinguish deliberate template sharing from passive firm-level production artefacts (shared scanners, common form templates, identical report-generation infrastructure) — a question the inter-CPA-anchor analysis alone cannot resolve. *Third*, the descriptive Firm A versus Firms B/C/D contrast (per-document HC$+$MC alarm $0.62$ vs $0.09$$0.16$) — together with the byte-level evidence of 145 pixel-identical signatures across $\sim 50$ distinct Firm A partners — invites a companion analysis examining whether such firm-level signing patterns correlate with established audit-quality measures. *Fourth*, generalisation to mid- and small-firm contexts requires extending the anchor-based ICCR framework to scopes where firm-level LOOO folds are not available; the §III-I.4 composition diagnostics already document that the absence of within-population bimodality holds across the tested eligible scopes, so the calibration approach in principle generalises, but a full extension with cluster-robust uncertainty quantification is left as future work.