diff --git a/paper/v4/paper_a_prose_v4_phase4.md b/paper/v4/paper_a_prose_v4_phase4.md index a7069bf..79e1711 100644 --- a/paper/v4/paper_a_prose_v4_phase4.md +++ b/paper/v4/paper_a_prose_v4_phase4.md @@ -8,7 +8,7 @@ > *IEEE Access target: <= 250 words, single paragraph.* -Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization makes reusing a stored signature image across reports — through administrative stamping or firm-level electronic signing — technically trivial and visually invisible, undermining individualized attestation. We build an end-to-end pipeline detecting such *non-hand-signed* signatures at scale: a Vision-Language Model identifies signature pages, a YOLOv11 detector localizes signatures, ResNet-50 supplies deep features, and a dual-descriptor layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate *style consistency* from *image reproduction*. The operational output is an inherited five-way per-signature classifier with worst-case document-level aggregation. Applied to 90,282 audit reports filed in Taiwan over 2013–2023, the pipeline yields 182,328 signatures from 758 CPAs. Our primary statistical analysis is scoped to the Big-4 sub-corpus (437 CPAs, 150,442 signatures), where the accountant-level $(\overline{\text{cos}}, \overline{\text{dHash}})$ distribution exhibits dip-test multimodality ($p < 5 \times 10^{-4}$, both axes) — the smallest scope tested at which this holds. A 3-component mixture recovers a stable cross-firm hand-leaning component (weight 0.143; leave-one-firm-out drift $\leq 0.005$ cos / 0.96 dh / 0.023 weight) alongside a Firm-A-concentrated templated component. Three feature-derived scores — mixture posterior, reverse-anchor cosine percentile against a non-Big-4 reference, and the inherited box-rule hand-leaning rate; not statistically independent — agree on the per-CPA ranking at Spearman $\rho \geq 0.879$. Against 262 byte-identical Big-4 signatures, all three achieve 0% positive-anchor miss rate (Wilson 95% upper bound 1.45%). A light full-dataset cross-scope check ($n = 686$) preserves the K=3 + box-rule Spearman convergence with drift 0.007. +Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization makes reusing a stored signature image across reports — through administrative stamping or firm-level electronic signing — technically trivial and visually invisible, undermining individualized attestation. We build an end-to-end pipeline detecting such *non-hand-signed* signatures at scale: a Vision-Language Model identifies signature pages, YOLOv11 localizes signatures, ResNet-50 supplies deep features, and a dual-descriptor layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate *style consistency* from *image reproduction*. The operational output is an inherited five-way per-signature classifier with worst-case document-level aggregation. Applied to 90,282 Taiwan audit reports (2013–2023), the pipeline yields 182,328 signatures from 758 CPAs. Our primary analysis is scoped to the Big-4 sub-corpus (437 CPAs; 150,442 signatures), where the accountant-level $(\overline{\text{cos}}, \overline{\text{dHash}})$ distribution exhibits dip-test multimodality ($p < 5 \times 10^{-4}$, both axes) — the smallest scope tested at which this holds. A 3-component mixture recovers a stable cross-firm hand-leaning component (weight 0.143; leave-one-firm-out drift $\leq 0.005$ cos / 0.96 dh / 0.023 weight) alongside a Firm-A-concentrated templated component. Three feature-derived scores — mixture posterior, reverse-anchor cosine percentile against a non-Big-4 reference, and the inherited box-rule hand-leaning rate; not statistically independent — agree on the per-CPA ranking at Spearman $\rho \geq 0.879$. Against 262 byte-identical Big-4 signatures, all three achieve 0% positive-anchor miss rate (Wilson upper bound 1.45%). A light full-dataset check ($n = 686$) preserves the K=3 + box-rule Spearman convergence (drift 0.007). ---