Phase 6 manuscript splice (1/2): Abstract / §I / §II / §III spliced

Splices v4 drafts into v3.20.0 master sub-files. Drops the "paper/v4/" working drafts and lands the v4.0 content in the master file structure. Internal draft notes / close-out checklists / open- questions blocks stripped at splice (per round-1 through round-6 deferral). Abstract (paper_a_abstract_v3.md): - Replaced v3.20.0 abstract (240w) with v4.0 abstract (247w). §I Introduction (paper_a_introduction_v3.md): - Replaced v3.20.0 §I with v4.0 §I (16 paragraphs + 8-item contributions list). §II Related Work (paper_a_related_work_v3.md): - Inserted v4.0 LOOO addition paragraph after the existing finite-mixture paragraph; added refs [42]-[44] to the internal reference annotation list. §III Methodology (paper_a_methodology_v3.md): - §III-A..F (Pipeline / Data / Page ID / Detection / Features / Dual Descriptors): kept v3.20.0 content unchanged. - §III-G..M: replaced v3.20.0 §III-G..K with v4.0 §III-G..M (Unit & Scope / Reference Populations / Distributional Diagnostics + composition decomposition / K=3 descriptive / Convergent internal-consistency / Anchor-based ICCR L.0-L.7 / Validation strategy + Table XXVII ten-tool collection). - §III-N Data Source & Anonymization: kept v3.20.0 §III-L content, renumbered to §III-N (after v4 §III-M). - §III-E ablation cross-reference: updated "§IV-I" -> "§IV-L" to match the renumbered §IV. - §III-F pixel-identity cross-reference: updated "§III-J" -> "§III-K". Gemini round-2 artifact paper/gemini_review_v4_round2.md also added (was uncommitted from the parallel-review batch). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:35:53 +08:00
parent 8dddc3b87c
commit c79329457a
5 changed files with 357 additions and 195 deletions
@@ -2,85 +2,44 @@

 <!-- Target: ~1.5 pages double-column IEEE format. Double-blind: no author/institution info. -->

-Financial audit reports serve as a critical mechanism for ensuring corporate accountability and investor protection.
-In Taiwan, the Certified Public Accountant Act (會計師法 §4) and the Financial Supervisory Commission's attestation regulations (查核簽證核准準則 §6) require that certifying CPAs affix their signature or seal (簽名或蓋章) to each audit report [1].
-While the law permits either a handwritten signature or a seal, the CPA's attestation on each report is intended to represent a deliberate, individual act of professional endorsement for that specific audit engagement [2].
+Financial audit reports serve as a critical mechanism for ensuring corporate accountability and investor protection. In Taiwan, the Certified Public Accountant Act (會計師法 §4) and the Financial Supervisory Commission's attestation regulations (查核簽證核准準則 §6) require certifying CPAs to affix their signature or seal (簽名或蓋章) to each audit report [1]. While the law permits either a handwritten signature or a seal, the CPA's attestation on each report is intended to represent a deliberate, individual act of professional endorsement for that specific audit engagement [2].

-The digitization of financial reporting has introduced a practice that complicates this intent.
-As audit reports are now routinely generated, transmitted, and archived as PDF documents, it is technically and operationally straightforward to reproduce a CPA's stored signature image across many reports rather than re-executing the signing act for each one.
-This reproduction can occur either through an administrative stamping workflow---in which scanned signature images are affixed by staff as part of the report-assembly process---or through a firm-level electronic signing system that automates the same step.
-From the perspective of the output image the two workflows are equivalent: both can reproduce one or more stored signature images, producing same-CPA signatures that are identical or near-identical up to reproduction, scanning, compression, and template-variant noise.
-We refer to signatures produced by either workflow collectively as *non-hand-signed*.
-Although this practice may fall within the literal statutory requirement of "signature or seal," it raises substantive concerns about audit quality, as an identically reproduced signature applied across hundreds of reports may not represent meaningful individual attestation for each engagement.
-The accounting literature has long examined the audit-quality consequences of partner-level engagement transparency: studies of partner-signature mandates in the United Kingdom find measurable downstream effects [31], cross-jurisdictional evidence on individual partner signature requirements highlights similar quality channels [32], and Taiwan-specific evidence on mandatory partner rotation documents how individual-partner identification interacts with audit-quality outcomes [33].
-Unlike traditional signature forgery, where a third party attempts to imitate another person's handwriting, non-hand-signing involves the legitimate signer's own stored signature being reused.
-This practice, while potentially widespread, is visually invisible to report users and virtually undetectable through manual inspection at scale: regulatory agencies overseeing thousands of publicly listed companies cannot feasibly examine each signature for evidence of image reproduction.
+The digitization of financial reporting has introduced a practice that complicates this intent. As audit reports are now routinely generated, transmitted, and archived as PDF documents, it is technically and operationally straightforward to reproduce a CPA's stored signature image across many reports rather than re-executing the signing act for each one. This reproduction can occur either through an administrative stamping workflow — in which scanned signature images are affixed by staff as part of the report-assembly process — or through a firm-level electronic signing system that automates the same step. We refer to signatures produced by either workflow collectively as *non-hand-signed*. Although this practice may fall within the literal statutory requirement of "signature or seal," it raises substantive concerns about audit quality, as an identically reproduced signature applied across hundreds of reports may not represent meaningful individual attestation for each engagement. The accounting literature has examined the audit-quality consequences of partner-level engagement transparency: studies of partner-signature mandates in the United Kingdom find measurable downstream effects [31], cross-jurisdictional evidence on individual partner signature requirements highlights similar quality channels [32], and Taiwan-specific evidence on mandatory partner rotation documents how individual-partner identification interacts with audit-quality outcomes [33]. Unlike traditional signature forgery, where a third party attempts to imitate another person's handwriting, non-hand-signing involves the legitimate signer's own stored signature being reused, and is visually invisible to report users at scale.

-The distinction between *non-hand-signing detection* and *signature forgery detection* is both conceptually and technically important.
-The extensive body of research on offline signature verification [3]--[8] has focused almost exclusively on forgery detection---determining whether a questioned signature was produced by its purported author or by an impostor.
-This framing presupposes that the central threat is identity fraud.
-In our context, identity is not in question; the CPA is indeed the legitimate signer.
-The question is whether the physical act of signing occurred for each individual report, or whether a single signing event was reproduced as an image across many reports.
-This detection problem differs fundamentally from forgery detection: while it does not require modeling skilled-forger variability, it introduces the distinct challenge of separating legitimate intra-signer consistency from image-level reproduction, requiring an analytical framework focused on detecting abnormally high similarity across documents.
+The distinction between *non-hand-signing detection* and *signature forgery detection* is conceptually and technically important. The extensive body of research on offline signature verification [3]–[8] focuses almost exclusively on forgery detection — determining whether a questioned signature was produced by its purported author. In our context, identity is not in question; the CPA is indeed the legitimate signer. The question is whether the physical act of signing occurred for each individual report, or whether a single signing event was reproduced as an image across many reports. This detection problem differs fundamentally from forgery detection: while it does not require modeling skilled-forger variability, it introduces the distinct challenge of separating legitimate intra-signer consistency from image-level reproduction.

-A secondary methodological concern shapes the research design.
-Many prior similarity-based classification studies rely on ad-hoc thresholds---declaring two images equivalent above a hand-picked cosine cutoff, for example---without principled statistical justification.
-Such thresholds are fragile in an archival-data setting where the cost of misclassification propagates into downstream inference.
-A defensible approach requires (i) a transparent threshold anchored to an empirical reference population drawn from the target corpus; (ii) statistical diagnostics that characterise the *shape* of the underlying similarity distribution and so motivate the choice of anchor; and (iii) external validation against naturally-occurring anchor populations---byte-level identical pairs as a conservative gold positive subset and large random inter-CPA pairs as a gold negative population---reported with Wilson 95% confidence intervals on per-rule capture / FAR rates, since precision and $F_1$ are not meaningful when the positive and negative anchor populations are sampled from different units.
+A methodological concern shapes the research design. Many prior similarity-based classification studies rely on ad-hoc thresholds — declaring two images equivalent above a hand-picked cosine cutoff, for example — without principled statistical justification. Such thresholds are fragile in an archival-data setting. A defensible approach requires (i) explicit calibration of the operational thresholds against measurable negative-anchor evidence; (ii) diagnostic procedures that test whether the descriptor distribution itself supports a within-population threshold, including formal decomposition of apparent multimodality into between-group composition and integer-tie artefacts; (iii) annotation-free reporting of operational alarm rates at multiple analysis units (per-comparison, per-signature pool, per-document) with Wilson 95% confidence intervals; (iv) per-firm stratification of the reported rates to surface heterogeneity that aggregate metrics conceal; and (v) explicit disclosure of the unsupervised setting's limits — in particular, the inability to estimate true error rates without signature-level ground-truth labels.

-Despite the significance of the problem for audit quality and regulatory oversight, no prior work has specifically addressed non-hand-signing detection in financial audit documents at scale with these methodological safeguards.
-Woodruff et al. [9] developed an automated pipeline for signature analysis in corporate filings for anti-money-laundering investigations, but their work focused on author clustering (grouping signatures by signer identity) rather than detecting reuse of a stored image.
-Copy-move forgery detection methods [10], [11] address duplicated regions within or across images but are designed for natural images and do not account for the specific characteristics of scanned document signatures, where legitimate visual similarity between a signer's authentic signatures is expected and must be distinguished from image reproduction.
-Research on near-duplicate image detection using perceptual hashing combined with deep learning [12], [13] provides relevant methodological foundations but has not been applied to document forensics or signature analysis.
-From the statistical side, the methods we adopt for distributional characterisation---the Hartigan dip test [37] and finite mixture modelling via the EM algorithm [40], [41], complemented by a Burgstahler-Dichev / McCrary density-smoothness diagnostic [38], [39]---have been developed in statistics and accounting-econometrics but have not, to our knowledge, been combined as a joint diagnostic toolkit for document-forensics threshold selection.
+Despite the significance of the problem for audit quality and regulatory oversight, no prior work has specifically addressed non-hand-signing detection in financial audit documents at scale with these methodological safeguards. Woodruff et al. [9] developed an automated pipeline for signature analysis in corporate filings for anti-money-laundering investigations, but their work focused on author clustering rather than detecting image reuse. Copy-move forgery detection methods [10], [11] address duplicated regions within or across images but are designed for natural images and do not account for the specific characteristics of scanned document signatures. Research on near-duplicate image detection using perceptual hashing combined with deep learning [12], [13] provides relevant methodological foundations but has not been applied to document forensics or signature analysis. From the statistical side, the methods we adopt for distributional characterisation — the Hartigan dip test [37] and finite mixture modelling via the EM algorithm [40], [41], complemented by a Burgstahler-Dichev / McCrary density-smoothness diagnostic [38], [39] — have been developed in statistics and accounting-econometrics but have not been combined as a joint diagnostic toolkit for document-forensics threshold characterisation.

-In this paper, we present a fully automated, end-to-end pipeline for detecting non-hand-signed CPA signatures in audit reports at scale.
-Our approach processes raw PDF documents through the following stages:
-(1) signature page identification using a Vision-Language Model (VLM);
-(2) signature region detection using a trained YOLOv11 object detector;
-(3) deep feature extraction via a pre-trained ResNet-50 convolutional neural network;
-(4) dual-descriptor similarity computation combining cosine similarity on deep embeddings with difference hash (dHash) distance;
-(5) signature-level distributional characterisation using two threshold estimators---KDE antimode with a Hartigan unimodality test and finite Beta mixture via EM with a logit-Gaussian robustness check---complemented by a Burgstahler-Dichev / McCrary density-smoothness diagnostic, used to read the structure of the per-signature similarity distribution and to motivate a percentile-based operational anchor rather than a mixture-fit crossing; and
-(6) validation against a pixel-identical anchor, a low-similarity anchor, and a replication-dominated Big-4 calibration firm.
+In this paper we present a fully automated, end-to-end pipeline for detecting non-hand-signed CPA signatures in audit reports at scale, together with a multi-tool validation framework that explicitly discloses the unsupervised setting's limits. The pipeline processes raw PDF documents through (1) signature page identification with a Vision-Language Model; (2) signature region detection with a trained YOLOv11 object detector; (3) deep feature extraction via a pre-trained ResNet-50; (4) dual-descriptor similarity (cosine + independent-minimum dHash); (5) anchor-based threshold calibration at three units of analysis (per-comparison, pool-normalised per-signature, per-document) against an inter-CPA negative-anchor coincidence-rate proxy (§III-L); (6) firm-stratified per-rule reporting and a within-firm cross-CPA hit-matrix analysis (§III-L.4); (7) a composition decomposition that establishes the absence of a within-population bimodal antimode in the descriptor distributions (§III-I.4); and (8) a multi-tool unsupervised validation strategy with disclosed assumption-violation analysis (§III-M).

-The dual-descriptor verification is central to our contribution.
-Cosine similarity of deep feature embeddings captures high-level visual style similarity---it can identify signatures that share similar stroke patterns and spatial layouts---but cannot distinguish between a CPA who signs consistently and one whose signature is reproduced from a stored image.
-Perceptual hashing (specifically, difference hashing) encodes structural-level image gradients into compact binary fingerprints that are robust to scan noise but sensitive to substantive content differences.
-By requiring convergent evidence from both descriptors, we can differentiate *style consistency* (high cosine but divergent dHash) from *image reproduction* (high cosine with low dHash), resolving an ambiguity that neither descriptor can address alone.
+The methodological reframing relative to earlier versions of this work is central to our v4.0 contribution. Earlier work in this lineage adopted a distributional path to thresholds — fitting accountant-level finite-mixture models and treating their marginal crossings as data-derived "natural" thresholds. v4.0 reports a composition decomposition diagnostic (§III-I.4) that overturns this reading: the apparent multimodality of the Big-4 accountant-level distribution is fully explained by between-firm location-shift effects (Firm A's mean dHash of $2.73$ versus Firms B/C/D's $6.46$, $7.39$, $7.21$) and integer mass-point artefacts on the integer-valued dHash axis. Once both confounds are removed (firm-mean centring plus uniform integer jitter), the Big-4 pooled dHash dip test yields $p_{\text{median}} = 0.35$ across five jitter seeds, eliminating the rejection. Within-firm signature-level cosine dip tests fail to reject in every individual Big-4 firm and in every individual mid/small firm with $\geq 500$ signatures (10 firms tested in Script 39c), and the corresponding within-firm jittered-dHash dip tests likewise fail to reject in all four Big-4 firms (Script 39d) and across a codex-verified read-only spike on the same ten mid/small firms ($0/10$ reject; §III-I.4). The descriptor distributions therefore contain no within-population bimodal antimode that could anchor an operational threshold.

-A second distinctive feature is our framing of the calibration reference.
-One major Big-4 accounting firm in Taiwan (hereafter "Firm A") was selected as a candidate calibration reference based on practitioner-knowledge motivation; its benchmark status is then evaluated using the image evidence reported in this paper, not asserted by the practitioner-knowledge motivation itself.
-We therefore treat Firm A as a *replication-dominated* calibration reference rather than a pure positive class.
-This framing is important because the statistical signature of a replication-dominated population is visible in our data: Firm A's per-signature cosine distribution is unimodal with a long left tail (Hartigan dip $p = 0.17$), 92.5% of Firm A signatures exceed cosine 0.95 with the remaining 7.5% forming the left tail, and 145 Firm A signatures across 50 distinct partners are byte-identical to a same-CPA match in a different audit report (35 spanning different fiscal years).
-Adopting the replication-dominated framing---rather than a near-universal framing that would have to absorb the 7.5% residual as noise---ensures internal coherence between the byte-level pixel-identity evidence and the signature-level distributional shape.
+In place of distributional anchoring, v4.0 adopts an anchor-based inter-CPA coincidence-rate (ICCR) calibration. At the per-comparison unit, the inherited cos$>0.95$ operating point yields ICCR $= 0.00060$ on a $5 \times 10^5$-pair Big-4 sample (replicating v3.x's reported per-comparison rate of $0.0005$ under prior "FAR" terminology); the dHash$\leq 5$ structural cutoff yields ICCR $= 0.00129$ (v4 new); the joint rule cos$>0.95$ AND dHash$\leq 5$ yields joint ICCR $= 0.00014$ (any-pair semantics, matching the deployed extrema rule). At the pool-normalised per-signature unit, the same rule's effective coincidence rate is materially higher because the deployed classifier takes max-cosine and min-dHash over a same-CPA pool: pooled Big-4 any-pair ICCR is $0.1102$ (Wilson 95% CI $[0.1086, 0.1118]$; CPA-block bootstrap 95% $[0.0908, 0.1330]$). At the per-document unit, the operational HC$+$MC alarm fires on $33.75\%$ of Big-4 documents under the inter-CPA candidate-pool counterfactual.

-A third distinctive feature is the empirical reading we take from the per-signature distributional analysis.
-Three diagnostics applied to the per-signature similarity distribution---the Hartigan dip test, an EM-fitted Beta mixture (with logit-Gaussian robustness check), and the Burgstahler-Dichev / McCrary density-smoothness procedure---jointly indicate that no two-mechanism mixture cleanly explains per-signature similarity: the dip test fails to reject unimodality for Firm A, BIC strongly prefers a 3-component over a 2-component Beta fit, and the BD/McCrary candidate transition lies *inside* the non-hand-signed mode rather than between modes (and is not bin-width-stable; Appendix A).
-The substantive reading is that *pixel-level output quality* is a continuous spectrum shaped by firm-specific reproduction technologies (administrative stamping in early years, firm-level e-signing later) and scan conditions, rather than a discrete class cleanly separated from hand-signing.
-This reading motivates anchoring the operational classifier on a percentile heuristic over the Firm A reference distribution rather than on a mixture-fit crossing, and it motivates the byte-level pixel-identity anchor (Section IV-F.1) as a threshold-free positive reference that does not depend on resolving signature-level mixture structure.
+The pooled per-signature and per-document rates conceal striking firm heterogeneity. A logistic regression of the per-signature hit indicator on firm dummies (Firm A reference) and centred log pool size yields odds ratios of $0.053$ (Firm B), $0.010$ (Firm C), and $0.027$ (Firm D) — Firms B/C/D are an order of magnitude below Firm A even after controlling for the pool-size confound (Script 44). Cross-firm hit matrix analysis under the deployed any-pair rule shows within-firm collision concentrations of $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D (Table XXV; the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms). The pattern is consistent with firm-specific template, stamp, or document-production reuse mechanisms — though not by itself diagnostic of deliberate sharing. We retain the inherited Paper A v3.x five-way box rule as the operational classifier; v4.0's contribution is to characterise its multi-level coincidence behaviour against the inter-CPA negative anchor rather than to derive new thresholds.

-We apply this pipeline to 90,282 audit reports filed by publicly listed companies in Taiwan between 2013 and 2023, extracting and analyzing 182,328 individual CPA signatures from 758 unique accountants.
-To our knowledge, this represents the largest-scale forensic analysis of signature authenticity in financial documents reported in the literature.
+Three feature-derived scores converge on the per-CPA descriptor-position ranking with Spearman $\rho \geq 0.879$ (Script 38): the K=3 mixture posterior (now interpreted as a firm-compositional position score, not a mechanism cluster posterior; §III-J), a reverse-anchor cosine percentile relative to a strictly-out-of-target non-Big-4 reference, and the inherited box-rule less-replication-dominated rate. The three scores are deterministic functions of the same per-CPA descriptor pair, so the convergence is documented as internal consistency among feature-derived ranks rather than external validation. Hard ground truth for the *replicated* class is provided by 262 byte-identical signatures in the Big-4 subset (Firm A 145, Firm B 8, Firm C 107, Firm D 2), against which all three candidate checks achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$). For the box rule this result is close to tautological at byte-identity; we discuss the conservative-subset caveat in §V-G.

-The contributions of this paper are summarized as follows:
+We apply this pipeline to 90,282 audit reports filed by publicly listed companies in Taiwan between 2013 and 2023, extracting and analyzing 182,328 individual CPA signatures from 758 unique accountants. The Big-4 sub-corpus comprises 437 CPAs and 150,442 signatures with both descriptors available.

-1. **Problem formulation.** We formally define non-hand-signing detection as distinct from signature forgery detection and argue that it requires an analytical framework focused on intra-signer similarity distributions rather than genuine-versus-forged classification.
+The contributions of this paper are:

-2. **End-to-end pipeline.** We present a pipeline that processes raw PDF audit reports through VLM-based page identification, YOLO-based signature detection, deep feature extraction, and dual-descriptor similarity computation, with automated inference requiring no manual intervention after initial training and annotation.
+1. **Problem formulation.** We define non-hand-signing detection as distinct from signature forgery detection and frame it as a detection problem on intra-signer similarity distributions.

-3. **Dual-descriptor verification.** We demonstrate that combining deep-feature cosine similarity with perceptual hashing resolves the fundamental ambiguity between style consistency and image reproduction, and we validate the backbone choice through an ablation study comparing three feature-extraction architectures.
+2. **End-to-end pipeline.** We present a pipeline that processes raw PDF audit reports through VLM-based page identification, YOLO-based signature detection, ResNet-50 feature extraction, and dual-descriptor similarity computation, with automated inference and no manual intervention after initial training.

-4. **Percentile-anchored operational threshold.** We anchor the operational classifier's cosine cut on the whole-sample Firm A P7.5 percentile (cos $> 0.95$), a transparent and reproducible reference drawn from a replication-dominated reference population, and complement it with dHash structural cuts derived from the same reference distribution. Operational thresholds are therefore explained by an empirical reference rather than asserted.
+3. **Dual-descriptor verification.** We demonstrate that combining deep-feature cosine similarity with independent-minimum dHash resolves the ambiguity between *style consistency* and *image reproduction*, and we validate the backbone choice through a feature-backbone ablation.

-5. **Distributional characterisation of per-signature similarity.** We apply three statistical diagnostics---a Hartigan dip test, an EM-fitted Beta mixture with logit-Gaussian robustness check, and a Burgstahler-Dichev / McCrary density-smoothness procedure---to characterise the shape of the per-signature similarity distribution. The three diagnostics jointly find that per-signature similarity forms a continuous quality spectrum, which both motivates the percentile-based operational anchor over a mixture-fit crossing and is itself a substantive finding for the document-forensics literature on similarity-threshold selection.
+4. **Composition decomposition disproves the distributional-threshold path.** We show via a 2×2 factorial diagnostic (firm-mean centring × integer-tie jitter) that the apparent multimodality of the Big-4 accountant-level descriptor distribution is fully attributable to between-firm location shifts and integer mass-point artefacts. The descriptor distributions contain no within-population bimodal antimode; "natural threshold" language in this lineage's prior work is not empirically supported.

-6. **Replication-dominated calibration methodology.** We introduce a calibration strategy using a replication-dominated reference group, distinguishing *replication-dominated* from *replication-pure* anchors; and we validate classification using byte-level pixel identity as an annotation-free gold positive, requiring no manual labeling.
+5. **Anchor-based multi-level inter-CPA coincidence-rate calibration.** We characterise the deployed five-way classifier at three units of analysis: per-comparison ICCR (cos$>0.95$: $0.0006$; dHash$\leq 5$: $0.0013$; joint: $0.00014$), pool-normalised per-signature ICCR ($0.11$ for the deployed any-pair high-confidence rule), and per-document ICCR ($0.34$ for the operational HC$+$MC alarm). We adopt "inter-CPA coincidence rate" as the metric name throughout and reserve "False Acceptance Rate" for terminology that requires ground-truth negative labels, which the corpus does not provide.

-7. **Large-scale empirical analysis.** We report findings from the analysis of over 90,000 audit reports spanning a decade, providing the first large-scale empirical evidence on non-hand-signing practices in financial reporting under a methodology designed for peer-review defensibility.
+6. **Firm heterogeneity quantification and within-firm cross-CPA collision concentration.** Per-firm rates differ by an order of magnitude after pool-size adjustment (Firm A's per-document HC$+$MC alarm at $0.62$ versus Firms B/C/D at $0.09$–$0.16$). Cross-firm hit matrix analysis shows within-firm collision concentrations of $98.8\%$ at Firm A and $76.7$–$83.7\%$ at Firms B/C/D under the deployed any-pair rule (the stricter same-pair joint event saturates at $97.0$–$99.96\%$ within-firm across all four firms); the pattern is consistent with firm-specific template, stamp, or document-production reuse mechanisms — a descriptive finding about deployed-rule behaviour, not a claim of deliberate template sharing.

-The remainder of this paper is organized as follows.
-Section II reviews related work on signature verification, document forensics, perceptual hashing, and the statistical methods we adopt for distributional characterisation.
-Section III describes the proposed methodology.
-Section IV presents experimental results including the signature-level distributional characterisation, pixel-identity validation, and backbone ablation study.
-Section V discusses the implications and limitations of our findings.
-Section VI concludes with directions for future work.
+7. **K=3 as descriptive firm-compositional partition; three-score convergent internal consistency.** We fit a K=3 Gaussian mixture as a descriptive partition of the Big-4 accountant-level distribution (no longer interpreted as three mechanism clusters). Three feature-derived scores agree on the per-CPA descriptor-position ranking at Spearman $\rho \geq 0.879$; we report this as internal consistency rather than external validation, given that the scores share the underlying descriptor pair.
+
+8. **Annotation-free positive-anchor validation and unsupervised validation ceiling.** We achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$) on 262 byte-identical Big-4 signatures, with the conservative-subset caveat that byte-identical pairs are by construction near cos$=1$ and dHash$=0$. We frame the overall validation strategy as a multi-tool collection of ten partial-evidence diagnostics (§III-M Table XXVII), each with an explicitly disclosed untested assumption; their conjunction constitutes the unsupervised validation ceiling achievable on this corpus. We do not claim a validated forensic detector; we position the system as a specificity-proxy-anchored screening framework with human-in-the-loop review.
+
+The remainder of the paper is organized as follows. Section II reviews related work on signature verification, document forensics, perceptual hashing, and the statistical methods used. Section III describes the proposed methodology. Section IV presents the experimental results — distributional characterisation, mixture fits, convergent internal-consistency checks, leave-one-firm-out reproducibility, pixel-identity validation, and full-dataset robustness. Section V discusses the implications and limitations. Section VI concludes with directions for future work.