# Paper A v4.0 Phase 4 Prose Draft v1

> **Draft note (2026-05-12, Phase 4 v1; internal — remove before submission).** This file replaces the v3.20.0 Abstract, §I Introduction, §II Related Work, §V Discussion, and §VI Conclusion blocks with the Big-4 reframed v4.0 prose. The methodology and results sections (§III v6 and §IV v3.2 on this branch, committed 2026-05-12) are the technical foundation; Phase 4 prose aligns the narrative with the convergent post-codex-rounds 21–25 framing. Inherited corpus-wide setup material (§III-A through §III-F; §IV-A through §IV-C; §IV-I; §IV-L) is referenced rather than restated. Empirical anchors cite Scripts 32–42 on branch `paper-a-v4-big4`.

---

# Abstract

> *IEEE Access target: <= 250 words, single paragraph.*

Regulations require Certified Public Accountants (CPAs) to attest to each audit report with a signature, but digitization makes reusing a stored signature image across reports — through administrative stamping or firm-level electronic signing — technically trivial and visually invisible to report users, undermining individualized attestation. We build an end-to-end pipeline that detects such *non-hand-signed* signatures at scale: a Vision-Language Model identifies signature pages, a YOLOv11 detector localizes signature regions, ResNet-50 supplies deep features, and a dual-descriptor verification layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate *style consistency* from *image reproduction*. Applied to 90,282 audit reports filed in Taiwan over 2013–2023, the pipeline yields 182,328 signatures from 758 CPAs. Our primary statistical analysis is scoped to the Big-4 sub-corpus (437 CPAs, 150,442 signatures with both descriptors), where the accountant-level $(\overline{\text{cos}}, \overline{\text{dHash}})$ distribution exhibits dip-test multimodality on both axes ($p < 5 \times 10^{-4}$).
A 3-component Gaussian mixture recovers a stable cross-firm hand-leaning component (weight 0.143; LOOO drift $\leq 0.005$ cos / $0.96$ dHash / $0.023$ weight) alongside a templated component dominated by Firm A. Three distinct feature-derived scores — mixture posterior, reverse-anchor cosine percentile against a non-Big-4 reference, and a published box-rule hand-leaning rate — converge on the per-CPA ranking at Spearman $\rho \geq 0.879$. Against 262 byte-identical signatures (conservative-subset ground truth for replication), all three candidate classifiers achieve 0% positive-anchor miss rate (Wilson 95% upper bound 1.45%). A full-dataset robustness check ($n = 686$ CPAs) preserves the Spearman convergence with $\rho$ drift $0.007$.

---

# I. Introduction

> *Target: ~1.5 pages double-column IEEE format. Double-blind: no author/institution info.*

Financial audit reports serve as a critical mechanism for ensuring corporate accountability and investor protection. In Taiwan, the Certified Public Accountant Act (會計師法 §4) and the Financial Supervisory Commission's attestation regulations (查核簽證核准準則 §6) require certifying CPAs to affix their signature or seal (簽名或蓋章) to each audit report [1]. While the law permits either a handwritten signature or a seal, the CPA's attestation on each report is intended to represent a deliberate, individual act of professional endorsement for that specific audit engagement [2].

The digitization of financial reporting has introduced a practice that complicates this intent. As audit reports are now routinely generated, transmitted, and archived as PDF documents, it is technically and operationally straightforward to reproduce a CPA's stored signature image across many reports rather than re-executing the signing act for each one.
This reproduction can occur either through an administrative stamping workflow — in which scanned signature images are affixed by staff as part of the report-assembly process — or through a firm-level electronic signing system that automates the same step. We refer to signatures produced by either workflow collectively as *non-hand-signed*. Although this practice may fall within the literal statutory requirement of "signature or seal," it raises substantive concerns about audit quality, as an identically reproduced signature applied across hundreds of reports may not represent meaningful individual attestation for each engagement.

The accounting literature has examined the audit-quality consequences of partner-level engagement transparency: studies of partner-signature mandates in the United Kingdom find measurable downstream effects [31], cross-jurisdictional evidence on individual partner signature requirements highlights similar quality channels [32], and Taiwan-specific evidence on mandatory partner rotation documents how individual-partner identification interacts with audit-quality outcomes [33]. Unlike traditional signature forgery, where a third party attempts to imitate another person's handwriting, non-hand-signing involves the legitimate signer's own stored signature being reused, and is visually invisible to report users at scale.

The distinction between *non-hand-signing detection* and *signature forgery detection* is conceptually and technically important. The extensive body of research on offline signature verification [3]–[8] focuses almost exclusively on forgery detection — determining whether a questioned signature was produced by its purported author. In our context, identity is not in question; the CPA is indeed the legitimate signer. The question is whether the physical act of signing occurred for each individual report, or whether a single signing event was reproduced as an image across many reports.
This detection problem differs fundamentally from forgery detection: while it does not require modeling skilled-forger variability, it introduces the distinct challenge of separating legitimate intra-signer consistency from image-level reproduction.

A methodological concern shapes the research design. Many prior similarity-based classification studies rely on ad-hoc thresholds — declaring two images equivalent above a hand-picked cosine cutoff, for example — without principled statistical justification. Such thresholds are fragile in an archival-data setting. A defensible approach requires (i) a transparent threshold whose calibration scope is statistically supported; (ii) statistical diagnostics that characterise the *shape* of the underlying similarity distribution at the chosen scope; (iii) external validation against naturally-occurring anchor populations, with Wilson 95% confidence intervals on per-rule capture and miss rates; and (iv) cross-validation that demonstrates the calibration reproduces rather than collapses when the training composition changes.

Despite the significance of the problem for audit quality and regulatory oversight, no prior work has specifically addressed non-hand-signing detection in financial audit documents at scale with these methodological safeguards. Woodruff et al. [9] developed an automated pipeline for signature analysis in corporate filings for anti-money-laundering investigations, but their work focused on author clustering rather than detecting image reuse. Copy-move forgery detection methods [10], [11] address duplicated regions within or across images but are designed for natural images and do not account for the specific characteristics of scanned document signatures. Research on near-duplicate image detection using perceptual hashing combined with deep learning [12], [13] provides relevant methodological foundations but has not been applied to document forensics or signature analysis.
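To make the perceptual-hash side of the dual descriptor concrete, the core difference-hash (dHash) idea can be sketched in a few lines. This is an illustrative sketch, not the pipeline's actual implementation: it assumes the signature crop has already been reduced to an 8×9 grayscale array upstream, and the function names are ours.

```python
def dhash_bits(gray_8x9):
    """Difference hash: one bit per horizontal neighbour comparison.

    gray_8x9: 8 rows x 9 columns of grayscale values (0-255).
    Returns a 64-bit integer; visually similar crops differ in few bits.
    """
    bits = 0
    for row in gray_8x9:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(h1, h2):
    """Number of differing bits between two hashes (the dHash distance)."""
    return bin(h1 ^ h2).count("1")


# A byte-identical reproduction hashes to the same value (distance 0);
# a freshly hand-signed signature typically differs in several bits.
crop = [[(r * 9 + c) % 256 for c in range(9)] for r in range(8)]
assert hamming(dhash_bits(crop), dhash_bits(crop)) == 0
```

As we read the paper's dual descriptor, the per-signature dHash score is the minimum Hamming distance to any same-CPA signature, taken independently of the cosine minimum (hence "independent-minimum"); low dHash together with high cosine flags image reproduction rather than mere style consistency.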
From the statistical side, the methods we adopt for distributional characterisation — the Hartigan dip test [37] and finite mixture modelling via the EM algorithm [40], [41], complemented by a Burgstahler-Dichev / McCrary density-smoothness diagnostic [38], [39] — have been developed in statistics and accounting-econometrics but have not been combined as a joint diagnostic toolkit for document-forensics threshold characterisation.

In this paper we present a fully automated, end-to-end pipeline for detecting non-hand-signed CPA signatures in audit reports at scale. The pipeline processes raw PDF documents through (1) signature page identification with a Vision-Language Model; (2) signature region detection with a trained YOLOv11 object detector; (3) deep feature extraction via a pre-trained ResNet-50; (4) dual-descriptor similarity (cosine + independent-minimum dHash); (5) accountant-level distributional characterisation at the Big-4 sub-corpus scope (§III-D, §IV-D / §IV-E); (6) convergent internal-consistency analysis using three feature-derived scores (§III-K, §IV-F); (7) leave-one-firm-out reproducibility (§III-J, §IV-G); and (8) annotation-free validation against a pixel-identity positive anchor, an inter-CPA negative anchor, and a full-dataset robustness re-run (§IV-H, §IV-I, §IV-K).

The methodological reframing relative to earlier versions of this work is central to our v4.0 contribution. The accountant-level $(\overline{\text{cos}}, \overline{\text{dHash}})$ distribution at the Big-4 sub-corpus scope is the smallest tested scope at which the Hartigan dip test rejects unimodality on both axes ($p < 5 \times 10^{-4}$, $n = 437$); neither any single firm pooled alone nor the broader full-dataset variant rejects unimodality at conventional levels.
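The accountant-level descriptors behind this scope check are simple per-CPA means of the signature-level pair scores (step (5) above). A minimal sketch of that aggregation, with a hypothetical record layout and toy values:

```python
from collections import defaultdict
from statistics import mean


def accountant_level(records):
    """Aggregate signature-level (cos, dHash) scores to per-CPA means.

    records: iterable of (cpa_id, cos_sim, dhash_dist) tuples, one per
    signature.  Returns {cpa_id: (mean_cos, mean_dhash)} -- the
    (cos-bar, dHash-bar) points whose marginals feed the dip test.
    """
    by_cpa = defaultdict(list)
    for cpa, cos, dh in records:
        by_cpa[cpa].append((cos, dh))
    return {
        cpa: (mean(c for c, _ in pairs), mean(d for _, d in pairs))
        for cpa, pairs in by_cpa.items()
    }


# Toy data: two signatures for CPA "A01", one for "B02".
toy = [("A01", 0.99, 1.0), ("A01", 0.97, 3.0), ("B02", 0.94, 9.0)]
points = accountant_level(toy)  # {"A01": (0.98, 2.0), "B02": (0.94, 9.0)}
```

The Hartigan dip test is then run on each marginal of the resulting $(\overline{\text{cos}}, \overline{\text{dHash}})$ point cloud (in Python this typically requires a third-party package such as `diptest`); the claim above is that both marginals reject unimodality only at the Big-4 scope.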
A 2-component Gaussian mixture fit to this distribution recovers marginal crossings $\overline{\text{cos}}^* = 0.9755$ and $\overline{\text{dHash}}^* = 3.755$, with tight bootstrap 95% confidence intervals (cosine half-width $0.0015$). However, leave-one-firm-out cross-validation reveals that the K=2 boundary is essentially a Firm-A-vs-others separator: the maximum across-fold cosine-crossing deviation is $0.028$, $5.6\times$ the report's stability tolerance, and holding Firm A out flips the held-out classification rate from 0% to 100% replicated. We therefore do not use the K=2 fit as the operational classifier.

A 3-component fit (BIC lower by $3.48$) instead identifies three accountant-level components: a templated cluster (cos $\approx 0.983$, dHash $\approx 2.41$, weight $0.321$), a mixed cluster (cos $\approx 0.956$, dHash $\approx 6.66$, weight $0.536$), and a cross-firm hand-leaning cluster (cos $\approx 0.946$, dHash $\approx 9.17$, weight $0.143$). The hand-leaning component shape is reproducible across LOOO folds (max drift $0.005$ in cos, $0.96$ in dHash, $0.023$ in weight); hard-posterior membership remains composition-sensitive (absolute differences $1.8$–$12.8$ pp), so we use K=3 as an *accountant-level descriptive characterisation*, not as an operational classifier. The signature-level operational classifier remains the inherited five-way box rule from the v3.x line of this work, retained for continuity.

Three distinct feature-derived scores converge on the per-CPA hand-leaning ranking with Spearman $\rho \geq 0.879$: the K=3 mixture posterior; a reverse-anchor cosine percentile relative to a strictly-out-of-target non-Big-4 reference distribution; and the inherited box-rule hand-leaning rate. The three scores are not statistically independent measurements (they are functions of the same descriptor pair), so the convergence is documented as internal consistency among feature-derived ranks rather than external validation.
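The rank agreement among the three scores is ordinary Spearman correlation; `scipy.stats.spearmanr` is the usual tool, but the computation is small enough to make explicit. A self-contained, tie-aware sketch on toy scores (not the paper's data):

```python
from statistics import mean


def _avg_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        r = (i + j) / 2 + 1  # mean of tied positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = r
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rho = Pearson correlation of the two rank vectors."""
    rx, ry = _avg_ranks(xs), _avg_ranks(ys)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


# Perfectly monotone scores agree at rho = 1 even on different scales.
assert abs(spearman([0.1, 0.4, 0.9], [10, 40, 90]) - 1.0) < 1e-12
```

Rank correlation is the right agreement measure here because the three scores live on incommensurable scales (a posterior probability, a percentile, and a rate); only their induced per-CPA orderings are comparable.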
External hard ground truth is provided by 262 byte-identical signatures in the Big-4 subset, against which all three candidate classifiers achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$).

We apply this pipeline to 90,282 audit reports filed by publicly listed companies in Taiwan between 2013 and 2023, extracting and analyzing 182,328 individual CPA signatures from 758 unique accountants. The Big-4 sub-corpus comprises 437 CPAs and 150,442 signatures with both descriptors available.

The contributions of this paper are:

1. **Problem formulation.** We define non-hand-signing detection as distinct from signature forgery detection and frame it as a detection problem on intra-signer similarity distributions.
2. **End-to-end pipeline.** We present a pipeline that processes raw PDF audit reports through VLM-based page identification, YOLO-based signature detection, ResNet-50 feature extraction, and dual-descriptor similarity computation, with automated inference and no manual intervention after initial training.
3. **Dual-descriptor verification.** We demonstrate that combining deep-feature cosine similarity with independent-minimum dHash resolves the ambiguity between *style consistency* and *image reproduction*, and we validate the backbone choice through a feature-backbone ablation.
4. **Big-4 sub-corpus scope as the methodologically privileged calibration unit.** We show that the accountant-level mixture model becomes statistically supportable (the dip test rejects unimodality) at the Big-4 scope and not at narrower scopes, and we report the scope choice as a substantive methodological finding rather than an arbitrary subset.
5. **Accountant-level K=3 mixture as descriptive characterisation.** We fit and document a K=3 mixture whose hand-leaning component shape is LOOO-reproducible while its hard-posterior membership remains composition-sensitive.
We do not treat K=3 as an operational classifier; we treat it as the descriptive accountant-level summary of the Big-4 distribution.
6. **Three-score convergent internal consistency.** We measure agreement among three feature-derived scores at $\rho \geq 0.879$ and report this as internal consistency rather than external validation, given the shared-feature caveat.
7. **Annotation-free validation against a hard positive anchor.** We achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$) on 262 byte-identical Big-4 signatures, complementing v3.x's inherited inter-CPA negative-anchor FAR ($0.0005$ at the operational cosine cut).
8. **Full-dataset robustness as a light secondary scope.** We re-run the K=3 + box-rule convergence at the full $n = 686$ CPA scope and observe Spearman $\rho = 0.9558$ (drift $0.007$ from Big-4), demonstrating pipeline reproducibility at multiple scopes while preserving Big-4 as the primary methodological scope.

The remainder of the paper is organized as follows. Section II reviews related work on signature verification, document forensics, perceptual hashing, and the statistical methods used. Section III describes the proposed methodology. Section IV presents the experimental results — distributional characterisation, mixture fits, convergent internal-consistency checks, leave-one-firm-out reproducibility, pixel-identity validation, and full-dataset robustness. Section V discusses the implications and limitations. Section VI concludes with directions for future work.

---

# II. Related Work

> *Inherited largely unchanged from v3.20.0; v4.0 updates only the framing-relevant sentences.*

The literature reviewed in v3.20.0 §II covers offline signature verification [3]–[8], near-duplicate image detection [12], [13], copy-move forgery detection [10], [11], the Hartigan dip test [37], finite mixture EM methods [40], [41], and the Burgstahler-Dichev / McCrary density-smoothness diagnostic [38], [39].
The v3.20.0 §II content is retained without substantive change in v4.0. One v4.0-specific addition belongs in §II: the methodological motivation for *leave-one-firm-out* cross-validation in a small-cluster scope. LOOO is standard in the cross-validation literature [add citation] but has seen only selective use in document-forensics calibration. Our application differs from the standard validation usage in two respects: (i) the hold-out unit is the firm (not the individual CPA or signature), so the analysis directly probes cross-firm reproducibility rather than within-firm sampling variance; and (ii) the held-out predictions are interpreted as a *calibration uncertainty band* on the operational rule, not as a sufficiency claim for the rule itself. We treat LOOO drift as descriptive information about the rule's composition sensitivity rather than as a pass/fail test.

---

# V. Discussion

## A. Non-Hand-Signing Detection as a Distinct Problem

Non-hand-signing differs from forgery in that the questioned signature is produced by its legitimate signer's own stored image rather than by an impostor. The detection problem is therefore framed around *intra-signer image reproduction* rather than *inter-signer imitation*. This framing has analytical consequences. The within-CPA signature distribution is the analytical population of interest; the cross-CPA inter-class distribution is a *reference* against which intra-CPA similarity is interpreted, not the population to be modelled. This contrasts with most prior offline signature verification work, which treats genuine-versus-forged as the central two-class problem.

## B. Per-Signature Similarity is a Continuous Quality Spectrum; the Accountant-Level Distribution is Multimodal

A central empirical finding of v3.x was that *per-signature* similarity does not admit a clean two-mechanism mixture: the dip test fails to reject unimodality at the signature level for Firm A, BIC prefers a 3-component fit, and BD/McCrary candidate transitions lie inside the high-similarity mode rather than between modes. v4.0 inherits and confirms this signature-level reading.

The v4.0 *accountant-level* analysis at the Big-4 scope tells a complementary story. When per-CPA means $(\overline{\text{cos}}_a, \overline{\text{dHash}}_a)$ are computed for the 437 Big-4 CPAs, the Hartigan dip test rejects unimodality on both marginals at $p < 5 \times 10^{-4}$. This is the smallest tested scope at which a finite-mixture model is statistically supported. The accountant-level multimodality reflects between-CPA heterogeneity in signing practice (some CPAs are templated, some are hand-leaning, some are mixed) — not a two-mechanism boundary within any single CPA's signing pattern. The two readings are consistent: a CPA can be hand-leaning on average while still occasionally reusing template signatures, which produces a unimodal per-signature distribution within the CPA but a multimodal per-CPA distribution across CPAs.

## C. Firm A as the Templated End of Big-4 (Case Study, Not Calibration Anchor)

Firm A is empirically the most digitally replicated of the four Big-4 firms in our corpus. In the Big-4 K=3 hard-posterior assignment, Firm A accounts for $0\%$ of the hand-leaning component and $82.5\%$ of the templated component; the opposite pattern holds at one of the non-Firm-A Big-4 firms (Firm C, which has the highest hand-leaning concentration at $23.5\%$). Byte-level pair analysis (inherited from v3.x §IV-F.1) identifies 145 Firm A pixel-identical signatures across 50 distinct Firm A partners, with 35 byte-identical matches spanning different fiscal years.
In v4.0 we treat Firm A as a *templated-end case study* rather than as the calibration anchor for the operational threshold. Firm A enters the Big-4 mixture on equal footing with the other three Big-4 firms; the K=3 components are derived from the joint Big-4 distribution. Firm A's role in the methodology is descriptive: it is the within-Big-4 firm whose CPAs are most concentrated in the high-cosine, low-dHash corner, and the byte-level pair evidence provides the firm-level signature-reuse ground truth that anchors §IV-H's positive-anchor miss-rate analysis. This is a deliberate departure from v3.x's "Firm A as replication-dominated calibration reference" framing, motivated by the LOOO evidence below.

## D. K=2 is Firm-Mass Driven; K=3 is Reproducible in Shape

Leave-one-firm-out cross-validation of the Big-4 mixture fit reveals a sharp contrast between K=2 and K=3 behaviour. K=2 is unstable: across-fold cosine-crossing deviation is $0.028$, and holding Firm A out gives a fold rule (cos $> 0.938$, dHash $\leq 8.79$) that classifies $100\%$ of held-out Firm A as templated, while holding any non-Firm-A Big-4 firm out gives a fold rule near (cos $> 0.975$, dHash $\leq 3.76$) that classifies $0\%$ of the held-out firm as templated. The K=2 boundary is essentially a Firm-A-vs-others separator, not a within-Big-4 mechanism boundary. This is a *negative result* about K=2 as an operational rule and motivates the K=3 selection that follows.

K=3, in contrast, has a *reproducible component shape*: across the four folds the hand-leaning component C1 cosine mean varies by at most $0.005$, the dHash mean by at most $0.96$, and the weight by at most $0.023$. Hard-posterior membership for the held-out firm is more composition-sensitive (absolute differences $1.8$–$12.8$ pp across folds).
We do not use K=3 hard-posterior membership as an operational classifier; we use the component-shape stability as descriptive evidence that a cross-firm hand-leaning sub-population exists at the Big-4 scope, and we report the membership uncertainty as a calibration band.

## E. Three-Score Convergent Internal Consistency

Three feature-derived scores agree on the per-CPA hand-leaning ranking at Spearman $\rho \geq 0.879$: the K=3 mixture posterior; the reverse-anchor cosine percentile under a non-Big-4 reference distribution; and the inherited box-rule hand-leaning rate. The three scores are *not* statistically independent measurements — they are functions of the same per-CPA descriptor pair — so the convergence is documented as internal consistency rather than external validation against an independent ground truth (which the corpus does not provide for the hand-signed class). The strength of the convergence (all pairwise $|\rho| > 0.87$) and its persistence at the signature level (Cohen $\kappa = 0.87$ between per-CPA-fit and per-signature-fit K=3 binary labels) are nevertheless informative: per-CPA aggregation does not collapse the broad three-component ordering, and three different summarisations of the descriptor space — mixture posterior, deviation from an external reference, and a published box rule — produce the same per-CPA ranking.

## F. Pixel-Identity as a Hard Positive Anchor; Inherited Inter-CPA Negative-Anchor FAR

The only hard ground-truth subset in the corpus is pixel-identical signatures: those whose nearest same-CPA match is byte-identical after crop and normalisation. Independent hand-signing cannot produce byte-identical images, so these signatures are conservative-subset ground truth for the *replicated* class.
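The Wilson bounds used in the anchor analyses follow the standard score-interval formula; a minimal sketch (z = 1.96 for a 95% interval; the function name `wilson_upper` is ours):

```python
def wilson_upper(k, n, z=1.96):
    """Upper end of the Wilson score interval for k hits in n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    centre = p + z * z / (2 * n)
    half = z * (p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5
    return (centre + half) / (1 + z * z / n)


# 0 missed positives out of 262 byte-identical anchors:
# the 95% upper bound on the true miss rate is about 0.0145.
miss_rate_upper = wilson_upper(0, 262)
```

Unlike the naive Wald interval, the Wilson interval gives a non-degenerate upper bound when the observed count is zero, which is exactly the situation for the positive-anchor miss rate here.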
On the Big-4 subset ($n = 262$ pixel-identical signatures), all three candidate classifiers — the inherited box rule, the K=3 hard label, and the reverse-anchor metric with a prevalence-calibrated cut — achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$). We caution that this result is necessary but not sufficient: for the box rule it is close to tautological, because byte-identical neighbours have cosine $\approx 1$ and dHash $\approx 0$, well inside the rule's high-confidence region. The corresponding signature-level *negative* anchor (the inter-CPA negative anchor, inherited from v3.20.0 §IV-F.1) provides specificity evidence: at the operational cosine cut of $0.95$, the inter-CPA FAR is $0.0005$ (Wilson 95% CI $[0.0003, 0.0007]$).

## G. Limitations

Several limitations should be made transparent.

*Scope.* The v4.0 primary analyses are scoped to the Big-4 sub-corpus, restricted from the full $n = 686$ accountant population because the dip-test multimodality and the LOOO firm-fold structure are not available at narrower or broader scopes in our setting. Generalisation to mid- and small-firm contexts is left as future work; the §IV-K full-dataset robustness check shows the Spearman convergence is preserved at $n = 686$ but does not validate the Big-4 operational thresholds at the broader scope.

*No signature-level hand-signed ground truth.* The corpus does not contain a labelled hand-signed class at the signature level, so the reverse-anchor classifier's cut is chosen by prevalence calibration against the inherited box rule's overall replicated rate rather than by direct ROC optimisation. Future work could supplement v4.0 with a small human-rated validation set.

*Pixel-identity is a conservative subset.* Byte-identical pairs are the easiest replicated cases.
A classifier that fails the pixel-identity check would be disqualified, but passing the check does not guarantee correct behaviour on the broader replicated population (e.g., re-stamped or noisy-template-variant signatures).

*Inherited rule components are not separately v4-validated.* The five-way classifier's moderate-confidence band (cos $> 0.95$ AND $5 < \text{dHash} \leq 15$), the style-consistency band ($\text{dHash} > 15$), and the document-level worst-case aggregation rule retain their v3.20.0 calibration and capture-rate evidence; v4.0's convergent-checks evidence (§III-K, §IV-F) covers only the binary high-confidence sub-rule.

*A1 pair-detectability stipulation.* The per-signature detector requires at least one same-CPA pair to be near-identical when a CPA uses image replication. A1 is plausible for high-volume stamping or firm-level electronic signing but not guaranteed when a corpus contains only one observed replicated report for a CPA, multiple template variants used in parallel, or scan-stage noise that pushes a replicated pair outside the detection regime.

*Composition sensitivity of K=3.* The K=3 hard-posterior membership for any single firm varies by up to $12.8$ pp across LOOO folds. This is documented as calibration uncertainty rather than failure, but it means K=3 hard labels are not used as v4.0 operational classifier output.

*Domain ground truth.* v4.0 reports population-level patterns; it does not perform partner-level mechanism attribution or make report-level claims of intent. The reported outputs are signature-level quantities throughout.

---

# VI. Conclusion and Future Work

We present a fully automated pipeline for detecting non-hand-signed CPA signatures in Taiwan-listed financial audit reports and a methodological framework for characterising and validating it at the Big-4 sub-corpus scope.
The pipeline processes raw PDFs through VLM-based page identification, YOLO-based signature detection, ResNet-50 feature extraction, and dual-descriptor (cosine + independent-minimum dHash) similarity computation. Applied to 90,282 audit reports filed between 2013 and 2023, it extracts 182,328 signatures from 758 CPAs, with the Big-4 sub-corpus (437 CPAs / 150,442 signatures) as the primary analytical population.

Our central methodological contributions are: (1) the identification of the Big-4 sub-corpus as the smallest tested scope at which an accountant-level finite-mixture model is statistically supportable (dip-test $p < 5 \times 10^{-4}$); (2) the recognition that a 2-component mixture at this scope is a firm-mass separator rather than a mechanism boundary, evidenced by leave-one-firm-out instability; (3) a 3-component mixture whose hand-leaning component shape is LOOO-reproducible (cos drift $\leq 0.005$, dHash drift $\leq 0.96$, weight drift $\leq 0.023$) but whose hard-posterior membership is composition-sensitive ($1.8$–$12.8$ pp across folds), supporting K=3 as descriptive characterisation rather than operational classifier; (4) three feature-derived scores that converge on the per-CPA hand-leaning ranking at Spearman $\rho \geq 0.879$, with the explicit caveat that the scores are not statistically independent; (5) $0\%$ positive-anchor miss rate on 262 byte-identical Big-4 signatures (Wilson 95% upper bound $1.45\%$); and (6) preservation of the K=3 + box-rule Spearman convergence at the full $n = 686$ CPA scope (drift $0.007$), demonstrating cross-scope pipeline reproducibility.

Future work falls into four directions. *First*, a small-scale human-rated validation set would enable direct ROC optimisation of the reverse-anchor cut and provide a signature-level hand-signed ground truth that v4.0 currently lacks.
*Second*, the within-firm composition sensitivity of K=3 hard labels could be addressed with hierarchical mixture models or partial-pooling formulations that explicitly model firm-level random effects. *Third*, the policy and audit-quality implications of the within-Big-4 contrast — Firm A's $82.5\%$ templated concentration vs Firm C's $23.5\%$ hand-leaning concentration — invite a substantive companion analysis examining whether differences in firm signing practice correlate with established audit-quality measures. *Fourth*, generalisation to mid- and small-firm contexts requires either new methodological scope choices (since mid/small-firm samples do not support the dip-test or LOOO structure we use at Big-4) or a different validation framework altogether.

---

## Notes for Phase 4 close-out

Items remaining for the Phase 4 close-out pass before §I, §II, §V, §VI prose can be moved into the manuscript master file:

1. **Abstract word count.** Current draft is approximately 235 words; verify against IEEE Access's $\leq 250$ word constraint after final paragraph edits.
2. **§I contributions list (8 items).** v3.20.0's contribution list had 7 items; v4.0's has 8 to reflect the Big-4 scope, the K=3 descriptive role, and the three-score convergence as separate contributions. Confirm whether the journal style supports 8 contributions or whether items can be merged.
3. **§II Related Work LOOO citation.** A standard cross-validation citation for the LOOO addition is flagged "[add citation]" in the draft and needs to be filled with a specific reference (Geisser 1975 / Stone 1974 / a modern survey).
4. **§V-G Limitations.** The seven limitations are listed flat; the journal style may prefer them grouped (scope vs ground truth vs methodology) — consider reorganisation at copy-edit time.
5. **§VI Future Work directions.** Four directions are listed; the third (audit-quality companion analysis) ties to the Paper B placeholder in the project memory and should be cross-checked for consistency with the planned Paper B framing.
6. **Internal draft note + this close-out checklist.** Strip before submission packaging, per the across-paper "internal — remove before submission" policy applied to §III v6 and §IV v3.2 draft notes.