Apply codex round-27 narrow fixes; Phase 4 prose v2.1

Codex round 27 returned Minor Revision: 10/11 Major + 14/15 Minor CLOSED. Two narrow residuals applied: 1. §V-F line 99 'all three candidate classifiers' replaced with 'all three candidate checks' with explicit enumeration (the inherited box rule, the K=3 hard label, and the prevalence-calibrated reverse-anchor cut). Keeps the K=3 hard label explicitly descriptive rather than operational. 2. Close-out checklist's stale '~235 words' abstract claim updated to the verified 243-244 word count. Deferred to manuscript-assembly time (not blockers for Phase 5 cross-AI peer review): - §II [42]-[44] citation finalisation (placeholders are transparent in the current draft state). - Internal draft notes and close-out checklists (these explicitly help reviewers track the convergence cycle). - Manuscript-level lint pass (last step before submission packaging). Closure summary across 7 codex rounds (21-27): - Empirical: ALL Major + Minor findings CLOSED on the §III/§IV/Phase 4 substantive content. - Packaging: 2 OPEN items (§II citations, internal notes) intentionally deferred to manuscript-assembly time. Phase 5 readiness: substantively YES. The §III v6 + §IV v3.2 + Phase 4 v2.1 is converged for cross-AI peer review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> EOF
2026-05-13 00:15:35 +08:00
parent 918d55154a
commit 6db5d635f5
2 changed files with 104 additions and 2 deletions
@@ -32,7 +32,7 @@ The methodological reframing relative to earlier versions of this work is centra

 A 3-component fit (BIC lower by $3.48$) instead identifies three accountant-level components: a templated cluster (cos $\approx 0.983$, dHash $\approx 2.41$, weight $0.321$), a mixed cluster (cos $\approx 0.956$, dHash $\approx 6.66$, weight $0.536$), and a cross-firm hand-leaning cluster (cos $\approx 0.946$, dHash $\approx 9.17$, weight $0.143$). The hand-leaning component shape is reproducible across LOOO folds (max drift $0.005$ in cos, $0.96$ in dHash, $0.023$ in weight); hard-posterior membership remains composition-sensitive (absolute differences $1.8$–$12.8$ pp), so we use K=3 as an *accountant-level descriptive characterisation*, not as an operational classifier. The signature-level operational classifier remains the inherited Paper A v3.x five-way box rule, retained for continuity with the prior literature.

-Three feature-derived scores converge on the per-CPA hand-leaning ranking with Spearman $\rho \geq 0.879$: the K=3 mixture posterior; a reverse-anchor cosine percentile relative to a strictly-out-of-target non-Big-4 reference distribution; and the inherited box-rule hand-leaning rate. The three scores are not statistically independent — they are deterministic functions of the same per-CPA descriptor pair — so the convergence is documented as internal consistency among feature-derived ranks rather than as external validation. Hard ground truth for the *replicated* class is provided by 262 byte-identical signatures in the Big-4 subset (Firm A 145, Firm B 8, Firm C 107, Firm D 2), against which all three candidate scores achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$). For the box rule this result is close to tautological at byte-identity, and we discuss the conservative-subset caveat in §V-G.
+Three feature-derived scores converge on the per-CPA hand-leaning ranking with Spearman $\rho \geq 0.879$: the K=3 mixture posterior; a reverse-anchor cosine percentile relative to a strictly-out-of-target non-Big-4 reference distribution; and the inherited box-rule hand-leaning rate. The three scores are not statistically independent — they are deterministic functions of the same per-CPA descriptor pair — so the convergence is documented as internal consistency among feature-derived ranks rather than as external validation. Hard ground truth for the *replicated* class is provided by 262 byte-identical signatures in the Big-4 subset (Firm A 145, Firm B 8, Firm C 107, Firm D 2), against which all three candidate checks (the inherited box rule, the K=3 hard label, and the prevalence-calibrated reverse-anchor cut) achieve $0\%$ positive-anchor miss rate (Wilson 95% upper bound $1.45\%$). For the box rule this result is close to tautological at byte-identity, and we discuss the conservative-subset caveat in §V-G.

 We apply this pipeline to 90,282 audit reports filed by publicly listed companies in Taiwan between 2013 and 2023, extracting and analyzing 182,328 individual CPA signatures from 758 unique accountants. The Big-4 sub-corpus comprises 437 CPAs and 150,442 signatures with both descriptors available.

@@ -142,7 +142,7 @@ Future work falls in four directions. *First*, a small-scale human-rated validat

 Items remaining for the Phase 4 close-out pass before §I, §II, §V, §VI prose can be moved into the manuscript master file:

-1. **Abstract word count.** Current draft is approximately 235 words; verify against IEEE Access's $\leq 250$ word constraint after final paragraph edits.
+1. **Abstract word count.** Current draft is 243–244 words (shell `wc -w` on the paragraph returns 243; one-token tokenization difference depending on counter); both satisfy IEEE Access's $\leq 250$ word constraint with $\sim 6$ words of margin.
 2. **§I contributions list (8 items).** v3.20.0's contribution list had 7 items; v4.0's has 8 to reflect the Big-4 scope, K=3 descriptive role, and three-score convergence as separate contributions. Confirm whether the journal style supports 8 contributions or whether items can be merged.
 3. **§II Related Work LOOO citation.** A standard cross-validation citation for the LOOO addition is flagged "[add citation]" in the draft and needs to be filled with a specific reference (Geisser 1975 / Stone 1974 / a modern survey).
 4. **§V-G Limitations.** The seven limitations are listed flat; the journal style may prefer them grouped (scope vs ground-truth vs methodology) — consider reorganisation at copy-edit time.