62a22ceb83
Codex gpt-5.5 xhigh review of v1 draft returned Major Revision with
5 Major findings + 7 Minor + editorial nits. v2 addresses all of
them.
Key v2 changes:
1. Primary classifier declared: inherited v3.x five-way per-signature
box rule. K=3 mixture is demoted to accountant-level descriptive
characterisation (Script 35 / Script 38 footing), explicitly NOT
used to assign signature- or document-level labels.
2. §III-J reframed as "Mixture Model and Accountant-Level
Characterisation" (was "Mixture Model and Operational Threshold
Derivation"). K=3 LOOO P2_PARTIAL verdict surfaced in prose
including the "not predictively useful as an operational
classifier" interpretation from the Script 37 verdict legend.
3. §III-K renamed "Convergent Internal-Consistency Checks" (was
"Convergent Validation") with explicit caveat that the three
scores share underlying features and are not statistically
independent measurements.
4. §III-H reverse-anchor paragraph rewritten: the directional
error in v1 (the non-Big-4 reference described as a "more-
replicated-population baseline") is corrected -- the reference
is in fact in the LESS-replicated regime relative to Big-4,
and the score measures deviation in the hand-leaning direction.
5. Pixel-identity metric renamed from "FAR" to "positive-anchor
miss rate" with explicit conservative-subset caveat
("near-tautological for the box rule because byte-identical
=> cosine ~1 / dHash ~0").
6. §III-L title changed to "Signature- and Document-Level
Classification" (was "Per-Document Classification") and
reorganised so the per-signature five-way rule + document-level
worst-case aggregation are both clearly under this section.
7. Empirical slips corrected:
- K=2 LOOO comparison: now correctly says "5.6x the stability
tolerance 0.005" rather than "5.6x the bootstrap CI half-width";
full-Big-4 bootstrap half-width 0.0015 cited separately.
- all-non-Firm-A dip: now correctly (0.998, 0.907), not "p > 0.99".
- BD/McCrary: now narrowed to Big-4 scope (Script 34 null), with
Script 32 dHash transitions for non-Big-4 subsets noted but
not used as operational thresholds.
- Firm A byte-identical "50 partners of 180 registered, 35
cross-year" -- now explicitly inherited from v3.x §IV-F.1 /
Script 28 / Appendix B; provenance row in the new table flags
this as inherited, not v4-regenerated.
- "mid/small-firm tail actively pulling" -> "the full-sample and
Big-4-only calibrations differ" (causal language softened).
- $\Delta\text{BIC}$ sign convention: explicit "lower BIC is
preferred; BIC(K=3) - BIC(K=2) = -3.48".
8. Editorial nits applied:
- "failure rate" -> "box-rule hand-leaning rate"
- "boundary moves modestly" -> "membership remains
composition-sensitive"
- "calibration uncertainty band +/- 5-13 pp" -> "observed absolute
differences of 1.8-12.8 pp, with Firm C exceeding the 5 pp
viability bar"
- "strongest single methodology-validation signal" -> "strongest
internal-consistency signal"
- "the same component structure recovers" -> "a broadly similar
three-component ordering recovers"
- Cross-reference index marked as author checklist (remove
before submission).
9. New provenance table at end of §III mapping every numerical claim
to (script, source, direct/derived/inherited).
10. Open questions reduced from 5 to 3 (codex resolved questions 2,
3, 4 with concrete answers); remaining 3 are forward-looking
(5-way moderate band, pseudonym consistency, table numbering).
Also commits: paper/codex_review_gpt55_v4_round1.md (codex review
artifact, 143 lines).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>