Codex GPT-5.5 3-axis peer review (paper/codex_review_gpt55_v4_round_3axis.md)
identified 11 MAJOR + 5 MINOR + 0 BLOCKER findings on three axes: (1)
abstract/body tone consistency, (2) methodology clarity / v3 residue,
(3) no implicit within-CPA or cross-year signature-consistency
assumptions. 13 patches applied across 4 source files and mirrored in
paper_a_v4_combined.md.

Axis 1 (tone consistency between abstract and body):
- S I L33: "resolves the ambiguity" -> "provides complementary evidence
for screening cases where ... hypotheses diverge"
- S I L35: "disproves the distributional-threshold path" -> "does not
support the distributional-threshold path"
- S I L37 / S V-F L29: "characterise the deployed five-way classifier
at three units" -> "characterise the deployed HC sub-rule and
document-level HC+MC alarm derived from the five-way classifier at
three units" (consistent with S V-H which says only HC sub-rule and
HC+MC alarm are re-characterised by the present ICCR battery)
- S I L39 / S V-C / S III-L.4: "consistent with firm-specific template,
stamp, or document-production reuse mechanisms" -> "consistent with --
but does not independently establish -- firm-level template-like
reuse, digitisation-pipeline homogeneity, or signing-style
homogeneity, which descriptor-only data cannot separate (S V-H)"
(mirrors abstract)

Axis 2 (methodology clarity / v3 residue):
- S III-G: added unit-bridge sentence distinguishing "descriptor-summary
units" (signature/accountant) from "operational reporting units"
(per-comparison/per-signature/per-document, S III-L)
- S III-H.2: "The calibration distinguishes two reference populations"
  -> "The supporting diagnostics use two reference populations", with an
  explicit statement that "neither is the calibration anchor"
- S III-L.1: "specificity" -> "ICCR refinement"
- S III-L.2: added "descriptive intuition, not an independence
assumption used for estimation" caveat after the 1-(1-p)^n form

Axis 3 (no implicit signature-consistency assumptions):
- S III-F: hand-signing motivation rewritten as a working hypothesis,
  noting that "the classifier does not require ... to hold for all CPAs"
- S III-G A1: added "A1 does not assume temporal stability of
handwriting or scanning workflow within or across years"
- S III-H.1: added label-caveat paragraph (operational rule outputs,
not validated ground-truth classes); HC "strong replication evidence"
-> "image-similarity evidence consistent with replication"; HSC
"consistent with a CPA who signs very consistently" -> "mechanism not
  resolved by descriptor data alone"; LH now explicitly acknowledges that
  cross-year handwriting drift, scanner-workflow change, or template
  variant rotation can also yield low max-cosine within a same-CPA pool
- S III-L.6 / S IV-M.6: "same-CPA repeatability signal" -> "observed
same-CPA-pool excess ... not attributed to within-CPA handwriting
repeatability"

Deferred (structural changes, not single-sentence patches): Codex
S III-I.2 / S III-J K=2/K=3 deduplication; Codex S III-K LOOO / S III-J
duplication. Both are MINOR stylistic redundancies, not reviewer-rejection
risks.

DOCX rebuilt via export_v3.py; v4.0_20260515 file refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>