Codex round-3 verification (commit 9e68f2e) flagged remaining items:
BLOCKER:
- Appendix A Table A.I was inside an HTML comment but visible prose at L1268
and L1275 referenced it as if it rendered. Un-commented Table A.I (same fix
pattern as Tables I-IV in round-3).
MAJOR:
- Table XXVII (diagnostic summary) appeared in §III-M at L593 before Tables
III-XXVI in §IV, breaking sequential IEEE-style numbering. Moved the table
to Appendix A as Table A.II (new §A.2 "Diagnostic Summary"). §III-M now
references "Appendix A Table A.II" instead of hosting the table inline;
§VI Conclusion contribution (8) updated similarly. Appendix A heading
generalised to "Supplementary Diagnostic Detail" with §A.1 (BD/McCrary)
and §A.2 (Diagnostic Summary).
- §III-M L614 rate-definition conflation: the sentence "per-signature and
per-document rates ($0.11$ and $0.34$ respectively under the deployed
any-pair HC + MC alarm)" mixed unit labels. Rewrote to label each rate by
its actual unit and source table: per-signature any-pair HC ICCR ($0.11$;
Table XXII) and per-document HC+MC alarm-rate ICCR ($0.34$; Table XXIII).
MINORS:
- L400 §III-L stale ref retargeted: "operational signature-level classifier
of §III-L" -> "of §III-H.1 calibrated in §III-L".
- L663 §III-L stale ref retargeted: "KDE crossover used in per-document
classification (Section III-L)" -> "(Section III-H.1)".
- L1146 Conclusion "corpus-universal" overstated generality -> "across the
tested eligible scopes" (non-Big-4 evidence is cosine-axis only, not full
calibration scope).
- L1024 Results §IV-M.4 footnote: np.argmax / np.unique implementation
detail moved to "alphabetically-ordered tie-break ... full implementation
detail in the supplementary materials" (less script-internal prose).
Artefacts:
- Combined manuscript regenerated: paper_a_v4_combined.md, 1316 lines.
- Final main-text table sequence: I, II, III, ..., XXVI (26 tables, all
sequential in rendered order). Appendix tables: A.I, A.II.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolved Codex review (gpt-5.5 xhigh) findings against b6913d2.
BLOCKERS:
- Appendix B reference mismatch: rewrote all main-text "Appendix B" references
to "supplementary materials" since Appendix B is now a redirect stub. Affected
the SSIM design-argument pointer, threshold provenance, byte-level
decomposition, MC band capture-rate, and backbone-ablation table references
across §III-F / §III-H.1 / §III-H.2 / §III-K / §III-L.4 / §III-M / §IV-F /
§IV-J / §IV-K / §IV-L / §V-C / §V-H.
- Table rendering: un-commented Tables I-IV (Dataset Summary, YOLO Detection,
Extraction Results, Cosine Distribution Statistics) which were inside HTML
comment blocks and would not have rendered in the submission.
- Table numbering out of order: Table XIX appeared before Tables XVI-XVIII.
Renumbered XIX -> XVI (document-level worst-case counts), XVI -> XVII (Firm x
K=3 cross-tab), XVII -> XVIII (K=3 component comparison), XVIII -> XIX
(Spearman correlation). Cross-references updated in §IV-J / §IV-K and §V-C.
- Table V mis-citation: §IV-C said "KDE crossover ... (Table V)" but Table V is
the dip test. Dropped the (Table V) tag; crossover is a textual finding.
- Submission cleanup: wrapped the archived Impact Statement section heading and
body inside the existing HTML comment (was rendering). Funding placeholder
wrapped in HTML comment with a TO-DO note (won't render but is preserved as
reminder).
MAJORS:
- Line 1077 numerical conflation: rewrote the §V-C / §III-L.4 paragraph that
labelled Firm A's per-document HC+MC inter-CPA proxy ICCR of 0.6201 as a rate
"on real same-CPA pools." 0.6201 is a counterfactual proxy under inter-CPA
candidate-pool replacement, not the observed rate. Added explicit disambig:
the corresponding observed rate from Table XVI (formerly XIX) is 97.5%
HC+MC for Firm A; the proxy and observed rates measure different quantities.
- Residual "validation" language softened: "Dual-descriptor verification" ->
"Dual-descriptor similarity"; "we validate the backbone choice" -> "we
support the backbone choice"; "pixel-identity validation" -> "pixel-identity
positive-anchor check"; "## M. Validation Strategy and Limitations under
Unsupervised Setting" -> "## M. Unsupervised Diagnostic Strategy and Limits".
- "Specificity behaviour" overclaim: "characterises the cosine threshold's
specificity behaviour" -> "specificity-proxy behaviour" (methodology §III-L.0
and discussion §V-F).
- "Prior published / prior calibration" ambiguity: replaced "prior published
per-comparison rate" with "the corpus-wide rate reported in §IV-I"; replaced
"(prior published operating point)" with "(alternative operating point from
supplementary calibration evidence)" in Tables XXI; replaced "prior reporting
and the existing literature" with "the existing literature and the
supplementary calibration evidence."
MINORS:
- Line 116 Bayes-optimal qualifier: "the local density minimum ... is the
Bayes-optimal decision boundary under equal priors" -> "In idealized
two-class mixture settings with equal priors and equal misclassification
costs, the local density minimum ... coincides with the Bayes-optimal
decision boundary."
- Stale section refs: §V-G for the fine-tuning caveat retargeted to §V-H
Engineering-level caveats (where it lives after the §V-H reorganisation);
§III-L for the worst-case rule retargeted to §III-H.1; "Section IV-D.2"
(nonexistent) retargeted to "Section IV-D Table VI."
- Abstract / Introduction "after pool-size adjustment": separated the
document-level D2 proxy ICCR claim from the per-signature logistic regression
claim. Now: "Per-document D2 inter-CPA proxy ICCRs differ by an order of
magnitude across firms ... a per-signature logistic regression confirms the
firm gap persists after pool-size control."
NIT:
- Related Work HTML comment "(see paper_a_references_v3.md for full list)"
-> "(full list in the References section)"; removes the version-coded
filename reference from the source.
Artefacts:
- Combined manuscript regenerated: paper_a_v4_combined.md, 1312 lines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Structural:
- Promote operational classifier definition from §III-L.0 to new §III-H.1, so
the reader meets the five-way HC/MC/HSC/UN/LH rule before the §III-I/J/K
diagnostic chain instead of ~130 lines after. §III-L renamed to
"Anchor-Based Threshold Calibration"; §III-L.0 retains only calibration
methodology, three units of analysis, any-pair semantics, and the FAR
terminological note. §III-L.7 deleted (redundant with §III-J).
- Reorganise §V-H Limitations into Primary / Secondary / Documented features /
Engineering groupings (was a flat 14-item list).
- Reframe §III-M from "ten-tool unsupervised-validation collection" to
"each diagnostic addresses one specific unsupervised failure mode";
rename "What v4.0 does/does not claim" → "Limits / Scope of the present
analysis"; retitle Table XXVII.
Framing alignment (cross-section):
- Strip all v3.x / v4.0 / v3.20 / v4-new / inherited lineage labels from
rendered text (Abstract, Intro, §II, §III, §IV, §V, §VI, Appendix, Impact).
- Replace "Paper A" rule references with "deployed" rule references.
- Soften "validation" to "characterise" / "check" / "screening label" /
"consistency check" / "support"; "verdict" → "screening label".
- Remove codex-verified spike claims (non-Big-4 jittered dHash, Big-4 pooled
cosine after firm-mean centring). Only formally scripted evidence
(Scripts 39b–39e) retained; non-Big-4 evidence framed as corroborating
raw-axis cosine, not as calibration evidence.
- Strip script-provenance parentheticals from Introduction; defer Script 39c
internal references and similar to Methodology / Appendix.
Numerical / table fixes:
- §III-C document-count arithmetic: 12 corrupted → 13 corrupted/unreadable,
verified against sqlite DB and total-pdf/ folder counts (90,282 - 4,198
no-sig - 13 corrupted = 86,071 → 85,042 with detections → 182,328 sigs →
168,755 CPA-matched). Table I shows VLM-positive (86,084) and
processed-for-extraction (86,071) as separate rows.
- Wilson 95% CIs added for joint-rule ICCR rows in Table XXI / methodology
table ([0.00011, 0.00018] and [0.00008, 0.00014]).
- Unit error fixed: 0.3856 pp / 0.4431 pp → 0.3856 (38.6 pp) / 0.4431 (44.3 pp).
Smaller revisions:
- Pipeline framing: "detecting" → "screening" in Abstract / Intro / Conclusion
for consistency with the unsupervised-screening positioning.
- "hard ground-truth subset" → "conservative hard-positive subset" throughout.
- §III-F SSIM / pixel-comparison rebuttal compressed from ~15 lines to 4;
design-level argument deferred to supplementary materials.
- "stakeholders can adopt / can derive thresholds" → "alternative operating
points can be characterised by inverting" (less prescriptive).
- "the same mechanism extending in milder form to Firms B/C/D" → "similar,
milder production-related reuse patterns at Firms B/C/D" (mechanism claim
softened).
- Appendix A "non-hand-signed mode" / "two-mechanism mixture" lineage language
aligned with v4 framing.
Appendix B:
- Rebuilt as a redirect-only stub. The HTML-commented obsolete table mapping
(Table IX–XVIII labels with FAR / capture-rate / validation language) is
removed; replaced with a short paragraph pointing to supplementary
materials for full table-to-script provenance.
Cross-references:
- All §III-L references for the rule definition retargeted to §III-H.1;
references for calibration still point to §III-L.
- §III-H references for byte-level Firm A evidence / non-Big-4 reverse anchor
retargeted to §III-H.2.
Artefacts:
- Combined manuscript regenerated: paper_a_v4_combined.md, 1314 lines
(was 1346 pre-review).
- Two review handoff documents added:
paper/review_handoff_abstract_intro_20260515.md
paper/review_handoff_body_20260515.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>