Structural:
- Promote operational classifier definition from §III-L.0 to new §III-H.1, so
the reader meets the five-way HC/MC/HSC/UN/LH rule before the §III-I/J/K
diagnostic chain instead of ~130 lines after. §III-L renamed to
"Anchor-Based Threshold Calibration"; §III-L.0 retains only calibration
methodology, three units of analysis, any-pair semantics, and the FAR
terminological note. §III-L.7 deleted (redundant with §III-J).
- Reorganise §V-H Limitations into Primary / Secondary / Documented features /
Engineering groupings (was a flat 14-item list).
- Reframe §III-M from "ten-tool unsupervised-validation collection" to
"each diagnostic addresses one specific unsupervised failure mode";
rename "What v4.0 does/does not claim" → "Limits / Scope of the present
analysis"; retitle Table XXVII.
Framing alignment (cross-section):
- Strip all v3.x / v4.0 / v3.20 / v4-new / inherited lineage labels from
rendered text (Abstract, Intro, §II, §III, §IV, §V, §VI, Appendix, Impact).
- Replace "Paper A" rule references with "deployed" rule references.
- Soften "validation" to "characterise" / "check" / "screening label" /
"consistency check" / "support"; "verdict" → "screening label".
- Remove codex-verified spike claims (non-Big-4 jittered dHash, Big-4 pooled
cosine after firm-mean centring). Only formally scripted evidence
(Scripts 39b–39e) retained; non-Big-4 evidence framed as corroborating
raw-axis cosine, not as calibration evidence.
- Strip script-provenance parentheticals from Introduction; defer Script 39c
internal references and similar to Methodology / Appendix.
Numerical / table fixes:
- §III-C document-count arithmetic: 12 corrupted → 13 corrupted/unreadable,
verified against sqlite DB and total-pdf/ folder counts (90,282 - 4,198
no-sig - 13 corrupted = 86,071 → 85,042 with detections → 182,328 sigs →
168,755 CPA-matched). Table I shows VLM-positive (86,084) and
processed-for-extraction (86,071) as separate rows.
- Wilson 95% CIs added for joint-rule ICCR rows in Table XXI / methodology
table ([0.00011, 0.00018] and [0.00008, 0.00014]).
- Unit error fixed: 0.3856 pp / 0.4431 pp → 0.3856 (38.6 pp) / 0.4431 (44.3 pp).
Smaller revisions:
- Pipeline framing: "detecting" → "screening" in Abstract / Intro / Conclusion
for consistency with the unsupervised-screening positioning.
- "hard ground-truth subset" → "conservative hard-positive subset" throughout.
- §III-F SSIM / pixel-comparison rebuttal compressed from ~15 lines to 4;
design-level argument deferred to supplementary materials.
- "stakeholders can adopt / can derive thresholds" → "alternative operating
points can be characterised by inverting" (less prescriptive).
- "the same mechanism extending in milder form to Firms B/C/D" → "similar,
milder production-related reuse patterns at Firms B/C/D" (mechanism claim
softened).
- Appendix A "non-hand-signed mode" / "two-mechanism mixture" lineage language
aligned with v4 framing.
Appendix B:
- Rebuilt as a redirect-only stub. The HTML-commented obsolete table mapping
(Table IX–XVIII labels with FAR / capture-rate / validation language) is
removed; replaced with a short paragraph pointing to supplementary
materials for full table-to-script provenance.
Cross-references:
- All §III-L references for the rule definition retargeted to §III-H.1;
references for calibration still point to §III-L.
- §III-H references for byte-level Firm A evidence / non-Big-4 reverse anchor
retargeted to §III-H.2.
Artefacts:
- Combined manuscript regenerated: paper_a_v4_combined.md, 1314 lines
(was 1346 pre-review).
- Two review handoff documents added:
paper/review_handoff_abstract_intro_20260515.md
paper/review_handoff_body_20260515.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>