gbanyan b884d39544 Apply Phase 5 round-2 fixes from Opus M1-M4 + Gemini Table XV footnote
Addresses round-1 findings from all three AI reviewers in a single
pass. Substantive empirical content unchanged; fixes are factual
corrections, terminology consistency, and table-numbering hygiene.

Opus M3 (Abstract-level factual misstatement): the claim "98-100% of
inter-CPA collisions within source firm", repeated in Abstract / §I
body / §I item 6 / §V-C / §V-G limitation 2 / §VI item 4 / §VI Future
Work, conflated the same-pair joint rate (97.0-99.96%) with the
any-pair deployed rule rate (76.7-98.8% across Firms A/B/C/D: Firm A
98.8, B 76.7, C 83.7, D 77.4, from Table XXV). Replaced with the
actual any-pair range and explicit same-pair sub-range. Removed
§V-C's "regardless of which Big-4 firm is the source", since
within-firm concentration is firm-dependent.

Opus M1 (§IV K=3 mechanism-label reversion): §IV silently regressed
to v3.x "C1 hand-leaning / C2 mixed / C3 replicated" naming that
§III-J line 90 explicitly retires post-composition-decomposition.
Replaced in Tables IX/X/XIV/XVI/XVII column headers and §IV-F /
§IV-H / §IV-J / §IV-K prose. New convention matches §III-J:
- C1 (hand-leaning) -> C1 (low-cos / high-dHash)
- C2 (mixed) -> C2 (central)
- C3 (replicated) -> C3 (high-cos / low-dHash)
- "hand-leaning rate" -> "less-replication-dominated rate"
"Replicated class" retained where it refers to byte-identical
ground truth (lines 143 and 153: actual byte-level reuse, not K=3
mechanism inference).

Opus M4 (§V duplicate G heading): Phase 4 prose §V had "G.
Pixel-Identity..." at line 105 and "G. Limitations" at line 109.
Renamed second heading to "H. Limitations".

Opus M2 + Gemini Table XV-B (table-numbering cascade): Renamed
Table XV-B to Table XIX, then cascaded XIX -> XX -> ... -> XXV ->
XXVI to keep sequential integer numbering. Cross-reference at
§IV-J also updated. No cross-refs to these tables exist outside §IV
(verified by grep against §III + Phase 4 prose).
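
A cascade like this is easy to get wrong if the renames are applied one
after another, because a freshly written "Table XIX" can be picked up
again by the next XIX -> XX step. The sketch below is illustrative only
and is not the script used for this commit: it applies the whole mapping
in a single regex pass so no reference is renamed twice. The names
RENUMBER and renumber_tables are hypothetical, and it assumes references
are written as "Table <label>" (plural forms such as "Tables IX/X" are
not handled).

```python
import re

# Roman labels involved in the cascade described above (assumed spelling).
ROMAN = ["XIX", "XX", "XXI", "XXII", "XXIII", "XXIV", "XXV", "XXVI"]

# Old label -> new label: XV-B takes XIX, and XIX..XXV each shift up by one.
RENUMBER = {"XV-B": "XIX"}
RENUMBER.update({old: new for old, new in zip(ROMAN[:-1], ROMAN[1:])})

# Try longer labels first; the lookahead rejects partial matches such as
# reading "XX" out of "XXIV", or "XV" out of an unrelated longer label.
_LABELS = sorted(RENUMBER, key=len, reverse=True)
_PATTERN = re.compile(
    r"\bTable (" + "|".join(map(re.escape, _LABELS)) + r")(?![\w-])"
)


def renumber_tables(text: str) -> str:
    """Rewrite every affected 'Table <label>' reference in one pass."""
    return _PATTERN.sub(lambda m: "Table " + RENUMBER[m.group(1)], text)
```

Because the substitution is single-pass, the output of one rename is
never re-scanned; "Table XV" (without the -B suffix) is left untouched.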

Gemini sample-size footnote (Table XV): expanded the source note
to explicitly explain the 150,442 (descriptor-complete) vs 150,453
(vector-complete) distinction across §IV sub-sections and point
back to §III-G sample-size reconciliation.

§III prose softening (lines 99, 283): the "nearly all (98%)" framing,
which read the Firm A rate as representative of all four Big-4 firms,
replaced with the per-firm any-pair / same-pair breakdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:57:19 +08:00