pdf_signature_extraction/paper/paper_a_impact_statement_v3.md
gbanyan 9e68f2e1d3 Phase 6 round-3 codex-review fixes: blockers + majors + minors
Resolved Codex review (gpt-5.5 xhigh) findings against b6913d2.

BLOCKERS:
- Appendix B reference mismatch: rewrote all main-text "Appendix B" references
  to "supplementary materials" since Appendix B is now a redirect stub. Affected
  the SSIM design-argument pointer, threshold provenance, byte-level
  decomposition, MC band capture-rate, and backbone-ablation table references
  across §III-F / §III-H.1 / §III-H.2 / §III-K / §III-L.4 / §III-M / §IV-F /
  §IV-J / §IV-K / §IV-L / §V-C / §V-H.
- Table rendering: un-commented Tables I-IV (Dataset Summary, YOLO Detection,
  Extraction Results, Cosine Distribution Statistics) which were inside HTML
  comment blocks and would not have rendered in the submission.
- Table numbering out of order: Table XIX appeared before Tables XVI-XVIII.
  Renumbered XIX -> XVI (document-level worst-case counts), XVI -> XVII (Firm x
  K=3 cross-tab), XVII -> XVIII (K=3 component comparison), XVIII -> XIX
  (Spearman correlation). Cross-references updated in §IV-J / §IV-K and §V-C.
- Table V mis-citation: §IV-C said "KDE crossover ... (Table V)" but Table V is
  the dip test. Dropped the (Table V) tag; crossover is a textual finding.
- Submission cleanup: wrapped the archived Impact Statement section heading and
  body inside the existing HTML comment (it was previously rendering). Wrapped
  the funding placeholder in an HTML comment with a TO-DO note (it won't render
  but is preserved as a reminder).
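
The table renumbering listed above is a cyclic mapping (XIX -> XVI -> XVII -> XVIII -> XIX), so applying the substitutions one after another would overwrite references that were already updated. A minimal sketch of a single-pass renumbering follows; the function name `renumber_tables` and the assumption that references appear in the form "Table <numeral>" are illustrative and are not taken from the repository. Range forms such as "Tables XVI-XVIII" would still need manual review.

```python
import re

# Old -> new numbering as listed in the commit message above.
RENUMBER = {"XIX": "XVI", "XVI": "XVII", "XVII": "XVIII", "XVIII": "XIX"}

# Hypothetical reference format: the literal word "Table", a space, a numeral.
# Longer numerals come first so "XVIII" is tried before "XVII" and "XVI".
_PATTERN = re.compile(r"\bTable (XVIII|XVII|XVI|XIX)\b")


def renumber_tables(text: str) -> str:
    """Apply every old -> new mapping in one pass so no reference is rewritten twice."""
    return _PATTERN.sub(lambda m: "Table " + RENUMBER[m.group(1)], text)
```

Because each match is looked up against its original numeral, a reference that becomes "Table XVI" is never picked up again by the XVI -> XVII rule.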

MAJORS:
- Line 1077 numerical conflation: rewrote the §V-C / §III-L.4 paragraph that
  labelled Firm A's per-document HC+MC inter-CPA proxy ICCR of 0.6201 as a rate
  "on real same-CPA pools." 0.6201 is a counterfactual proxy under inter-CPA
  candidate-pool replacement, not the observed rate. Added explicit disambiguation:
  the corresponding observed rate from Table XVI (formerly XIX) is 97.5%
  HC+MC for Firm A; the proxy and observed rates measure different quantities.
- Residual "validation" language softened: "Dual-descriptor verification" ->
  "Dual-descriptor similarity"; "we validate the backbone choice" -> "we
  support the backbone choice"; "pixel-identity validation" -> "pixel-identity
  positive-anchor check"; "## M. Validation Strategy and Limitations under
  Unsupervised Setting" -> "## M. Unsupervised Diagnostic Strategy and Limits".
- "Specificity behaviour" overclaim: "characterises the cosine threshold's
  specificity behaviour" -> "specificity-proxy behaviour" (methodology §III-L.0
  and discussion §V-F).
- "Prior published / prior calibration" ambiguity: replaced "prior published
  per-comparison rate" with "the corpus-wide rate reported in §IV-I"; replaced
  "(prior published operating point)" with "(alternative operating point from
  supplementary calibration evidence)" in Table XXI; replaced "prior reporting
  and the existing literature" with "the existing literature and the
  supplementary calibration evidence."

MINORS:
- Line 116 Bayes-optimal qualifier: "the local density minimum ... is the
  Bayes-optimal decision boundary under equal priors" -> "In idealized
  two-class mixture settings with equal priors and equal misclassification
  costs, the local density minimum ... coincides with the Bayes-optimal
  decision boundary."
- Stale section refs: §V-G for the fine-tuning caveat retargeted to §V-H
  Engineering-level caveats (where it lives after the §V-H reorganisation);
  §III-L for the worst-case rule retargeted to §III-H.1; "Section IV-D.2"
  (nonexistent) retargeted to "Section IV-D Table VI."
- Abstract / Introduction "after pool-size adjustment": separated the
  document-level D2 proxy ICCR claim from the per-signature logistic regression
  claim. Now: "Per-document D2 inter-CPA proxy ICCRs differ by an order of
  magnitude across firms ... a per-signature logistic regression confirms the
  firm gap persists after pool-size control."
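
The Bayes-optimal qualifier above can be illustrated with a small worked case. This is a sketch under assumptions that are not stated in the manuscript: two Gaussian class densities with a shared variance $\sigma^2$ and means $\mu_1 < \mu_2$, chosen only because the algebra is short.

```latex
% Equal priors: the mixture density is
f(x) = \tfrac{1}{2}\,\phi\!\left(\tfrac{x-\mu_1}{\sigma}\right)\tfrac{1}{\sigma}
     + \tfrac{1}{2}\,\phi\!\left(\tfrac{x-\mu_2}{\sigma}\right)\tfrac{1}{\sigma}.

% With equal priors and equal misclassification costs, the Bayes rule assigns x
% to the class with the larger density, so the boundary x^* satisfies
\phi\!\left(\tfrac{x^*-\mu_1}{\sigma}\right) = \phi\!\left(\tfrac{x^*-\mu_2}{\sigma}\right)
\quad\Longrightarrow\quad
x^* = \tfrac{\mu_1+\mu_2}{2}.

% f is symmetric about x^*, hence f'(x^*) = 0, and x^* is a local minimum
% of f exactly when the mixture is bimodal:
\mu_2 - \mu_1 > 2\sigma.
```

In this idealized case the local density minimum and the Bayes boundary are the same point. With unequal priors, unequal costs, or unequal variances the two points generally differ, which is why the reworded sentence restricts the claim.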

NIT:
- Related Work HTML comment "(see paper_a_references_v3.md for full list)"
  -> "(full list in the References section)"; removes the version-coded
  filename reference from the source.

Artefacts:
- Combined manuscript regenerated: paper_a_v4_combined.md, 1312 lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:28:14 +08:00
