Resolved Codex review (gpt-5.5 xhigh) findings against b6913d2.
BLOCKERS:
- Appendix B reference mismatch: rewrote all main-text "Appendix B" references
to "supplementary materials" since Appendix B is now a redirect stub. Affected
the SSIM design-argument pointer, threshold provenance, byte-level
decomposition, MC band capture-rate, and backbone-ablation table references
across §III-F / §III-H.1 / §III-H.2 / §III-K / §III-L.4 / §III-M / §IV-F /
§IV-J / §IV-K / §IV-L / §V-C / §V-H.
- Table rendering: un-commented Tables I-IV (Dataset Summary, YOLO Detection,
Extraction Results, Cosine Distribution Statistics) which were inside HTML
comment blocks and would not have rendered in the submission.
- Table numbering out of order: Table XIX appeared before Tables XVI-XVIII.
Renumbered XIX -> XVI (document-level worst-case counts), XVI -> XVII (Firm ×
K=3 cross-tab), XVII -> XVIII (K=3 component comparison), XVIII -> XIX
(Spearman correlation). Cross-references updated in §IV-J / §IV-K and §V-C.
- Table V mis-citation: §IV-C said "KDE crossover ... (Table V)" but Table V is
the dip test. Dropped the (Table V) tag; crossover is a textual finding.
- Submission cleanup: wrapped the archived Impact Statement section heading and
body inside the existing HTML comment (it was rendering). Funding placeholder
wrapped in an HTML comment with a TODO note (won't render but is preserved as
a reminder).
MAJORS:
- Line 1077 numerical conflation: rewrote the §V-C / §III-L.4 paragraph that
labelled Firm A's per-document HC+MC inter-CPA proxy ICCR of 0.6201 as a rate
"on real same-CPA pools." 0.6201 is a counterfactual proxy under inter-CPA
candidate-pool replacement, not the observed rate. Added an explicit
disambiguation: the corresponding observed rate from Table XVI (formerly XIX)
is 97.5% HC+MC for Firm A; the proxy and observed rates measure different
quantities.
- Residual "validation" language softened: "Dual-descriptor verification" ->
"Dual-descriptor similarity"; "we validate the backbone choice" -> "we
support the backbone choice"; "pixel-identity validation" -> "pixel-identity
positive-anchor check"; "## M. Validation Strategy and Limitations under
Unsupervised Setting" -> "## M. Unsupervised Diagnostic Strategy and Limits".
- "Specificity behaviour" overclaim: "characterises the cosine threshold's
specificity behaviour" -> "specificity-proxy behaviour" (methodology §III-L.0
and discussion §V-F).
- "Prior published / prior calibration" ambiguity: replaced "prior published
per-comparison rate" with "the corpus-wide rate reported in §IV-I"; replaced
"(prior published operating point)" with "(alternative operating point from
supplementary calibration evidence)" in Tables XXI; replaced "prior reporting
and the existing literature" with "the existing literature and the
supplementary calibration evidence."
MINORS:
- Line 116 Bayes-optimal qualifier: "the local density minimum ... is the
Bayes-optimal decision boundary under equal priors" -> "In idealized
two-class mixture settings with equal priors and equal misclassification
costs, the local density minimum ... coincides with the Bayes-optimal
decision boundary." (Worked form sketched after this list.)
- Stale section refs: §V-G for the fine-tuning caveat retargeted to §V-H
Engineering-level caveats (where it lives after the §V-H reorganisation);
§III-L for the worst-case rule retargeted to §III-H.1; "Section IV-D.2"
(nonexistent) retargeted to "Section IV-D Table VI."
- Abstract / Introduction "after pool-size adjustment": separated the
document-level D2 proxy ICCR claim from the per-signature logistic regression
claim. Now: "Per-document D2 inter-CPA proxy ICCRs differ by an order of
magnitude across firms ... a per-signature logistic regression confirms the
firm gap persists after pool-size control."
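The statistical reasoning the new Bayes-optimal qualifier compresses, sketched under its stated idealizations (two-component mixture, equal priors, equal misclassification costs); the equal-variance Gaussian pair below is a hypothetical illustration, not a construction from the manuscript:

```latex
% Equal priors and equal costs: the Bayes rule assigns x to class 1 iff
% f_1(x) > f_2(x), so the decision boundary is the crossing f_1(x^*) = f_2(x^*).
% For two equal-variance components N(\mu_1, \sigma^2), N(\mu_2, \sigma^2),
% separated enough that the mixture is bimodal (|\mu_1 - \mu_2| > 2\sigma),
% symmetry places both the crossing and the mixture antimode at the midpoint:
\[
  f(x) = \tfrac{1}{2} f_1(x) + \tfrac{1}{2} f_2(x), \qquad
  x^{*} = \tfrac{\mu_1 + \mu_2}{2}
  \;\Longrightarrow\;
  f_1(x^{*}) = f_2(x^{*}) \quad\text{and}\quad f'(x^{*}) = 0 .
\]
```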
NIT:
- Related Work HTML comment "(see paper_a_references_v3.md for full list)"
-> "(full list in the References section)"; removes the version-coded
filename reference from the source.
Artefacts:
- Combined manuscript regenerated: paper_a_v4_combined.md, 1312 lines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Abstract
Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization makes it feasible to reuse a stored signature image across reports (through administrative stamping or firm-level electronic signing), thereby undermining individualized attestation. We build an end-to-end pipeline for screening such non-hand-signed signatures at scale: a Vision-Language Model identifies signature pages, YOLOv11 localizes signatures, ResNet-50 supplies deep features, and a dual-descriptor layer combines cosine similarity with an independent-minimum perceptual hash (dHash) to separate style consistency from image reproduction. Applied to 90,282 Taiwan audit reports (2013–2023), the pipeline yields 182,328 signatures from 758 CPAs; primary analyses are scoped to the Big-4 sub-corpus (437 CPAs; 150,442 signatures). Distributional diagnostics show that the apparent multimodality of the descriptor distribution dissolves under joint firm-mean centring and integer-tie jitter (p rises to 0.35), so no within-population bimodal antimode anchors the operational thresholds. We instead adopt an anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units: per-comparison (0.0006 at cosine $> 0.95$; 0.0013 at dHash $\leq 5$; 0.00014 jointly), pool-normalised per-signature (0.11 under the deployed any-pair high-confidence rule), and per-document (0.34 for the operational HC+MC alarm). Firm heterogeneity is decisive: Firm A's per-document HC+MC inter-CPA proxy ICCR is 0.62 versus 0.09–0.16 at Firms B/C/D, and a per-signature logistic regression confirms the firm gap persists after controlling for pool size; under the deployed any-pair rule 77–99% of inter-CPA collisions concentrate within the source firm, consistent with firm-level template-like reuse. We position the system as a specificity-proxy-anchored screening framework with human-in-the-loop review, not as a validated forensic detector; no calibrated error rates are reportable without signature-level ground truth.
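A minimal sketch of the dual-descriptor comparison the abstract describes, assuming ResNet-50 embeddings as NumPy vectors and signature crops as PIL images. The function names (`cosine_similarity`, `dhash`, `joint_flag`) are illustrative, the dHash variant shown is a standard 64-bit difference hash, and only the joint thresholds stated above are taken from the abstract; the deployed HC/MC banding and pool-normalised ICCR calibration are more involved than this single rule.

```python
import numpy as np
from PIL import Image

COS_T = 0.95    # cosine-similarity threshold stated in the abstract
DHASH_T = 5     # dHash Hamming-distance threshold stated in the abstract

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two deep-feature vectors (e.g. ResNet-50 embeddings)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dhash(img: Image.Image, size: int = 8) -> int:
    """64-bit difference hash: grayscale, resize to (size+1) x size,
    then threshold horizontal adjacent-pixel gradients."""
    g = np.asarray(img.convert("L").resize((size + 1, size)), dtype=np.int16)
    bits = (g[:, 1:] > g[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(h1: int, h2: int) -> int:
    """Hamming distance between two integer hashes."""
    return bin(h1 ^ h2).count("1")

def joint_flag(feat_a: np.ndarray, feat_b: np.ndarray,
               img_a: Image.Image, img_b: Image.Image) -> bool:
    """Joint rule sketched from the abstract: stylistic match (cosine)
    AND near-duplicate image reproduction (dHash)."""
    return (cosine_similarity(feat_a, feat_b) > COS_T
            and hamming(dhash(img_a), dhash(img_b)) <= DHASH_T)
```

The two descriptors are kept independent on purpose: cosine similarity over deep features captures style consistency, while dHash flags pixel-level reproduction, so requiring both (the 0.00014 joint per-comparison ICCR) is stricter than either alone.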