pdf_signature_extraction/paper/codex_review_gpt55_v4_round_3axis.md
gbanyan becce857e1 Phase 6 round-7 codex 3-axis review fixes: 11 MAJOR + 5 MINOR
Codex GPT-5.5 3-axis peer review (paper/codex_review_gpt55_v4_round_3axis.md)
identified 11 MAJOR + 5 MINOR + 0 BLOCKER on three axes: (1) abstract/body
tone consistency, (2) methodology clarity / v3 residue, (3) no implicit
within-CPA or cross-year signature-consistency assumptions. 13 patches
applied across 4 source files; mirrored in paper_a_v4_combined.md.

Axis 1 (tone consistency between abstract and body):
- S I L33: "resolves the ambiguity" -> "provides complementary evidence
  for screening cases where ... hypotheses diverge"
- S I L35: "disproves the distributional-threshold path" -> "does not
  support the distributional-threshold path"
- S I L37 / S V-F L29: "characterise the deployed five-way classifier
  at three units" -> "characterise the deployed HC sub-rule and
  document-level HC+MC alarm derived from the five-way classifier at
  three units" (consistent with S V-H which says only HC sub-rule and
  HC+MC alarm are re-characterised by the present ICCR battery)
- S I L39 / S V-C / S III-L.4: "consistent with firm-specific template,
  stamp, or document-production reuse mechanisms" -> "consistent with --
  but does not independently establish -- firm-level template-like
  reuse, digitisation-pipeline homogeneity, or signing-style
  homogeneity, which descriptor-only data cannot separate (S V-H)"
  (mirrors abstract)

Axis 2 (methodology clarity / v3 residue):
- S III-G: added unit-bridge sentence distinguishing "descriptor-summary
  units" (signature/accountant) from "operational reporting units"
  (per-comparison/per-signature/per-document, S III-L)
- S III-H.2: "The calibration distinguishes two reference populations"
  -> "The supporting diagnostics use two reference populations" with
  explicit "neither is the calibration anchor"
- S III-L.1: "specificity" -> "ICCR refinement"
- S III-L.2: added "descriptive intuition, not an independence
  assumption used for estimation" caveat after the 1-(1-p)^n form

Axis 3 (no implicit signature-consistency assumptions):
- S III-F: hand-signing motivation rewritten as working hypothesis that
  "the classifier does not require ... to hold for all CPAs"
- S III-G A1: added "A1 does not assume temporal stability of
  handwriting or scanning workflow within or across years"
- S III-H.1: added label-caveat paragraph (operational rule outputs,
  not validated ground-truth classes); HC "strong replication evidence"
  -> "image-similarity evidence consistent with replication"; HSC
  "consistent with a CPA who signs very consistently" -> "mechanism not
  resolved by descriptor data alone"; LH explicitly owns that
  cross-year handwriting drift, scanner workflow change, or template
  variant rotation can also yield low max-cosine within a same-CPA pool
- S III-L.6 / S IV-M.6: "same-CPA repeatability signal" -> "observed
  same-CPA-pool excess ... not attributed to within-CPA handwriting
  repeatability"

Deferred (structural, not single-sentence patch): codex S III-I.2 /
S III-J K=2/K=3 deduplication; codex S III-K LOOO / S III-J duplication.
Both are MINOR stylistic redundancies, not reviewer-rejection risks.

DOCX rebuilt via export_v3.py; v4.0_20260515 file refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 03:11:53 +08:00


[Axis 1]

[MAJOR] §I Contributions, L42
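Original: "resolves the ambiguity between style consistency and image reproduction"
Issue: This is stronger in tone than the abstract. A descriptor-only framework cannot genuinely "resolve" the mechanistic attribution between style consistency and image reproduction, and §V-H itself states that the two cannot be separated.
Suggested revision: Change to "provides complementary evidence for screening cases where style consistency and image reproduction hypotheses diverge".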

[MAJOR] §III-H.1, L314
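Original: "High-confidence non-hand-signed (HC)"
Issue: Acceptable as a rule label, but with repeated use in the body and tables it is easily read as a validated classification outcome rather than a screening label.
Suggested revision: Change to "High-confidence image-reuse screening label (HC)", and state explicitly at the first definition that "label names are operational labels, not ground-truth classes".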

[MAJOR] §III-H.1, L314
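Original: "Both descriptors converge on strong replication evidence."
Issue: "strong replication evidence" is too strong. The rule only guarantees that two image-similarity descriptors fall inside the rule box simultaneously; it does not establish a replication mechanism.
Suggested revision: Change to "Both descriptors converge on image-similarity evidence consistent with replication".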

[MAJOR] §III-H.1, L318
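Original: "Likely hand-signed (LH): Cosine \leq 0.837."
Issue: With no hand-signed ground truth, a low-similarity screening bin cannot be named "likely hand-signed" without implicitly claiming ground-truth status.
Suggested revision: Rename to "Low-replication-similarity (LRS)" or "Low-alert similarity"; the old abbreviation can be kept in parentheses for reference.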

[MAJOR] §V-C, L1060
Original: "similar, milder production-related reuse patterns at Firms B/C/D"
Issue: This interprets the milder within-firm collisions at Firms B/C/D as production-related reuse, which contradicts the §V-H statement that the three candidate mechanisms cannot be separated.
Suggested revision: Change to "similar, milder within-firm collision patterns, whose mechanisms may include template reuse, digitisation-pipeline homogeneity, or signing-style homogeneity".

[MINOR] §V-F, L1074
Original: "the deployed five-way classifier is characterised at three units"
Issue: §V-H L1100 states that MC/HSC and the document worst-case rule are not re-characterised by the present diagnostic battery, yet this sentence reads as though the entire five-way classifier has undergone ICCR calibration.
Suggested revision: Change to "the HC sub-rule and document-level alarm definitions derived from the five-way output are characterised...".

[MINOR] §I Contributions, L44
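Original: "Composition decomposition disproves the distributional-threshold path."
Issue: "disproves" is too strong. What the analysis shows is that, for this dataset and these diagnostics, the natural-threshold reading is not supported.
Suggested revision: Change to "does not support" or "rules out within the tested diagnostics".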

[Axis 2]

[MAJOR] §III-G vs §III-L, L286 / L458
Original: "We analyse signatures at two units of resolution."
Issue: §III-G describes two units (signature/accountant), while §III-L describes three calibration units (per-comparison/per-signature/per-document). On a first reading, readers will conflate the "statistical summary unit" with the "calibration/reporting unit".
Suggested revision: Add a bridge sentence at the end of §III-G: accountant and signature are descriptor-summary units, whereas the three units in §III-L are ICCR reporting units.

[MAJOR] §III-H.2, L326
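Original: "The calibration distinguishes two reference populations"
Issue: Later sections repeatedly state that Firm A is not the calibration anchor. This sentence still reads like v3 residue and makes Firm A appear to participate in threshold calibration.
Suggested revision: Change to "The supporting diagnostics use two reference populations".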

[MAJOR] §III-H.1 / §III-L, L320 / L456
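Original: "retain their prior calibration provenance"
Issue: §III-L states that the present analysis does not re-derive thresholds, yet the section heading still reads "threshold calibration", and §III-H.1 addresses the point in a single sentence at L320. On a first reading it is not clear enough that the deployed 5-way rule is a pre-existing rule and that ICCR is a behavioural characterisation, not a re-optimisation.
Suggested revision: Add a short paragraph after §III-H.1 that explicitly lists the "rule definition", "what §III-L calibrates", and "what remains from the supplement".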

[MAJOR] §III-I.2 / §III-J, L342 / L369
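Original: "K=2 / K=3 Gaussian mixture fits"
Issue: The K=2/K=3 figures, BIC values, and interpretation are repeated in both §III-I.2 and §III-J, leaving a redundant, layered feel from the v3 splice.
Suggested revision: Keep only a "mixture path checked and demoted" summary in §III-I.2 and consolidate the full model details in §III-J.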

[MINOR] §III-K, L432
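Original: "Leave-one-firm-out reproducibility ... Discussed in §III-J above."
Issue: LOOO is already described in detail in §III-J and then reappears in §III-K as an internal-consistency check. The categorisation is unnatural and adds repetition.
Suggested revision: Reduce §III-K.3 to a one-sentence cross-reference, or move it back into §III-J.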

[MINOR] §III-L.1, L489
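Original: "dHash provides \sim 4.3\times further per-comparison specificity"
Issue: The proxy disclaimer is missing here. The paper otherwise avoids FAR language, but "specificity" on its own weakens the ICCR framing.
Suggested revision: Change to "specificity-proxy refinement" or "ICCR refinement".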

[MINOR] §III-L.2, L513
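Original: "consistent with the 1 - (1 - p_{\text{pair}})^{n_{\text{pool}}} form"
Issue: This is useful intuition, but the same paragraph should remind readers that the form holds only in the independence limit and that within-firm dependence violates it; otherwise it reads like a formal model.
Suggested revision: Add the sentence "This is an intuition, not an independence assumption used for estimation".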
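To make the intuition concrete, the 1 - (1 - p_pair)^n_pool form can be sketched as below. This is a minimal illustration under the stated independence assumption only (independent comparisons sharing one per-comparison rate), which is exactly the assumption within-firm dependence breaks. The function name `pool_alarm_prob` and the rate and pool sizes in the demo are hypothetical and are not taken from the paper.

```python
import math


def pool_alarm_prob(p_pair: float, n_pool: int) -> float:
    """Chance that at least one of n_pool comparisons fires.

    Illustrative only: assumes the n_pool comparisons are independent
    and share one per-comparison rate p_pair. Within-firm dependence
    violates this, so the value is an intuition, not an estimate.
    """
    if not 0.0 <= p_pair <= 1.0:
        raise ValueError("p_pair must lie in [0, 1]")
    if n_pool < 0:
        raise ValueError("n_pool must be non-negative")
    if p_pair == 1.0:
        # Every comparison fires, so any non-empty pool alarms.
        return 1.0 if n_pool > 0 else 0.0
    # 1 - (1 - p)^n, computed as -expm1(n * log1p(-p)) for small-p accuracy.
    return -math.expm1(n_pool * math.log1p(-p_pair))


if __name__ == "__main__":
    # Hypothetical per-comparison rate and pool sizes, not values from the paper.
    for n in (1, 10, 100, 1000):
        print(n, round(pool_alarm_prob(0.001, n), 4))
```

The demo shows why the form is appealing as intuition: a small per-comparison rate compounds quickly as the pool grows. Under within-firm dependence the observed pool-level rate need not follow this curve, which is why the suggested caveat matters.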

[Axis 3]

[MAJOR] §III-F, L277
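Original: "Hand-signing, by contrast, often yields high dHash similarity"
Issue: This sentence presupposes that when the same CPA hand-signs repeatedly the "overall layout typically preserved" holds, which comes close to the individual cross-document consistency that should not be assumed. Despite the hedge "often", the method's motivation still commits to unverified handwriting behaviour.
Suggested revision: Change to "One working hypothesis is that some hand-signed repetitions may preserve coarse layout while varying in fine execution; the classifier does not require this to hold for all CPAs".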

[MAJOR] §III-H.1, L316
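Original: "consistent with a CPA who signs very consistently"
Issue: HSC is interpreted as "the same CPA signs very consistently, but this is not reproduction", which directly attributes the high-cosine / high-dHash same-CPA pattern to personal handwriting consistency.
Suggested revision: Change to "high feature similarity without structural corroboration; mechanism unresolved".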

[MAJOR] §III-H.1, L318
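Original: "Likely hand-signed"
Issue: A low max-cosine does not imply hand-signing; it can also result from cross-year handwriting change, the scanning/PDF pipeline, cropping, or multiple template variants. The label over-interprets the mere absence of a high same-CPA match.
Suggested revision: Replace with a descriptor-based label, e.g. "low-replication-similarity".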

[MINOR] §III-G A1, L292
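Original: "within the cross-year same-CPA pool"
Issue: A1 is not itself a cross-year consistency assumption, but "cross-year" is easily read as implying that signatures across years should be comparable or consistent.
Suggested revision: Change to "within the observed same-CPA candidate pool pooled over years; this does not assume temporal stability of handwriting or scanning".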

[MINOR] §III-L.6, L587
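Original: "same-CPA repeatability signal"
Issue: A caveat is already present, but "repeatability" may still be read as a signal of individual signature consistency.
Suggested revision: Change to "observed same-CPA-pool excess signal, whose sources are not identifiable".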

[MINOR] §IV-M.6, L1043
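Original: "interpreted as a same-CPA repeatability signal"
Issue: Same as above; moreover, because it appears in the results consolidation, it is easily taken as a results claim.
Suggested revision: Change to "reported as same-CPA-pool excess under §III-M caveats, not attributed to handwriting repeatability".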

Overall assessment

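Axis 1 verdict: Broadly consistent with the abstract, but a few sentences still carry a "validated detector / mechanism attribution" flavour, notably "resolves ambiguity", the HC/LH labels, and the §V-C reading of B/C/D as production-related reuse.
Axis 2 verdict: Most v3 residue has been demoted, but the §III narrative remains somewhat repetitive. The largest issue is that the unit taxonomy and the scope of calibration versus re-characterisation need to be stated earlier.
Axis 3 verdict: No core computational logic was found that necessarily depends on "signatures from the same CPA, or across years, must be consistent"; however, several labels and motivating sentences would lead readers to believe such an assumption is being made.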

Ready for final partner review: Yes, but a round of minor revisions is recommended first, mainly to the labels/tone and the §III roadmap.
BLOCKER: None.