pdf_signature_extraction/paper/codex_review_gpt55_v4_round_3axis.md at paper-a-v4-big4

gbanyan becce857e1 Phase 6 round-7 codex 3-axis review fixes: 11 MAJOR + 5 MINOR

Codex GPT-5.5 3-axis peer review (paper/codex_review_gpt55_v4_round_3axis.md)
identified 11 MAJOR + 5 MINOR + 0 BLOCKER on three axes: (1) abstract/body
tone consistency, (2) methodology clarity / v3 residue, (3) no implicit
within-CPA or cross-year signature-consistency assumptions. 13 patches
applied across 4 source files; mirrored in paper_a_v4_combined.md.

Axis 1 (tone consistency between abstract and body):
- S I L33: "resolves the ambiguity" -> "provides complementary evidence
  for screening cases where ... hypotheses diverge"
- S I L35: "disproves the distributional-threshold path" -> "does not
  support the distributional-threshold path"
- S I L37 / S V-F L29: "characterise the deployed five-way classifier
  at three units" -> "characterise the deployed HC sub-rule and
  document-level HC+MC alarm derived from the five-way classifier at
  three units" (consistent with S V-H which says only HC sub-rule and
  HC+MC alarm are re-characterised by the present ICCR battery)
- S I L39 / S V-C / S III-L.4: "consistent with firm-specific template,
  stamp, or document-production reuse mechanisms" -> "consistent with --
  but does not independently establish -- firm-level template-like
  reuse, digitisation-pipeline homogeneity, or signing-style
  homogeneity, which descriptor-only data cannot separate (S V-H)"
  (mirrors abstract)

Axis 2 (methodology clarity / v3 residue):
- S III-G: added unit-bridge sentence distinguishing "descriptor-summary
  units" (signature/accountant) from "operational reporting units"
  (per-comparison/per-signature/per-document, S III-L)
- S III-H.2: "The calibration distinguishes two reference populations"
  -> "The supporting diagnostics use two reference populations" with
  explicit "neither is the calibration anchor"
- S III-L.1: "specificity" -> "ICCR refinement"
- S III-L.2: added "descriptive intuition, not an independence
  assumption used for estimation" caveat after the 1-(1-p)^n form

Axis 3 (no implicit signature-consistency assumptions):
- S III-F: hand-signing motivation rewritten as working hypothesis that
  "the classifier does not require ... to hold for all CPAs"
- S III-G A1: added "A1 does not assume temporal stability of
  handwriting or scanning workflow within or across years"
- S III-H.1: added label-caveat paragraph (operational rule outputs,
  not validated ground-truth classes); HC "strong replication evidence"
  -> "image-similarity evidence consistent with replication"; HSC
  "consistent with a CPA who signs very consistently" -> "mechanism not
  resolved by descriptor data alone"; LH explicitly owns that
  cross-year handwriting drift, scanner workflow change, or template
  variant rotation can also yield low max-cosine within a same-CPA pool
- S III-L.6 / S IV-M.6: "same-CPA repeatability signal" -> "observed
  same-CPA-pool excess ... not attributed to within-CPA handwriting
  repeatability"

Deferred (structural, not single-sentence patch): codex S III-I.2 /
S III-J K=2/K=3 deduplication; codex S III-K LOOO / S III-J duplication.
Both are MINOR stylistic redundancies, not reviewer-rejection risks.

DOCX rebuilt via export_v3.py; v4.0_20260515 file refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 03:11:53 +08:00

[Axis 1]

[MAJOR] §I Contributions, L42
Original: "resolves the ambiguity between style consistency and image reproduction"
Issue: This is stronger than the abstract's tone. A descriptor-only framework cannot actually resolve the mechanism attribution between style consistency and image reproduction, and §V-H itself says the two cannot be separated.
Suggested revision: Change to "provides complementary evidence for screening cases where style consistency and image reproduction hypotheses diverge".

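[MAJOR] §III-H.1, L314
Original: "High-confidence non-hand-signed (HC)"
Issue: Acceptable as a rule label, but with repeated use in the body and tables it is easily read as a validated classification result rather than a screening label.
Suggested revision: Change to "High-confidence image-reuse screening label (HC)", and state explicitly at first definition that "label names are operational labels, not ground-truth classes".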

[MAJOR] §III-H.1, L314
Original: "Both descriptors converge on strong replication evidence."
Issue: "strong replication evidence" is too strong. All that is currently guaranteed is that the two image-similarity descriptors both fall inside the rule box, not that a replication mechanism is at work.
Suggested revision: Change to "Both descriptors converge on image-similarity evidence consistent with replication".

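[MAJOR] §III-H.1, L318
Original: "Likely hand-signed (LH): Cosine \leq 0.837."
Issue: There is no hand-signed ground truth, so a low-similarity screening bin cannot be named "likely hand-signed" without implying ground-truth status.
Suggested revision: Change to "Low-replication-similarity (LRS)" or "Low-alert similarity"; the old abbreviation can be retained with a parenthetical explanation.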

[MAJOR] §V-C, L1060
Original: "similar, milder production-related reuse patterns at Firms B/C/D"
Issue: This reads the milder within-firm collisions at Firms B/C/D as production-related reuse, which is inconsistent with the §V-H statement that the three mechanisms cannot be separated.
Suggested revision: Change to "similar, milder within-firm collision patterns, whose mechanisms may include template reuse, digitisation-pipeline homogeneity, or signing-style homogeneity".

[MINOR] §V-F, L1074
Original: "the deployed five-way classifier is characterised at three units"
Issue: §V-H L1100 says that MC/HSC and the document worst-case rule are not re-characterised by the present diagnostic battery; this sentence reads as if the entire five-way classifier had completed ICCR calibration.
Suggested revision: Change to "the HC sub-rule and document-level alarm definitions derived from the five-way output are characterised...".

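[MINOR] §I Contributions, L44
Original: "Composition decomposition disproves the distributional-threshold path."
Issue: "disproves" is too strong. The current finding is only that a natural-threshold reading is not supported for this dataset under these diagnostics.
Suggested revision: Change to "does not support" or "rules out within the tested diagnostics".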

[Axis 2]

[MAJOR] §III-G vs §III-L, L286 / L458
Original: "We analyse signatures at two units of resolution."
Issue: §III-G names two units (signature/accountant), whereas §III-L says calibration has three units (per-comparison/per-signature/per-document). A first-time reader will confuse the "statistical summary unit" with the "calibration/reporting unit".
Suggested revision: Add a bridge at the end of §III-G: accountant/signature are descriptor-summary units; the three units in §III-L are ICCR reporting units.

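[MAJOR] §III-H.2, L326
Original: "The calibration distinguishes two reference populations"
Issue: Later text repeatedly states that Firm A is not the calibration anchor; this sentence still reads like v3 residue and makes Firm A appear to take part in threshold calibration.
Suggested revision: Change to "The supporting diagnostics use two reference populations".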

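[MAJOR] §III-H.1 / §III-L, L320 / L456
Original: "retain their prior calibration provenance"
Issue: §III-L says the present analysis does not re-derive thresholds, yet the heading is still "threshold calibration", and §III-H.1 covers the point in a single sentence at L320. On first reading it is not clear enough that the deployed 5-way rule is a pre-existing rule and that ICCR is a behavioural characterisation, not a re-optimisation.
Suggested revision: Add a short paragraph after §III-H.1 that explicitly lists "rule definition", "what §III-L calibrates", and "what remains from supplement".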

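[MAJOR] §III-I.2 / §III-J, L342 / L369
Original: "K=2 / K=3 Gaussian mixture fits"
Issue: The K=2/K=3 figures, BIC, and interpretation are repeated in §III-I.2 and §III-J, which still gives the redundant, layered feel of a v3 splice.
Suggested revision: Keep only a "mixture path checked and demoted" summary in §III-I.2 and concentrate the full model details in §III-J.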

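[MINOR] §III-K, L432
Original: "Leave-one-firm-out reproducibility ... Discussed in §III-J above."
Issue: LOOO is already detailed in §III-J and then reappears in §III-K as an internal-consistency check; the placement is unnatural and adds repetition.
Suggested revision: Reduce §III-K.3 to a one-sentence cross-reference, or move it back into §III-J.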

[MINOR] §III-L.1, L489
Original: "dHash provides \sim 4.3\times further per-comparison specificity"
Issue: The proxy/disclaimer qualifier is missing here. The paper avoids FAR throughout, but a bare "specificity" weakens the ICCR framing.
Suggested revision: Change to "specificity-proxy refinement" or "ICCR refinement".

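[MINOR] §III-L.2, L513
Original: "consistent with the 1 - (1 - p_{\text{pair}})^{n_{\text{pool}}} form"
Issue: This is a useful intuition, but the relationship between the independence limit and its within-firm violation should be flagged in the same paragraph; otherwise it reads like a formal model.
Suggested revision: Add the sentence "This is an intuition, not an independence assumption used for estimation".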
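
For readers of this review, the form quoted in this item can be sketched numerically. The snippet below is a minimal illustration only, assuming every one of the n_pool comparisons hits the rule box independently with the same probability p_pair; that is exactly the assumption the item says is violated within firms, so the output is an intuition, not an estimate. The function name and the input values are hypothetical and do not come from the paper.

```python
def pool_hit_probability(p_pair: float, n_pool: int) -> float:
    """Probability of at least one hit among n_pool comparisons.

    Assumes (for illustration only) that comparisons are independent and
    share the same per-comparison hit probability p_pair.
    """
    if not 0.0 <= p_pair <= 1.0:
        raise ValueError("p_pair must lie in [0, 1]")
    if n_pool < 0:
        raise ValueError("n_pool must be non-negative")
    # Complement of "no comparison hits": 1 - (1 - p_pair)^n_pool
    return 1.0 - (1.0 - p_pair) ** n_pool


if __name__ == "__main__":
    # Hypothetical values, chosen only to show how the rate grows with pool size.
    for n in (1, 10, 100):
        print(n, pool_hit_probability(0.01, n))
```

If comparisons inside a firm are positively correlated, the true per-signature rate can depart from this curve, which is why the suggested caveat belongs in the same paragraph as the formula.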

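[Axis 3]

[MAJOR] §III-F, L277
Original: "Hand-signing, by contrast, often yields high dHash similarity"
Issue: This sentence presupposes that "overall layout [is] typically preserved" when the same CPA hand-signs repeatedly, which comes close to an individual-CPA cross-document consistency that should not be presupposed. Despite the "often", it commits to unverified handwriting behaviour in the methodological motivation.
Suggested revision: Change to "One working hypothesis is that some hand-signed repetitions may preserve coarse layout while varying in fine execution; the classifier does not require this to hold for all CPAs".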

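[MAJOR] §III-H.1, L316
Original: "consistent with a CPA who signs very consistently"
Issue: HSC is interpreted as "the same CPA signs very consistently, without reproduction", which directly attributes the high-cosine / high-dHash same-CPA pattern to individual handwriting consistency.
Suggested revision: Change to "high feature similarity without structural corroboration; mechanism unresolved".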

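[MAJOR] §III-H.1, L318
Original: "Likely hand-signed"
Issue: A low max-cosine does not equal hand-signing; it may also arise from cross-year handwriting change, the scanning/PDF pipeline, cropping, or multiple template variants. This over-interprets "no high same-CPA match".
Suggested revision: Switch to a descriptor-based label, for example "low-replication-similarity".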
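
To make the distinction concrete, the sketch below shows what a descriptor-based label amounts to: it reports only that the maximum cosine similarity within a same-CPA pool is at or below the quoted 0.837 cutoff and asserts nothing about how the signature was produced. This is an illustration under assumptions, not the paper's pipeline: the feature vectors, the function names, and the "above-cutoff" fallback string are all hypothetical, and only the 0.837 value and the "low-replication-similarity" wording come from this review.

```python
import math

LOW_SIMILARITY_CUTOFF = 0.837  # cutoff quoted in the rule text under review


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


def descriptor_label(query, pool, cutoff=LOW_SIMILARITY_CUTOFF):
    """Label a signature from its max cosine against a same-CPA pool.

    The label describes the descriptor value only; it makes no claim
    about whether the signature was hand-signed.
    """
    if not pool:
        raise ValueError("pool must contain at least one comparison vector")
    max_cos = max(cosine(query, other) for other in pool)
    if max_cos <= cutoff:
        return "low-replication-similarity"
    return "above-cutoff"  # hypothetical placeholder for the other rule bins
```

The same low value could result from handwriting drift, a scanner change, or a different template variant, which is the point of keeping the label descriptive.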

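[MINOR] §III-G A1, L292
Original: "within the cross-year same-CPA pool"
Issue: A1 itself is not a cross-year consistency assumption, but "cross-year" is easily read as implying that signatures across years should be comparable or consistent.
Suggested revision: Change to "within the observed same-CPA candidate pool pooled over years; this does not assume temporal stability of handwriting or scanning".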

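[MINOR] §III-L.6, L587
Original: "same-CPA repeatability signal"
Issue: A caveat has already been added, but "repeatability" may still be read as a signal of individual signature consistency.
Suggested revision: Change to "observed same-CPA-pool excess signal, whose sources are not identifiable".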

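[MINOR] §IV-M.6, L1043
Original: "interpreted as a same-CPA repeatability signal"
Issue: Same concern as the §III-L.6 item: "repeatability" may be read as individual signature consistency. Here it also appears in the results consolidation, where it is easily taken as a results claim.
Suggested revision: Change to "reported as same-CPA-pool excess under §III-M caveats, not attributed to handwriting repeatability".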

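Overall assessment

Axis 1 verdict: The overall direction now matches the abstract, but several sentences still lean toward a "validated detector / mechanism attribution" flavour, notably "resolves ambiguity", the HC/LH labels, and the production-related reuse reading of Firms B/C/D in §V-C.
Axis 2 verdict: Most v3 residue has been demoted, but the §III narrative remains repetitive; the main problem is that the unit taxonomy and the scope of calibration versus re-characterisation need to be stated earlier.
Axis 3 verdict: No core computational logic was found to depend necessarily on "signatures of the same CPA, or across years, must be consistent"; however, several labels and motivation sentences would lead readers to think such an assumption exists.

Ready for partner final review: Yes, but a round of minor revisions is recommended first, mainly to labels/tone and the §III roadmap.
BLOCKER: None.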
