Phase 6 round-7 codex 3-axis review fixes: 11 MAJOR + 5 MINOR

Codex GPT-5.5 3-axis peer review (paper/codex_review_gpt55_v4_round_3axis.md) identified 11 MAJOR + 5 MINOR + 0 BLOCKER on three axes: (1) abstract/body tone consistency, (2) methodology clarity / v3 residue, (3) no implicit within-CPA or cross-year signature-consistency assumptions. 13 patches applied across 4 source files; mirrored in paper_a_v4_combined.md.

Axis 1 (tone consistency between abstract and body):
- S I L33: "resolves the ambiguity" -> "provides complementary evidence for screening cases where ... hypotheses diverge"
- S I L35: "disproves the distributional-threshold path" -> "does not support the distributional-threshold path"
- S I L37 / S V-F L29: "characterise the deployed five-way classifier at three units" -> "characterise the deployed HC sub-rule and document-level HC+MC alarm derived from the five-way classifier at three units" (consistent with S V-H which says only HC sub-rule and HC+MC alarm are re-characterised by the present ICCR battery)
- S I L39 / S V-C / S III-L.4: "consistent with firm-specific template, stamp, or document-production reuse mechanisms" -> "consistent with -- but does not independently establish -- firm-level template-like reuse, digitisation-pipeline homogeneity, or signing-style homogeneity, which descriptor-only data cannot separate (S V-H)" (mirrors abstract)

Axis 2 (methodology clarity / v3 residue):
- S III-G: added unit-bridge sentence distinguishing "descriptor-summary units" (signature/accountant) from "operational reporting units" (per-comparison/per-signature/per-document, S III-L)
- S III-H.2: "The calibration distinguishes two reference populations" -> "The supporting diagnostics use two reference populations" with explicit "neither is the calibration anchor"
- S III-L.1: "specificity" -> "ICCR refinement"
- S III-L.2: added "descriptive intuition, not an independence assumption used for estimation" caveat after the 1-(1-p)^n form

Axis 3 (no implicit signature-consistency assumptions):
- S III-F: hand-signing motivation rewritten as working hypothesis that "the classifier does not require ... to hold for all CPAs"
- S III-G A1: added "A1 does not assume temporal stability of handwriting or scanning workflow within or across years"
- S III-H.1: added label-caveat paragraph (operational rule outputs, not validated ground-truth classes); HC "strong replication evidence" -> "image-similarity evidence consistent with replication"; HSC "consistent with a CPA who signs very consistently" -> "mechanism not resolved by descriptor data alone"; LH explicitly owns that cross-year handwriting drift, scanner workflow change, or template variant rotation can also yield low max-cosine within a same-CPA pool
- S III-L.6 / S IV-M.6: "same-CPA repeatability signal" -> "observed same-CPA-pool excess ... not attributed to within-CPA handwriting repeatability"

Deferred (structural, not single-sentence patch): codex S III-I.2 / S III-J K=2/K=3 deduplication; codex S III-K LOOO / S III-J duplication. Both are MINOR stylistic redundancies, not reviewer-rejection risks.

DOCX rebuilt via export_v3.py; v4.0_20260515 file refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 03:11:53 +08:00
parent 3672c9343e
commit becce857e1
8 changed files with 154 additions and 40 deletions
@@ -0,0 +1,114 @@
+[Axis 1]
+
+[MAJOR] §I Contributions, L42  
+Original: "resolves the ambiguity between *style consistency* and *image reproduction*"  
+Issue: This is stronger than the abstract's tone. A descriptor-only framework cannot truly "resolve" the mechanism attribution between style consistency and image reproduction, and §V-H also says the two cannot be separated.  
+Suggested revision: Change to "provides complementary evidence for screening cases where style consistency and image reproduction hypotheses diverge".
+
+[MAJOR] §III-H.1, L314  
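+Original: "High-confidence non-hand-signed (HC)"  
+Issue: Acceptable as a rule label, but when it is used repeatedly in the body text and tables it is easily read as a validated classification result rather than a screening label.  
+Suggested revision: Change to "High-confidence image-reuse screening label (HC)", and state explicitly at the first definition that "label names are operational labels, not ground-truth classes".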
+
+[MAJOR] §III-H.1, L314  
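+Original: "Both descriptors converge on strong replication evidence."  
+Issue: "strong replication evidence" is too strong; the only guarantee at present is that the two image-similarity descriptors both fall inside the rule box, which does not guarantee a replication mechanism.  
+Suggested revision: Change to "Both descriptors converge on image-similarity evidence consistent with replication".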
+
+[MAJOR] §III-H.1, L318  
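+Original: "Likely hand-signed (LH): Cosine $\leq 0.837$."  
+Issue: There is no hand-signed ground truth, so a low-similarity screening bin cannot be named "likely hand-signed" without implicitly claiming ground-truth status.  
+Suggested revision: Change to "Low-replication-similarity (LRS)" or "Low-alert similarity"; the old abbreviation can be kept with a parenthetical explanation.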
+
+[MAJOR] §V-C, L1060  
+Original: "similar, milder production-related reuse patterns at Firms B/C/D"  
+Issue: This interprets the milder within-firm collisions at Firms B/C/D as production-related reuse, which is inconsistent with the §V-H statement that the three mechanisms cannot be separated.  
+Suggested revision: Change to "similar, milder within-firm collision patterns, whose mechanisms may include template reuse, digitisation-pipeline homogeneity, or signing-style homogeneity".
+
+[MINOR] §V-F, L1074  
+Original: "the deployed five-way classifier is characterised at three units"  
+Issue: §V-H L1100 says that MC/HSC and the document worst-case rule are not re-characterised by the present diagnostic battery; this sentence reads as if the entire five-way classifier had completed ICCR calibration.  
+Suggested revision: Change to "the HC sub-rule and document-level alarm definitions derived from the five-way output are characterised...".
+
+[MINOR] §I Contributions, L44  
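+Original: "Composition decomposition disproves the distributional-threshold path."  
+Issue: "disproves" is too strong; the current finding is only that, for this dataset and these diagnostics, the natural-threshold reading is not supported.  
+Suggested revision: Change to "does not support" or "rules out within the tested diagnostics".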
+
+[Axis 2]
+
+[MAJOR] §III-G vs §III-L, L286 / L458  
+Original: "We analyse signatures at two units of resolution."  
+Issue: §III-G says there are two units (signature/accountant), while §III-L says calibration has three units (per-comparison/per-signature/per-document). On first reading, readers will confuse the "statistical summary unit" with the "calibration/reporting unit".  
+Suggested revision: Add a bridge at the end of §III-G: accountant/signature are descriptor-summary units; the three units in §III-L are ICCR reporting units.
+
+[MAJOR] §III-H.2, L326  
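+Original: "The calibration distinguishes two reference populations"  
+Issue: Later text repeatedly says that Firm A is not the calibration anchor; this sentence still reads like v3 residue and makes Firm A appear to take part in threshold calibration.  
+Suggested revision: Change to "The supporting diagnostics use two reference populations".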
+
+[MAJOR] §III-H.1 / §III-L, L320 / L456  
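+Original: "retain their prior calibration provenance"  
+Issue: §III-L says this analysis does not re-derive thresholds, yet the heading is still called threshold calibration, and §III-H.1 mentions the point only in passing, in a single sentence at L320. On first reading it is not clear enough that the deployed 5-way rule is a pre-existing rule and that ICCR is a behavioural characterisation, not a re-optimisation.  
+Suggested revision: Add a short paragraph after §III-H.1 that explicitly lists "rule definition", "what §III-L calibrates", and "what remains from supplement".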
+
+[MAJOR] §III-I.2 / §III-J, L342 / L369  
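+Original: "K=2 / K=3 Gaussian mixture fits"  
+Issue: The K=2/K=3 numbers, BIC, and interpretation are repeated in §III-I.2 and §III-J, which still gives the redundant, layered feel of a v3 splice.  
+Suggested revision: Keep only a "mixture path checked and demoted" summary in §III-I.2 and concentrate the full model details in §III-J.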
+
+[MINOR] §III-K, L432  
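+Original: "Leave-one-firm-out reproducibility ... Discussed in §III-J above."  
+Issue: LOOO is already described in detail in §III-J and then appears again in §III-K as an internal-consistency check; the categorisation is unnatural and adds repetition.  
+Suggested revision: Reduce §III-K.3 to a single-sentence cross-reference, or move it back to §III-J.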
+
+[MINOR] §III-L.1, L489  
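+Original: "dHash provides $\sim 4.3\times$ further per-comparison specificity"  
+Issue: The proxy/disclaimer is missing here; the paper otherwise avoids FAR, but "specificity" appearing on its own weakens the ICCR framing.  
+Suggested revision: Change to "specificity-proxy refinement" or "ICCR refinement".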
+
+[MINOR] §III-L.2, L513  
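+Original: "consistent with the $1 - (1 - p_{\text{pair}})^{n_{\text{pool}}}$ form"  
+Issue: This is a useful intuition, but the relationship between the independence limit and the within-firm violation should be flagged in the same paragraph; otherwise it reads like a formal model.  
+Suggested revision: Add the sentence "This is an intuition, not an independence assumption used for estimation".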
+
+[Axis 3]
+
+[MAJOR] §III-F, L277  
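+Original: "Hand-signing, by contrast, often yields high dHash similarity"  
+Issue: This sentence presupposes "overall layout typically preserved" when the same CPA hand-signs repeatedly, which comes close to a cross-document consistency assumption for individual CPAs that should not be made. Although it says "often", it still commits to unvalidated handwriting behaviour in the methodological motivation.  
+Suggested revision: Change to "One working hypothesis is that some hand-signed repetitions may preserve coarse layout while varying in fine execution; the classifier does not require this to hold for all CPAs".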
+
+[MAJOR] §III-H.1, L316  
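+Original: "consistent with a CPA who signs very consistently"  
+Issue: HSC is interpreted as "the same CPA signs very consistently, but this is not reproduction", which directly attributes the high-cosine / high-dHash same-CPA pattern to individual handwriting consistency.  
+Suggested revision: Change to "high feature similarity without structural corroboration; mechanism unresolved".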
+
+[MAJOR] §III-H.1, L318  
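+Original: "Likely hand-signed"  
+Issue: A low max-cosine does not equal hand-signing; it may also arise from cross-year handwriting variation, the scanning/PDF pipeline, cropping, or multiple template variants. This over-interprets "no high same-CPA match".  
+Suggested revision: Change to a descriptor-based label, for example "low-replication-similarity".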
+
+[MINOR] §III-G A1, L292  
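+Original: "within the cross-year same-CPA pool"  
+Issue: A1 itself is not a year-to-year consistency assumption, but "cross-year" is easily read as implying that signatures across years should be comparable or consistent.  
+Suggested revision: Change to "within the observed same-CPA candidate pool pooled over years; this does not assume temporal stability of handwriting or scanning".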
+
+[MINOR] §III-L.6, L587  
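+Original: "same-CPA repeatability signal"  
+Issue: A caveat has been added, but "repeatability" may still be read as a signal of individual signature consistency.  
+Suggested revision: Change to "observed same-CPA-pool excess signal, whose sources are not identifiable".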
+
+[MINOR] §IV-M.6, L1043  
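+Original: "interpreted as a same-CPA repeatability signal"  
+Issue: Same as above; moreover, it appears in the results consolidation, where it is easily taken as a results claim.  
+Suggested revision: Change to "reported as same-CPA-pool excess under §III-M caveats, not attributed to handwriting repeatability".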
+
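+Overall assessment
+
+Axis 1 verdict: The overall direction is now consistent with the abstract, but several sentences still lean toward "validated detector / mechanism attribution", in particular "resolves ambiguity", the HC/LH labels, and the §V-C reading of B/C/D as production-related reuse.  
+Axis 2 verdict: Most v3 residue has been demoted, but the §III narrative is still repetitive; the biggest issue is that the unit taxonomy and the scope of calibration/re-characterisation need to be made clear earlier.  
+Axis 3 verdict: No core computational logic was found to depend necessarily on "signatures of the same CPA, or across years, must be consistent"; however, several labels and motivation sentences would lead readers to believe this assumption exists.
+
+Ready for final partner review: Yes, but a round of minor revisions is recommended first, mainly changes to labels/tone and the §III roadmap.  
+BLOCKER: None.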
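
Note on the §III-L.2 item: the $1 - (1 - p_{\text{pair}})^{n_{\text{pool}}}$ form quoted there can be sketched numerically. The snippet below is only an illustration of the independence-limit intuition, which the review asks to be labelled as intuition rather than as an assumption used for estimation. The function name and the input values are hypothetical and do not come from the paper.

```python
def pool_alarm_probability(p_pair: float, n_pool: int) -> float:
    """Probability that at least one of n_pool comparisons fires.

    Assumes (for illustration only) that each comparison fires
    independently with the same probability p_pair. The review notes
    that within-firm data violate this independence, so this is an
    intuition, not an estimator.
    """
    if not 0.0 <= p_pair <= 1.0:
        raise ValueError("p_pair must lie in [0, 1]")
    if n_pool < 0:
        raise ValueError("n_pool must be non-negative")
    # Complement of "no comparison fires" across n_pool independent trials.
    return 1.0 - (1.0 - p_pair) ** n_pool


if __name__ == "__main__":
    # Hypothetical per-comparison rate and pool sizes, chosen only to show
    # how the pool-level rate grows with the number of comparisons.
    for n in (1, 10, 50, 200):
        print(n, pool_alarm_probability(0.01, n))
```

Under this form the pool-level rate rises monotonically with pool size even when the per-comparison rate is fixed, which is why a per-signature or per-document rate cannot be read directly as a per-comparison rate.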