Phase 6 round-7 codex 3-axis review fixes: 11 MAJOR + 5 MINOR
Codex GPT-5.5 3-axis peer review (paper/codex_review_gpt55_v4_round_3axis.md) identified 11 MAJOR + 5 MINOR + 0 BLOCKER on three axes: (1) abstract/body tone consistency, (2) methodology clarity / v3 residue, (3) no implicit within-CPA or cross-year signature-consistency assumptions. 13 patches applied across 4 source files; mirrored in paper_a_v4_combined.md.

Axis 1 (tone consistency between abstract and body):
- S I L33: "resolves the ambiguity" -> "provides complementary evidence for screening cases where ... hypotheses diverge"
- S I L35: "disproves the distributional-threshold path" -> "does not support the distributional-threshold path"
- S I L37 / S V-F L29: "characterise the deployed five-way classifier at three units" -> "characterise the deployed HC sub-rule and document-level HC+MC alarm derived from the five-way classifier at three units" (consistent with S V-H which says only HC sub-rule and HC+MC alarm are re-characterised by the present ICCR battery)
- S I L39 / S V-C / S III-L.4: "consistent with firm-specific template, stamp, or document-production reuse mechanisms" -> "consistent with -- but does not independently establish -- firm-level template-like reuse, digitisation-pipeline homogeneity, or signing-style homogeneity, which descriptor-only data cannot separate (S V-H)" (mirrors abstract)

Axis 2 (methodology clarity / v3 residue):
- S III-G: added unit-bridge sentence distinguishing "descriptor-summary units" (signature/accountant) from "operational reporting units" (per-comparison/per-signature/per-document, S III-L)
- S III-H.2: "The calibration distinguishes two reference populations" -> "The supporting diagnostics use two reference populations" with explicit "neither is the calibration anchor"
- S III-L.1: "specificity" -> "ICCR refinement"
- S III-L.2: added "descriptive intuition, not an independence assumption used for estimation" caveat after the 1-(1-p)^n form

Axis 3 (no implicit signature-consistency assumptions):
- S III-F: hand-signing motivation rewritten as working hypothesis that "the classifier does not require ... to hold for all CPAs"
- S III-G A1: added "A1 does not assume temporal stability of handwriting or scanning workflow within or across years"
- S III-H.1: added label-caveat paragraph (operational rule outputs, not validated ground-truth classes); HC "strong replication evidence" -> "image-similarity evidence consistent with replication"; HSC "consistent with a CPA who signs very consistently" -> "mechanism not resolved by descriptor data alone"; LH explicitly owns that cross-year handwriting drift, scanner workflow change, or template variant rotation can also yield low max-cosine within a same-CPA pool
- S III-L.6 / S IV-M.6: "same-CPA repeatability signal" -> "observed same-CPA-pool excess ... not attributed to within-CPA handwriting repeatability"

Deferred (structural, not single-sentence patch): codex S III-I.2 / S III-J K=2/K=3 deduplication; codex S III-K LOOO / S III-J duplication. Both are MINOR stylistic redundancies, not reviewer-rejection risks.

DOCX rebuilt via export_v3.py; v4.0_20260515 file refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,114 @@
[Axis 1]

[MAJOR] §I Contributions, L42
Original: "resolves the ambiguity between *style consistency* and *image reproduction*"
Issue: This is stronger than the tone of the abstract. A descriptor-only framework cannot actually "resolve" the mechanism attribution between style consistency and image reproduction, and §V-H also states that the two cannot be separated.
Suggested revision: Change to "provides complementary evidence for screening cases where style consistency and image reproduction hypotheses diverge".

[MAJOR] §III-H.1, L314
Original: "High-confidence non-hand-signed (HC)"
Issue: Acceptable as a rule label, but when used repeatedly in the body and tables it is easily read as a validated classification result rather than a screening label.
Suggested revision: Change to "High-confidence image-reuse screening label (HC)", and state explicitly at the first definition that "label names are operational labels, not ground-truth classes".

[MAJOR] §III-H.1, L314
Original: "Both descriptors converge on strong replication evidence."
Issue: "strong replication evidence" is too strong. At present the only guarantee is that both image-similarity descriptors fall inside the rule box at the same time, which does not establish a replication mechanism.
Suggested revision: Change to "Both descriptors converge on image-similarity evidence consistent with replication".

[MAJOR] §III-H.1, L318
Original: "Likely hand-signed (LH): Cosine $\leq 0.837$."
Issue: There is no hand-signed ground truth, so a low-similarity screening bin cannot be named "likely hand-signed" without implicitly claiming ground-truth status.
Suggested revision: Change to "Low-replication-similarity (LRS)" or "Low-alert similarity"; the old abbreviation can be retained with an explanation in parentheses.

[MAJOR] §V-C, L1060
Original: "similar, milder production-related reuse patterns at Firms B/C/D"
Issue: This interprets the milder within-firm collisions at Firms B/C/D as production-related reuse, which is inconsistent with the §V-H statement that the three mechanisms cannot be separated.
Suggested revision: Change to "similar, milder within-firm collision patterns, whose mechanisms may include template reuse, digitisation-pipeline homogeneity, or signing-style homogeneity".

[MINOR] §V-F, L1074
Original: "the deployed five-way classifier is characterised at three units"
Issue: §V-H L1100 says that MC/HSC and the document worst-case rule are not re-characterised by the present diagnostic battery; this sentence reads as if the entire five-way classifier had completed ICCR calibration.
Suggested revision: Change to "the HC sub-rule and document-level alarm definitions derived from the five-way output are characterised...".

[MINOR] §I Contributions, L44
Original: "Composition decomposition disproves the distributional-threshold path."
Issue: "disproves" is too strong. The finding is that, for this dataset and under these diagnostics, the natural-threshold reading is not supported.
Suggested revision: Change to "does not support" or "rules out within the tested diagnostics".

[Axis 2]

[MAJOR] §III-G vs §III-L, L286 / L458
Original: "We analyse signatures at two units of resolution."
Issue: §III-G describes two units (signature/accountant), while §III-L says calibration has three units (per-comparison/per-signature/per-document). On first reading, readers will confuse the "statistical summary unit" with the "calibration/reporting unit".
Suggested revision: Add a bridge at the end of §III-G: accountant/signature are descriptor-summary units; the three units in §III-L are ICCR reporting units.

[MAJOR] §III-H.2, L326
Original: "The calibration distinguishes two reference populations"
Issue: The later text repeatedly says that Firm A is not a calibration anchor; this sentence still reads like v3 residue and makes Firm A appear to take part in threshold calibration.
Suggested revision: Change to "The supporting diagnostics use two reference populations".

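[MAJOR] §III-H.1 / §III-L, L320 / L456
Original: "retain their prior calibration provenance"
Issue: §III-L says this analysis does not re-derive thresholds, yet the heading is still "threshold calibration", and §III-H.1 mentions the point only in passing, in a single sentence at L320. On first reading it is not clear enough that the deployed 5-way rule is a pre-existing rule and that ICCR is a behavioural characterisation, not a re-optimisation.
Suggested revision: After §III-H.1, add a short paragraph that explicitly lists "rule definition", "what §III-L calibrates", and "what remains from supplement".
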
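[MAJOR] §III-I.2 / §III-J, L342 / L369
Original: "K=2 / K=3 Gaussian mixture fits"
Issue: The K=2/K=3 numbers, BIC, and interpretation are repeated in §III-I.2 and §III-J, which still reads as redundant layering left over from the v3 splice.
Suggested revision: Keep only a "mixture path checked and demoted" summary in §III-I.2 and concentrate the full model details in §III-J.
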
[MINOR] §III-K, L432
Original: "Leave-one-firm-out reproducibility ... Discussed in §III-J above."
Issue: LOOO is already described in detail in §III-J and then appears again in §III-K as an internal-consistency check; the categorisation is unnatural and adds repetition.
Suggested revision: Reduce §III-K.3 to a single-sentence cross-reference, or move it back to §III-J.

[MINOR] §III-L.1, L489
Original: "dHash provides $\sim 4.3\times$ further per-comparison specificity"
Issue: The proxy/disclaimer is missing here. The paper avoids FAR throughout, but "specificity" appearing on its own weakens the ICCR framing.
Suggested revision: Change to "specificity-proxy refinement" or "ICCR refinement".

[MINOR] §III-L.2, L513
Original: "consistent with the $1 - (1 - p_{\text{pair}})^{n_{\text{pool}}}$ form"
Issue: This is a useful intuition, but the relationship between the independence limit and the within-firm violation should be flagged in the same paragraph; otherwise it reads like a formal model.
Suggested revision: Add one sentence: "This is an intuition, not an independence assumption used for estimation".

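For reference, the independence-limit form quoted in this item can be sketched as below. This is a minimal illustration only: the function name `any_hit_probability` and the example values of `p_pair` and `n_pool` are hypothetical and are not taken from the paper. It assumes every comparison in the pool is independent, which is exactly the condition the item says should be flagged as intuition rather than an estimation assumption.

```python
def any_hit_probability(p_pair: float, n_pool: int) -> float:
    """Chance of at least one hit among n_pool comparisons.

    Assumes each comparison hits independently with probability p_pair.
    This is the independence limit only; correlated within-firm
    comparisons would violate it.
    """
    if not 0.0 <= p_pair <= 1.0:
        raise ValueError("p_pair must lie in [0, 1]")
    if n_pool < 0:
        raise ValueError("n_pool must be non-negative")
    return 1.0 - (1.0 - p_pair) ** n_pool


# Hypothetical values, chosen only to show how the form grows with pool size.
for n_pool in (1, 10, 100):
    print(n_pool, any_hit_probability(0.01, n_pool))
```

The printout shows how a small per-comparison rate can become a large pool-level rate as the pool grows, which is why the form is useful as intuition even when independence does not hold.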
[Axis 3]

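[MAJOR] §III-F, L277
Original: "Hand-signing, by contrast, often yields high dHash similarity"
Issue: This sentence presupposes that when the same CPA hand-signs multiple times the "overall layout typically preserved" condition holds, which comes close to the per-CPA cross-document consistency that should not be presupposed. Although it says "often", it still commits to unverified handwriting behaviour in the methodological motivation.
Suggested revision: Change to "One working hypothesis is that some hand-signed repetitions may preserve coarse layout while varying in fine execution; the classifier does not require this to hold for all CPAs".
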
[MAJOR] §III-H.1, L316
Original: "consistent with a CPA who signs very consistently"
Issue: HSC is interpreted as "the same CPA signs very consistently, but this is not reproduction", which directly attributes the high-cosine / high-dHash same-CPA pattern to individual handwriting consistency.
Suggested revision: Change to "high feature similarity without structural corroboration; mechanism unresolved".

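[MAJOR] §III-H.1, L318
Original: "Likely hand-signed"
Issue: A low max-cosine does not equal hand-signing; it may also result from cross-year handwriting change, the scanning/PDF pipeline, cropping, or multiple template variants. This over-interprets "no high same-CPA match".
Suggested revision: Change to a descriptor-based label, for example "low-replication-similarity".
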
[MINOR] §III-G A1, L292
Original: "within the cross-year same-CPA pool"
Issue: A1 itself is not a year-to-year consistency assumption, but "cross-year" is easily read as implying that signatures across years should be comparable or consistent.
Suggested revision: Change to "within the observed same-CPA candidate pool pooled over years; this does not assume temporal stability of handwriting or scanning".

[MINOR] §III-L.6, L587
Original: "same-CPA repeatability signal"
Issue: A caveat has already been added, but "repeatability" may still be read as a signal of individual signature consistency.
Suggested revision: Change to "observed same-CPA-pool excess signal, whose sources are not identifiable".

[MINOR] §IV-M.6, L1043
Original: "interpreted as a same-CPA repeatability signal"
Issue: Same as above, and it appears in the results consolidation, where it is easily taken as a results claim.
Suggested revision: Change to "reported as same-CPA-pool excess under §III-M caveats, not attributed to handwriting repeatability".

Overall assessment

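Axis 1 verdict: The overall direction is now consistent with the abstract, but several sentences still lean too far toward "validated detector / mechanism attribution", notably "resolves ambiguity", the HC/LH labels, and the §V-C reading of B/C/D as production-related reuse.

Axis 2 verdict: Most v3 residue has been demoted, but the §III narrative is still repetitive. The biggest issue is that the unit taxonomy and the scope of calibration/re-characterisation need to be made clear earlier.

Axis 3 verdict: No core computational logic was found to necessarily depend on "signatures from the same CPA or across years must be consistent"; however, several names and motivation sentences could lead readers to believe that this assumption exists.
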
Ready for partner final review: Yes, but a round of minor revisions is recommended first, mainly changes to labels/tone and the §III roadmap.

BLOCKER: None.