Paper A Phase 5 Round 1 — Opus 4.7 max-effort independent review

Reviewer: Claude Opus 4.7
Date: 2026-05-14
Target: paper/v4/paper_a_prose_v4_phase4.md + paper/v4/paper_a_methodology_v4_section_iii.md + paper/v4/paper_a_results_v4_section_iv.md
Prior reviewer artifacts: paper/codex_review_gpt55_v4_round7.md (codex, Minor Revision); paper/gemini_review_v4_round1.md (Gemini 3.1 Pro, Minor Revision)

Verdict

Minor Revision (corroborates codex + Gemini on overall disposition), but I dissent on readiness. The central empirical narrative — anchor-based multi-level ICCR calibration, the composition-decomposition demolition of distributional thresholds, the K=3-as-firm-compositional demotion, and the disclosed unsupervised-validation ceiling — is methodologically sound and survives my spot-checks. However, my independent pass surfaces three substantive blockers that both prior reviewers missed: (1) the headline "98-100% of inter-CPA collisions concentrated within the source firm" claim, repeated verbatim in the Abstract, §I contribution 6, §V-C, §V-G limitation 2, and §VI conclusion item 4, is factually wrong for the deployed any-pair rule at three of the four firms (Firms B/C/D within-firm rates are 77%/84%/77%, not 98-100%; the 98-100% range applies only to the stricter same-pair joint event); (2) §IV retains the demoted mechanism labels ("hand-leaning / mixed / replicated" in Table XVI columns, "hand-leaning rate" in Tables IX, X, XVII, XVIII, and prose at lines 102, 215, 226, 232, 254) that §III-J line 90 explicitly says are "replaced"; (3) §V has two sub-sections labelled "G" (line 105 "Pixel-Identity ..." and line 109 "Limitations"). I therefore corroborate Minor Revision on the empirical core but treat these three findings as additional copy-edit-or-rewrite blockers that must close before the Phase 5 splice — finding (1) verges on empirical depending on whether the author chooses to rewrite or restrict scope.

Cross-reviewer agreement matrix

| Theme | codex round-7 | Gemini round-1 | Opus round-1 |
|---|---|---|---|
| Overall disposition | Minor Revision | Minor Revision | Minor Revision |
| Refs [42]-[44] present in paper_a_references_v3.md | Missed — claimed they were placeholders | Caught codex error | Agree with Gemini. Refs are at lines 87-91 of paper/paper_a_references_v3.md |
| Partner "statistically insignificant" framing risk | Not flagged | Flagged as Major | Agree with Gemini. OR magnitudes 0.05/0.01/0.03 are decisive heterogeneity, not absence-of-effect |
| Table XV-B → XIX renumbering | Not flagged | Flagged as Minor | Agree with Gemini AND extend: renumbering cascades — see Major M2 below |
| Sample-size 150,442 vs 150,453 | Not flagged | Flagged as newly-introduced issue | Partial agree: the §IV-J line 177 footnote already reconciles this inline; the missing piece is a Table XV cross-pointer to §III-G |
| Internal draft notes / checklists | Flagged | Flagged | Agree. Three draft-note blocks remain (Phase 4 prose, §III v7, §IV v3.3) plus three close-out checklists |
| Three-score "feature-derived" caveat coverage | Closed | Closed | Partial dissent. Caveat is present at Abstract / §I / §III-K / §V-E, but §IV Tables IX/X/XVII/XVIII reintroduce mechanistic "hand-leaning" framing without the §III-K caveat |
| K=3 demotion language consistency | Closed | Closed (line 25 says "Consistent descriptive framing") | Dissent. See Major M1 — §IV retains mechanistic labels throughout |
| Cross-firm collision concentration "98-100%" | Closed implicitly | Verified in §IV-M.5 / §III-L.4 spot-check, but did not check whether the prose claim matches the any-pair table | Strong dissent. See Major M3 |
| Duplicate §V-G heading | Not flagged | Not flagged | Both missed. See Major M4 |

Major findings

  1. §IV retains "hand-leaning / mixed / replicated" mechanism labels that §III-J line 90 explicitly demotes (both missed).

    • Issue. §III-I.4 establishes that the descriptor distributions contain no within-population bimodality; §III-J line 90 states: "The 'descriptive position' column replaces v3.x's 'hand-leaning / mixed / replicated' mechanism labels." §V-D, §I item 7, and §VI item 5 propagate the descriptive framing. §IV does not.
    • Where.
      • paper_a_results_v4_section_iv.md line 219 (Table XVI column headers): C1 (hand-leaning) | C2 (mixed) | C3 (replicated) — verbatim mechanism labels
      • line 85 (Table IX): "K=3 P(C1) vs Paper A box-rule hand-leaning rate"
      • line 86 (Table IX): "Reverse-anchor cosine percentile vs Paper A box-rule hand-leaning rate"
      • line 93 (Table X): "mean Paper A hand-leaning rate"
      • line 100: "more hand-leaning relative to the non-Big-4 reference"
      • line 102: "the most-hand-leaning end of Big-4"; "more hand-leaning"; "ranks Firm D fractionally above Firm C"
      • line 147 (Table XIV column): "Misclassified as hand-leaning"
      • line 215 (§IV-J): "the per-CPA hand-leaning ranking"
      • line 226 (Table XVI reading): "C3 replicated component"; "highest hand-leaning concentration of the Big-4"
      • line 232 (§IV-K): "Paper A operational hand-leaning rate"
      • line 238 (Table XVII): "C1 hand-leaning" / "C3 replicated"
      • line 244 (Table XVIII title): "Paper A operational hand-leaning rate"
      • line 246 (Table XVIII): "P(C1) vs Paper A hand-leaning rate"
      • line 254 (§IV-K reading): "more non-templated CPAs"; "more mid/small-firm hand-leaning CPAs"
    • Reasoning. §III-K explicitly renames Score 3 to "inherited binary high-confidence box rule rate" / "less-replication-dominated rate" (lines 119, 129), and §V-E uses "less-replication-dominated rate" (line 97). The §IV body silently retains v3.x mechanistic naming. A reader reaching §IV after §III-J's demotion will perceive that §IV has reverted to causal/mechanistic K=3 framing. This is the exact regression the v4 pivot is supposed to prevent.
    • Fix. Global s/hand-leaning/less-replication-dominated/ in §IV; rename Table XVI columns to C1 (low-cos / high-dHash) / C2 (central) / C3 (high-cos / low-dHash) matching the §III-J Table 8 "descriptive position" column; rename Table XVII C1/C3 row labels and Table XVIII row labels likewise; rewrite the §IV-K Reading prose to use descriptor-position language ("CPAs whose descriptor mean sits further from the templated end of the descriptor plane").
    • Tag. Both missed.
  2. Table-numbering cascade if XV-B → XIX is finalised (Gemini missed the cascade; codex did not flag).

    • Issue. Gemini correctly recommended renumbering Table XV-B → Table XIX. But the §IV-M tables (§IV-M.1 through §IV-M.6) currently occupy Tables XIX, XX, XXI, XXII, XXIII, XXIV, XXV. Bumping XV-B → XIX forces a cascade renumber of all seven §IV-M tables to XX-XXVI. There is no acknowledgement of this cascade in either §IV-J close-out item 2 (line 370) or in Gemini's recommendation. Without addressing the cascade, fixing XV-B alone produces a duplicate Table XIX in the manuscript.
    • Where. paper_a_results_v4_section_iv.md lines 192 (XV-B), 266 (XIX), 280 (XX), 300 (XXI), 317 (XXII), 329 (XXIII), 340 (XXIV), 353 (XXV).
    • Reasoning. Sequential integer numbering is the IEEE Access norm. Either Table XV-B stays (and the §IV-M tables stay XIX-XXV) or the rename cascades.
    • Fix. If XV-B → XIX is preferred, also renumber §IV-M tables XIX→XX, XX→XXI, ..., XXV→XXVI; update all in-text references (§III provenance table at line 387 onward; §IV-J line 188 and 217 referencing "Table XVI"; §I and §V cross-references). Run a manuscript-wide grep on "Table XVI"…"Table XXVI" after the renumbering to catch stale internal pointers.
    • Tag. Gemini missed (cascade implication); codex missed entirely.
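A cascade of this shape is mechanical but order-sensitive: applying XIX → XX before XX → XXI would merge two distinct tables. A minimal sketch of a highest-first renumber pass (the mapping is taken from this finding; the helper itself is hypothetical, not part of the paper's tooling):

```python
import re

# Bump Roman-numeral table labels highest-first so a renamed label never
# collides with an old label that has not been bumped yet.
# Mapping per Major M2: XV-B -> XIX, then XIX..XXV each shift up by one.
CASCADE = [
    ("XXV", "XXVI"), ("XXIV", "XXV"), ("XXIII", "XXIV"), ("XXII", "XXIII"),
    ("XXI", "XXII"), ("XX", "XXI"), ("XIX", "XX"), ("XV-B", "XIX"),
]

def renumber_tables(text: str) -> str:
    for old, new in CASCADE:
        text = re.sub(rf"Table {old}\b", f"Table {new}", text)
    return text
```

A post-pass grep for the old label set then catches any stale cross-reference the substitution missed, which is the manuscript-wide check the Fix above recommends.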
  3. The "98-100% of inter-CPA collisions concentrated within the source firm" claim is factually wrong for the deployed any-pair rule at three of four firms (both missed).

    • Issue. The Abstract (line 11), §I item 6 (line 53), §V-C (line 87), §V-G limitation 2 (line 115), and §VI conclusion item 4 (line 147) report "98-100% of inter-CPA collisions concentrated within the source firm". This range applies only to the same-pair joint event (a single inter-CPA candidate satisfying both cos>0.95 AND dHash≤5), which is the stricter alternative classifier explicitly contrasted to the deployed any-pair rule in §III-L.0 (line 183) and §III-L.4 (line 281).
    • Verification. §IV-M Table XXIV (line 342) reports any-pair cross-firm hit counts. Computing within-firm fractions from Table XXIV:
      • Firm A: 14,447 / 14,622 = 98.8%
      • Firm B: 371 / 484 = 76.7%
      • Firm C: 149 / 178 = 83.7%
      • Firm D: 106 / 137 = 77.4%
      • Any-pair within-firm range: [76.7%, 98.8%], not [98%, 100%]. The 98-100% range corresponds to the same-pair rates (99.96/97.7/98.2/97.0%) stated separately at §III-L.4 line 281 and §IV-M.5 line 349.
    • Reasoning. §III-L.0 makes a careful any-pair vs same-pair distinction and emphasises that the deployed rule is any-pair. The Abstract / §I / §VI summarise the within-firm concentration using the same-pair number while attributing it to the deployed rule. §V-C line 87 is even worse: "98-100% of inter-CPA collisions originate from candidates within the source firm, regardless of which Big-4 firm is the source" — explicitly asserting the strong claim across all four firms, contradicted by Firms B/C/D any-pair rates of 77/84/77%.
    • Fix options.
      • Option A (preferred): Restate as "98% within-firm at Firm A, 77-84% at Firms B/C/D for the deployed any-pair rule; 97-100% within-firm under the stricter same-pair joint event across all four firms" in Abstract / §I / §V / §VI. This preserves the headline finding (within-firm dominance) while accurately characterising the gap between any-pair and same-pair.
      • Option B: If the manuscript prefers the cleaner same-pair number, every occurrence must be re-attributed to "the same-pair joint event under the stricter alternative classifier (§III-L.4)", not to "the deployed rule" / "inter-CPA collisions". The current Abstract / §I framing ties the number to "inter-CPA collisions" without qualifier, which reads as applying to the deployed rule.
    • Severity. This is the headline forensic finding in the Abstract and §I. Mis-stating it across five locations is more than a copy-edit — partners and reviewers form their understanding of "what the paper shows" from the Abstract.
    • Tag. Both missed. Gemini's provenance spot-check #3 verified the byte-identical split (145/8/107/2) but did not verify whether the 98-100% claim in the Abstract matches the deployed-rule numbers in Table XXIV.
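The M3 arithmetic is small enough to reproduce mechanically. A sketch using the Table XXIV hit counts as quoted in this review (within-firm hits over total inter-CPA hits per source firm; the counts are those cited above, not re-extracted from the manuscript):

```python
# Within-firm fraction of any-pair inter-CPA collisions, from the Table XXIV
# counts quoted in this review: (within-firm hits, total hits) per source firm.
counts = {
    "Firm A": (14_447, 14_622),
    "Firm B": (371, 484),
    "Firm C": (149, 178),
    "Firm D": (106, 137),
}
for firm, (within, total) in counts.items():
    print(f"{firm}: {within / total:.1%}")
# Firm A: 98.8%, Firm B: 76.7%, Firm C: 83.7%, Firm D: 77.4%
```

Any rewrite of the Abstract claim can be checked against these four fractions rather than against the same-pair rates.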
  4. Duplicate §V-G heading (both missed).

    • Issue. paper_a_prose_v4_phase4.md has two §V-G headings: line 105 "G. Pixel-Identity as a Hard Positive Anchor; Inherited Inter-CPA Negative Anchor Reframed as Coincidence Rate" and line 109 "G. Limitations". The second should be "H. Limitations".
    • Where. paper_a_prose_v4_phase4.md lines 105, 109. Also: line 37 (§I) cross-references "§V-G" for the conservative-subset caveat — ambiguous given two G sections; the content actually lives in BOTH (line 105 has "we caution that this result is necessary but not sufficient: for the box rule it is close to tautological", and line 119 has the "Pixel-identity is a conservative subset" bullet).
    • Reasoning. §V originally went A through F in the v3 draft. The Phase 4 prose added a new §V-G ("Pixel-Identity ...") between §V-F and the existing §V-G Limitations without renaming the latter. Both prior reviewers spot-checked §V-G Limitations content for completeness (codex M9, Gemini codex-closure check #3) and confirmed the inherited v3.20.0 limitations are restored — but neither noticed the duplicate letter.
    • Fix. Rename the second to §V-H ("H. Limitations"). Update every internal §V-G cross-reference: line 37, line 111 ("inherited from v3.20.0 §V-G"), line 160 (close-out checklist), and any §III or §IV pointer.
    • Tag. Both missed.
  5. Stale "seven limitations" assertion in close-out checklist (both missed).

    • Issue. Phase 4 close-out item 4 (line 160) says "The seven limitations are listed flat". Actual count in the §V-G Limitations section is 14 (9 v4.0-specific + 5 inherited from v3.20.0). §V-G line 111 correctly says "The first nine are v4.0-specific; the last five are inherited". The checklist item is stale from a previous version where the v4.0-specific count was 2.
    • Fix. Either update to "The fourteen limitations are listed flat" or remove the checklist on Phase 5 splice (already on Gemini's list under m1 and codex's list under A8).
    • Tag. Both missed (a small instance of the broader internal-notes cleanup).
  6. §I contribution 4 and §IV-D both refer to a "2×2 factorial diagnostic" but only §IV-M Table XIX summarises it; §IV-D's body does not reproduce the 2×2 factorial table from §III-I.4 (codex implicitly covered; Gemini partly covered).

    • Issue. §IV-D line 23 says "the v4-new composition-decomposition diagnostics that establish this finding are tabulated in §IV-M below". The reader expects to find the §III-I.4 factorial table (4 rows: raw / centred-only / jittered-only / both) reproduced in §IV-M. But §IV-M Table XIX only summarises three of the four rows (collapses the "centred-only" and "jittered-only" cells into descriptive prose) and omits the "raw" baseline row's $p$-value comparison. Readers comparing §III-I.4 line 67 to §IV-M Table XIX will see a partial reproduction.
    • Severity. Cosmetic — the same content is in §III-I.4 in full. But the §IV reader is implicitly told §IV-M is the destination for the §III-I.4 evidence, and §IV-M is not as complete as §III-I.4.
    • Fix. Either reproduce the full 4-row 2×2 factorial table in §IV-M (currently Table XIX), or change §IV-D line 23 to "see §III-I.4 Table for the full factorial; §IV-M Table XIX summarises the key diagnostic outcomes".
    • Tag. Both implicitly missed.
  7. Inconsistent precision in three-score Spearman across §III-K and §IV-F (Gemini implicitly missed).

    • Issue. The §III-K table at lines 125-127 reports three-score pairwise Spearman as +0.963, +0.889, +0.879 (3 decimal places). §IV-F Table IX at lines 85-87 reports the same correlations as +0.9627, +0.8890, +0.8794 (4 decimal places). §III provenance table line 365 uses 3 decimals.
    • Severity. Low — the values are consistent at 3-dp rounding. But mixed precision across §III table / §IV table / Abstract is a copy-edit signal.
    • Fix. Standardise on 4 decimal places everywhere (the §IV precision matches Script 38's output more faithfully) and update §III-K Table + §III provenance row + the Phase 4 abstract / §I body to match. Alternatively, standardise on the lower-bound floor "ρ ≥ 0.879" that the Abstract uses.
    • Tag. Both missed (low priority).

Minor findings

  1. Stale "approximately 235 words" / "243-244 words" abstract counts in two checklists. Phase 4 close-out item 1 (line 157) says "Current draft is 243-244 words"; codex round-7 verified that wc -w returns 243. Both are within the IEEE Access 250-word budget. Just remove the abstract-word-count notes before splice. (Gemini m3.)

  2. Three "internal — remove before submission" draft notes still present. Phase 4 line 3, §III v7 line 3 (lines 1-5), §IV v3.3 line 3. Plus two close-out checklists (§IV lines 365-375, Phase 4 lines 153-162) and §III's cross-reference index (lines 431-448). All flagged by codex (A8) and Gemini (m1). I corroborate.

  3. §IV-J Table XV-B header pointer. §IV-J line 228 "Document-level worst-case aggregation outputs are reported in Table XV-B above." — fine prose but does not specify which table number is the document-level table when XV-B → XIX renumber occurs. Once Major M2 is resolved, update accordingly.

  4. Mixed pp vs decimal vs percentage notation across Abstract / §I / §III-L / §IV-M for the same numerical claims. Examples:

    • Abstract line 11 uses "0.34 for the operational HC+MC alarm"
    • §I line 33 uses "33.75% of Big-4 documents"
    • §III-L.3 Table reports "0.3375 [Wilson 95% [0.3342, 0.3409]]"
    • §IV-M Table XXII reports "0.3375"
    • Phase 4 conclusion line 147 uses "(0.34 for the operational HC+MC alarm)"
    • Standardise on either decimal or percentage throughout numeric quotations; the lossy "0.34" in the Abstract drops the 95% CI which is reported elsewhere. The IEEE Access norm in this corpus is decimal-with-CI in tables, percentage-without-CI in prose summary.
  5. "v3.x §IV-F.1" cross-reference is not validated against the v3 file's actual section letter. §III-H line 37 and §V-C line 85 both cite "v3.x §IV-F.1" for the 145 pixel-identical signatures across ~50 distinct Firm A partners. If paper_a_results_v3.md uses a different sub-section letter (e.g., §IV-G or §IV-F-1), the citation will be stale at splice time. I did not verify this but flag it as a splice-time check.

  6. "~50 distinct partners of 180" vs §III-H "50 distinct Firm A partners of 180 registered" — Firm A registered partner count of 180 is inherited from v3.20.0 / Script 28 with no in-paper provenance verification. This is flagged in §III-H line 37 ("inherited from v3 §IV-F.1 / Script 28 / Appendix B byte-decomposition output and were not regenerated in v4.0 spike scripts") — adequately disclosed. Just confirm at splice that v3.20.0's count of 180 is reproducible.

  7. §I.5 contribution 5 line 51 claim "We adopt 'inter-CPA coincidence rate' as the metric name throughout and reserve 'False Acceptance Rate' for terminology that requires ground-truth negative labels". §IV-I line 159 prose uses "False Acceptance Rate (FAR)" in the historical-context phrase "previously reported as 'False Acceptance Rate' in v3.x" — this is intentional historical reference, but it does mean "FAR" appears once in §IV body, which contradicts the §I claim of "throughout". Either weaken §I.5 to "throughout v4.0 framing, with one historical-reference exception in §IV-I" or strip "FAR" from §IV-I line 159 and reword to "previously reported using biometric-verification 'False Acceptance Rate' terminology".

  8. MC band per-firm proportions in §IV-J line 215. "10.76% / 35.88% / 41.44% / 29.33% across Firms A through D". Cross-checking against Table XV per-firm breakdown line 183186: Firm A MC = 10.76%, Firm B MC = 35.88%, Firm C MC = 41.44%, Firm D MC = 29.33%. Consistent ✓.

  9. §V-G limitation 1 list says "first nine are v4.0-specific" but Phase 4 close-out item 4 calls them "seven limitations". Already covered under Major M5.

  10. Provenance-table cross-references inside the §III provenance table. Line 364: "K=3 LOOO held-out C1 absolute differences 1.8-12.8 pp | direct | Script 37 held-out prediction check". Cross-checked against §IV-G Table XIII (lines 134-137): the four held-out absolute differences are 4.68, 1.76, 12.77, 5.81 → range 1.76 to 12.77, which rounds to "1.8-12.8" ✓.

  11. Abstract line 11 phrasing "98-100% of inter-CPA collisions concentrated within the source firm — consistent with firm-level template-like reuse" — this is the most public statement of Major M3. The Abstract is the only thing many readers will read; the imprecision is more costly here than in §V-G.

Provenance spot-checks

I deliberately chose 5 claims NOT covered by Gemini's spot-checks.

  1. Within-source-firm any-pair collision rates (the "98-100%" claim).

    • Claim text. "with 98-100% of inter-CPA collisions concentrated within the source firm" (Abstract line 11, §I line 53, §V-C line 87, §V-G line 115, §VI line 147).
    • Manuscript location of evidence. Table XXIV in §IV-M.5 (line 342); §III-L.4 cross-firm hit matrix (lines 274-281).
    • Cited script. Script 44.
    • Verdict. INCONSISTENT. Computed from Table XXIV: any-pair within-firm fractions are 98.8% / 76.7% / 83.7% / 77.4% (range 76.7-98.8%, not 98-100%). The 98-100% range corresponds to the same-pair joint event explicitly reported as the stricter alternative at §III-L.4 line 281 and Table XXIV line 349 (99.96% / 97.7% / 98.2% / 97.0%). The narrative-level claim conflates two distinct rule semantics. The script logic (44_firm_matched_pool_regression.py lines 274-327) does compute both matrices; the manuscript reports both correctly inside §III-L.4 / §IV-M.5; the failure is in the Abstract / §I / §V-C / §V-G / §VI narrative-summary layer. See Major M3.
  2. Per-pair conditional ICCR dHash≤5 given cos>0.95 = 0.234 (Wilson [0.190, 0.285], 70 of 299 pairs).

    • Claim text. §III-L.1 line 208; Phase 4 §V-F line 103.
    • Cited script. Script 40b.
    • Verdict. VERIFIED for logic; numerical value not independently verifiable from the worktree. Script 40b at lines 22-23 explicitly computes "Conditional FAR(dh<=k | cos>0.95)" for k∈[0, 20] and prints "P(dh<=k | cos>0.95)" for each k (lines 226-229 of the script). The 0.234 / 70 / 299 numbers cannot be checked without access to /Volumes/NV2/PDF-Processing/signature-analysis/reports/v4_big4/inter_cpa_far_sweep/ (not in the worktree). Wilson [0.190, 0.285] is a plausible 95% CI for \hat{p} = 70/299 = 0.2341 (manual check: SE ≈ \sqrt{0.234 \cdot 0.766/299} = 0.0245, normal-approx 95% CI [0.186, 0.282] — Wilson CIs are slightly wider on each side, so [0.190, 0.285] is consistent). Logic + analytic plausibility verified.
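The plausibility check above can be tightened to the exact interval; a sketch applying the standard Wilson score formula to the quoted 70-of-299 count (this reproduces the reported CI analytically, it is not Script 40b):

```python
from math import sqrt

def wilson_ci(k: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

lo, hi = wilson_ci(70, 299)
print(f"[{lo:.3f}, {hi:.3f}]")  # [0.190, 0.285], matching the manuscript
```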
  3. Pooled Big-4 per-signature any-pair ICCR 0.1102 with Wilson [0.1086, 0.1118].

    • Claim text. §III-L.2 Table at line 220; Phase 4 §V-F line 101.
    • Cited script. Script 43.
    • Verdict. VERIFIED analytically. For \hat{p} = 0.1102 on n_{\text{sig}} = 150{,}453, the Wilson-approximate 95% CI half-width is \approx 1.96 \cdot \sqrt{0.1102 \cdot 0.8898 / 150453} = 0.00159, giving [0.1086, 0.1118] — exactly the reported interval. The CPA-block bootstrap [0.0908, 0.1330] is much wider, as expected once CPA-level clustering is recognised (the CI widens by a factor of \sim 13\times here), consistent with the strong within-CPA correlation that the manuscript discusses.
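The half-width arithmetic is two lines; a sketch with the reported point estimate and signature count:

```python
from math import sqrt

# Normal-approximation 95% half-width for the pooled per-signature ICCR;
# at n = 150,453 this agrees with the Wilson interval to 4 decimal places.
p, n = 0.1102, 150_453
half = 1.96 * sqrt(p * (1 - p) / n)
print(f"[{p - half:.4f}, {p + half:.4f}]")  # [0.1086, 0.1118]
```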
  4. Logistic OR pool-size effect = 4.01.

    • Claim text. §III-L.4 Table line 266; §IV-M.5 Table XXIII line 336.
    • Cited script. Script 44.
    • Verdict. VERIFIED for logic; magnitude is intuitive. Script 44 at lines 219-227 fits logistic(hit) ~ intercept + FirmB + FirmC + FirmD + log(pool_size_centered) and reports OR = exp(beta). A 4× per-log-unit pool-size effect means each e-fold pool-size increase quadruples the per-signature odds — consistent with the §III-L.2 decile trend (decile 1 = 0.0249, decile 10 = 0.1905, ratio 7.65×; pool ratio approximately 1115/201 \approx 5.5, log-ratio \approx 1.71, predicted OR \approx 4.01^{1.71} = 10.7 vs observed 0.1905/0.0249 = 7.65; the gap is small enough to be consistent with monotonicity + finite-sample noise). No red flag.
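The decile cross-check can be reproduced from the quoted figures (all inputs are the approximate values quoted in this review, not script output):

```python
from math import log

# Plausibility arithmetic for the OR = 4.01 pool-size effect against the
# §III-L.2 decile trend (decile-1 vs decile-10 figures quoted above).
or_pool = 4.01
rate_d1, rate_d10 = 0.0249, 0.1905
pool_d1, pool_d10 = 201, 1115   # approximate decile pool sizes per the review

log_ratio = log(pool_d10 / pool_d1)   # pool-size span in log-units, ~1.71
predicted = or_pool ** log_ratio      # odds ratio implied across that span
observed = rate_d10 / rate_d1         # observed rate ratio, ~7.65
print(f"log-ratio {log_ratio:.2f}: predicted {predicted:.1f} vs observed {observed:.2f}")
```

Predicted and observed magnitudes differ by less than a factor of 1.5, which is the "consistent with monotonicity + finite-sample" reading above.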
  5. Alert-rate sensitivity local-gradient ratios (cos=0.95: ≈25×; dHash=5: ≈3.8×; dHash=15: ≈0.08).

    • Claim text. §III-L.5 line 291; §IV-M.6 Table XXV lines 357-359.
    • Cited script. Script 46.
    • Verdict. PARTIALLY VERIFIABLE. Script 46 at lines 236-276 explicitly computes the local-to-median gradient ratio at the v3-inherited threshold (cos=0.95 and dh=5) and emits ratio_local_to_median for both. However, the script does NOT emit a dh=15 ratio: DH_GRID = np.arange(0, 21, 1) covers 15 (so the swept rates ARE available in the saved JSON), but the script only computes the ratio at dh=5. The ≈0.08 ratio at dh=15 in Table XXV must therefore be a derived quantity computed by the author from the JSON output post-hoc, not a direct script output. This is not a fabrication risk (the author has access to the swept rates), but the manuscript should either (a) add a brief in-paper note that the dh=15 ratio is computed identically to the dh=5 ratio but post-hoc on the JSON, or (b) extend Script 46's plateau-detection block to emit ratios at both dh=5 and dh=15.

    Secondary note. The cosine sweep prose at §III-L.5 line 291 quotes "cosine sweep at dHash ≤ 5 yields rates of 0.5091 at cos > 0.945 vs 0.4789 at cos > 0.955". From Script 46 lines 56-57: COS_FOR_2D = np.arange(0.85, 1.00, 0.01) only covers integer hundredths, so 0.945 and 0.955 are NOT on the 2D grid. The 1D COS_GRID may cover 0.945 / 0.955, but I cannot fully verify without seeing the saved JSON. The numerical specificity is suspiciously precise (four decimals on a derived rate) — flag for partner spot-check at splice.
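If the author takes option (a) from the verdict above, the post-hoc derivation is a few lines against the saved sweep JSON. The sketch below is illustrative only: the key names ("dh_grid", "alert_rate") and the ratio definition (local gradient over the median gradient across the sweep) are assumptions about Script 46's output schema, not its actual code:

```python
import json
from statistics import median

def local_to_median_ratio(path: str, at: int = 15) -> float:
    """Gradient of the swept alert rate at dHash = `at`, relative to the
    median gradient across the whole sweep. Key names are assumed."""
    with open(path) as f:
        sweep = json.load(f)
    dh = sweep["dh_grid"]          # assumed key: [0, 1, ..., 20] per DH_GRID
    rate = sweep["alert_rate"]     # assumed key: one swept alert rate per dh
    grads = [abs(b - a) for a, b in zip(rate, rate[1:])]
    return grads[dh.index(at)] / median(grads)
```

Emitting this ratio from Script 46 itself (option (b)) would make Table XXV fully script-traceable.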

Newly introduced issues

  1. The "specificity-proxy-anchored screening framework with human-in-the-loop review" positioning is consistent across Abstract / §I / §III-M / §V-G / §VI — except that the term "human-in-the-loop" appears nowhere in §IV, only in §I line 30 (item 5), §III-L.3 line 255, §III-M line 334, and §VI line 147. §I item 5 promises "anchor-based threshold calibration at three units of analysis ... against an inter-CPA negative-anchor coincidence-rate proxy", and §III-M closes with "positioning as a specificity-proxy-anchored screening framework with human-in-the-loop review". But §IV's results discussion never operationalises what "human-in-the-loop review" means: the per-document HC+MC alarm rate of 0.34 is reported as a number; whether the human reviewer triages all 34% of flagged docs or only a subset is not specified. This is a newly introduced v4 positioning claim that v3.20.0 did not make, and it lacks a concrete operationalisation. Either explicitly punt to future work or briefly describe the intended triage workflow in §V or §VI.

  2. The "feature-derived" qualifier coverage IS reliable across §III-K, §V-E, Abstract, §I, §VI — but breaks down in §IV. Major M1 above. Newly introduced in v4 because v3.20.0 used "three independent scores" without the caveat; v4 added the caveat in §III/§V/§I/§VI but not §IV.

  3. The nine-tool, nine-diagnostic validation collection (§III-M table at lines 318-329) and the §VI item 8 nine-tool claim — the count is internally consistent (9 rows in the §III-M table), but the table is currently unnumbered. All other tables in §III/§IV have a "Table N:" header. The §III-M validation table should be Table XXVI (or whatever number follows the §IV-M cascade) for IEEE Access. Otherwise it is an unnumbered table in a manuscript that discusses its own table-numbering scheme elsewhere — inconsistent.

  4. The §I cross-reference list at line 29 ("(1) signature page identification...; (2) signature region detection...; ... (8) a multi-tool unsupervised validation strategy") promises 8 pipeline steps in numbered order, but in the actual §III ordering steps 1-5 are the embedding pipeline and 6-8 are validation framework / decomposition / framework positioning, not strict pipeline stages. Cosmetic; the v3.20.0 contribution list had 7 sequential pipeline stages; the v4.0 list collapses pipeline + validation into 8 items, blurring stage vs framework. Either say "Eight elements" rather than implying 8 pipeline steps, or split into "Pipeline (1-4): ...; Calibration and validation framework (5-8): ...".

Disagreements with prior reviewers

  1. I disagree with codex round-7 M6 (closed) "§II citation-number gap and placeholder contradiction." codex round-7 says "[42]-[44] remain placeholders and absent from the reference list." Gemini correctly identified that codex was wrong: [42]-[44] are present at lines 87-91 of paper_a_references_v3.md. I corroborate Gemini's correction. The only residual is the [add citation] placeholder string in Phase 4 close-out note line 159 (just a stale comment in a checklist that will be stripped at splice). codex's claim was based on the file ending at [41] in some earlier snapshot; the current file ends at [44].

  2. I partially disagree with Gemini's finding #4 "K=3 Demotion Language Consistency" (verdict: "CLOSED. Consistent descriptive framing."). §III, §I, §V, §VI properly demote K=3, but §IV (which Gemini did not separately enumerate for this check) retains mechanism labels throughout (Major M1 above). The K=3 demotion is partially closed, not fully closed.

  3. I corroborate Gemini's finding #1 on "statistically insignificant" framing risk (Major in Gemini's review, missed by codex). The OR magnitudes are 0.05/0.01/0.03 — these are 19×/100×/37× effects after pool-size adjustment, with standard errors that the manuscript flags as needing cluster-robust treatment but which are nonetheless an order of magnitude below 1. There is no defensible reading under which this is "statistically insignificant"; any such partner framing must be rejected.

  4. Neither codex nor Gemini examined the cross-firm hit-matrix arithmetic to verify that the Abstract's "98-100%" claim is consistent with Table XXIV. Major M3 is my net-new finding.

  5. Neither codex nor Gemini flagged the duplicate §V-G heading. Major M4 is my net-new finding.

  6. Neither codex nor Gemini flagged the §IV vs §III terminology drift around "hand-leaning / replicated". Major M1 is my net-new finding.

  7. codex M1 (Abstract independent-score correction closed) — I corroborate the closure for the Abstract (line 11 says "Three feature-derived scores"), but extend the finding: the same correction is NOT propagated to §IV Tables IX/X/XVII/XVIII where the "Paper A box-rule hand-leaning rate" label retains mechanistic framing. This is Major M1 again.

Phase 5 readiness

Partial — closer to "blocked on copy-edit pass" than to "ready". Three major issues require manuscript-text rewrites before Phase 5 splice:

  • Empirical-language blocker: Major M3 (the 98-100% within-firm claim) requires an Abstract / §I / §V-C / §V-G / §VI prose rewrite. This is more than copy-edit — the headline finding's numerical scope must be reconciled with Table XXIV.
  • Terminology blocker: Major M1 (§IV mechanism labels) requires global s/hand-leaning/less-replication-dominated/ across §IV tables and prose plus Table XVI column-header rename.
  • Structural blocker: Major M4 (duplicate §V-G heading) requires §V renumber.
  • Numbering blocker: Major M2 (table-numbering cascade) requires either keeping XV-B suffix or cascading XIX→XX→...→XXVI across §IV-M and updating every cross-reference.

The empirical core (scripts 3246, ICCR multi-level calibration, composition decomposition, three-score convergence, byte-identical anchor, logistic regression) is sound and reproducible from script inspection. No new statistical work is required for Phase 5.

The remediation list below is ranked by severity, distinguishing prose-rewrite blockers from copy-edit-only items.

  1. [Manuscript rewrite — prose blocker] Reconcile the "98-100% within-firm" claim with Table XXIV. Adopt Major M3 Option A or Option B in Abstract / §I item 6 / §V-C / §V-G limitation 2 / §VI item 4. Preferred phrasing (Option A): "98% of inter-CPA collisions at Firm A and 77-84% at Firms B/C/D under the deployed any-pair rule, rising to 97-100% across all four firms under the stricter same-pair joint event".

  2. [Manuscript rewrite — terminology blocker] Global s/hand-leaning/less-replication-dominated/ across §IV (Table IX/X column labels, Table XIV column, Table XVI columns, Table XVII rows, Table XVIII title, §IV-J/K/M prose). Rename Table XVI columns to match §IV-E Table VIII's "descriptive position" column. Keep §I, §V, §VI as currently worded.

  3. [Manuscript rewrite — structural blocker] Rename §V's second "G. Limitations" to "H. Limitations". Update every §V-G cross-reference.

  4. [Manuscript rewrite — partner framing] Explicitly state at §V-C or §V-G that firm heterogeneity is highly statistically significant (OR magnitudes 19×, 100×, 37× after pool-size adjustment with $z$-equivalent ≥ 10 in absolute value on standard SE; cluster-robust SE is flagged as a robustness check, not as the reason heterogeneity might be insignificant). Reject any reframing as "statistically insignificant". (Gemini Major #1.)

  5. [Copy-edit blocker — table renumbering] Decide on the XV-B suffix. If renamed to XIX, cascade through §IV-M XIX→XX, XX→XXI, …, XXV→XXVI. Update §III provenance table cross-references. Update §IV-J line 188 / 217 / 228 references. Update §IV-L line 258 to disambiguate from v3.20.0 Table XVIII.

  6. [Copy-edit blocker — strip internal artefacts] Remove:

    • Phase 4 prose draft note (line 3) and close-out checklist (lines 153-162)
    • §III v7 draft note (lines 1-5) and cross-reference index + open-questions block (lines 431-448)
    • §IV v3.3 draft note (line 3) and close-out checklist (lines 365-375)
  7. [Copy-edit — terminology consistency] Decide on decimal vs percentage notation for the headline ICCRs (0.34 vs 33.75% etc.) and apply uniformly across Abstract / §I / §V / §VI.

  8. [Copy-edit — Spearman precision] Standardise three-score correlations on either 3-dp (0.963/0.889/0.879) or 4-dp (0.9627/0.8890/0.8794) across §III-K / §IV-F / §III provenance.

  9. [Copy-edit — §III-M validation table] Assign a table number (Table XXVI after the §IV-M cascade) to the nine-tool validation table at §III-M lines 318-329.

  10. [Copy-edit — §V-G item count] Update Phase 4 close-out item 4 "seven limitations" to "fourteen limitations" (or strip the checklist).

  11. [Splice-time check] Verify v3.20.0 §IV-F.1 cross-reference letters resolve correctly when §III-H / §V-C are spliced into the master file. Verify v3.20.0 Table XVIII (backbone ablation) does not collide with v4 Table XVIII (Spearman drift) once the final renumbering is applied. Verify all "v3.x" pointers point to actual sections that survive in the master manuscript.

  12. [Minor — script 46 dh=15 ratio] Either add a brief note in §IV-M that the dh=15 plateau ratio is computed post-hoc from the swept JSON (the swept rates themselves are emitted by Script 46), or extend Script 46 to compute the ratio at dh=15 directly.

  13. [Minor — pipeline-step vs framework-element framing] Reword §I line 29's "(1)…(8)" enumeration to distinguish pipeline stages (14) from validation/framework elements (58).

  14. [Minor — human-in-the-loop operationalisation] Add one sentence in §V or §VI describing what "human-in-the-loop review" means operationally (e.g., manual inspection of the 0.34 flagged-document fraction; sampling strategy; reviewer-effort estimate).