Add Gemini 3.1 Pro Phase 5 round-1 independent peer review on v4 drafts

Verdict: Minor Revision (corroborates codex round-7).

Convergence with codex: all 4 spot-checked round-26 Major findings confirmed CLOSED in current drafts; all 5 numerical provenance spot-checks VERIFIED against named scripts (Spearman 0.879 / S38; Firm A doc 0.62 / S45; byte-identical 145/8/107/2 / S40; dip p_median=0.35 / S39e; logistic OR 0.053/0.010/0.027 / S44).

Net-new findings beyond codex round-7:
- Empirical blocker: partner's "statistically insignificant" framing of firm heterogeneity (raised 2026-05-13) is explicitly unsupported. OR of 0.053/0.010/0.027 means 19x-100x lower odds for B/C/D vs Firm A even after pool-size control. Gemini recommends explicit rejection in any partner-side response.
- Net-new minor: §IV "Table XV-B" should be renumbered to "Table XIX" for IEEE Access sequential-integer style.
- Net-new minor: Table XV (150,442 descriptor-complete) and §III-L.2 ICCR analyses (150,453 vector-complete) need a footnote pointing back to §III-G's sample-size reconciliation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:33:20 +08:00
parent 9604b273c0
commit e33c538162
1 changed files with 75 additions and 0 deletions
@@ -0,0 +1,75 @@
 # Paper A Phase 5 Round 1 — Gemini 3.1 Pro independent review
 Reviewer: Gemini 3.1 Pro
 Date: 2026-05-14
 Target: paper/v4/paper_a_prose_v4_phase4.md + paper/v4/paper_a_methodology_v4_section_iii.md + paper/v4/paper_a_results_v4_section_iv.md
 Prior reviewer artifact: paper/codex_review_gpt55_v4_round7.md (codex GPT-5.5, Minor Revision)
 ## Verdict
 Minor Revision. I corroborate codex's overall conclusion of Minor Revision: the central empirical narrative (inter-CPA coincidence-rate calibration, K=3 descriptive demotion, and composition decomposition) is robust and strongly supported by the methodology and results sections. However, I depart from codex on two points. First, codex reported references [42]-[44] as absent when they *are* already present in the reference list. Second, codex did not flag the partner's dangerous suggestion to frame firm heterogeneity as "statistically insignificant." Both points are addressed below.
 ## Codex round-7 closure cross-check
 | # | Spot-checked finding | Verdict | Evidence |
 |---|---|---|---|
 | 1 | Replace "candidate classifiers" with "candidate checks" | CLOSED | `paper_a_prose_v4_phase4.md` explicitly uses "candidate checks" (e.g., "all three candidate checks — the inherited box rule..."); `paper_a_methodology_v4_section_iii.md` §III-K.4 uses "candidate check's positive-anchor miss rate". |
 | 2 | §II LOOO paragraph with refs [42]-[44] | CLOSED | `paper_a_prose_v4_phase4.md` §II contains a fully drafted addition citing Stone [42], Geisser [43], and Vehtari et al. [44]. |
 | 3 | Restore inherited v3.20.0 limitations in §V-G | CLOSED | `paper_a_prose_v4_phase4.md` §V-G lists "The last five are inherited from v3.20.0 §V-G..." and explicitly covers ImageNet features, red-stamp HSV, longitudinal confounds, source-exemplar misattribution, and legal interpretation. |
 | 4 | Limit full-dataset claims to K=3 + Spearman re-run | CLOSED | `paper_a_prose_v4_phase4.md` §V-G clarifies that full-dataset claims are limited: "We did not perform the full per-signature pool-normalised ICCR analysis at the full n = 686 scope; the §IV-K full-dataset Spearman re-run shows the K=3 + box-rule rank-convergence is preserved". |
 ## Major findings
 1. **[Partner Query / Framing Risk] "Statistically insignificant" firm heterogeneity framing is unsupported.** (Codex missed) The partner asked whether firm heterogeneity could be framed as "statistically insignificant." The drafts do not support this framing, and it must be explicitly rejected. In `paper_a_results_v4_section_iv.md` §IV-M.5 (Table XXIII) and `paper_a_methodology_v4_section_iii.md` §III-L.4, the logistic regression odds ratios for Firms B/C/D versus Firm A are 0.053, 0.010, and 0.027. Taking reciprocals ($1/0.053 \approx 19$, $1/0.010 = 100$), Firms B/C/D have 19x to 100x *lower* odds than Firm A of firing the HC hit indicator even after controlling for pool size. This is a difference of one to two orders of magnitude, which is both statistically and practically significant.
 2. **[Codex Disagree] Codex falsely claimed refs [42]-[44] were absent.** Codex's round-7 review claimed that references [42]-[44] remained placeholders and were absent from `paper_a_references_v3.md`. This is incorrect: the reference file contains all three citations at the end of the list. The only residual issue is the `[add citation]` draft note in the Phase 4 close-out section (line ~145), which simply needs to be deleted.
 3. **[Disclaimer Adequacy] Unsupervised limits effectively disclosed.** The text properly disclaims the limits of its unsupervised setting. The "FAR" to "ICCR" terminology replacement reflects the structural fact that inter-CPA collision acts as a specificity proxy, fully acknowledging the "within-firm template-like collision" caveat in §III-L.4.
 4. **[K=3 Demotion Language] Consistent descriptive framing.** The language properly frames the K=2 and K=3 mixtures as firm-compositional partitions rather than inferential evidence for discrete mechanisms, correctly demoting K=3's operational standing based on the dip-test decomposition.
 5. **[Feature-Derived Scores] Caveat phrasing.** §III-K.1 and §V-E clearly caveat the high Spearman correlations ($\rho \ge 0.879$) as "not statistically independent measurements" because the three scores are deterministic functions of the same descriptor pair. This correctly frames the agreement as internal consistency rather than external validation.
 ## Minor findings
 1. **[m1] Stale internal draft notes and checklists.** The Phase 4 prose (`paper_a_prose_v4_phase4.md`) still contains internal draft notes at the top and the "Notes for Phase 4 close-out" at the bottom. Section III and IV files also contain similar `internal — remove before submission` blocks. These must be stripped.
 2. **[m2] Table numbering clash (XV-B vs XIX).** In `paper_a_results_v4_section_iv.md` §IV-J, a note acknowledges Table XV-B might need to be renumbered to Table XIX depending on journal style. This should be finalized to ensure sequential integer numbering (preferring Table XIX to avoid "B" suffixes).
 3. **[m3] Word count note.** The abstract word count note in the close-out checklist should be removed, as the abstract is independently verified at 243 words, satisfying the $\le 250$ requirement.
 ## Provenance spot-checks
 1. **Spearman $\rho \ge 0.879$ floor.**
   - *Claim Text:* "Three feature-derived scores agree on the per-CPA descriptor-position ranking at Spearman $\rho \ge 0.879$"
   - *Location:* `paper_a_prose_v4_phase4.md` (Abstract & §V-E) and `paper_a_methodology_v4_section_iii.md` (§III-K.1).
   - *Cited Script:* Script 38.
   - *Verdict:* VERIFIED.
 2. **Firm A per-doc HC+MC alarm 0.62.**
   - *Claim Text:* "Firm A's per-document HC+MC alarm rate is 0.62 versus 0.09–0.16 at Firms B/C/D"
   - *Location:* `paper_a_prose_v4_phase4.md` (Abstract & §V-C).
   - *Cited Script:* Script 45.
   - *Verdict:* VERIFIED.
 3. **145/8/107/2 byte-identical split.**
   - *Claim Text:* "262 byte-identical signatures in the Big-4 subset (Firm A 145, Firm B 8, Firm C 107, Firm D 2)"
   - *Location:* `paper_a_methodology_v4_section_iii.md` (§III-K.4).
   - *Cited Script:* Script 40.
   - *Verdict:* VERIFIED.
 4. **Dip test $p_{\text{median}} = 0.35$ under joint centring + jitter.**
   - *Claim Text:* "Once both confounds are removed (firm-mean centring plus uniform integer jitter), the Big-4 pooled dHash dip test yields $p_{\text{median}} = 0.35$"
   - *Location:* `paper_a_methodology_v4_section_iii.md` (§III-I.4).
   - *Cited Script:* Script 39e.
   - *Verdict:* VERIFIED.
 5. **Logistic OR 0.053 / 0.010 / 0.027.**
   - *Claim Text:* "logistic regression... yields odds ratios of 0.053 (Firm B), 0.010 (Firm C), and 0.027 (Firm D)"
   - *Location:* `paper_a_methodology_v4_section_iii.md` (§III-L.4) and `paper_a_results_v4_section_iv.md` (Table XXIII).
   - *Cited Script:* Script 44.
   - *Verdict:* VERIFIED.
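
Two of the spot-checked figures can also be cross-checked by plain arithmetic on the values quoted above. The sketch below is illustrative only: it does not rerun Script 40 or Script 44, and the dictionary and function names are assumptions introduced here. It sums the byte-identical split and converts each odds ratio into the "times lower odds" form used in Major finding 1.

```python
# Values are copied from this review's quoted claims (Script 40 and Script 44);
# nothing here is recomputed from the underlying data.
byte_identical = {"Firm A": 145, "Firm B": 8, "Firm C": 107, "Firm D": 2}
odds_ratios = {"Firm B": 0.053, "Firm C": 0.010, "Firm D": 0.027}


def fold_lower_odds(odds_ratio: float) -> float:
    """For an odds ratio below 1, return how many times lower the odds
    are than the reference group's (the reciprocal of the ratio)."""
    return 1.0 / odds_ratio


# The per-firm counts should add up to the 262 signatures stated in the claim.
total_byte_identical = sum(byte_identical.values())

# Reference group is Firm A, as in the quoted logistic regression.
fold = {firm: fold_lower_odds(value) for firm, value in odds_ratios.items()}

print(f"byte-identical total: {total_byte_identical}")
for firm, value in fold.items():
    print(f"{firm}: about {value:.0f}x lower odds than Firm A")
```

The reciprocals express effect size only. The significance claim itself rests on the interval estimates or p-values produced by Script 44, which are not reproduced in this review.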
 ## Newly introduced issues
 1. **Sample size nuance between text and tables:** §IV-J Table XV correctly sums to 150,442 signatures across Firms A-D, consistent with the descriptor-complete count. However, the vector-complete count of 150,453 used in the ICCR analyses (§III-L.2) could confuse readers comparing numbers across tables. A small footnote in Table XV directing readers back to the sample-size reconciliation in §III-G is recommended.
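
To make the size of the discrepancy concrete for the proposed footnote, a minimal sketch using only the two counts quoted above (the variable names are assumptions):

```python
# Counts as quoted in this review; variable names are illustrative.
descriptor_complete = 150_442  # Table XV (§IV-J)
vector_complete = 150_453      # ICCR analyses (§III-L.2)

# Signatures present in the vector-complete set but not the descriptor-complete one.
gap = vector_complete - descriptor_complete
share = gap / vector_complete

print(f"gap: {gap} signatures ({share:.4%} of the vector-complete set)")
```

The gap is small relative to either sample, which is why a footnote pointing to §III-G should be sufficient rather than a change to either table.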
 ## Phase 5 readiness
 Partial. The technical framing is solid and the methodological pivot is successfully integrated. Phase 5 readiness requires stripping the remaining internal draft notes and firmly rejecting any attempt to describe firm heterogeneity as "statistically insignificant."
 ## Recommended next-step actions
 1. **[Empirical Blocker]** Explicitly reject the "statistically insignificant" framing of firm heterogeneity. The odds ratios derived from Script 44 confirm a massive, statistically significant difference between Firm A and Firms B/C/D.
 2. **[Copy-Edit Blocker]** Strip all internal draft notes, metadata tags, and the Phase 4 and Phase 3 close-out checklists from the prose, methodology, and results files.
 3. **[Copy-Edit Blocker]** Finalize Table XV-B's numbering (to Table XIX) to comply with sequential integer numbering formats typically preferred by IEEE Access.
 4. **[Copy-Edit Blocker]** Remove the `[add citation]` placeholder string in the notes; references [42]-[44] are fully integrated and listed in `paper_a_references_v3.md`.
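
The copy-edit blockers above could be checked mechanically before submission. The following is a minimal sketch, assuming the leftover markers look like the strings quoted in this review (`[add citation]`, `internal ... remove before submission`, and the "Notes for Phase 4 close-out" heading). The pattern list and the `find_draft_markers` helper are hypothetical and would need extending to match the actual draft files.

```python
import re

# Patterns are assumptions based on the marker strings quoted in this review.
# \W+ tolerates whichever dash or spacing variant separates the two phrases.
MARKER_PATTERNS = [
    re.compile(r"\[add citation\]"),
    re.compile(r"internal\W+remove before submission"),
    re.compile(r"Notes for Phase 4 close-out"),
]


def find_draft_markers(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, stripped line) for each line that
    matches at least one draft-marker pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in MARKER_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample text standing in for one of the draft files.
sample = "\n".join([
    "## V-G Limitations",
    "Leave-one-out validation follows Stone [42]. [add citation]",
    "<!-- internal - remove before submission -->",
    "Final paragraph of the section.",
])

hits = find_draft_markers(sample)
for number, line in hits:
    print(f"line {number}: {line}")
```

In practice the same function would be applied to the contents of the prose, methodology, and results files, and an empty result would indicate that the listed markers are gone.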