pdf_signature_extraction/paper/gemini_review_v4_round1.md
gbanyan e33c538162 Add Gemini 3.1 Pro Phase 5 round-1 independent peer review on v4 drafts
Verdict: Minor Revision (corroborates codex round-7).

Convergence with codex: all 4 spot-checked round-26 Major findings
confirmed CLOSED in current drafts; all 5 numerical provenance
spot-checks VERIFIED against named scripts (Spearman 0.879 / S38;
Firm A doc 0.62 / S45; byte-identical 145/8/107/2 / S40; dip
p_median=0.35 / S39e; logistic OR 0.053/0.010/0.027 / S44).

Net-new findings beyond codex round-7:
- Empirical blocker: partner's "statistically insignificant" framing
  of firm heterogeneity (raised 2026-05-13) is explicitly unsupported
  — OR of 0.053/0.010/0.027 means 19x-100x lower odds for B/C/D vs
  Firm A even after pool-size control. Gemini recommends explicit
  rejection in any partner-side response.
- Net-new minor: §IV "Table XV-B" should be renumbered to "Table XIX"
  for IEEE Access sequential-integer style.
- Net-new minor: Table XV (150,442 descriptor-complete) and §III-L.2
  ICCR analyses (150,453 vector-complete) need a footnote pointing
  back to §III-G's sample-size reconciliation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:33:20 +08:00

# Paper A Phase 5 Round 1 — Gemini 3.1 Pro independent review
Reviewer: Gemini 3.1 Pro
Date: 2026-05-14
Target: paper/v4/paper_a_prose_v4_phase4.md + paper/v4/paper_a_methodology_v4_section_iii.md + paper/v4/paper_a_results_v4_section_iv.md
Prior reviewer artifact: paper/codex_review_gpt55_v4_round7.md (codex GPT-5.5, Minor Revision)
## Verdict
Minor Revision. I corroborate codex's overall conclusion of Minor Revision: the central empirical narrative (inter-CPA coincidence-rate calibration, K=3 descriptive demotion, and composition decomposition) is robust and well supported by the methodology and results sections. However, I dissent from codex on two points. First, codex reported references [42]-[44] as absent, but they *are* already present in the reference list. Second, codex did not flag the partner's dangerous suggestion to frame firm heterogeneity as "statistically insignificant." Both points are addressed below.
## Codex round-7 closure cross-check
| # | Spot-checked finding | Verdict | Evidence |
|---|---|---|---|
| 1 | Replace "candidate classifiers" with "candidate checks" | CLOSED | `paper_a_prose_v4_phase4.md` explicitly uses "candidate checks" (e.g., "all three candidate checks — the inherited box rule..."); `paper_a_methodology_v4_section_iii.md` §III-K.4 uses "candidate check's positive-anchor miss rate". |
| 2 | §II LOOO paragraph with refs [42]-[44] | CLOSED | `paper_a_prose_v4_phase4.md` §II contains a fully drafted addition citing Stone [42], Geisser [43], and Vehtari et al. [44]. |
| 3 | Restore inherited v3.20.0 limitations in §V-G | CLOSED | `paper_a_prose_v4_phase4.md` §V-G lists "The last five are inherited from v3.20.0 §V-G..." and explicitly covers ImageNet features, red-stamp HSV, longitudinal confounds, source-exemplar misattribution, and legal interpretation. |
| 4 | Limit full-dataset claims to K=3 + Spearman re-run | CLOSED | `paper_a_prose_v4_phase4.md` §V-G clarifies that full-dataset claims are limited: "We did not perform the full per-signature pool-normalised ICCR analysis at the full n = 686 scope; the §IV-K full-dataset Spearman re-run shows the K=3 + box-rule rank-convergence is preserved". |
## Major findings
1. **[Partner Query / Framing Risk] "Statistically insignificant" firm heterogeneity framing is unsupported.** (Codex missed) The partner queried whether firm heterogeneity could be framed as "statistically insignificant." This framing is unsupported and must be explicitly rejected. In `paper_a_results_v4_section_iv.md` §IV-M.5 (Table XXIII) and `paper_a_methodology_v4_section_iii.md` §III-L.4, the logistic regression odds ratios for Firms B/C/D versus Firm A are 0.053, 0.010, and 0.027. Taking reciprocals (1/0.053 ≈ 19, 1/0.010 = 100, 1/0.027 ≈ 37), Firms B/C/D have roughly 19x to 100x *lower* odds than Firm A of firing the HC hit indicator, even after controlling for pool size. This is a difference of one to two orders of magnitude and is both statistically and practically significant.
2. **[Codex Disagree] Codex incorrectly claimed refs [42]-[44] were absent.** Codex's round-7 review claimed that references [42]-[44] remained placeholders and were absent from `paper_a_references_v3.md`. This is incorrect: the reference file contains these three citations at the end of the list. The only residual issue is the draft note in the Phase 4 close-out section (line ~145) containing `[add citation]`, which simply needs to be deleted.
3. **[Disclaimer Adequacy] Unsupervised limits effectively disclosed.** The text properly disclaims the limits of its unsupervised setting. Replacing "FAR" with "ICCR" reflects the structural fact that inter-CPA collision acts as a specificity proxy, and §III-L.4 fully acknowledges the "within-firm template-like collision" caveat.
4. **[K=3 Demotion Language] Consistent descriptive framing.** The language properly frames the K=2 and K=3 mixtures as firm-compositional partitions rather than inferential evidence for discrete mechanisms, correctly demoting K=3's operational standing based on the dip-test decomposition.
5. **[Feature-Derived Scores] Caveat phrasing.** §III-K.1 and §V-E clearly caveat the high Spearman correlations ($\rho \ge 0.879$): the scores are "not statistically independent measurements" because they are deterministic functions of the same descriptor pair, so the agreement is framed as internal consistency rather than external validation.
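The "19x to 100x lower odds" reading in Major finding 1 is the reciprocal of each reported odds ratio. The following is a minimal sketch of that arithmetic, assuming only the three odds ratios quoted in this review; the function name `fold_reduction` is illustrative and does not come from Script 44.

```python
# Odds ratios for Firms B/C/D versus Firm A, as quoted in this review.
odds_ratios = {"Firm B": 0.053, "Firm C": 0.010, "Firm D": 0.027}


def fold_reduction(odds_ratio: float) -> float:
    """Return how many times lower the odds are relative to the reference firm."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return 1.0 / odds_ratio


# Hypothetical helper output: firm name -> fold reduction in odds.
fold = {firm: fold_reduction(value) for firm, value in odds_ratios.items()}

for firm, value in fold.items():
    print(f"{firm}: OR = {odds_ratios[firm]:.3f}, about {value:.0f}x lower odds than Firm A")
```

The reciprocal describes effect size only. A formal significance statement would additionally rest on the confidence intervals or p-values reported with the fitted model.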
## Minor findings
1. **[m1] Stale internal draft notes and checklists.** The Phase 4 prose (`paper_a_prose_v4_phase4.md`) still contains internal draft notes at the top and the "Notes for Phase 4 close-out" at the bottom. The Section III and Section IV files also contain similar `internal — remove before submission` blocks. These must be stripped.
2. **[m2] Table numbering clash (XV-B vs XIX).** In `paper_a_results_v4_section_iv.md` §IV-J, a note acknowledges Table XV-B might need to be renumbered to Table XIX depending on journal style. This should be finalized to ensure sequential integer numbering (preferring Table XIX to avoid "B" suffixes).
3. **[m3] Word count note.** The abstract word count note in the close-out checklist should be removed, as the abstract is independently verified at 243 words, satisfying the $\le 250$ requirement.
## Provenance spot-checks
1. **Spearman $\rho \ge 0.879$ floor.**
- *Claim Text:* "Three feature-derived scores agree on the per-CPA descriptor-position ranking at Spearman $\rho \ge 0.879$"
- *Location:* `paper_a_prose_v4_phase4.md` (Abstract & §V-E) and `paper_a_methodology_v4_section_iii.md` (§III-K.1).
- *Cited Script:* Script 38.
- *Verdict:* VERIFIED.
2. **Firm A per-doc HC+MC alarm 0.62.**
- *Claim Text:* "Firm A's per-document HC+MC alarm rate is 0.62 versus 0.09–0.16 at Firms B/C/D"
- *Location:* `paper_a_prose_v4_phase4.md` (Abstract & §V-C).
- *Cited Script:* Script 45.
- *Verdict:* VERIFIED.
3. **145/8/107/2 byte-identical split.**
- *Claim Text:* "262 byte-identical signatures in the Big-4 subset (Firm A 145, Firm B 8, Firm C 107, Firm D 2)"
- *Location:* `paper_a_methodology_v4_section_iii.md` (§III-K.4).
- *Cited Script:* Script 40.
- *Verdict:* VERIFIED.
4. **Dip test $p_{\text{median}} = 0.35$ under joint centring + jitter.**
- *Claim Text:* "Once both confounds are removed (firm-mean centring plus uniform integer jitter), the Big-4 pooled dHash dip test yields $p_{\text{median}} = 0.35$"
- *Location:* `paper_a_methodology_v4_section_iii.md` (§III-I.4).
- *Cited Script:* Script 39e.
- *Verdict:* VERIFIED.
5. **Logistic OR 0.053 / 0.010 / 0.027.**
- *Claim Text:* "logistic regression... yields odds ratios of 0.053 (Firm B), 0.010 (Firm C), and 0.027 (Firm D)"
- *Location:* `paper_a_methodology_v4_section_iii.md` (§III-L.4) and `paper_a_results_v4_section_iv.md` (Table XXIII).
- *Cited Script:* Script 44.
- *Verdict:* VERIFIED.
## Newly introduced issues
1. **Sample size nuance between text and tables:** §IV-J Table XV correctly sums to 150,442 signatures across Firms A-D, consistent with the descriptor-complete count. However, the vector-complete count of 150,453 used in the ICCR analyses (§III-L.2) could confuse readers comparing numbers across tables. A small footnote in Table XV directing readers back to the sample-size reconciliation in §III-G is recommended.
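To make the size of the discrepancy concrete, the sketch below computes the gap between the two counts quoted above. The variable names are illustrative; the cause of the gap is whatever §III-G's reconciliation states and is not inferred here.

```python
# Counts as quoted in this review.
descriptor_complete = 150_442  # Table XV (§IV-J)
vector_complete = 150_453      # ICCR analyses (§III-L.2)

# Absolute gap and its share of the larger (vector-complete) set.
gap = vector_complete - descriptor_complete
share = gap / vector_complete

print(f"Gap: {gap} signatures ({share:.4%} of the vector-complete set)")
```

The gap is small relative to either total, which is why a footnote pointing to §III-G should suffice rather than a change to the tables.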
## Phase 5 readiness
Partial. The technical framing is solid and the methodological pivot is successfully integrated. Phase 5 readiness requires stripping the remaining internal draft notes and firmly rejecting any attempt to describe firm heterogeneity as "statistically insignificant."
## Recommended next-step actions
1. **[Empirical Blocker]** Explicitly reject the "statistically insignificant" framing of firm heterogeneity. The odds ratios derived from Script 44 confirm a massive, statistically significant difference between Firm A and Firms B/C/D.
2. **[Copy-Edit Blocker]** Strip all internal draft notes, metadata tags, and the Phase 4 and Phase 3 close-out checklists from the prose, methodology, and results files.
3. **[Copy-Edit Blocker]** Finalize Table XV-B's numbering (to Table XIX) to comply with sequential integer numbering formats typically preferred by IEEE Access.
4. **[Copy-Edit Blocker]** Remove the `[add citation]` placeholder string in the notes; references [42]-[44] are fully integrated and listed in `paper_a_references_v3.md`.
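The copy-edit blockers above could be checked mechanically before submission. The following is a rough sketch, assuming the leftover notes can be found by the two marker strings quoted in this review; the helper names `find_markers` and `scan_files` are hypothetical, and the marker list would need extending for any other internal tags.

```python
from pathlib import Path

# Marker substrings taken from this review; matching is case-insensitive.
MARKERS = ["[add citation]", "remove before submission"]


def find_markers(text, markers=MARKERS):
    """Return (line_number, marker, stripped_line) for each line containing a marker."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for marker in markers:
            if marker.lower() in line.lower():
                hits.append((lineno, marker, line.strip()))
    return hits


def scan_files(paths):
    """Map each existing file path to its marker hits, skipping clean or missing files."""
    report = {}
    for path in paths:
        p = Path(path)
        if p.is_file():
            hits = find_markers(p.read_text(encoding="utf-8"))
            if hits:
                report[str(p)] = hits
    return report


# Small in-memory demonstration on made-up text (not taken from the drafts).
sample = "Intro paragraph.\nLOOO background [add citation]\nClean line."
for lineno, marker, line in find_markers(sample):
    print(f"line {lineno}: found {marker!r} in: {line}")
```

Calling `scan_files` on the three target drafts listed at the top of this review would give a per-file list of lines to strip. An empty result indicates only that these two markers are gone, not that every internal note has been removed.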