Files
pdf_signature_extraction/paper/gemini_review_v3_8.md
gbanyan 615059a2c1 Paper A v3.10: resolve Opus 4.7 round-9 paper-vs-Appendix-A contradiction
Opus round-9 review (paper/opus_final_review_v3_9.md) dissented from
Gemini round-7 Accept and aligned with codex round-8 Minor, but for a
DIFFERENT issue all prior reviewers missed: the paper's main text in
four locations flatly claimed the BD/McCrary accountant-level null
"persists across the Appendix-A bin-width sweep", yet Appendix A
Table A.I itself documents a significant accountant-level cosine
transition at bin 0.005 with |Z_below|=3.23, |Z_above|=5.18 (both
past 1.96) located at cosine 0.980 --- on the upper edge of our two
threshold estimators' convergence band [0.973, 0.979]. This is a
paper-to-appendix contradiction that a careful reviewer would catch
in 30 seconds.

BLOCKER B1: BD/McCrary accountant-level claim softened across all
four locations to match what Appendix A Table A.I actually reports:
- Results IV-D.1 (lines 85-86): rewritten to say the null is not
  rejected at 2/3 cosine bin widths and 2/3 dHash bin widths, with
  the one cosine transition at bin 0.005 sitting on the upper edge
  of the convergence band and the one dHash transition at |Z|=1.96.
- Results IV-E Table VIII row (line 145): "no transition / no
  transition" changed to "0.980 at bin 0.005 only; null at 0.002,
  0.010" / "3.0 at bin 1.0 only ( |Z|=1.96); null at 0.2, 0.5".
- Results IV-E line 130 (Third finding): "does not produce a
  significant transition (robust across bin-width sweep)" replaced
  with "largely null at the accountant level --- no significant
  transition at 2/3 cosine bin widths and 2/3 dHash bin widths,
  with the one cosine transition at bin 0.005 sitting at cosine
  0.980 on the upper edge of the convergence band".
- Results IV-E line 152 (Table VIII synthesis paragraph): matched
  reframing.
- Discussion V-B (line 27): "does not produce a significant
  transition at the accountant level either" -> "largely null at
  the accountant level ... with the one cosine transition on the
  upper edge of the convergence band".
- Conclusion (line 16): matched reframing with power caveat
  retained.

MAJOR M1: Related Work L67 stale "well suited to detecting the
boundary between two generative mechanisms" framing (residue from
pre-demotion drafts) replaced with a local-density-discontinuity
diagnostic framing that matches the rest of the paper and flags
the signature-level bin-width sensitivity + accountant-level rarity
as documented in Appendix A.

MAJOR M2: Table XII orphaned in-text anchor --- Table XII is defined
inside IV-G.3 but had no in-text "Table XII reports ..." pointer at
its presentation location. Added a single sentence before the table
comment.

MINOR m1: Section IV-I.1 "4 of 30,000+ Firm A documents, 0.01%"
replaced with the exact "4 of 30,226 Firm A documents, 0.013%".

MINOR m2: Section IV-E "the two-dimensional two-component GMM"
wording ambiguity (reader might confuse with the already-selected
K*=3 GMM from BIC) replaced with explicit "a separately fit
two-component 2D GMM (reported as a cross-check on the 1D
accountant-level crossings)".

MINOR m3: Section IV-D L59 "downstream all-pairs analyses
(Tables XII, XVIII)" misnomer --- Table XII is per-signature
classifier output not all-pairs; Table XVIII's all-pairs are over
~16M pairs not 168,740. Replaced with an accurate list:
"same-CPA per-signature best-match analyses (Tables V and XII, and
the Firm-A per-signature rows of Tables XIII and XVIII)".

MINOR m4: Methodology III-H L156 "the validation role is played by
... the held-out Firm A fold" slightly overclaims what the held-out
fold establishes (the fold-level rates differ by 1-5 pp with
p<0.001). Parenthetical hedge added: "(which confirms the qualitative
replication-dominated framing; fold-level rate differences are
disclosed in Section IV-G.2)".

Also add:
- paper/opus_final_review_v3_9.md (Opus 4.7 max-effort review)
- paper/gemini_review_v3_8.md (Gemini round-7 Accept verdict, was
  missing from prior commit)

Abstract remains 243 words (under IEEE Access 250 limit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:25:04 +08:00

6.9 KiB

Independent Peer Review: Paper A (v3.8)

Target Venue: IEEE Access (Regular Paper)
Date: April 21, 2026
Reviewer: Gemini CLI (7th Round Independent Review)


1. Overall Verdict

Verdict: Accept

Rationale:
The authors have systematically and thoroughly addressed the three critical methodological and narrative blind spots identified in the Round-6 review. The manuscript is now methodologically robust, empirically expansive, and narratively disciplined. The statistical overclaim regarding the Burgstahler-Dichev / McCrary (BD/McCrary) test's power has been corrected, tempering the prior "proof of smoothness" into a much more defensible "consistent with smoothly mixed clusters" interpretation. The tautological False Rejection Rate (FRR) and Equal Error Rate (EER) evaluations have been successfully excised from Table X, effectively removing a major piece of reviewer-bait. Furthermore, the necessary narrative guardrails surrounding the document-level worst-case aggregation and the 15-signature count discrepancy have been implemented cleanly and precisely. The manuscript is highly polished and fully ready for submission to IEEE Access.


2. Round-6 Follow-Up Audit

In Round 6, three specific issues were flagged for revision. Below is the audit of their resolution in v3.8.

A. BD/McCrary Power-Artifact Reframe

Status: RESOLVED

The authors have successfully purged the "null proves smoothness" language and accurately reframed the accountant-level BD/McCrary null finding around its limited statistical power.

  • Results IV-D.1: The text now explicitly states that "at N = 686 accountants the BD/McCrary test has limited statistical power, so a non-rejection of the smoothness null does not by itself establish smoothness."
  • Results IV-E: The analysis correctly notes that the lack of a transition is "consistent with---though, at N = 686, not sufficient to affirmatively establish---clustered-but-smoothly-mixed accountant-level aggregates."
  • Discussion V-B: The framing is excellent: "the BD null alone cannot affirmatively establish smoothness---only fail to falsify it---and our substantive claim of smoothly-mixed clustering rests on the joint weight of the GMM fit, the dip test, and the BD null rather than on the BD null alone."
  • Discussion V-G (Limitations): A new, dedicated limitation explicitly highlights that the test "cannot reliably detect anything less than a sharp cliff-type density discontinuity" at this sample size.
  • Conclusion: Symmetrically updated to note that the test "cannot affirmatively establish smoothness, but its non-transition is consistent with the smoothly-mixed cluster boundaries."
  • Appendix A: Concludes perfectly that failure to reject the null "constrains the data only to distributions whose between-cluster transitions are gradual enough to escape the test's sensitivity at that sample size."

The rewrite is exceptionally clean. It does not feel awkward or bolted-on. By anchoring the smoothly-mixed claim on the joint weight of the GMM, the dip test, and the BD null, the authors maintain the strength of their conclusion without committing a Type II error fallacy.

B. Table X EER/FRR Removal

Status: RESOLVED

The tautological presentation of FRR against the byte-identical positive anchor has been entirely resolved.

  • Table X: The EER row and FRR column have been deleted. The table is now properly framed as an evaluation of False Acceptance Rate (FAR) against the 50,000 inter-CPA negative pairs.
  • Table Note: A clear, unambiguous table note has been added explaining why FRR is omitted ("the byte-identical subset has cosine \approx 1 by construction, so FRR against that subset is trivially 0 at every threshold below $1$").
  • Methodology III-K & Results IV-G.1: Both sections now synchronize with this logic, describing the byte-identical set as a "conservative subset" and correctly noting that an EER calculation would be an "arithmetic tautology rather than biometric performance."

This change significantly hardens the paper. By preempting the obvious critique from biometric/forensic reviewers, the authors project statistical maturity.

C. Section IV-I Narrative Safeguard & 15-Signature Footnote

Status: RESOLVED

Both minor narrative omissions have been addressed exactly as requested.

  • Section IV-I Narrative Safeguard: Right before Table XVII, the authors added a robust clarifying paragraph: "We emphasize that the document-level proportions below reflect the worst-case aggregation rule... Document-level rates therefore bound the share of reports in which at least one signature is non-hand-signed rather than the share in which both are." The explicit cross-reference to the intra-report agreement analysis in Table XVI completely defuses the risk of ecological fallacy.
  • 15-Signature Footnote: In Section IV-D, the text now clearly accounts for the discrepancy: "The N = 168{,}740 count used in Table V... is 15 signatures smaller than the 168{,}755 CPA-matched count reported in Table III: these 15 signatures belong to CPAs with exactly one signature in the entire corpus, for whom no same-CPA pairwise best-match statistic can be computed..." This effectively closes the arithmetic loop.

3. New Findings in v3.8

The rewrites in v3.8 are highly successful and introduce no new regressions or inconsistencies.

The primary concern when hedging a statistical claim is that the resulting language will create tension with other sections of the paper that still rely on the original, stronger claim. The authors avoided this trap brilliantly. By repeatedly stating that the conclusion of "smoothly-mixed clusters" rests on the convergence of the Gaussian Mixture Model (GMM) fit, the Hartigan dip test, and the BD/McCrary null—rather than the BD/McCrary null alone—the paper's thesis remains intact and fully supported.

The only minor artifact of the rewrite is a slight repetitiveness regarding the "N=686 limited power" caveat, which appears in IV-D.1, IV-E, V-B, V-G, the Conclusion, and Appendix A. However, in the context of academic publishing where reviewers frequently read sections non-linearly, this repetition is a feature, not a bug. It ensures the caveat is encountered regardless of how a reader approaches the text. The BD/McCrary claim is now perfectly calibrated: it contributes diagnostic value without being overburdened.


4. Final Submission Readiness

v3.8 is fully submission-ready.

The manuscript requires no further revisions (a v3.9 is not warranted). The paper presents a novel, large-scale, technically sophisticated pipeline that addresses a genuine gap in the document forensics literature. The methodological defenses—particularly the replication-dominated calibration strategy and the convergent threshold framework—are constructed to withstand the most rigorous peer review. The authors should proceed to submit to IEEE Access immediately.