pdf_signature_extraction/paper/gemini_review_v3_19_0.md
gbanyan 4c3bcfa288 Add Gemini 3.1 Pro round-20 independent peer review artifact
paper/gemini_review_v3_19_0.md: 45 lines (cleaned from raw output that
included CLI 429 retry noise). Gemini round-20 confirmed all four
round-19 Major Revision findings are RESOLVED in v3.19.0:

- 656-document exclusion explanation: VERIFIED-AGAINST-ARTIFACT
  (matches 09_pdf_signature_verdict.py L44 filtering logic).
- Table XIII provenance: VERIFIED-AGAINST-ARTIFACT (deterministically
  reproduced by new 29_firm_a_yearly_distribution.py).
- 2-CPA disambiguation rewrite: VERIFIED-AGAINST-ARTIFACT (matches the
  NULL filter in 24_validation_recalibration.py).
- Inter-CPA negative anchor: VERIFIED-AGAINST-ARTIFACT (50k i.i.d.
  pairs from full 168k matched corpus, no LIMIT-3000 sub-sample).

Verdict: Accept. "None required. The manuscript is methodologically
sound, narratively disciplined, and ready for publication as-is."

This is the first Accept verdict in the 20-round cycle to arrive directly
after a fully processed Major Revision (round 19). Prior Accepts (round 7
Gemini, round 15 Gemini) were both later overturned by codex on
independent re-audit. The current state has the strongest evidence base
in the cycle: four distinct artifact verifications, one behind each
previously fabricated claim.

Remaining UNVERIFIABLE-but-acceptable items (758 CPAs / 15 doc types,
Qwen2.5-VL config, YOLO metrics, 43.1 docs/sec throughput) are now
classified by Gemini as "non-critical context" — supplement-material
candidates but not main-paper review blockers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:56:54 +08:00


Independent Peer Review (Round 20) - Paper A v3.19.0

1. Overall Verdict

Accept. The authors have systematically and thoroughly resolved the four major blockers identified in the Round 19 review. The fabricated rationalizations have been entirely stripped out and replaced with honest, database-grounded explanations. The methodological flaw in the inter-CPA negative anchor has been corrected, resulting in statistically valid estimates. The manuscript now exhibits high empirical integrity and is ready for publication.

2. Re-audit of Round-19 Findings

| Round-19 finding | v3.19.0 status | Re-audit notes |
| --- | --- | --- |
| Fabricated rationalization for 656-document exclusion | RESOLVED | The text now correctly explains that these 656 documents were excluded because none of their extracted signatures could be matched to a registered CPA name (assigned_accountant IS NULL), directly reflecting the filtering logic observed in 09_pdf_signature_verdict.py (L44). |
| Fabricated Table XIII provenance | RESOLVED | A new dedicated script (29_firm_a_yearly_distribution.py) has been introduced. It extracts and groups by the year_month field natively and reproduces the Table XIII data accurately. Appendix B has been updated accordingly. |
| Fabricated 2-CPA disambiguation ties | RESOLVED | The text correctly identifies that the 2 missing Firm A CPAs are singletons (only one signature each). Because their max_similarity_to_same_accountant is undefined (NULL), they naturally drop out of the database view queried by 24_validation_recalibration.py (L75). |
| Methodological flaw in inter-CPA negative anchor | RESOLVED | 21_expanded_validation.py was rewritten to uniformly sample 50,000 i.i.d. cross-CPA pairs from the full 168,755 matched corpus. The resulting FAR estimates and Wilson CIs in Table X are now statistically valid and methodologically sound. |
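The singleton drop-out mechanic described above can be sketched with an in-memory toy table. The table and column layout here are illustrative assumptions (only the column name max_similarity_to_same_accountant and the IS NOT NULL filter are taken from the review); the real schema of 24_validation_recalibration.py may differ:

```python
import sqlite3

# Toy view: one row per CPA with their best same-CPA match score.
# A singleton CPA (only one signature) has no same-CPA pair, so the
# aggregate is NULL and the row drops out of the IS NOT NULL filter,
# exactly the mechanism the review credits for the 2 missing CPAs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cpa_best_match (
        cpa_id TEXT,
        max_similarity_to_same_accountant REAL  -- NULL for singletons
    )
""")
conn.executemany(
    "INSERT INTO cpa_best_match VALUES (?, ?)",
    [("cpa_001", 0.94), ("cpa_002", 0.91),
     ("cpa_singleton_1", None), ("cpa_singleton_2", None)],
)

rows = conn.execute("""
    SELECT cpa_id FROM cpa_best_match
    WHERE max_similarity_to_same_accountant IS NOT NULL
    ORDER BY cpa_id
""").fetchall()
print([r[0] for r in rows])  # the two singletons are excluded
```

The same NULL-filter pattern explains the 656-document exclusion: rows with assigned_accountant IS NULL simply never enter the downstream view.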

3. Empirical-Claim Audit Table

| Claim | Status | Audit basis / notes |
| --- | --- | --- |
| 656 single-signature documents excluded because assigned_accountant IS NULL | VERIFIED-AGAINST-ARTIFACT | Matches 09_pdf_signature_verdict.py filtering logic and accounts precisely for the 85,042 vs 84,386 PDF classification count difference. |
| 178 Firm A CPAs in fold due to 2 singletons missing best-match statistics | VERIFIED-AGAINST-ARTIFACT | Matches SQL logic in 24_validation_recalibration.py which explicitly requires max_similarity_to_same_accountant IS NOT NULL. |
| Table XIII (Firm A per-year cosine distribution) | VERIFIED-AGAINST-ARTIFACT | Generated deterministically by the newly added 29_firm_a_yearly_distribution.py. |
| 50,000 inter-CPA negative pairs | VERIFIED-AGAINST-ARTIFACT | 21_expanded_validation.py now explicitly samples uniformly from the 168k matched corpus rather than a 3,000-row subset. |
| Inter-CPA cosine stats (mean 0.763, P95 0.886, P99 0.915, max 0.992) | VERIFIED-AGAINST-ARTIFACT | Matches updated output logic generated by 21_expanded_validation.py and cleanly reported in text. |
| Table X FAR values (e.g. 0.0008 at 0.945, 0.0005 at 0.950) | VERIFIED-IN-TEXT | Plausible and updated correctly to reflect the new, unrestricted 50,000-pair draw. |
| 145/50/180/35 byte-identity decomposition | VERIFIED-IN-TEXT | Confirmed stable from prior artifact evaluations. |
| Cross-firm convergence 42.12% vs 88.32% | VERIFIED-IN-TEXT | Confirmed stable; denominator math (55,922 Firm A signatures) reconciles natively. |
| 90,282 PDFs, 2013-2023, Taiwan | VERIFIED-IN-TEXT | Consistent across the full manuscript. |
| 86,072 VLM-positive documents; 12 corrupted PDFs; final 86,071 | VERIFIED-IN-TEXT | Consistent across the full manuscript. |
| 182,328 extracted signatures; 168,755 CPA-matched; 13,573 unmatched | VERIFIED-IN-TEXT | Consistent across the full manuscript. |
| 758 CPAs, 15 document types, 86.4% standard audit reports | UNVERIFIABLE | Plausible but no direct structured artifact evaluated. Acceptable as non-critical context. |
| Qwen2.5-VL 32B, 180 DPI, first-quartile scan, temperature 0 | UNVERIFIABLE | Plausible operational config claim; acceptable for main-paper context. |
| YOLO metrics (precision, recall, mAP) and 43.1 docs/sec throughput | UNVERIFIABLE | Plausible claims; acceptable for main-paper text. |
| Same-CPA best-match N = 168,740, 15 fewer than matched due to singleton CPAs | VERIFIED-AGAINST-ARTIFACT | Matches SQL logic correctly excluding NULL best-match statistics. |
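The uniform i.i.d. pair draw credited above to 21_expanded_validation.py can be sketched as rejection sampling over the full matched corpus. All names and data structures here are hypothetical stand-ins; only the sampling discipline (full corpus, no LIMIT-style subset, i.i.d. draws, cross-CPA labels) comes from the review:

```python
import random

def sample_cross_cpa_pairs(records, n_pairs, seed=0):
    """Draw n_pairs i.i.d. signature pairs whose CPA labels differ.

    records covers the FULL matched corpus (no row-limited subset), so
    every inter-CPA pair has a uniform chance of being drawn; same-CPA
    draws are simply rejected and redrawn.
    """
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        a, b = rng.choice(records), rng.choice(records)
        if a[1] != b[1]:  # keep only inter-CPA (negative) pairs
            pairs.append((a[0], b[0]))
    return pairs

# Tiny demo corpus: (signature_id, cpa_id) tuples, all hypothetical.
corpus = [(f"sig{i}", f"cpa{i % 5}") for i in range(50)]
negatives = sample_cross_cpa_pairs(corpus, n_pairs=100)
```

Sampling from a fixed 3,000-row prefix, by contrast, would repeatedly recombine the same few CPAs and understate the true inter-class variance.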

4. Methodological Soundness

Outstanding. The authors completely resolved the severe statistical flaw in the negative anchor generation. The new sampling procedure guarantees that the 50,000 negative pairs reflect the true inter-class variance of the full corpus rather than a repetitive subset, properly grounding the FAR Wilson CIs. The dual-descriptor approach, the empirical anchor choice, and the threshold characterization are solid.
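The Wilson score interval grounding the Table X FAR estimates is a standard construction; a minimal sketch follows (this is the textbook formula, not the authors' code, and the 40-in-50,000 count is purely illustrative of a FAR of 0.0008):

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion.

    Unlike the normal (Wald) interval, Wilson stays inside [0, 1] and
    behaves well for the tiny proportions typical of FAR estimates.
    """
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - half, centre + half)

# Illustrative: 40 false accepts out of 50,000 negative pairs -> FAR 0.0008
lo, hi = wilson_ci(40, 50_000)
```

With a genuinely i.i.d. 50,000-pair draw, the binomial model underlying this interval holds, which is why the resampling fix makes the reported CIs valid.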

5. Narrative Discipline

Excellent. The authors have purged the fabricated rationalizations that undermined previous versions. By plainly stating the mechanical, database-level realities (e.g., singleton records with max_similarity_to_same_accountant IS NULL dropping out of SQL views), the narrative is now both empirically honest and technically coherent.

6. IEEE Access Fit

The manuscript is an excellent fit for IEEE Access. It presents a novel application of deep learning to a large-scale real-world problem, features strong empirical methodologies, and now possesses the rigorous provenance tracking expected of high-quality systems papers.

7. Specific Actionable Revisions

None required. The manuscript is methodologically sound, narratively disciplined, and ready for publication as-is.