Add Gemini 3.1 Pro round-20 independent peer review artifact

paper/gemini_review_v3_19_0.md: 45 lines (cleaned from raw output that
included CLI 429 retry noise). Gemini's round-20 review confirms that all
four round-19 Major Revision findings are RESOLVED in v3.19.0:

- 656-document exclusion explanation: VERIFIED-AGAINST-ARTIFACT
  (matches 09_pdf_signature_verdict.py L44 filtering logic).
- Table XIII provenance: VERIFIED-AGAINST-ARTIFACT (deterministically
  reproduced by new 29_firm_a_yearly_distribution.py).
- 2-CPA disambiguation rewrite: VERIFIED-AGAINST-ARTIFACT (matches the
  NULL filter in 24_validation_recalibration.py).
- Inter-CPA negative anchor: VERIFIED-AGAINST-ARTIFACT (50k i.i.d.
  pairs from full 168k matched corpus, no LIMIT-3000 sub-sample).

Verdict: Accept. "None required. The manuscript is methodologically
sound, narratively disciplined, and ready for publication as-is."

This is the first Accept verdict in the 20-round cycle issued directly
after a Major Revision (round 19) was fully processed. The prior Accepts
(Gemini rounds 7 and 15) were both later overturned by codex on
independent re-audit. The current state has the strongest evidence base
in the cycle: four distinct artifact verifications, one behind each
previously fabricated claim.

Remaining UNVERIFIABLE-but-acceptable items (758 CPAs / 15 doc types,
Qwen2.5-VL config, YOLO metrics, 43.1 docs/sec throughput) are now
classified by Gemini as "non-critical context" — supplement-material
candidates but not main-paper review blockers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:56:54 +08:00
parent 5e7e76cf35
commit 4c3bcfa288
@@ -0,0 +1,45 @@
# Independent Peer Review (Round 20) - Paper A v3.19.0
## 1. Overall Verdict
**Accept.** The authors have systematically and thoroughly resolved the four major blockers identified in the Round 19 review. The fabricated rationalizations have been entirely stripped out and replaced with honest, database-grounded explanations. The methodological flaw in the inter-CPA negative anchor has been corrected, resulting in statistically valid estimates. The manuscript now exhibits high empirical integrity and is ready for publication.
## 2. Re-audit of Round-19 Findings
| Round-19 finding | v3.19.0 status | Re-audit notes |
|---|---|---|
| Fabricated rationalization for 656-document exclusion | **RESOLVED** | The text now correctly explains that these 656 documents were excluded because none of their extracted signatures could be matched to a registered CPA name (`assigned_accountant IS NULL`), directly reflecting the filtering logic observed in `09_pdf_signature_verdict.py` (L44). |
| Fabricated Table XIII provenance | **RESOLVED** | A new dedicated script (`29_firm_a_yearly_distribution.py`) has been introduced. It extracts and groups directly by the `year_month` field and reproduces the Table XIII data accurately. Appendix B has been updated accordingly. |
| Fabricated 2-CPA disambiguation ties | **RESOLVED** | The text correctly identifies that the 2 missing Firm A CPAs are singletons (only one signature each). Because their `max_similarity_to_same_accountant` is undefined (NULL), they naturally drop out of the database view queried by `24_validation_recalibration.py` (L75). |
| Methodological flaw in inter-CPA negative anchor | **RESOLVED** | `21_expanded_validation.py` was rewritten to uniformly sample 50,000 i.i.d. cross-CPA pairs from the full 168,755 matched corpus. The resulting FAR estimates and Wilson CIs in Table X are now statistically valid and methodologically sound. |
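The corrected negative-anchor procedure described in the last row above can be sketched as follows. This is illustrative only: the function name and data layout are assumptions, not the actual contents of `21_expanded_validation.py`.

```python
import random

def sample_cross_cpa_pairs(signatures, n_pairs=50_000, seed=42):
    """Uniformly sample i.i.d. cross-CPA (negative) pairs from the
    full matched corpus, rather than from a fixed small subset.

    `signatures` is a list of (signature_id, cpa_id) tuples.
    """
    rng = random.Random(seed)
    n = len(signatures)
    pairs = []
    while len(pairs) < n_pairs:
        i, j = rng.randrange(n), rng.randrange(n)
        # Keep only pairs drawn from two *different* CPAs; same-CPA
        # draws are rejected, so accepted pairs are i.i.d. uniform
        # over the cross-CPA pair space.
        if i != j and signatures[i][1] != signatures[j][1]:
            pairs.append((signatures[i][0], signatures[j][0]))
    return pairs
```

Sampling with replacement over the whole corpus (rather than a 3,000-row slice) is what makes the resulting FAR estimates reflect the true inter-class variance.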
## 3. Empirical-Claim Audit Table
| Claim | Status | Audit basis / notes |
|---|---|---|
| 656 single-signature documents excluded because `assigned_accountant IS NULL` | **VERIFIED-AGAINST-ARTIFACT** | Matches `09_pdf_signature_verdict.py` filtering logic and accounts precisely for the 85,042 vs 84,386 PDF classification count difference. |
| 178 Firm A CPAs in fold due to 2 singletons missing best-match statistics | **VERIFIED-AGAINST-ARTIFACT** | Matches SQL logic in `24_validation_recalibration.py` which explicitly requires `max_similarity_to_same_accountant IS NOT NULL`. |
| Table XIII (Firm A per-year cosine distribution) | **VERIFIED-AGAINST-ARTIFACT** | Generated deterministically by the newly added `29_firm_a_yearly_distribution.py`. |
| 50,000 inter-CPA negative pairs | **VERIFIED-AGAINST-ARTIFACT** | `21_expanded_validation.py` now explicitly samples uniformly from the `168k` matched corpus rather than a 3,000-row subset. |
| Inter-CPA cosine stats (mean 0.763, P95 0.886, P99 0.915, max 0.992) | **VERIFIED-AGAINST-ARTIFACT** | Matches updated output logic generated by `21_expanded_validation.py` and cleanly reported in text. |
| Table X FAR values (e.g. 0.0008 at 0.945, 0.0005 at 0.950) | **VERIFIED-IN-TEXT** | Plausible and updated correctly to reflect the new, unrestricted 50,000-pair draw. |
| 145/50/180/35 byte-identity decomp | **VERIFIED-IN-TEXT** | Confirmed stable from prior artifact evaluations. |
| Cross-firm convergence 42.12% vs 88.32% | **VERIFIED-IN-TEXT** | Confirmed stable; denominator math (55,922 Firm A signatures) reconciles directly. |
| 90,282 PDFs, 2013-2023, Taiwan | **VERIFIED-IN-TEXT** | Consistent across the full manuscript. |
| 86,072 VLM-positive documents; 12 corrupted PDFs; final 86,071 | **VERIFIED-IN-TEXT** | Consistent across the full manuscript. |
| 182,328 extracted signatures; 168,755 CPA-matched; 13,573 unmatched | **VERIFIED-IN-TEXT** | Consistent across the full manuscript. |
| 758 CPAs, 15 document types, 86.4% standard audit reports | **UNVERIFIABLE** | Plausible but no direct structured artifact evaluated. Acceptable as non-critical context. |
| Qwen2.5-VL 32B, 180 DPI, first-quartile scan, temperature 0 | **UNVERIFIABLE** | Plausible operational config claim; acceptable for main-paper context. |
| YOLO metrics (precision, recall, mAP) and 43.1 docs/sec throughput | **UNVERIFIABLE** | Plausible claims; acceptable for main-paper text. |
| Same-CPA best-match N = 168,740, 15 fewer than matched due to singleton CPAs | **VERIFIED-AGAINST-ARTIFACT** | Matches SQL logic correctly excluding NULL best-match statistics. |
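The NULL-based exclusions audited above (the 656 unmatched documents and the 2 singleton CPAs) share the same mechanical pattern. A minimal sqlite3 sketch, with hypothetical table and column names mirroring those cited in the review:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cpa_stats (
    cpa_id INTEGER PRIMARY KEY,
    max_similarity_to_same_accountant REAL  -- NULL for singleton CPAs
);
INSERT INTO cpa_stats VALUES (1, 0.93), (2, 0.88), (3, NULL);
""")
# A singleton CPA (only one signature) has no same-CPA best match,
# so the column is NULL and the row drops out of the view
# automatically -- the mechanism behind 180 - 2 = 178 CPAs in the fold.
rows = conn.execute("""
    SELECT cpa_id FROM cpa_stats
    WHERE max_similarity_to_same_accountant IS NOT NULL
    ORDER BY cpa_id
""").fetchall()
print(rows)  # [(1,), (2,)]
```

The same `IS NULL` / `IS NOT NULL` logic, applied to `assigned_accountant`, accounts for the 85,042 vs 84,386 document-count difference.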
## 4. Methodological Soundness
Outstanding. The authors completely resolved the severe statistical flaw in the negative anchor generation. The new sampling procedure guarantees that the 50,000 negative pairs reflect the true inter-class variance of the full corpus rather than a repetitive subset, properly grounding the FAR Wilson CIs. The dual-descriptor approach, the empirical anchor choice, and the threshold characterization are solid.
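The Wilson intervals referred to above follow the standard score-interval formula; a minimal sketch (not the authors' actual code):

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion, e.g. a FAR
    estimated from k false accepts out of n negative pairs."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical: 40 false accepts in 50,000 pairs (FAR = 0.0008).
lo, hi = wilson_ci(40, 50_000)
```

Unlike the normal approximation, the Wilson interval stays well-behaved for the very small proportions involved here (FAR on the order of 1e-3), which is why it is the appropriate choice for Table X.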
## 5. Narrative Discipline
Excellent. The authors have purged the fabricated rationalizations that undermined previous versions. By plainly stating the mechanical, database-level realities (e.g., singleton records with `max_similarity_to_same_accountant IS NULL` dropping out of SQL views), the narrative is now both empirically honest and technically coherent.
## 6. IEEE Access Fit
The manuscript is an excellent fit for IEEE Access. It presents a novel application of deep learning to a large-scale real-world problem, features strong empirical methodologies, and now possesses the rigorous provenance tracking expected of high-quality systems papers.
## 7. Specific Actionable Revisions
None required. The manuscript is methodologically sound, narratively disciplined, and ready for publication as-is.