gbanyan 9b0b8358a2 Paper A v3.12: resolve Gemini 3.1 Pro round-11 full-paper review findings
Round-11 Gemini 3.1 Pro fresh full-paper review (Minor Revision)
surfaced four issues that the prior 10 rounds (codex gpt-5.4 x4, codex
gpt-5.5 x1, Gemini 3.1 Pro x2, Opus 4.7 x1, paragraph-level v3.11
review) all missed:

1. MAJOR - Percentile-terminology contradiction between Section III-L
   L290 and Section III-H L160. III-L called 0.95 the "whole-sample
   Firm A P95" of the per-signature best-match cosine distribution,
   but III-H states 92.5% of Firm A signatures exceed 0.95. Under
   standard bottom-up percentile convention this makes 0.95 the P7.5,
   not the P95; Table XI calibration-fold data (Firm A cosine
   median = 0.9862, P5 = 0.9407) confirms true P95 is near 0.998.
   Fix: rewrote III-L L290 to state 0.95 corresponds to approximately
   the whole-sample Firm A P7.5 with the 92.5%/7.5% complement stated
   explicitly. dHash P95 claims elsewhere (Table XI, L229/L233) were
   already correct under standard convention and are unchanged.

2. MINOR - Firm A CPA count inconsistency. Discussion V-C L44 said
   "Nine additional Firm A CPAs are excluded from the GMM for having
   fewer than 10 signatures" but Results IV-G.2 L216 defines 178
   valid Firm A CPAs (180 registry minus 2 disambiguation-excluded);
   178 - 171 = 7. Fix: corrected to "seven are outside the GMM" with
   explicit 178-baseline and cross-reference to IV-G.2.

3. MINOR - Table XVI mixed-firm handling broken promise. Results
   L355-356 previously said "mixed-firm reports are reported
   separately" but Table XVI only lists single-firm rows summing to
   exactly 83,970, and no subsequent prose reports the 384 mixed-firm
   agreement rate. Fix: rewrote L355-356 to state Table XVI covers
   the 83,970 single-firm reports only and that the 384 mixed-firm
   reports (0.46%) are excluded because firm-level agreement is not
   well defined when the two signers are at different firms.

4. MINOR - Contribution-count structural inconsistency. The Introduction
   enumerates seven contributions, but the Conclusion opens with "Our
   contributions are fourfold." Fix: rewrote the Conclusion lead to
   "The seven numbered contributions listed in Section I can be
   grouped into four broader methodological themes," making the
   grouping explicit.
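
The arithmetic behind findings 1 to 3 can be re-checked with a short sketch. The `scores` list is a synthetic stand-in built only to reproduce the stated 92.5%/7.5% split; it is not the paper's data, and `percentile_rank` is a hypothetical helper, not a function from this repository.

```python
def percentile_rank(values, threshold):
    """Percentage of values at or below `threshold` (bottom-up convention)."""
    values = list(values)
    return 100.0 * sum(v <= threshold for v in values) / len(values)

# Synthetic stand-in for the Firm A cosine scores (NOT the paper's data):
# 75 of 1,000 scores sit at or below 0.95, and 925 exceed it.
scores = [0.94] * 75 + [0.99] * 925

rank = percentile_rank(scores, 0.95)    # 7.5, so 0.95 is the P7.5, not the P95
share_above = 100.0 - rank              # 92.5, the complement stated in III-H

# Finding 2: 180 registry CPAs minus 2 disambiguation exclusions, minus the
# 171 implied by the commit message's own "178 - 171 = 7".
valid_firm_a_cpas = 180 - 2
outside_gmm = valid_firm_a_cpas - 171

# Finding 3: share of mixed-firm reports among all reports.
mixed_firm_pct = 100.0 * 384 / (83_970 + 384)

print(rank, share_above, outside_gmm, round(mixed_firm_pct, 2))
```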

No re-computation. All tables (IV-XVIII) and Appendix A numbers
unchanged. Abstract unchanged (still 248/250 words).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:10:20 +08:00

PDF Signature Extraction System

Automated extraction of handwritten Chinese signatures from PDF documents using a hybrid VLM + computer vision approach.

Quick Start

Step 1: Extract Pages from CSV

cd /Volumes/NV2/pdf_recognize
source venv/bin/activate
python extract_pages_from_csv.py

Step 2: Extract Signatures

python extract_signatures_hybrid.py

Documentation

Current Performance

Test Dataset: 5 PDF pages

  • Signatures expected: 10
  • Signatures found: 7
  • Precision: 100% (no false positives)
  • Recall: 70%
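
The two percentages follow directly from the counts above. A minimal sketch, assuming all 7 signatures found are correct (which is what "no false positives" implies); `precision_recall` is a hypothetical helper, not part of the repository scripts.

```python
def precision_recall(true_positives, false_positives, expected):
    """Return (precision, recall) from raw detection counts."""
    found = true_positives + false_positives
    precision = true_positives / found if found else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall

# Counts from the 5-page test: 7 found, 0 false positives, 10 expected.
precision, recall = precision_recall(true_positives=7, false_positives=0, expected=10)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

With only 10 expected signatures, each missed signature moves recall by 10 percentage points, so these figures are best read as a smoke test rather than a stable estimate.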

Key Features

  • Hybrid Approach: VLM name extraction + CV detection + VLM verification
  • Name-Based: Signatures saved as signature_周寶蓮.png
  • No False Positives: Name-specific verification filters out dates, text, stamps
  • Duplicate Prevention: Only one signature per person
  • Handles Both: PDFs with/without text layer
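
The name-based naming and one-signature-per-person rules could look roughly like the sketch below. This is an illustration under assumptions, not the code in extract_signatures_hybrid.py: `signature_path`, `select_one_per_person`, the `(name, region, verified)` tuple shape, and the `region-*` labels are all hypothetical.

```python
from pathlib import Path

def signature_path(output_dir, name):
    """Build the output file path for a signer, e.g. signature_周寶蓮.png."""
    return Path(output_dir) / f"signature_{name}.png"

def select_one_per_person(candidates):
    """Keep the first verified candidate region for each name.

    `candidates` is an iterable of (name, region, verified) tuples, where
    `verified` stands in for the outcome of the VLM verification step.
    """
    chosen = {}
    for name, region, verified in candidates:
        if verified and name not in chosen:
            chosen[name] = region
    return chosen

# Hypothetical candidates for one page: two verified regions and one rejected.
candidates = [
    ("周寶蓮", "region-1", True),
    ("周寶蓮", "region-2", True),   # duplicate for the same person, dropped
    ("周寶蓮", "region-3", False),  # rejected by verification (date, stamp, ...)
]
selected = select_one_per_person(candidates)
for name in selected:
    print(signature_path("signature-image-output", name).name)
```

Keying the result on the extracted name is what makes the duplicate check trivial, at the cost of depending on the name extraction being correct.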

File Structure

extract_pages_from_csv.py          # Step 1: Extract pages
extract_signatures_hybrid.py       # Step 2: Extract signatures (CURRENT)
README.md                          # This file
PROJECT_DOCUMENTATION.md           # Complete documentation
README_page_extraction.md          # Page extraction guide
README_hybrid_extraction.md        # Signature extraction guide

Requirements

Data

  • Input: /Volumes/NV2/PDF-Processing/master_signatures.csv (86,073 rows)
  • PDFs: /Volumes/NV2/PDF-Processing/total-pdf/batch_*/
  • Output: /Volumes/NV2/PDF-Processing/signature-image-output/
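
Before a large run it can help to confirm that the inputs are where the list above says they are. The sketch below makes several assumptions that the README does not state: that the CSV is UTF-8, that PDFs sit directly inside each batch_* directory with a .pdf extension, and that the helper names are hypothetical. Whether the 86,073 figure includes a header row is also not stated, so the count is reported raw.

```python
import csv
from pathlib import Path

def count_csv_rows(csv_path):
    """Count every row in the CSV, header included if one is present."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.reader(handle))

def list_batch_pdfs(pdf_root):
    """Return the sorted PDF paths found directly under batch_* directories."""
    return sorted(Path(pdf_root).glob("batch_*/*.pdf"))

if __name__ == "__main__":
    data_root = Path("/Volumes/NV2/PDF-Processing")
    # Only touch the volume when it is actually mounted.
    if data_root.exists():
        print("CSV rows:", count_csv_rows(data_root / "master_signatures.csv"))
        print("PDF files:", len(list_batch_pdfs(data_root / "total-pdf")))
```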

Status

  • Page extraction: Tested with 100 files, working
  • Signature extraction: Tested with 5 files, 70% recall, 100% precision
  • Large-scale testing: Pending
  • Full dataset (86K files): Pending

See PROJECT_DOCUMENTATION.md for complete details.
