PDF Signature Extraction System
Automated extraction of handwritten Chinese signatures from PDF documents using a hybrid VLM + computer vision approach.
Quick Start
Step 1: Extract Pages from CSV
cd /Volumes/NV2/pdf_recognize
source venv/bin/activate
python extract_pages_from_csv.py
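Step 1 pulls the relevant pages out of each PDF listed in the CSV. This README does not document the CSV's column layout. The sketch below therefore assumes hypothetical columns `pdf_file` and `page` (1-based). It illustrates the idea only and is not the contents of extract_pages_from_csv.py. PyMuPDF is imported inside `extract_page` so that the CSV parsing can be tried without it.

```python
import csv
from pathlib import Path
from typing import Iterable, NamedTuple


class PageJob(NamedTuple):
    pdf_file: str
    page_index: int  # 0-based, as PyMuPDF expects


def read_page_jobs(lines: Iterable[str]) -> list[PageJob]:
    """Parse CSV rows into page jobs.

    Assumes hypothetical columns `pdf_file` and `page` (1-based page number);
    adjust to the real layout of master_signatures.csv.
    """
    jobs = []
    for row in csv.DictReader(lines):
        jobs.append(PageJob(row["pdf_file"].strip(), int(row["page"]) - 1))
    return jobs


def extract_page(pdf_path: Path, page_index: int, out_path: Path) -> None:
    """Copy a single page of pdf_path into a new one-page PDF at out_path."""
    import fitz  # PyMuPDF; imported here so CSV parsing works without it

    with fitz.open(str(pdf_path)) as src, fitz.open() as dst:
        dst.insert_pdf(src, from_page=page_index, to_page=page_index)
        dst.save(str(out_path))
```

For each job, you would resolve `pdf_file` to a path under the PDF batch directories and then call `extract_page(path, job.page_index, out_path)`.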
Step 2: Extract Signatures
python extract_signatures_hybrid.py
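The hybrid approach is described here only as "VLM name extraction + CV detection + VLM verification". The sketch below shows one way those three stages could be wired together. Each stage is passed in as a hypothetical callable, so this is not the logic inside extract_signatures_hybrid.py. It also reflects two of the listed features: name-based output filenames and keeping at most one signature per person.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Box = tuple[int, int, int, int]  # x, y, width, height of a candidate region


@dataclass
class HybridExtractor:
    """Sketch of the three-stage flow; every stage is a hypothetical callable."""

    extract_names: Callable[[bytes], Sequence[str]]   # VLM: signer names on the page
    detect_regions: Callable[[bytes], Sequence[Box]]  # CV: candidate ink regions
    verify: Callable[[bytes, Box, str], bool]         # VLM: is this box `name`'s signature?

    def run(self, page_image: bytes) -> dict[str, Box]:
        names = self.extract_names(page_image)
        found: dict[str, Box] = {}
        for box in self.detect_regions(page_image):
            for name in names:
                if name in found:
                    continue  # duplicate prevention: first verified box per person wins
                if self.verify(page_image, box, name):
                    found[name] = box
                    break  # a region is assigned to at most one person
        return found


def output_filename(name: str) -> str:
    """Name-based output file, e.g. signature_周寶蓮.png."""
    return f"signature_{name}.png"
```

Because verification is keyed to a specific name, a region that only contains a date or a stamp would fail every `verify` call and never be saved. That is the intended effect of the name-specific check.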
Documentation
- PROJECT_DOCUMENTATION.md - Complete project history, all approaches tested, detailed results
- README_page_extraction.md - Page extraction documentation
- README_hybrid_extraction.md - Hybrid signature extraction documentation
Current Performance
Test Dataset: 5 PDF pages
- Signatures expected: 10
- Signatures found: 7
- Precision: 100% (no false positives)
- Recall: 70%
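The figures above follow the standard definitions. All 7 signatures found were correct (no false positives), so precision is 7/7. Those 7 cover 7 of the 10 expected signatures, so recall is 7/10. A small helper makes the arithmetic explicit:

```python
def precision_recall(found: int, false_positives: int, expected: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / expected."""
    true_positives = found - false_positives
    precision = true_positives / found if found else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall


p, r = precision_recall(found=7, false_positives=0, expected=10)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=100% recall=70%
```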
Key Features
✅ Hybrid Approach: VLM name extraction + CV detection + VLM verification
✅ Name-Based: Each signature is saved under the signer's name, e.g. signature_周寶蓮.png
✅ No False Positives: Name-specific verification filters out dates, text, and stamps
✅ Duplicate Prevention: At most one signature is saved per person
✅ Handles Both: PDFs with and without a text layer
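The README does not say how a page with a text layer is told apart from a scanned page. A common approach is to read the embedded text with PyMuPDF and fall back to the VLM when little or nothing comes back. The sketch below takes that approach. The `min_chars` threshold is a hypothetical value, not a project setting.

```python
def has_text_layer(page_text: str, min_chars: int = 20) -> bool:
    """Heuristic: the page has a usable text layer if it yields enough
    non-whitespace characters (min_chars is a hypothetical threshold)."""
    return len("".join(page_text.split())) >= min_chars


def read_page_text(pdf_path: str, page_index: int) -> str:
    """Embedded text of one page; scanned pages typically return little or nothing."""
    import fitz  # PyMuPDF; imported lazily so the heuristic runs without it

    with fitz.open(pdf_path) as doc:
        return doc[page_index].get_text()
```

With this sketch, names would be taken from the text layer when `has_text_layer(...)` is true, and a rendered page image would be sent to the VLM otherwise.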
File Structure
extract_pages_from_csv.py # Step 1: Extract pages
extract_signatures_hybrid.py # Step 2: Extract signatures (CURRENT)
README.md # This file
PROJECT_DOCUMENTATION.md # Complete documentation
README_page_extraction.md # Page extraction guide
README_hybrid_extraction.md # Signature extraction guide
Requirements
- Python 3.9+
- PyMuPDF, OpenCV, NumPy, Requests
- Ollama with qwen2.5vl:32b model
- Ollama instance: http://192.168.30.36:11434
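The VLM stages talk to the Ollama instance listed above. The sketch below shows a minimal version of one such call, using Ollama's `/api/generate` endpoint with a base64-encoded page image. The prompt text and timeout are assumptions, and the project's own request code may differ.

```python
import base64

OLLAMA_URL = "http://192.168.30.36:11434"
MODEL = "qwen2.5vl:32b"


def build_generate_payload(prompt: str, image_png: bytes, model: str = MODEL) -> dict:
    """Request body for Ollama's /api/generate with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_png).decode("ascii")],
        "stream": False,  # return a single JSON object instead of a token stream
    }


def ask_vlm(prompt: str, image_png: bytes, base_url: str = OLLAMA_URL,
            timeout: float = 300.0) -> str:
    """Send one prompt + image to the VLM and return its text answer."""
    import requests  # listed in the project requirements; imported lazily

    resp = requests.post(f"{base_url}/api/generate",
                         json=build_generate_payload(prompt, image_png),
                         timeout=timeout)
    resp.raise_for_status()
    return resp.json()["response"]
```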
Data
- Input: /Volumes/NV2/PDF-Processing/master_signatures.csv (86,073 rows)
- PDFs: /Volumes/NV2/PDF-Processing/total-pdf/batch_*/
- Output: /Volumes/NV2/PDF-Processing/signature-image-output/
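Because the PDFs are spread across `batch_*` directories, a CSV filename has to be mapped to a path first. The sketch below assumes that each PDF sits directly inside one `batch_*` directory and that the CSV refers to PDFs by bare filename. Both are assumptions not stated in this README.

```python
from pathlib import Path

PDF_ROOT = Path("/Volumes/NV2/PDF-Processing/total-pdf")


def find_pdfs(root: Path = PDF_ROOT) -> list[Path]:
    """All PDFs directly inside root/batch_*/ directories, in a stable order."""
    return sorted(p for p in root.glob("batch_*/*") if p.suffix.lower() == ".pdf")


def index_by_filename(paths: list[Path]) -> dict[str, Path]:
    """Map bare filename -> path; keeps the first batch if a name repeats."""
    index: dict[str, Path] = {}
    for p in paths:
        index.setdefault(p.name, p)
    return index
```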
Status
✅ Page extraction: Tested with 100 files, working
✅ Signature extraction: Tested with 5 files, 70% recall, 100% precision
⏳ Large-scale testing: Pending
⏳ Full dataset (86K files): Pending
See PROJECT_DOCUMENTATION.md for complete details.