pdf_signature_extraction/NEW_SESSION_PROMPT.txt

I'm continuing work on the PDF signature extraction project at /Volumes/NV2/pdf_recognize/.

Please read these files to understand the current state:
1. /Volumes/NV2/pdf_recognize/SESSION_INIT.md (start here)
2. /Volumes/NV2/pdf_recognize/PROJECT_DOCUMENTATION.md (complete history)

Key context:
- Working hybrid approach: VLM name extraction + CV detection + VLM verification
- Test results: 70% recall, 100% precision (5 PDFs tested)
- Important: VLM coordinates are unreliable (a 32% offset was discovered), so we use names instead
- Current script: extract_signatures_hybrid.py

I want to: [CHOOSE ONE OR DESCRIBE YOUR GOAL]

Option A: Improve recall from 70% to 90%+
- Tune CV detection parameters to catch more signatures
- Check whether the missing signatures ended up in the rejected folder

Option B: Scale up testing to 100 PDFs
- Verify reliability on a larger dataset
- Analyze the results and calculate overall metrics (recall, precision)

Option C: Commit current solution to git
- Follow the instructions in COMMIT_SUMMARY.md
- Tag the release as v1.0-hybrid-70percent

Option D: Process full dataset (86,073 files)
- Estimate total processing time and optimize if needed
- Set up progress monitoring and the ability to resume after an interruption

Option E: Debug specific issue
- [Describe the issue you're encountering]

Option F: Other
- [Describe what you want to work on]