Compare commits

3 Commits: `v1.0-hybri...pp-ocrv5-r`

| Author | SHA1 | Date |
|---|---|---|
| | 21df0ff387 | |
| | 8f231da3bc | |
| | 479d4e0019 | |

252 CURRENT_STATUS.md Normal file
@@ -0,0 +1,252 @@

# Project Current Status

**Updated**: 2025-10-29
**Branch**: `paddleocr-improvements`
**PaddleOCR version**: 2.7.3 (stable)

---

## Progress Summary

### ✅ Completed

1. **PaddleOCR server deployment** (192.168.30.36:5555)
   - Version: PaddleOCR 2.7.3
   - GPU: enabled
   - Language: Chinese
   - Status: running stably

2. **Basic pipeline implementation**
   - ✅ PDF → image rendering (DPI=300)
   - ✅ PaddleOCR text detection (26 regions/page)
   - ✅ Text-region masking (padding=25px)
   - ✅ Candidate-region detection
   - ✅ Region-merging algorithm (12 → 4 regions)

3. **OpenCV separation methods tested**
   - Method 1: stroke-width analysis - ❌ poor
   - Method 2: basic connected-component analysis - ⚠️ moderate
   - Method 3: combined feature analysis - ✅ **best approach** (86.5% handwriting retention)

4. **Test results**
   - Test file: `201301_1324_AI1_page3.pdf`
   - Expected signatures: 2 (楊智惠, 張志銘)
   - Detection result: 2 signature regions successfully merged
   - Retention: 86.5% of handwritten content

---

## Technical Architecture

```
PDF document
  ↓
1. Render (PyMuPDF, 300 DPI)
  ↓
2. PaddleOCR detection (recognize printed text)
  ↓
3. Mask printed text (black fill, padding=25px)
  ↓
4. Region detection (OpenCV morphology)
  ↓
5. Region merging (distance thresholds: H ≤ 100px, V ≤ 50px)
  ↓
6. Feature analysis (size + stroke length + regularity)
  ↓
7. [TODO] VLM verification
  ↓
Signature extraction result
```
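Step 5's merge rule uses separate horizontal and vertical thresholds; it can be sketched as pure box arithmetic (boxes as `(x, y, w, h)` tuples; the helper names are illustrative, not the repo's actual code):

```python
def should_merge(box_a, box_b, h_thresh=100, v_thresh=50):
    """True if two boxes are close enough to merge (step 5 thresholds)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Gap between the boxes on each axis; 0 means their projections overlap
    h_gap = max(0, max(ax, bx) - min(ax + aw, bx + bw))
    v_gap = max(0, max(ay, by) - min(ay + ah, by + bh))
    return h_gap <= h_thresh and v_gap <= v_thresh

def merge_boxes(box_a, box_b):
    """Union bounding box of two merged regions."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    x, y = min(ax, bx), min(ay, by)
    return (x, y, max(ax + aw, bx + bw) - x, max(ay + ah, by + bh) - y)
```

Two side-by-side boxes with a 50px horizontal gap merge; the same boxes 150px apart do not.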

---

## Core Files

| File | Description | Status |
|------|------|------|
| `paddleocr_client.py` | PaddleOCR REST client | ✅ stable |
| `test_mask_and_detect.py` | basic mask + detect test | ✅ done |
| `test_opencv_separation.py` | OpenCV Methods 1+2 test | ✅ done |
| `test_opencv_advanced.py` | OpenCV Method 3 (best) | ✅ done |
| `extract_signatures_paddleocr_improved.py` | full pipeline (Methods B+E) | ⚠️ Method E broken |
| `PADDLEOCR_STATUS.md` | detailed technical doc | ✅ done |

---

## Method 3: Combined Feature Analysis (current best approach)

### Criteria

**Your observations** (spot on):
1. ✅ **Handwriting is larger than print** - height > 50px
2. ✅ **Handwritten strokes are longer** - stroke_ratio > 0.4
3. ✅ **Print is regular, handwriting is loose** - compactness, solidity

### Scoring System

```python
handwriting_score = 0

# Size score
if height > 50: handwriting_score += 3
elif height > 35: handwriting_score += 2

# Stroke-length score
if stroke_ratio > 0.5: handwriting_score += 2
elif stroke_ratio > 0.35: handwriting_score += 1

# Regularity score
if is_irregular: handwriting_score += 1   # irregular = handwritten
else: handwriting_score -= 1              # regular = printed

# Area score
if area > 2000: handwriting_score += 2
elif area < 500: handwriting_score -= 1

# Classification: handwriting_score > 0 → handwritten
```
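Wrapped into a runnable function, the same rules read as follows (the feature values `height`, `stroke_ratio`, `is_irregular`, and `area` are assumed to be computed upstream per connected component):

```python
def handwriting_score(height, stroke_ratio, is_irregular, area):
    """Score one connected component; a positive score means 'handwritten'."""
    score = 0
    # Size: handwriting is taller than print
    if height > 50: score += 3
    elif height > 35: score += 2
    # Stroke length
    if stroke_ratio > 0.5: score += 2
    elif stroke_ratio > 0.35: score += 1
    # Regularity: irregular shapes suggest handwriting
    score += 1 if is_irregular else -1
    # Area
    if area > 2000: score += 2
    elif area < 500: score -= 1
    return score

def is_handwritten(height, stroke_ratio, is_irregular, area):
    return handwriting_score(height, stroke_ratio, is_irregular, area) > 0
```

For example, a 60px-tall, irregular, 2500px² component scores 3 + 2 + 1 + 2 = 8 and is classified as handwritten.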

### Results

- Handwritten pixels retained: **86.5%** ✅
- Printed pixels filtered: 13.5%
- Top 10 components all classified correctly

---

## Known Issues

### 1. Method E (two-stage OCR) fails ❌

**Cause**: PaddleOCR cannot distinguish "printed" from "handwritten"; the second OCR pass recognizes and deletes the handwriting too.

**Resolution**:
- ❌ Do not use Method E
- ✅ Use Method B (region merging) + OpenCV Method 3

### 2. Printed names overlap handwritten signatures

**Symptom**: a region contains "楊 智 惠" (printed) plus the handwritten signature
**Strategy**: accept a little printed residue; prioritize keeping the handwriting intact
**Follow-up**: final verification with the VLM

### 3. Masking padding trade-off

**Small padding (5-10px)**: more printed residue, but doesn't clip handwriting
**Large padding (25px)**: removes print cleanly, but may cover handwriting edges
**Current**: 25px, relying on OpenCV Method 3 to filter the residue
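The padded masking itself can be sketched with NumPy alone (the helper is illustrative, not the repo's actual code; boxes are `(x, y, w, h)`, and the clamp keeps the expanded box inside the image):

```python
import numpy as np

def mask_text_regions(image, text_boxes, padding=25):
    """Black out each (x, y, w, h) box, expanded by `padding` px and
    clamped to the image bounds."""
    masked = image.copy()
    height, width = masked.shape[:2]
    for (x, y, w, h) in text_boxes:
        x0 = max(0, x - padding)
        y0 = max(0, y - padding)
        x1 = min(width, x + w + padding)
        y1 = min(height, y + h + padding)
        masked[y0:y1, x0:x1] = 0  # black fill
    return masked
```

With padding=25, a box at (30, 30) is blacked out from (5, 5) onward, so print touching the box edge is removed too.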

---

## Next Steps

### Short term (continue current approach)

- [ ] Integrate Method B + OpenCV Method 3 into a complete pipeline
- [ ] Add the VLM verification step
- [ ] Test on 10 samples
- [ ] Tune parameters (height threshold, merge distances, etc.)

### Mid term (PP-OCRv5 research)

**New branch**: `pp-ocrv5-research`

- [ ] Study the new PaddleOCR 3.3.0 API
- [ ] Test PP-OCRv5 handwriting-detection capability
- [ ] Compare performance: v4 vs v5
- [ ] Decide whether to upgrade

---

## Server Configuration

### PaddleOCR server (Linux)

```
Host: 192.168.30.36:5555
SSH: ssh gblinux
Path: ~/Project/paddleocr-server/
Versions: PaddleOCR 2.7.3, numpy 1.26.4, opencv-contrib 4.6.0.66
Start: cd ~/Project/paddleocr-server && source venv/bin/activate && python paddleocr_server.py
Log: ~/Project/paddleocr-server/server_stable.log
```

### VLM server (Ollama)

```
Host: 192.168.30.36:11434
Model: qwen2.5vl:32b
Status: not used in the current pipeline
```
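For reference, the client's request building might look roughly like this (the endpoint path and JSON field name are assumptions; the real contract lives in `paddleocr_client.py`):

```python
import base64

def build_ocr_request(image_bytes):
    """Build a JSON payload for the PaddleOCR REST server.
    The field name ('image') and base64 transport are assumptions;
    see paddleocr_client.py for the actual contract."""
    return {"image": base64.b64encode(image_bytes).decode("ascii")}

# Sending it would look roughly like (endpoint path is hypothetical):
# import requests
# resp = requests.post("http://192.168.30.36:5555/ocr",
#                      json=build_ocr_request(open("page.png", "rb").read()),
#                      timeout=60)
# boxes = resp.json()  # detected text boxes, format per the server
```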

---

## Test Data

### Sample file

```
/Volumes/NV2/PDF-Processing/signature-image-output/201301_1324_AI1_page3.pdf
- Page: page 3
- Expected signatures: 2 (楊智惠, 張志銘)
- Size: 2481x3510 pixels
```

### Output directories

```
/Volumes/NV2/PDF-Processing/signature-image-output/
├── mask_test/               # basic mask test results
├── paddleocr_improved/      # Methods B+E tests (E failed)
├── opencv_separation_test/  # Methods 1+2 tests
└── opencv_advanced_test/    # Method 3 tests (best)
```

---

## Performance Comparison

| Method | Handwriting kept | Print removed | Verdict |
|------|---------|---------|------|
| Basic masking | 100% | low | ⚠️ too much printed residue |
| Method 1 (stroke width) | 0% | - | ❌ complete failure |
| Method 2 (connected components) | 1% | medium | ❌ loses too much handwriting |
| Method 3 (combined features) | **86.5%** | high | ✅ **best** |

---

## Git Status

```
Current branch: paddleocr-improvements
Based on: PaddleOCR-Cover
Tag: paddleocr-v1-basic (basic masking version)

To commit:
- OpenCV advanced separation (Method 3)
- complete test scripts and results
- documentation updates
```

---

## Known Limitations

1. **Parameters need tuning**: the height threshold, merge distances, etc. may need adjustment per document
2. **Depends on document quality**: blurry or skewed documents may degrade results
3. **Compute performance**: OpenCV processing is fast, but the full pipeline needs optimization
4. **Generalization**: tested on only 1 sample; needs validation on more

---

## Contact & Collaboration

**Primary developer**: Claude Code
**Collaboration mode**: session-based development
**Repository**: local Git repo
**Test environments**: macOS (local) + Linux (server)

---

**Status**: ✅ current approach is stable; development can continue
**Recommendation**: test Method 3 on more samples before considering the PP-OCRv5 upgrade

432 NEW_SESSION_HANDOFF.md Normal file
@@ -0,0 +1,432 @@

# New Session Handoff - PP-OCRv5 Research

**Date**: 2025-10-29
**Previous session**: PaddleOCR-Cover branch development
**Current branch**: `paddleocr-improvements` (stable)
**New branch**: `pp-ocrv5-research` (to be created)

---

## 🎯 Objective

Research and implement handwritten-signature detection with **PP-OCRv5**

---

## 📋 Background

### Current situation

✅ **A stable approach exists** (`paddleocr-improvements` branch):
- PaddleOCR 2.7.3 + OpenCV Method 3
- 86.5% handwriting retention
- region-merging algorithm works well
- tested: 1 PDF, 2 signatures detected successfully

⚠️ **The PP-OCRv5 upgrade hit problems**:
- the PaddleOCR 3.3.0 API changed completely
- the old server code is incompatible
- the new API needs deeper study

### Why research PP-OCRv5?

**Docs**: https://www.paddleocr.ai/main/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5.html

PP-OCRv5 improvements:
- handwritten Chinese detection: **0.706 → 0.803** (+13.7%)
- handwritten English detection: **0.249 → 0.841** (+237%)
- may support directly outputting handwriting-region coordinates

**Potential advantages**:
1. better handwriting recognition
2. possibly built-in handwritten/printed classification
3. more accurate coordinate output
4. less complex post-processing

---

## 🔧 Tech Stack

### Server environment

```
Host: 192.168.30.36 (Linux GPU server)
SSH: ssh gblinux
Directory: ~/Project/paddleocr-server/
```

**Current stable versions**:
- PaddleOCR: 2.7.3
- numpy: 1.26.4
- opencv-contrib-python: 4.6.0.66
- server file: `paddleocr_server.py`

**Installed but unused**:
- PaddleOCR 3.3.0 (PP-OCRv5)
- interim server: `paddleocr_server_v5.py` (unfinished)

### Local environment

```
macOS
Python: 3.14
Virtualenv: venv/
Client: paddleocr_client.py
```

---

## 📝 Core Questions

### 1. API changes

**Old API (2.7.3)**:
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(lang='ch')
result = ocr.ocr(image_np, cls=False)

# Return format:
# [[[box], (text, confidence)], ...]
```

**New API (3.3.0)** - ⚠️ not fully understood yet:
```python
# Approach 1: legacy call (deprecated)
result = ocr.ocr(image_np)  # warning: Please use predict instead

# Approach 2: new style
from paddlex import create_model
model = create_model("???")  # model name unknown
result = model.predict(image_np)

# Return format: ???
```

### 2. Errors encountered

**Error 1**: the `cls` argument is no longer supported
```python
# Error: PaddleOCR.predict() got an unexpected keyword argument 'cls'
result = ocr.ocr(image_np, cls=False)  # ❌
```

**Error 2**: the return format changed
```python
# Old parsing code fails:
text = item[1][0]        # ❌ IndexError
confidence = item[1][1]  # ❌ IndexError
```

**Error 3**: wrong model name
```python
model = create_model("PP-OCRv5_server")  # ❌ Model not supported
```

---

## 🎯 Research Task List

### Phase 1: API research (high priority)

- [ ] **Read the official docs**
  - full PP-OCRv5 documentation
  - PaddleX API docs
  - migration guide (if any)

- [ ] **Understand the new API**
  ```python
  # Need to figure out:
  # 1. the correct import path
  # 2. model initialization
  # 3. predict() arguments and return format
  # 4. how to distinguish handwritten vs printed
  # 5. whether there is a dedicated handwriting-detection feature
  ```

- [ ] **Write test scripts**
  - `test_pp_ocrv5_api.py` - exercise the basic API
  - print the full result data structure
  - compare v4 and v5 return values

### Phase 2: Server adaptation

- [ ] **Rewrite the server code**
  - adapt to the new API
  - parse the returned data correctly
  - keep the REST interface compatible

- [ ] **Test stability**
  - run 10 PDF samples
  - check GPU utilization
  - compare performance against v4

### Phase 3: Handwriting detection

- [ ] **Look for handwriting-detection capability**
  ```python
  # Possible avenues:
  # 1. does the result carry a text_type field?
  # 2. is there a dedicated handwriting_detection model?
  # 3. is there a usable confidence gap?
  # 4. PP-Structure layout analysis?
  ```

- [ ] **Comparative testing**
  - v4 (current approach) vs v5
  - accuracy, recall, speed
  - handwriting-detection capability

### Phase 4: Integration decision

- [ ] **Performance evaluation**
  - if v5 is better → upgrade
  - if the gain is marginal → keep v4

- [ ] **Documentation updates**
  - record how to use v5
  - update PADDLEOCR_STATUS.md

---

## 🔍 Debugging Tips

### 1. Inspect the full return data

```python
import pprint
result = model.predict(image)
pprint.pprint(result)  # dump every field

# or
import json
print(json.dumps(result, indent=2, ensure_ascii=False))
```

### 2. Find official examples

```bash
# Search the PaddleOCR install on the server for examples
find ~/Project/paddleocr-server/venv/lib/python3.12/site-packages/paddleocr -name "*.py" | grep example

# Read the source
less ~/Project/paddleocr-server/venv/lib/python3.12/site-packages/paddleocr/paddleocr.py
```

### 3. List available models

```python
from paddlex.inference.models import OFFICIAL_MODELS
print(OFFICIAL_MODELS)  # list all supported model names
```

### 4. Search the web docs

Focus on:
- https://github.com/PaddlePaddle/PaddleOCR
- https://www.paddleocr.ai
- https://github.com/PaddlePaddle/PaddleX

---

## 📂 File Layout

```
/Volumes/NV2/pdf_recognize/
├── CURRENT_STATUS.md        # current-status doc ✅
├── NEW_SESSION_HANDOFF.md   # this file ✅
├── PADDLEOCR_STATUS.md      # detailed technical doc ✅
├── SESSION_INIT.md          # initial session info
│
├── paddleocr_client.py      # stable client (v2.7.3) ✅
├── paddleocr_server_v5.py   # v5 server (unfinished) ⚠️
│
├── test_paddleocr_client.py  # basic test
├── test_mask_and_detect.py   # mask + detect
├── test_opencv_separation.py # Methods 1+2
├── test_opencv_advanced.py   # Method 3 (best) ✅
├── extract_signatures_paddleocr_improved.py # full pipeline
│
└── check_rejected_for_missing.py # diagnostic script
```

**Server side** (`ssh gblinux`):
```
~/Project/paddleocr-server/
├── paddleocr_server.py        # v2.7.3 stable ✅
├── paddleocr_server_v5.py     # v5 version (pending) ⚠️
├── paddleocr_server_backup.py # backup
├── server_stable.log          # current run log
└── venv/                      # virtualenv
```

---

## ⚡ Quick Start

### Start the stable server (v2.7.3)

```bash
ssh gblinux
cd ~/Project/paddleocr-server
source venv/bin/activate
python paddleocr_server.py
```

### Test the connection

```bash
# Local Mac
cd /Volumes/NV2/pdf_recognize
source venv/bin/activate
python test_paddleocr_client.py
```

### Create the research branch

```bash
cd /Volumes/NV2/pdf_recognize
git checkout -b pp-ocrv5-research
```

---

## 🚨 Cautions

### 1. Don't break the stable version

- keep the `paddleocr-improvements` branch stable
- all v5 experiments go on the new `pp-ocrv5-research` branch
- the server keeps `paddleocr_server.py` (v2.7.3)
- name new code `paddleocr_server_v5.py`

### 2. Environment isolation

- the server virtualenv may need rebuilding
- or isolate v4 and v5 with Docker
- avoid version conflicts

### 3. Performance testing

- record concrete metrics for v4 and v5
- test at least 10 samples
- include speed, accuracy, recall

### 4. Documentation-driven

- record every finding in the docs
- write API usage down clearly
- eases future maintenance

---

## 📊 Success Criteria

### Minimum goals

- [ ] run PP-OCRv5 basic OCR successfully
- [ ] understand the new API call pattern
- [ ] server runs stably
- [ ] document everything

### Stretch goals

- [ ] find a handwriting-detection feature
- [ ] outperform the v4 approach
- [ ] simplify the pipeline
- [ ] push accuracy > 90%

### Decision points

**If v5 is clearly better** → upgrade to v5, retire v4
**If v5's gain is marginal** → keep v4; record v5 as research only
**If v5 is buggy** → wait for upstream fixes; stay on v4

---

## 📞 Troubleshooting

### When something breaks

1. **Check the logs first**: `tail -f ~/Project/paddleocr-server/server_stable.log`
2. **Read the source**: find the PaddleOCR code inside the venv
3. **Search Issues**: https://github.com/PaddlePaddle/PaddleOCR/issues
4. **Downgrade test**: confirm v2.7.3 still works

### FAQ

**Q: Server fails to start?**
A: Check the numpy version (must be < 2.0)

**Q: Model not found?**
A: Model names may have changed; check OFFICIAL_MODELS

**Q: API call fails?**
A: Compare against the official docs; parameter formats may have changed

---

## 🎓 Learning Resources

### Official docs

1. **PP-OCRv5**: https://www.paddleocr.ai/main/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5.html
2. **PaddleOCR GitHub**: https://github.com/PaddlePaddle/PaddleOCR
3. **PaddleX**: https://github.com/PaddlePaddle/PaddleX

### Related topics

- the PaddlePaddle deep-learning framework
- PP-Structure document-structure analysis
- handwriting recognition
- layout analysis

---

## 💡 Hints

### If a built-in handwriting detector exists

Possible usage:
```python
# Guess 1: results carry a type field
for item in result:
    text_type = item.get('type')  # 'printed' or 'handwritten'?

# Guess 2: a dedicated layout model
from paddlex import create_model
layout_model = create_model("PP-Structure")
layout_result = layout_model.predict(image)
# might return: text, handwriting, figure, table...

# Guess 3: confidence gap
# handwritten text may score lower confidence
```

### If there is no built-in handwriting detection

Then the current OpenCV Method 3 remains the best approach, and v5 only brings better OCR accuracy.

---

## ✅ Completion Checklist

When the research wraps up, make sure:

- [ ] the new API is fully understood and documented
- [ ] the server code is rewritten and passes tests
- [ ] performance-comparison data is recorded
- [ ] a decision doc exists (upgrade vs keep v4)
- [ ] code is committed to the `pp-ocrv5-research` branch
- [ ] `CURRENT_STATUS.md` is updated
- [ ] if upgrading: merged into the main branch

---

**Good luck with the research!** 🚀

Consult anytime:
- `CURRENT_STATUS.md` - current approach details
- `PADDLEOCR_STATUS.md` - technical details and issue analysis

**Most important**: record every finding; success or failure, it's all valuable experience!

475 PADDLEOCR_STATUS.md Normal file
@@ -0,0 +1,475 @@

# PaddleOCR Signature Extraction - Status & Options

**Date**: October 28, 2025
**Branch**: `PaddleOCR-Cover`
**Current Stage**: Masking + Region Detection Working, Refinement Needed

---

## Current Approach Overview

**Strategy**: PaddleOCR masks printed text → Detect remaining regions → VLM verification

### Pipeline Steps

```
1. PaddleOCR (Linux server 192.168.30.36:5555)
   └─> Detect printed text bounding boxes

2. OpenCV Masking (Local)
   └─> Black out all printed text areas

3. Region Detection (Local)
   └─> Find non-white areas (potential handwriting)

4. VLM Verification (TODO)
   └─> Confirm which regions are handwritten signatures
```

---

## Test Results (File: 201301_1324_AI1_page3.pdf)

### Performance

| Metric | Value |
|--------|-------|
| Printed text regions masked | 26 |
| Candidate regions detected | 12 |
| Actual signatures found | 2 ✅ |
| False positives (printed text) | 9 |
| Split signatures | 1 (Region 5 might be part of Region 4) |

### Success

✅ **PaddleOCR detected most printed text** (26 regions)
✅ **Masking works correctly** (black rectangles)
✅ **Region detection found both signatures** (regions 2, 4)
✅ **No false negatives** (didn't miss any signatures)

### Issues Identified

❌ **Problem 1: Handwriting Split Into Multiple Regions**
- Some signatures may be split into 2+ separate regions
- Example: Region 4 and Region 5 might be parts of the same signature area
- Caused by gaps between handwritten strokes after masking

❌ **Problem 2: Printed Name + Handwritten Signature Mixed**
- Region 2: contains "張 志 銘" (printed) + handwritten signature
- Region 4: contains "楊 智 惠" (printed) + handwritten signature
- PaddleOCR missed these printed names, so they weren't masked
- Final output includes both printed and handwritten parts

❌ **Problem 3: Printed Text Not Masked by PaddleOCR**
- 9 regions contain printed text that PaddleOCR didn't detect
- These became false-positive candidates
- Examples: dates, company names, paragraph text
- Shows PaddleOCR's detection isn't 100% complete

---

## Proposed Solutions

### Problem 1: Split Signatures

#### Option A: More Aggressive Morphology ⭐ EASY
**Approach**: Increase kernel size and iterations to connect nearby strokes

```python
# Current settings:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
morphed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=2)

# Proposed settings:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))  # 3x larger
morphed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=5)  # More iterations
```

**Pros**:
- Simple one-line change
- Connects nearby strokes automatically
- Fast execution

**Cons**:
- May merge unrelated regions if too aggressive
- Need to tune parameters carefully
- Could lose fine details

**Recommendation**: ⭐ Try first - easiest to implement and test

---

#### Option B: Region Merging After Detection ⭐⭐ MEDIUM (RECOMMENDED)
**Approach**: After detecting all regions, merge those that are close together

```python
def merge_nearby_regions(regions, distance_threshold=50):
    """
    Merge regions that are within distance_threshold pixels of each other.

    Args:
        regions: List of region dicts with 'box' (x, y, w, h)
        distance_threshold: Maximum pixels between regions to merge

    Returns:
        List of merged regions
    """
    # Algorithm:
    # 1. Compare every pair of boxes
    # 2. If their axis-wise gap is under the threshold, absorb one into the other
    # 3. A single pass may leave chains; call again until stable if needed
    merged = [dict(r) for r in regions]
    i = 0
    while i < len(merged):
        x, y, w, h = merged[i]['box']
        for j in range(len(merged) - 1, i, -1):
            x2, y2, w2, h2 = merged[j]['box']
            dx = max(0, max(x, x2) - min(x + w, x2 + w2))  # horizontal gap (0 if overlapping)
            dy = max(0, max(y, y2) - min(y + h, y2 + h2))  # vertical gap
            if dx < distance_threshold and dy < distance_threshold:
                nx, ny = min(x, x2), min(y, y2)
                w = max(x + w, x2 + w2) - nx
                h = max(y + h, y2 + h2) - ny
                x, y = nx, ny
                del merged[j]
        merged[i]['box'] = (x, y, w, h)
        i += 1
    return merged
```

**Pros**:
- Keeps signatures together intelligently
- Won't merge distant unrelated regions
- Preserves original stroke details
- Can use vertical/horizontal distance separately

**Cons**:
- Need to tune distance threshold
- More complex than Option A
- May need multiple merge passes

**Recommendation**: ⭐⭐ **Best balance** - implement this first

---

#### Option C: Don't Split - Extract Larger Context ⭐ EASY
**Approach**: When extracting regions, add significant padding to capture full context

```python
# Current: padding = 10 pixels
padding = 50  # Much larger padding

# Or: Merge all regions in the bottom 20% of page
# (signatures are usually at the bottom)
```

**Pros**:
- Guaranteed to capture complete signatures
- Very simple to implement
- No risk of losing parts

**Cons**:
- May include extra unwanted content
- Larger image files
- Makes VLM verification more complex

**Recommendation**: ⭐ Use as fallback if B doesn't work

---

### Problem 2: Printed + Handwritten in Same Region

#### Option A: Expand PaddleOCR Masking Boxes ⭐ EASY
**Approach**: Add padding when masking text boxes to catch edges

```python
padding = 20  # pixels

for (x, y, w, h) in text_boxes:
    # Expand box in all directions
    x_pad = max(0, x - padding)
    y_pad = max(0, y - padding)
    w_pad = min(image.shape[1] - x_pad, w + 2*padding)
    h_pad = min(image.shape[0] - y_pad, h + 2*padding)

    cv2.rectangle(masked_image, (x_pad, y_pad),
                  (x_pad + w_pad, y_pad + h_pad), (0, 0, 0), -1)
```

**Pros**:
- Very simple - one parameter change
- Catches text edges and nearby text
- Fast execution

**Cons**:
- If padding too large, may mask handwriting
- If padding too small, still misses text
- Hard to find perfect padding value

**Recommendation**: ⭐ Quick test - try with padding=20-30

---

#### Option B: Run PaddleOCR Again on Each Region ⭐⭐ MEDIUM
**Approach**: Second-pass OCR on extracted regions to find remaining printed text

```python
def clean_region(region_image, ocr_client):
    """
    Remove any remaining printed text from a region.

    Args:
        region_image: Extracted candidate region
        ocr_client: PaddleOCR client

    Returns:
        Cleaned image with only handwriting
    """
    # Run OCR on this specific region
    text_boxes = ocr_client.get_text_boxes(region_image)

    # Mask any detected printed text
    cleaned = region_image.copy()
    for (x, y, w, h) in text_boxes:
        cv2.rectangle(cleaned, (x, y), (x+w, y+h), (0, 0, 0), -1)

    return cleaned
```

**Pros**:
- Very accurate - catches printed text PaddleOCR missed initially
- Clean separation of printed vs handwritten
- No manual tuning needed

**Cons**:
- 2x slower (OCR call per region)
- May occasionally mask handwritten text if it looks printed
- More complex pipeline

**Recommendation**: ⭐⭐ Good option if masking padding isn't enough

---

#### Option C: Computer Vision Stroke Analysis ⭐⭐⭐ HARD
**Approach**: Analyze stroke characteristics to distinguish printed vs handwritten

```python
def separate_printed_handwritten(region_image):
    """
    Use CV techniques to separate printed from handwritten.

    Techniques:
    - Stroke width analysis (printed = uniform, handwritten = variable)
    - Edge detection + smoothness (printed = sharp, handwritten = organic)
    - Connected component analysis
    - Hough line detection (printed = straight, handwritten = curved)
    """
    # Complex implementation...
    pass
```

**Pros**:
- No API calls needed (fast)
- Can work when OCR fails
- Learns patterns in data

**Cons**:
- Very complex to implement
- May not be reliable across different documents
- Requires significant tuning
- Hard to maintain

**Recommendation**: ❌ Skip for now - too complex, uncertain results

---

#### Option D: VLM Crop Guidance ⚠️ RISKY
**Approach**: Ask VLM to provide coordinates of handwriting location

```python
prompt = """
This image contains both printed and handwritten text.
Where is the handwritten signature located?
Provide coordinates as: x_start, y_start, x_end, y_end
"""

# VLM returns coordinates
# Crop to that region only
```

**Pros**:
- VLM understands visual context
- Can distinguish printed vs handwritten

**Cons**:
- **VLM coordinates are unreliable** (32% offset discovered in previous tests!)
- This was the original problem that led to the PaddleOCR approach
- May extract the wrong region

**Recommendation**: ❌ **DO NOT USE** - VLM coordinates proven unreliable

---

#### Option E: Two-Stage Hybrid Approach ⭐⭐⭐ BEST (RECOMMENDED)
**Approach**: Combine detection with targeted cleaning

```python
def extract_signatures_twostage(pdf_path):
    """
    Stage 1: Detect candidate regions (current pipeline)
    Stage 2: Clean each region
    """
    # Stage 1: Full page processing
    image = render_pdf(pdf_path)
    text_boxes = ocr_client.get_text_boxes(image)
    masked_image = mask_text_regions(image, text_boxes, padding=20)
    candidate_regions = detect_regions(masked_image)

    # Stage 2: Per-region cleaning
    signatures = []
    for region_box in candidate_regions:
        # Extract region from ORIGINAL image (not masked)
        region_img = extract_region(image, region_box)

        # Option 1: Run OCR again to find remaining printed text
        region_text_boxes = ocr_client.get_text_boxes(region_img)
        cleaned_region = mask_text_regions(region_img, region_text_boxes)

        # Option 2: Ask VLM if it contains handwriting (no coordinates!)
        is_handwriting = vlm_verify(cleaned_region)

        if is_handwriting:
            signatures.append(cleaned_region)

    return signatures
```

**Pros**:
- Best accuracy - two passes of OCR
- Combines strengths of both approaches
- VLM only for yes/no, not coordinates
- Clean final output with only handwriting

**Cons**:
- Slower (2 OCR calls per page)
- More complex code
- Higher computational cost

**Recommendation**: ⭐⭐⭐ **BEST OVERALL** - implement this for production
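The `vlm_verify` yes/no check could go through Ollama's `/api/generate` endpoint. A sketch — only the Ollama host and the `qwen2.5vl:32b` model name come from this document; the prompt wording and answer parsing are assumptions:

```python
import base64

def build_vlm_verify_request(region_png_bytes):
    """Payload for Ollama's /api/generate: a yes/no question plus the
    region image, base64-encoded as Ollama expects."""
    return {
        "model": "qwen2.5vl:32b",
        "prompt": "Does this image contain a handwritten signature? Answer only yes or no.",
        "images": [base64.b64encode(region_png_bytes).decode("ascii")],
        "stream": False,
    }

def parse_vlm_verify_response(response_json):
    """True if the model answered yes (simple keyword check; an assumption)."""
    return "yes" in response_json.get("response", "").strip().lower()

# Roughly: requests.post("http://192.168.30.36:11434/api/generate",
#                        json=build_vlm_verify_request(png)).json()
```

Keeping the VLM to a yes/no answer sidesteps the unreliable-coordinates problem called out in Option D.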
---

## Implementation Priority

### Phase 1: Quick Wins (Test Immediately)
1. **Expand masking padding** (Problem 2, Option A) - 5 minutes
2. **More aggressive morphology** (Problem 1, Option A) - 5 minutes
3. **Test and measure improvement**

### Phase 2: Region Merging (If Phase 1 insufficient)
4. **Implement region merging algorithm** (Problem 1, Option B) - 30 minutes
5. **Test on multiple PDFs**
6. **Tune distance threshold**

### Phase 3: Two-Stage Approach (Best quality)
7. **Implement second-pass OCR on regions** (Problem 2, Option E) - 1 hour
8. **Add VLM verification** (Step 4 of pipeline) - 30 minutes
9. **Full pipeline testing**

---

## Code Files Status

### Existing Files ✅
- **`paddleocr_client.py`** - REST API client for PaddleOCR server
- **`test_paddleocr_client.py`** - Connection and OCR test
- **`test_mask_and_detect.py`** - Current masking + detection pipeline

### To Be Created 📝
- **`extract_signatures_paddleocr.py`** - Production pipeline with all improvements
- **`region_merger.py`** - Region merging utilities
- **`vlm_verifier.py`** - VLM handwriting verification

---

## Server Configuration

**PaddleOCR Server**:
- Host: `192.168.30.36:5555`
- Running: ✅ Yes (PID: 210417)
- Version: 3.3.0
- GPU: Enabled
- Language: Chinese (lang='ch')

**VLM Server**:
- Host: `192.168.30.36:11434` (Ollama)
- Model: `qwen2.5vl:32b`
- Status: Not tested yet in this pipeline

---

## Test Plan

### Test File
- **File**: `201301_1324_AI1_page3.pdf`
- **Expected signatures**: 2 (楊智惠, 張志銘)
- **Current recall**: 100% (found both)
- **Current precision**: 16.7% (2 correct out of 12 regions)

### Success Metrics After Improvements

| Metric | Current | Target |
|--------|---------|--------|
| Signatures found | 2/2 (100%) | 2/2 (100%) |
| False positives | 10 | < 2 |
| Precision | 16.7% | > 80% |
| Signatures split | Unknown | 0 |
| Printed text in regions | Yes | No |

---

## Git Branch Strategy

**Current branch**: `PaddleOCR-Cover`
**Status**: Masking + Region Detection working, needs refinement

**Recommended next steps**:
1. Commit current state with tag: `paddleocr-v1-basic`
2. Create feature branches:
   - `paddleocr-region-merging` - For Problem 1 solutions
   - `paddleocr-two-stage` - For Problem 2 solutions
3. Merge best solution back to `PaddleOCR-Cover`

---

## Next Actions

### Immediate (Today)
- [ ] Commit current working state
- [ ] Test Phase 1 quick wins (padding + morphology)
- [ ] Measure improvement

### Short-term (This week)
- [ ] Implement Region Merging (Option B)
- [ ] Implement Two-Stage OCR (Option E)
- [ ] Add VLM verification
- [ ] Test on 10 PDFs

### Long-term (Production)
- [ ] Optimize performance (parallel processing)
- [ ] Error handling and logging
- [ ] Process full 86K dataset
- [ ] Compare with previous hybrid approach (70% recall)

---

## Comparison: PaddleOCR vs Previous Hybrid Approach

### Previous Approach (VLM-Cover branch)
- **Method**: VLM names + CV detection + VLM verification
- **Results**: 70% recall, 100% precision
- **Problem**: Missed 30% of signatures (CV parameters too conservative)

### PaddleOCR Approach (Current)
- **Method**: PaddleOCR masking + CV detection + VLM verification
- **Results**: 100% recall (found both signatures)
- **Problem**: Low precision (many false positives), printed text not fully removed

### Winner: TBD
- PaddleOCR shows **better recall potential**
- After implementing refinements (Phase 2-3), should achieve **high recall + high precision**
- Need to test on larger dataset to confirm

---

**Document version**: 1.0
**Last updated**: October 28, 2025
**Author**: Claude Code
**Status**: Ready for implementation

281 PP_OCRV5_RESEARCH_FINDINGS.md Normal file
@@ -0,0 +1,281 @@

# PP-OCRv5 Research Findings

**Date**: 2025-01-27
**Branch**: pp-ocrv5-research
**Status**: research complete

---

## 📋 Summary

We successfully upgraded to and tested PP-OCRv5. Key findings:

### ✅ Completed
1. PaddleOCR upgrade: 2.7.3 → 3.3.2
2. Understood and verified the new API
3. Tested handwriting-detection capability
4. Analyzed the data structures

### ❌ Key limitation
**PP-OCRv5 has no built-in handwritten-vs-printed text classification**

---

## 🔧 技術細節
|
||||
|
||||
### API 變更
|
||||
|
||||
**舊 API (2.7.3)**:
|
||||
```python
|
||||
from paddleocr import PaddleOCR
|
||||
ocr = PaddleOCR(lang='ch', show_log=False)
|
||||
result = ocr.ocr(image_np, cls=False)
|
||||
```
|
||||
|
||||
**新 API (3.3.2)**:
|
||||
```python
|
||||
from paddleocr import PaddleOCR
|
||||
|
||||
ocr = PaddleOCR(
|
||||
text_detection_model_name="PP-OCRv5_server_det",
|
||||
text_recognition_model_name="PP-OCRv5_server_rec",
|
||||
use_doc_orientation_classify=False,
|
||||
use_doc_unwarping=False,
|
||||
use_textline_orientation=False
|
||||
# ❌ 不再支持: show_log, cls
|
||||
)
|
||||
|
||||
result = ocr.predict(image_path) # ✅ 使用 predict() 而不是 ocr()
|
||||
```
|
||||
|
||||
### Key API Differences

| Feature | v2.7.3 | v3.3.2 |
|------|--------|--------|
| Initialization | `PaddleOCR(lang='ch')` | `PaddleOCR(text_detection_model_name=...)` |
| Prediction method | `ocr.ocr()` | `ocr.predict()` |
| `cls` parameter | ✅ Supported | ❌ Removed |
| `show_log` parameter | ✅ Supported | ❌ Removed |
| Return format | `[[[box], (text, conf)], ...]` | `OCRResult` object with a `.json` attribute |
| Dependencies | Standalone | Requires PaddleX >= 3.3.0 |

---

## 📊 Returned Data Structure

### v3.3.2 return format

```python
result = ocr.predict(image_path)
json_data = result[0].json['res']

# Available fields:
json_data = {
    'input_path': str,        # Input image path
    'page_index': None,       # PDF page number (None for images)
    'model_settings': dict,   # Model configuration
    'dt_polys': list,         # Detection polygons (N, 4, 2)
    'dt_scores': list,        # Detection confidences
    'rec_texts': list,        # Recognized text strings
    'rec_scores': list,       # Recognition confidences
    'rec_boxes': list,        # Axis-aligned boxes [x_min, y_min, x_max, y_max]
    'rec_polys': list,        # Recognition polygons
    'text_det_params': dict,  # Detection parameters
    'text_rec_score_thresh': float,  # Recognition threshold
    'text_type': str,         # ⚠️ 'general' (script/language type, NOT a handwriting flag)
    'textline_orientation_angles': list,  # Text-line orientation angles
    'return_word_box': bool   # Whether word-level boxes are returned
}
```
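
If the upgrade goes ahead, downstream code written against the old tuple format does not have to change all at once: a small adapter can rebuild the v2.7.3 shape from the fields documented above. A minimal sketch, assuming `json_data` is the `res` dict shown above (`to_legacy_format` is a hypothetical helper name, not part of either API):

```python
def to_legacy_format(json_data: dict) -> list:
    """Rebuild the v2.7.3-style [[box, (text, conf)], ...] list
    from a v3.3.2 'res' dict (field names as documented above)."""
    legacy = []
    for poly, text, score in zip(json_data['rec_polys'],
                                 json_data['rec_texts'],
                                 json_data['rec_scores']):
        # Old format: a 4-point polygon plus a (text, confidence) tuple
        box = [[float(x), float(y)] for x, y in poly]
        legacy.append([box, (text, float(score))])
    return legacy
```

This keeps the REST client and the masking pipeline untouched while the server migrates.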

---

## 🔍 Handwriting Detection Test

### Question
**Can PP-OCRv5 distinguish handwritten from printed text?**

### Result: ❌ No

#### Test process
1. ✅ Found a `text_type` field
2. ❌ But `text_type = 'general'` is the **script/language type**, not the writing style
3. ✅ Confirmed against the official documentation
4. ❌ No field labels a region as handwritten vs printed

#### Official documentation
- Possible `text_type` values: 'general', 'ch', 'en', 'japan', 'pinyin'
- These values refer to the **language/script type**
- They are **not** a handwritten-vs-printed classification

### Conclusion
PP-OCRv5 can **recognize** handwritten text, but it does **not label** whether a given text region is handwritten or printed.

---

## 📈 Performance Gains (per official documentation)

### Handwriting recognition accuracy

| Type | PP-OCRv4 | PP-OCRv5 | Improvement |
|------|----------|----------|------|
| Handwritten Chinese | 0.706 | 0.803 | **+13.7%** |
| Handwritten English | 0.249 | 0.841 | **+237%** |

### Measured results (full_page_original.png)

**v3.3.2 (PP-OCRv5)**:
- Detected **50** text regions
- Average confidence: ~0.98
- Examples:
  - "依本會計師核閱結果..." (0.9936)
  - "在所有重大方面有違反..." (0.9976)

**Still to test**: the v2.7.3 comparison (requires rolling back)

---

## 💡 Upgrade Impact Analysis

### Pros
1. ✅ **Better handwriting recognition** (+13.7%)
2. ✅ **May detect more handwritten regions**
3. ✅ **Higher recognition confidence**
4. ✅ **Unified pipeline architecture**

### Cons
1. ❌ **Cannot distinguish handwritten from printed text** (OpenCV Method 3 still required)
2. ⚠️ **API is fully incompatible** (server code must be rewritten)
3. ⚠️ **Depends on PaddleX** (extra dependency)
4. ⚠️ **OpenCV version bump** (4.6 → 4.10)

---

## 🎯 Impact on Our Project

### Current approach (v2.7.3 + OpenCV Method 3)
```
PDF → PaddleOCR detection → mask printed text → OpenCV Method 3 separates handwriting → VLM verification
                                                        ↑ 86.5% handwriting retention
```

### PP-OCRv5 approach
```
PDF → PP-OCRv5 detection → mask printed text → OpenCV Method 3 separates handwriting → VLM verification
        ↑ may detect more handwriting                   ↑ still required!
```

### Key Finding
**PP-OCRv5 cannot replace OpenCV Method 3!**

---

## 🤔 Upgrade Recommendation

### Reasons to upgrade
1. Better detection of handwritten signatures (+13.7% accuracy)
2. May reduce missed detections
3. Higher recognition confidence helps downstream analysis

### Reasons not to upgrade
1. The current setup is already stable (86.5% retention)
2. OpenCV Method 3 is still required
3. Rewriting for the new API is costly
4. Extra dependencies and complexity

### Recommended decision

**Staged upgrade strategy**:

1. **Short term (now)**:
   - ✅ Keep the stable v2.7.3 setup
   - ✅ Keep using OpenCV Method 3
   - ✅ Test the current approach on more samples

2. **Mid term (if optimization is needed)**:
   - Benchmark v2.7.3 vs v3.3.2 on real signature samples
   - If v5 clearly reduces missed detections → upgrade
   - If the difference is small → stay on v2.7.3

3. **Long term**:
   - Watch whether PaddleOCR adds a handwriting classification feature
   - If it does → re-evaluate the upgrade
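
For the mid-term benchmark, candidate-region overlap between the two versions can be scored with a simple IoU match. A minimal sketch, using the candidate boxes reported by the v4 and v5 pipeline summaries in this branch (the 0.5 threshold is illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Candidate boxes from the two SUMMARY.txt files in signature-comparison/
V4 = [(1211, 1462, 965, 191), (1215, 877, 1150, 511),
      (332, 150, 197, 96), (1147, 3303, 159, 42)]
V5 = [(1218, 877, 1144, 511), (1213, 1457, 961, 196),
      (228, 386, 2028, 209), (330, 310, 1932, 63),
      (1990, 945, 375, 212), (327, 145, 203, 101),
      (1139, 3289, 174, 63)]

# v4 regions with a strong v5 match (IoU > 0.5) were re-found by PP-OCRv5;
# unmatched v5 boxes are the extra (possibly false-positive) candidates.
matched = [a for a in V4 if any(iou(a, b) > 0.5 for b in V5)]
```

On these numbers all four v4 regions have a v5 counterpart, so v5's three extra boxes are the new candidates to inspect for false positives.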

---

## 📝 Technical Debt Log

### If we decide to upgrade to v3.3.2

Work required:

1. **Server side**:
   - [ ] Rewrite `paddleocr_server.py` for the new API
   - [ ] Test GPU utilization and speed
   - [ ] Handle OpenCV 4.10 compatibility
   - [ ] Update dependency documentation

2. **Client side**:
   - [ ] Update `paddleocr_client.py` (if the REST interface changes)
   - [ ] Adapt to the new return format

3. **Testing**:
   - [ ] Comparative tests on 10+ samples
   - [ ] Performance benchmarks
   - [ ] Stability tests

4. **Documentation**:
   - [ ] Update CURRENT_STATUS.md
   - [ ] Write an API migration guide
   - [ ] Update deployment docs

---

## ✅ Completed Work

1. ✅ Upgraded PaddleOCR: 2.7.3 → 3.3.2
2. ✅ Understood the new API structure
3. ✅ Tested basic functionality
4. ✅ Analyzed the returned data structure
5. ✅ Tested for handwriting classification (conclusion: none)
6. ✅ Verified against official documentation
7. ✅ Documented the full research process

---

## 🎓 Lessons Learned

1. **Major version upgrades are risky**: they usually bring breaking changes
2. **Verify features, don't assume**: documented "handwriting support" does not mean "handwriting classification"
3. **Value of the existing approach**: OpenCV Method 3 remains necessary
4. **Performance vs complexity trade-off**: not every gain justifies an immediate upgrade

---

## 🔗 Related Documents

- [CURRENT_STATUS.md](./CURRENT_STATUS.md) - Current stable approach
- [NEW_SESSION_HANDOFF.md](./NEW_SESSION_HANDOFF.md) - Research task list
- [PADDLEOCR_STATUS.md](./PADDLEOCR_STATUS.md) - Detailed technical analysis

---

## 📌 Next Steps

Recommendations:

1. **Immediate**:
   - Test the current approach on more PDF samples
   - Record success rates and failure cases

2. **Evaluate the upgrade**:
   - If the current approach is satisfactory → stay on v2.7.3
   - If there are many missed detections → consider v3.3.2

3. **Long-term monitoring**:
   - Watch PaddleOCR GitHub issues
   - Track whether a handwriting classification feature lands

---

**Conclusion**: PP-OCRv5 improves handwriting recognition but cannot replace OpenCV Method 3 for separating handwritten from printed text. The current approach (v2.7.3 + OpenCV Method 3) is good enough; an immediate upgrade is not recommended unless we hit a performance bottleneck.
75
check_rejected_for_missing.py
Normal file
@@ -0,0 +1,75 @@
#!/usr/bin/env python3
"""Check if rejected regions contain the missing signatures."""

import base64
import requests
from pathlib import Path

OLLAMA_URL = "http://192.168.30.36:11434"
OLLAMA_MODEL = "qwen2.5vl:32b"
REJECTED_PATH = "/Volumes/NV2/PDF-Processing/signature-image-output/signatures/rejected"

# Missing signatures based on test results
MISSING = {
    "201301_2061_AI1_page5": "林姿妤",
    "201301_2458_AI1_page4": "魏興海",
    "201301_2923_AI1_page3": "陈丽琦"
}


def encode_image_to_base64(image_path):
    """Encode image file to base64."""
    with open(image_path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')


def ask_vlm_about_signature(image_base64, expected_name):
    """Ask VLM if the image contains the expected signature."""
    prompt = f"""Does this image contain a handwritten signature with the Chinese name: "{expected_name}"?

Look carefully for handwritten Chinese characters matching this name.

Answer only 'yes' or 'no'."""

    payload = {
        "model": OLLAMA_MODEL,
        "prompt": prompt,
        "images": [image_base64],
        "stream": False
    }

    try:
        response = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=60)
        response.raise_for_status()
        answer = response.json()['response'].strip().lower()
        return answer
    except Exception as e:
        return f"error: {str(e)}"


# Check each missing signature
for pdf_stem, missing_name in MISSING.items():
    print(f"\n{'='*80}")
    print(f"Checking rejected regions from: {pdf_stem}")
    print(f"Looking for missing signature: {missing_name}")
    print('='*80)

    # Find all rejected regions from this PDF
    rejected_regions = sorted(Path(REJECTED_PATH).glob(f"{pdf_stem}_region_*.png"))

    print(f"Found {len(rejected_regions)} rejected regions to check")

    for region_path in rejected_regions:
        region_name = region_path.name
        print(f"\nChecking: {region_name}...", end='', flush=True)

        # Encode and ask VLM
        image_base64 = encode_image_to_base64(region_path)
        answer = ask_vlm_about_signature(image_base64, missing_name)

        if 'yes' in answer:
            print(f" ✅ FOUND! This region contains {missing_name}")
            print(f" → The signature was detected by CV but rejected by verification!")
        else:
            print(f" ❌ No (VLM says: {answer})")

print(f"\n{'='*80}")
print("Analysis complete!")
print('='*80)
415
extract_signatures_paddleocr_improved.py
Normal file
@@ -0,0 +1,415 @@
#!/usr/bin/env python3
"""
PaddleOCR Signature Extraction - Improved Pipeline

Implements:
- Method B: Region Merging (merge nearby regions to avoid splits)
- Method E: Two-Stage Approach (second OCR pass on regions)

Pipeline:
1. PaddleOCR detects printed text on full page
2. Mask printed text with padding
3. Detect candidate regions
4. Merge nearby regions (METHOD B)
5. For each region: run OCR again to remove remaining printed text (METHOD E)
6. VLM verification (optional)
7. Save cleaned handwriting regions
"""

import fitz  # PyMuPDF
import numpy as np
import cv2
from pathlib import Path
from paddleocr_client import create_ocr_client
from typing import List, Dict, Tuple
import base64
import requests

# Configuration
TEST_PDF = "/Volumes/NV2/PDF-Processing/signature-image-output/201301_1324_AI1_page3.pdf"
OUTPUT_DIR = "/Volumes/NV2/PDF-Processing/signature-image-output/paddleocr_improved"
DPI = 300

# PaddleOCR Settings
MASKING_PADDING = 25  # Pixels to expand text boxes when masking

# Region Detection Parameters
MIN_REGION_AREA = 3000
MAX_REGION_AREA = 300000
MIN_ASPECT_RATIO = 0.3
MAX_ASPECT_RATIO = 15.0

# Region Merging Parameters (METHOD B)
MERGE_DISTANCE_HORIZONTAL = 100  # pixels
MERGE_DISTANCE_VERTICAL = 50  # pixels

# VLM Settings (optional)
USE_VLM_VERIFICATION = False  # Set to True to enable VLM filtering
OLLAMA_URL = "http://192.168.30.36:11434"
OLLAMA_MODEL = "qwen2.5vl:32b"


def merge_nearby_regions(regions: List[Dict],
                         h_distance: int = 100,
                         v_distance: int = 50) -> List[Dict]:
    """
    Merge regions that are close to each other (METHOD B).

    Args:
        regions: List of region dicts with 'box': (x, y, w, h)
        h_distance: Maximum horizontal distance between regions to merge
        v_distance: Maximum vertical distance between regions to merge

    Returns:
        List of merged regions
    """
    if not regions:
        return []

    # Sort regions by y-coordinate (top to bottom)
    regions = sorted(regions, key=lambda r: r['box'][1])

    merged = []
    skip_indices = set()

    for i, region1 in enumerate(regions):
        if i in skip_indices:
            continue

        x1, y1, w1, h1 = region1['box']

        # Find all regions that should merge with this one
        merge_group = [region1]

        for j, region2 in enumerate(regions[i+1:], start=i+1):
            if j in skip_indices:
                continue

            x2, y2, w2, h2 = region2['box']

            # Horizontal distance: gap between boxes horizontally
            h_dist = max(0, max(x1, x2) - min(x1 + w1, x2 + w2))

            # Vertical distance: gap between boxes vertically
            v_dist = max(0, max(y1, y2) - min(y1 + h1, y2 + h2))

            # Check if regions are close enough to merge
            if h_dist <= h_distance and v_dist <= v_distance:
                merge_group.append(region2)
                skip_indices.add(j)
                # Update bounding box to include the new region.
                # Compute the union's right/bottom edges BEFORE moving
                # the origin, so the width/height use consistent values.
                right = max(x1 + w1, x2 + w2)
                bottom = max(y1 + h1, y2 + h2)
                x1 = min(x1, x2)
                y1 = min(y1, y2)
                w1 = right - x1
                h1 = bottom - y1

        # Create merged region
        merged_box = (x1, y1, w1, h1)
        merged_area = w1 * h1
        merged_aspect = w1 / h1 if h1 > 0 else 0

        merged.append({
            'box': merged_box,
            'area': merged_area,
            'aspect_ratio': merged_aspect,
            'merged_count': len(merge_group)
        })

    return merged


def clean_region_with_ocr(region_image: np.ndarray,
                          ocr_client,
                          padding: int = 10) -> np.ndarray:
    """
    Remove printed text from a region using a second OCR pass (METHOD E).

    Args:
        region_image: The region image to clean
        ocr_client: PaddleOCR client
        padding: Padding around detected text boxes

    Returns:
        Cleaned region with printed text masked
    """
    try:
        # Run OCR on this specific region
        text_boxes = ocr_client.get_text_boxes(region_image)

        if not text_boxes:
            return region_image  # No text found, return as-is

        # Mask detected printed text
        cleaned = region_image.copy()
        for (x, y, w, h) in text_boxes:
            # Add padding
            x_pad = max(0, x - padding)
            y_pad = max(0, y - padding)
            w_pad = min(cleaned.shape[1] - x_pad, w + 2*padding)
            h_pad = min(cleaned.shape[0] - y_pad, h + 2*padding)

            cv2.rectangle(cleaned, (x_pad, y_pad),
                          (x_pad + w_pad, y_pad + h_pad),
                          (255, 255, 255), -1)  # Fill with white

        return cleaned

    except Exception as e:
        print(f" Warning: OCR cleaning failed: {e}")
        return region_image


def verify_handwriting_with_vlm(image: np.ndarray) -> Tuple[bool, float]:
    """
    Use VLM to verify if image contains handwriting.

    Args:
        image: Region image (RGB numpy array)

    Returns:
        (is_handwriting: bool, confidence: float)
    """
    try:
        # Convert image to base64
        from PIL import Image
        from io import BytesIO

        pil_image = Image.fromarray(image.astype(np.uint8))
        buffered = BytesIO()
        pil_image.save(buffered, format="PNG")
        image_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')

        # Ask VLM
        prompt = """Does this image contain handwritten text or a handwritten signature?

Answer only 'yes' or 'no', followed by a confidence score 0-100.
Format: yes 95 OR no 80"""

        payload = {
            "model": OLLAMA_MODEL,
            "prompt": prompt,
            "images": [image_base64],
            "stream": False
        }

        response = requests.post(f"{OLLAMA_URL}/api/generate",
                                 json=payload, timeout=30)
        response.raise_for_status()
        answer = response.json()['response'].strip().lower()

        # Parse answer
        is_handwriting = 'yes' in answer

        # Try to extract confidence
        confidence = 0.5
        parts = answer.split()
        for part in parts:
            try:
                conf = float(part)
                if 0 <= conf <= 100:
                    confidence = conf / 100
                    break
            except ValueError:
                continue

        return is_handwriting, confidence

    except Exception as e:
        print(f" Warning: VLM verification failed: {e}")
        return True, 0.5  # Default to accepting the region


print("="*80)
print("PaddleOCR Improved Pipeline - Region Merging + Two-Stage Cleaning")
print("="*80)

# Create output directory
Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)

# Step 1: Connect to PaddleOCR
print("\n1. Connecting to PaddleOCR server...")
try:
    ocr_client = create_ocr_client()
    print(f" ✅ Connected: {ocr_client.server_url}")
except Exception as e:
    print(f" ❌ Error: {e}")
    exit(1)

# Step 2: Render PDF
print("\n2. Rendering PDF...")
try:
    doc = fitz.open(TEST_PDF)
    page = doc[0]
    mat = fitz.Matrix(DPI/72, DPI/72)
    pix = page.get_pixmap(matrix=mat)
    original_image = np.frombuffer(pix.samples, dtype=np.uint8).reshape(
        pix.height, pix.width, pix.n)

    if pix.n == 4:
        original_image = cv2.cvtColor(original_image, cv2.COLOR_RGBA2RGB)

    print(f" ✅ Rendered: {original_image.shape[1]}x{original_image.shape[0]}")
    doc.close()
except Exception as e:
    print(f" ❌ Error: {e}")
    exit(1)

# Step 3: Detect printed text (Stage 1)
print("\n3. Detecting printed text (Stage 1 OCR)...")
try:
    text_boxes = ocr_client.get_text_boxes(original_image)
    print(f" ✅ Detected {len(text_boxes)} text regions")
except Exception as e:
    print(f" ❌ Error: {e}")
    exit(1)

# Step 4: Mask printed text with padding
print(f"\n4. Masking printed text (padding={MASKING_PADDING}px)...")
try:
    masked_image = original_image.copy()

    for (x, y, w, h) in text_boxes:
        # Add padding
        x_pad = max(0, x - MASKING_PADDING)
        y_pad = max(0, y - MASKING_PADDING)
        w_pad = min(masked_image.shape[1] - x_pad, w + 2*MASKING_PADDING)
        h_pad = min(masked_image.shape[0] - y_pad, h + 2*MASKING_PADDING)

        cv2.rectangle(masked_image, (x_pad, y_pad),
                      (x_pad + w_pad, y_pad + h_pad), (0, 0, 0), -1)

    print(f" ✅ Masked {len(text_boxes)} regions")
except Exception as e:
    print(f" ❌ Error: {e}")
    exit(1)

# Step 5: Detect candidate regions
print("\n5. Detecting candidate regions...")
try:
    gray = cv2.cvtColor(masked_image, cv2.COLOR_RGB2GRAY)
    _, binary = cv2.threshold(gray, 250, 255, cv2.THRESH_BINARY_INV)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    morphed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=2)

    contours, _ = cv2.findContours(morphed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    candidate_regions = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        area = w * h
        aspect_ratio = w / h if h > 0 else 0

        if (MIN_REGION_AREA <= area <= MAX_REGION_AREA and
                MIN_ASPECT_RATIO <= aspect_ratio <= MAX_ASPECT_RATIO):
            candidate_regions.append({
                'box': (x, y, w, h),
                'area': area,
                'aspect_ratio': aspect_ratio
            })

    print(f" ✅ Found {len(candidate_regions)} candidate regions")
except Exception as e:
    print(f" ❌ Error: {e}")
    exit(1)

# Step 6: Merge nearby regions (METHOD B)
print(f"\n6. Merging nearby regions (h_dist<={MERGE_DISTANCE_HORIZONTAL}, v_dist<={MERGE_DISTANCE_VERTICAL})...")
try:
    merged_regions = merge_nearby_regions(
        candidate_regions,
        h_distance=MERGE_DISTANCE_HORIZONTAL,
        v_distance=MERGE_DISTANCE_VERTICAL
    )
    print(f" ✅ Merged {len(candidate_regions)} → {len(merged_regions)} regions")

    for i, region in enumerate(merged_regions):
        if region['merged_count'] > 1:
            print(f" Region {i+1}: Merged {region['merged_count']} sub-regions")
except Exception as e:
    print(f" ❌ Error: {e}")
    import traceback
    traceback.print_exc()
    exit(1)

# Step 7: Extract and clean each region (METHOD E)
print("\n7. Extracting and cleaning regions (Stage 2 OCR)...")
final_signatures = []

for i, region in enumerate(merged_regions):
    x, y, w, h = region['box']
    print(f"\n Region {i+1}/{len(merged_regions)}: ({x}, {y}, {w}, {h})")

    # Extract region from ORIGINAL image (not masked)
    padding = 10
    x_pad = max(0, x - padding)
    y_pad = max(0, y - padding)
    w_pad = min(original_image.shape[1] - x_pad, w + 2*padding)
    h_pad = min(original_image.shape[0] - y_pad, h + 2*padding)

    region_img = original_image[y_pad:y_pad+h_pad, x_pad:x_pad+w_pad].copy()

    print(f" - Extracted: {region_img.shape[1]}x{region_img.shape[0]}px")

    # Clean with second OCR pass
    print(f" - Running Stage 2 OCR to remove printed text...")
    cleaned_region = clean_region_with_ocr(region_img, ocr_client, padding=5)

    # VLM verification (optional)
    if USE_VLM_VERIFICATION:
        print(f" - VLM verification...")
        is_handwriting, confidence = verify_handwriting_with_vlm(cleaned_region)
        print(f" - VLM says: {'✅ Handwriting' if is_handwriting else '❌ Not handwriting'} (confidence: {confidence:.2f})")

        if not is_handwriting:
            print(f" - Skipping (not handwriting)")
            continue

    # Save
    final_signatures.append({
        'image': cleaned_region,
        'box': region['box'],
        'original_image': region_img
    })

    print(f" ✅ Kept as signature candidate")

print(f"\n ✅ Final signatures: {len(final_signatures)}")

# Step 8: Save results
print("\n8. Saving results...")

for i, sig in enumerate(final_signatures):
    # Save cleaned signature
    sig_path = Path(OUTPUT_DIR) / f"signature_{i+1:02d}_cleaned.png"
    cv2.imwrite(str(sig_path), cv2.cvtColor(sig['image'], cv2.COLOR_RGB2BGR))

    # Save original region for comparison
    orig_path = Path(OUTPUT_DIR) / f"signature_{i+1:02d}_original.png"
    cv2.imwrite(str(orig_path), cv2.cvtColor(sig['original_image'], cv2.COLOR_RGB2BGR))

    print(f" 📁 Signature {i+1}: {sig_path.name}")

# Save visualizations: kept regions in red, rejected ones in gray.
# Compare boxes by value, since region dicts carry extra keys.
kept_boxes = {sig['box'] for sig in final_signatures}
vis_merged = original_image.copy()
for region in merged_regions:
    x, y, w, h = region['box']
    color = (255, 0, 0) if region['box'] in kept_boxes else (128, 128, 128)
    cv2.rectangle(vis_merged, (x, y), (x + w, y + h), color, 3)

vis_path = Path(OUTPUT_DIR) / "visualization_merged_regions.png"
cv2.imwrite(str(vis_path), cv2.cvtColor(vis_merged, cv2.COLOR_RGB2BGR))
print(f" 📁 Visualization: {vis_path.name}")

print("\n" + "="*80)
print("Pipeline completed!")
print(f"Results: {OUTPUT_DIR}")
print("="*80)
print(f"\nSummary:")
print(f" - Stage 1 OCR: {len(text_boxes)} text regions masked")
print(f" - Initial candidates: {len(candidate_regions)}")
print(f" - After merging: {len(merged_regions)}")
print(f" - Final signatures: {len(final_signatures)}")
print(f" - Expected signatures: 2 (楊智惠, 張志銘)")
print("="*80)
169
paddleocr_client.py
Normal file
@@ -0,0 +1,169 @@
#!/usr/bin/env python3
"""
PaddleOCR Client
Connects to remote PaddleOCR server for OCR inference
"""

import requests
import base64
import numpy as np
from typing import List, Dict, Tuple, Optional
from PIL import Image
from io import BytesIO


class PaddleOCRClient:
    """Client for remote PaddleOCR server."""

    def __init__(self, server_url: str = "http://192.168.30.36:5555"):
        """
        Initialize PaddleOCR client.

        Args:
            server_url: URL of the PaddleOCR server
        """
        self.server_url = server_url.rstrip('/')
        self.timeout = 30  # seconds

    def health_check(self) -> bool:
        """
        Check if server is healthy.

        Returns:
            True if server is healthy, False otherwise
        """
        try:
            response = requests.get(
                f"{self.server_url}/health",
                timeout=5
            )
            return response.status_code == 200 and response.json().get('status') == 'ok'
        except Exception as e:
            print(f"Health check failed: {e}")
            return False

    def ocr(self, image: np.ndarray) -> List[Dict]:
        """
        Perform OCR on an image.

        Args:
            image: numpy array of the image (RGB format)

        Returns:
            List of detection results, each containing:
            - box: [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
            - text: detected text string
            - confidence: confidence score (0-1)

        Raises:
            Exception if OCR fails
        """
        # Convert numpy array to PIL Image
        if len(image.shape) == 2:  # Grayscale
            pil_image = Image.fromarray(image)
        else:  # RGB or RGBA
            pil_image = Image.fromarray(image.astype(np.uint8))

        # Encode to base64
        buffered = BytesIO()
        pil_image.save(buffered, format="PNG")
        image_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')

        # Send request
        try:
            response = requests.post(
                f"{self.server_url}/ocr",
                json={"image": image_base64},
                timeout=self.timeout
            )
            response.raise_for_status()

            result = response.json()

            if not result.get('success'):
                error_msg = result.get('error', 'Unknown error')
                raise Exception(f"OCR failed: {error_msg}")

            return result.get('results', [])

        except requests.exceptions.Timeout:
            raise Exception(f"OCR request timed out after {self.timeout} seconds")
        except requests.exceptions.ConnectionError:
            raise Exception(f"Could not connect to server at {self.server_url}")
        except Exception as e:
            raise Exception(f"OCR request failed: {str(e)}")

    def get_text_boxes(self, image: np.ndarray) -> List[Tuple[int, int, int, int]]:
        """
        Get bounding boxes of all detected text.

        Args:
            image: numpy array of the image

        Returns:
            List of bounding boxes as (x, y, w, h) tuples
        """
        results = self.ocr(image)
        boxes = []

        for result in results:
            box = result['box']  # [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]

            # Convert polygon to bounding box
            xs = [point[0] for point in box]
            ys = [point[1] for point in box]

            x = int(min(xs))
            y = int(min(ys))
            w = int(max(xs) - min(xs))
            h = int(max(ys) - min(ys))

            boxes.append((x, y, w, h))

        return boxes

    def __repr__(self):
        return f"PaddleOCRClient(server_url='{self.server_url}')"


# Convenience function
def create_ocr_client(server_url: str = "http://192.168.30.36:5555") -> PaddleOCRClient:
    """
    Create and test PaddleOCR client.

    Args:
        server_url: URL of the PaddleOCR server

    Returns:
        PaddleOCRClient instance

    Raises:
        Exception if server is not reachable
    """
    client = PaddleOCRClient(server_url)

    if not client.health_check():
        raise Exception(
            f"PaddleOCR server at {server_url} is not responding. "
            "Make sure the server is running on the Linux machine."
        )

    return client


if __name__ == "__main__":
    # Test the client
    print("Testing PaddleOCR client...")

    try:
        client = create_ocr_client()
        print(f"✅ Connected to server: {client.server_url}")

        # Create a test image
        test_image = np.ones((100, 100, 3), dtype=np.uint8) * 255

        print("Running test OCR...")
        results = client.ocr(test_image)
        print(f"✅ OCR test successful! Found {len(results)} text regions")

    except Exception as e:
        print(f"❌ Error: {e}")
91
paddleocr_server_v5.py
Normal file
@@ -0,0 +1,91 @@
#!/usr/bin/env python3
"""
PaddleOCR Server v5 (PP-OCRv5)
Flask HTTP server exposing PaddleOCR v3.3.0 functionality
"""

from paddlex import create_model
import base64
import numpy as np
from PIL import Image
from io import BytesIO
from flask import Flask, request, jsonify
import traceback

app = Flask(__name__)

# Initialize PP-OCRv5 model
print("Initializing PP-OCRv5 model...")
model = create_model("PP-OCRv5_server")
print("PP-OCRv5 model loaded successfully!")


@app.route('/health', methods=['GET'])
def health():
    """Health check endpoint."""
    return jsonify({
        'status': 'ok',
        'service': 'paddleocr-server-v5',
        'version': '3.3.0',
        'model': 'PP-OCRv5_server',
        'gpu_enabled': True
    })


@app.route('/ocr', methods=['POST'])
def ocr_endpoint():
    """
    OCR endpoint using PP-OCRv5.

    Accepts: {"image": "base64_encoded_image"}
    Returns: {"success": true, "count": N, "results": [...]}
    """
    try:
        # Parse request
        data = request.get_json()
        image_base64 = data['image']

        # Decode image
        image_bytes = base64.b64decode(image_base64)
        image = Image.open(BytesIO(image_bytes))
        image_np = np.array(image)

        # Run OCR with PP-OCRv5
        result = model.predict(image_np)

        # Format results
        formatted_results = []

        if result and 'dt_polys' in result[0] and 'rec_text' in result[0]:
            dt_polys = result[0]['dt_polys']
            rec_texts = result[0]['rec_text']
            rec_scores = result[0]['rec_score']

            for i in range(len(dt_polys)):
                box = dt_polys[i].tolist()  # Convert to list
                text = rec_texts[i]
                confidence = float(rec_scores[i])

                formatted_results.append({
                    'box': box,
                    'text': text,
                    'confidence': confidence
                })

        return jsonify({
            'success': True,
            'count': len(formatted_results),
            'results': formatted_results
        })

    except Exception as e:
        print(f"Error during OCR: {str(e)}")
        traceback.print_exc()
        return jsonify({
            'success': False,
            'error': str(e)
        }), 500


if __name__ == '__main__':
    print("Starting PP-OCRv5 server on port 5555...")
    print("Model: PP-OCRv5_server")
    print("Version: 3.3.0")
    app.run(host='0.0.0.0', port=5555, debug=False)
17
signature-comparison/v4-current/SUMMARY.txt
Normal file
@@ -0,0 +1,17 @@

PaddleOCR v2.7.3 (v4) full pipeline test results
============================================================

1. OCR detection: 14 text regions
2. Mask printed text: done
3. Candidate region detection: 4
4. Extracted signatures: 4

Candidate region details:
------------------------------------------------------------
Region 1: position (1211, 1462), size 965x191, area=184315
Region 2: position (1215, 877), size 1150x511, area=587650
Region 3: position (332, 150), size 197x96, area=18912
Region 4: position (1147, 3303), size 159x42, area=6678

All results saved in: /Volumes/NV2/pdf_recognize/signature-comparison/v4-current
20
signature-comparison/v5-new/SUMMARY.txt
Normal file
@@ -0,0 +1,20 @@
PP-OCRv5 full-pipeline test results
============================================================

1. OCR detection: 50 text regions
2. Mask printed text: /Volumes/NV2/pdf_recognize/test_results/v5_pipeline/01_masked.png
3. Candidate regions detected: 7
4. Signatures extracted: 7

Candidate region details:
------------------------------------------------------------
Region 1: position (1218, 877), size 1144x511, area=584584
Region 2: position (1213, 1457), size 961x196, area=188356
Region 3: position (228, 386), size 2028x209, area=423852
Region 4: position (330, 310), size 1932x63, area=121716
Region 5: position (1990, 945), size 375x212, area=79500
Region 6: position (327, 145), size 203x101, area=20503
Region 7: position (1139, 3289), size 174x63, area=10962

All results saved to: /Volumes/NV2/pdf_recognize/test_results/v5_pipeline
216  test_mask_and_detect.py  Normal file
@@ -0,0 +1,216 @@
#!/usr/bin/env python3
"""
Test PaddleOCR Masking + Region Detection Pipeline

This script demonstrates:
1. PaddleOCR detects printed text bounding boxes
2. Mask out all printed text areas (fill with black)
3. Detect remaining non-white regions (potential handwriting)
4. Visualize the results
"""

import fitz  # PyMuPDF
import numpy as np
import cv2
from pathlib import Path
from paddleocr_client import create_ocr_client

# Configuration
TEST_PDF = "/Volumes/NV2/PDF-Processing/signature-image-output/201301_1324_AI1_page3.pdf"
OUTPUT_DIR = "/Volumes/NV2/PDF-Processing/signature-image-output/mask_test"
DPI = 300

# Region detection parameters
MIN_REGION_AREA = 3000      # Minimum pixels for a region
MAX_REGION_AREA = 300000    # Maximum pixels for a region
MIN_ASPECT_RATIO = 0.3      # Minimum width/height ratio
MAX_ASPECT_RATIO = 15.0     # Maximum width/height ratio

print("="*80)
print("PaddleOCR Masking + Region Detection Test")
print("="*80)

# Create output directory
Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)

# Step 1: Connect to PaddleOCR server
print("\n1. Connecting to PaddleOCR server...")
try:
    ocr_client = create_ocr_client()
    print(f"   ✅ Connected: {ocr_client.server_url}")
except Exception as e:
    print(f"   ❌ Error: {e}")
    exit(1)

# Step 2: Render PDF to image
print("\n2. Rendering PDF to image...")
try:
    doc = fitz.open(TEST_PDF)
    page = doc[0]
    mat = fitz.Matrix(DPI/72, DPI/72)
    pix = page.get_pixmap(matrix=mat)
    original_image = np.frombuffer(pix.samples, dtype=np.uint8).reshape(pix.height, pix.width, pix.n)

    if pix.n == 4:  # RGBA
        original_image = cv2.cvtColor(original_image, cv2.COLOR_RGBA2RGB)

    print(f"   ✅ Rendered: {original_image.shape[1]}x{original_image.shape[0]} pixels")
    doc.close()
except Exception as e:
    print(f"   ❌ Error: {e}")
    exit(1)

# Step 3: Detect printed text with PaddleOCR
print("\n3. Detecting printed text with PaddleOCR...")
try:
    text_boxes = ocr_client.get_text_boxes(original_image)
    print(f"   ✅ Detected {len(text_boxes)} text regions")

    # Show some sample boxes
    if text_boxes:
        print("   Sample text boxes (x, y, w, h):")
        for i, box in enumerate(text_boxes[:3]):
            print(f"     {i+1}. {box}")
except Exception as e:
    print(f"   ❌ Error: {e}")
    exit(1)

# Step 4: Mask out printed text areas
print("\n4. Masking printed text areas...")
try:
    masked_image = original_image.copy()

    # Fill each text box with black
    for (x, y, w, h) in text_boxes:
        cv2.rectangle(masked_image, (x, y), (x + w, y + h), (0, 0, 0), -1)

    print(f"   ✅ Masked {len(text_boxes)} text regions")

    # Save masked image
    masked_path = Path(OUTPUT_DIR) / "01_masked_image.png"
    cv2.imwrite(str(masked_path), cv2.cvtColor(masked_image, cv2.COLOR_RGB2BGR))
    print(f"   📁 Saved: {masked_path}")

except Exception as e:
    print(f"   ❌ Error: {e}")
    exit(1)

# Step 5: Detect remaining non-white regions
print("\n5. Detecting remaining non-white regions...")
try:
    # Convert to grayscale
    gray = cv2.cvtColor(masked_image, cv2.COLOR_RGB2GRAY)

    # Threshold to find non-white areas
    # Anything darker than 250 is considered "content"
    _, binary = cv2.threshold(gray, 250, 255, cv2.THRESH_BINARY_INV)

    # Apply morphological operations to connect nearby regions
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    morphed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=2)

    # Find contours
    contours, _ = cv2.findContours(morphed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    print(f"   ✅ Found {len(contours)} contours")

    # Filter contours by size and aspect ratio
    potential_regions = []

    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        area = w * h
        aspect_ratio = w / h if h > 0 else 0

        # Check constraints
        if (MIN_REGION_AREA <= area <= MAX_REGION_AREA and
                MIN_ASPECT_RATIO <= aspect_ratio <= MAX_ASPECT_RATIO):
            potential_regions.append({
                'box': (x, y, w, h),
                'area': area,
                'aspect_ratio': aspect_ratio
            })

    print(f"   ✅ Filtered to {len(potential_regions)} potential handwriting regions")

    # Show region details
    if potential_regions:
        print("\n   Detected regions:")
        for i, region in enumerate(potential_regions[:5]):
            x, y, w, h = region['box']
            print(f"     {i+1}. Box: ({x}, {y}, {w}, {h}), "
                  f"Area: {region['area']}, "
                  f"Aspect: {region['aspect_ratio']:.2f}")

except Exception as e:
    print(f"   ❌ Error: {e}")
    import traceback
    traceback.print_exc()
    exit(1)

# Step 6: Visualize results
print("\n6. Creating visualizations...")
try:
    # Visualization 1: Original with text boxes
    vis_original = original_image.copy()
    for (x, y, w, h) in text_boxes:
        cv2.rectangle(vis_original, (x, y), (x + w, y + h), (0, 255, 0), 3)

    vis_original_path = Path(OUTPUT_DIR) / "02_original_with_text_boxes.png"
    cv2.imwrite(str(vis_original_path), cv2.cvtColor(vis_original, cv2.COLOR_RGB2BGR))
    print(f"   📁 Original + text boxes: {vis_original_path}")

    # Visualization 2: Masked image with detected regions
    vis_masked = masked_image.copy()
    for region in potential_regions:
        x, y, w, h = region['box']
        cv2.rectangle(vis_masked, (x, y), (x + w, y + h), (255, 0, 0), 3)

    vis_masked_path = Path(OUTPUT_DIR) / "03_masked_with_regions.png"
    cv2.imwrite(str(vis_masked_path), cv2.cvtColor(vis_masked, cv2.COLOR_RGB2BGR))
    print(f"   📁 Masked + regions: {vis_masked_path}")

    # Visualization 3: Binary threshold result
    binary_path = Path(OUTPUT_DIR) / "04_binary_threshold.png"
    cv2.imwrite(str(binary_path), binary)
    print(f"   📁 Binary threshold: {binary_path}")

    # Visualization 4: Morphed result
    morphed_path = Path(OUTPUT_DIR) / "05_morphed.png"
    cv2.imwrite(str(morphed_path), morphed)
    print(f"   📁 Morphed: {morphed_path}")

    # Extract and save each detected region
    print("\n7. Extracting detected regions...")
    for i, region in enumerate(potential_regions):
        x, y, w, h = region['box']

        # Add padding
        padding = 10
        x_pad = max(0, x - padding)
        y_pad = max(0, y - padding)
        w_pad = min(original_image.shape[1] - x_pad, w + 2*padding)
        h_pad = min(original_image.shape[0] - y_pad, h + 2*padding)

        # Extract region from original image
        region_img = original_image[y_pad:y_pad+h_pad, x_pad:x_pad+w_pad]

        # Save region
        region_path = Path(OUTPUT_DIR) / f"region_{i+1:02d}.png"
        cv2.imwrite(str(region_path), cv2.cvtColor(region_img, cv2.COLOR_RGB2BGR))
        print(f"   📁 Region {i+1}: {region_path}")

except Exception as e:
    print(f"   ❌ Error: {e}")
    import traceback
    traceback.print_exc()

print("\n" + "="*80)
print("Test completed!")
print(f"Results saved to: {OUTPUT_DIR}")
print("="*80)
print("\nSummary:")
print(f"  - Printed text regions detected: {len(text_boxes)}")
print(f"  - Potential handwriting regions: {len(potential_regions)}")
print(f"  - Expected signatures: 2 (楊智惠, 張志銘)")
print("="*80)
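The area/aspect-ratio filter in step 5 of the script above is pure arithmetic and can be sanity-checked in isolation. This is a minimal sketch using the same constants as `test_mask_and_detect.py`; the sample region sizes are illustrative (the first matches Region 1 from the v4 summary).

```python
MIN_REGION_AREA = 3000      # Minimum pixels for a region
MAX_REGION_AREA = 300000    # Maximum pixels for a region
MIN_ASPECT_RATIO = 0.3      # Minimum width/height ratio
MAX_ASPECT_RATIO = 15.0     # Maximum width/height ratio

def is_candidate(w, h):
    """Same constraints the script applies to each contour's bounding box."""
    area = w * h
    aspect_ratio = w / h if h > 0 else 0
    return (MIN_REGION_AREA <= area <= MAX_REGION_AREA and
            MIN_ASPECT_RATIO <= aspect_ratio <= MAX_ASPECT_RATIO)

print(is_candidate(965, 191))   # signature-sized region (area 184315) -> True
print(is_candidate(10, 10))     # speck of noise (area 100) -> False
print(is_candidate(3000, 200))  # whole-page band (area 600000) -> False
```

Note the filter is deliberately loose on aspect ratio (up to 15:1) so that long horizontal signature strokes survive; the area cap is what rejects page-sized blobs.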
256  test_opencv_advanced.py  Normal file
@@ -0,0 +1,256 @@
#!/usr/bin/env python3
"""
Advanced OpenCV separation based on key observations:
1. Handwriting is LARGER than printed text
2. Handwriting strokes are LONGER
3. Printed Kai-typeface text is regular; handwriting is messy
"""

import cv2
import numpy as np
from pathlib import Path
from scipy import ndimage
from skimage.morphology import skeletonize

# Test image
TEST_IMAGE = "/Volumes/NV2/PDF-Processing/signature-image-output/paddleocr_improved/signature_02_original.png"
OUTPUT_DIR = "/Volumes/NV2/PDF-Processing/signature-image-output/opencv_advanced_test"

print("="*80)
print("Advanced OpenCV Separation - Size + Stroke Length + Regularity")
print("="*80)

Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)

# Load and preprocess
image = cv2.imread(TEST_IMAGE)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

print(f"\nImage: {image.shape[1]}x{image.shape[0]}")

# Save binary
cv2.imwrite(str(Path(OUTPUT_DIR) / "00_binary.png"), binary)


print("\n" + "="*80)
print("METHOD 3: Comprehensive Feature Analysis")
print("="*80)

# Find connected components
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)

print(f"\nFound {num_labels - 1} connected components")
print("\nAnalyzing each component...")

# Store analysis for each component
components_analysis = []

for i in range(1, num_labels):
    x, y, w, h, area = stats[i]

    # Extract component mask
    component_mask = (labels == i).astype(np.uint8) * 255

    # ============================================
    # FEATURE 1: Size (handwriting is larger than print)
    # ============================================
    bbox_area = w * h
    font_height = h  # Character height is a good indicator

    # ============================================
    # FEATURE 2: Stroke Length
    # ============================================
    # Skeletonize to get the actual stroke centerline
    skeleton = skeletonize(component_mask // 255)
    stroke_length = np.sum(skeleton)  # Total length of strokes

    # Stroke length ratio (length relative to area)
    stroke_length_ratio = stroke_length / area if area > 0 else 0

    # ============================================
    # FEATURE 3: Regularity vs Messiness
    # ============================================
    # 3a. Compactness (regular shapes are more compact)
    contours, _ = cv2.findContours(component_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        perimeter = cv2.arcLength(contours[0], True)
        compactness = (4 * np.pi * area) / (perimeter * perimeter) if perimeter > 0 else 0
    else:
        perimeter = 0
        compactness = 0

    # 3b. Solidity (ratio of area to convex hull area)
    if contours:
        hull = cv2.convexHull(contours[0])
        hull_area = cv2.contourArea(hull)
        solidity = area / hull_area if hull_area > 0 else 0
    else:
        solidity = 0

    # 3c. Extent (ratio of area to bounding box area)
    extent = area / bbox_area if bbox_area > 0 else 0

    # 3d. Edge roughness (measure irregularity)
    # More irregular edges = more "messy" = likely handwriting
    edges = cv2.Canny(component_mask, 50, 150)
    edge_pixels = np.sum(edges > 0)
    edge_roughness = edge_pixels / perimeter if perimeter > 0 else 0

    # ============================================
    # CLASSIFICATION LOGIC
    # ============================================

    # Large characters are likely handwriting
    is_large = font_height > 40  # Threshold for "large" characters

    # Long strokes relative to area indicate handwriting
    is_long_stroke = stroke_length_ratio > 0.4  # Handwriting has a higher ratio

    # Regular shapes (high compactness, high solidity) = printed
    # Irregular shapes (low compactness, low solidity) = handwriting
    is_irregular = compactness < 0.3 or solidity < 0.7 or extent < 0.5

    # DECISION RULES
    handwriting_score = 0

    # Size-based scoring (important!)
    if font_height > 50:
        handwriting_score += 3  # Very large = likely handwriting
    elif font_height > 35:
        handwriting_score += 2  # Medium-large = possibly handwriting
    elif font_height < 25:
        handwriting_score -= 2  # Small = likely printed

    # Stroke length scoring
    if stroke_length_ratio > 0.5:
        handwriting_score += 2  # Long strokes
    elif stroke_length_ratio > 0.35:
        handwriting_score += 1

    # Regularity scoring (Kai typeface is regular, handwriting is messy)
    if is_irregular:
        handwriting_score += 1  # Irregular = handwriting
    else:
        handwriting_score -= 1  # Regular = printed

    # Area scoring
    if area > 2000:
        handwriting_score += 2  # Large area = handwriting
    elif area < 500:
        handwriting_score -= 1  # Small area = printed

    # Final classification
    is_handwriting = handwriting_score > 0

    components_analysis.append({
        'id': i,
        'box': (x, y, w, h),
        'area': area,
        'height': font_height,
        'stroke_length': stroke_length,
        'stroke_ratio': stroke_length_ratio,
        'compactness': compactness,
        'solidity': solidity,
        'extent': extent,
        'edge_roughness': edge_roughness,
        'handwriting_score': handwriting_score,
        'is_handwriting': is_handwriting,
        'mask': component_mask
    })

# Sort by area (largest first)
components_analysis.sort(key=lambda c: c['area'], reverse=True)

# Print analysis
print("\n" + "-"*80)
print("Top 10 Components Analysis:")
print("-"*80)
print(f"{'ID':<4} {'Area':<6} {'H':<4} {'StrokeLen':<9} {'StrokeR':<7} {'Compact':<7} "
      f"{'Solid':<6} {'Score':<5} {'Type':<12}")
print("-"*80)

for i, comp in enumerate(components_analysis[:10]):
    comp_type = "✅ Handwriting" if comp['is_handwriting'] else "❌ Printed"
    print(f"{comp['id']:<4} {comp['area']:<6} {comp['height']:<4} "
          f"{comp['stroke_length']:<9.0f} {comp['stroke_ratio']:<7.3f} "
          f"{comp['compactness']:<7.3f} {comp['solidity']:<6.3f} "
          f"{comp['handwriting_score']:>+5} {comp_type:<12}")

# Create masks
handwriting_mask = np.zeros_like(binary)
printed_mask = np.zeros_like(binary)

for comp in components_analysis:
    if comp['is_handwriting']:
        handwriting_mask = cv2.bitwise_or(handwriting_mask, comp['mask'])
    else:
        printed_mask = cv2.bitwise_or(printed_mask, comp['mask'])

# Statistics
hw_count = sum(1 for c in components_analysis if c['is_handwriting'])
pr_count = sum(1 for c in components_analysis if not c['is_handwriting'])

print("\n" + "="*80)
print("Classification Results:")
print("="*80)
print(f"  Handwriting components: {hw_count}")
print(f"  Printed components: {pr_count}")
print(f"  Total: {len(components_analysis)}")

# Apply to original image
result_handwriting = cv2.bitwise_and(image, image, mask=handwriting_mask)
result_printed = cv2.bitwise_and(image, image, mask=printed_mask)

# Save results
cv2.imwrite(str(Path(OUTPUT_DIR) / "method3_handwriting_mask.png"), handwriting_mask)
cv2.imwrite(str(Path(OUTPUT_DIR) / "method3_printed_mask.png"), printed_mask)
cv2.imwrite(str(Path(OUTPUT_DIR) / "method3_handwriting_result.png"), result_handwriting)
cv2.imwrite(str(Path(OUTPUT_DIR) / "method3_printed_result.png"), result_printed)

# Create visualization
vis_overlay = image.copy()
vis_overlay[handwriting_mask > 0] = [0, 255, 0]  # Green for handwriting
vis_overlay[printed_mask > 0] = [0, 0, 255]      # Red for printed
vis_final = cv2.addWeighted(image, 0.6, vis_overlay, 0.4, 0)

# Add labels to visualization
for comp in components_analysis[:15]:  # Label top 15
    x, y, w, h = comp['box']
    cx, cy = x + w//2, y + h//2

    color = (0, 255, 0) if comp['is_handwriting'] else (0, 0, 255)
    label = f"H{comp['handwriting_score']:+d}" if comp['is_handwriting'] else f"P{comp['handwriting_score']:+d}"

    cv2.putText(vis_final, label, (cx-15, cy), cv2.FONT_HERSHEY_SIMPLEX, 0.4, color, 1)

cv2.imwrite(str(Path(OUTPUT_DIR) / "method3_visualization.png"), vis_final)

print("\n📁 Saved results:")
print("  - method3_handwriting_mask.png")
print("  - method3_printed_mask.png")
print("  - method3_handwriting_result.png")
print("  - method3_printed_result.png")
print("  - method3_visualization.png")

# Calculate content pixels
hw_pixels = np.count_nonzero(handwriting_mask)
pr_pixels = np.count_nonzero(printed_mask)
total_pixels = np.count_nonzero(binary)

print("\n" + "="*80)
print("Pixel Distribution:")
print("="*80)
print(f"  Total foreground: {total_pixels:6d} pixels (100.0%)")
print(f"  Handwriting: {hw_pixels:6d} pixels ({hw_pixels/total_pixels*100:5.1f}%)")
print(f"  Printed: {pr_pixels:6d} pixels ({pr_pixels/total_pixels*100:5.1f}%)")

print("\n" + "="*80)
print("Test completed!")
print(f"Results: {OUTPUT_DIR}")
print("="*80)

print("\n📊 Feature Analysis Summary:")
print("  ✅ Size-based classification: Large characters → Handwriting")
print("  ✅ Stroke length analysis: Long stroke ratio → Handwriting")
print("  ✅ Regularity analysis: Irregular shapes → Handwriting")
print("\nNext: Review visualization to tune thresholds if needed")
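The Method 3 decision rules are independent of the image features that feed them, so they can be distilled into a pure function for unit testing. This is a sketch mirroring the scoring thresholds in `test_opencv_advanced.py` above; the sample feature values passed in at the end are hypothetical.

```python
def handwriting_score(font_height, stroke_length_ratio, is_irregular, area):
    """Scoring rules mirroring Method 3 in test_opencv_advanced.py."""
    score = 0
    # Size-based scoring
    if font_height > 50:
        score += 3
    elif font_height > 35:
        score += 2
    elif font_height < 25:
        score -= 2
    # Stroke length scoring
    if stroke_length_ratio > 0.5:
        score += 2
    elif stroke_length_ratio > 0.35:
        score += 1
    # Regularity scoring
    score += 1 if is_irregular else -1
    # Area scoring
    if area > 2000:
        score += 2
    elif area < 500:
        score -= 1
    return score

# A large, irregular component with long strokes classifies as handwriting (score > 0)
print(handwriting_score(60, 0.55, True, 2500))   # 3 + 2 + 1 + 2 = 8
# A small, regular glyph classifies as printed (score <= 0)
print(handwriting_score(20, 0.20, False, 400))   # -2 + 0 - 1 - 1 = -4
```

Factoring the rules out this way makes it cheap to re-tune thresholds against labeled components without re-running the full OpenCV pipeline.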
272  test_opencv_separation.py  Normal file
@@ -0,0 +1,272 @@
#!/usr/bin/env python3
"""
Test OpenCV methods to separate handwriting from printed text

Tests two methods:
1. Stroke Width Analysis
2. Connected Components + Shape Features
"""

import cv2
import numpy as np
from pathlib import Path

# Test image - contains both printed and handwritten
TEST_IMAGE = "/Volumes/NV2/PDF-Processing/signature-image-output/paddleocr_improved/signature_02_original.png"
OUTPUT_DIR = "/Volumes/NV2/PDF-Processing/signature-image-output/opencv_separation_test"

print("="*80)
print("OpenCV Handwriting Separation Test")
print("="*80)

# Create output directory
Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)

# Load image
print(f"\nLoading test image: {Path(TEST_IMAGE).name}")
image = cv2.imread(TEST_IMAGE)
if image is None:
    print(f"Error: Cannot load image from {TEST_IMAGE}")
    exit(1)

image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
print(f"Image size: {image.shape[1]}x{image.shape[0]}")

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Binarize
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Save binary for reference
cv2.imwrite(str(Path(OUTPUT_DIR) / "00_binary.png"), binary)
print("\n📁 Saved: 00_binary.png")

print("\n" + "="*80)
print("METHOD 1: Stroke Width Analysis")
print("="*80)

def method1_stroke_width(binary_img, threshold_values=[2.0, 3.0, 4.0, 5.0]):
    """
    Method 1: Separate by stroke width using distance transform

    Args:
        binary_img: Binary image (foreground = 255, background = 0)
        threshold_values: List of distance thresholds to test

    Returns:
        List of (threshold, result_image) tuples
    """
    results = []

    # Calculate distance transform
    dist_transform = cv2.distanceTransform(binary_img, cv2.DIST_L2, 5)

    # Normalize for visualization
    dist_normalized = cv2.normalize(dist_transform, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
    results.append(('distance_transform', dist_normalized))

    print("\n  Distance transform statistics:")
    print(f"    Min: {dist_transform.min():.2f}")
    print(f"    Max: {dist_transform.max():.2f}")
    print(f"    Mean: {dist_transform.mean():.2f}")
    print(f"    Median: {np.median(dist_transform):.2f}")

    # Test different thresholds
    print("\n  Testing different stroke width thresholds:")

    for threshold in threshold_values:
        # Pixels with distance > threshold are considered "thick strokes" (handwriting)
        handwriting_mask = (dist_transform > threshold).astype(np.uint8) * 255

        # Count pixels
        total_foreground = np.count_nonzero(binary_img)
        handwriting_pixels = np.count_nonzero(handwriting_mask)
        percentage = (handwriting_pixels / total_foreground * 100) if total_foreground > 0 else 0

        print(f"    Threshold {threshold:.1f}: {handwriting_pixels} pixels ({percentage:.1f}% of foreground)")

        results.append((f'threshold_{threshold:.1f}', handwriting_mask))

    return results

# Run Method 1
method1_results = method1_stroke_width(binary, threshold_values=[2.0, 2.5, 3.0, 3.5, 4.0, 5.0])

# Save Method 1 results
print("\n  Saving results...")
for name, result_img in method1_results:
    output_path = Path(OUTPUT_DIR) / f"method1_{name}.png"
    cv2.imwrite(str(output_path), result_img)
    print(f"    📁 {output_path.name}")

# Apply best threshold result to original image
best_threshold = 3.0  # Will adjust based on visual inspection
_, best_mask = [(n, r) for n, r in method1_results if f'threshold_{best_threshold}' in n][0]

# Dilate mask slightly to connect nearby strokes
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
best_mask_dilated = cv2.dilate(best_mask, kernel, iterations=1)

# Apply to color image
result_method1 = cv2.bitwise_and(image, image, mask=best_mask_dilated)
cv2.imwrite(str(Path(OUTPUT_DIR) / "method1_final_result.png"), result_method1)
print(f"\n  📁 Final result: method1_final_result.png (threshold={best_threshold})")


print("\n" + "="*80)
print("METHOD 2: Connected Components + Shape Features")
print("="*80)

def method2_component_analysis(binary_img, original_img):
    """
    Method 2: Analyze each connected component's shape features

    Printed text characteristics:
    - Regular bounding box (aspect ratio ~1:1)
    - Medium size (200-2000 pixels)
    - High circularity/compactness

    Handwriting characteristics:
    - Irregular shapes
    - May be large (connected strokes)
    - Variable aspect ratios
    """
    # Find connected components
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary_img, connectivity=8)

    print(f"\n  Found {num_labels - 1} connected components")

    # Create masks for different categories
    handwriting_mask = np.zeros_like(binary_img)
    printed_mask = np.zeros_like(binary_img)

    # Analyze each component
    component_info = []

    for i in range(1, num_labels):  # Skip background (0)
        x, y, w, h, area = stats[i]

        # Calculate features
        aspect_ratio = w / h if h > 0 else 0
        perimeter = cv2.arcLength(cv2.findContours((labels == i).astype(np.uint8),
                                                   cv2.RETR_EXTERNAL,
                                                   cv2.CHAIN_APPROX_SIMPLE)[0][0], True)
        compactness = (4 * np.pi * area) / (perimeter * perimeter) if perimeter > 0 else 0

        # Classification logic
        # Printed text: medium size, regular aspect ratio, compact
        is_printed = (
            (200 < area < 3000) and         # Medium size
            (0.3 < aspect_ratio < 3.0) and  # Not too elongated
            (area < 1000)                   # Small to medium
        )

        # Handwriting: larger, or irregular, or very wide/tall
        is_handwriting = (
            (area >= 3000) or        # Large components (likely handwriting)
            (aspect_ratio > 3.0) or  # Very elongated (cursive runs)
            (aspect_ratio < 0.3) or  # Very tall
            not is_printed           # Default to handwriting if not clearly printed
        )

        component_info.append({
            'id': i,
            'area': area,
            'aspect_ratio': aspect_ratio,
            'compactness': compactness,
            'is_printed': is_printed,
            'is_handwriting': is_handwriting
        })

        # Assign to mask
        if is_handwriting:
            handwriting_mask[labels == i] = 255
        if is_printed:
            printed_mask[labels == i] = 255

    # Print statistics
    print("\n  Component statistics:")
    handwriting_components = [c for c in component_info if c['is_handwriting']]
    printed_components = [c for c in component_info if c['is_printed']]

    print(f"    Handwriting components: {len(handwriting_components)}")
    print(f"    Printed components: {len(printed_components)}")

    # Show top 5 largest components
    print("\n  Top 5 largest components:")
    sorted_components = sorted(component_info, key=lambda c: c['area'], reverse=True)
    for i, comp in enumerate(sorted_components[:5], 1):
        comp_type = "Handwriting" if comp['is_handwriting'] else "Printed"
        print(f"    {i}. Area: {comp['area']:5d}, Aspect: {comp['aspect_ratio']:.2f}, "
              f"Type: {comp_type}")

    return handwriting_mask, printed_mask, component_info

# Run Method 2
handwriting_mask_m2, printed_mask_m2, components = method2_component_analysis(binary, image)

# Save Method 2 results
print("\n  Saving results...")

# Handwriting mask
cv2.imwrite(str(Path(OUTPUT_DIR) / "method2_handwriting_mask.png"), handwriting_mask_m2)
print("    📁 method2_handwriting_mask.png")

# Printed mask
cv2.imwrite(str(Path(OUTPUT_DIR) / "method2_printed_mask.png"), printed_mask_m2)
print("    📁 method2_printed_mask.png")

# Apply to original image
result_handwriting = cv2.bitwise_and(image, image, mask=handwriting_mask_m2)
result_printed = cv2.bitwise_and(image, image, mask=printed_mask_m2)

cv2.imwrite(str(Path(OUTPUT_DIR) / "method2_handwriting_result.png"), result_handwriting)
print("    📁 method2_handwriting_result.png")

cv2.imwrite(str(Path(OUTPUT_DIR) / "method2_printed_result.png"), result_printed)
print("    📁 method2_printed_result.png")

# Create visualization with component labels
vis_components = cv2.cvtColor(binary, cv2.COLOR_GRAY2BGR)
vis_components = cv2.cvtColor(vis_components, cv2.COLOR_BGR2RGB)

# Color code: green = handwriting, red = printed
vis_overlay = image.copy()
vis_overlay[handwriting_mask_m2 > 0] = [0, 255, 0]  # Green for handwriting
vis_overlay[printed_mask_m2 > 0] = [0, 0, 255]      # Red for printed

# Blend with original
vis_final = cv2.addWeighted(image, 0.6, vis_overlay, 0.4, 0)
cv2.imwrite(str(Path(OUTPUT_DIR) / "method2_visualization.png"), vis_final)
print("    📁 method2_visualization.png (green=handwriting, red=printed)")


print("\n" + "="*80)
print("COMPARISON")
print("="*80)

# Count non-white pixels in each result
def count_content_pixels(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if len(img.shape) == 3 else img
    return np.count_nonzero(gray > 10)

original_pixels = count_content_pixels(image)
method1_pixels = count_content_pixels(result_method1)
method2_pixels = count_content_pixels(result_handwriting)

print("\nContent pixels retained:")
print(f"  Original image: {original_pixels:6d} pixels")
print(f"  Method 1 (stroke): {method1_pixels:6d} pixels ({method1_pixels/original_pixels*100:.1f}%)")
print(f"  Method 2 (component): {method2_pixels:6d} pixels ({method2_pixels/original_pixels*100:.1f}%)")

print("\n" + "="*80)
print("Test completed!")
print(f"Results saved to: {OUTPUT_DIR}")
print("="*80)

print("\nNext steps:")
print("  1. Review the output images")
print("  2. Check which method better preserves handwriting")
print("  3. Adjust thresholds if needed")
print("  4. Choose the best method for production pipeline")
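Both methods above lean on the compactness feature, 4πA/P², which is 1.0 for a perfect circle and drops toward 0 for elongated or ragged shapes. A minimal sketch verifying that behavior with plain math (no OpenCV), using hypothetical shape measurements:

```python
import math

def compactness(area, perimeter):
    # 4*pi*A / P^2: 1.0 for a perfect circle, near 0 for long thin strokes
    return (4 * math.pi * area) / (perimeter ** 2) if perimeter > 0 else 0.0

# Circle of radius 10: area = pi*r^2, perimeter = 2*pi*r
c_circle = compactness(math.pi * 10**2, 2 * math.pi * 10)

# Thin 100x2 rectangle (roughly a long handwritten stroke)
c_stroke = compactness(100 * 2, 2 * (100 + 2))

print(round(c_circle, 3))  # ≈ 1.0
print(round(c_stroke, 3))  # ≈ 0.06
```

This is why the scripts treat low compactness as evidence of handwriting: connected cursive strokes have a long perimeter relative to their filled area, while printed glyph blobs sit much closer to compact shapes.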
102  test_paddleocr.py  Normal file
@@ -0,0 +1,102 @@
#!/usr/bin/env python3
"""Test PaddleOCR on a sample PDF page."""

import fitz  # PyMuPDF
from paddleocr import PaddleOCR
import numpy as np
from PIL import Image
import cv2
from pathlib import Path

# Configuration
TEST_PDF = "/Volumes/NV2/PDF-Processing/signature-image-output/201301_1324_AI1_page3.pdf"
DPI = 300

print("="*80)
print("Testing PaddleOCR on macOS Apple Silicon")
print("="*80)

# Step 1: Render PDF to image
print("\n1. Rendering PDF to image...")
try:
    doc = fitz.open(TEST_PDF)
    page = doc[0]
    mat = fitz.Matrix(DPI/72, DPI/72)
    pix = page.get_pixmap(matrix=mat)
    image = np.frombuffer(pix.samples, dtype=np.uint8).reshape(pix.height, pix.width, pix.n)

    if pix.n == 4:  # RGBA
        image = cv2.cvtColor(image, cv2.COLOR_RGBA2RGB)

    print(f"   ✅ Rendered: {image.shape[1]}x{image.shape[0]} pixels")
    doc.close()
except Exception as e:
    print(f"   ❌ Error: {e}")
    exit(1)

# Step 2: Initialize PaddleOCR
print("\n2. Initializing PaddleOCR...")
print("   (First run will download models, may take a few minutes...)")
try:
    # Use the correct syntax from official docs
    ocr = PaddleOCR(
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
        use_textline_orientation=False,
        lang='ch'  # Chinese language
    )
    print("   ✅ PaddleOCR initialized successfully")
except Exception as e:
    print(f"   ❌ Error: {e}")
    import traceback
    traceback.print_exc()
    print("\n   Note: PaddleOCR requires PaddlePaddle backend.")
    print("   If this is a module import error, PaddlePaddle may not support this platform.")
    exit(1)

# Step 3: Run OCR
print("\n3. Running OCR to detect printed text...")
try:
    result = ocr.ocr(image, cls=False)

    if result and result[0]:
        print(f"   ✅ Detected {len(result[0])} text regions")

        # Show first few detections
        print("\n   Sample detections:")
        for i, item in enumerate(result[0][:5]):
            box = item[0]            # Bounding box coordinates
            text = item[1][0]        # Detected text
            confidence = item[1][1]  # Confidence score
            print(f"   {i+1}. Text: '{text}' (confidence: {confidence:.2f})")
            print(f"      Box: {box}")
    else:
        print("   ⚠️ No text detected")

except Exception as e:
    print(f"   ❌ Error during OCR: {e}")
    import traceback
    traceback.print_exc()
    exit(1)

# Step 4: Visualize detection
print("\n4. Creating visualization...")
try:
    vis_image = image.copy()

    if result and result[0]:
        for item in result[0]:
            box = np.array(item[0], dtype=np.int32)
            cv2.polylines(vis_image, [box], True, (0, 255, 0), 2)

    # Save visualization
    output_path = "/Volumes/NV2/PDF-Processing/signature-image-output/paddleocr_test_detection.png"
    cv2.imwrite(output_path, cv2.cvtColor(vis_image, cv2.COLOR_RGB2BGR))
    print(f"   ✅ Saved visualization: {output_path}")

except Exception as e:
    print(f"   ❌ Error during visualization: {e}")

print("\n" + "="*80)
print("PaddleOCR test completed!")
print("="*80)
81
test_paddleocr_client.py
Normal file
@@ -0,0 +1,81 @@
#!/usr/bin/env python3
"""Test PaddleOCR client with a real PDF page."""

import fitz  # PyMuPDF
import numpy as np
import cv2
from paddleocr_client import create_ocr_client

# Test PDF
TEST_PDF = "/Volumes/NV2/PDF-Processing/signature-image-output/201301_1324_AI1_page3.pdf"
DPI = 300

print("="*80)
print("Testing PaddleOCR Client with Real PDF")
print("="*80)

# Step 1: Connect to server
print("\n1. Connecting to PaddleOCR server...")
try:
    client = create_ocr_client()
    print(f"   ✅ Connected: {client.server_url}")
except Exception as e:
    print(f"   ❌ Connection failed: {e}")
    exit(1)

# Step 2: Render PDF
print("\n2. Rendering PDF to image...")
try:
    doc = fitz.open(TEST_PDF)
    page = doc[0]
    mat = fitz.Matrix(DPI/72, DPI/72)
    pix = page.get_pixmap(matrix=mat)
    image = np.frombuffer(pix.samples, dtype=np.uint8).reshape(pix.height, pix.width, pix.n)

    if pix.n == 4:  # RGBA
        image = cv2.cvtColor(image, cv2.COLOR_RGBA2RGB)

    print(f"   ✅ Rendered: {image.shape[1]}x{image.shape[0]} pixels")
    doc.close()
except Exception as e:
    print(f"   ❌ Error: {e}")
    exit(1)

# Step 3: Run OCR
print("\n3. Running OCR on image...")
try:
    results = client.ocr(image)
    print(f"   ✅ OCR successful!")
    print(f"   Found {len(results)} text regions")

    # Show the first few results
    if results:
        print("\n   Sample detections:")
        for i, result in enumerate(results[:5]):
            text = result['text']
            confidence = result['confidence']
            print(f"   {i+1}. '{text}' (confidence: {confidence:.2f})")

except Exception as e:
    print(f"   ❌ OCR failed: {e}")
    import traceback
    traceback.print_exc()
    exit(1)

# Step 4: Get bounding boxes
print("\n4. Getting text bounding boxes...")
try:
    boxes = client.get_text_boxes(image)
    print(f"   ✅ Got {len(boxes)} bounding boxes")

    if boxes:
        print("   Sample boxes (x, y, w, h):")
        for i, box in enumerate(boxes[:3]):
            print(f"   {i+1}. {box}")

except Exception as e:
    print(f"   ❌ Error: {e}")

print("\n" + "="*80)
print("Test completed successfully!")
print("="*80)
254
test_pp_ocrv5_api.py
Normal file
@@ -0,0 +1,254 @@
#!/usr/bin/env python3
"""
Test the basic functionality of the PP-OCRv5 API.

Goals:
1. Verify the correct API call pattern
2. Inspect the full returned data structure
3. Compare v4 and v5 detection results
4. Confirm whether a handwriting classification feature exists
"""

import sys
import json
import pprint
from pathlib import Path

# Test image path
TEST_IMAGE = "/Volumes/NV2/pdf_recognize/test_images/page_0.png"


def test_basic_import():
    """Test the basic import."""
    print("=" * 60)
    print("Test 1: basic import")
    print("=" * 60)

    try:
        from paddleocr import PaddleOCR
        print("✅ Imported PaddleOCR successfully")
        return True
    except ImportError as e:
        print(f"❌ Import failed: {e}")
        return False


def test_model_initialization():
    """Test model initialization."""
    print("\n" + "=" * 60)
    print("Test 2: model initialization")
    print("=" * 60)

    try:
        from paddleocr import PaddleOCR

        print("\nInitializing PP-OCRv5...")
        ocr = PaddleOCR(
            text_detection_model_name="PP-OCRv5_server_det",
            text_recognition_model_name="PP-OCRv5_server_rec",
            use_doc_orientation_classify=False,
            use_doc_unwarping=False,
            use_textline_orientation=False,
            show_log=True
        )

        print("✅ Model initialized successfully")
        return ocr

    except Exception as e:
        print(f"❌ Initialization failed: {e}")
        import traceback
        traceback.print_exc()
        return None


def test_prediction(ocr):
    """Test prediction."""
    print("\n" + "=" * 60)
    print("Test 3: prediction")
    print("=" * 60)

    if not Path(TEST_IMAGE).exists():
        print(f"❌ Test image does not exist: {TEST_IMAGE}")
        return None

    try:
        print(f"\nPredicting image: {TEST_IMAGE}")
        result = ocr.predict(TEST_IMAGE)

        print(f"✅ Prediction succeeded, returned {len(result)} results")
        return result

    except Exception as e:
        print(f"❌ Prediction failed: {e}")
        import traceback
        traceback.print_exc()
        return None


def analyze_result_structure(result):
    """Analyze the full structure of the returned result."""
    print("\n" + "=" * 60)
    print("Test 4: analyze result structure")
    print("=" * 60)

    if not result:
        print("❌ No result to analyze")
        return

    # Take the first result
    first_result = result[0]

    print("\nResult type:", type(first_result))
    print("Result attributes:", dir(first_result))

    # Check for a .json attribute
    if hasattr(first_result, 'json'):
        print("\n✅ Found .json attribute")
        json_data = first_result.json

        print("\nJSON data keys:")
        for key in json_data.keys():
            print(f"  - {key}: {type(json_data[key])}")

        # Look for handwriting-classification-related fields
        print("\nSearching for handwriting classification fields...")
        handwriting_related_keys = [
            k for k in json_data.keys()
            if any(word in k.lower() for word in ['handwriting', 'handwritten', 'type', 'class', 'category'])
        ]

        if handwriting_related_keys:
            print(f"✅ Found possibly related fields: {handwriting_related_keys}")
            for key in handwriting_related_keys:
                print(f"  {key}: {json_data[key]}")
        else:
            print("❌ No handwriting classification fields found")

        # Print a sample of the detections
        if 'rec_texts' in json_data and json_data['rec_texts']:
            print("\nDetected text (first 5):")
            for i, text in enumerate(json_data['rec_texts'][:5]):
                box = json_data['rec_boxes'][i] if 'rec_boxes' in json_data else None
                score = json_data['rec_scores'][i] if 'rec_scores' in json_data else None
                print(f"  [{i}] Text: {text}")
                print(f"      Score: {score}")
                print(f"      Box: {box}")

        # Save the full JSON to a file
        output_path = "/Volumes/NV2/pdf_recognize/test_results/pp_ocrv5_result.json"
        Path(output_path).parent.mkdir(exist_ok=True)

        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(json_data, f, ensure_ascii=False, indent=2, default=str)

        print(f"\n✅ Full result saved to: {output_path}")

        return json_data

    else:
        print("❌ No .json attribute found")
        print("\nPrinting result directly:")
        pprint.pprint(first_result)


def compare_with_v4():
    """Compare v4 and v5 results."""
    print("\n" + "=" * 60)
    print("Test 5: compare v4 and v5")
    print("=" * 60)

    try:
        from paddleocr import PaddleOCR

        # v4
        print("\nInitializing PP-OCRv4...")
        ocr_v4 = PaddleOCR(
            ocr_version="PP-OCRv4",
            use_doc_orientation_classify=False,
            show_log=False
        )

        print("Predicting with v4...")
        result_v4 = ocr_v4.predict(TEST_IMAGE)
        json_v4 = result_v4[0].json if hasattr(result_v4[0], 'json') else None

        # v5
        print("\nInitializing PP-OCRv5...")
        ocr_v5 = PaddleOCR(
            text_detection_model_name="PP-OCRv5_server_det",
            text_recognition_model_name="PP-OCRv5_server_rec",
            use_doc_orientation_classify=False,
            show_log=False
        )

        print("Predicting with v5...")
        result_v5 = ocr_v5.predict(TEST_IMAGE)
        json_v5 = result_v5[0].json if hasattr(result_v5[0], 'json') else None

        # Compare
        if json_v4 and json_v5:
            print("\nComparison:")
            print(f"  v4 detected {len(json_v4.get('rec_texts', []))} text regions")
            print(f"  v5 detected {len(json_v5.get('rec_texts', []))} text regions")

            # Save the comparison
            comparison = {
                "v4": {
                    "count": len(json_v4.get('rec_texts', [])),
                    "texts": json_v4.get('rec_texts', [])[:10],  # first 10
                    "scores": json_v4.get('rec_scores', [])[:10]
                },
                "v5": {
                    "count": len(json_v5.get('rec_texts', [])),
                    "texts": json_v5.get('rec_texts', [])[:10],
                    "scores": json_v5.get('rec_scores', [])[:10]
                }
            }

            output_path = "/Volumes/NV2/pdf_recognize/test_results/v4_vs_v5_comparison.json"
            with open(output_path, 'w', encoding='utf-8') as f:
                json.dump(comparison, f, ensure_ascii=False, indent=2, default=str)

            print(f"\n✅ Comparison saved to: {output_path}")

    except Exception as e:
        print(f"❌ Comparison failed: {e}")
        import traceback
        traceback.print_exc()


def main():
    """Main test flow."""
    print("Starting PP-OCRv5 API tests\n")

    # Test 1: import
    if not test_basic_import():
        print("\n❌ Import failed, cannot continue")
        return

    # Test 2: initialization
    ocr = test_model_initialization()
    if not ocr:
        print("\n❌ Initialization failed, cannot continue")
        return

    # Test 3: prediction
    result = test_prediction(ocr)
    if not result:
        print("\n❌ Prediction failed, cannot continue")
        return

    # Test 4: analyze the result structure
    json_data = analyze_result_structure(result)

    # Test 5: compare v4 and v5
    compare_with_v4()

    print("\n" + "=" * 60)
    print("Tests completed")
    print("=" * 60)


if __name__ == "__main__":
    main()
58
test_results/v5_analysis_report.txt
Normal file
@@ -0,0 +1,58 @@
PP-OCRv5 detailed detection report
================================================================================

Total: 50
Mean confidence: 0.4579

Full detection list:
--------------------------------------------------------------------------------
[ 0] 0.8783  202x100 KPMG
[ 1] 0.9936 1931x 62 依本會計師核閱結果,除第三段及第四段所述該等被投資公司財務季報告倘經會計師核閱
[ 2] 0.9976 2013x 62 ,對第一段所述合併財務季報告可能有所調整之影響外,並未發現第一段所述合併財務季報告
[ 3] 0.9815 2025x 62 在所有重大方面有違反證券發行人財務報告編製準則及金融監督管理委員會認可之國際會計準
[ 4] 0.9912 1125x 56 則第三十四號「期中財務報導」而須作修正之情事。
[ 5] 0.9712  872x 61 安侯建業聯合會計師事務所
[ 6] 0.9123  174x203 寶
[ 7] 0.8466  166x179 蓮
[ 8] 0.0000   36x 18
[ 9] 0.9968  175x193 周
[10] 0.0000   33x 69
[11] 0.2521    7x 12 5
[12] 0.0000   35x 13
[13] 0.0000   28x 10
[14] 0.4726   12x  9 vA
[15] 0.1788    9x 11 上
[16] 0.0000   38x 14
[17] 0.4133   21x  8 R-
[18] 0.4681   15x  8 40
[19] 0.0000   38x 13
[20] 0.5587   16x  7 GAN
[21] 0.9623  291x 61 會計師:
[22] 0.9893  213x234 魏
[23] 0.1751  190x174 興
[24] 0.8862  180x191 海
[25] 0.0000   65x 17
[26] 0.5110   27x  7 U
[27] 0.1669   10x  8 2
[28] 0.4839   39x 10 eredooos
[29] 0.1775   10x 24 B
[30] 0.4896   29x 10 n
[31] 0.3774    7x  7 1
[32] 0.0000   34x 14
[33] 0.0000    7x 15
[34] 0.0000   12x 38
[35] 0.8701   22x 11 0
[36] 0.2034    8x 23 40
[37] 0.0000   20x 12
[38] 0.0000   29x 10
[39] 0.0970    9x 10 m
[40] 0.3102   20x  7 A
[41] 0.0000   34x  6
[42] 0.2435   21x  6 专
[43] 0.3260   41x 15 o
[44] 0.0000   31x  7
[45] 0.9769  960x 73 證券主管機關.金管證六字第0940100754號
[46] 0.9747  899x 60 核准簽證文號(88)台財證(六)第18311號
[47] 0.9205  824x 67 民國一〇二年五月二
[48] 0.9996   47x 46 日
[49] 0.8414  173x 62 ~3-1~
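The report above mixes high-confidence printed lines with many low-confidence fragments (scores 0.0 to 0.5) that largely correspond to handwriting and stamp fragments. A minimal sketch of splitting such a detection list by confidence; the record layout and the 0.9 threshold are illustrative assumptions, not the report's actual schema:

```python
# Sketch: split OCR detections into printed-like vs. uncertain (handwriting/noise)
# by recognition confidence. Field names and threshold are assumptions.
detections = [
    {"text": "KPMG", "score": 0.8783},
    {"text": "會計師:", "score": 0.9623},
    {"text": "魏", "score": 0.9893},
    {"text": "eredooos", "score": 0.4839},
    {"text": "", "score": 0.0},
]

CONF_THRESHOLD = 0.9  # illustrative cutoff

printed = [d for d in detections if d["score"] >= CONF_THRESHOLD]
uncertain = [d for d in detections if d["score"] < CONF_THRESHOLD]

print(f"printed-like: {len(printed)}, uncertain/handwriting-like: {len(uncertain)}")
```

Applied to the 50 detections above, such a split would keep the readable audit-report text while routing the fragment rows toward the handwriting candidate pool.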
20
test_results/v5_pipeline/SUMMARY.txt
Normal file
@@ -0,0 +1,20 @@
PP-OCRv5 full pipeline test results
============================================================

1. OCR detection: 50 text regions
2. Masked printed text: /Volumes/NV2/pdf_recognize/test_results/v5_pipeline/01_masked.png
3. Candidate regions detected: 7
4. Signatures extracted: 7

Candidate region details:
------------------------------------------------------------
Region 1: position (1218, 877), size 1144x511, area=584584
Region 2: position (1213, 1457), size 961x196, area=188356
Region 3: position (228, 386), size 2028x209, area=423852
Region 4: position (330, 310), size 1932x63, area=121716
Region 5: position (1990, 945), size 375x212, area=79500
Region 6: position (327, 145), size 203x101, area=20503
Region 7: position (1139, 3289), size 174x63, area=10962

All results saved in: /Volumes/NV2/pdf_recognize/test_results/v5_pipeline
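As a rough illustration of how the seven candidate regions above could be narrowed down before VLM verification, a sketch that keeps only regions above a minimum area; the threshold is an illustrative assumption, not a tuned value from this pipeline:

```python
# Candidate regions from the summary above, as (x, y, w, h)
regions = [
    (1218, 877, 1144, 511),
    (1213, 1457, 961, 196),
    (228, 386, 2028, 209),
    (330, 310, 1932, 63),
    (1990, 945, 375, 212),
    (327, 145, 203, 101),
    (1139, 3289, 174, 63),
]

MIN_AREA = 50_000  # illustrative threshold

kept = [(x, y, w, h) for (x, y, w, h) in regions if w * h >= MIN_AREA]
print(f"{len(kept)} of {len(regions)} regions survive the area filter")
```

With this threshold, the two small regions (areas 20503 and 10962) would be dropped before any further scoring.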
2283
test_results/v5_result.json
Normal file
File diff suppressed because it is too large
Load Diff
290
test_v4_full_pipeline.py
Normal file
@@ -0,0 +1,290 @@
#!/usr/bin/env python3
"""
Run the full signature-extraction pipeline with PaddleOCR v2.7.3 (v4),
for comparison against v5.
"""

import sys
import json
import cv2
import numpy as np
import requests
from pathlib import Path

# Configuration
OCR_SERVER = "http://192.168.30.36:5555"
OUTPUT_DIR = Path("/Volumes/NV2/pdf_recognize/signature-comparison/v4-current")
MASKING_PADDING = 0


def setup_output_dir():
    """Create the output directory."""
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    print(f"Output directory: {OUTPUT_DIR}")


def get_page_image():
    """Load the test page image."""
    test_image = "/Volumes/NV2/pdf_recognize/full_page_original.png"
    if Path(test_image).exists():
        return cv2.imread(test_image)
    else:
        print(f"❌ Test image does not exist: {test_image}")
        return None


def call_ocr_server(image):
    """Call the server-side PaddleOCR v2.7.3."""
    print("\nCalling PaddleOCR v2.7.3 server...")

    try:
        import base64
        _, buffer = cv2.imencode('.png', image)
        img_base64 = base64.b64encode(buffer).decode('utf-8')

        response = requests.post(
            f"{OCR_SERVER}/ocr",
            json={'image': img_base64},
            timeout=30
        )

        if response.status_code == 200:
            result = response.json()
            print(f"✅ OCR done, detected {len(result.get('results', []))} text regions")
            return result.get('results', [])
        else:
            print(f"❌ Server error: {response.status_code}")
            return None

    except Exception as e:
        print(f"❌ OCR call failed: {e}")
        import traceback
        traceback.print_exc()
        return None


def mask_printed_text(image, ocr_results):
    """Mask printed text."""
    print("\nMasking printed text...")

    masked_image = image.copy()

    for i, result in enumerate(ocr_results):
        box = result.get('box')
        if box is None:
            continue

        # v2.7.3 returns a polygon: [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
        # Convert it to an axis-aligned rectangle
        box_points = np.array(box)
        x_min = int(box_points[:, 0].min())
        y_min = int(box_points[:, 1].min())
        x_max = int(box_points[:, 0].max())
        y_max = int(box_points[:, 1].max())

        cv2.rectangle(
            masked_image,
            (x_min - MASKING_PADDING, y_min - MASKING_PADDING),
            (x_max + MASKING_PADDING, y_max + MASKING_PADDING),
            (0, 0, 0),
            -1
        )

    masked_path = OUTPUT_DIR / "01_masked.png"
    cv2.imwrite(str(masked_path), masked_image)
    print(f"✅ Masking done: {masked_path}")

    return masked_image


def detect_regions(masked_image):
    """Detect candidate regions."""
    print("\nDetecting candidate regions...")

    gray = cv2.cvtColor(masked_image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    morphed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=2)

    cv2.imwrite(str(OUTPUT_DIR / "02_binary.png"), binary)
    cv2.imwrite(str(OUTPUT_DIR / "03_morphed.png"), morphed)

    contours, _ = cv2.findContours(morphed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    MIN_AREA = 3000
    MAX_AREA = 300000

    candidate_regions = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if MIN_AREA <= area <= MAX_AREA:
            x, y, w, h = cv2.boundingRect(contour)
            aspect_ratio = w / h if h > 0 else 0

            candidate_regions.append({
                'box': (x, y, w, h),
                'area': area,
                'aspect_ratio': aspect_ratio
            })

    candidate_regions.sort(key=lambda r: r['area'], reverse=True)

    print(f"✅ Found {len(candidate_regions)} candidate regions")

    return candidate_regions


def merge_nearby_regions(regions, h_distance=100, v_distance=50):
    """Merge nearby regions."""
    print("\nMerging nearby regions...")

    if not regions:
        return []

    merged = []
    used = set()

    for i, r1 in enumerate(regions):
        if i in used:
            continue

        x1, y1, w1, h1 = r1['box']
        merged_box = [x1, y1, x1 + w1, y1 + h1]
        group = [i]

        for j, r2 in enumerate(regions):
            if j <= i or j in used:
                continue

            x2, y2, w2, h2 = r2['box']

            h_dist = min(abs(x1 - (x2 + w2)), abs((x1 + w1) - x2))
            v_dist = min(abs(y1 - (y2 + h2)), abs((y1 + h1) - y2))

            x_overlap = not (x1 + w1 < x2 or x2 + w2 < x1)
            y_overlap = not (y1 + h1 < y2 or y2 + h2 < y1)

            if (x_overlap and v_dist <= v_distance) or (y_overlap and h_dist <= h_distance):
                merged_box[0] = min(merged_box[0], x2)
                merged_box[1] = min(merged_box[1], y2)
                merged_box[2] = max(merged_box[2], x2 + w2)
                merged_box[3] = max(merged_box[3], y2 + h2)
                group.append(j)
                used.add(j)

        used.add(i)

        x, y = merged_box[0], merged_box[1]
        w, h = merged_box[2] - merged_box[0], merged_box[3] - merged_box[1]

        merged.append({
            'box': (x, y, w, h),
            'area': w * h,
            'merged_count': len(group)
        })

    print(f"✅ {len(merged)} regions remain after merging")

    return merged


def extract_signatures(image, regions):
    """Extract signature regions."""
    print("\nExtracting signature regions...")

    vis_image = image.copy()

    for i, region in enumerate(regions):
        x, y, w, h = region['box']

        cv2.rectangle(vis_image, (x, y), (x + w, y + h), (0, 255, 0), 3)
        cv2.putText(vis_image, f"Region {i+1}", (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        signature = image[y:y+h, x:x+w]
        sig_path = OUTPUT_DIR / f"signature_{i+1}.png"
        cv2.imwrite(str(sig_path), signature)
        print(f"  Region {i+1}: {w}x{h} pixels, area={region['area']}")

    vis_path = OUTPUT_DIR / "04_detected_regions.png"
    cv2.imwrite(str(vis_path), vis_image)
    print(f"\n✅ Annotated image saved: {vis_path}")

    return vis_image


def generate_summary(ocr_count, regions):
    """Generate the summary report."""
    summary = f"""
PaddleOCR v2.7.3 (v4) full pipeline test results
{'=' * 60}

1. OCR detection: {ocr_count} text regions
2. Masked printed text: done
3. Candidate regions detected: {len(regions)}
4. Signatures extracted: {len(regions)}

Candidate region details:
{'-' * 60}
"""

    for i, region in enumerate(regions):
        x, y, w, h = region['box']
        area = region['area']
        summary += f"Region {i+1}: position ({x}, {y}), size {w}x{h}, area={area}\n"

    summary += f"\nAll results saved in: {OUTPUT_DIR}\n"

    return summary


def main():
    print("=" * 60)
    print("PaddleOCR v2.7.3 (v4) full pipeline test")
    print("=" * 60)

    setup_output_dir()

    print("\n1. Loading test image...")
    image = get_page_image()
    if image is None:
        return
    print(f"   Image shape: {image.shape}")

    cv2.imwrite(str(OUTPUT_DIR / "00_original.png"), image)

    print("\n2. Detecting text with PaddleOCR v2.7.3...")
    ocr_results = call_ocr_server(image)
    if ocr_results is None:
        print("❌ OCR failed, aborting test")
        return

    print("\n3. Masking printed text...")
    masked_image = mask_printed_text(image, ocr_results)

    print("\n4. Detecting candidate regions...")
    regions = detect_regions(masked_image)

    print("\n5. Merging nearby regions...")
    merged_regions = merge_nearby_regions(regions)

    print("\n6. Extracting signatures...")
    vis_image = extract_signatures(image, merged_regions)

    print("\n7. Generating summary report...")
    summary = generate_summary(len(ocr_results), merged_regions)
    print(summary)

    summary_path = OUTPUT_DIR / "SUMMARY.txt"
    with open(summary_path, 'w', encoding='utf-8') as f:
        f.write(summary)

    print("=" * 60)
    print("✅ v4 test complete!")
    print(f"Results directory: {OUTPUT_DIR}")
    print("=" * 60)


if __name__ == "__main__":
    main()
322
test_v5_full_pipeline.py
Normal file
@@ -0,0 +1,322 @@
#!/usr/bin/env python3
"""
Run the full signature-extraction pipeline with PP-OCRv5.

Flow:
1. Detect text with the server-side PP-OCRv5
2. Mask printed text
3. Detect candidate regions
4. Extract signatures
"""

import sys
import json
import cv2
import numpy as np
import requests
from pathlib import Path

# Configuration
OCR_SERVER = "http://192.168.30.36:5555"
PDF_PATH = "/Volumes/NV2/pdf_recognize/test.pdf"
OUTPUT_DIR = Path("/Volumes/NV2/pdf_recognize/test_results/v5_pipeline")
MASKING_PADDING = 0


def setup_output_dir():
    """Create the output directory."""
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    print(f"Output directory: {OUTPUT_DIR}")


def get_page_image():
    """Load the test page image."""
    # Reuse the existing test image
    test_image = "/Volumes/NV2/pdf_recognize/full_page_original.png"
    if Path(test_image).exists():
        return cv2.imread(test_image)
    else:
        print(f"❌ Test image does not exist: {test_image}")
        return None


def call_ocr_server(image):
    """Call the server-side PP-OCRv5."""
    print("\nCalling PP-OCRv5 server...")

    try:
        # Encode the image
        import base64
        _, buffer = cv2.imencode('.png', image)
        img_base64 = base64.b64encode(buffer).decode('utf-8')

        # Send the request
        response = requests.post(
            f"{OCR_SERVER}/ocr",
            json={'image': img_base64},
            timeout=30
        )

        if response.status_code == 200:
            result = response.json()
            print(f"✅ OCR done, detected {len(result.get('results', []))} text regions")
            return result.get('results', [])
        else:
            print(f"❌ Server error: {response.status_code}")
            return None

    except Exception as e:
        print(f"❌ OCR call failed: {e}")
        import traceback
        traceback.print_exc()
        return None


def mask_printed_text(image, ocr_results):
    """Mask printed text."""
    print("\nMasking printed text...")

    masked_image = image.copy()

    for i, result in enumerate(ocr_results):
        box = result.get('box')
        if box is None:
            continue

        # Box format: [x, y, w, h]
        x, y, w, h = box

        # Mask with a filled black rectangle
        cv2.rectangle(
            masked_image,
            (x - MASKING_PADDING, y - MASKING_PADDING),
            (x + w + MASKING_PADDING, y + h + MASKING_PADDING),
            (0, 0, 0),
            -1
        )

    # Save the masked image
    masked_path = OUTPUT_DIR / "01_masked.png"
    cv2.imwrite(str(masked_path), masked_image)
    print(f"✅ Masking done: {masked_path}")

    return masked_image


def detect_regions(masked_image):
    """Detect candidate regions."""
    print("\nDetecting candidate regions...")

    # Grayscale
    gray = cv2.cvtColor(masked_image, cv2.COLOR_BGR2GRAY)

    # Binarize
    _, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)

    # Morphological closing
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    morphed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=2)

    # Save intermediate results
    cv2.imwrite(str(OUTPUT_DIR / "02_binary.png"), binary)
    cv2.imwrite(str(OUTPUT_DIR / "03_morphed.png"), morphed)

    # Find contours
    contours, _ = cv2.findContours(morphed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Filter candidate regions
    MIN_AREA = 3000
    MAX_AREA = 300000

    candidate_regions = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if MIN_AREA <= area <= MAX_AREA:
            x, y, w, h = cv2.boundingRect(contour)
            aspect_ratio = w / h if h > 0 else 0

            candidate_regions.append({
                'box': (x, y, w, h),
                'area': area,
                'aspect_ratio': aspect_ratio
            })

    # Sort by area
    candidate_regions.sort(key=lambda r: r['area'], reverse=True)

    print(f"✅ Found {len(candidate_regions)} candidate regions")

    return candidate_regions


def merge_nearby_regions(regions, h_distance=100, v_distance=50):
    """Merge nearby regions."""
    print("\nMerging nearby regions...")

    if not regions:
        return []

    merged = []
    used = set()

    for i, r1 in enumerate(regions):
        if i in used:
            continue

        x1, y1, w1, h1 = r1['box']
        merged_box = [x1, y1, x1 + w1, y1 + h1]  # [x_min, y_min, x_max, y_max]
        group = [i]

        for j, r2 in enumerate(regions):
            if j <= i or j in used:
                continue

            x2, y2, w2, h2 = r2['box']

            # Compute distances
            h_dist = min(abs(x1 - (x2 + w2)), abs((x1 + w1) - x2))
            v_dist = min(abs(y1 - (y2 + h2)), abs((y1 + h1) - y2))

            # Check for overlap or proximity
            x_overlap = not (x1 + w1 < x2 or x2 + w2 < x1)
            y_overlap = not (y1 + h1 < y2 or y2 + h2 < y1)

            if (x_overlap and v_dist <= v_distance) or (y_overlap and h_dist <= h_distance):
                # Merge
                merged_box[0] = min(merged_box[0], x2)
                merged_box[1] = min(merged_box[1], y2)
                merged_box[2] = max(merged_box[2], x2 + w2)
                merged_box[3] = max(merged_box[3], y2 + h2)
                group.append(j)
                used.add(j)

        used.add(i)

        # Convert back to (x, y, w, h)
        x, y = merged_box[0], merged_box[1]
        w, h = merged_box[2] - merged_box[0], merged_box[3] - merged_box[1]

        merged.append({
            'box': (x, y, w, h),
            'area': w * h,
            'merged_count': len(group)
        })

    print(f"✅ {len(merged)} regions remain after merging")

    return merged


def extract_signatures(image, regions):
    """Extract signature regions."""
    print("\nExtracting signature regions...")

    # Annotate all regions on the image
    vis_image = image.copy()

    for i, region in enumerate(regions):
        x, y, w, h = region['box']

        # Draw the box
        cv2.rectangle(vis_image, (x, y), (x + w, y + h), (0, 255, 0), 3)
        cv2.putText(vis_image, f"Region {i+1}", (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        # Crop and save
        signature = image[y:y+h, x:x+w]
        sig_path = OUTPUT_DIR / f"signature_{i+1}.png"
        cv2.imwrite(str(sig_path), signature)
        print(f"  Region {i+1}: {w}x{h} pixels, area={region['area']}")

    # Save the annotated image
    vis_path = OUTPUT_DIR / "04_detected_regions.png"
    cv2.imwrite(str(vis_path), vis_image)
    print(f"\n✅ Annotated image saved: {vis_path}")

    return vis_image


def generate_summary(ocr_count, masked_path, regions):
    """Generate the summary report."""
    summary = f"""
PP-OCRv5 full pipeline test results
{'=' * 60}

1. OCR detection: {ocr_count} text regions
2. Masked printed text: {masked_path}
3. Candidate regions detected: {len(regions)}
4. Signatures extracted: {len(regions)}

Candidate region details:
{'-' * 60}
"""

    for i, region in enumerate(regions):
        x, y, w, h = region['box']
        area = region['area']
        summary += f"Region {i+1}: position ({x}, {y}), size {w}x{h}, area={area}\n"

    summary += f"\nAll results saved in: {OUTPUT_DIR}\n"

    return summary


def main():
    print("=" * 60)
    print("PP-OCRv5 full pipeline test")
    print("=" * 60)

    # Setup
    setup_output_dir()

    # 1. Load the image
    print("\n1. Loading test image...")
    image = get_page_image()
    if image is None:
        return
    print(f"   Image shape: {image.shape}")

    # Save the original
    cv2.imwrite(str(OUTPUT_DIR / "00_original.png"), image)

    # 2. OCR detection
    print("\n2. Detecting text with PP-OCRv5...")
    ocr_results = call_ocr_server(image)
    if ocr_results is None:
        print("❌ OCR failed, aborting test")
        return

    # 3. Mask printed text
    print("\n3. Masking printed text...")
    masked_image = mask_printed_text(image, ocr_results)

    # 4. Detect candidate regions
    print("\n4. Detecting candidate regions...")
    regions = detect_regions(masked_image)

    # 5. Merge nearby regions
    print("\n5. Merging nearby regions...")
    merged_regions = merge_nearby_regions(regions)

    # 6. Extract signatures
    print("\n6. Extracting signatures...")
    vis_image = extract_signatures(image, merged_regions)

    # 7. Generate the summary
    print("\n7. Generating summary report...")
    summary = generate_summary(len(ocr_results), OUTPUT_DIR / "01_masked.png", merged_regions)
    print(summary)

    # Save the summary
    summary_path = OUTPUT_DIR / "SUMMARY.txt"
    with open(summary_path, 'w', encoding='utf-8') as f:
        f.write(summary)

    print("=" * 60)
    print("✅ Test complete!")
    print(f"Results directory: {OUTPUT_DIR}")
    print("=" * 60)


if __name__ == "__main__":
    main()
181
visualize_v5_results.py
Normal file
@@ -0,0 +1,181 @@
```python
#!/usr/bin/env python3
"""
Visualize the PP-OCRv5 detection results
"""

import json
import cv2
import numpy as np
from pathlib import Path


def load_results():
    """Load the v5 detection results"""
    result_file = "/Volumes/NV2/pdf_recognize/test_results/v5_result.json"
    with open(result_file, 'r', encoding='utf-8') as f:
        data = json.load(f)
    return data['res']


def draw_detections(image_path, results, output_path):
    """Draw detection boxes and indices on the image"""
    # Load the image
    img = cv2.imread(image_path)
    if img is None:
        print(f"❌ Cannot read image: {image_path}")
        return None

    # Work on a copy for drawing
    vis_img = img.copy()

    # Fetch detection results
    rec_texts = results.get('rec_texts', [])
    rec_boxes = results.get('rec_boxes', [])
    rec_scores = results.get('rec_scores', [])

    print(f"\nDetected {len(rec_texts)} text regions")

    # Draw each detection box
    for i, (text, box, score) in enumerate(zip(rec_texts, rec_boxes, rec_scores)):
        x_min, y_min, x_max, y_max = box

        # Draw the rectangle (green)
        cv2.rectangle(vis_img, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)

        # Draw the index number (small text)
        cv2.putText(vis_img, f"{i}", (x_min, y_min - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

    # Save the result
    cv2.imwrite(output_path, vis_img)
    print(f"✅ Visualization saved: {output_path}")

    return vis_img


def generate_text_report(results):
    """Generate a text report"""
    rec_texts = results.get('rec_texts', [])
    rec_scores = results.get('rec_scores', [])
    rec_boxes = results.get('rec_boxes', [])

    print("\n" + "=" * 80)
    print("PP-OCRv5 Detection Report")
    print("=" * 80)

    print(f"\nTotal detected: {len(rec_texts)} text regions")
    print(f"Mean confidence: {np.mean(rec_scores):.4f}")
    print(f"Max confidence:  {np.max(rec_scores):.4f}")
    print(f"Min confidence:  {np.min(rec_scores):.4f}")

    # Bucketed statistics
    high_conf = sum(1 for s in rec_scores if s >= 0.95)
    medium_conf = sum(1 for s in rec_scores if 0.8 <= s < 0.95)
    low_conf = sum(1 for s in rec_scores if s < 0.8)

    print(f"\nConfidence distribution:")
    print(f"  High (≥0.95):     {high_conf} ({high_conf/len(rec_scores)*100:.1f}%)")
    print(f"  Medium (0.8-0.95): {medium_conf} ({medium_conf/len(rec_scores)*100:.1f}%)")
    print(f"  Low (<0.8):       {low_conf} ({low_conf/len(rec_scores)*100:.1f}%)")

    # Show the first 20 detections
    print("\nFirst 20 detections:")
    print("-" * 80)
    for i in range(min(20, len(rec_texts))):
        text = rec_texts[i]
        score = rec_scores[i]
        box = rec_boxes[i]

        # Compute the box size
        width = box[2] - box[0]
        height = box[3] - box[1]

        print(f"[{i:2d}] conf: {score:.4f} size: {width:4d}x{height:3d} text: {text}")

    if len(rec_texts) > 20:
        print(f"\n... {len(rec_texts) - 20} more results (omitted)")

    # Look for likely handwriting regions (low confidence or large glyphs)
    print("\n" + "=" * 80)
    print("Potential handwriting analysis")
    print("=" * 80)

    potential_handwriting = []
    for i, (text, score, box) in enumerate(zip(rec_texts, rec_scores, rec_boxes)):
        width = box[2] - box[0]
        height = box[3] - box[1]

        # Criteria:
        # 1. relatively tall (>50px)
        # 2. or low confidence (<0.9)
        # 3. or short text rendered in large glyphs
        is_large = height > 50
        is_low_conf = score < 0.9
        is_short_text = len(text) <= 3 and height > 40

        if is_large or is_low_conf or is_short_text:
            potential_handwriting.append({
                'index': i,
                'text': text,
                'score': score,
                'height': height,
                'width': width,
                'reason': []
            })

            if is_large:
                potential_handwriting[-1]['reason'].append('large glyphs')
            if is_low_conf:
                potential_handwriting[-1]['reason'].append('low confidence')
            if is_short_text:
                potential_handwriting[-1]['reason'].append('short text, large glyphs')

    if potential_handwriting:
        print(f"\nFound {len(potential_handwriting)} potential handwriting regions:")
        print("-" * 80)
        for item in potential_handwriting[:15]:  # show only the first 15
            reasons = ', '.join(item['reason'])
            print(f"[{item['index']:2d}] {item['height']:3d}px {item['score']:.4f} ({reasons}) {item['text']}")
    else:
        print("No regions with obvious handwriting features found")

    # Save the detailed report to a file
    report_path = "/Volumes/NV2/pdf_recognize/test_results/v5_analysis_report.txt"
    with open(report_path, 'w', encoding='utf-8') as f:
        f.write("PP-OCRv5 Detailed Detection Report\n")
        f.write("=" * 80 + "\n\n")
        f.write(f"Total: {len(rec_texts)}\n")
        f.write(f"Mean confidence: {np.mean(rec_scores):.4f}\n\n")
        f.write("Full detection list:\n")
        f.write("-" * 80 + "\n")
        for i, (text, score, box) in enumerate(zip(rec_texts, rec_scores, rec_boxes)):
            width = box[2] - box[0]
            height = box[3] - box[1]
            f.write(f"[{i:2d}] {score:.4f} {width:4d}x{height:3d} {text}\n")

    print(f"\nDetailed report saved: {report_path}")


def main():
    # Load the results
    print("Loading PP-OCRv5 detection results...")
    results = load_results()

    # Generate the text report
    generate_text_report(results)

    # Visualize
    print("\n" + "=" * 80)
    print("Generating visualization image")
    print("=" * 80)

    image_path = "/Volumes/NV2/pdf_recognize/full_page_original.png"
    output_path = "/Volumes/NV2/pdf_recognize/test_results/v5_visualization.png"

    if Path(image_path).exists():
        draw_detections(image_path, results, output_path)
    else:
        print(f"⚠️ Original image not found: {image_path}")

    print("\n" + "=" * 80)
    print("Analysis complete")
    print("=" * 80)


if __name__ == "__main__":
    main()
```
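The visualizer assumes `v5_result.json` wraps the PP-OCRv5 output under a `res` key holding parallel `rec_texts` / `rec_boxes` / `rec_scores` lists. A minimal sketch of that shape and of the handwriting heuristic applied to it follows; the field names mirror the script above, while the sample texts, boxes, and scores are fabricated for illustration:

```python
# Fabricated sample in the shape generate_text_report() expects;
# the values are illustrative, not real PP-OCRv5 output.
sample = {
    "res": {
        "rec_texts": ["統一編號", "楊智惠", "地址"],
        "rec_boxes": [[10, 10, 210, 40], [300, 500, 480, 580], [10, 60, 110, 88]],
        "rec_scores": [0.99, 0.62, 0.97],
    }
}

def flag_handwriting(res):
    """Apply the same height/confidence heuristic as the report script."""
    flagged = []
    for text, box, score in zip(res["rec_texts"], res["rec_boxes"], res["rec_scores"]):
        height = box[3] - box[1]
        if height > 50 or score < 0.9 or (len(text) <= 3 and height > 40):
            flagged.append(text)
    return flagged

# The tall, low-confidence 楊智惠 box is flagged; the printed-text boxes are not
print(flag_handwriting(sample["res"]))  # → ['楊智惠']
```

This matches the expected result for the test page: printed form labels score high and stay short, while handwritten signatures tend to be taller and drop OCR confidence.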