pdf_signature_extraction/paper/paper_a_outline.md

# Paper A: IEEE TAI Outline (Draft)

> **Target:** IEEE Transactions on Artificial Intelligence (Regular Paper, ≤10 pages)
> **Review:** Double-blind
> **Status:** Outline. Expand each section once the plan has been discussed and confirmed.

---

## Title (Candidates)

1. "Automated Detection of Digitally Replicated Signatures in Large-Scale Financial Audit Reports"
2. "Are They Really Signing? A Deep Learning Pipeline for Detecting Signature Replication in 90K Audit Reports"
3. "Large-Scale Forensic Analysis of CPA Signature Authenticity Using Deep Features and Perceptual Hashing"

> Recommendation: use option 1 or 3, which read as more formally academic. Option 2 is catchier, but TAI may lean conservative.

---

## Abstract (150-250 words)

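**Key elements:**
- Problem: Audit reports require the CPA's handwritten signature, but in practice the signature may be digitally replicated (a stored image pasted in)
- Gap: No large-scale automated detection method currently exists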
- Method: VLM pre-screening → YOLO detection → ResNet-50 feature extraction → Cosine + pHash verification
- Scale: 90,282 PDFs, 182,328 signatures, 758 CPAs, 2013-2023
- Key finding: A firm known to use replicated signatures serves as the calibration group for establishing a distribution-free threshold
- Contribution: first large-scale study, end-to-end pipeline, empirical threshold validation

---

## Impact Statement (100-150 words)

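**Direction (must be understandable to non-specialists):**

The CPA's signature on an audit report is an important safeguard for the credibility of financial reporting. If a signature is not handwritten each time but is instead digitally copied and pasted, audit quality and investor protection suffer. This study develops an automated AI pipeline that analyzes more than 90,000 audit reports of Taiwanese listed companies spanning 10 years, extracting and comparing about 180,000 signatures. By cross-validating deep-learning features against perceptual hashes, we can distinguish "stylistically consistent handwritten signatures" from "digitally replicated signatures". We find that signatures from some accounting firms show a degree of consistency that is statistically impossible to produce by hand. The method can be applied directly in automated audit systems run by financial regulators.

> Note: write the final English version at submission time; this draft only sets the direction of the content.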

---

## I. Introduction (~1.5 pages)

### Paragraph structure:

**P1 — Problem context**
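- Legal significance of audit report signatures (Taiwanese regulations require a handwritten signature)
- The loophole opened by digitization: signatures in PDF reports are easy to copy and paste
- Regulators cannot manually inspect every report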

**P2 — Why this matters (motivation)**
- Audit quality → investor protection → trust in capital markets
- Signature authenticity is a proxy indicator of audit independence
- [REF: literature on audit quality]

**P3 — What exists (gap)**
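- Existing signature verification research focuses on forgery detection
- Our question differs: not "was this signed by the right person?" but "was it signed by hand every time?"
- Replication detection ≠ Forgery detection
- No large-scale study on real financial reports exists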

**P4 — What we do (contribution)**
- End-to-end pipeline: VLM → YOLO → ResNet → Cosine + pHash
- Scale: 90K+ documents, 180K+ signatures, 10 years
- Distribution-free threshold with known-replication calibration group
- First study applying AI to audit signature authenticity at this scale

**P5 — Paper organization**
- One sentence per section

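### Contribution list (state explicitly):
1. **Pipeline**: A complete end-to-end automated system for detecting signature authenticity
2. **Scale**: The largest analysis of audit report signatures to date (90K PDFs, 180K signatures)
3. **Methodology**: Two-layer verification combining deep features (cosine) with perceptual hashing (pHash), which separates "consistent style" from "digital replication"
4. **Calibration**: A firm known to use replicated signatures serves as ground truth for calibration, yielding a distribution-free threshold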

---

## II. Related Work (~1 page)

### A. Offline Signature Verification
- Siamese networks: Bromley et al. 1993, Dey et al. 2017 (SigNet)
- CNN-based: Hadjadj et al. 2020 (single known sample)
- Triplet Siamese: Mathematics 2024
- Consensus threshold: arXiv:2401.03085
- **Positioning**: these works all address forgery detection (is the signature genuine?); ours addresses replication detection (was the signature reproduced?)

### B. Document Forensics & Copy-Move Detection
- Copy-move forgery detection survey (MTAP 2024)
- Image forensics in scanned documents
- **Positioning**: these typically target image tampering, not signature reuse

### C. VLM & Object Detection in Document Analysis
- Vision-Language Models for document understanding
- YOLO variants in document element detection
- **Positioning**: we use VLM + YOLO as the pipeline front end; it is not a core contribution but needs explaining

### D. Perceptual Hashing for Image Comparison
- pHash in near-duplicate detection
- Complementarity with deep features
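
As a rough illustration of how a DCT-based perceptual hash supports near-duplicate detection, the sketch below hashes a 32x32 grayscale image (assumed to be already resized and passed as nested lists) into 64 bits and compares hashes by Hamming distance. The names `phash64` and `hamming` are hypothetical, and the 32x32 input with an 8x8 low-frequency block is a common pHash convention rather than a setting confirmed by this outline.

```python
import math
from statistics import median

N = 32  # assumed input size: 32x32 grayscale pixels
K = 8   # keep the 8x8 lowest-frequency DCT coefficients -> 64 bits


def phash64(img):
    """Return a 64-bit perceptual hash of a 32x32 grayscale image (nested lists)."""
    # DCT-II cosine basis for the K lowest frequencies.
    # Normalization constants are omitted for brevity (sketch only).
    basis = [[math.cos(math.pi * (2 * x + 1) * u / (2 * N)) for x in range(N)]
             for u in range(K)]
    # Row transform: rows[y][u] = sum_x img[y][x] * basis[u][x]
    rows = [[sum(img[y][x] * basis[u][x] for x in range(N)) for u in range(K)]
            for y in range(N)]
    # Column transform, flattened into a list of K*K coefficients.
    coeffs = [sum(rows[y][u] * basis[v][y] for y in range(N))
              for v in range(K) for u in range(K)]
    med = median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > med else 0)  # one bit per coefficient
    return bits


def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")


# Synthetic test patterns, for illustration only.
img_a = [[(x * 7 + y * 13) % 256 for x in range(N)] for y in range(N)]
img_b = [[(x * x + 3 * y) % 256 for x in range(N)] for y in range(N)]
print(hamming(phash64(img_a), phash64(img_a)))  # identical images -> 0
print(hamming(phash64(img_a), phash64(img_b)))
```

A small Hamming distance suggests the two images are near-duplicates at the structural level, which is the property the pipeline would rely on to separate pasted copies from independently written signatures.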

---

## III. Methodology (~3 pages)

> Condense from methodology_draft_v1.md; focus on the core method and omit implementation details

### A. Pipeline Overview
- Figure 1: Full pipeline diagram (condensed version)
- One-sentence description of each stage

### B. Data Collection
- 90,282 PDFs from TWSE MOPS, 2013-2023
- Table I: Dataset summary (condensed version)
- CPA registry matching

### C. Signature Detection
- VLM pre-screening (Qwen2.5-VL): hit-and-stop strategy, 86,072 docs
- YOLOv11n: 500 annotated → mAP50=0.99 → 182,328 signatures
- Red stamp removal post-processing
- **Omit**: full VLM prompt text, annotation protocol details, validation details → move to a footnote or mention briefly
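
One possible reading of the hit-and-stop strategy is that the VLM inspects pages in order and stops at the first page it flags as containing a signature, so later pages are never sent to the model. The sketch below assumes that reading; `find_signature_page` and the `has_signature` callback are hypothetical stand-ins for the actual Qwen2.5-VL call, which this outline does not specify.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Page = TypeVar("Page")


def find_signature_page(
    pages: Iterable[Page],
    has_signature: Callable[[Page], bool],
) -> Tuple[Optional[int], int]:
    """Scan pages in order and stop at the first hit.

    Returns (index of the first flagged page or None, number of pages checked).
    """
    checked = 0
    for index, page in enumerate(pages):
        checked += 1
        if has_signature(page):  # hypothetical VLM yes/no query
            return index, checked
    return None, checked


# Toy stand-in for the VLM: a page "has a signature" if its text says so.
demo_pages = ["cover", "opinion", "signature block", "appendix"]
hit, n_checked = find_signature_page(demo_pages, lambda p: "signature" in p)
print(hit, n_checked)  # 2 3
```

Under this assumption the cost per document scales with the position of the signature page rather than with the document length.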

### D. Feature Extraction
- ResNet-50 (ImageNet1K_V2), no fine-tuning, 2048-dim, L2 normalized
- Why no fine-tuning: similarity task, not classification; generalizability
- CPA matching: 92.6% success rate
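
Because the embeddings are L2-normalized, cosine similarity between two signatures reduces to a plain dot product. The sketch below shows that step on toy vectors standing in for the 2048-dim ResNet-50 features; `l2_normalize` and `cosine` are illustrative names, not functions from the project code.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine(unit_a: Sequence[float], unit_b: Sequence[float]) -> float:
    """Cosine similarity of two already-normalized vectors: just the dot product."""
    return sum(a * b for a, b in zip(unit_a, unit_b))


# Toy 3-dim vectors standing in for 2048-dim embeddings.
feat_a = l2_normalize([3.0, 4.0, 0.0])
feat_b = l2_normalize([6.0, 8.0, 0.0])   # same direction as feat_a
feat_c = l2_normalize([0.0, 0.0, 5.0])   # orthogonal to feat_a
print(round(cosine(feat_a, feat_b), 6))  # 1.0
print(round(cosine(feat_a, feat_c), 6))  # 0.0
```

Normalizing once at extraction time keeps the pairwise comparison step cheap, which matters when comparing on the order of 180K signatures.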

### E. Dual-Method Verification (Core)
- **Cosine similarity**: captures style-level similarity (high-level)
- **pHash distance**: captures perceptual-level similarity (structural)
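- Why this combination:
  - High cosine + low pHash distance = strong evidence (digital replication)
  - High cosine + high pHash distance = consistent style but not replication (handwritten)
  - Their complementarity resolves the ambiguity of either metric alone
- **Why SSIM is excluded**: it is sensitive to scan noise; known replicated signatures reach an SSIM of only 0.70 (mention in a footnote)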
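
The decision logic above can be sketched as a small rule that combines both signals. The cutoffs used here (cosine ≥ 0.95, pHash Hamming distance ≤ 5) are placeholders borrowed from figures that appear elsewhere in this outline, not the final calibrated thresholds, and the function name and label strings are hypothetical rather than the paper's verdict categories.

```python
COSINE_MIN = 0.95   # assumed placeholder cutoff for "high" cosine similarity
PHASH_MAX = 5       # assumed placeholder cutoff for "low" pHash distance


def dual_method_verdict(cosine_sim: float, phash_dist: int) -> str:
    """Combine style-level and perceptual-level evidence for one signature pair."""
    if cosine_sim >= COSINE_MIN and phash_dist <= PHASH_MAX:
        return "replication"        # both signals agree: strong evidence
    if cosine_sim >= COSINE_MIN:
        return "consistent_style"   # similar style, but not a near-duplicate image
    return "not_flagged"            # cosine too low to suggest replication


print(dual_method_verdict(0.98, 2))   # replication
print(dual_method_verdict(0.98, 20))  # consistent_style
print(dual_method_verdict(0.80, 3))   # not_flagged
```

The second branch is the case a cosine-only detector would misread: a high style similarity with no structural near-duplicate behind it.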

### F. Threshold Selection
- Distribution-free approach (non-normal distributions → percentiles)
- KDE crossover = 0.838
- Intra/Inter class distributions (Table + Figure)
- **Calibration via known-replication firm**（key contribution）:
  - Deloitte Taiwan: domain knowledge confirms that all of its signatures are replicated
  - Cosine mean = 0.980, 1st percentile = 0.908
  - pHash ≤5: 58.75%
  - Serves as the anchor point for threshold calibration

> Double-blind note: do not write "Deloitte"; use "Firm A (a Big-4 firm known to use digital replication)" instead
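
Two of the quantities in this section can be sketched in a few lines: a percentile read directly from the calibration group's similarity scores, and the crossover point where two kernel density estimates meet. The code below is a minimal sketch on made-up numbers; the nearest-rank percentile, the fixed Gaussian bandwidth, the grid search, and all function names are simplifying assumptions rather than the project's implementation.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (q in (0, 100]); makes no distributional assumption."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def gaussian_kde(samples: Sequence[float], x: float, bandwidth: float) -> float:
    """Gaussian kernel density estimate at x with a fixed (assumed) bandwidth."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def kde_crossover(low: Sequence[float], high: Sequence[float],
                  bandwidth: float = 0.02, steps: int = 1000) -> float:
    """First grid point between the two sample means where the 'high' density overtakes."""
    lo, hi = sum(low) / len(low), sum(high) / len(high)
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        if gaussian_kde(high, x, bandwidth) >= gaussian_kde(low, x, bandwidth):
            return x
    return hi


# Made-up cosine scores, for illustration only.
calibration = [0.99, 0.98, 0.97, 0.99, 0.96, 0.98, 0.95, 0.99, 0.97, 0.98]
reference = [0.80, 0.85, 0.78, 0.88, 0.82, 0.84, 0.79, 0.86, 0.83, 0.81]
print(percentile(calibration, 50))
print(round(kde_crossover(reference, calibration), 3))
```

A low percentile of the calibration group gives a threshold that nearly all known-replicated signatures exceed, while the crossover gives the score at which membership in the higher-scoring group becomes the more likely explanation.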

---

## IV. Experiments and Results (~2.5 pages)

### A. Experimental Setup
- Hardware/software environment
- Definitions of evaluation metrics

### B. Signature Detection Performance
- Table: YOLO metrics (Precision, Recall, mAP)
- VLM-YOLO agreement rate: 98.8%

### C. Distribution Analysis
- Figure: Intra vs Inter cosine similarity distributions
- Figure: pHash distance distributions (intra vs inter)
- Table: Distributional statistics
- Normality tests → justify percentile-based thresholds

### D. Calibration Group Analysis (Key Section)
- Cosine/pHash distributions for "Firm A" (known to use replicated signatures)
- Comparison against the distributions of non-Big-4 firms
- KDE crossover (Firm A vs non-Big-4) = 0.969
- Figure: Firm A distribution vs overall distribution
- **This is the most persuasive section**

### E. Classification Results
- Table: Overall verdict distribution (definite_copy / likely_copy / uncertain / genuine)
- Cross-method agreement analysis
- **Key finding**: Cosine-high ≠ pixel-identical
  - 71,656 PDFs with Cosine > 0.95
  - Only 3.4% also have SSIM > 0.95
  - Only 0.4% are pixel-identical

### F. Ablation Study (new; strengthens the AI contribution)
- **Feature backbone comparison**: ResNet-50 vs VGG-16 vs EfficientNet-B0
  - Compare intra/inter-class separation (Cohen's d)
  - Trade-off between computational cost and discriminative power
- **Single method vs dual method**:
  - Cosine only vs pHash only vs Cosine + pHash
  - Use Firm A as the positive set to compute precision/recall
- **Threshold sensitivity**:
  - How classification results change across cosine thresholds
  - ROC-like curve (with Firm A as the positive set)
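
The ablation metrics above could be computed along the following lines: Cohen's d for intra/inter-class separation, and precision/recall at a given cosine threshold with the calibration firm treated as the positive set. This is a sketch on toy numbers; the pooled-standard-deviation form of Cohen's d and the helper names are assumptions, not the project's confirmed implementation.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def precision_recall(pos_scores: Sequence[float], neg_scores: Sequence[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are flagged as positive."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(pos_scores) if pos_scores else 0.0
    return precision, recall


# Toy scores, for illustration only.
positive = [0.99, 0.97, 0.96, 0.93]   # stand-in for the positive (Firm A) set
negative = [0.95, 0.90, 0.85, 0.80]   # stand-in for the comparison set
for t in (0.90, 0.94, 0.98):
    print(t, precision_recall(positive, negative, t))
print(round(cohens_d(positive, negative), 2))
```

Sweeping the threshold and plotting the resulting precision/recall pairs would give the ROC-like sensitivity curve; running the same sweep per backbone would fill the comparison table.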

---

## V. Discussion (~1 page)

### A. Replication vs Forgery: A Distinction That Matters
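- Our problem is inherently simpler and more direct
- No need to account for an impostor
- Physical impossibility argument: the same person signing by hand each time cannot produce pixel-identical signatures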

### B. The Gap Between Style Similarity and Digital Replication
- 81.4% likely_copy (Cosine) vs 2.8% definite_copy (pixel-level)
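- Interpretation: most CPA signatures are highly consistent in style but are not digital replicas
- Possible causes: use of signature pads, a fixed signing environment
- **Policy implication**: relying on cosine alone would severely overestimate the replication rate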

### C. The Value of a Known-Replication Calibration Group
- Why a ground-truth anchor matters for threshold calibration
- Generalizes to other document forensics problems

### D. Limitations
- Condensed limitations (3-4 points)
- No labeled ground truth for full dataset
- Feature extractor not fine-tuned
- Scan quality variation over 10 years
- Regulatory/legal definition of "replication" varies

---

## VI. Conclusion and Future Work (~0.5 page)

### Conclusion
- Summarize the pipeline, scale, and key findings
- Emphasize that the dual-method approach is necessary (cosine alone is not enough)
- Methodological contribution of the calibration group

### Future Work
- Fine-tuned signature-specific feature extractor
- Temporal analysis (year-over-year trends)
- Cross-country generalization
- Integration with regulatory monitoring systems
- Small-scale ground truth validation (100-200 PDFs)

---

## Figures & Tables Budget (allocation under the 10-page limit)

| # | Type | Content | Est. space |
|---|------|---------|------------|
| Fig 1 | Pipeline | Full pipeline diagram | 1/3 page |
| Fig 2 | Distribution | Intra vs Inter cosine KDE | 1/3 page |
| Fig 3 | Distribution | pHash distance intra vs inter | 1/4 page |
| Fig 4 | Calibration | Firm A vs overall distribution | 1/3 page |
| Fig 5 | Ablation | Backbone comparison / threshold sensitivity | 1/3 page |
| Table I | Data | Dataset summary | 1/4 page |
| Table II | Detection | YOLO performance | 1/6 page |
| Table III | Statistics | Distribution stats + tests | 1/4 page |
| Table IV | Results | Classification verdicts | 1/4 page |
| Table V | Ablation | Feature backbone comparison | 1/4 page |

**Total figures/tables**: ~3 pages → Text: ~7 pages → Feasible for 10-page limit
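
The page estimate can be checked by summing the per-item allocations from the table above. The short calculation below uses exact fractions; the dictionary simply restates the table's estimates.

```python
from fractions import Fraction as F

# Estimated space per figure/table, copied from the budget table.
budget = {
    "Fig 1": F(1, 3), "Fig 2": F(1, 3), "Fig 3": F(1, 4),
    "Fig 4": F(1, 3), "Fig 5": F(1, 3),
    "Table I": F(1, 4), "Table II": F(1, 6), "Table III": F(1, 4),
    "Table IV": F(1, 4), "Table V": F(1, 4),
}
total = sum(budget.values())
print(total, float(total))  # 11/4 2.75
```

The exact sum is 2.75 pages, consistent with the ~3-page estimate and leaving roughly 7 pages for text.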

---

## To-Do Checklist

### Additional analyses needed (Ablation Study)
- [ ] ResNet-50 vs VGG-16 vs EfficientNet-B0 feature comparison
- [ ] Single method vs dual method precision/recall (with Firm A as positive set)
- [ ] Threshold sensitivity curve

### Figures and tables to prepare
- [ ] Fig 1: Pipeline diagram (clean vector version)
- [ ] Fig 4: Firm A calibration distribution (new figure)
- [ ] Fig 5: Ablation results (new figure)
- [ ] Convert all figures and tables to English

### Writing
- [ ] Impact Statement (English version)
- [ ] Abstract (English version)
- [ ] Introduction
- [ ] Related Work (needs additional literature search)
- [ ] Methodology (condense from v1)
- [ ] Results (new)
- [ ] Discussion (new)
- [ ] Conclusion

### Submission preparation
- [ ] Anonymize (Deloitte → Firm A; remove all identifying information)
- [ ] IEEE LaTeX template
- [ ] Format references (IEEE numbered style)
- [ ] Similarity index < 20%