# Taiwan TWSE CPA Signature Authentication ## What This Is A computer-vision research pipeline that classifies whether the CPA signatures appearing on Taiwan TWSE-listed-company financial reports are hand-signed (親簽) or non-hand-signed (非親簽 — early-period rubber-stamp / scan, or post-2020 firm-level electronic signature systems). The pipeline ingests ~90k PDFs (2013-2023), detects ~182k signatures with YOLOv11n, embeds them with ResNet-50 (ImageNet1K_V2, no fine-tune), and characterises distributional structure with cosine + independent dHash descriptors. Target: a peer-reviewed publication (IEEE Access, A/6 on the NCKU CSIE journal list). ## Core Value A statistically defensible, **reproducible** thresholding methodology that distinguishes hand-signed from digitally-replicated CPA signatures at the population level, with traceable evidence at every step (DB → script → table → paper claim). ## Requirements ### Validated - ✓ End-to-end pipeline (TWSE MOPS scrape → Qwen2.5-VL prefilter → YOLO detection → ResNet embedding → DB + descriptors) — `signature_analysis/01-19` - ✓ Independent dHash descriptor for replication detection — Script 14 (v3.x baseline) - ✓ Accountant-level 3-component GMM characterisation — Script 18/20 (v3.x baseline) - ✓ Paper A v3.20.0 manuscript (full-dataset framing, partner Jimmy 2026-04-27 substantive review accepted, codex 3-pass verification clean) — commit `53125d1` on `yolo-signature-pipeline` - ✓ Spike scripts 32-35 confirming Big-4-only scope is methodologically superior — commits `e1d81e3`, `8ac0988`, `55f9f94` on `paper-a-v4-big4` ### Active **Milestone: Paper A v4.0 — Big-4 reframe (primary scope) + full-dataset robustness (secondary)** - [ ] Foundation: rerun core scripts on Big-4 subset with `--scope=big4` flag (`/scripts 19, 20, 21, 24, 25`) - [ ] Methodology rewrite: §III-G/I/J/L re-anchored on dip-test confirmed bimodality and bootstrap-stable Big-4 K=2 GMM (cos=0.975, dh=3.76) - [ ] Results tables: regenerate Tables IV-XVIII on Big-4 subset; new §IV-K full-dataset secondary - [ ] Prose rewrite: Abstract / Intro / Discussion / Conclusion with Firm A reframed as "templated end of Big-4" case study (was: hand-signed calibration anchor) - [ ] AI peer review: ≥3 cross-AI rounds (codex, Gemini 3.x Pro, Opus 4.7) on the v4.0 manuscript - [ ] Partner Jimmy second review on v4.0 (he proposed this direction; needs sign-off on execution) - [ ] iThenticate <20%, eCF copyright form, IEEE Access submission portal upload + cover letter ### Out of Scope - **Paper B (audit behaviour / policy implications)** — partner v4 contribution D, deferred to a separate paper after Paper A ships - **Paper C standalone (reverse-anchor methodology)** — initial 2026-05-12 spike direction, **folded back into Paper A v4.0 §IV-K** as one robustness lens; does not warrant a separate manuscript - **Mid/small-firm primary scope** — included as full-dataset secondary only; primary scope is Big-4 because dip-test only achieves multimodality at Big-4 level - **Per-document classifier release as software product** — paper-only deliverable; no API / SaaS layer in scope - **VLM behavioural interview / IRB study** — removed in v3.4; not coming back ## Context - **Domain**: Taiwan-listed CPA audit signatures, 2013-2023; 4 Big-4 firms (勤業眾信 Deloitte, 安侯建業 KPMG, 資誠 PwC, 安永 EY) + ~30 mid/small firms - **Hardware split**: YOLO + ResNet on RTX 4090 (CUDA, deterministic forward inference, fixed seed); statistical analysis on Apple Silicon MPS / CPU - **Domain expert**: User has practitioner-level CPA-firm knowledge in Taiwan; recognises specific senior-partner names (e.g., 薛明玲 / 周建宏 are known PwC seniors that surfaced in Script 35's C1 cluster) - **Partner**: 與 partner Jimmy 合作;Jimmy 已提出 Big-4-only 方向,是 v4.0 的觸發者 ## Constraints - **Target journal**: IEEE Access (A/6 on NCKU CSIE list); fits Computer-Vision-applied-to-Audit scope - **Timeline**: v3.20.0 was already partner-reviewed and DOCX-shipped (2026-05-05). v4.0 reframe will delay submission by ~4-6 weeks but produces a stronger manuscript; partner Jimmy is aware and supportive - **Reproducibility**: pipeline must run end-to-end on the existing `/Volumes/NV2/PDF-Processing/signature-analysis/signature_analysis.db` snapshot; no new data ingest in scope - **AI review provenance**: every empirical claim must be backed by a fresh sqlite/grep against the named script — see `[[feedback-provenance-fabrication]]` memory; Gemini round-19 caught 4 fabricated provenance claims previously ## Key Decisions | Decision | Rationale | Outcome | |----------|-----------|---------| | Use ResNet-50 ImageNet1K_V2 without fine-tune | Reproducibility; avoid label leakage from fine-tuning on the same corpus | ✓ Validated through v3.x | | Cosine + independent dHash dual descriptor | Cosine catches semantic similarity; independent dHash catches byte-level replication | ✓ Validated | | Drop SSIM / pixel-pHash from descriptor set | Reviewer-rejected as redundant / fragile | ✓ v3.x rewrite | | Drop A2 within-year uniformity assumption | Empirically falsified by Script 27 | ✓ v3.14 | | **Reframe scope to Big-4 only as primary** | Dip-test multimodal only at Big-4 level (p<0.0001); mid/small noise distorted Paper A v3.x's published 0.945/8.10 threshold; partner Jimmy's earlier suggestion empirically confirmed by Scripts 32-35 | — Pending v4.0 | | Reverse-anchor Paper C → folded into v4.0 §IV-K | Big-4 reframe is the stronger story; reverse-anchor is one of several lenses on the same data, not a standalone paper | ✓ Decided 2026-05-12 | | Branch strategy: `paper-a-v4-big4` from `from-outside-of-firmA` from `yolo-signature-pipeline` | Spike artifacts (Scripts 32-35) stay on the spike branch; v4.0 paper work isolated on its own sub-branch; v3.20.0 preserved on yolo-signature-pipeline as fallback | ✓ Decided 2026-05-12 | --- *Last updated: 2026-05-12 after Paper A v4.0 Big-4 reframe milestone bootstrap*