Hand-written minimal GSD scaffolding (PROJECT.md / REQUIREMENTS.md /
ROADMAP.md / STATE.md) without running /gsd-ingest-docs because:
* 51 pre-existing markdown files exceed the v1 50-doc cap and most
are stale (older review rounds, infrastructure notes) or already
captured in auto-memory project_signature_research.md
* Heavyweight ingest workflow not needed when project context is
already comprehensive
PROJECT.md captures the Big-4 reframe key decision and the locked
v3.x history; REQUIREMENTS.md defines REQ-001..008 for v4.0;
ROADMAP.md lays out 7 phases (Foundation -> Methodology -> Results
-> Prose -> AI peer review -> Partner re-review -> Submission);
STATE.md anchors at Phase 1 entry on branch paper-a-v4-big4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Taiwan TWSE CPA Signature Authentication
What This Is
A computer-vision research pipeline that classifies whether the CPA signatures on the financial reports of Taiwan TWSE-listed companies are hand-signed (親簽) or non-hand-signed (非親簽: early-period rubber stamps or scans, or post-2020 firm-level electronic signature systems). The pipeline ingests ~90k PDFs (2013-2023), detects ~182k signatures with YOLOv11n, and embeds them with ResNet-50 (ImageNet1K_V2 weights, no fine-tuning). It then characterises their distributional structure with two descriptors: cosine similarity of the embeddings and an independent dHash. Target: a peer-reviewed publication in IEEE Access (A/6 on the NCKU CSIE journal list).
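The two descriptors can be sketched as follows. This is a minimal illustration, not the project's Script 14 or embedding code. It makes three assumptions: embeddings are plain NumPy vectors (ResNet-50's pooled output is 2048-d), signature crops are already 2-D grayscale arrays, and nearest-neighbour downsampling is acceptable for dHash, although common implementations use an anti-aliased resize. All function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors (e.g. ResNet-50 pooled features)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def dhash_bits(gray, hash_size: int = 8) -> np.ndarray:
    """Difference hash of a 2-D grayscale array.

    Shrinks the image to hash_size x (hash_size + 1) by nearest-neighbour sampling
    (an assumption made to stay dependency-free), then records whether each pixel
    is brighter than its left neighbour.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    rows = np.linspace(0, h - 1, hash_size).round().astype(int)
    cols = np.linspace(0, w - 1, hash_size + 1).round().astype(int)
    small = gray[np.ix_(rows, cols)]
    return (small[:, 1:] > small[:, :-1]).ravel()  # hash_size**2 booleans


def dhash_distance(gray_a, gray_b, hash_size: int = 8) -> int:
    """Hamming distance between two dHashes; 0 means identical gradient structure."""
    bits_a = dhash_bits(gray_a, hash_size)
    bits_b = dhash_bits(gray_b, hash_size)
    return int(np.count_nonzero(bits_a != bits_b))


if __name__ == "__main__":
    ramp = np.tile(np.arange(90.0), (80, 1))  # synthetic left-to-right gradient
    print(dhash_distance(ramp, ramp))             # identical images
    print(dhash_distance(ramp, np.fliplr(ramp)))  # mirrored gradient flips every bit
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Higher cosine means more similar embeddings, and lower dHash distance means more similar gradient structure. A replicated stamp or e-signature should therefore score high on the first descriptor and low on the second.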
Core Value
A statistically defensible, reproducible thresholding methodology that distinguishes hand-signed from digitally-replicated CPA signatures at the population level, with traceable evidence at every step (DB → script → table → paper claim).
Requirements
Validated
- ✓ End-to-end pipeline (TWSE MOPS scrape → Qwen2.5-VL prefilter → YOLO detection → ResNet embedding → DB + descriptors): `signature_analysis/01-19`
- ✓ Independent dHash descriptor for replication detection: Script 14 (v3.x baseline)
- ✓ Accountant-level 3-component GMM characterisation: Scripts 18/20 (v3.x baseline)
- ✓ Paper A v3.20.0 manuscript (full-dataset framing; partner Jimmy's 2026-04-27 substantive review accepted; codex 3-pass verification clean): commit `53125d1` on `yolo-signature-pipeline`
- ✓ Spike scripts 32-35 confirming that Big-4-only scope is methodologically superior: commits `e1d81e3`, `8ac0988`, `55f9f94` on `paper-a-v4-big4`
Active
Milestone: Paper A v4.0 — Big-4 reframe (primary scope) + full-dataset robustness (secondary)
- Foundation: rerun the core scripts (Scripts 19, 20, 21, 24, 25) on the Big-4 subset with a `--scope=big4` flag
- Methodology rewrite: §III-G/I/J/L re-anchored on dip-test-confirmed bimodality and the bootstrap-stable Big-4 K=2 GMM (cos=0.975, dh=3.76)
- Results tables: regenerate Tables IV-XVIII on the Big-4 subset; add a new §IV-K full-dataset secondary analysis
- Prose rewrite: Abstract / Intro / Discussion / Conclusion with Firm A reframed as "templated end of Big-4" case study (was: hand-signed calibration anchor)
- AI peer review: ≥3 cross-AI rounds (codex, Gemini 3.x Pro, Opus 4.7) on the v4.0 manuscript
- Partner Jimmy second review on v4.0 (he proposed this direction; needs sign-off on execution)
- Submission: iThenticate similarity <20%, eCF copyright form, IEEE Access portal upload + cover letter
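One common way to turn a two-component mixture into a cut-off is to take the point between the component means where the weighted component densities are equal. A bootstrap over resampled data then checks whether that cut-off is stable. The sketch below applies this rule to a single 1-D descriptor using a small NumPy EM fit. It is an assumption that the project's Big-4 GMM scripts derive cos=0.975 and dh=3.76 exactly this way, rather than, say, fitting cosine and dHash jointly. The function names and defaults are hypothetical.

```python
import numpy as np


def fit_gmm2(x, n_iter=500, tol=1e-10):
    """Fit a two-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    x = np.asarray(x, dtype=np.float64)
    w = np.array([0.5, 0.5])
    mu = np.percentile(x, [25, 75]).astype(np.float64)  # spread-out starting means
    var = np.full(2, x.var())
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted density of every point under each component.
        dens = w * np.exp(-((x[:, None] - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        ll = float(np.log(total).sum())
        # M-step: re-estimate weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-12)
        if ll - prev_ll < tol:  # stop once the log-likelihood no longer improves
            break
        prev_ll = ll
    return w, mu, var


def gmm_crossing(w, mu, var, n_grid=10_001):
    """Point between the two means where the weighted component densities are equal."""
    lo, hi = np.argsort(mu)
    grid = np.linspace(mu[lo], mu[hi], n_grid)

    def log_wdens(k):
        # log of w_k * N(grid; mu_k, var_k), computed in log space to avoid underflow
        return np.log(w[k]) - 0.5 * np.log(2 * np.pi * var[k]) - (grid - mu[k]) ** 2 / (2 * var[k])

    diff = log_wdens(lo) - log_wdens(hi)
    sign_changes = np.flatnonzero(np.diff(np.sign(diff)) != 0)
    if sign_changes.size == 0:
        raise ValueError("weighted densities do not cross between the component means")
    return float(grid[sign_changes[0]])


def bootstrap_thresholds(x, n_boot=200, seed=0):
    """Refit on data resampled with replacement and collect the crossing point each time."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.float64)
    return np.array([gmm_crossing(*fit_gmm2(rng.choice(x, size=x.size, replace=True)))
                     for _ in range(n_boot)])
```

A narrow spread of `bootstrap_thresholds(x)` is what "bootstrap-stable" would mean under this rule. For a dHash-style distance, where lower values indicate replication, the same crossing rule applies; only the interpretation of which side counts as replicated flips.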
Out of Scope
- Paper B (audit behaviour / policy implications): partner v4 contribution D, deferred to a separate paper after Paper A ships
- Paper C standalone (reverse-anchor methodology): the initial 2026-05-12 spike direction, folded back into Paper A v4.0 §IV-K as one robustness lens; does not warrant a separate manuscript
- Mid/small-firm primary scope: included only as the full-dataset secondary analysis; the primary scope is Big-4 because the dip test detects multimodality only at the Big-4 level
- Per-document classifier release as a software product: paper-only deliverable; no API / SaaS layer in scope
- VLM behavioural interview / IRB study: removed in v3.4; not coming back
Context
- Domain: Taiwan-listed CPA audit signatures, 2013-2023; 4 Big-4 firms (勤業眾信 Deloitte, 安侯建業 KPMG, 資誠 PwC, 安永 EY) + ~30 mid/small firms
- Hardware split: YOLO + ResNet on RTX 4090 (CUDA, deterministic forward inference, fixed seed); statistical analysis on Apple Silicon MPS / CPU
- Domain expert: User has practitioner-level CPA-firm knowledge in Taiwan; recognises specific senior-partner names (e.g., 薛明玲 / 周建宏 are known PwC seniors that surfaced in Script 35's C1 cluster)
- Partner: collaboration with partner Jimmy; Jimmy proposed the Big-4-only direction and triggered v4.0
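A scope flag like the `--scope=big4` switch planned for the Foundation phase could look like the sketch below. The flag name matches the one in the plan. The `full` choice, the default, the record layout and the firm-name strings used for matching are all assumptions, since the column names and firm spellings stored in `signature_analysis.db` are not stated here.

```python
import argparse

# Hypothetical: the four Big-4 firms from the Context list, spelled as the
# database is assumed to store them (Deloitte, KPMG, PwC, EY).
BIG4_FIRMS = frozenset({"勤業眾信", "安侯建業", "資誠", "安永"})


def parse_args(argv=None):
    """Parse a hypothetical --scope flag for an analysis script."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis-script entry point")
    parser.add_argument(
        "--scope",
        choices=("big4", "full"),
        default="full",
        help="big4: restrict to the four Big-4 firms; full: the whole dataset",
    )
    return parser.parse_args(argv)


def filter_scope(records, scope):
    """Keep only Big-4 records when scope == 'big4'; pass everything through otherwise."""
    if scope == "big4":
        return [r for r in records if r["firm"] in BIG4_FIRMS]
    return list(records)


if __name__ == "__main__":
    args = parse_args()
    demo = [{"firm": "資誠", "id": 1}, {"firm": "Mid-size Firm X", "id": 2}]
    print([r["id"] for r in filter_scope(demo, args.scope)])
```

Defaulting to `full` keeps existing v3.x invocations unchanged. Matching on an explicit set of four firm names keeps the Big-4 subset auditable.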
Constraints
- Target journal: IEEE Access (A/6 on NCKU CSIE list); fits Computer-Vision-applied-to-Audit scope
- Timeline: v3.20.0 was already partner-reviewed and DOCX-shipped (2026-05-05). v4.0 reframe will delay submission by ~4-6 weeks but produces a stronger manuscript; partner Jimmy is aware and supportive
- Reproducibility: the pipeline must run end-to-end on the existing `/Volumes/NV2/PDF-Processing/signature-analysis/signature_analysis.db` snapshot; no new data ingest in scope
- AI review provenance: every empirical claim must be backed by a fresh sqlite/grep check against the named script (see the [[feedback-provenance-fabrication]] memory); Gemini round 19 previously caught 4 fabricated provenance claims
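A fresh, read-only query that also records what it ran is one way to meet both constraints above. The helper below opens the snapshot through SQLite's `mode=ro` URI parameter, so the query cannot modify the database. It returns the rows together with a provenance record containing the SQL text, the file's SHA-256 and a timestamp. The helper names, record fields and `script` label are hypothetical; the project's actual provenance format is not described here.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 of the DB file, pinning which snapshot a number came from."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_query(db_path, sql, params=(), script="(unspecified)"):
    """Run sql on a read-only connection; return (rows, provenance record)."""
    path = Path(db_path).resolve()
    uri = path.as_uri() + "?mode=ro"  # read-only: writes raise sqlite3.OperationalError
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(sql, params).fetchall()
    finally:
        conn.close()
    record = {
        "script": script,  # hypothetical label for the script the claim cites
        "db_path": str(path),
        "db_sha256": file_sha256(path),
        "sql": sql,
        "params": list(params),
        "n_rows": len(rows),
        "queried_at": datetime.now(timezone.utc).isoformat(),
    }
    return rows, record
```

Storing the record next to the table or sentence it supports ties each number back to a specific query and snapshot, which is the DB → script → table → paper-claim chain named under Core Value.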
Key Decisions
| Decision | Rationale | Outcome |
|---|---|---|
| Use ResNet-50 ImageNet1K_V2 without fine-tune | Reproducibility; avoid label leakage from fine-tuning on the same corpus | ✓ Validated through v3.x |
| Cosine + independent dHash dual descriptor | Cosine catches semantic similarity; independent dHash catches pixel-level (near-identical image) replication | ✓ Validated |
| Drop SSIM / pixel-pHash from descriptor set | Reviewer-rejected as redundant / fragile | ✓ v3.x rewrite |
| Drop A2 within-year uniformity assumption | Empirically falsified by Script 27 | ✓ v3.14 |
| Reframe scope to Big-4 only as primary | Dip test detects multimodality only at the Big-4 level (p<0.0001); mid/small-firm noise distorted Paper A v3.x's published 0.945/8.10 threshold; partner Jimmy's earlier suggestion empirically confirmed by Scripts 32-35 | Pending v4.0 |
| Reverse-anchor Paper C → folded into v4.0 §IV-K | Big-4 reframe is the stronger story; reverse-anchor is one of several lenses on the same data, not a standalone paper | ✓ Decided 2026-05-12 |
| Branch strategy: `paper-a-v4-big4` branched from `from-outside-of-firmA`, which branched from `yolo-signature-pipeline` | Spike artifacts (Scripts 32-35) stay on the spike branch; v4.0 paper work is isolated on its own sub-branch; v3.20.0 is preserved on `yolo-signature-pipeline` as a fallback | ✓ Decided 2026-05-12 |
Last updated: 2026-05-12 after Paper A v4.0 Big-4 reframe milestone bootstrap