diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md
new file mode 100644
index 0000000..bbe3e08
--- /dev/null
+++ b/.planning/PROJECT.md
@@ -0,0 +1,74 @@
+# Taiwan TWSE CPA Signature Authentication
+
+## What This Is
+
+A computer-vision research pipeline that classifies whether the CPA signatures appearing on Taiwan TWSE-listed-company financial reports are hand-signed (親簽) or non-hand-signed (非親簽 — early-period rubber-stamp / scan, or post-2020 firm-level electronic signature systems). The pipeline ingests ~90k PDFs (2013-2023), detects ~182k signatures with YOLOv11n, embeds them with ResNet-50 (ImageNet1K_V2, no fine-tune), and characterises distributional structure with cosine + independent dHash descriptors. Target: a peer-reviewed publication (IEEE Access, A/6 on the NCKU CSIE journal list).
+
+## Core Value
+
+A statistically defensible, **reproducible** thresholding methodology that distinguishes hand-signed from digitally-replicated CPA signatures at the population level, with traceable evidence at every step (DB → script → table → paper claim).
+
+## Requirements
+
+### Validated
+
+<!-- Shipped and confirmed valuable. -->
+
+- ✓ End-to-end pipeline (TWSE MOPS scrape → Qwen2.5-VL prefilter → YOLO detection → ResNet embedding → DB + descriptors) — `signature_analysis/01-19`
+- ✓ Independent dHash descriptor for replication detection — Script 14 (v3.x baseline)
+- ✓ Accountant-level 3-component GMM characterisation — Script 18/20 (v3.x baseline)
+- ✓ Paper A v3.20.0 manuscript (full-dataset framing, partner Jimmy 2026-04-27 substantive review accepted, codex 3-pass verification clean) — commit `53125d1` on `yolo-signature-pipeline`
+- ✓ Spike scripts 32-35 confirming Big-4-only scope is methodologically superior — commits `e1d81e3`, `8ac0988`, `55f9f94` on `paper-a-v4-big4`
+
+### Active
+
+<!-- Current scope. Building toward these. -->
+
+**Milestone: Paper A v4.0 — Big-4 reframe (primary scope) + full-dataset robustness (secondary)**
+
+- [ ] Foundation: rerun core scripts 19, 20, 21, 24, 25 on the Big-4 subset with the `--scope=big4` flag
+- [ ] Methodology rewrite: §III-G/I/J/L re-anchored on dip-test confirmed bimodality and bootstrap-stable Big-4 K=2 GMM (cos=0.975, dh=3.76)
+- [ ] Results tables: regenerate Tables IV-XVIII on Big-4 subset; new §IV-K full-dataset secondary
+- [ ] Prose rewrite: Abstract / Intro / Discussion / Conclusion with Firm A reframed as "templated end of Big-4" case study (was: hand-signed calibration anchor)
+- [ ] AI peer review: ≥3 cross-AI rounds (codex, Gemini 3.x Pro, Opus 4.7) on the v4.0 manuscript
+- [ ] Partner Jimmy second review on v4.0 (he proposed this direction; needs sign-off on execution)
+- [ ] iThenticate <20%, eCF copyright form, IEEE Access submission portal upload + cover letter
+
+### Out of Scope
+
+<!-- Explicit boundaries. Includes reasoning to prevent re-adding. -->
+
+- **Paper B (audit behaviour / policy implications)** — partner v4 contribution D, deferred to a separate paper after Paper A ships
+- **Paper C standalone (reverse-anchor methodology)** — initial 2026-05-12 spike direction, **folded back into Paper A v4.0 §IV-K** as one robustness lens; does not warrant a separate manuscript
+- **Mid/small-firm primary scope** — included as full-dataset secondary only; primary scope is Big-4 because dip-test only achieves multimodality at Big-4 level
+- **Per-document classifier release as software product** — paper-only deliverable; no API / SaaS layer in scope
+- **VLM behavioural interview / IRB study** — removed in v3.4; not coming back
+
+## Context
+
+- **Domain**: Taiwan-listed CPA audit signatures, 2013-2023; 4 Big-4 firms (勤業眾信 Deloitte, 安侯建業 KPMG, 資誠 PwC, 安永 EY) + ~30 mid/small firms
+- **Hardware split**: YOLO + ResNet on RTX 4090 (CUDA, deterministic forward inference, fixed seed); statistical analysis on Apple Silicon MPS / CPU
+- **Domain expert**: User has practitioner-level CPA-firm knowledge in Taiwan; recognises specific senior-partner names (e.g., 薛明玲 / 周建宏 are known PwC seniors that surfaced in Script 35's C1 cluster)
+- **Partner**: Working with partner Jimmy; Jimmy proposed the Big-4-only direction and is the trigger for v4.0
+
+## Constraints
+
+- **Target journal**: IEEE Access (A/6 on NCKU CSIE list); fits Computer-Vision-applied-to-Audit scope
+- **Timeline**: v3.20.0 was already partner-reviewed and DOCX-shipped (2026-05-05). v4.0 reframe will delay submission by ~4-6 weeks but produces a stronger manuscript; partner Jimmy is aware and supportive
+- **Reproducibility**: pipeline must run end-to-end on the existing `/Volumes/NV2/PDF-Processing/signature-analysis/signature_analysis.db` snapshot; no new data ingest in scope
+- **AI review provenance**: every empirical claim must be backed by a fresh sqlite/grep against the named script — see `[[feedback-provenance-fabrication]]` memory; Gemini round-19 caught 4 fabricated provenance claims previously
+
+## Key Decisions
+
+| Decision | Rationale | Outcome |
+|----------|-----------|---------|
+| Use ResNet-50 ImageNet1K_V2 without fine-tune | Reproducibility; avoid label leakage from fine-tuning on the same corpus | ✓ Validated through v3.x |
+| Cosine + independent dHash dual descriptor | Cosine catches semantic similarity; independent dHash catches pixel-level (near-duplicate) replication | ✓ Validated |
+| Drop SSIM / pixel-pHash from descriptor set | Reviewer-rejected as redundant / fragile | ✓ v3.x rewrite |
+| Drop A2 within-year uniformity assumption | Empirically falsified by Script 27 | ✓ v3.14 |
+| **Reframe scope to Big-4 only as primary** | Dip-test multimodal only at Big-4 level (p<0.0001); mid/small noise distorted Paper A v3.x's published 0.945/8.10 threshold; partner Jimmy's earlier suggestion empirically confirmed by Scripts 32-35 | — Pending v4.0 |
+| Reverse-anchor Paper C → folded into v4.0 §IV-K | Big-4 reframe is the stronger story; reverse-anchor is one of several lenses on the same data, not a standalone paper | ✓ Decided 2026-05-12 |
+| Branch strategy: `paper-a-v4-big4` from `from-outside-of-firmA` from `yolo-signature-pipeline` | Spike artifacts (Scripts 32-35) stay on the spike branch; v4.0 paper work isolated on its own sub-branch; v3.20.0 preserved on yolo-signature-pipeline as fallback | ✓ Decided 2026-05-12 |
+
+---
+*Last updated: 2026-05-12 after Paper A v4.0 Big-4 reframe milestone bootstrap*
diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
new file mode 100644
index 0000000..4d447cb
--- /dev/null
+++ b/.planning/REQUIREMENTS.md
@@ -0,0 +1,85 @@
+# Requirements — Paper A v4.0 (Big-4 reframe)
+
+Milestone: Paper A v4.0 IEEE Access submission with Big-4-only primary scope and full-dataset secondary robustness.
+
+## REQ-001: Big-4-only primary scope (foundation)
+
+**What**: All primary statistical analysis (KDE+dip, BD/McCrary, Beta mixture, 2D-GMM K=2/K=3, pixel-identity FAR, held-out 70/30 z-test, classifier sensitivity) is rerun on the 437-CPA Big-4 subset (Firm A + KPMG + PwC + EY, n_signatures ≥ 10).
+
+**Acceptance**:
+- Script 20 rerun on Big-4 subset, dip-test p < 0.05 on cos_mean and dh_mean
+- Script 21 (held-out validation) rerun on Big-4 subset
+- Script 24 (calibration vs held-out z-test, classifier sensitivity) rerun on Big-4 subset
+- Script 19 (pixel-identity / FAR) rerun on Big-4 subset
+- All rerun outputs land under `reports/v4_big4/`
+- New operational threshold cos > 0.975 AND dh ≤ 3.76 (or refined K=2 posterior) documented with bootstrap 95% CI
+
+## REQ-002: Full-dataset robustness as secondary section
+
+**What**: §IV-K (new) reports the full-dataset (686 CPA) version of the same analyses as a robustness check, demonstrating the pipeline runs at multiple scopes and explaining why the published v3.x 0.945 threshold drifted (mid/small-firm tail heterogeneity).
+
+**Acceptance**:
+- §IV-K table comparing Big-4-only vs full-dataset crossings, with mid/small-firm contribution analysis
+- Explicit explanation of why Big-4 is the methodologically privileged primary scope
+
+## REQ-003: Methodology rewrite (§III-G / I / J / L)
+
+**What**: Sections III-G (unit hierarchy / scope), III-I (threshold estimators), III-J (accountant-level GMM), III-L (per-document classifier rule) rewritten to reflect dip-test confirmed bimodality and the new K=2-derived classifier rule.
+
+**Acceptance**:
+- §III-G justifies Big-4 as the methodological unit (sample size, homogeneity, dip-test evidence)
+- §III-I anchored on bootstrap-stable bimodal evidence rather than three-method convergence on unimodal data
+- §III-J reports K=2 as primary (interpretable: replicated vs hand-leaning) and K=3 as secondary, although BIC slightly prefers K=3 (-1111.93 vs -1108.45)
+- §III-L derives operational rule from Big-4 K=2 components and bootstrap CI
+
+## REQ-004: Results tables IV-XVIII regenerated
+
+**What**: All results tables in §IV (currently Tables IV through XVIII at v3.20.0) regenerated on the Big-4 subset with consistent formatting and footnote citation to source script.
+
+**Acceptance**:
+- Each table cites the script + DB query that generated it
+- Big-4 numbers replace full-dataset numbers as primary; full-dataset relegated to §IV-K
+- Figures 1-3 regenerated; Fig 4 (yearly per-firm) likely reusable as-is
+
+## REQ-005: Firm A reframed as templated case study
+
+**What**: Throughout the manuscript, Firm A's role pivots from "calibration anchor (with minority hand-signers)" to "case study of the templated end of Big-4 (0% in K=3 hand-sign-leaning cluster, 82.5% in replicated cluster)". PwC's higher hand-sign tradition (24/102 = 23.5% in C1) noted as a Big-4 internal contrast.
+
+**Acceptance**:
+- Discussion (§V) explicitly states that Firm A is the most digitally replicated of the Big-4 firms
+- Cross-tab table (firm × cluster) included in either §IV or §V
+- Conclusion's contributions list updated accordingly
+
+## REQ-006: AI peer review (≥3 rounds)
+
+**What**: At least three cross-AI peer-review rounds on the v4.0 manuscript using codex (GPT-5.x), Gemini 3.x Pro, and Opus 4.7 max effort. Per `[[feedback-ai-review-provenance]]` memory: every reviewer-flagged empirical claim must be provenance-verified with a fresh sqlite query or grep against the named script.
+
+**Acceptance**:
+- Round 1 verdict obtained from each of the three reviewers
+- All Major-class findings either RESOLVED in revision or explicitly disclaimed
+- Final round produces an Accept / Minor verdict from at least 2 of 3 reviewers
+
+## REQ-007: Partner Jimmy second review on v4.0
+
+**What**: Jimmy (who proposed Big-4-only direction) reviews the v4.0 manuscript end-to-end before submission.
+
+**Acceptance**:
+- v4.0 DOCX shipped to ~/Downloads
+- Jimmy's response captured in repo (paper/partner_jimmy_v4_review.md)
+- Any must-fix items resolved in v4.0.x
+
+## REQ-008: iThenticate + eCF + submission
+
+**What**: iThenticate similarity check below 20%, IEEE eCF copyright form completed, manuscript uploaded via IEEE Access submission portal with cover letter.
+
+**Acceptance**:
+- iThenticate report saved under `paper/ithenticate_v4.pdf`
+- eCF confirmation captured
+- Submission portal confirmation number recorded in PROJECT.md "Validated" section
+
+## Cross-cutting constraints
+
+- **Reproducibility**: every script accepts a `--scope=big4|full` flag (or new scripts are added under `signature_analysis/v4_*` if a flag refactor is too invasive)
+- **Provenance**: every numeric claim in the paper traces to (script_id, DB query, output file) — see `[[feedback-provenance-fabrication]]`
+- **No data re-ingest**: existing `/Volumes/NV2/PDF-Processing/signature-analysis/signature_analysis.db` is the frozen snapshot
+- **Branch isolation**: all v4.0 work on `paper-a-v4-big4`; do NOT merge back to `yolo-signature-pipeline` until v4.0 is partner-approved
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
new file mode 100644
index 0000000..aae473c
--- /dev/null
+++ b/.planning/ROADMAP.md
@@ -0,0 +1,87 @@
+# Roadmap — Paper A v4.0 Big-4 reframe
+
+Milestone goal: Ship Paper A v4.0 to IEEE Access with Big-4-only primary scope, dip-test confirmed bimodality, and full-dataset robustness as secondary.
+
+Branch: `paper-a-v4-big4` (from `from-outside-of-firmA` from `yolo-signature-pipeline` at v3.20.0).
+
+## Phase 1 — Foundation: Big-4 subset script reruns
+**Status**: pending
+**Requirements covered**: REQ-001
+**Tasks**:
+- Add `--scope=big4|full` flag to scripts 19, 20, 21, 24, 25 (and to any other scripts that load accountant aggregates)
+- Rerun on Big-4 subset; outputs to `reports/v4_big4/`
+- Bootstrap 95% CI on K=2 marginal crossings (extend Script 34's bootstrap to other measures)
+- Confirm dip-test p < 0.05 on Big-4 cos_mean and dh_mean (Script 34 already verified at p<0.0001 — replicate inside the rerun harness for audit trail)
+
+**Done when**: All five scripts produce v4_big4 outputs with bootstrap CI, and the numbers have been cross-checked against Script 34.
+
+## Phase 2 — Methodology rewrite (§III-G / I / J / L)
+**Status**: pending; depends on Phase 1
+**Requirements covered**: REQ-003
+**Tasks**:
+- §III-G: re-justify accountant-level Big-4 as the analysis unit (sample size, dip-test evidence, contrast with mid/small heterogeneity)
+- §III-I: re-anchor "natural threshold" claim on dip-test multimodality + bootstrap stability
+- §III-J: K=2 primary (replicated 31% / hand-leaning 69%) + K=3 secondary (BIC -1111.93 vs -1108.45)
+- §III-L: derive cos>0.975 AND dh≤3.76 (or K=2 posterior cut) from §III-J components
+
+**Done when**: §III markdown files updated; cross-references to Phase 1 outputs are correct.
+
+## Phase 3 — Results regeneration (§IV Tables IV-XVIII + §IV-K)
+**Status**: pending; depends on Phase 1 and 2
+**Requirements covered**: REQ-001 (tables), REQ-002 (§IV-K), REQ-004
+**Tasks**:
+- Regenerate Tables IV through XVIII on Big-4 subset (relabel as v4 numbering if order shifts)
+- Regenerate Figures 1-3 (Fig 4 yearly per-firm likely reusable)
+- New §IV-K Full-Dataset Robustness section: comparison table (Big-4 vs full), mid/small-firm contribution, why scope matters
+- Add firm × cluster cross-tab table from Script 35
+
+**Done when**: All §IV tables and figures land in repo; cross-refs from §III hold.
+
+## Phase 4 — Prose rewrite (Abstract / I / II / V / VI)
+**Status**: pending; depends on Phase 3
+**Requirements covered**: REQ-005
+**Tasks**:
+- Abstract: new threshold, new scope, retain the "reproducible pipeline" frame
+- §I Introduction: contributions list updated (Firm A reframe, Big-4 internal contrast finding, dip-test natural threshold)
+- §II Related Work: minimal changes (statistical methodology citations stable)
+- §V Discussion: Firm A as templated case study, PwC as hand-sign-leading firm, what this implies
+- §VI Conclusion + Future Work: forecast Paper B (audit behaviour / policy)
+
+**Done when**: All prose markdown files updated; word counts within IEEE Access limits (Abstract ≤ 250 words).
+
+## Phase 5 — AI peer review (3 rounds across codex, Gemini, Opus)
+**Status**: pending; depends on Phase 4 (manuscript-complete state)
+**Requirements covered**: REQ-006
+**Tasks**:
+- Round 1: codex (GPT-5.x) — full manuscript review with provenance verification
+- Round 1: Gemini 3.x Pro — full manuscript review
+- Round 1: Opus 4.7 max-effort — full manuscript review
+- Round 2: address Major findings; same three reviewers cross-check
+- Round 3: convergence — Accept / Minor from at least 2 of 3 reviewers
+
+**Done when**: Final round produces Accept/Minor consensus from majority; reviewer artifacts saved under `paper/`.
+
+## Phase 6 — Partner Jimmy v4.0 review
+**Status**: pending; depends on Phase 5
+**Requirements covered**: REQ-007
+**Tasks**:
+- Export v4.0 DOCX (`paper/export_v3.py` + author block fill)
+- Ship to ~/Downloads
+- Iterate on Jimmy's comments
+- Capture review artifact in `paper/partner_jimmy_v4_review.md`
+
+**Done when**: Jimmy approves v4.0.
+
+## Phase 7 — iThenticate + eCF + IEEE Access submission
+**Status**: pending; depends on Phase 6
+**Requirements covered**: REQ-008
+**Tasks**:
+- Run iThenticate, target similarity < 20%
+- Complete IEEE eCF
+- Upload manuscript + cover letter via IEEE Access submission portal
+- Capture confirmation number
+
+**Done when**: Submission confirmed by IEEE Access portal.
+
+---
+*Phase ordering: 1 → 2 → 3 → 4 → 5 → 6 → 7 (mostly linear; Phase 5 round-2 may loop back to Phase 4 prose if Major findings).*
diff --git a/.planning/STATE.md b/.planning/STATE.md
new file mode 100644
index 0000000..70da4b1
--- /dev/null
+++ b/.planning/STATE.md
@@ -0,0 +1,37 @@
+# STATE — Current snapshot
+
+**Date**: 2026-05-12
+**Active milestone**: Paper A v4.0 — Big-4 reframe
+**Active branch**: `paper-a-v4-big4` (3 commits ahead of `yolo-signature-pipeline`)
+**Active phase**: Phase 1 — Foundation: Big-4 subset script reruns (not yet started)
+
+## Recently completed (preceding this milestone)
+
+- Paper A v3.20.0 reviewed by partner Jimmy (2026-04-27) and shipped as DOCX (2026-05-05): `~/Downloads/Paper_A_IEEE_Access_Draft_v3.20.0_20260505.docx`
+- Spike Scripts 32-35 (commits `e1d81e3` `8ac0988` `55f9f94`) confirming Big-4-only scope is methodologically superior:
+  - Script 32: non-Firm-A calibration verdict C (negative, but with the bifurcation twist)
+  - Script 33: reverse-anchor PAPER_C_STRONG (rho=+0.744 directional / -0.927 bifurcation)
+  - Script 34: Big-4-only K=2 with dip-test multimodal p<0.0001, bootstrap CI [0.974, 0.977] / [3.48, 3.97]
+  - Script 35: firm × cluster cross-tab — Firm A 0% C1 / 82.5% C3, PwC 23.5% C1
+
+## Pending — Phase 1 entry
+
+- [ ] Refactor scripts 19, 20, 21, 24, 25 to accept `--scope=big4|full` flag
+- [ ] Define `reports/v4_big4/` output convention
+- [ ] Decide whether to retire the Scripts 32-35 spikes or keep them as historical artifacts (recommend: keep, treated as "v4.0 origin evidence")
+
+## Blockers
+
+None.
+
+## Open questions deferred from spike
+
+- Bootstrap stability of cosine and dHash crossings *jointly* (not just marginally) — addressed in Phase 1 if time permits
+- K=2 vs K=3 final choice for §III-J — both reported, but operational classifier needs to commit to one (recommend K=2 for interpretability; K=3 in supplementary)
+
+## Things to remember (per memory)
+
+- Provenance-verify all empirical claims against fresh sqlite/grep ([[feedback-provenance-fabrication]])
+- Don't mock the DB or use placeholders — every number must trace to a script + query
+- Partner Jimmy already proposed Big-4 direction (this is execution, not pitching a new direction)
+- Paper C standalone is shelved — folded into v4.0 §IV-K