diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md new file mode 100644 index 0000000..bbe3e08 --- /dev/null +++ b/.planning/PROJECT.md @@ -0,0 +1,74 @@ +# Taiwan TWSE CPA Signature Authentication + +## What This Is + +A computer-vision research pipeline that classifies whether the CPA signatures appearing on Taiwan TWSE-listed-company financial reports are hand-signed (親簽) or non-hand-signed (非親簽 — early-period rubber-stamp / scan, or post-2020 firm-level electronic signature systems). The pipeline ingests ~90k PDFs (2013-2023), detects ~182k signatures with YOLOv11n, embeds them with ResNet-50 (ImageNet1K_V2, no fine-tune), and characterises distributional structure with cosine + independent dHash descriptors. Target: a peer-reviewed publication (IEEE Access, A/6 on the NCKU CSIE journal list). + +## Core Value + +A statistically defensible, **reproducible** thresholding methodology that distinguishes hand-signed from digitally-replicated CPA signatures at the population level, with traceable evidence at every step (DB → script → table → paper claim). 
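The cosine + dHash dual descriptor named under "What This Is" can be illustrated with a minimal, stdlib-only sketch. This is not the pipeline's code: the 8x9 nearest-neighbour resample, the function names, and the use of Hamming distance to compare hashes are assumptions made for illustration; the source only states that a cosine descriptor and an independent dHash descriptor are used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (assumes neither is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dhash_bits(gray, hash_size=8):
    """Difference hash of a grayscale image given as a 2D list of intensities.

    The image is resampled (nearest neighbour, an assumption) to hash_size rows
    by hash_size + 1 columns; each bit records whether a pixel is brighter than
    its right-hand neighbour.
    """
    rows, cols = len(gray), len(gray[0])
    small = [
        [gray[r * rows // hash_size][c * cols // (hash_size + 1)]
         for c in range(hash_size + 1)]
        for r in range(hash_size)
    ]
    return [int(row[c] > row[c + 1]) for row in small for c in range(hash_size)]


def hamming(bits_a, bits_b):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(bits_a, bits_b))
```

Under these assumptions, two scans of the same stamped signature would be expected to show cosine similarity near 1 and a small Hamming distance, whereas two genuinely hand-signed instances would drift on both measures.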
+ +## Requirements + +### Validated + + + +- ✓ End-to-end pipeline (TWSE MOPS scrape → Qwen2.5-VL prefilter → YOLO detection → ResNet embedding → DB + descriptors) — `signature_analysis/01-19` +- ✓ Independent dHash descriptor for replication detection — Script 14 (v3.x baseline) +- ✓ Accountant-level 3-component GMM characterisation — Script 18/20 (v3.x baseline) +- ✓ Paper A v3.20.0 manuscript (full-dataset framing, partner Jimmy 2026-04-27 substantive review accepted, codex 3-pass verification clean) — commit `53125d1` on `yolo-signature-pipeline` +- ✓ Spike scripts 32-35 confirming Big-4-only scope is methodologically superior — commits `e1d81e3`, `8ac0988`, `55f9f94` on `paper-a-v4-big4` + +### Active + + + +**Milestone: Paper A v4.0 — Big-4 reframe (primary scope) + full-dataset robustness (secondary)** + +- [ ] Foundation: rerun core scripts on Big-4 subset with `--scope=big4` flag (`/scripts 19, 20, 21, 24, 25`) +- [ ] Methodology rewrite: §III-G/I/J/L re-anchored on dip-test confirmed bimodality and bootstrap-stable Big-4 K=2 GMM (cos=0.975, dh=3.76) +- [ ] Results tables: regenerate Tables IV-XVIII on Big-4 subset; new §IV-K full-dataset secondary +- [ ] Prose rewrite: Abstract / Intro / Discussion / Conclusion with Firm A reframed as "templated end of Big-4" case study (was: hand-signed calibration anchor) +- [ ] AI peer review: ≥3 cross-AI rounds (codex, Gemini 3.x Pro, Opus 4.7) on the v4.0 manuscript +- [ ] Partner Jimmy second review on v4.0 (he proposed this direction; needs sign-off on execution) +- [ ] iThenticate <20%, eCF copyright form, IEEE Access submission portal upload + cover letter + +### Out of Scope + + + +- **Paper B (audit behaviour / policy implications)** — partner v4 contribution D, deferred to a separate paper after Paper A ships +- **Paper C standalone (reverse-anchor methodology)** — initial 2026-05-12 spike direction, **folded back into Paper A v4.0 §IV-K** as one robustness lens; does not warrant a separate manuscript +- 
**Mid/small-firm primary scope** — included as full-dataset secondary only; primary scope is Big-4 because the dip test detects multimodality only at the Big-4 level +- **Per-document classifier release as software product** — paper-only deliverable; no API / SaaS layer in scope +- **VLM behavioural interview / IRB study** — removed in v3.4; not coming back + +## Context + +- **Domain**: Taiwan-listed CPA audit signatures, 2013-2023; 4 Big-4 firms (勤業眾信 Deloitte, 安侯建業 KPMG, 資誠 PwC, 安永 EY) + ~30 mid/small firms +- **Hardware split**: YOLO + ResNet on RTX 4090 (CUDA, deterministic forward inference, fixed seed); statistical analysis on Apple Silicon MPS / CPU +- **Domain expert**: User has practitioner-level CPA-firm knowledge in Taiwan; recognises specific senior-partner names (e.g., 薛明玲 / 周建宏 are known PwC seniors that surfaced in Script 35's C1 cluster) +- **Partner**: Working with partner Jimmy; Jimmy proposed the Big-4-only direction and is the trigger for v4.0 + +## Constraints + +- **Target journal**: IEEE Access (A/6 on NCKU CSIE list); fits Computer-Vision-applied-to-Audit scope +- **Timeline**: v3.20.0 was already partner-reviewed and DOCX-shipped (2026-05-05). 
v4.0 reframe will delay submission by ~4-6 weeks but produces a stronger manuscript; partner Jimmy is aware and supportive +- **Reproducibility**: pipeline must run end-to-end on the existing `/Volumes/NV2/PDF-Processing/signature-analysis/signature_analysis.db` snapshot; no new data ingest in scope +- **AI review provenance**: every empirical claim must be backed by a fresh sqlite/grep against the named script — see `[[feedback-provenance-fabrication]]` memory; Gemini round-19 caught 4 fabricated provenance claims previously + +## Key Decisions + +| Decision | Rationale | Outcome | +|----------|-----------|---------| +| Use ResNet-50 ImageNet1K_V2 without fine-tune | Reproducibility; avoid label leakage from fine-tuning on the same corpus | ✓ Validated through v3.x | +| Cosine + independent dHash dual descriptor | Cosine catches semantic similarity; independent dHash catches byte-level replication | ✓ Validated | +| Drop SSIM / pixel-pHash from descriptor set | Reviewer-rejected as redundant / fragile | ✓ v3.x rewrite | +| Drop A2 within-year uniformity assumption | Empirically falsified by Script 27 | ✓ v3.14 | +| **Reframe scope to Big-4 only as primary** | Dip-test multimodal only at Big-4 level (p<0.0001); mid/small noise distorted Paper A v3.x's published 0.945/8.10 threshold; partner Jimmy's earlier suggestion empirically confirmed by Scripts 32-35 | — Pending v4.0 | +| Reverse-anchor Paper C → folded into v4.0 §IV-K | Big-4 reframe is the stronger story; reverse-anchor is one of several lenses on the same data, not a standalone paper | ✓ Decided 2026-05-12 | +| Branch strategy: `paper-a-v4-big4` from `from-outside-of-firmA` from `yolo-signature-pipeline` | Spike artifacts (Scripts 32-35) stay on the spike branch; v4.0 paper work isolated on its own sub-branch; v3.20.0 preserved on yolo-signature-pipeline as fallback | ✓ Decided 2026-05-12 | + +--- +*Last updated: 2026-05-12 after Paper A v4.0 Big-4 reframe milestone bootstrap* diff --git 
a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md new file mode 100644 index 0000000..4d447cb --- /dev/null +++ b/.planning/REQUIREMENTS.md @@ -0,0 +1,85 @@ +# Requirements — Paper A v4.0 (Big-4 reframe) + +Milestone: Paper A v4.0 IEEE Access submission with Big-4-only primary scope and full-dataset secondary robustness. + +## REQ-001: Big-4-only primary scope (foundation) + +**What**: All primary statistical analysis (KDE+dip, BD/McCrary, Beta mixture, 2D-GMM K=2/K=3, pixel-identity FAR, held-out 70/30 z-test, classifier sensitivity) is rerun on the 437-CPA Big-4 subset (Firm A + KPMG + PwC + EY, n_signatures ≥ 10). + +**Acceptance**: +- Script 20 rerun on Big-4 subset, dip-test p < 0.05 on cos_mean and dh_mean +- Script 21 (held-out validation) rerun on Big-4 subset +- Script 24 (calibration vs held-out z-test, classifier sensitivity) rerun on Big-4 subset +- Script 19 (pixel-identity / FAR) rerun on Big-4 subset +- All rerun outputs land under `reports/v4_big4/` +- New operational threshold cos > 0.975 AND dh ≤ 3.76 (or refined K=2 posterior) documented with bootstrap 95% CI + +## REQ-002: Full-dataset robustness as secondary section + +**What**: §IV-K (new) reports the full-dataset (686 CPA) version of the same analyses as a robustness check, demonstrating the pipeline runs at multiple scopes and explaining why the published v3.x 0.945 threshold drifted (mid/small-firm tail heterogeneity). + +**Acceptance**: +- §IV-K table comparing Big-4-only vs full-dataset crossings, with mid/small-firm contribution analysis +- Explicit explanation of why Big-4 is the methodologically privileged primary scope + +## REQ-003: Methodology rewrite (§III-G / I / J / L) + +**What**: Sections III-G (unit hierarchy / scope), III-I (threshold estimators), III-J (accountant-level GMM), III-L (per-document classifier rule) rewritten to reflect dip-test confirmed bimodality and the new K=2-derived classifier rule. 
+ +**Acceptance**: +- §III-G justifies Big-4 as the methodological unit (sample size, homogeneity, dip-test evidence) +- §III-I anchored on bootstrap-stable bimodal evidence rather than three-method convergence on unimodal data +- §III-J reports K=2 as primary (interpretable: replicated vs hand-leaning) with K=3 BIC slightly preferred (-1112 vs -1108) as secondary +- §III-L derives operational rule from Big-4 K=2 components and bootstrap CI + +## REQ-004: Results tables IV-XVIII regenerated + +**What**: All results tables in §IV (currently Tables IV through XVIII at v3.20.0) regenerated on the Big-4 subset with consistent formatting and footnote citation to source script. + +**Acceptance**: +- Each table cites the script + DB query that generated it +- Big-4 numbers replace full-dataset numbers as primary; full-dataset relegated to §IV-K +- Figures 1-4 regenerated; Fig 4 (yearly per-firm) likely reusable as-is + +## REQ-005: Firm A reframed as templated case study + +**What**: Throughout the manuscript, Firm A's role pivots from "calibration anchor (with minority hand-signers)" to "case study of the templated end of Big-4 (0% in K=3 hand-sign-leaning cluster, 82.5% in replicated cluster)". PwC's higher hand-sign tradition (24/102 = 23.5% in C1) noted as a Big-4 internal contrast. + +**Acceptance**: +- Discussion (§V) explicitly states Firm A is the most digitally-replicated of Big-4 +- Cross-tab table (firm × cluster) included in either §IV or §V +- Conclusion's contributions list updated accordingly + +## REQ-006: AI peer review (≥3 rounds) + +**What**: At least three cross-AI peer-review rounds on the v4.0 manuscript using codex (GPT-5.x), Gemini 3.x Pro, and Opus 4.7 max effort. Per `[[feedback-ai-review-provenance]]` memory: every reviewer-flagged empirical claim must be provenance-verified against fresh sqlite/grep against the named script. 
+ +**Acceptance**: +- Round 1 verdict obtained from each of the three reviewers +- All Major-class findings either RESOLVED in revision or explicitly disclaimed +- Final round produces an Accept or Minor verdict from at least 2 of 3 reviewers + +## REQ-007: Partner Jimmy second review on v4.0 + +**What**: Jimmy (who proposed the Big-4-only direction) reviews the v4.0 manuscript end-to-end before submission. + +**Acceptance**: +- v4.0 DOCX shipped to ~/Downloads +- Jimmy's response captured in repo (paper/partner_jimmy_v4_review.md) +- Any must-fix items resolved in v4.0.x + +## REQ-008: iThenticate + eCF + submission + +**What**: iThenticate similarity check below 20%, IEEE eCF copyright form completed, manuscript uploaded via IEEE Access submission portal with cover letter. + +**Acceptance**: +- iThenticate report saved under `paper/ithenticate_v4.pdf` +- eCF confirmation captured +- Submission portal confirmation number recorded in PROJECT.md "Validated" section + +## Cross-cutting constraints + +- **Reproducibility**: every script accepts a `--scope=big4|full` flag (or new scripts under `signature_analysis/v4_*` if a flag refactor is too invasive) +- **Provenance**: every numeric claim in the paper traces to (script_id, DB query, output file) — see `[[feedback-provenance-fabrication]]` +- **No data re-ingest**: existing `/Volumes/NV2/PDF-Processing/signature-analysis/signature_analysis.db` is the frozen snapshot +- **Branch isolation**: all v4.0 work on `paper-a-v4-big4`; do NOT merge back to `yolo-signature-pipeline` until v4.0 is partner-approved diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md new file mode 100644 index 0000000..aae473c --- /dev/null +++ b/.planning/ROADMAP.md @@ -0,0 +1,87 @@ +# Roadmap — Paper A v4.0 Big-4 reframe + +Milestone goal: Ship Paper A v4.0 to IEEE Access with Big-4-only primary scope, dip-test confirmed bimodality, and full-dataset robustness as secondary. 
+ +Branch: `paper-a-v4-big4` (from `from-outside-of-firmA` from `yolo-signature-pipeline` at v3.20.0). + +## Phase 1 — Foundation: Big-4 subset script reruns +**Status**: pending +**Requirements covered**: REQ-001 +**Tasks**: +- Add `--scope=big4|full` flag to scripts 19, 20, 21, 24, 25 (and harness any others that load accountant aggregates) +- Rerun on Big-4 subset; outputs to `reports/v4_big4/` +- Bootstrap 95% CI on K=2 marginal crossings (extend Script 34's bootstrap to other measures) +- Confirm dip-test p < 0.05 on Big-4 cos_mean and dh_mean (Script 34 already verified at p<0.0001 — replicate inside the rerun harness for audit trail) + +**Done when**: All five scripts produce v4_big4 outputs with bootstrap CI; cross-check against Script 34 numbers. + +## Phase 2 — Methodology rewrite (§III-G / I / J / L) +**Status**: pending; depends on Phase 1 +**Requirements covered**: REQ-003 +**Tasks**: +- §III-G: re-justify accountant-level Big-4 as the analysis unit (sample size, dip-test evidence, contrast with mid/small heterogeneity) +- §III-I: re-anchor "natural threshold" claim on dip-test multimodality + bootstrap stability +- §III-J: K=2 primary (replicated 31% / hand-leaning 69%) + K=3 secondary (BIC -1111.93 vs -1108.45) +- §III-L: derive cos>0.975 AND dh≤3.76 (or K=2 posterior cut) from §III-J components + +**Done when**: §III markdown files updated; cross-references to Phase 1 outputs are correct. 
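The §III-L rule quoted in Phase 2 (cos>0.975 AND dh≤3.76) can be written down as a small predicate. This is a sketch under stated assumptions, not the manuscript's classifier: the `DescriptorPair` type and `is_replicated` name are hypothetical, and reading the conjunction as flagging the replicated (non-hand-signed) side is an inference from high similarity plus low hash distance, which the roadmap does not state explicitly. Whether the rule is applied per document or per accountant is left open here.

```python
from dataclasses import dataclass

# Cut-offs quoted in the roadmap for the Big-4 K=2 fit; a refined K=2
# posterior cut, if adopted, would replace them.
COS_CUT = 0.975
DH_CUT = 3.76


@dataclass(frozen=True)
class DescriptorPair:
    """Hypothetical container for the two descriptors of one unit."""
    cos: float  # cosine similarity
    dh: float   # dHash distance


def is_replicated(pair, cos_cut=COS_CUT, dh_cut=DH_CUT):
    """True when both conditions hold: cos strictly above, dh at or below the cut."""
    return pair.cos > cos_cut and pair.dh <= dh_cut
```

Keeping the cut-offs as parameters makes it straightforward to rerun the rule at the ends of a bootstrap interval once Phase 1 produces one.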
+ +## Phase 3 — Results regeneration (§IV Tables IV-XVIII + §IV-K) +**Status**: pending; depends on Phase 1 and 2 +**Requirements covered**: REQ-001 (tables), REQ-002 (§IV-K), REQ-004 +**Tasks**: +- Regenerate Tables IV through XVIII on Big-4 subset (relabel as v4 numbering if order shifts) +- Regenerate Figures 1-3 (Fig 4 yearly per-firm likely reusable) +- New §IV-K Full-Dataset Robustness section: comparison table (Big-4 vs full), mid/small-firm contribution, why scope matters +- Add firm × cluster cross-tab table from Script 35 + +**Done when**: All §IV tables and figures land in repo; cross-refs from §III hold. + +## Phase 4 — Prose rewrite (Abstract / I / II / V / VI) +**Status**: pending; depends on Phase 3 +**Requirements covered**: REQ-005 +**Tasks**: +- Abstract: new threshold, new scope, retain the "reproducible pipeline" frame +- §I Introduction: contributions list updated (Firm A reframe, Big-4 internal contrast finding, dip-test natural threshold) +- §II Related Work: minimal changes (statistical methodology citations stable) +- §V Discussion: Firm A as templated case study, PwC as hand-sign-leading firm, what this implies +- §VI Conclusion + Future Work: forecast Paper B (audit behaviour / policy) + +**Done when**: All prose markdown files updated; word counts within IEEE Access limits (Abstract ≤ 250 words). + +## Phase 5 — AI peer review (3 rounds across codex, Gemini, Opus) +**Status**: pending; depends on Phase 4 (manuscript-complete state) +**Requirements covered**: REQ-006 +**Tasks**: +- Round 1: codex (GPT-5.x) — full manuscript review with provenance verification +- Round 1: Gemini 3.x Pro — full manuscript review +- Round 1: Opus 4.7 max-effort — full manuscript review +- Round 2: address Major findings; same three reviewers cross-check +- Round 3: convergence — Accept / Minor from at least 2 of 3 reviewers + +**Done when**: Final round produces Accept/Minor consensus from majority; reviewer artifacts saved under `paper/`. 
+ +## Phase 6 — Partner Jimmy v4.0 review +**Status**: pending; depends on Phase 5 +**Requirements covered**: REQ-007 +**Tasks**: +- Export v4.0 DOCX (`paper/export_v3.py` + author block fill) +- Ship to ~/Downloads +- Iterate on Jimmy's comments +- Capture review artifact in `paper/partner_jimmy_v4_review.md` + +**Done when**: Jimmy approves v4.0. + +## Phase 7 — iThenticate + eCF + IEEE Access submission +**Status**: pending; depends on Phase 6 +**Requirements covered**: REQ-008 +**Tasks**: +- Run iThenticate, target similarity < 20% +- Complete IEEE eCF +- Upload manuscript + cover letter via IEEE Access submission portal +- Capture confirmation number + +**Done when**: Submission confirmed by IEEE Access portal. + +--- +*Phase ordering: 1 → 2 → 3 → 4 → 5 → 6 → 7 (mostly linear; Phase 5 round-2 may loop back to Phase 4 prose if Major findings).* diff --git a/.planning/STATE.md b/.planning/STATE.md new file mode 100644 index 0000000..70da4b1 --- /dev/null +++ b/.planning/STATE.md @@ -0,0 +1,37 @@ +# STATE — Current snapshot + +**Date**: 2026-05-12 +**Active milestone**: Paper A v4.0 — Big-4 reframe +**Active branch**: `paper-a-v4-big4` (3 commits ahead of `yolo-signature-pipeline`) +**Active phase**: Phase 1 — Foundation: Big-4 subset script reruns (not yet started) + +## Recently completed (preceding this milestone) + +- Paper A v3.20.0 shipped to partner Jimmy 2026-04-27, DOCX `~/Downloads/Paper_A_IEEE_Access_Draft_v3.20.0_20260505.docx` +- Spike Scripts 32-35 (commits `e1d81e3` `8ac0988` `55f9f94`) confirming Big-4-only scope is methodologically superior: + - Script 32: non-Firm-A calibration verdict C (negative, but with the bifurcation twist) + - Script 33: reverse-anchor PAPER_C_STRONG (rho=+0.744 directional / -0.927 bifurcation) + - Script 34: Big-4-only K=2 with dip-test multimodal p<0.0001, bootstrap CI [0.974, 0.977] / [3.48, 3.97] + - Script 35: firm × cluster cross-tab — Firm A 0% C1 / 82.5% C3, PwC 23.5% C1 + +## Pending — Phase 1 entry + +- [ ] 
Refactor scripts 19, 20, 21, 24, 25 to accept `--scope=big4|full` flag +- [ ] Define `reports/v4_big4/` output convention +- [ ] Decide whether to retire Script 32-35 spikes or keep as historical artifacts (recommend: keep, treated as "v4.0 origin evidence") + +## Blockers + +None. + +## Open questions deferred from spike + +- Bootstrap stability of cosine and dHash crossings *jointly* (not just marginally) — addressed in Phase 1 if time permits +- K=2 vs K=3 final choice for §III-J — both reported, but operational classifier needs to commit to one (recommend K=2 for interpretability; K=3 in supplementary) + +## Things to remember (per memory) + +- Provenance-verify all empirical claims against fresh sqlite/grep ([[feedback-provenance-fabrication]]) +- Don't mock the DB or use placeholders — every number must trace to a script + query +- Partner Jimmy already proposed Big-4 direction (this is execution, not pitching a new direction) +- Paper C standalone is shelved — folded into v4.0 §IV-K
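The `--scope=big4|full` flag listed under "Pending — Phase 1 entry" could be added with a shared helper along these lines. This is a minimal sketch, not the repository's code: `build_parser`, `in_scope`, the firm labels in `BIG4_FIRMS`, and the `full` default are all assumptions; the actual firm identifiers live in the frozen DB snapshot.

```python
import argparse

# Hypothetical firm labels; the real identifiers come from the frozen DB.
BIG4_FIRMS = frozenset({"Deloitte", "KPMG", "PwC", "EY"})


def build_parser():
    """Parser fragment that each rerun script could share."""
    parser = argparse.ArgumentParser(description="Signature analysis rerun")
    parser.add_argument(
        "--scope",
        choices=("big4", "full"),
        default="full",  # assumption: keep the existing full-dataset behaviour by default
        help="restrict the analysis to Big-4 accountants or use the full dataset",
    )
    return parser


def in_scope(firm, scope):
    """Decide whether an accountant's firm belongs to the requested scope."""
    return scope == "full" or firm in BIG4_FIRMS
```

argparse accepts both `--scope=big4` and `--scope big4`, so either spelling used in these planning files would parse the same way.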