pdf_signature_extraction/.planning/STATE.md
gbanyan 9604b273c0 Apply codex round-7 Phase 5 copy-edit fixes + refresh STATE.md
Mechanical copy-edit closing the OPEN/PARTIAL items from
paper/codex_review_gpt55_v4_round7.md; substantive empirical
content unchanged. Manuscript-splice items (strip internal draft
notes, update stale abstract-count note) deferred to final splice.

- Phase 4 prose §V-G + §III-K methodology: "candidate classifiers"
  -> "candidate checks" (closes round-7 m13 + Spot-check 3 wording leak)
- Phase 4 prose §II: remove placeholder caveat sentence at the LOOO
  paragraph (closes round-7 M6 + A4)
- References v3: add [42] Stone 1974, [43] Geisser 1975, [44] Vehtari
  et al. 2017 (44 entries; was 41) — backs the §II LOOO addition
- Round-7 review: add row-count clarification note (11 Major / 15
  Minor labelled rows vs. the prompt's 9/12 tally)
- STATE.md: refresh from stale Phase-2 snapshot to current Phase 5
  status — Phases 1-4 complete; codex rounds 1-7 closed at Minor
  Revision; pending Gemini + Opus rounds + round-2/3 convergence

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:21:59 +08:00

# STATE — Current snapshot
**Date**: 2026-05-14

**Active milestone**: Paper A v4.0, Big-4 reframe

**Active branch**: `paper-a-v4-big4` (33 commits ahead of `master`; pushed to `origin/paper-a-v4-big4` at `980295d`)

**Active phase**: Phase 5, AI peer review (codex round 7 closed at Minor Revision; Gemini + Opus rounds + round-2/3 convergence still pending)
## Recently completed
**Phase 1 (Foundation, 9 spike scripts + 6 follow-on scripts)**:
- Script 32 (`e1d81e3`): non-Firm-A calibration verdict C
- Script 33 (`8ac0988`): reverse-anchor PAPER_C_STRONG (directional ρ=+0.744)
- Script 34 (`55f9f94`): Big-4 K=2 dip-test multimodal p<0.0001, bootstrap CI [0.974, 0.977] / [3.48, 3.97]
- Script 35 (`55f9f94`): firm × cluster — Firm A 0% C1 / 82.5% C3, PwC 23.5% C1
- Script 36 (`ccd9f23`): K=2 LOOO **UNSTABLE** (firm-mass conflation; max Δcos=0.028)
- Script 37 (`92f1db8`): K=3 LOOO **PARTIAL** (component shape stable, membership ±5-13pp)
- Script 38 (`bc36dcc`): convergence **STRONG** — 3 lenses pairwise ρ ≥ 0.879
- Script 39 (`39575ce`): per-signature convergence **MODERATE** — κ=0.87 between per-CPA and per-sig K=3 fits
- Script 40 (`338737d`): pixel-identity FAR = **0%** on n=262 ground-truth replicated
- Scripts 39b/c/d/e + 40b + 43 (`d4f370b`): anchor-based FAR diagnostics; composition decomposition proves Big-4 dh "multimodality" = between-firm shift + integer ties (p_median=0.35 under joint correction)
- Scripts 44 + 45 (`4cf21a6`): firm-matched-pool logistic regression; full 5-way doc FAR — Firms B/C/D OR 0.05/0.01/0.03 vs Firm A reference after pool-size adjustment
- Script 46 (`2f05d6f`): alert-rate sensitivity / threshold-plateau analysis (HC threshold locally sensitive; MC/HSC dHash=15 boundary plateau-like)
- Script 41 (`9392f30`): §IV-K full-dataset robustness comparison (Light)
- Script 42 (`453f1d8`): Phase 3 close-out support

**Phase 2 (Methodology rewrite)**: §III v7 delivered at `paper/v4/paper_a_methodology_v4_section_iii.md` (`723a3f6`). Anchor-based ICCR framework + composition-decomposition finding; covers §III-G through §III-M.

**Phase 3 (Results regeneration)**: §IV v3.3 delivered at `paper/v4/paper_a_results_v4_section_iv.md` (`980295d`). 12 sub-sections (A-L) + §IV-M added; Table XV filled; Big-4 reframe + light §IV-K full-dataset robustness; round-23/24/25/27 codex corrections applied; §IV-D/E framing softened; §IV-I renamed.

**Phase 4 (Prose rewrite)**: Abstract + §I + §II (LOOO addition) + §V + §VI delivered at `paper/v4/paper_a_prose_v4_phase4.md`. Prose v3 (`b33e20d`) rewritten to match §III v7. Abstract trimmed to 243 words (`918d551`). Round-25/26/27 codex corrections applied.

**v4 methodological pivot** (recorded for context): Distributional path to thresholds (K=3 / dip / antimode) shown composition-driven and abandoned; replaced by anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units (per-comparison / per-signature / per-document). "FAR" terminology replaced by "ICCR" throughout. K=3 demoted to descriptive firm-compositional partition. Positioning: specificity-proxy-anchored screening framework with human-in-the-loop review, **not** a validated forensic detector. Codex round-32 verdict: SOUND_WITH_QUALIFICATIONS, IEEE Access-publishable.
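
To make the three ICCR units concrete, here is a minimal sketch of how one threshold pair could be scored at the per-comparison, per-signature, and per-document levels. It rests on several assumptions that are not taken from the scripts: the `Signature` record and `iccr_three_units` helper are hypothetical, a pair is treated as "coincident" when cosine >= `cos_min` AND dHash <= `dhash_max`, and the thresholds are free parameters rather than the paper's calibrated values. As noted under "Things to remember", the result is a coincidence rate under the inter-CPA negative-anchor assumption, not a true FAR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature record: id, signing CPA, and source document."""
    sig_id: str
    cpa: str
    doc: str


def iccr_three_units(signatures, pairs, cos_min, dhash_max):
    """Inter-CPA coincidence rates at three units of analysis.

    pairs: iterable of (sig_id_a, sig_id_b, cosine, dhash).
    Assumed rule: a pair coincides when cosine >= cos_min AND dhash <= dhash_max.
    """
    by_id = {s.sig_id: s for s in signatures}
    n_compared = 0
    n_coincident = 0
    hit_sigs = set()
    for a, b, cos, dh in pairs:
        if by_id[a].cpa == by_id[b].cpa:
            continue  # same-CPA pairs are excluded from the inter-CPA anchor
        n_compared += 1
        if cos >= cos_min and dh <= dhash_max:
            n_coincident += 1
            hit_sigs.update((a, b))
    hit_docs = {by_id[s].doc for s in hit_sigs}
    all_docs = {s.doc for s in signatures}
    return {
        # share of inter-CPA comparisons that coincide
        "per_comparison": n_coincident / n_compared if n_compared else 0.0,
        # share of signatures with at least one coincident inter-CPA partner
        "per_signature": len(hit_sigs) / len(signatures),
        # share of documents containing at least one such signature
        "per_document": len(hit_docs) / len(all_docs),
    }


# Toy data (hypothetical): four signatures by three CPAs across three documents.
sigs = [
    Signature("s1", "cpa_x", "d1"),
    Signature("s2", "cpa_y", "d2"),
    Signature("s3", "cpa_z", "d2"),
    Signature("s4", "cpa_x", "d3"),
]
pairs = [
    ("s1", "s2", 0.98, 3),   # inter-CPA, coincident
    ("s1", "s3", 0.90, 20),  # inter-CPA, not coincident
    ("s2", "s3", 0.97, 12),  # inter-CPA, coincident
    ("s1", "s4", 0.99, 1),   # same CPA, skipped
]
result = iccr_three_units(sigs, pairs, cos_min=0.95, dhash_max=15)
print(result)
```

The three units answer different questions: the per-comparison rate describes single pairs, while the per-signature and per-document rates describe how often a reviewer would actually be alerted. Because a signature or document is flagged on any coincidence, the coarser units can only stay equal or grow relative to the pairwise rate for the same pool.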
## Phase 5 — AI peer review (in progress)
**Codex (GPT-5.5) rounds 1-7 complete** on the v4 drafts. Round-7 verdict (2026-05-12): **Minor Revision**. All round-26 Major findings closed substantively; remaining blockers were packaging/copy-edit only.

**Mechanical copy-edit pass applied 2026-05-14** (uncommitted on `paper-a-v4-big4`):
- Replaced residual "candidate classifiers" wording with "candidate checks" in §V-G prose (line 107) and §III-K methodology (line 149 + table heading)
- Added references [42] Stone 1974, [43] Geisser 1975, [44] Vehtari et al. 2017 to `paper/paper_a_references_v3.md` (now 44 entries)
- Removed §II placeholder caveat sentence at Phase 4 prose line 67

**Pending Phase 5 work**:
1. Gemini 3.x Pro full-manuscript review on v4 drafts (round 1)
2. Opus 4.7 max-effort full-manuscript review on v4 drafts (round 1)
3. Round-2 cross-check across the three reviewers (address any new Major findings)
4. Round-3 convergence — Accept/Minor consensus from ≥2 of 3 reviewers
5. Manuscript-splice copy-edit (strip the internal draft notes at line 3 of all three v4 files, the Phase 4 close-out checklist at lines 153-162, and the §IV close-out checklist at line 365+; update the stale abstract-count note)
## Pending — partner-side decision still open
2026-05-13 partner alignment: the partner wants a high-specificity threshold and asked whether firm heterogeneity could be framed as "statistically insignificant." Decision: **no**. Heterogeneity is highly significant (40-62σ in the logistic regression), and v4 frames it explicitly as decisive firm heterogeneity. Confirm this framing with the partner before Phase 6.
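
To show why a "statistically insignificant" framing is untenable at this scale, here is a small sketch of the Wald arithmetic for a single logistic-regression coefficient. It assumes the σ figures above are Wald z statistics (coefficient divided by its standard error); the `wald_summary` helper and the input numbers are illustrative only and are not the Script 44/45 estimates.

```python
import math


def wald_summary(beta, se, z_crit=1.96):
    """Wald statistics for one logistic-regression coefficient on the log-odds scale."""
    z = beta / se
    # Two-sided normal p-value; for |z| near 40 this underflows to 0.0 in double precision.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    odds_ratio = math.exp(beta)
    # Approximate 95% CI on the odds-ratio scale (exponentiated Wald interval).
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    return z, p_two_sided, odds_ratio, ci


# Illustrative numbers only: a log-odds coefficient of -2.0 with SE 0.05.
z, p, odds_ratio, ci = wald_summary(beta=-2.0, se=0.05)
print(f"z = {z:.1f}, p = {p}, OR = {odds_ratio:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

At |z| around 40 the two-sided p-value is far below the smallest representable double, so it prints as 0.0; that is one reason the σ (z) figure is the more informative summary to quote.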
## Blockers
None on the critical path. Phase 5 substantive empirical work is complete; remaining tasks are reviewer rounds + manuscript-splice copy-edit.
## Things to remember (per memory)
- Inter-CPA "FAR" is NOT true FAR; it's a coincidence rate (ICCR) under an assumption violated by within-firm template sharing — never write "FAR" or "specificity" without the disclaimer ([[feedback-inter-cpa-negative-anchor-assumption]])
- Dip test on Big-4 dh is composition + integer artefact, not mechanism — §III-I.1 "dip justifies finite mixture" framing must NOT be used; K=3 is descriptive of firm composition ([[feedback-dip-test-composition-artifact]])
- Provenance-verify all empirical claims against fresh sqlite/grep ([[feedback-provenance-fabrication]])
- AI peer reviewers have accepted fabricated claims in the past; verify numbers against scripts, not against reviewer agreement ([[feedback-ai-review-provenance]])
- Paper C standalone is shelved — folded into v4.0 §IV-K (Light full-dataset robustness)
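
The provenance rule can be made mechanical by re-running the query behind a quoted number instead of trusting the prose or a reviewer. A minimal sketch, assuming the figures live in a SQLite database: the `ground_truth` table, its columns, and the `verify_claim` helper are hypothetical placeholders, not the project's actual schema or scripts.

```python
import sqlite3


def verify_claim(conn, query, claimed, tol=0.0):
    """Re-run a single-value query and compare the result with a claimed figure.

    Returns (matches, observed). The query must return exactly one numeric column.
    """
    (observed,) = conn.execute(query).fetchone()
    matches = observed is not None and abs(observed - claimed) <= tol
    return matches, observed


# Toy in-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ground_truth (sig_id TEXT PRIMARY KEY, replicated INTEGER)")
conn.executemany(
    "INSERT INTO ground_truth VALUES (?, ?)",
    [("s1", 1), ("s2", 1), ("s3", 0)],
)
ok, observed = verify_claim(
    conn, "SELECT COUNT(*) FROM ground_truth WHERE replicated = 1", claimed=2
)
print(ok, observed)  # True 2
```

Pinning each manuscript number to a stored query like this gives a re-runnable check rather than a one-off grep.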