Update STATE.md: Phase 5 closed; Phase 6 ready to begin

Phase 5 AI peer review convergence achieved 2026-05-14 with 3/3 reviewers in Accept/Minor band:
- Gemini round-2: Accept (splice-ready as-is)
- Opus round-2: Minor Revision (N1-N4 → closed in round-4)
- codex round-9: Minor Revision (N1/N2 provenance → closed in round-5)

Fix-round commits archived: b884d39 (round-2), 4a6f9c5 (round-3), d3ddf74 (round-4), 128a914 (round-5).

Reviewer artifacts archived at paper/codex_review_gpt55_v4_round{7,8,9}.md, paper/gemini_review_v4_round{1,2}.md, paper/opus_review_v4_round{1,2}.md.

Phase 6 tasks documented: partner-framing confirmation (reject "statistically insignificant"), manuscript-splice assembly with internal-note strips, DOCX export, partner Jimmy review.

Phase 7 tasks documented: iThenticate, IEEE eCF, submission.

Lessons added to memory cross-references: codex round-9's DB-verification caught a "majority firm" inference that turned out to be 1:1 ties (round-5 corrected); codex's read-only jitter rerun exposed an unreproducible non-Big-4 range (round-5 replaced with codex-verified range).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:09:33 +08:00
parent 128a91433f
commit e9357c903b
1 changed files with 55 additions and 39 deletions
@@ -2,63 +2,79 @@

 **Date**: 2026-05-14
 **Active milestone**: Paper A v4.0 — Big-4 reframe
-**Active branch**: `paper-a-v4-big4` (33 commits ahead of `master`; pushed to `origin/paper-a-v4-big4` at `980295d`)
-**Active phase**: Phase 5 — AI peer review (codex round 7 closed at Minor Revision; Gemini + Opus rounds + round-2/3 convergence still pending)
+**Active branch**: `paper-a-v4-big4` (41 commits ahead of `master`; fully pushed to `origin/paper-a-v4-big4` at `128a914`)
+**Active phase**: **Phase 5 — AI peer review COMPLETE; Phase 6 ready to begin**

-## Recently completed
+## Phase 5 closure summary (2026-05-14)

-**Phase 1 (Foundation, 9 spike scripts + 6 follow-on scripts)**:
- Script 32 (`e1d81e3`): non-Firm-A calibration verdict C
- Script 33 (`8ac0988`): reverse-anchor PAPER_C_STRONG (directional ρ=+0.744)
- Script 34 (`55f9f94`): Big-4 K=2 dip-test multimodal p<0.0001, bootstrap CI [0.974, 0.977] / [3.48, 3.97]
- Script 35 (`55f9f94`): firm × cluster — Firm A 0% C1 / 82.5% C3, PwC 23.5% C1
- Script 36 (`ccd9f23`): K=2 LOOO **UNSTABLE** (firm-mass conflation; max Δcos=0.028)
- Script 37 (`92f1db8`): K=3 LOOO **PARTIAL** (component shape stable, membership ±5-13pp)
- Script 38 (`bc36dcc`): convergence **STRONG** — 3 lenses pairwise ρ ≥ 0.879
- Script 39 (`39575ce`): per-signature convergence **MODERATE** — κ=0.87 between per-CPA and per-sig K=3 fits
- Script 40 (`338737d`): pixel-identity FAR = **0%** on n=262 ground-truth replicated
- Scripts 39b/c/d/e + 40b + 43 (`d4f370b`): anchor-based FAR diagnostics; composition decomposition proves Big-4 dh "multimodality" = between-firm shift + integer ties (p_median=0.35 under joint correction)
- Scripts 44 + 45 (`4cf21a6`): firm-matched-pool logistic regression; full 5-way doc FAR — Firms B/C/D OR 0.05/0.01/0.03 vs Firm A reference after pool-size adjustment
- Script 46 (`2f05d6f`): alert-rate sensitivity / threshold-plateau analysis (HC threshold locally sensitive; MC/HSC dHash=15 boundary plateau-like)
- Script 41 (`9392f30`): §IV-K full-dataset robustness comparison (Light)
- Script 42 (`453f1d8`): Phase 3 close-out support
+**Convergence achieved**: 3/3 reviewers in Accept/Minor band across the round-2 cross-check.

-**Phase 2 (Methodology rewrite)**: §III v7 delivered at `paper/v4/paper_a_methodology_v4_section_iii.md` (`723a3f6`). Anchor-based ICCR framework + composition-decomposition finding; covers §III-G through §III-M.
+| Reviewer | Final round | Verdict |
+|---|---|---|
+| Gemini 3.1 Pro | round 2 | **Accept** (Phase 5 splice-ready as-is) |
+| Opus 4.7 | round 2 | Minor Revision (4 substantive findings → closed in round-4) |
+| codex GPT-5.5 | round 9 | Minor Revision (2 provenance findings → closed in round-5) |

-**Phase 3 (Results regeneration)**: §IV v3.3 delivered at `paper/v4/paper_a_results_v4_section_iv.md` (`980295d`). 12 sub-sections A–L + §IV-M added; Table XV filled; Big-4 reframe + light §IV-K full-dataset robustness; round-23/24/25/27 codex corrections applied; §IV-D/E framing softened; §IV-I renamed.
+**Original Phase 5 gate met**: Accept/Minor consensus from ≥2 of 3 reviewers. No empirical reruns required.

-**Phase 4 (Prose rewrite)**: Abstract + §I + §II (LOOO addition) + §V + §VI delivered at `paper/v4/paper_a_prose_v4_phase4.md`. Prose v3 (`b33e20d`) rewritten to match §III v7. Abstract trimmed to 243 words (`918d551`). Round-25/26/27 codex corrections applied.
+**Phase 5 fix rounds applied** (commits on this branch):
+1. `9604b27` — codex round-7 closeout copy-edit (candidate classifiers → candidate checks; refs [42]-[44] added; §II placeholder caveat removed; STATE.md refresh)
+2. `b884d39` — round-2 fixes (Opus M1: §IV K=3 mechanism-label reversion; M2: Table XV-B → XIX + cascade XIX → XX … XXV → XXVI; M3: "98-100%" within-firm semantic conflation; M4: duplicate §V-G heading; Gemini Table XV sample-size footnote)
+3. `4a6f9c5` — round-3 fixes (codex round-8 splice blockers: abstract trim 261 → 247 words; §IV-J Table XV footnote §IV-M.5 reclassification; §IV-I "§IV-M Table XVI" → "§IV-M Tables XXI-XXVI"; binary-collapse terminology cleanup)
+4. `d3ddf74` — round-4 fixes (Opus round-2 N1: Firm C 19,501 vs 19,122 denominator footnote; N2: composition-decomposition added as Table XXVII row 1; N3: Table XXVII numbered; N4: cross-firm hit matrix assumption disclosure)
+5. `128a914` — round-5 provenance patches (codex round-9 factual corrections: N1 "majority firm" → "1:1 tie-break to first-sorted firm" via Script 45 `np.argmax`; N2 row narrowed to Big-4-only evidence; non-Big-4 jittered-dHash range $[0.71, 1.00]$ → codex-verified $[0.38, 1.00]$ with read-only-spike provenance)

-**v4 methodological pivot** (recorded for context): Distributional path to thresholds (K=3 / dip / antimode) shown composition-driven and abandoned; replaced by anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units (per-comparison / per-signature / per-document). "FAR" terminology replaced by "ICCR" throughout. K=3 demoted to descriptive firm-compositional partition. Positioning: specificity-proxy-anchored screening framework with human-in-the-loop review, **not** a validated forensic detector. Codex round-32 verdict: SOUND_WITH_QUALIFICATIONS, IEEE Access-publishable.
+**Reviewer artifacts archived** (paper/):
+- `codex_review_gpt55_v4_round{7,8,9}.md`
+- `gemini_review_v4_round{1,2}.md`
+- `opus_review_v4_round{1,2}.md`

-## Phase 5 — AI peer review (in progress)
+## Phase 5 substantive findings catalogue

-**Codex (GPT-5.5) rounds 1–7 complete** on the v4 drafts. Round-7 verdict (2026-05-12): **Minor Revision**. All round-26 Major findings closed substantively; remaining blockers were packaging/copy-edit only.
+**v4 methodological pivot** (unchanged through all reviewer rounds):
+- Distributional path to thresholds (K=3 / dip / antimode) abandoned; anchor-based ICCR calibration at 3 units adopted
+- "FAR" → "ICCR" throughout; inter-CPA-as-negative assumption disclosed as partially violated by within-firm template sharing
+- K=3 demoted to descriptive firm-compositional partition (§III-J line 90 retires "hand-leaning / mixed / replicated" mechanism labels)
+- Positioning: anchor-calibrated specificity-only screening framework with human-in-the-loop review; NOT a validated forensic detector

-**Mechanical copy-edit pass applied 2026-05-14** (uncommitted on `paper-a-v4-big4`):
- Replaced residual "candidate classifiers" wording with "candidate checks" in §V-G prose (line 107) and §III-K methodology (line 149 + table heading)
- Added references [42] Stone 1974, [43] Geisser 1975, [44] Vehtari et al. 2017 to `paper/paper_a_references_v3.md` (now 44 entries)
- Removed §II placeholder caveat sentence at Phase 4 prose line 67
+**Empirical anchors** (all provenance-verified across reviewer panel):
+- Three feature-derived scores converge Spearman $\rho \geq 0.879$ (internal consistency; not external validation)
+- Anchor-based ICCRs: per-comparison $0.0006/0.0013/0.00014$; per-signature $0.11$; per-document $0.34$
+- Firm heterogeneity decisive: Firm A per-doc HC+MC alarm $0.62$ vs Firms B/C/D $0.09$–$0.16$; logistic OR $0.05/0.01/0.03$ relative to Firm A reference
+- Within-firm collision concentration under deployed any-pair rule: Firm A $98.8\%$ vs Firms B/C/D $76.7$–$83.7\%$; same-pair joint event saturates at $97.0$–$99.96\%$ within-firm at all four firms

-**Pending Phase 5 work**:
-1. Gemini 3.x Pro full-manuscript review on v4 drafts (round 1)
-2. Opus 4.7 max-effort full-manuscript review on v4 drafts (round 1)
-3. Round-2 cross-check across the three reviewers (address any new Major findings)
-4. Round-3 convergence — Accept/Minor consensus from ≥2 of 3 reviewers
-5. Manuscript-splice copy-edit (strip internal draft notes at line 3 of all three v4 files + Phase 4 close-out checklist at lines 153–162 + §IV close-out checklist at line 365+; update stale abstract-count note)
+## Phase 6 — Partner Jimmy v4.0 review (READY TO BEGIN)

-## Pending — partner-side decision still open
+**Pre-Phase-6 partner alignment** (2026-05-13 still open): partner asked whether firm heterogeneity could be framed as "statistically insignificant." **Decision: no** — heterogeneity is highly significant (40–62σ in logistic regression; all three AI reviewers independently confirmed the decisive framing). Confirm framing with partner before exporting DOCX.

-2026-05-13 partner alignment: partner wants high-specificity threshold and asked if firm heterogeneity could be framed as "statistically insignificant." Decision: **no** — heterogeneity is highly significant (40–62σ in logistic regression). v4 frames it explicitly as decisive firm heterogeneity. Confirm framing with partner before Phase 6.
+**Phase 6 tasks**:
+1. Confirm "statistically insignificant" framing rejection with partner
+2. Manuscript-splice assembly:
+   - Splice §III-G–§III-M (paper_a_methodology_v4_section_iii.md) onto v3.20.0 §III-A–§III-F into master `paper/paper_a_methodology_v3.md` body
+   - Splice §IV v3.3 (paper_a_results_v4_section_iv.md) into master `paper/paper_a_results_v3.md`
+   - Splice Phase 4 prose (Abstract / §I / §II / §V / §VI) into the master manuscript file
+   - **Strip internal-only blocks** during splice: Phase 4 line 3 draft note + lines 153-162 close-out checklist; §III line 3 + lines 434-447 cross-reference checklist + open-questions block; §IV line 3 + close-out checklist at line 365+
+   - Re-verify table numbering after splice (Table XXVII currently lives in §III, ahead of §IV-M.6's Table XXVI; confirm order in final master file)
+3. Export v4.0 DOCX via `paper/export_v3.py` (with author block fill)
+4. Ship to ~/Downloads
+5. Iterate on Jimmy's review comments
+6. Capture review artifact in `paper/partner_jimmy_v4_review.md`
+
+## Phase 7 — IEEE Access submission (pending Phase 6)
+
+1. iThenticate similarity check (target < 20%)
+2. IEEE eCF form
+3. Upload manuscript + cover letter via IEEE Access submission portal
+4. Capture confirmation number

 ## Blockers

-None on the critical path. Phase 5 substantive empirical work is complete; remaining tasks are reviewer rounds + manuscript-splice copy-edit.
+None. Phase 5 closed; Phase 6 ready to begin pending partner-framing confirmation.

 ## Things to remember (per memory)

 - Inter-CPA "FAR" is NOT true FAR; it's a coincidence rate (ICCR) under an assumption violated by within-firm template sharing — never write "FAR" or "specificity" without the disclaimer ([[feedback-inter-cpa-negative-anchor-assumption]])
 - Dip test on Big-4 dh is composition + integer artefact, not mechanism — §III-I.1 "dip justifies finite mixture" framing must NOT be used; K=3 is descriptive of firm composition ([[feedback-dip-test-composition-artifact]])
- Provenance-verify all empirical claims against fresh sqlite/grep ([[feedback-provenance-fabrication]])
- AI peer reviewers have accepted fabricated claims in the past; verify numbers against scripts, not against reviewer agreement ([[feedback-ai-review-provenance]])
+- Provenance-verify all empirical claims against fresh sqlite/grep ([[feedback-provenance-fabrication]]) — codex round-9's DB-verification caught a "majority firm" inference in round-4 that turned out to be 1:1 ties resolved by `np.argmax` tie-break; round-5 corrected it
+- AI peer reviewers have accepted fabricated claims in the past; verify numbers against scripts, not against reviewer agreement ([[feedback-ai-review-provenance]]) — codex round-9's read-only rerun of the non-Big-4 jittered procedure exposed an unreproducible $[0.71, 1.00]$ range that round-5 corrected to $[0.38, 1.00]$
 - Paper C standalone is shelved — folded into v4.0 §IV-K (Light full-dataset robustness)
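
The `np.argmax` tie-break recorded in the round-5 provenance patch can be illustrated with a minimal sketch. It is not the code of Script 45, which is not shown here; the helper name `majority_firm` and the example counts are hypothetical. It relies only on the documented NumPy behavior that `np.argmax` returns the index of the first occurrence when the maximum is shared, which is why a 1:1 tie resolves to the first-sorted firm rather than to a true majority.

```python
import numpy as np


def majority_firm(counts):
    """Return (index, is_tie) for a vector of per-firm hit counts.

    Hypothetical helper for illustration only: `index` is what np.argmax
    picks, and `is_tie` flags that the maximum is shared, so the pick is
    a tie-break and not a genuine majority.
    """
    counts = np.asarray(counts)
    top = int(np.argmax(counts))  # first index of the maximum when tied
    is_tie = int(np.sum(counts == counts[top])) > 1
    return top, is_tie


# Illustrative counts (not taken from the database): a 1:1 tie between
# the second and third firms in sort order.
firms = ["Firm A", "Firm B", "Firm C", "Firm D"]
idx, tie = majority_firm([0, 1, 1, 0])
print(firms[idx], tie)  # Firm B True
```

Reporting the tie flag alongside the selected index keeps a "majority firm" label from being written where the underlying counts are tied.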