Apply codex round-7 Phase 5 copy-edit fixes + refresh STATE.md

Mechanical copy-edit closing the OPEN/PARTIAL items from paper/codex_review_gpt55_v4_round7.md; substantive empirical content unchanged. Manuscript-splice items (strip internal draft notes, update stale abstract-count note) deferred to final splice.

- Phase 4 prose §V-G + §III-K methodology: "candidate classifiers" -> "candidate checks" (closes round-7 m13 + Spot-check 3 wording leak)
- Phase 4 prose §II: remove placeholder caveat sentence at the LOOO paragraph (closes round-7 M6 + A4)
- References v3: add [42] Stone 1974, [43] Geisser 1975, [44] Vehtari et al. 2017 (44 entries; was 41) — backs the §II LOOO addition
- Round-7 review: add row-count clarification note (11 Major / 15 Minor labelled rows vs. the prompt's 9/12 tally)
- STATE.md: refresh from stale Phase-2 snapshot to current Phase 5 status — Phases 1-4 complete; codex rounds 1-7 closed at Minor Revision; pending Gemini + Opus rounds + round-2/3 convergence

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+37
-22
@@ -1,13 +1,13 @@
# STATE — Current snapshot

**Date**: 2026-05-12
**Date**: 2026-05-14
**Active milestone**: Paper A v4.0 — Big-4 reframe
**Active branch**: `paper-a-v4-big4` (12 commits ahead of `yolo-signature-pipeline`)
**Active phase**: Phase 2 — Methodology rewrite, draft delivered, **awaiting user review of 5 open questions in `paper/v4/paper_a_methodology_v4_section_iii.md`** before Phase 3 begins
**Active branch**: `paper-a-v4-big4` (33 commits ahead of `master`; pushed to `origin/paper-a-v4-big4` at `980295d`)
**Active phase**: Phase 5 — AI peer review (codex round 7 closed at Minor Revision; Gemini + Opus rounds + round-2/3 convergence still pending)

## Recently completed

**Phase 1 (Foundation, 7 spike + foundation scripts)**:
**Phase 1 (Foundation, 9 spike scripts + 6 follow-on scripts)**:
- Script 32 (`e1d81e3`): non-Firm-A calibration verdict C
- Script 33 (`8ac0988`): reverse-anchor PAPER_C_STRONG (directional ρ=+0.744)
- Script 34 (`55f9f94`): Big-4 K=2 dip-test multimodal p<0.0001, bootstrap CI [0.974, 0.977] / [3.48, 3.97]
@@ -17,33 +17,48 @@
- Script 38 (`bc36dcc`): convergence **STRONG** — 3 lenses pairwise ρ ≥ 0.879
- Script 39 (`39575ce`): per-signature convergence **MODERATE** — κ=0.87 between per-CPA and per-sig K=3 fits
- Script 40 (`338737d`): pixel-identity FAR = **0%** on n=262 ground-truth replicated
- Scripts 39b/c/d/e + 40b + 43 (`d4f370b`): anchor-based FAR diagnostics; composition decomposition proves Big-4 dh "multimodality" = between-firm shift + integer ties (p_median=0.35 under joint correction)
- Scripts 44 + 45 (`4cf21a6`): firm-matched-pool logistic regression; full 5-way doc FAR — Firms B/C/D OR 0.05/0.01/0.03 vs Firm A reference after pool-size adjustment
- Script 46 (`2f05d6f`): alert-rate sensitivity / threshold-plateau analysis (HC threshold locally sensitive; MC/HSC dHash=15 boundary plateau-like)
- Script 41 (`9392f30`): §IV-K full-dataset robustness comparison (Light)
- Script 42 (`453f1d8`): Phase 3 close-out support

**Phase 2 (Methodology rewrite)**: §III-G..L draft delivered at `paper/v4/paper_a_methodology_v4_section_iii.md` (commit on the same branch). Single coherent rewrite covering 6 sub-sections (G/H/I/J/K/L); cross-references to all 9 spike scripts; 5 open questions flagged at end of draft for user decision.
**Phase 2 (Methodology rewrite)**: §III v7 delivered at `paper/v4/paper_a_methodology_v4_section_iii.md` (`723a3f6`). Anchor-based ICCR framework + composition-decomposition finding; covers §III-G through §III-M.

## Pending — Phase 2 user review (BEFORE Phase 3)
**Phase 3 (Results regeneration)**: §IV v3.3 delivered at `paper/v4/paper_a_results_v4_section_iv.md` (`980295d`). 12 sub-sections A–L + §IV-M added; Table XV filled; Big-4 reframe + light §IV-K full-dataset robustness; round-23/24/25/27 codex corrections applied; §IV-D/E framing softened; §IV-I renamed.

5 decisions needed from user before Phase 3 (Results regeneration) starts:
**Phase 4 (Prose rewrite)**: Abstract + §I + §II (LOOO addition) + §V + §VI delivered at `paper/v4/paper_a_prose_v4_phase4.md`. Prose v3 (`b33e20d`) rewritten to match §III v7. Abstract trimmed to 243 words (`918d551`). Round-25/26/27 codex corrections applied.

1. §III-G scope justification — three-point argument enough, or add a fourth?
2. §III-H Firm A phrasing — "case study of templated end" vs an alternative framing?
3. §III-J K=3 vs K=2 selection — lean on LOOO (current draft) or strengthen BIC argument?
4. §III-L hybrid classifier — keep inherited 5-way box rule, or commit to K=3 hard label as primary?
5. Section IV table numbering scheme — confirm before Phase 3 builds tables.
**v4 methodological pivot** (recorded for context): Distributional path to thresholds (K=3 / dip / antimode) shown composition-driven and abandoned; replaced by anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units (per-comparison / per-signature / per-document). "FAR" terminology replaced by "ICCR" throughout. K=3 demoted to descriptive firm-compositional partition. Positioning: specificity-proxy-anchored screening framework with human-in-the-loop review, **not** a validated forensic detector. Codex round-32 verdict: SOUND_WITH_QUALIFICATIONS, IEEE Access-publishable.

Plus: any prose-level edits the user wants on the §III draft.
## Phase 5 — AI peer review (in progress)

**Codex (GPT-5.5) rounds 1–7 complete** on the v4 drafts. Round-7 verdict (2026-05-12): **Minor Revision**. All round-26 Major findings closed substantively; remaining blockers were packaging/copy-edit only.

**Mechanical copy-edit pass applied 2026-05-14** (uncommitted on `paper-a-v4-big4`):
- Replaced residual "candidate classifiers" wording with "candidate checks" in §V-G prose (line 107) and §III-K methodology (line 149 + table heading)
- Added references [42] Stone 1974, [43] Geisser 1975, [44] Vehtari et al. 2017 to `paper/paper_a_references_v3.md` (now 44 entries)
- Removed §II placeholder caveat sentence at Phase 4 prose line 67

**Pending Phase 5 work**:
1. Gemini 3.x Pro full-manuscript review on v4 drafts (round 1)
2. Opus 4.7 max-effort full-manuscript review on v4 drafts (round 1)
3. Round-2 cross-check across the three reviewers (address any new Major findings)
4. Round-3 convergence — Accept/Minor consensus from ≥2 of 3 reviewers
5. Manuscript-splice copy-edit (strip internal draft notes at line 3 of all three v4 files + Phase 4 close-out checklist at lines 153–162 + §IV close-out checklist at line 365+; update stale abstract-count note)

## Pending — partner-side decision still open

2026-05-13 partner alignment: partner wants high-specificity threshold and asked if firm heterogeneity could be framed as "statistically insignificant." Decision: **no** — heterogeneity is highly significant (40–62σ in logistic regression). v4 frames it explicitly as decisive firm heterogeneity. Confirm framing with partner before Phase 6.

## Blockers

None.

## Open questions deferred from spike

- Bootstrap stability of cosine and dHash crossings *jointly* (not just marginally) — addressed in Phase 1 if time permits
- K=2 vs K=3 final choice for §III-J — both reported, but operational classifier needs to commit to one (recommend K=2 for interpretability; K=3 in supplementary)
None on the critical path. Phase 5 substantive empirical work is complete; remaining tasks are reviewer rounds + manuscript-splice copy-edit.

## Things to remember (per memory)

- Inter-CPA "FAR" is NOT true FAR; it's a coincidence rate (ICCR) under an assumption violated by within-firm template sharing — never write "FAR" or "specificity" without the disclaimer ([[feedback-inter-cpa-negative-anchor-assumption]])
- Dip test on Big-4 dh is composition + integer artefact, not mechanism — §III-I.1 "dip justifies finite mixture" framing must NOT be used; K=3 is descriptive of firm composition ([[feedback-dip-test-composition-artifact]])
- Provenance-verify all empirical claims against fresh sqlite/grep ([[feedback-provenance-fabrication]])
- Don't mock the DB or use placeholders — every number must trace to a script + query
- Partner Jimmy already proposed Big-4 direction (this is execution, not pitching a new direction)
- Paper C standalone is shelved — folded into v4.0 §IV-K
- AI peer reviewers have accepted fabricated claims in the past; verify numbers against scripts, not against reviewer agreement ([[feedback-ai-review-provenance]])
- Paper C standalone is shelved — folded into v4.0 §IV-K (Light full-dataset robustness)