gbanyan 9604b273c0 Apply codex round-7 Phase 5 copy-edit fixes + refresh STATE.md

Mechanical copy-edit closing the OPEN/PARTIAL items from
paper/codex_review_gpt55_v4_round7.md; substantive empirical
content unchanged. Manuscript-splice items (strip internal draft
notes, update stale abstract-count note) deferred to final splice.

- Phase 4 prose §V-G + §III-K methodology: "candidate classifiers"
  -> "candidate checks" (closes round-7 m13 + Spot-check 3 wording leak)
- Phase 4 prose §II: remove placeholder caveat sentence at the LOOO
  paragraph (closes round-7 M6 + A4)
- References v3: add [42] Stone 1974, [43] Geisser 1975, [44] Vehtari
  et al. 2017 (44 entries; was 41) — backs the §II LOOO addition
- Round-7 review: add row-count clarification note (11 Major / 15
  Minor labelled rows vs. the prompt's 9/12 tally)
- STATE.md: refresh from stale Phase-2 snapshot to current Phase 5
  status — Phases 1-4 complete; codex rounds 1-7 closed at Minor
  Revision; pending Gemini + Opus rounds + round-2/3 convergence

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-14 14:21:59 +08:00

STATE — Current snapshot

Date: 2026-05-14
Active milestone: Paper A v4.0: Big-4 reframe
Active branch: paper-a-v4-big4 (33 commits ahead of master; pushed to origin/paper-a-v4-big4 at 980295d)
Active phase: Phase 5: AI peer review (codex round 7 closed at Minor Revision; Gemini + Opus rounds + round-2/3 convergence still pending)

Recently completed

Phase 1 (Foundation, 9 spike scripts + 6 follow-on scripts):

Script 32 (e1d81e3): non-Firm-A calibration verdict C
Script 33 (8ac0988): reverse-anchor PAPER_C_STRONG (directional ρ=+0.744)
Script 34 (55f9f94): Big-4 K=2 dip-test multimodal p<0.0001, bootstrap CI [0.974, 0.977] / [3.48, 3.97]
Script 35 (55f9f94): firm × cluster — Firm A 0% C1 / 82.5% C3, PwC 23.5% C1
Script 36 (ccd9f23): K=2 LOOO UNSTABLE (firm-mass conflation; max Δcos=0.028)
Script 37 (92f1db8): K=3 LOOO PARTIAL (component shape stable, membership ±5-13pp)
Script 38 (bc36dcc): convergence STRONG — 3 lenses pairwise ρ ≥ 0.879
Script 39 (39575ce): per-signature convergence MODERATE — κ=0.87 between per-CPA and per-sig K=3 fits
Script 40 (338737d): pixel-identity FAR = 0% on n=262 ground-truth replicated
Scripts 39b/c/d/e + 40b + 43 (d4f370b): anchor-based FAR diagnostics; the composition decomposition shows that the Big-4 dh "multimodality" reduces to between-firm shift + integer ties (p_median=0.35 under joint correction)
Scripts 44 + 45 (4cf21a6): firm-matched-pool logistic regression; full 5-way doc FAR — Firms B/C/D OR 0.05/0.01/0.03 vs Firm A reference after pool-size adjustment
Script 46 (2f05d6f): alert-rate sensitivity / threshold-plateau analysis (HC threshold locally sensitive; MC/HSC dHash=15 boundary plateau-like)
Script 41 (9392f30): §IV-K full-dataset robustness comparison (Light)
Script 42 (453f1d8): Phase 3 close-out support
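Script 39 reports κ=0.87 between the per-CPA and per-signature K=3 fits. Because mixture-component indices are arbitrary, two fits can describe the same partition under different labels, so agreement is only meaningful after the labels are aligned. The following is a minimal sketch, not Script 39's actual code. It assumes both fits already assign integer labels 0..k-1 to the same items (for example, per-CPA labels broadcast to each of that CPA's signatures) and that alignment is done by brute-force search over the k! label permutations. The function names `cohens_kappa` and `aligned_kappa` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Observed agreement: share of items given the same label by both fits.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the two marginal label distributions.
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_exp == 1:
        return 1.0  # both fits put every item in one and the same class
    return (p_obs - p_exp) / (1 - p_exp)


def aligned_kappa(a, b, k=3):
    """Kappa after relabelling b with the permutation that maximises agreement.

    Assumes labels in both sequences are integers in range(k); for K=3 there
    are only 3! = 6 permutations, so exhaustive search is cheap.
    """
    best = None
    for perm in permutations(range(k)):
        relabelled = [perm[label] for label in b]
        kappa = cohens_kappa(a, relabelled)
        if best is None or kappa > best:
            best = kappa
    return best
```

For example, `aligned_kappa([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])` returns 1.0: the two label vectors describe the same partition under swapped component indices. Exhaustive search is chosen here only because K is small; for larger K a Hungarian-algorithm assignment would be the usual substitute.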

Phase 2 (Methodology rewrite): §III v7 delivered at paper/v4/paper_a_methodology_v4_section_iii.md (723a3f6). Anchor-based ICCR framework + composition-decomposition finding; covers §III-G through §III-M.

Phase 3 (Results regeneration): §IV v3.3 delivered at paper/v4/paper_a_results_v4_section_iv.md (980295d). 12 sub-sections A–L + §IV-M added; Table XV filled; Big-4 reframe + light §IV-K full-dataset robustness; round-23/24/25/27 codex corrections applied; §IV-D/E framing softened; §IV-I renamed.

Phase 4 (Prose rewrite): Abstract + §I + §II (LOOO addition) + §V + §VI delivered at paper/v4/paper_a_prose_v4_phase4.md. Prose v3 (b33e20d) rewritten to match §III v7. Abstract trimmed to 243 words (918d551). Round-25/26/27 codex corrections applied.

v4 methodological pivot (recorded for context): the distributional path to thresholds (K=3 / dip / antimode) was shown to be composition-driven and abandoned, replaced by anchor-based inter-CPA coincidence-rate (ICCR) calibration at three units (per-comparison / per-signature / per-document). "FAR" terminology replaced by "ICCR" throughout. K=3 demoted to a descriptive firm-compositional partition. Positioning: specificity-proxy-anchored screening framework with human-in-the-loop review, not a validated forensic detector. Codex round-32 verdict: SOUND_WITH_QUALIFICATIONS, IEEE Access-publishable.
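The three ICCR units can be illustrated with a minimal sketch. The definitions below are assumptions for illustration, not the paper's exact specification. A "coincidence" is taken to be a pair of signatures from different CPAs whose embedding cosine is at least `cos_thr` and whose dHash Hamming distance is at most `dh_thr`. The per-signature rate is the share of signatures with at least one such inter-CPA partner. The per-document rate is the share of documents containing at least one such signature. The `Signature` record, its field names, the `iccr` function, and any threshold values are hypothetical, and embeddings are assumed to be non-zero vectors. The output is a coincidence rate, not a true FAR, because within-firm template sharing violates the assumption that inter-CPA pairs are true negatives.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Signature:
    sig_id: str
    cpa_id: str
    doc_id: str
    embedding: tuple  # hypothetical feature vector; assumed non-zero
    dhash: int        # hypothetical 64-bit difference hash


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def hamming(a, b):
    # Number of differing bits between two integer hashes (always an integer,
    # which is why dHash distances produce ties).
    return bin(a ^ b).count("1")


def iccr(signatures, cos_thr, dh_thr):
    """Inter-CPA coincidence rates at three units (illustrative, O(n^2) pairs)."""
    pairs = hits = 0
    flagged = set()
    for s, t in combinations(signatures, 2):
        if s.cpa_id == t.cpa_id:
            continue  # only inter-CPA pairs form the negative anchor
        pairs += 1
        if cosine(s.embedding, t.embedding) >= cos_thr and hamming(s.dhash, t.dhash) <= dh_thr:
            hits += 1
            flagged.update((s.sig_id, t.sig_id))
    docs = {s.doc_id for s in signatures}
    flagged_docs = {s.doc_id for s in signatures if s.sig_id in flagged}
    nan = float("nan")
    return {
        "per_comparison": hits / pairs if pairs else nan,
        "per_signature": len(flagged) / len(signatures) if signatures else nan,
        "per_document": len(flagged_docs) / len(docs) if docs else nan,
    }
```

With four toy signatures where only one inter-CPA pair matches on both cosine and dHash, out of five inter-CPA pairs, the sketch returns 0.2 per comparison, 0.5 per signature (two of four signatures flagged), and 2/3 per document. The three numbers diverge because one coincident pair flags two signatures and each flagged signature flags its whole document.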

Phase 5 — AI peer review (in progress)

Codex (GPT-5.5) rounds 1–7 complete on the v4 drafts. Round-7 verdict (2026-05-12): Minor Revision. All round-26 Major findings closed substantively; remaining blockers were packaging/copy-edit only.

Mechanical copy-edit pass applied 2026-05-14 (uncommitted on paper-a-v4-big4):

Replaced residual "candidate classifiers" wording with "candidate checks" in §V-G prose (line 107) and §III-K methodology (line 149 + table heading)
Added references [42] Stone 1974, [43] Geisser 1975, [44] Vehtari et al. 2017 to paper/paper_a_references_v3.md (now 44 entries)
Removed §II placeholder caveat sentence at Phase 4 prose line 67

Pending Phase 5 work:

Gemini 3.x Pro full-manuscript review on v4 drafts (round 1)
Opus 4.7 max-effort full-manuscript review on v4 drafts (round 1)
Round-2 cross-check across the three reviewers (address any new Major findings)
Round-3 convergence — Accept/Minor consensus from ≥2 of 3 reviewers
Manuscript-splice copy-edit (strip internal draft notes at line 3 of all three v4 files + Phase 4 close-out checklist at lines 153–162 + §IV close-out checklist at line 365+; update stale abstract-count note)

Pending — partner-side decision still open

2026-05-13 partner alignment: the partner wants a high-specificity threshold and asked whether firm heterogeneity could be framed as "statistically insignificant." Decision: no. Heterogeneity is highly significant (40–62σ in the logistic regression), and v4 frames it explicitly as decisive firm heterogeneity. Confirm this framing with the partner before Phase 6.

Blockers

None on the critical path. Phase 5 substantive empirical work is complete; remaining tasks are reviewer rounds + manuscript-splice copy-edit.

Things to remember (per memory)

Inter-CPA "FAR" is NOT true FAR; it's a coincidence rate (ICCR) under an assumption violated by within-firm template sharing — never write "FAR" or "specificity" without the disclaimer (feedback-inter-cpa-negative-anchor-assumption)
Dip test on Big-4 dh is composition + integer artefact, not mechanism — §III-I.1 "dip justifies finite mixture" framing must NOT be used; K=3 is descriptive of firm composition (feedback-dip-test-composition-artifact)
Provenance-verify all empirical claims against fresh sqlite/grep (feedback-provenance-fabrication)
AI peer reviewers have accepted fabricated claims in the past; verify numbers against scripts, not against reviewer agreement (feedback-ai-review-provenance)
Paper C standalone is shelved — folded into v4.0 §IV-K (Light full-dataset robustness)
