pdf_signature_extraction/.planning/STATE.md
gbanyan e9357c903b Update STATE.md: Phase 5 closed; Phase 6 ready to begin
Phase 5 AI peer review convergence achieved 2026-05-14 with 3/3
reviewers in Accept/Minor band:
- Gemini round-2: Accept (splice-ready as-is)
- Opus round-2: Minor Revision (N1-N4 → closed in round-4)
- codex round-9: Minor Revision (N1/N2 provenance → closed in round-5)

Fix-round commits archived: b884d39 (round-2), 4a6f9c5 (round-3),
d3ddf74 (round-4), 128a914 (round-5). Reviewer artifacts archived at
paper/codex_review_gpt55_v4_round{7,8,9}.md,
paper/gemini_review_v4_round{1,2}.md, paper/opus_review_v4_round{1,2}.md.

Phase 6 tasks documented: partner-framing confirmation (reject
"statistically insignificant"), manuscript-splice assembly with
internal-note strips, DOCX export, partner Jimmy review.

Phase 7 tasks documented: iThenticate, IEEE eCF, submission.

Lessons added to memory cross-references: codex round-9's
DB-verification caught a "majority firm" inference that turned out
to be 1:1 ties (round-5 corrected); codex's read-only jitter rerun
exposed an unreproducible non-Big-4 range (round-5 replaced with
codex-verified range).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:09:33 +08:00


STATE — Current snapshot

Date: 2026-05-14
Active milestone: Paper A v4.0 (Big-4 reframe)
Active branch: paper-a-v4-big4 (41 commits ahead of master; fully pushed to origin/paper-a-v4-big4 at 128a914)
Active phase: Phase 5 (AI peer review) COMPLETE; Phase 6 ready to begin

Phase 5 closure summary (2026-05-14)

Convergence achieved: 3/3 reviewers in Accept/Minor band across the round-2 cross-check.

| Reviewer | Final round | Verdict |
|---|---|---|
| Gemini 3.1 Pro | round 2 | Accept (Phase 5 splice-ready as-is) |
| Opus 4.7 | round 2 | Minor Revision (4 substantive findings → closed in round-4) |
| codex GPT-5.5 | round 9 | Minor Revision (2 provenance findings → closed in round-5) |

Original Phase 5 gate met: Accept/Minor consensus from ≥2 of 3 reviewers. No empirical reruns required.
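
The gate rule can be written as a small check. This is a minimal sketch: the verdict strings are copied from the table above, while the function name `gate_met` and the dictionary layout are illustrative assumptions, not part of the project's tooling.

```python
# Verdicts that count as being inside the Accept/Minor band.
ACCEPT_BAND = {"Accept", "Minor Revision"}


def gate_met(verdicts, min_in_band=2):
    """Return True when at least `min_in_band` reviewers are in the Accept/Minor band."""
    in_band = sum(1 for verdict in verdicts.values() if verdict in ACCEPT_BAND)
    return in_band >= min_in_band


# Final-round verdicts as recorded in this snapshot.
final_verdicts = {
    "Gemini 3.1 Pro": "Accept",
    "Opus 4.7": "Minor Revision",
    "codex GPT-5.5": "Minor Revision",
}

print(gate_met(final_verdicts))  # True: 3 of 3 reviewers are in band
```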

Phase 5 fix rounds applied (commits on this branch):

  1. 9604b27 — codex round-7 closeout copy-edit (candidate classifiers → candidate checks; refs [42]-[44] added; §II placeholder caveat removed; STATE.md refresh)
  2. b884d39 — round-2 fixes (Opus M1: §IV K=3 mechanism-label reversion; M2: Table XV-B → XIX + cascade XIX → XX … XXV → XXVI; M3: "98-100%" within-firm semantic conflation; M4: duplicate §V-G heading; Gemini Table XV sample-size footnote)
  3. 4a6f9c5 — round-3 fixes (codex round-8 splice blockers: abstract trim 261 → 247 words; §IV-J Table XV footnote §IV-M.5 reclassification; §IV-I "§IV-M Table XVI" → "§IV-M Tables XXI-XXVI"; binary-collapse terminology cleanup)
  4. d3ddf74 — round-4 fixes (Opus round-2 N1: Firm C 19,501 vs 19,122 denominator footnote; N2: composition-decomposition added as Table XXVII row 1; N3: Table XXVII numbered; N4: cross-firm hit matrix assumption disclosure)
  5. 128a914 — round-5 provenance patches (codex round-9 factual corrections: N1 "majority firm" → "1:1 tie-break to first-sorted firm" via Script 45 np.argmax; N2 row narrowed to Big-4-only evidence; non-Big-4 jittered-dHash range [0.71, 1.00] → codex-verified [0.38, 1.00] with read-only-spike provenance)
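
The tie-break corrected in round-5 is easy to reproduce in isolation. `np.argmax` returns the index of the first maximum, so on an array of per-firm counts ordered by sorted firm name a 1:1 tie silently resolves to the first-sorted firm. The sketch below mimics that behaviour with the standard library only and adds an explicit tie flag. It is not Script 45, and the function name and alphabetical ordering are assumptions made for illustration.

```python
from collections import Counter


def top_firm_with_tie_flag(firm_labels):
    """Pick the most frequent firm, mimicking argmax over firms in sorted order.

    Returns (firm, is_tie). On a tie the first-sorted firm wins, matching the
    first-occurrence behaviour of an argmax over a sorted count array.
    """
    counts = Counter(firm_labels)
    firms = sorted(counts)  # assumed ordering: alphabetical by firm name
    best = max(counts.values())
    winners = [firm for firm in firms if counts[firm] == best]
    return winners[0], len(winners) > 1


print(top_firm_with_tie_flag(["Firm B", "Firm A"]))            # ('Firm A', True)
print(top_firm_with_tie_flag(["Firm B", "Firm B", "Firm A"]))  # ('Firm B', False)
```

Reporting the tie flag next to the winner prevents a tie-break result from being described as a "majority firm".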

Reviewer artifacts archived (paper/):

  • codex_review_gpt55_v4_round{7,8,9}.md
  • gemini_review_v4_round{1,2}.md
  • opus_review_v4_round{1,2}.md

Phase 5 substantive findings catalogue

v4 methodological pivot (unchanged through all reviewer rounds):

  • Distributional path to thresholds (K=3 / dip / antimode) abandoned; anchor-based ICCR calibration at 3 units adopted
  • "FAR" → "ICCR" throughout; inter-CPA-as-negative assumption disclosed as partially violated by within-firm template sharing
  • K=3 demoted to descriptive firm-compositional partition (§III-J line 90 retires "hand-leaning / mixed / replicated" mechanism labels)
  • Positioning: anchor-calibrated specificity-only screening framework with human-in-the-loop review; NOT a validated forensic detector

Empirical anchors (all provenance-verified across reviewer panel):

  • Three feature-derived scores converge: Spearman $\rho \geq 0.879$ (internal consistency; not external validation)
  • Anchor-based ICCRs: per-comparison 0.0006/0.0013/0.00014; per-signature 0.11; per-document 0.34
  • Firm heterogeneity decisive: Firm A per-doc HC+MC alarm 0.62 vs Firms B/C/D 0.09–0.16; logistic OR 0.05/0.01/0.03 relative to Firm A reference
  • Within-firm collision concentration under deployed any-pair rule: Firm A 98.8% vs Firms B/C/D 76.7–83.7%; same-pair joint event saturates at 97.0–99.96% within-firm at all four firms
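
The convergence claim in the first bullet amounts to checking that the smallest pairwise Spearman correlation among the three scores clears 0.879. The sketch below shows that check in plain Python. The score vectors are toy values invented for illustration, not project data, and the helper names are assumptions.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy scores (hypothetical): three per-signature scores on five signatures.
scores = {
    "score_1": [0.10, 0.40, 0.35, 0.80, 0.90],
    "score_2": [1, 3, 2, 4, 5],
    "score_3": [5, 20, 30, 40, 50],
}

min_rho = min(spearman_rho(scores[a], scores[b]) for a, b in combinations(scores, 2))
print(min_rho >= 0.879)  # True for these toy values (minimum pairwise rho is 0.9)
```

A high minimum only shows that the scores rank signatures consistently with each other; it says nothing about agreement with ground truth, which is why the bullet labels it internal consistency.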

Phase 6 — Partner Jimmy v4.0 review (READY TO BEGIN)

Pre-Phase-6 partner alignment (2026-05-13; still open): partner asked whether firm heterogeneity could be framed as "statistically insignificant." Decision: no. Heterogeneity is highly significant (40–62σ in logistic regression; all three AI reviewers independently confirmed the decisive framing). Confirm framing with partner before exporting DOCX.

Phase 6 tasks:

  1. Confirm "statistically insignificant" framing rejection with partner
  2. Manuscript-splice assembly:
    • Splice §III-G–§III-M (paper_a_methodology_v4_section_iii.md) onto v3.20.0 §III-A–§III-F into master paper/paper_a_methodology_v3.md body
    • Splice §IV v3.3 (paper_a_results_v4_section_iv.md) into master paper/paper_a_results_v3.md
    • Splice Phase 4 prose (Abstract / §I / §II / §V / §VI) into the master manuscript file
    • Strip internal-only blocks during splice: Phase 4 line 3 draft note + lines 153-162 close-out checklist; §III line 3 + lines 434-447 cross-reference checklist + open-questions block; §IV line 3 + close-out checklist at line 365+
    • Re-verify table numbering after splice (Table XXVII currently lives in §III, so it appears before §IV-M.6's Table XXVI; confirm the order in the final master file)
  3. Export v4.0 DOCX via paper/export_v3.py (with author block fill)
  4. Ship to ~/Downloads
  5. Iterate on Jimmy's review comments
  6. Capture review artifact in paper/partner_jimmy_v4_review.md
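
The internal-block strip in task 2 can be sketched as a single pass over each draft file. The line ranges below are the ones quoted in the task list. The function name, the tuple convention (1-indexed, inclusive, `None` meaning "to end of file"), and the constant names are assumptions for illustration. Line numbers should be re-checked against the current drafts before use, since any later edit shifts them.

```python
def strip_line_ranges(lines, ranges):
    """Drop 1-indexed, inclusive line ranges; an end of None means 'to end of file'."""
    def dropped(number):
        return any(
            start <= number and (end is None or number <= end)
            for start, end in ranges
        )
    return [line for number, line in enumerate(lines, start=1) if not dropped(number)]


# Ranges quoted in the task list (all relative to the unmodified draft files).
PHASE4_INTERNAL = [(3, 3), (153, 162)]       # draft note + close-out checklist
SECTION_III_INTERNAL = [(3, 3), (434, 447)]  # draft note + checklist/open-questions
SECTION_IV_INTERNAL = [(3, 3), (365, None)]  # draft note + close-out checklist to EOF

# Small demonstration on a ten-line dummy file.
demo = [f"line {i}" for i in range(1, 11)]
print(strip_line_ranges(demo, [(3, 3), (8, None)]))
# ['line 1', 'line 2', 'line 4', 'line 5', 'line 6', 'line 7']
```

Removing all ranges in one pass matters: every range refers to the original numbering, so stripping them one at a time would shift the later ranges.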

Phase 7 — IEEE Access submission (pending Phase 6)

  1. iThenticate similarity check (target < 20%)
  2. IEEE eCF form
  3. Upload manuscript + cover letter via IEEE Access submission portal
  4. Capture confirmation number

Blockers

None. Phase 5 closed; Phase 6 ready to begin pending partner-framing confirmation.

Things to remember (per memory)

  • Inter-CPA "FAR" is NOT true FAR; it's a coincidence rate (ICCR) under an assumption violated by within-firm template sharing — never write "FAR" or "specificity" without the disclaimer (feedback-inter-cpa-negative-anchor-assumption)
  • Dip test on Big-4 dh is composition + integer artefact, not mechanism — §III-I.1 "dip justifies finite mixture" framing must NOT be used; K=3 is descriptive of firm composition (feedback-dip-test-composition-artifact)
  • Provenance-verify all empirical claims against fresh sqlite/grep (feedback-provenance-fabrication) — codex round-9's DB-verification caught a "majority firm" inference in round-4 that turned out to be 1:1 ties resolved by np.argmax tie-break; round-5 corrected it
  • AI peer reviewers have accepted fabricated claims in the past; verify numbers against scripts, not against reviewer agreement (feedback-ai-review-provenance) — codex round-9's read-only rerun of the non-Big-4 jittered procedure exposed an unreproducible [0.71, 1.00] range that round-5 corrected to [0.38, 1.00]
  • Paper C standalone is shelved — folded into v4.0 §IV-K (Light full-dataset robustness)
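
The provenance rule above (verify numbers against the database, not against reviewer agreement) can be sketched as a recount-and-compare step. Everything about the schema here is hypothetical: the `signatures` table, its `firm` column, and the toy rows exist only to make the example self-contained and do not reflect the project's actual database.

```python
import sqlite3


def recount(conn, firm):
    """Recompute a per-firm row count directly from the database."""
    row = conn.execute(
        "SELECT COUNT(*) FROM signatures WHERE firm = ?", (firm,)
    ).fetchone()
    return row[0]


def verify_claims(conn, claimed_counts):
    """Return {firm: (claimed, actual)} for every claim the database does not reproduce."""
    mismatches = {}
    for firm, claimed in claimed_counts.items():
        actual = recount(conn, firm)
        if actual != claimed:
            mismatches[firm] = (claimed, actual)
    return mismatches


# Hypothetical in-memory database with toy rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signatures (id INTEGER PRIMARY KEY, firm TEXT)")
conn.executemany("INSERT INTO signatures (firm) VALUES (?)", [("A",), ("A",), ("B",)])

print(verify_claims(conn, {"A": 2, "B": 2}))  # {'B': (2, 1)}
```

An empty result means every claimed figure was reproduced; any entry is a number to correct in the manuscript or to trace back to a different denominator.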