Apply codex round-25 final polish: §III v6 + §IV v3.2

Codex round 25 returned Minor Revision: round-24's empirical and
cross-reference issues are mostly CLOSED. The remaining items were
all partner-facing cosmetic fixes or internal-notes hygiene.

§III v6 polish:
  1. §III:11 v5 changelog reprint of real firm names removed
     ("real firm names 'EY' and 'KPMG'" -> "real firm names/aliases")
     -- this was a self-regression I introduced in v5 while
     documenting the v5 anonymisation fix.
  2. §III:14 empirical anchor range updated:
     "Scripts 32-40" -> "Scripts 32-42" (includes Scripts 41 + 42).
  3. New v6 changelog entry added under the draft note documenting
     the round-25 fixes.
  4. Draft note version stamp refreshed: v5 -> v6.

§IV v3.2 polish:
  1. §IV draft note rewritten and version label corrected:
     "Draft v3" -> "Draft v3.2"; "post codex rounds 21-23" ->
     "post codex rounds 21-25". The v3 -> v3.1 -> v3.2 lineage is
     now recorded.
  2. §IV close-out checklist item 2 rewritten to remove residual
     "Tables IV-XVIII" wording. v3.2 explicitly states: v4 table
     sequence is Tables V-XVIII plus Table XV-B; no v4 Table IV
     is printed; the inherited v3.20.0 Table IV (per-firm
     detection counts) remains a v3.x reference only.
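
  A minimal sketch of how the table-sequence claim above could be
  re-checked mechanically. The helper names and the regex are
  illustrative assumptions, not a script from this repository; the
  regex only handles "Table N" labels written one at a time and
  skips any label preceded by the literal "v3.20.0 " prefix.

```python
import re

def to_roman(n):
    """Convert a small positive integer to a Roman numeral string."""
    pairs = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

# v4 sequence as stated in the commit message: V through XVIII plus XV-B.
EXPECTED_V4_TABLES = {to_roman(n) for n in range(5, 19)} | {"XV-B"}

# Hypothetical pattern: a "Table <roman>" label, optionally with a letter
# suffix, that is NOT cited as an inherited "v3.20.0 Table N".
TABLE_LABEL = re.compile(r"(?<!v3\.20\.0 )\bTable ([IVXL]+(?:-[A-Z])?)\b")

def unexpected_v4_tables(text):
    """Return v4-style table labels in text that fall outside the sequence."""
    found = set(TABLE_LABEL.findall(text))
    return found - EXPECTED_V4_TABLES
```

  An empty result means every non-inherited label falls inside the
  stated sequence; a stray "Table IV" would be reported.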

Verification:
  - Strict-case grep for KPMG / Deloitte / PwC / EY (with word
    boundaries) + Chinese firm names: ZERO matches in either
    file. Anonymisation is now complete throughout the
    manuscript body AND internal notes.
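
  The check above can be approximated with a short sketch like the
  following. It is an illustration under stated assumptions, not the
  command actually run: the function names are hypothetical, and only
  the four Latin-script names listed in this message are included.
  The Chinese firm names are not given here, so the caller would have
  to supply them, and since Python treats CJK characters as word
  characters, the \b boundaries would likely need a different
  treatment for those names.

```python
import re

# Names taken from the commit message; Chinese-language names are not
# listed in the source and are left for the caller to supply.
FIRM_NAMES = ["KPMG", "Deloitte", "PwC", "EY"]

def build_pattern(names):
    # \b enforces word boundaries; re.IGNORECASE is deliberately omitted,
    # so the match is strict-case.
    alternation = "|".join(re.escape(name) for name in names)
    return re.compile(r"\b(?:" + alternation + r")\b")

def find_leaks(text, names=FIRM_NAMES):
    """Return (line_number, matched_name) pairs for every strict-case hit."""
    pattern = build_pattern(names)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

  An empty list from find_leaks on each file's contents corresponds
  to the "ZERO matches" result reported above.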

Round 25 closure post-polish:
  Major:     all CLOSED (round 24 Major 1 table numbering: now
             fully explicit V-XVIII + XV-B with v4 Table IV
             absent; Major 4 anonymisation: §III:11 leak removed)
  Minor:     all CLOSED (weight drift 0.023 confirmed across 4
             sites; cos <= 0.837 confirmed across 2 sites; n=686
             provenance row confirmed)
  Editorial: 1 still PARTIAL (internal draft notes + Phase 3
             close-out checklist remain in the files but
             explicitly marked "internal -- remove before
             submission"; these are author working artefacts
             intentionally retained until submission packaging)

Phase 4 readiness: technically Yes; the §III/§IV technical
content is converged across 5 codex review rounds. Internal
notes will be stripped at submission packaging time. Ready to
proceed to Phase 4 (Abstract/Intro/Discussion/Conclusion prose).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:36:16 +08:00
parent 6d2eddb6e8
commit 6ba128ded4
3 changed files with 88 additions and 7 deletions
@@ -1,6 +1,6 @@
# Section III. Methodology — v4.0 Draft v5 (post codex rounds 21–24)
# Section III. Methodology — v4.0 Draft v6 (post codex rounds 21–25)
> **Draft note (2026-05-12, v5; internal — remove before submission).** This file replaces the §III-G through §III-L block of `paper/paper_a_methodology_v3.md` (v3.20.0). Sub-sections III-A through III-F (Pipeline / Data Collection / Page Identification / Detection / Feature Extraction / Dual-Method Descriptors) are unchanged from v3.20.0 and not reproduced here.
> **Draft note (2026-05-12, v6; internal — remove before submission).** This file replaces the §III-G through §III-L block of `paper/paper_a_methodology_v3.md` (v3.20.0). Sub-sections III-A through III-F (Pipeline / Data Collection / Page Identification / Detection / Feature Extraction / Dual-Method Descriptors) are unchanged from v3.20.0 and not reproduced here.
>
> **v2** incorporated codex gpt-5.5 round-21 review (`paper/codex_review_gpt55_v4_round1.md`, Major Revision); key revisions were: (i) the inherited five-way per-signature box rule restored as the **primary operational classifier** (§III-L), (ii) the K=3 Gaussian mixture positioned as **accountant-level descriptive characterisation** (§III-J), (iii) "convergent validation" softened to "convergent internal-consistency checks" since the three scores share underlying features (§III-K), (iv) the pixel-identity metric renamed from FAR to positive-anchor miss rate (§III-K), (v) five empirical/wording slips corrected.
>
@@ -8,10 +8,12 @@
>
> **v4** incorporates the §III ↔ §IV cross-reference cleanup that codex round-23 review flagged: §III-G unit references now point to actual §IV locations (§IV-J for five-way per-signature counts; §IV-I for inherited inter-CPA FAR), §III-G scope statement enumerates v4-new vs inherited sub-sections explicitly, §III-K cites v3.20.0 Tables IX/XI/XII/XII-B for moderate-band capture-rate (was "§IV-F" which is now Convergent Internal-Consistency), and §III-L's "without recalibration" claim is narrowed to apply only to the binary high-confidence sub-rule.
>
> **v5** incorporates codex gpt-5.5 round-24 review (`paper/codex_review_gpt55_v4_round4.md`, Minor Revision); seven narrow §III-side cleanups: (1) anonymisation leak repaired (real firm names "EY" and "KPMG" removed from §III prose; Firm A–D used throughout); (2) K=3 LOOO weight-drift value $0.025$ corrected to $0.023$ at three §III sites (matches Script 37); (3) §III-K positive-anchor paragraph cross-ref repaired (now points to §IV-I and v3.20.0 §IV-F.1 Table X, was the meaningless "§III-J inherited; Table X"); (4) §III-L five-way rule's Likely-hand-signed band made inclusive ($\text{cos} \leq 0.837$, matches Script 42); (5) open question 1's location pointer changed from current §IV-F to v3.20.0 Tables IX/XI/XII/XII-B and §IV-J descriptive proportions; (6) provenance row added for the full-dataset $n = 686$ claim citing Script 41; (7) draft-note dates and version stamps refreshed.
> **v5** incorporates codex gpt-5.5 round-24 review (`paper/codex_review_gpt55_v4_round4.md`, Minor Revision); seven narrow §III-side cleanups: (1) anonymisation leak repaired (real firm names/aliases removed from §III prose; Firm A–D used throughout); (2) K=3 LOOO weight-drift value $0.025$ corrected to $0.023$ at three §III sites (matches Script 37); (3) §III-K positive-anchor paragraph cross-ref repaired (now points to §IV-I and v3.20.0 §IV-F.1 Table X, was the meaningless "§III-J inherited; Table X"); (4) §III-L five-way rule's Likely-hand-signed band made inclusive ($\text{cos} \leq 0.837$, matches Script 42); (5) open question 1's location pointer changed from current §IV-F to v3.20.0 Tables IX/XI/XII/XII-B and §IV-J descriptive proportions; (6) provenance row added for the full-dataset $n = 686$ claim citing Script 41; (7) draft-note dates and version stamps refreshed.
>
> **v6** incorporates codex gpt-5.5 round-25 review (`paper/codex_review_gpt55_v4_round5.md`, Minor Revision): empirical anchor range updated to Scripts 32–42 (was 32–40, missed Scripts 41 and 42).
>
> Empirical anchors throughout reference Scripts 32–40 on branch `paper-a-v4-big4`; a provenance table appears at the end of this section listing every numerical claim with its script and report path.
> Empirical anchors throughout reference Scripts 32–42 on branch `paper-a-v4-big4`; a provenance table appears at the end of this section listing every numerical claim with its script and report path.
## G. Unit of Analysis and Scope
+3 -3
@@ -1,6 +1,6 @@
# Section IV. Results — v4.0 Draft v3 (post codex rounds 21–23)
# Section IV. Results — v4.0 Draft v3.2 (post codex rounds 21–25)
> **Draft note (2026-05-12, v3; internal — remove before submission).** This file replaces the §IV-A through §IV-H block of `paper/paper_a_results_v3.md` (v3.20.0) with the Big-4 reframed structure. Section IV expands from 8 sub-sections in v3.20.0 to 12 sub-sections in v4.0 (A through L) to mirror the §III-G..L lineage. **v3** incorporates codex gpt-5.5 round-23 review (`paper/codex_review_gpt55_v4_round3.md`, Major Revision); the fixes are presentation-level rather than methodology-level. **Table-numbering scheme** (resolved in v3): the v4 manuscript uses fresh Table numbering V through XVIII for the new v4 Big-4 results; inherited v3.x tables are cited only as "v3.20.0 Table N" with the original v3 number and are *not* renumbered into the v4 sequence. **Anonymisation** (resolved in v3): the Big-4 firms remain pseudonymously labelled Firm A through Firm D throughout the manuscript body; real names are not printed in v4 tables or prose (a single mapping line, retained in v3.20.0's §III-L data-source paragraph, discloses the residual identifiability through contextual descriptors as per IEEE Access norms). Tables IV–XVIII numbering remains provisional and will be finalised at Phase 3 close-out after §III ↔ §IV cross-references are traced end-to-end. Empirical anchors trace to Scripts 32–42 on branch `paper-a-v4-big4`; the §III provenance table covers the methodology-side citations and §IV adds new tables for the v4.0-specific results.
> **Draft note (2026-05-12, v3.2; internal — remove before submission).** This file replaces the §IV-A through §IV-H block of `paper/paper_a_results_v3.md` (v3.20.0) with the Big-4 reframed structure. Section IV expands from 8 sub-sections in v3.20.0 to 12 sub-sections in v4.0 (A through L) to mirror the §III-G..L lineage. **Table-numbering scheme**: the v4 manuscript uses Tables V through XVIII (plus Table XV-B for document-level worst-case counts) for the new v4 Big-4 results; inherited v3.x tables are cited only as "v3.20.0 Table N" with their original v3 number and are *not* renumbered into the v4 sequence. No v4 Table IV is printed; the inherited v3.20.0 Table IV (per-firm detection counts) remains a v3.x reference rather than a v4 table. **Anonymisation**: the Big-4 firms are pseudonymously labelled Firm A through Firm D throughout the manuscript body; real names are not printed in v4 tables or prose. The v3 → v3.1 → v3.2 revision history is: v3 (post round 23) made the table-numbering scheme and anonymisation policy decisions and applied 14 presentation fixes; v3.1 (post round 24) tightened the close-out checklist; v3.2 (post round 25) finalises this draft note. Empirical anchors trace to Scripts 32–42 on branch `paper-a-v4-big4`; the §III provenance table covers the methodology-side citations and §IV adds new tables for the v4.0-specific results.
## A. Experimental Setup
@@ -262,7 +262,7 @@ The feature-backbone ablation (v3.20.0 Table XVIII; backbone replacement of ResN
The following items remain after codex rounds 21–24 and before §IV is sent to partner Jimmy for v4.0 review:
1. **Table XV per-signature category counts** — RESOLVED (v2 of §IV draft, Script 42 output). Per-signature, per-firm, document-level, and per-firm-document tables now populated.
2. **Table renumbering finalisation.** The provisional Tables IV–XVIII numbering (with Table XV-B added in v2) should be confirmed once §IV is read end-to-end and §III–§IV cross-references are traced; some v3.x table positions (e.g., capture-rate tables Tables IX, XI, XII) are kept by reference rather than reproduced as v4.0-numbered tables.
2. **Table renumbering finalisation.** The v4 table sequence as of v3.2 is Tables V–XVIII plus Table XV-B (no v4 Table IV is printed); inherited v3.x tables such as capture-rate Tables IX, XI, XII and the backbone-ablation v3.20.0 Table XVIII are kept by reference and cited as "v3.20.0 Table N" rather than reproduced as v4-numbered tables. A final pass should confirm whether the target journal accepts the Table XV-B letter suffix; if not, XV-B can be renumbered to a sequential XIX with §IV-J text adjusted accordingly.
3. **§IV-A to §IV-C content audit.** Verify that the inherited prose for Experimental Setup, Detection Performance, and All-Pairs analysis remains accurate after the §III-G scope change to Big-4 primary.
4. **Open question carry-over from §III v3.** Codex round-22 open questions on five-way moderate-band validation, firm anonymisation policy, and §IV table numbering are addressed in this v3 of §IV: (a) five-way moderate band documented as inherited from v3.x in §IV-J with Big-4 per-firm proportions reported descriptively (Table XV); (b) firm anonymisation maintained throughout §IV (Firm A–D used consistently; real names removed in v3); (c) §IV table numbering set provisionally and to be finalised at Phase 3 close-out.