Apply codex round-24 final cleanup: §III v5 + §IV v3.1

Codex round 24 returned Minor Revision: 3 Major CLOSED + 3 Major PARTIAL + 4 Minor CLOSED + 2 Minor PARTIAL + 4 Editorial CLOSED + 1 Editorial OPEN. All 7 narrow residual fixes were §III-side (I applied §IV fixes thoroughly in v3 but didn't mirror them to §III v4). §III v5 fixes: 1. Anonymisation leak repaired: - "held-out-EY fold" -> "held-out-Firm-D fold" (L71) - "Firms B (KPMG) and D (EY)" -> "Firms B and D" (L99) 2. K=3 LOOO weight drift 0.025 -> 0.023 at three sites (L71, L115, L173 provenance table). Matches Script 37 max C1 weight deviation and §IV v3 line 139. 3. §III-K positive-anchor paragraph cross-ref repaired: "v3.x inter-CPA negative anchor (§III-J inherited; Table X)" -> "(§IV-I, inheriting v3.20.0 §IV-F.1 Table X)". 4. §III-L five-way Likely-hand-signed band made inclusive: "Cosine below the all-pairs KDE crossover threshold." -> "Cosine at or below the all-pairs KDE crossover threshold (cos <= 0.837)." Matches Script 42 and §IV:19. 5. Open question 1's pointer changed from current §IV-F (which is Convergent Internal-Consistency Checks) to v3.20.0 Tables IX/XI/XII/XII-B + §IV-J descriptive proportions. 6. Provenance table: new row for full-dataset n=686 citing Script 41 fulldataset_report.md. 7. Draft-note header refreshed: v3 -> v5; v4 -> v5 etc.; "internal -- remove before submission" tag added. §IV v3.1 fixes: - Close-out checklist L262 stale "codex round 23" wording updated to "rounds 21-24 and before partner Jimmy review". - Close-out item 4 "in this v2" stale wording -> "in this v3". - New item 5 added: internal author notes (this checklist + §III cross-reference index + both files' draft-note headers) are author working artefacts and should be moved/stripped before partner / submission packaging. Round 24 finding summary post-v5/v3.1: Major: 3 CLOSED, 3 -> CLOSED (anonymisation + cross-ref + table numbering note residuals) Minor: 4 CLOSED, 2 -> CLOSED (weight drift 0.025 -> 0.023; low-cosine inclusivity cos <= 0.837) Editorial: 4 CLOSED, 1 PARTIAL (draft notes remain visible but explicitly marked as internal-only "remove before submission") Phase 4 readiness: pending decision on whether to do one more codex verification round (round 25) before drafting Abstract / Intro / Discussion / Conclusion prose. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:26:14 +08:00
parent ce33156238
commit 6d2eddb6e8
3 changed files with 128 additions and 12 deletions
@@ -1,11 +1,16 @@
-# Section III. Methodology — v4.0 Draft v4 (post codex rounds 21 + 22 + 23 cross-reference cleanup)
+# Section III. Methodology — v4.0 Draft v5 (post codex rounds 21–24)

-> **Draft note (2026-05-12, v3).** This file replaces the §III-G through §III-L block of `paper/paper_a_methodology_v3.md` (v3.20.0). Sub-sections III-A through III-F (Pipeline / Data Collection / Page Identification / Detection / Feature Extraction / Dual-Method Descriptors) are unchanged from v3.20.0 and not reproduced here.
+> **Draft note (2026-05-12, v5; internal — remove before submission).** This file replaces the §III-G through §III-L block of `paper/paper_a_methodology_v3.md` (v3.20.0). Sub-sections III-A through III-F (Pipeline / Data Collection / Page Identification / Detection / Feature Extraction / Dual-Method Descriptors) are unchanged from v3.20.0 and not reproduced here.
 >
 > **v2** incorporated codex gpt-5.5 round-21 review (`paper/codex_review_gpt55_v4_round1.md`, Major Revision); key revisions were: (i) the inherited five-way per-signature box rule restored as the **primary operational classifier** (§III-L), (ii) the K=3 Gaussian mixture positioned as **accountant-level descriptive characterisation** (§III-J), (iii) "convergent validation" softened to "convergent internal-consistency checks" since the three scores share underlying features (§III-K), (iv) the pixel-identity metric renamed from FAR to positive-anchor miss rate (§III-K), (v) five empirical/wording slips corrected.
 >
-> **v3** incorporates codex gpt-5.5 round-22 review (`paper/codex_review_gpt55_v4_round2.md`, Minor Revision); five narrow fixes applied: (1) §III-K per-firm ranking corrected to acknowledge that Score 2 (reverse-anchor) ranks Firm D fractionally above Firm C while Scores 1 and 3 rank Firm C highest; (2) "smallest scope" / "any single firm" language narrowed to "comparison scopes tested in Script 32"; (3) §III-L corrected so the §III-K Spearman correlations are explicitly tied to the K=3 *posterior* P(C1) rather than the K=3 hard label; (4) provenance table source for $n = 150{,}442$ updated to cite Script 39 directly; (5) "max fold-to-fold deviation" wording made precise — the $0.028$ value is the maximum absolute deviation from the across-fold mean, not the pairwise across-fold range ($0.0376$).
+> **v3** incorporates codex gpt-5.5 round-22 review (`paper/codex_review_gpt55_v4_round2.md`, Minor Revision); five narrow fixes applied: per-firm ranking corrected (Score 2 reverse-anchor ranks Firm D fractionally above Firm C while Scores 1 and 3 rank Firm C highest), "smallest scope" language narrowed to "comparison scopes tested in Script 32", §III-L Spearman correlations explicitly tied to the K=3 *posterior* P(C1), provenance for $n = 150{,}442$ cites Script 39 directly, "max fold-to-fold deviation" wording made precise ($0.028$ = max absolute deviation from across-fold mean; pairwise range $0.0376$).
 >
+> **v4** incorporates the §III ↔ §IV cross-reference cleanup that codex round-23 review flagged: §III-G unit references now point to actual §IV locations (§IV-J for five-way per-signature counts; §IV-I for inherited inter-CPA FAR), §III-G scope statement enumerates v4-new vs inherited sub-sections explicitly, §III-K cites v3.20.0 Tables IX/XI/XII/XII-B for moderate-band capture-rate (was "§IV-F" which is now Convergent Internal-Consistency), and §III-L's "without recalibration" claim is narrowed to apply only to the binary high-confidence sub-rule.
+>
+> **v5** incorporates codex gpt-5.5 round-24 review (`paper/codex_review_gpt55_v4_round4.md`, Minor Revision); seven narrow §III-side cleanups: (1) anonymisation leak repaired (real firm names "EY" and "KPMG" removed from §III prose; Firm A–D used throughout); (2) K=3 LOOO weight-drift value $0.025$ corrected to $0.023$ at three §III sites (matches Script 37); (3) §III-K positive-anchor paragraph cross-ref repaired (now points to §IV-I and v3.20.0 §IV-F.1 Table X, was the meaningless "§III-J inherited; Table X"); (4) §III-L five-way rule's Likely-hand-signed band made inclusive ($\text{cos} \leq 0.837$, matches Script 42); (5) open question 1's location pointer changed from current §IV-F to v3.20.0 Tables IX/XI/XII/XII-B and §IV-J descriptive proportions; (6) provenance row added for the full-dataset $n = 686$ claim citing Script 41; (7) draft-note dates and version stamps refreshed.
+>
+
 > Empirical anchors throughout reference Scripts 32–40 on branch `paper-a-v4-big4`; a provenance table appears at the end of this section listing every numerical claim with its script and report path.

 ## G. Unit of Analysis and Scope
@@ -68,7 +73,7 @@ This section develops the K=2 and K=3 Gaussian mixture fits to the Big-4 account

 $\text{BIC}(K{=}3) = -1111.93$, lower than $K{=}2$ by $3.48$ (mild support for K=3 under standard BIC interpretation but not by itself decisive).

-**Why we report both K=2 and K=3.** Leave-one-firm-out cross-validation (§III-K) shows that K=2 is unstable across folds: holding Firm A out gives a fold rule cos $> 0.938$ AND dHash $\leq 8.79$, while holding any single non-Firm-A Big-4 firm out gives a fold rule near cos $> 0.975$ AND dHash $\leq 3.76$ (Script 36). The maximum absolute deviation of the four fold cosine crossings from their across-fold mean is $0.028$ (the corresponding pairwise across-fold range is $0.0376$, from $0.9380$ for the held-out-Firm-A fold to $0.9756$ for the held-out-EY fold; Script 36 stability summary). The $0.028$ value is $5.6\times$ the report's $0.005$ across-fold *stability tolerance* — not the bootstrap CI; the full-Big-4 bootstrap cosine half-width is the much smaller $0.0015$. K=3 in contrast has a *reproducible component shape*: across the four folds the C1 cosine mean varies by at most $0.005$, the C1 dHash mean by at most $0.96$, and the C1 weight by at most $0.025$ (Script 37). K=3 hard-posterior membership for the held-out firm is more composition-sensitive — for Firm C the held-out C1 rate is $36.3\%$ vs the full-Big-4 baseline of $23.5\%$, an absolute difference of $12.8$ pp; for Firm A the held-out C1 rate is $4.7\%$ vs baseline $0.0\%$; the report's own legend classifies this pattern as `P2_PARTIAL`, with the explicit interpretation that "the C1 cluster exists but membership is not well-predicted by the held-out fit" and "K=3 is not predictively useful as an operational classifier" (Script 37 verdict legend).
+**Why we report both K=2 and K=3.** Leave-one-firm-out cross-validation (§III-K) shows that K=2 is unstable across folds: holding Firm A out gives a fold rule cos $> 0.938$ AND dHash $\leq 8.79$, while holding any single non-Firm-A Big-4 firm out gives a fold rule near cos $> 0.975$ AND dHash $\leq 3.76$ (Script 36). The maximum absolute deviation of the four fold cosine crossings from their across-fold mean is $0.028$ (the corresponding pairwise across-fold range is $0.0376$, from $0.9380$ for the held-out-Firm-A fold to $0.9756$ for the held-out-Firm-D fold; Script 36 stability summary). The $0.028$ value is $5.6\times$ the report's $0.005$ across-fold *stability tolerance* — not the bootstrap CI; the full-Big-4 bootstrap cosine half-width is the much smaller $0.0015$. K=3 in contrast has a *reproducible component shape*: across the four folds the C1 cosine mean varies by at most $0.005$, the C1 dHash mean by at most $0.96$, and the C1 weight by at most $0.023$ (Script 37). K=3 hard-posterior membership for the held-out firm is more composition-sensitive — for Firm C the held-out C1 rate is $36.3\%$ vs the full-Big-4 baseline of $23.5\%$, an absolute difference of $12.8$ pp; for Firm A the held-out C1 rate is $4.7\%$ vs baseline $0.0\%$; the report's own legend classifies this pattern as `P2_PARTIAL`, with the explicit interpretation that "the C1 cluster exists but membership is not well-predicted by the held-out fit" and "K=3 is not predictively useful as an operational classifier" (Script 37 verdict legend).

 We take the joint K=2/K=3 LOOO evidence as supporting the following descriptive claims, all of which are used in §III-K and §V but none of which underwrites a v4.0 operational classifier:

@@ -96,7 +101,7 @@ Pairwise Spearman rank correlations among the three scores, $n = 437$ Big-4 CPAs
 | Score 2 vs Score 3 | $+0.889$ | $< 10^{-149}$ |
 | Score 1 vs Score 2 | $+0.879$ | $< 10^{-142}$ |

-We read this as the strongest internal-consistency signal in v4.0: three different summarisations of the same descriptor pair agree on the per-CPA hand-leaning ranking with $\rho > 0.87$. The three scores agree on placing Firm A as the most replication-dominated and the three non-Firm-A Big-4 firms as more hand-leaning, but they do not all rank the non-Firm-A firms identically: the K=3 posterior P(C1) and the box-rule hand-leaning rate (Scores 1 and 3) place Firm C at the most-hand-leaning end of Big-4 (mean P(C1) $= 0.311$; mean box-rule hand-leaning rate $= 0.790$), while the reverse-anchor cosine percentile (Score 2) places Firm D fractionally higher than Firm C (mean reverse-anchor score $-0.7125$ vs Firm C $-0.7672$, with higher value indicating deeper into the reference left tail). The mean values for Firms B (KPMG) and D (EY) sit between Firms A and C on Scores 1 and 3 (Script 38 per-firm summary). We do not claim this constitutes external validation of any operational classifier; the inherited box rule is calibrated separately (§III-L), and the convergence above shows that a mixture-derived score and a reverse-anchor score concur with the box rule's per-CPA-aggregated outputs on the directional ordering, with a modest disagreement at the most-hand-leaning end between the three non-A Big-4 firms.
+We read this as the strongest internal-consistency signal in v4.0: three different summarisations of the same descriptor pair agree on the per-CPA hand-leaning ranking with $\rho > 0.87$. The three scores agree on placing Firm A as the most replication-dominated and the three non-Firm-A Big-4 firms as more hand-leaning, but they do not all rank the non-Firm-A firms identically: the K=3 posterior P(C1) and the box-rule hand-leaning rate (Scores 1 and 3) place Firm C at the most-hand-leaning end of Big-4 (mean P(C1) $= 0.311$; mean box-rule hand-leaning rate $= 0.790$), while the reverse-anchor cosine percentile (Score 2) places Firm D fractionally higher than Firm C (mean reverse-anchor score $-0.7125$ vs Firm C $-0.7672$, with higher value indicating deeper into the reference left tail). The mean values for Firms B and D sit between Firms A and C on Scores 1 and 3 (Script 38 per-firm summary). We do not claim this constitutes external validation of any operational classifier; the inherited box rule is calibrated separately (§III-L), and the convergence above shows that a mixture-derived score and a reverse-anchor score concur with the box rule's per-CPA-aggregated outputs on the directional ordering, with a modest disagreement at the most-hand-leaning end between the three non-A Big-4 firms.

 **2. Per-signature consistency (Script 39).** Per-CPA aggregation could in principle reflect averaging across within-CPA heterogeneity rather than coherent within-CPA behaviour. We test this by repeating the K=3 fit at the signature level — fitting a fresh K=3 GMM to the 150,442 Big-4 signature-level $(\text{cos}, \text{dHash}_{\text{indep}})$ points (Script 39) — and comparing labels. The per-CPA and per-signature K=3 fits recover a broadly similar three-component ordering; per-CPA C1 is at $\overline{\text{cos}} = 0.946$, $\overline{\text{dHash}} = 9.17$ vs per-signature C1 at $\overline{\text{cos}} = 0.928$, $\overline{\text{dHash}} = 9.75$ (an absolute cosine drift of $0.018$). Cohen $\kappa$ on the binary collapse (replicated vs not-replicated):

@@ -112,11 +117,11 @@ The Script 39 report verdict is `SIG_CONVERGENCE_MODERATE`. The $\kappa = 0.870$

 - *K=2 LOOO is unstable.* The maximum absolute deviation of the four fold cosine crossings from their across-fold mean is $0.028$, against the report's $0.005$ across-fold stability tolerance (Script 36; pairwise fold range $0.0376$, from $0.9380$ to $0.9756$). When Firm A is held out, the fold rule classifies $171/171$ of held-out Firm A CPAs as templated; when any non-Firm-A Big-4 firm is held out, the fold rule classifies $0$ of the held-out firm's CPAs as templated. This pattern indicates the K=2 boundary is essentially a Firm-A-vs-others separator rather than a within-Big-4 mechanism boundary.

- *K=3 LOOO is partially stable.* The C1 hand-leaning component shape is reproducible across folds: max deviation from the full-Big-4 baseline is $0.005$ in cosine, $0.96$ in dHash, and $0.025$ in mixture weight (Script 37). Hard-posterior membership remains composition-sensitive — observed absolute differences are $1.8$–$12.8$ pp across the four folds, with the Firm C fold exceeding the report's $5$ pp viability bar; the report's own verdict is `P2_PARTIAL` ("K=3 is not predictively useful as an operational classifier"). We accordingly do not use K=3 hard-posterior membership as an operational label.
+- *K=3 LOOO is partially stable.* The C1 hand-leaning component shape is reproducible across folds: max deviation from the full-Big-4 baseline is $0.005$ in cosine, $0.96$ in dHash, and $0.023$ in mixture weight (Script 37). Hard-posterior membership remains composition-sensitive — observed absolute differences are $1.8$–$12.8$ pp across the four folds, with the Firm C fold exceeding the report's $5$ pp viability bar; the report's own verdict is `P2_PARTIAL` ("K=3 is not predictively useful as an operational classifier"). We accordingly do not use K=3 hard-posterior membership as an operational label.

 **4. Positive-anchor miss rate on byte-identical signatures (Script 40).** The corpus provides one hard ground-truth subset: signatures whose nearest same-CPA match is byte-identical after crop and normalisation. Independent hand-signing cannot produce pixel-identical images, so byte-identical signatures are conservative-subset ground truth for the *replicated* class. The Big-4 byte-identical subset comprises $n = 262$ signatures ($145 / 8 / 107 / 2$ across Firms A through D; Script 40).

-We report each candidate classifier's *positive-anchor miss rate* — the fraction of byte-identical signatures misclassified as hand-leaning. This is a one-sided check against a conservative positive subset, **not a false-alarm rate in the usual two-class sense**; we do not report a paired false-alarm rate because no signature-level hand-signed ground truth exists. The corresponding signature-level false-alarm-rate evidence comes from the v3.x inter-CPA negative anchor (§III-J inherited; Table X), which retains its v3.x interpretation:
+We report each candidate classifier's *positive-anchor miss rate* — the fraction of byte-identical signatures misclassified as hand-leaning. This is a one-sided check against a conservative positive subset, **not a false-alarm rate in the usual two-class sense**; we do not report a paired false-alarm rate because no signature-level hand-signed ground truth exists. The corresponding signature-level false-alarm-rate evidence comes from the v3.x inter-CPA negative anchor (§IV-I, inheriting v3.20.0 §IV-F.1 Table X), which retains its v3.x interpretation:

 | Candidate classifier | Pixel-identity miss rate (Wilson 95% CI) |
 |---|---|
@@ -140,7 +145,7 @@ The v4.0 operational classifier is the inherited v3.x five-way per-signature box

 4. **Uncertain:** Cosine between the all-pairs intra/inter KDE crossover (0.837) and 0.95 without sufficient convergent evidence in either direction.

-5. **Likely hand-signed:** Cosine below the all-pairs KDE crossover threshold.
+5. **Likely hand-signed:** Cosine at or below the all-pairs KDE crossover threshold (cos $\leq 0.837$).

 The conventions about these thresholds (cosine 0.95 as an operating point chosen for capture-vs-FAR tradeoff against the inter-CPA negative anchor, with Wilson 95% inter-CPA FAR of $0.0005$ at the operating point; cosine 0.837 as the all-pairs KDE crossover; dHash 5 and 15 as the upper tail of the high-similarity mode and the structural-similarity ceiling respectively) are inherited from v3.x §III-K and retain their v3.x calibration and capture-rate evidence (v3.x Tables IX, XI, XII, XII-B).

@@ -170,12 +175,13 @@ The conventions about these thresholds (cosine 0.95 as an operating point chosen
 | $\Delta\text{BIC}(K{=}3, K{=}2) = -3.48$ | direct | Script 34 (BIC K=2 = $-1108.45$; Script 36 reports BIC K=3 = $-1111.93$) | direct (arithmetic) |
 | K=2 LOOO max cosine deviation $0.028$ | direct | Script 36 stability summary | direct |
 | K=2 LOOO Firm A held-out $171/171$ replicated | direct | Script 36 fold table | direct |
-| K=3 C1 component shape drift (cos $0.005$, dHash $0.96$, weight $0.025$) | direct | Script 37 stability summary | direct |
+| K=3 C1 component shape drift (cos $0.005$, dHash $0.96$, weight $0.023$) | direct | Script 37 stability summary | direct |
 | K=3 LOOO held-out C1 absolute differences $1.8$–$12.8$ pp | direct | Script 37 held-out prediction check | direct |
 | Three-score pairwise Spearman ($0.963$, $0.889$, $0.879$) | direct | Script 38 correlations | direct |
 | Per-CPA / per-signature K=3 Cohen $\kappa$ ($0.662$, $0.559$, $0.870$) | direct | Script 39 kappa table | direct |
 | Per-CPA / per-signature K=3 C1 center drift $0.018$ (cosine) | derived | $\lvert 0.9457 - 0.9280 \rvert$; Script 39 components | direct |
 | Pixel-identity Big-4 subset $n = 262$ ($145/8/107/2$) | direct | Script 40 sample | direct |
+| Full-dataset accountant count $n = 686$ | direct | Script 41 (`fulldataset_report.md`) | direct |
 | Positive-anchor miss rate $0\%$ on $n = 262$ (Wilson upper $1.45\%$) | direct | Script 40 results table | direct |
 | Inter-CPA FAR $0.0005$ at cos $> 0.95$ (Wilson 95% $[0.0003, 0.0007]$) | direct | v3 §IV-F.1 / Table X (inherited) | inherited from v3 |
 | Firm A byte-identical $145$ pixel-identical signatures in Big-4 subset | direct | Script 40 sample breakdown | direct |
@@ -195,7 +201,7 @@ The conventions about these thresholds (cosine 0.95 as an operating point chosen

 ## Open questions remaining for partner / reviewer

-1. **Five-way rule validation against the moderate-confidence band.** §III-K's $\kappa$ evidence covers only the binary high-confidence rule (cos $> 0.95$ AND dHash $\leq 5$). The moderate-confidence band (cos $> 0.95$ AND $5 < \text{dHash} \leq 15$) inherits its v3.x calibration and capture-rate evidence (Tables IX, XI, XII). Is this inheritance sufficient, or should v4.0 add a Big-4-specific capture-rate analysis for the moderate band in §IV-F?
+1. **Five-way rule validation against the moderate-confidence band.** §III-K's $\kappa$ evidence covers only the binary high-confidence rule (cos $> 0.95$ AND dHash $\leq 5$). The moderate-confidence band (cos $> 0.95$ AND $5 < \text{dHash} \leq 15$) inherits its v3.x calibration and capture-rate evidence (v3.20.0 Tables IX, XI, XII, XII-B). Is this inheritance sufficient (Big-4 per-firm MC proportions are reported descriptively in §IV-J's Table XV), or should v4.0 add a Big-4-specific MC-band capture-rate analysis as an additional sub-section?

 2. **Anonymisation of within-Big-4 firm contrasts.** §III-H states that Firm C is the firm most concentrated in C1 hand-leaning at $23.5\%$ (Script 35). The within-Big-4 ordering by hand-leaning concentration is informative for the §V discussion. v3.x reports under pseudonyms throughout. Confirm that we maintain pseudonyms consistently in §IV–V even when discussing the specific Firm C / Firm B / Firm D hand-leaning rates.

@@ -259,9 +259,11 @@ The feature-backbone ablation (v3.20.0 Table XVIII; backbone replacement of ResN

 ## Phase 3 close-out checklist

-The following items are flagged for resolution before §IV is sent for codex round 23 / partner Jimmy review:
+The following items remain after codex rounds 21–24 and before §IV is sent to partner Jimmy for v4.0 review:

 1. **Table XV per-signature category counts** — RESOLVED (v2 of §IV draft, Script 42 output). Per-signature, per-firm, document-level, and per-firm-document tables now populated.
 2. **Table renumbering finalisation.** The provisional Tables IV–XVIII numbering (with Table XV-B added in v2) should be confirmed once §IV is read end-to-end and §III–§IV cross-references are traced; some v3.x table positions (e.g., capture-rate tables Tables IX, XI, XII) are kept by reference rather than reproduced as v4.0-numbered tables.
 3. **§IV-A to §IV-C content audit.** Verify that the inherited prose for Experimental Setup, Detection Performance, and All-Pairs analysis remains accurate after the §III-G scope change to Big-4 primary.
-4. **Open question carry-over from §III v3.** Codex round-22 open questions on five-way moderate-band validation, firm anonymisation policy, and §IV table numbering are addressed in this v2: (a) five-way moderate band documented as inherited from v3.x in §IV-J with Big-4 per-firm proportions reported descriptively (Table XV); (b) firm anonymisation maintained throughout §IV (Firm A–D used consistently); (c) §IV table numbering set provisionally and to be finalised at Phase 3 close-out.
+4. **Open question carry-over from §III v3.** Codex round-22 open questions on five-way moderate-band validation, firm anonymisation policy, and §IV table numbering are addressed in this v3 of §IV: (a) five-way moderate band documented as inherited from v3.x in §IV-J with Big-4 per-firm proportions reported descriptively (Table XV); (b) firm anonymisation maintained throughout §IV (Firm A–D used consistently; real names removed in v3); (c) §IV table numbering set provisionally and to be finalised at Phase 3 close-out.
+
+5. **Internal author notes (this checklist + §III's cross-reference index + both files' draft-note headers).** These are author working artefacts and should be moved to a separate notes file or stripped before partner / submission packaging.