Phase 6 round-4 codex-review fixes: blocker + 2 majors + minors

Codex round-3 verification (commit 9e68f2e) flagged remaining items:

BLOCKER:
- Appendix A Table A.I was inside an HTML comment but visible prose at L1268
  and L1275 referenced it as if it rendered. Un-commented Table A.I (same fix
  pattern as Tables I-IV in round-3).

MAJOR:
- Table XXVII (diagnostic summary) appeared in §III-M at L593 before Tables
  III-XXVI in §IV, breaking sequential IEEE-style numbering. Moved the table
  to Appendix A as Table A.II (new §A.2 "Diagnostic Summary"). §III-M now
  references "Appendix A Table A.II" instead of hosting the table inline;
  §VI Conclusion contribution (8) updated similarly. Appendix A heading
  generalised to "Supplementary Diagnostic Detail" with §A.1 (BD/McCrary)
  and §A.2 (Diagnostic Summary).
- §III-M L614 rate-definition conflation: the sentence "per-signature and
  per-document rates ($0.11$ and $0.34$ respectively under the deployed
  any-pair HC + MC alarm)" mixed unit labels. Rewrote to label each rate by
  its actual unit and source table: per-signature any-pair HC ICCR ($0.11$;
  Table XXII) and per-document HC+MC alarm-rate ICCR ($0.34$; Table XXIII).

MINORS:
- L400 §III-L stale ref retargeted: "operational signature-level classifier
  of §III-L" -> "of §III-H.1 calibrated in §III-L".
- L663 §III-L stale ref retargeted: "KDE crossover used in per-document
  classification (Section III-L)" -> "(Section III-H.1)".
- L1146 Conclusion "corpus-universal" overstated generality -> "across the
  tested eligible scopes" (non-Big-4 evidence is cosine-axis only, not full
  calibration scope).
- L1024 Results §IV-M.4 footnote: np.argmax / np.unique implementation
  detail moved to "alphabetically-ordered tie-break ... full implementation
  detail in the supplementary materials" (less script-internal prose).

Artefacts:
- Combined manuscript regenerated: paper_a_v4_combined.md, 1316 lines.
- Final main-text table sequence: I, II, III, ..., XXVI (26 tables, all
  sequential in rendered order). Appendix tables: A.I, A.II.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-15 19:59:56 +08:00
parent 9e68f2e1d3
commit bbb662acd1
5 changed files with 64 additions and 56 deletions
+2 -2
View File
@@ -32,7 +32,7 @@ The Big-4 subset of the detection output yields 150,442 signatures with both des
## C. All-Pairs Intra-vs-Inter Class Distribution Analysis
Fig. 2 presents the cosine similarity distributions computed over the full set of *pairwise comparisons* under two groupings: intra-class (all signature pairs belonging to the same CPA) and inter-class (signature pairs from different CPAs).
This all-pairs analysis is a different unit from the per-signature best-match statistics used in Sections IV-D onward; we report it first because it supplies the reference point for the KDE crossover used in per-document classification (Section III-L).
This all-pairs analysis is a different unit from the per-signature best-match statistics used in Sections IV-D onward; we report it first because it supplies the reference point for the KDE crossover used in per-signature classification (Section III-H.1).
Table IV summarizes the distributional statistics.
**Table IV.** Cosine Similarity Distribution Statistics.
@@ -393,7 +393,7 @@ Decile trend is broadly monotone in pool size with two minor reversals (decile 5
| D2 (operational) | HC + MC | $0.3375$ | $[0.3342, 0.3409]$ |
| D3 | HC + MC + HSC | $0.3384$ | $[0.3351, 0.3418]$ |
Per-firm D2 document-level ICCR: Firm A $0.6201$ ($n = 30{,}226$); Firm B $0.1600$ ($n = 17{,}127$); Firm C $0.1635$ ($n = 19{,}501$); Firm D $0.0863$ ($n = 8{,}379$). The Firm C denominator $n = 19{,}501$ exceeds Table XVI's single-firm Firm C count of $19{,}122$ by exactly the $379$ mixed-firm PDFs: all $379$ are $1{:}1$ Firm C / Firm D mixed-firm documents, and Script 45's mode-of-firms implementation (`np.argmax` over `np.unique`'s alphabetically-sorted firm counts) returns the first-sorted firm on ties, which assigns these tied documents to Firm C rather than to Firm D. The four per-firm denominators here therefore sum to the full $75{,}233$, whereas Table XVI's per-firm rows sum to $74{,}854 = 75{,}233 - 379$.
Per-firm D2 document-level ICCR: Firm A $0.6201$ ($n = 30{,}226$); Firm B $0.1600$ ($n = 17{,}127$); Firm C $0.1635$ ($n = 19{,}501$); Firm D $0.0863$ ($n = 8{,}379$). The Firm C denominator $n = 19{,}501$ exceeds Table XVI's single-firm Firm C count of $19{,}122$ by exactly the $379$ mixed-firm PDFs (all $379$ are $1{:}1$ Firm C / Firm D documents that an alphabetically-ordered tie-break assigns to Firm C; full implementation detail in the supplementary materials). The four per-firm denominators here therefore sum to the full $75{,}233$, whereas Table XVI's per-firm rows sum to $74{,}854 = 75{,}233 - 379$.
### M.5 Firm heterogeneity logistic regression and cross-firm hit matrix (Script 44)