# Paper A v4.0 — Narrative-Thread Audit Auditor: Claude Opus 4.7 (1M context) Date: 2026-05-14 Target: paper/v4/paper_a_prose_v4_phase4.md + paper/v4/paper_a_methodology_v4_section_iii.md + paper/v4/paper_a_results_v4_section_iv.md (post round-5, commit 128a914) Purpose: Coherence check across Abstract / §I / §III / §IV / §V / §VI as a single argument, after Phase 5 AI peer-review panel convergence (3/3 in Accept/Minor band). ## Headline assessment **Mostly Coherent — submission-ready after 2-3 small narrative-consistency patches.** The v4 story arc — *"v3.20.0 distributional path turns out to be composition + integer artefact; v4.0 replaces it with anchor-based ICCR + decisive firm heterogeneity; positioning is anchor-calibrated specificity-only screening, not validated detector"* — reads cleanly from Abstract through §VI. The 5 fix rounds + 3 reviewer panels have substantially closed the major framing, terminology, and provenance risks. What remains is narrow narrative-consistency residue between Phase 4 §I/§V prose and §III source-of-truth (3 specific items, all small), plus 1 interpretive caveat that would strengthen §V-H limitation 2. No empirical reruns required. No structural rewrites required. Submission-readiness gate: **conditionally pass** — recommend a 15-minute round-6 prose-consistency patch before manuscript-splice, then the manuscript is splice-ready. ## 1. Abstract → body mirror audit The Abstract (Phase 4 line 11, 247 words) makes ~12 distinct claims. Each maps cleanly to a body location: | # | Abstract claim | §I location | §III/§IV body location | Status | |---:|---|---|---|---| | 1 | Non-hand-signed detection problem (regulation + digitization) | §I paras 1–3 (lines 19–21) | — (problem framing) | **Aligned** | | 2 | Pipeline: VLM + YOLOv11 + ResNet-50 + dual-descriptor | §I para 6 (line 29) + contribution 2 | §III-A..F (inherited) + §III-F dual-descriptor | **Aligned** | | 3 | 90,282 reports / 182,328 sigs / 758 CPAs | §I para 8 (line 39) | §IV-A..C (inherited) | **Aligned** | | 4 | Big-4 sub-corpus: 437 CPAs / 150,442 sigs | §I para 8 (line 39) | §III-G (line 19), §IV-D (line 9, line 15) | **Aligned** | | 5 | Composition decomposition $p_{\text{median}} = 0.35$ | §I para 5 (line 31) + contribution 4 | §III-I.4 (lines 55–73), §IV-M.1 Table XX (line 266) | **Aligned** | | 6 | Per-comparison ICCRs 0.0006 / 0.0013 / 0.00014 | §I para 6 (line 33) + contribution 5 | §III-L.1 (line 196), §IV-M.2 Table XXI (line 280) | **Aligned** | | 7 | Per-signature ICCR 0.11 | §I para 6 (line 33) | §III-L.2 (line 208), §IV-M.3 Table XXII (line 300) | **Aligned** | | 8 | Per-document ICCR 0.34 (HC+MC) | §I para 6 (line 33) | §III-L.3 (line 233), §IV-M.4 Table XXIII (line 317) | **Aligned** | | 9 | Firm heterogeneity: Firm A 0.62 vs B/C/D 0.09–0.16 | §I para 7 (line 35) + contribution 6 | §III-L.4 (line 259), §IV-M.4 (line 325) | **Aligned** | | 10 | Within-firm 77–99% (any-pair) | §I para 7 (line 35) + contribution 6 | §III-L.4 (line 283), §IV-M.5 Table XXV (line 340) | **Aligned** | | 11 | "Specificity-proxy-anchored screening + HITL, not validated detector" positioning | §I contribution 8 (line 57) | §III-M Table XXVII (line 316) + §V-G/H + §VI item 8 | **Aligned** | | 12 | "No calibrated error rates without ground truth" disclaimer | §I para 5 item (v) (line 25) | §III-M (line 312), §V-H limit 1 (line 113) | **Aligned** | Observation: Abstract does NOT mention the three-score Spearman convergence ($\rho \geq 0.879$). This is by design — the v4 pivot demoted three-score from a headline finding to "internal consistency" because the scores share inputs. §I contribution 7 and §V-E retain it with the demoted caveat. **No action needed.** ## 2. §I contributions (8) → body implementation map | # | §I contribution | §III/§IV implementation | §V/§VI loop-back | Status | |---:|---|---|---|---| | 1 | Problem formulation | — (§V-A discusses) | §V-A (line 73), §VI implicit | **Aligned** | | 2 | End-to-end pipeline | §III-A..F (inherited) | §VI line 145 | **Aligned** | | 3 | Dual-descriptor verification | §III-F + §IV-L backbone ablation | §V-A/B implicit | **Aligned** | | 4 | Composition decomposition | §III-I.4 + §IV-M.1 Table XX | §V-B (line 81), §VI item 1 (line 147) | **Aligned** | | 5 | Anchor-based multi-level ICCR | §III-L + §IV-M.2/M.3/M.4 Tables XXI/XXII/XXIII | §V-F (line 99), §VI item 2 | **Aligned** | | 6 | Firm heterogeneity + within-firm collision | §III-L.4 + §IV-M.4/M.5 Tables XXIV/XXV | §V-C (line 83), §VI items 3+4 | **Aligned** | | 7 | K=3 descriptive + three-score convergence | §III-J + §III-K.1 + §IV-E/F/G | §V-D/E (lines 89–97), §VI items 5+6 | **Aligned** | | 8 | Annotation-free positive-anchor + ten-tool ceiling | §III-K.4 + §III-M Table XXVII + §IV-H Table XIV | §V-G (line 105), §VI items 7+8 | **Aligned** | All 8 contributions trace cleanly through §III/§IV implementation and §V/§VI loop-back. **No action needed.** ## 3. v3→v4 pivot rhetoric thread The v4 pivot has four narrative nodes; each must reinforce the others: | Node | Location | Says | |---|---|---| | **Setup**: v3.x distributional path | §I para 5 line 31 (Phase 4 prose) | "Earlier work...adopted a distributional path...v4.0 reports a composition decomposition diagnostic that overturns this reading" | | **Proof**: 2×2 factorial composition decomposition | §III-I.4 Scripts 39b–39e (lines 55–73) | Joint firm-mean centring + integer-tie jitter eliminates rejection ($p_{\text{median}} = 0.35$) | | **Alternative**: anchor-based ICCR | §III-L (line 173+) | Replaces distributional thresholds with inter-CPA coincidence-rate calibration at 3 units | | **Discussion**: K=3 stays descriptive | §V-B + §V-D (lines 77–93) | Mixture fits are firm-compositional partitions, not mechanism modes | | **Conclusion**: pivot summary | §VI items 1+5 (line 147) | Demotes K=3 mechanism reading; positions ICCR as the operational calibration | All five nodes use consistent language ("composition + integer artefact"; "descriptive firm-compositional partition"; "no within-population bimodal antimode"). **No action needed.** ## 4. K=3 demotion consistency Verified across 4 locations using consistent descriptor-position language (post round-2 M1 fix): - §III-J line 90 (source of truth): "The 'descriptive position' column replaces v3.x's 'hand-leaning / mixed / replicated' mechanism labels" - §I contribution 7 (line 55): "K=3 mixture demoted from 'three mechanism clusters' to a descriptive firm-compositional partition" - §V-D (line 93): "the K=3 stability supports a descriptive reading...*not* a three-mechanism latent-class structure" - §VI item 5 (line 147): same demotion language - §IV Tables XVI/XVII column headers: "C1 (low-cos / high-dHash) | C2 (central) | C3 (high-cos / low-dHash)" — descriptor-position labels throughout `grep -n "hand-leaning"` in v4 public prose: 0 hits (only internal-strip text). **Closed.** ## 5. ICCR vs FAR terminology consistency Verified across all rate-reporting locations (post round-2 + round-5): - Abstract: "inter-CPA coincidence-rate (ICCR)" ✓ - §I contribution 5 (line 51): explicit terminology adoption and FAR disclaimer ✓ - §III-L.1 (line 185): "Terminological note on 'FAR'" with full disclaimer ✓ - §IV-I (line 159): historical "FAR" cited only with the "v3.x terminology" caveat ✓ - §V-G heading (line 105): "Inherited Inter-CPA Negative Anchor Reframed as Coincidence Rate" ✓ - §V-H limitations: specificity-proxy framing under partially-violated assumption ✓ - §VI item 2 (line 147): "explicit terminological replacement of 'FAR' by 'ICCR' given the unsupervised setting" ✓ No public-prose "FAR" leak outside the historical-context caveats. **Closed.** ## 6. Numbers consistency audit (cross-section) Headline numbers cross-referenced between Abstract / §I / §III / §IV: | Claim | Abstract | §I body | §III source | §IV table | Status | |---|---|---|---|---|---| | Per-comparison ICCR cos $>0.95$ | 0.0006 | 0.0006 | 0.00060 | 0.00060 Table XXI | **Match** (rounding consistent) | | Per-comparison ICCR dHash $\leq 5$ | 0.0013 | 0.0013 | 0.00129 | 0.00129 Table XXI | **Match** | | Per-comparison joint | 0.00014 | 0.00014 | 0.00014 | 0.00014 Table XXI | **Match** | | Per-signature ICCR | 0.11 | 0.11 | 0.1102 (Wilson) | 0.1102 Table XXII | **Match** | | Per-document ICCR (HC+MC) | 0.34 | 0.34 | 0.3375 | 0.3375 Table XXIII | **Match** | | Firm A doc HC+MC | 0.62 | 0.62 | 0.6201 | 0.6201 §IV-M.4 line 325 | **Match** | | Firms B/C/D doc HC+MC | 0.09–0.16 | 0.09–0.16 | 0.1600 / 0.1635 / 0.0863 | same | **Match** | | Within-firm any-pair | 77–99% (rounded) | 76.7–83.7% / 98.8% | same | Table XXV | **Match** | | Same-pair within-firm | — | 97.0–99.96% | 99.96 / 97.7 / 98.2 / 97.0 | line 349 | **Match** | | Composition $p_{\text{median}}$ | 0.35 | 0.35 | 0.35 | 0.35 Table XX | **Match** | | Logistic OR | — | 0.053 / 0.010 / 0.027 | same | Table XXIV | **Match** | | Spearman ρ floor | — | 0.879 | 0.879 | 0.8794 Table IX | **Match** (Spearman precision §III/§IV differ at 4th decimal — see §8) | All headline numbers reconcile across sections. **Closed.** ## 7. Limitations vs §III-M Table XXVII coverage §V-H lists 14 limitations (9 v4-specific + 5 inherited from v3.20.0). Each Table XXVII assumption should be covered by an explicit §V-H item OR be self-evidently descriptive: | Table XXVII tool | Assumption | §V-H coverage | |---|---|---| | Composition decomposition | Jitter unbiased; Big-4 jittered + centred + jittered evidence | Implicit — covered by general "no signature-level ground truth" frame | | Per-comparison ICCR | Inter-CPA pairs are negative anchor (partially violated) | §V-H limit 2 (explicit) | | Per-signature ICCR | Same + pool replacement preserves negative-anchor property | §V-H limit 2 (implicit via "specificity-proxy rates under partially-violated assumption") | | Per-document ICCR | Same | Same | | Firm-heterogeneity logistic | Cluster-robust SE not run | **Gap** — no §V-H item explicitly flags the naive-SE caveat | | Cross-firm hit matrix | Deployed-rule semantics + mode-of-firms tie-break | §V-H limit 2 | | Alert-rate sensitivity | Descriptive gradient, not formal plateau | §V-H limit 5 (line 121, "alert-rate sensitivity analysis characterises only the HC threshold") | | Three-score Spearman | Scores share inputs | §V-H limit 6 (line 123, deployed-rate-excess interpretation) — partial; not the score-independence caveat directly | | Pixel-identical positive capture | Tautological (byte-identical ⇒ in HC region) | §V-H limit 4 (line 119, "pixel-identity is a conservative subset") | | LOOO firm-level reproducibility | Stability ≠ classification validity; K=3 membership ±12.8 pp | §V-H limit 8 (line 127, "K=3 hard-posterior membership is composition-sensitive") | **Gap 1**: §V-H does not explicitly flag the logistic-regression naive-SE caveat that Table XXVII row 5 discloses. Worth adding a half-sentence to §V-H or letting Table XXVII carry the disclosure since it's already in print. **Gap 2** (Opus N5 from round-2 audit): §V-H limit 2 discloses the firm-dependent within-firm violation numerically (98.8% at A; 76.7–83.7% at B/C/D) but does not interpret what this means for proxy reliability — namely that Firm A's per-firm ICCR is MORE contaminated by within-firm sharing than B/C/D's, so the per-firm B/C/D rates are closer to clean specificity. This nuance affects interpretation of the headline "firm heterogeneity is decisive" framing. ## 8. Net-new narrative concerns (audit-surfaced) ### Concern A — Phase 4 §I body line 31 cites "Script 39c" for jittered-dHash claim **Issue.** Phase 4 prose line 31 says: > "Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual mid/small firm with $\geq 500$ signatures (10 firms tested in Script 39c)." Codex round-9 verified that Script 39c on RAW dHash actually REJECTS unimodality in all 10 firms; only the JITTERED variant (codex's read-only spike on the Script 39c substrate) fails to reject. Round-5 corrected §III line 59 + provenance table line 382 to cite the spike attribution, but Phase 4 §I line 31's bare "Script 39c" citation now reads less precisely than the source-of-truth at §III line 59. **Severity.** Low. The qualitative claim ("fail to reject in 10 mid/small firms") is correct per codex's own rerun. The provenance attribution is what's slightly off. **Recommended fix.** Update Phase 4 line 31 to match §III line 59: "...in every individual mid/small firm with $\geq 500$ signatures (10 firms tested; cosine: Script 39c per-firm; jittered-dHash: codex-verified read-only spike on Script 39c substrate)." Or simpler: drop the parenthetical "in Script 39c" and let §III carry the precise provenance. ### Concern B — Phase 4 §V-B line 81 carries the same jittered-dHash claim without provenance **Issue.** §V-B (line 81) says: > "Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual non-Big-4 firm with $\geq 500$ signatures (10 firms tested)." No script citation here; the bare "10 firms tested" is technically OK but less precise than §III line 59 after round-5. **Severity.** Low. **Recommended fix.** Add a §III cross-reference: "...10 firms tested; see §III-I.4 / §III provenance table line for the codex-verified read-only spike." Or leave as-is and let §III line 59 carry the detailed provenance. ### Concern C — §III-K.4 line 149 stale cross-reference to v3.x §IV-I **Issue.** §III-K item 4 line 149 says: > "The corresponding signature-level inter-CPA negative-anchor ICCR evidence is developed in §III-L.1 (Big-4 sample) and the v3.x §IV-I corpus-wide version (reported under prior 'FAR' terminology)" After v4 §IV-I was shrunk to a 3-paragraph reframing stub (post round-3), the phrase "v3.x §IV-I corpus-wide version" is misleading — v4 §IV-I now exists, just as a pointer. Opus round-2 N6 flagged this; codex round-9 did not. **Severity.** Cosmetic. **Recommended fix.** Update line 149 to "§III-L.1 (Big-4 v4 sample) and the inherited corpus-wide v3.x version cited at §IV-I (reported under prior 'FAR' terminology)". ### Concern D — Spearman precision mismatch §III vs §IV **Issue.** §III-K.1 line 123–127 reports Spearman ρ as 0.963 / 0.889 / 0.879 (3 decimal places); §IV-F Table IX line 81–87 reports 0.9627 / 0.8890 / 0.8794 (4 decimal places). Codex round-8 flagged this as OPEN / COPY-EDIT. **Severity.** Cosmetic. **Recommended fix.** Standardise on 4 decimal places (matches Script 38 reported precision) in §III + §IV + §V-E + §VI. ## 9. Splice-readiness gate | Item | Status | Notes | |---|---|---| | Abstract word count | ✓ | 247 / 250 | | §I contributions count | ✓ | 8 contributions, all map to body | | §II LOOO addition with refs [42]-[44] | ✓ | Present (post round-1) | | §III sub-sections G..M complete | ✓ | Including Table XXVII numbered | | §IV table sequence V–XXVI sequential | ✓ | Post round-2 cascade | | §V sub-sections A..H complete | ✓ | Post round-2 M4 fix (G→H) | | §VI items 1..8 map to §I 1..8 | ✓ | 1:1 mapping verified | | References [1]–[44] present | ✓ | 44 entries; [42]-[44] = Stone 1974 / Geisser 1975 / Vehtari 2017 | | Internal draft notes stripped | ✗ | **Splice-time mechanical** — Phase 4 line 3 + lines 153-162; §III line 3 + lines 434-447; §IV line 3 + line 365+ | | "Nine-tool" / "Table XV-B" residue | ✗ | **Splice-time mechanical** — only in internal-strip text | | Cross-section number consistency | ✓ | All headline numbers match across Abstract / §I / §III / §IV | | Terminology consistency (ICCR / K=3 / less-replication-dominated) | ✓ | No public-prose leaks | | IEEE Access format | ✓ | Abstract single-paragraph ≤250 words; numbered references; numbered tables | ## 10. Recommended round-6 narrative-consistency patch (15 min, optional) Before manuscript-splice, three small text patches would close the audit-surfaced concerns: 1. **Concern A**: Phase 4 line 31 — narrow the "Script 39c" provenance attribution for the jittered-dHash claim to match §III line 59. 2. **Concern C**: §III line 149 — update the "v3.x §IV-I corpus-wide version" wording to reflect v4 §IV-I's reframing-stub status. 3. **Concern D**: Standardise Spearman precision to 4 decimal places across §III/§IV/§V. Optional: - **Gap 2 / Opus N5**: Add a half-sentence to §V-H limit 2 interpreting the firm-dependent within-firm violation as "Firm A's per-firm ICCR is more contaminated by within-firm sharing than Firms B/C/D's, so the per-firm B/C/D rates are closer to clean specificity than the pooled rate." None of these is empirical or structural. They are prose-level consistency polishings. **Submission can proceed without them**, but they would strengthen reviewer-pass robustness. ## 11. Submission-readiness verdict **Conditionally ready.** The empirical core is sound and reproducible. The Phase 5 panel converged 3/3 in Accept/Minor band. Five fix rounds have closed every reviewer-flagged finding. Numbers are consistent across sections. Terminology is consistent. The v4 pivot's narrative thread reads as a coherent argument from Abstract through Conclusion. **Path A (ship now)**: proceed directly to manuscript-splice → DOCX export → partner Jimmy review → submission. Risk: the four audit-surfaced concerns above may be flagged by external reviewers as small cosmetic issues. Cost: zero pre-submission work. **Path B (15-min polish first)**: apply round-6 patches for the four concerns above → re-verify with `rg`-based grep → manuscript-splice → DOCX → partner. Risk: zero. Cost: 15 min. **Recommendation**: Path B. The four concerns are small, narrative-consistency only, and fixing them avoids putting cross-section attribution inconsistencies in front of partner Jimmy / IEEE Access reviewers. After Path B (or A): proceed to manuscript-splice as the next mechanical step.