pdf_signature_extraction/paper/narrative_audit_v4.md

# Paper A v4.0 — Narrative-Thread Audit

Auditor: Claude Opus 4.7 (1M context)
Date: 2026-05-14
Target: paper/v4/paper_a_prose_v4_phase4.md + paper/v4/paper_a_methodology_v4_section_iii.md + paper/v4/paper_a_results_v4_section_iv.md (post round-5, commit 128a914)
Purpose: Coherence check across Abstract / §I / §III / §IV / §V / §VI as a single argument, after Phase 5 AI peer-review panel convergence (3/3 in Accept/Minor band).

## Headline assessment

**Mostly Coherent — submission-ready after 2-3 small narrative-consistency patches.**

The v4 story arc — *"v3.20.0 distributional path turns out to be composition + integer artefact; v4.0 replaces it with anchor-based ICCR + decisive firm heterogeneity; positioning is anchor-calibrated specificity-only screening, not validated detector"* — reads cleanly from Abstract through §VI. The 5 fix rounds + 3 reviewer panels have substantially closed the major framing, terminology, and provenance risks. What remains is narrow narrative-consistency residue between Phase 4 §I/§V prose and §III source-of-truth (3 specific items, all small), plus 1 interpretive caveat that would strengthen §V-H limitation 2.

No empirical reruns required. No structural rewrites required. Submission-readiness gate: **conditionally pass** — recommend a 15-minute round-6 prose-consistency patch before manuscript-splice, then the manuscript is splice-ready.

## 1. Abstract → body mirror audit

The Abstract (Phase 4 line 11, 247 words) makes ~12 distinct claims. Each maps cleanly to a body location:

| # | Abstract claim | §I location | §III/§IV body location | Status |
|---:|---|---|---|---|
| 1 | Non-hand-signed detection problem (regulation + digitization) | §I paras 1–3 (lines 19–21) | — (problem framing) | **Aligned** |
| 2 | Pipeline: VLM + YOLOv11 + ResNet-50 + dual-descriptor | §I para 6 (line 29) + contribution 2 | §III-A..F (inherited) + §III-F dual-descriptor | **Aligned** |
| 3 | 90,282 reports / 182,328 sigs / 758 CPAs | §I para 8 (line 39) | §IV-A..C (inherited) | **Aligned** |
| 4 | Big-4 sub-corpus: 437 CPAs / 150,442 sigs | §I para 8 (line 39) | §III-G (line 19), §IV-D (line 9, line 15) | **Aligned** |
| 5 | Composition decomposition $p_{\text{median}} = 0.35$ | §I para 5 (line 31) + contribution 4 | §III-I.4 (lines 55–73), §IV-M.1 Table XX (line 266) | **Aligned** |
| 6 | Per-comparison ICCRs 0.0006 / 0.0013 / 0.00014 | §I para 6 (line 33) + contribution 5 | §III-L.1 (line 196), §IV-M.2 Table XXI (line 280) | **Aligned** |
| 7 | Per-signature ICCR 0.11 | §I para 6 (line 33) | §III-L.2 (line 208), §IV-M.3 Table XXII (line 300) | **Aligned** |
| 8 | Per-document ICCR 0.34 (HC+MC) | §I para 6 (line 33) | §III-L.3 (line 233), §IV-M.4 Table XXIII (line 317) | **Aligned** |
| 9 | Firm heterogeneity: Firm A 0.62 vs B/C/D 0.09–0.16 | §I para 7 (line 35) + contribution 6 | §III-L.4 (line 259), §IV-M.4 (line 325) | **Aligned** |
| 10 | Within-firm 77–99% (any-pair) | §I para 7 (line 35) + contribution 6 | §III-L.4 (line 283), §IV-M.5 Table XXV (line 340) | **Aligned** |
| 11 | "Specificity-proxy-anchored screening + HITL, not validated detector" positioning | §I contribution 8 (line 57) | §III-M Table XXVII (line 316) + §V-G/H + §VI item 8 | **Aligned** |
| 12 | "No calibrated error rates without ground truth" disclaimer | §I para 5 item (v) (line 25) | §III-M (line 312), §V-H limit 1 (line 113) | **Aligned** |

Observation: Abstract does NOT mention the three-score Spearman convergence ($\rho \geq 0.879$). This is by design — the v4 pivot demoted three-score from a headline finding to "internal consistency" because the scores share inputs. §I contribution 7 and §V-E retain it with the demoted caveat. **No action needed.**

## 2. §I contributions (8) → body implementation map

| # | §I contribution | §III/§IV implementation | §V/§VI loop-back | Status |
|---:|---|---|---|---|
| 1 | Problem formulation | — (§V-A discusses) | §V-A (line 73), §VI implicit | **Aligned** |
| 2 | End-to-end pipeline | §III-A..F (inherited) | §VI line 145 | **Aligned** |
| 3 | Dual-descriptor verification | §III-F + §IV-L backbone ablation | §V-A/B implicit | **Aligned** |
| 4 | Composition decomposition | §III-I.4 + §IV-M.1 Table XX | §V-B (line 81), §VI item 1 (line 147) | **Aligned** |
| 5 | Anchor-based multi-level ICCR | §III-L + §IV-M.2/M.3/M.4 Tables XXI/XXII/XXIII | §V-F (line 99), §VI item 2 | **Aligned** |
| 6 | Firm heterogeneity + within-firm collision | §III-L.4 + §IV-M.4/M.5 Tables XXIV/XXV | §V-C (line 83), §VI items 3+4 | **Aligned** |
| 7 | K=3 descriptive + three-score convergence | §III-J + §III-K.1 + §IV-E/F/G | §V-D/E (lines 89–97), §VI items 5+6 | **Aligned** |
| 8 | Annotation-free positive-anchor + ten-tool ceiling | §III-K.4 + §III-M Table XXVII + §IV-H Table XIV | §V-G (line 105), §VI items 7+8 | **Aligned** |

All 8 contributions trace cleanly through §III/§IV implementation and §V/§VI loop-back. **No action needed.**

## 3. v3→v4 pivot rhetoric thread

The v4 pivot has four narrative nodes; each must reinforce the others:

| Node | Location | Says |
|---|---|---|
| **Setup**: v3.x distributional path | §I para 5 line 31 (Phase 4 prose) | "Earlier work...adopted a distributional path...v4.0 reports a composition decomposition diagnostic that overturns this reading" |
| **Proof**: 2×2 factorial composition decomposition | §III-I.4 Scripts 39b–39e (lines 55–73) | Joint firm-mean centring + integer-tie jitter eliminates rejection ($p_{\text{median}} = 0.35$) |
| **Alternative**: anchor-based ICCR | §III-L (line 173+) | Replaces distributional thresholds with inter-CPA coincidence-rate calibration at 3 units |
| **Discussion**: K=3 stays descriptive | §V-B + §V-D (lines 77–93) | Mixture fits are firm-compositional partitions, not mechanism modes |
| **Conclusion**: pivot summary | §VI items 1+5 (line 147) | Demotes K=3 mechanism reading; positions ICCR as the operational calibration |

All five nodes use consistent language ("composition + integer artefact"; "descriptive firm-compositional partition"; "no within-population bimodal antimode"). **No action needed.**

## 4. K=3 demotion consistency

Verified across 4 locations using consistent descriptor-position language (post round-2 M1 fix):

- §III-J line 90 (source of truth): "The 'descriptive position' column replaces v3.x's 'hand-leaning / mixed / replicated' mechanism labels"
- §I contribution 7 (line 55): "K=3 mixture demoted from 'three mechanism clusters' to a descriptive firm-compositional partition"
- §V-D (line 93): "the K=3 stability supports a descriptive reading...*not* a three-mechanism latent-class structure"
- §VI item 5 (line 147): same demotion language
- §IV Tables XVI/XVII column headers: "C1 (low-cos / high-dHash) | C2 (central) | C3 (high-cos / low-dHash)" — descriptor-position labels throughout

`grep -n "hand-leaning"` in v4 public prose: 0 hits (only internal-strip text). **Closed.**

## 5. ICCR vs FAR terminology consistency

Verified across all rate-reporting locations (post round-2 + round-5):

- Abstract: "inter-CPA coincidence-rate (ICCR)" ✓
- §I contribution 5 (line 51): explicit terminology adoption and FAR disclaimer ✓
- §III-L.1 (line 185): "Terminological note on 'FAR'" with full disclaimer ✓
- §IV-I (line 159): historical "FAR" cited only with the "v3.x terminology" caveat ✓
- §V-G heading (line 105): "Inherited Inter-CPA Negative Anchor Reframed as Coincidence Rate" ✓
- §V-H limitations: specificity-proxy framing under partially-violated assumption ✓
- §VI item 2 (line 147): "explicit terminological replacement of 'FAR' by 'ICCR' given the unsupervised setting" ✓

No public-prose "FAR" leak outside the historical-context caveats. **Closed.**

## 6. Numbers consistency audit (cross-section)

Headline numbers cross-referenced between Abstract / §I / §III / §IV:

| Claim | Abstract | §I body | §III source | §IV table | Status |
|---|---|---|---|---|---|
| Per-comparison ICCR cos $>0.95$ | 0.0006 | 0.0006 | 0.00060 | 0.00060 Table XXI | **Match** (rounding consistent) |
| Per-comparison ICCR dHash $\leq 5$ | 0.0013 | 0.0013 | 0.00129 | 0.00129 Table XXI | **Match** |
| Per-comparison joint | 0.00014 | 0.00014 | 0.00014 | 0.00014 Table XXI | **Match** |
| Per-signature ICCR | 0.11 | 0.11 | 0.1102 (Wilson) | 0.1102 Table XXII | **Match** |
| Per-document ICCR (HC+MC) | 0.34 | 0.34 | 0.3375 | 0.3375 Table XXIII | **Match** |
| Firm A doc HC+MC | 0.62 | 0.62 | 0.6201 | 0.6201 §IV-M.4 line 325 | **Match** |
| Firms B/C/D doc HC+MC | 0.09–0.16 | 0.09–0.16 | 0.1600 / 0.1635 / 0.0863 | same | **Match** |
| Within-firm any-pair | 77–99% (rounded) | 76.7–83.7% / 98.8% | same | Table XXV | **Match** |
| Same-pair within-firm | — | 97.0–99.96% | 99.96 / 97.7 / 98.2 / 97.0 | line 349 | **Match** |
| Composition $p_{\text{median}}$ | 0.35 | 0.35 | 0.35 | 0.35 Table XX | **Match** |
| Logistic OR | — | 0.053 / 0.010 / 0.027 | same | Table XXIV | **Match** |
| Spearman ρ floor | — | 0.879 | 0.879 | 0.8794 Table IX | **Match** (Spearman precision §III/§IV differ at 4th decimal — see §8) |

All headline numbers reconcile across sections. **Closed.**

## 7. Limitations vs §III-M Table XXVII coverage

§V-H lists 14 limitations (9 v4-specific + 5 inherited from v3.20.0). Each Table XXVII assumption should be covered by an explicit §V-H item OR be self-evidently descriptive:

| Table XXVII tool | Assumption | §V-H coverage |
|---|---|---|
| Composition decomposition | Jitter unbiased; Big-4 jittered + centred + jittered evidence | Implicit — covered by general "no signature-level ground truth" frame |
| Per-comparison ICCR | Inter-CPA pairs are negative anchor (partially violated) | §V-H limit 2 (explicit) |
| Per-signature ICCR | Same + pool replacement preserves negative-anchor property | §V-H limit 2 (implicit via "specificity-proxy rates under partially-violated assumption") |
| Per-document ICCR | Same | Same |
| Firm-heterogeneity logistic | Cluster-robust SE not run | **Gap** — no §V-H item explicitly flags the naive-SE caveat |
| Cross-firm hit matrix | Deployed-rule semantics + mode-of-firms tie-break | §V-H limit 2 |
| Alert-rate sensitivity | Descriptive gradient, not formal plateau | §V-H limit 5 (line 121, "alert-rate sensitivity analysis characterises only the HC threshold") |
| Three-score Spearman | Scores share inputs | §V-H limit 6 (line 123, deployed-rate-excess interpretation) — partial; not the score-independence caveat directly |
| Pixel-identical positive capture | Tautological (byte-identical ⇒ in HC region) | §V-H limit 4 (line 119, "pixel-identity is a conservative subset") |
| LOOO firm-level reproducibility | Stability ≠ classification validity; K=3 membership ±12.8 pp | §V-H limit 8 (line 127, "K=3 hard-posterior membership is composition-sensitive") |

**Gap 1**: §V-H does not explicitly flag the logistic-regression naive-SE caveat that Table XXVII row 5 discloses. Worth adding a half-sentence to §V-H or letting Table XXVII carry the disclosure since it's already in print.

**Gap 2** (Opus N5 from round-2 audit): §V-H limit 2 discloses the firm-dependent within-firm violation numerically (98.8% at A; 76.7–83.7% at B/C/D) but does not interpret what this means for proxy reliability — namely that Firm A's per-firm ICCR is MORE contaminated by within-firm sharing than B/C/D's, so the per-firm B/C/D rates are closer to clean specificity. This nuance affects interpretation of the headline "firm heterogeneity is decisive" framing.

## 8. Net-new narrative concerns (audit-surfaced)

### Concern A — Phase 4 §I body line 31 cites "Script 39c" for jittered-dHash claim

**Issue.** Phase 4 prose line 31 says:
> "Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual mid/small firm with $\geq 500$ signatures (10 firms tested in Script 39c)."

Codex round-9 verified that Script 39c on RAW dHash actually REJECTS unimodality in all 10 firms; only the JITTERED variant (codex's read-only spike on the Script 39c substrate) fails to reject. Round-5 corrected §III line 59 + provenance table line 382 to cite the spike attribution, but Phase 4 §I line 31's bare "Script 39c" citation now reads less precisely than the source-of-truth at §III line 59.

**Severity.** Low. The qualitative claim ("fail to reject in 10 mid/small firms") is correct per codex's own rerun. The provenance attribution is what's slightly off.

**Recommended fix.** Update Phase 4 line 31 to match §III line 59: "...in every individual mid/small firm with $\geq 500$ signatures (10 firms tested; cosine: Script 39c per-firm; jittered-dHash: codex-verified read-only spike on Script 39c substrate)." Or simpler: drop the parenthetical "in Script 39c" and let §III carry the precise provenance.

### Concern B — Phase 4 §V-B line 81 carries the same jittered-dHash claim without provenance

**Issue.** §V-B (line 81) says:
> "Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual non-Big-4 firm with $\geq 500$ signatures (10 firms tested)."

No script citation here; the bare "10 firms tested" is technically OK but less precise than §III line 59 after round-5.

**Severity.** Low.

**Recommended fix.** Add a §III cross-reference: "...10 firms tested; see §III-I.4 / §III provenance table line for the codex-verified read-only spike." Or leave as-is and let §III line 59 carry the detailed provenance.

### Concern C — §III-K.4 line 149 stale cross-reference to v3.x §IV-I

**Issue.** §III-K item 4 line 149 says:
> "The corresponding signature-level inter-CPA negative-anchor ICCR evidence is developed in §III-L.1 (Big-4 sample) and the v3.x §IV-I corpus-wide version (reported under prior 'FAR' terminology)"

After v4 §IV-I was shrunk to a 3-paragraph reframing stub (post round-3), the phrase "v3.x §IV-I corpus-wide version" is misleading — v4 §IV-I now exists, just as a pointer. Opus round-2 N6 flagged this; codex round-9 did not.

**Severity.** Cosmetic.

**Recommended fix.** Update line 149 to "§III-L.1 (Big-4 v4 sample) and the inherited corpus-wide v3.x version cited at §IV-I (reported under prior 'FAR' terminology)".

### Concern D — Spearman precision mismatch §III vs §IV

**Issue.** §III-K.1 line 123–127 reports Spearman ρ as 0.963 / 0.889 / 0.879 (3 decimal places); §IV-F Table IX line 81–87 reports 0.9627 / 0.8890 / 0.8794 (4 decimal places). Codex round-8 flagged this as OPEN / COPY-EDIT.

**Severity.** Cosmetic.

**Recommended fix.** Standardise on 4 decimal places (matches Script 38 reported precision) in §III + §IV + §V-E + §VI.

## 9. Splice-readiness gate

| Item | Status | Notes |
|---|---|---|
| Abstract word count | ✓ | 247 / 250 |
| §I contributions count | ✓ | 8 contributions, all map to body |
| §II LOOO addition with refs [42]-[44] | ✓ | Present (post round-1) |
| §III sub-sections G..M complete | ✓ | Including Table XXVII numbered |
| §IV table sequence V–XXVI sequential | ✓ | Post round-2 cascade |
| §V sub-sections A..H complete | ✓ | Post round-2 M4 fix (G→H) |
| §VI items 1..8 map to §I 1..8 | ✓ | 1:1 mapping verified |
| References [1]–[44] present | ✓ | 44 entries; [42]-[44] = Stone 1974 / Geisser 1975 / Vehtari 2017 |
| Internal draft notes stripped | ✗ | **Splice-time mechanical** — Phase 4 line 3 + lines 153-162; §III line 3 + lines 434-447; §IV line 3 + line 365+ |
| "Nine-tool" / "Table XV-B" residue | ✗ | **Splice-time mechanical** — only in internal-strip text |
| Cross-section number consistency | ✓ | All headline numbers match across Abstract / §I / §III / §IV |
| Terminology consistency (ICCR / K=3 / less-replication-dominated) | ✓ | No public-prose leaks |
| IEEE Access format | ✓ | Abstract single-paragraph ≤250 words; numbered references; numbered tables |

## 10. Recommended round-6 narrative-consistency patch (15 min, optional)

Before manuscript-splice, three small text patches would close the audit-surfaced concerns:

1. **Concern A**: Phase 4 line 31 — narrow the "Script 39c" provenance attribution for the jittered-dHash claim to match §III line 59.
2. **Concern C**: §III line 149 — update the "v3.x §IV-I corpus-wide version" wording to reflect v4 §IV-I's reframing-stub status.
3. **Concern D**: Standardise Spearman precision to 4 decimal places across §III/§IV/§V.

Optional:
- **Gap 2 / Opus N5**: Add a half-sentence to §V-H limit 2 interpreting the firm-dependent within-firm violation as "Firm A's per-firm ICCR is more contaminated by within-firm sharing than Firms B/C/D's, so the per-firm B/C/D rates are closer to clean specificity than the pooled rate."

None of these is empirical or structural. They are prose-level consistency polishings. **Submission can proceed without them**, but they would strengthen reviewer-pass robustness.

## 11. Submission-readiness verdict

**Conditionally ready.**

The empirical core is sound and reproducible. The Phase 5 panel converged 3/3 in Accept/Minor band. Five fix rounds have closed every reviewer-flagged finding. Numbers are consistent across sections. Terminology is consistent. The v4 pivot's narrative thread reads as a coherent argument from Abstract through Conclusion.

**Path A (ship now)**: proceed directly to manuscript-splice → DOCX export → partner Jimmy review → submission. Risk: the four audit-surfaced concerns above may be flagged by external reviewers as small cosmetic issues. Cost: zero pre-submission work.

**Path B (15-min polish first)**: apply round-6 patches for the four concerns above → re-verify with `rg`-based grep → manuscript-splice → DOCX → partner. Risk: zero. Cost: 15 min.

**Recommendation**: Path B. The four concerns are small, narrative-consistency only, and fixing them avoids putting cross-section attribution inconsistencies in front of partner Jimmy / IEEE Access reviewers.

After Path B (or A): proceed to manuscript-splice as the next mechanical step.