pdf_signature_extraction/paper/narrative_audit_v4.md
gbanyan 8dddc3b87c Apply Phase 5 round-6 narrative-consistency patches + audit artifact
Closes the four audit-surfaced concerns from
paper/narrative_audit_v4.md plus the Opus round-2 N5 interpretive
caveat. All five are prose-level consistency polishings; no
empirical or structural changes.

Concern A (Phase 4 line 31 / §I body): "Script 39c" provenance for
the jittered-dHash claim was less precise than the §III line 59
source-of-truth which (post round-5) attributes the non-Big-4
jittered evidence to a codex-verified read-only spike. Updated §I
to: "cosine: Script 39c; jittered-dHash: Script 39d for Big-4
plus codex-verified read-only spike for ten non-Big-4 firms."

Concern B (Phase 4 line 81 / §V-B): same jittered-dHash claim
without precise provenance. Updated §V-B to match Concern A
attribution + §III-I.4 cross-reference.

Concern C (§III-K.4 line 149): cross-reference to "v3.x §IV-I
corpus-wide version" was stale after v4 §IV-I was shrunk to a
reframing stub. Updated to "§III-L.1 (Big-4 v4 sample) and the
inherited corpus-wide v3.x version cited at §IV-I".

Concern D (Spearman precision): standardized §III-K.1 table at
lines 125-127 to 4 decimal places (0.963/0.889/0.879 ->
0.9627/0.8890/0.8794), matching §IV-F Table IX. Prose floor
language "rho >= 0.879" preserved across Abstract/§I/§V/§VI
since 0.8794 still rounds to 0.879 at 3dp.

Opus N5 / §V-H limit 2 nuance: added a sentence interpreting the
firm-dependent within-firm violation - Firm A's per-firm ICCR is
more contaminated by within-firm sharing than B/C/D's, so the
B/C/D rates of 0.09-0.16 are closer to clean specificity, and the
Firm A vs B/C/D contrast reflects both genuine heterogeneity AND
a firm-dependent proxy-contamination gradient.

Audit artifact paper/narrative_audit_v4.md (~200 lines) captures
the full cross-section coherence check across Abstract / §I /
§III / §IV / §V / §VI:
- Abstract -> body mirror audit (12 claims, all aligned)
- §I 8 contributions -> §III/§IV/§V/§VI mapping (all aligned)
- v3->v4 pivot rhetoric thread (5 nodes, all aligned)
- K=3 demotion / ICCR-FAR / numbers consistency: all verified
- Splice-readiness gate: 10/12 pass + 2 splice-time mechanical

Headline assessment: "Mostly Coherent - submission-ready after
2-3 small patches" (now applied).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:22:22 +08:00


Paper A v4.0 — Narrative-Thread Audit

  • Auditor: Claude Opus 4.7 (1M context)
  • Date: 2026-05-14
  • Target: paper/v4/paper_a_prose_v4_phase4.md + paper/v4/paper_a_methodology_v4_section_iii.md + paper/v4/paper_a_results_v4_section_iv.md (post round-5, commit 128a914)
  • Purpose: Coherence check across Abstract / §I / §III / §IV / §V / §VI as a single argument, after Phase 5 AI peer-review panel convergence (3/3 in Accept/Minor band).

Headline assessment

Mostly Coherent — submission-ready after 2-3 small narrative-consistency patches.

The v4 story arc — "v3.20.0 distributional path turns out to be composition + integer artefact; v4.0 replaces it with anchor-based ICCR + decisive firm heterogeneity; positioning is anchor-calibrated specificity-only screening, not validated detector" — reads cleanly from Abstract through §VI. The 5 fix rounds + 3 reviewer panels have substantially closed the major framing, terminology, and provenance risks. What remains is narrow narrative-consistency residue between Phase 4 §I/§V prose and §III source-of-truth (3 specific items, all small), plus 1 interpretive caveat that would strengthen §V-H limitation 2.

No empirical reruns required. No structural rewrites required. Submission-readiness gate: conditionally pass — recommend a 15-minute round-6 prose-consistency patch before manuscript-splice, then the manuscript is splice-ready.

1. Abstract → body mirror audit

The Abstract (Phase 4 line 11, 247 words) makes ~12 distinct claims. Each maps cleanly to a body location:

| # | Abstract claim | §I location | §III/§IV body location | Status |
|---|---|---|---|---|
| 1 | Non-hand-signed detection problem (regulation + digitization) | §I paras 1-3 (lines 19-21) | n/a (problem framing) | Aligned |
| 2 | Pipeline: VLM + YOLOv11 + ResNet-50 + dual-descriptor | §I para 6 (line 29) + contribution 2 | §III-A..F (inherited) + §III-F dual-descriptor | Aligned |
| 3 | 90,282 reports / 182,328 sigs / 758 CPAs | §I para 8 (line 39) | §IV-A..C (inherited) | Aligned |
| 4 | Big-4 sub-corpus: 437 CPAs / 150,442 sigs | §I para 8 (line 39) | §III-G (line 19), §IV-D (line 9, line 15) | Aligned |
| 5 | Composition decomposition $p_{\text{median}} = 0.35$ | §I para 5 (line 31) + contribution 4 | §III-I.4 (lines 55-73), §IV-M.1 Table XX (line 266) | Aligned |
| 6 | Per-comparison ICCRs 0.0006 / 0.0013 / 0.00014 | §I para 6 (line 33) + contribution 5 | §III-L.1 (line 196), §IV-M.2 Table XXI (line 280) | Aligned |
| 7 | Per-signature ICCR 0.11 | §I para 6 (line 33) | §III-L.2 (line 208), §IV-M.3 Table XXII (line 300) | Aligned |
| 8 | Per-document ICCR 0.34 (HC+MC) | §I para 6 (line 33) | §III-L.3 (line 233), §IV-M.4 Table XXIII (line 317) | Aligned |
| 9 | Firm heterogeneity: Firm A 0.62 vs B/C/D 0.09-0.16 | §I para 7 (line 35) + contribution 6 | §III-L.4 (line 259), §IV-M.4 (line 325) | Aligned |
| 10 | Within-firm 77-99% (any-pair) | §I para 7 (line 35) + contribution 6 | §III-L.4 (line 283), §IV-M.5 Table XXV (line 340) | Aligned |
| 11 | "Specificity-proxy-anchored screening + HITL, not validated detector" positioning | §I contribution 8 (line 57) | §III-M Table XXVII (line 316) + §V-G/H + §VI item 8 | Aligned |
| 12 | "No calibrated error rates without ground truth" disclaimer | §I para 5 item (v) (line 25) | §III-M (line 312), §V-H limit 1 (line 113) | Aligned |
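Claims 6-8 report the inter-CPA coincidence rate at three units. The per-comparison unit can be sketched as follows, under several assumptions. Each signature is taken to be a (CPA id, embedding, integer-encoded dHash) tuple. "cos > 0.95" is read as cosine similarity between embeddings, and "dHash ≤ 5" as the Hamming distance between difference hashes. Only pairs from different CPAs enter the negative anchor. All function names and the toy data are hypothetical and do not come from the paper's scripts. The Wilson helper mirrors the interval the audit cites for the per-signature rate. Applied to pairs, it treats each pair as an independent trial, which pairs sharing a signature are not, so treat the interval as illustrative only.

```python
import math
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded dHash values."""
    return bin(a ^ b).count("1")


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def wilson(k: int, n: int, z: float = 1.96):
    """Wilson score interval for k successes in n Bernoulli trials (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def per_comparison_iccr(sigs, cos_thr=0.95, dhash_thr=5):
    """Hit rates over inter-CPA pairs only; sigs is a list of (cpa_id, embedding, dhash_int)."""
    n = hits_cos = hits_dhash = hits_joint = 0
    for (cpa_a, emb_a, h_a), (cpa_b, emb_b, h_b) in combinations(sigs, 2):
        if cpa_a == cpa_b:
            continue  # same-CPA pairs are excluded from the negative anchor
        n += 1
        cos_hit = cosine(emb_a, emb_b) > cos_thr
        dhash_hit = hamming(h_a, h_b) <= dhash_thr
        hits_cos += cos_hit
        hits_dhash += dhash_hit
        hits_joint += cos_hit and dhash_hit
    if n == 0:
        raise ValueError("no inter-CPA pairs")
    return {
        "pairs": n,
        "cos": hits_cos / n,
        "dhash": hits_dhash / n,
        "joint": hits_joint / n,
        "joint_wilson95": wilson(hits_joint, n),
    }
```

For example, given four toy signatures from three CPAs, the single same-CPA pair is skipped and the remaining five inter-CPA pairs are scored against both thresholds.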

Observation: the Abstract does NOT mention the three-score Spearman convergence (ρ ≥ 0.879). This is by design: the v4 pivot demoted three-score from a headline finding to "internal consistency" because the scores share inputs. §I contribution 7 and §V-E retain it with the demoted caveat. No action needed.

2. §I contributions (8) → body implementation map

| # | §I contribution | §III/§IV implementation | §V/§VI loop-back | Status |
|---|---|---|---|---|
| 1 | Problem formulation | n/a (§V-A discusses) | §V-A (line 73), §VI implicit | Aligned |
| 2 | End-to-end pipeline | §III-A..F (inherited) | §VI line 145 | Aligned |
| 3 | Dual-descriptor verification | §III-F + §IV-L backbone ablation | §V-A/B implicit | Aligned |
| 4 | Composition decomposition | §III-I.4 + §IV-M.1 Table XX | §V-B (line 81), §VI item 1 (line 147) | Aligned |
| 5 | Anchor-based multi-level ICCR | §III-L + §IV-M.2/M.3/M.4 Tables XXI/XXII/XXIII | §V-F (line 99), §VI item 2 | Aligned |
| 6 | Firm heterogeneity + within-firm collision | §III-L.4 + §IV-M.4/M.5 Tables XXIV/XXV | §V-C (line 83), §VI items 3+4 | Aligned |
| 7 | K=3 descriptive + three-score convergence | §III-J + §III-K.1 + §IV-E/F/G | §V-D/E (lines 89-97), §VI items 5+6 | Aligned |
| 8 | Annotation-free positive-anchor + ten-tool ceiling | §III-K.4 + §III-M Table XXVII + §IV-H Table XIV | §V-G (line 105), §VI items 7+8 | Aligned |

All 8 contributions trace cleanly through §III/§IV implementation and §V/§VI loop-back. No action needed.

3. v3→v4 pivot rhetoric thread

The v4 pivot has five narrative nodes; each must reinforce the others:

| Node | Location | Says |
|---|---|---|
| Setup: v3.x distributional path | §I para 5 line 31 (Phase 4 prose) | "Earlier work...adopted a distributional path...v4.0 reports a composition decomposition diagnostic that overturns this reading" |
| Proof: 2×2 factorial composition decomposition | §III-I.4 Scripts 39b-39e (lines 55-73) | Joint firm-mean centring + integer-tie jitter eliminates rejection ($p_{\text{median}} = 0.35$) |
| Alternative: anchor-based ICCR | §III-L (line 173+) | Replaces distributional thresholds with inter-CPA coincidence-rate calibration at 3 units |
| Discussion: K=3 stays descriptive | §V-B + §V-D (lines 77-93) | Mixture fits are firm-compositional partitions, not mechanism modes |
| Conclusion: pivot summary | §VI items 1+5 (line 147) | Demotes K=3 mechanism reading; positions ICCR as the operational calibration |

All five nodes use consistent language ("composition + integer artefact"; "descriptive firm-compositional partition"; "no within-population bimodal antimode"). No action needed.
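The proof node's 2×2 factorial (firm-mean centring × integer-tie jitter) can be sketched as four preprocessing arms. Each arm would then be passed to a Hartigan dip test, which is omitted here. This is a hedged illustration, not Scripts 39b-39e. The Uniform(-0.5, 0.5) jitter width, the function names, and the seeding are all assumptions. Symmetric zero-mean noise leaves each firm's mean unchanged in expectation. That is one reading of the "jitter unbiased" assumption listed in §7.

```python
import random
from collections import defaultdict


def centre_by_firm(values, firms):
    """Subtract each firm's mean so between-firm location shifts cannot masquerade as modes."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, f in zip(values, firms):
        sums[f] += v
        counts[f] += 1
    return [v - sums[f] / counts[f] for v, f in zip(values, firms)]


def jitter_ties(values, rng, width=0.5):
    """Add Uniform(-width, width) noise so integer-valued scores (e.g. dHash distances) stop tying."""
    return [v + rng.uniform(-width, width) for v in values]


def factorial_arms(dhash, firms, seed=0):
    """The four cells of the centring x jitter design; each would be fed to a dip test."""
    rng = random.Random(seed)
    centred = centre_by_firm(dhash, firms)
    return {
        "raw": list(dhash),
        "centred": centred,
        "jittered": jitter_ties(dhash, rng),
        "centred+jittered": jitter_ties(centred, rng),
    }
```

Keeping the four arms separate lets each factor's contribution to the raw-data rejection be read off individually. This matches the audit's point that only the joint treatment removes the rejection.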

4. K=3 demotion consistency

Verified across five locations using consistent descriptor-position language (post round-2 M1 fix):

  • §III-J line 90 (source of truth): "The 'descriptive position' column replaces v3.x's 'hand-leaning / mixed / replicated' mechanism labels"
  • §I contribution 7 (line 55): "K=3 mixture demoted from 'three mechanism clusters' to a descriptive firm-compositional partition"
  • §V-D (line 93): "the K=3 stability supports a descriptive reading...not a three-mechanism latent-class structure"
  • §VI item 5 (line 147): same demotion language
  • §IV Tables XVI/XVII column headers: "C1 (low-cos / high-dHash) | C2 (central) | C3 (high-cos / low-dHash)" — descriptor-position labels throughout

grep -n "hand-leaning" in v4 public prose: 0 hits (only internal-strip text). Closed.
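The zero-hit check above can be rerun after each patch round. The following is a minimal sketch. It assumes internal draft notes are delimited by hypothetical `<!-- INTERNAL -->` / `<!-- /INTERNAL -->` markers. The audit instead locates them by line range, so the filter would need adapting.

```python
import re
from pathlib import Path

# Hypothetical markers; the real drafts identify internal notes by line range.
INTERNAL_START = "<!-- INTERNAL -->"
INTERNAL_END = "<!-- /INTERNAL -->"


def public_lines(text):
    """Yield (line_number, line) for lines outside internal-note blocks."""
    inside = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if INTERNAL_START in line:
            inside = True
        if not inside:
            yield lineno, line
        if INTERNAL_END in line:
            inside = False


def find_leaks(text, terms):
    """Public-prose lines containing any forbidden term (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [(n, line) for n, line in public_lines(text) if pattern.search(line)]


def scan_files(paths, terms):
    """Map each file path to its leaking lines; clean files are omitted."""
    hits = {}
    for p in paths:
        leaks = find_leaks(Path(p).read_text(encoding="utf-8"), terms)
        if leaks:
            hits[str(p)] = leaks
    return hits
```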

5. ICCR vs FAR terminology consistency

Verified across all rate-reporting locations (post round-2 + round-5):

  • Abstract: "inter-CPA coincidence-rate (ICCR)" ✓
  • §I contribution 5 (line 51): explicit terminology adoption and FAR disclaimer ✓
  • §III-L.1 (line 185): "Terminological note on 'FAR'" with full disclaimer ✓
  • §IV-I (line 159): historical "FAR" cited only with the "v3.x terminology" caveat ✓
  • §V-G heading (line 105): "Inherited Inter-CPA Negative Anchor Reframed as Coincidence Rate" ✓
  • §V-H limitations: specificity-proxy framing under partially-violated assumption ✓
  • §VI item 2 (line 147): "explicit terminological replacement of 'FAR' by 'ICCR' given the unsupervised setting" ✓

No public-prose "FAR" leak outside the historical-context caveats. Closed.
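A plain grep for "FAR" cannot tell a licensed historical mention from a leak. One hedged approach is to flag only sentences that contain the whole word "FAR" and none of an allowlist of caveat phrases. The match is case-sensitive, so ordinary "far" is not flagged. The allowlist below is hypothetical and seeded from phrases quoted in this section. The regex sentence splitter is deliberately naive.

```python
import re

FAR = re.compile(r"\bFAR\b")  # case-sensitive whole word: "so far" is not flagged
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")

# Hypothetical allowlist of caveat phrases that license a historical "FAR" mention.
CAVEATS = ("v3.x terminology", "prior 'FAR' terminology", "Terminological note")


def unlicensed_far(text, caveats=CAVEATS):
    """Sentences that use 'FAR' without any allowlisted caveat phrase."""
    flagged = []
    for sentence in SENTENCE_SPLIT.split(text):
        if FAR.search(sentence) and not any(c in sentence for c in caveats):
            flagged.append(sentence.strip())
    return flagged
```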

6. Numbers consistency audit (cross-section)

Headline numbers cross-referenced between Abstract / §I / §III / §IV:

| Claim | Abstract | §I body | §III source | §IV table | Status |
|---|---|---|---|---|---|
| Per-comparison ICCR, cos > 0.95 | 0.0006 | 0.0006 | 0.00060 | 0.00060 (Table XXI) | Match (rounding consistent) |
| Per-comparison ICCR, dHash ≤ 5 | 0.0013 | 0.0013 | 0.00129 | 0.00129 (Table XXI) | Match |
| Per-comparison joint | 0.00014 | 0.00014 | 0.00014 | 0.00014 (Table XXI) | Match |
| Per-signature ICCR | 0.11 | 0.11 | 0.1102 (Wilson) | 0.1102 (Table XXII) | Match |
| Per-document ICCR (HC+MC) | 0.34 | 0.34 | 0.3375 | 0.3375 (Table XXIII) | Match |
| Firm A doc HC+MC | 0.62 | 0.62 | 0.6201 | 0.6201 (§IV-M.4 line 325) | Match |
| Firms B/C/D doc HC+MC | 0.09-0.16 | 0.09-0.16 | 0.1600 / 0.1635 / 0.0863 | same | Match |
| Within-firm any-pair | 77-99% (rounded) | | 76.7-83.7% / 98.8% | same (Table XXV) | Match |
| Same-pair within-firm | | 97.0-99.96% | 99.96 / 97.7 / 98.2 / 97.0 | line 349 | Match |
| Composition $p_{\text{median}}$ | 0.35 | 0.35 | 0.35 | 0.35 (Table XX) | Match |
| Logistic OR | | | 0.053 / 0.010 / 0.027 | same (Table XXIV) | Match |
| Spearman ρ floor | | 0.879 | 0.879 | 0.8794 (Table IX) | Match (§III/§IV Spearman precision differs at the 4th decimal; see §8) |

All headline numbers reconcile across sections. Closed.
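The reconciliation above amounts to checking that each Abstract/§I figure is the source value rounded to the reported precision. The sketch below does this with Python's `decimal` module. It assumes half-up rounding; the paper's rounding convention is not stated here, and the choice only matters at exact halves. The values are copied from the table, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounds_to(source, reported):
    """True if `source` rounded half-up to the decimal places of `reported` equals `reported`."""
    target = Decimal(reported)
    # quantize() takes the exponent (number of decimal places) from its argument.
    return Decimal(source).quantize(target, rounding=ROUND_HALF_UP) == target


def range_rounds_to(sources, low, high):
    """True if the min and max of `sources` round to the reported range endpoints."""
    values = [Decimal(s) for s in sources]
    return rounds_to(min(values), low) and rounds_to(max(values), high)


# (claim, source value, reported value) copied from the table above.
CHECKS = [
    ("Per-comparison ICCR, cos > 0.95", "0.00060", "0.0006"),
    ("Per-comparison ICCR, dHash <= 5", "0.00129", "0.0013"),
    ("Per-comparison joint", "0.00014", "0.00014"),
    ("Per-signature ICCR", "0.1102", "0.11"),
    ("Per-document ICCR (HC+MC)", "0.3375", "0.34"),
    ("Firm A doc HC+MC", "0.6201", "0.62"),
    ("Spearman rho floor", "0.8794", "0.879"),
]


def failures():
    """Names of claims whose reported value is not a rounding of the source value."""
    bad = [name for name, src, rep in CHECKS if not rounds_to(src, rep)]
    if not range_rounds_to(["0.1600", "0.1635", "0.0863"], "0.09", "0.16"):
        bad.append("Firms B/C/D doc HC+MC")
    return bad
```

Decimal arithmetic is used instead of `round()` on floats because values such as 0.3375 are not exactly representable in binary floating point.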

7. Limitations vs §III-M Table XXVII coverage

§V-H lists 14 limitations (9 v4-specific + 5 inherited from v3.20.0). Each Table XXVII assumption should be covered by an explicit §V-H item OR be self-evidently descriptive:

| Table XXVII tool | Assumption | §V-H coverage |
|---|---|---|
| Composition decomposition | Jitter unbiased; Big-4 jittered + centred + jittered evidence | Implicit: covered by the general "no signature-level ground truth" frame |
| Per-comparison ICCR | Inter-CPA pairs are a negative anchor (partially violated) | §V-H limit 2 (explicit) |
| Per-signature ICCR | Same, plus pool replacement preserves the negative-anchor property | §V-H limit 2 (implicit via "specificity-proxy rates under partially-violated assumption") |
| Per-document ICCR | Same | Same |
| Firm-heterogeneity logistic | Cluster-robust SE not run | Gap: no §V-H item explicitly flags the naive-SE caveat |
| Cross-firm hit matrix | Deployed-rule semantics + mode-of-firms tie-break | §V-H limit 2 |
| Alert-rate sensitivity | Descriptive gradient, not formal plateau | §V-H limit 5 (line 121, "alert-rate sensitivity analysis characterises only the HC threshold") |
| Three-score Spearman | Scores share inputs | §V-H limit 6 (line 123, deployed-rate-excess interpretation); partial, not the score-independence caveat directly |
| Pixel-identical positive capture | Tautological (byte-identical ⇒ in HC region) | §V-H limit 4 (line 119, "pixel-identity is a conservative subset") |
| LOOO firm-level reproducibility | Stability ≠ classification validity; K=3 membership ±12.8 pp | §V-H limit 8 (line 127, "K=3 hard-posterior membership is composition-sensitive") |

Gap 1: §V-H does not explicitly flag the logistic-regression naive-SE caveat that Table XXVII row 5 discloses. Either add a half-sentence to §V-H or let Table XXVII carry the disclosure, since it is already in print.
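To illustrate why the naive-SE caveat matters, the sketch below compares a naive binomial SE with a CR0 cluster-robust (sandwich) SE. It uses a simple proportion rather than the logistic model in Table XXVII, so it is an illustration, not a rerun. The clustering unit (for example CPA or document) is an assumption, and the function names are hypothetical. For the logistic regression itself, statsmodels exposes the analogous estimator through `cov_type="cluster"` in model `fit` calls.

```python
import math
from collections import defaultdict


def naive_se(y):
    """Binomial SE of a proportion, treating every observation as independent."""
    n = len(y)
    p = sum(y) / n
    return math.sqrt(p * (1 - p) / n)


def cluster_robust_se(y, clusters):
    """CR0 sandwich SE of the same proportion: residuals are summed within each
    cluster before squaring, so correlated observations are not counted as
    independent evidence."""
    n = len(y)
    p = sum(y) / n
    resid_by_cluster = defaultdict(float)
    for yi, g in zip(y, clusters):
        resid_by_cluster[g] += yi - p
    return math.sqrt(sum(s * s for s in resid_by_cluster.values())) / n
```

With singleton clusters the two estimates coincide. When outcomes move together within a cluster, the cluster-robust SE is typically larger, which is the direction of the caveat that Table XXVII discloses.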

Gap 2 (Opus N5 from round-2 audit): §V-H limit 2 discloses the firm-dependent within-firm violation numerically (98.8% at A; 76.7-83.7% at B/C/D) but does not interpret what this means for proxy reliability. Namely, Firm A's per-firm ICCR is MORE contaminated by within-firm sharing than B/C/D's, so the per-firm B/C/D rates are closer to clean specificity. This nuance affects interpretation of the headline "firm heterogeneity is decisive" framing.

8. Net-new narrative concerns (audit-surfaced)

Concern A — Phase 4 §I body line 31 cites "Script 39c" for jittered-dHash claim

Issue. Phase 4 prose line 31 says:

"Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual mid/small firm with ≥ 500 signatures (10 firms tested in Script 39c)."

Codex round-9 verified that Script 39c on RAW dHash actually REJECTS unimodality in all 10 firms; only the JITTERED variant (codex's read-only spike on the Script 39c substrate) fails to reject. Round-5 corrected §III line 59 + provenance table line 382 to cite the spike attribution, but Phase 4 §I line 31's bare "Script 39c" citation now reads less precisely than the source-of-truth at §III line 59.

Severity. Low. The qualitative claim ("fail to reject in 10 mid/small firms") is correct per codex's own rerun. The provenance attribution is what's slightly off.

Recommended fix. Update Phase 4 line 31 to match §III line 59: "...in every individual mid/small firm with ≥ 500 signatures (10 firms tested; cosine: Script 39c per-firm; jittered-dHash: codex-verified read-only spike on Script 39c substrate)." Or simpler: drop the parenthetical "in Script 39c" and let §III carry the precise provenance.

Concern B — Phase 4 §V-B line 81 carries the same jittered-dHash claim without provenance

Issue. §V-B (line 81) says:

"Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual non-Big-4 firm with ≥ 500 signatures (10 firms tested)."

No script citation here; the bare "10 firms tested" is technically OK but less precise than §III line 59 after round-5.

Severity. Low.

Recommended fix. Add a §III cross-reference: "...10 firms tested; see §III-I.4 and the §III provenance table for the codex-verified read-only spike." Or leave as-is and let §III line 59 carry the detailed provenance.

Concern C — §III-K.4 line 149 stale cross-reference to v3.x §IV-I

Issue. §III-K item 4 line 149 says:

"The corresponding signature-level inter-CPA negative-anchor ICCR evidence is developed in §III-L.1 (Big-4 sample) and the v3.x §IV-I corpus-wide version (reported under prior 'FAR' terminology)"

After v4 §IV-I was shrunk to a 3-paragraph reframing stub (post round-3), the phrase "v3.x §IV-I corpus-wide version" is misleading — v4 §IV-I now exists, just as a pointer. Opus round-2 N6 flagged this; codex round-9 did not.

Severity. Cosmetic.

Recommended fix. Update line 149 to "§III-L.1 (Big-4 v4 sample) and the inherited corpus-wide v3.x version cited at §IV-I (reported under prior 'FAR' terminology)".

Concern D — Spearman precision mismatch §III vs §IV

Issue. §III-K.1 (lines 123-127) reports Spearman ρ as 0.963 / 0.889 / 0.879 (3 decimal places); §IV-F Table IX (lines 81-87) reports 0.9627 / 0.8890 / 0.8794 (4 decimal places). Codex round-8 flagged this as OPEN / COPY-EDIT.

Severity. Cosmetic.

Recommended fix. Standardise on 4 decimal places (matches Script 38 reported precision) in §III + §IV + §V-E + §VI.
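The standardisation is easiest to keep if every section quotes one formatted value. The following is a minimal sketch. It assumes Spearman's ρ is computed as the Pearson correlation of tie-averaged ranks, which is the standard definition; whether Script 38 does exactly this is not stated here. The value is then formatted once at 4 dp. Function names are hypothetical.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def report(rho):
    """Single formatting point so §III, §IV, §V-E and §VI print the same 4-dp string."""
    return f"{rho:.4f}"
```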

9. Splice-readiness gate

| Item | Status | Notes |
|---|---|---|
| Abstract word count | Pass | 247 / 250 |
| §I contributions count | Pass | 8 contributions, all map to body |
| §II LOOO addition with refs [42]-[44] | Pass | Present (post round-1) |
| §III sub-sections G..M complete | Pass | Including Table XXVII numbered |
| §IV table sequence V-XXVI sequential | Pass | Post round-2 cascade |
| §V sub-sections A..H complete | Pass | Post round-2 M4 fix (G→H) |
| §VI items 1..8 map to §I 1..8 | Pass | 1:1 mapping verified |
| References [1]-[44] present | Pass | 44 entries; [42]-[44] = Stone 1974 / Geisser 1975 / Vehtari 2017 |
| Internal draft notes stripped | Splice-time mechanical | Phase 4 line 3 + lines 153-162; §III line 3 + lines 434-447; §IV line 3 + line 365+ |
| "Nine-tool" / "Table XV-B" residue | Splice-time mechanical | Only in internal-strip text |
| Cross-section number consistency | Pass | All headline numbers match across Abstract / §I / §III / §IV |
| Terminology consistency (ICCR / K=3 / less-replication-dominated) | Pass | No public-prose leaks |
| IEEE Access format | Pass | Abstract single-paragraph ≤250 words; numbered references; numbered tables |

Before manuscript-splice, three small text patches would close the audit-surfaced concerns:

  1. Concern A: Phase 4 line 31 — narrow the "Script 39c" provenance attribution for the jittered-dHash claim to match §III line 59.
  2. Concern C: §III line 149 — update the "v3.x §IV-I corpus-wide version" wording to reflect v4 §IV-I's reframing-stub status.
  3. Concern D: Standardise Spearman precision to 4 decimal places across §III/§IV/§V.

Optional:

  • Gap 2 / Opus N5: Add a half-sentence to §V-H limit 2 interpreting the firm-dependent within-firm violation as "Firm A's per-firm ICCR is more contaminated by within-firm sharing than Firms B/C/D's, so the per-firm B/C/D rates are closer to clean specificity than the pooled rate."

None of these is empirical or structural. They are prose-level consistency polishings. Submission can proceed without them, but they would strengthen reviewer-pass robustness.

11. Submission-readiness verdict

Conditionally ready.

The empirical core is sound and reproducible. The Phase 5 panel converged 3/3 in Accept/Minor band. Five fix rounds have closed every reviewer-flagged finding. Numbers are consistent across sections. Terminology is consistent. The v4 pivot's narrative thread reads as a coherent argument from Abstract through Conclusion.

Path A (ship now): proceed directly to manuscript-splice → DOCX export → partner Jimmy review → submission. Risk: the four audit-surfaced concerns above may be flagged by external reviewers as small cosmetic issues. Cost: zero pre-submission work.

Path B (15-min polish first): apply round-6 patches for the four concerns above → re-verify with rg-based grep → manuscript-splice → DOCX → partner. Risk: zero. Cost: 15 min.

Recommendation: Path B. The four concerns are small, narrative-consistency only, and fixing them avoids putting cross-section attribution inconsistencies in front of partner Jimmy / IEEE Access reviewers.

After Path B (or A): proceed to manuscript-splice as the next mechanical step.