gbanyan 8dddc3b87c Apply Phase 5 round-6 narrative-consistency patches + audit artifact

Closes the four audit-surfaced concerns from
paper/narrative_audit_v4.md plus the Opus round-2 N5 interpretive
caveat. All five are prose-level consistency polishings; no
empirical or structural changes.

Concern A (Phase 4 line 31 / §I body): "Script 39c" provenance for
the jittered-dHash claim was less precise than the §III line 59
source-of-truth which (post round-5) attributes the non-Big-4
jittered evidence to a codex-verified read-only spike. Updated §I
to: "cosine: Script 39c; jittered-dHash: Script 39d for Big-4
plus codex-verified read-only spike for ten non-Big-4 firms."

Concern B (Phase 4 line 81 / §V-B): same jittered-dHash claim
without precise provenance. Updated §V-B to match Concern A
attribution + §III-I.4 cross-reference.

Concern C (§III-K.4 line 149): cross-reference to "v3.x §IV-I
corpus-wide version" was stale after v4 §IV-I was shrunk to a
reframing stub. Updated to "§III-L.1 (Big-4 v4 sample) and the
inherited corpus-wide v3.x version cited at §IV-I".

Concern D (Spearman precision): standardized §III-K.1 table at
lines 125-127 to 4 decimal places (0.963/0.889/0.879 ->
0.9627/0.8890/0.8794), matching §IV-F Table IX. Prose floor
language "rho >= 0.879" preserved across Abstract/§I/§V/§VI
since 0.8794 still rounds to 0.879 at 3dp.

Opus N5 / §V-H limit 2 nuance: added a sentence interpreting the
firm-dependent within-firm violation - Firm A's per-firm ICCR is
more contaminated by within-firm sharing than B/C/D's, so the
B/C/D rates of 0.09-0.16 are closer to clean specificity, and the
Firm A vs B/C/D contrast reflects both genuine heterogeneity AND
a firm-dependent proxy-contamination gradient.

Audit artifact paper/narrative_audit_v4.md (~200 lines) captures
the full cross-section coherence check across Abstract / §I /
§III / §IV / §V / §VI:
- Abstract -> body mirror audit (12 claims, all aligned)
- §I 8 contributions -> §III/§IV/§V/§VI mapping (all aligned)
- v3->v4 pivot rhetoric thread (5 nodes, all aligned)
- K=3 demotion / ICCR-FAR / numbers consistency: all verified
- Splice-readiness gate: 10/12 pass + 2 splice-time mechanical

Headline assessment: "Mostly Coherent - submission-ready after
2-3 small patches" (now applied).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-14 18:22:22 +08:00

Paper A v4.0 — Narrative-Thread Audit

Auditor: Claude Opus 4.7 (1M context)
Date: 2026-05-14
Target: paper/v4/paper_a_prose_v4_phase4.md + paper/v4/paper_a_methodology_v4_section_iii.md + paper/v4/paper_a_results_v4_section_iv.md (post round-5, commit 128a914)
Purpose: Coherence check across Abstract / §I / §III / §IV / §V / §VI as a single argument, after Phase 5 AI peer-review panel convergence (3/3 in Accept/Minor band).

Headline assessment

Mostly Coherent — submission-ready after 2-3 small narrative-consistency patches.

The v4 story arc — "v3.20.0 distributional path turns out to be composition + integer artefact; v4.0 replaces it with anchor-based ICCR + decisive firm heterogeneity; positioning is anchor-calibrated specificity-only screening, not validated detector" — reads cleanly from Abstract through §VI. The 5 fix rounds + 3 reviewer panels have substantially closed the major framing, terminology, and provenance risks. What remains is narrow narrative-consistency residue between Phase 4 §I/§V prose and §III source-of-truth (3 specific items, all small), plus 1 interpretive caveat that would strengthen §V-H limitation 2.

No empirical reruns required. No structural rewrites required. Submission-readiness gate: conditionally pass — recommend a 15-minute round-6 prose-consistency patch before manuscript-splice, then the manuscript is splice-ready.

1. Abstract → body mirror audit

The Abstract (Phase 4 line 11, 247 words) makes ~12 distinct claims. Each maps cleanly to a body location:

#	Abstract claim	§I location	§III/§IV body location	Status
1	Non-hand-signed detection problem (regulation + digitization)	§I paras 1–3 (lines 19–21)	— (problem framing)	Aligned
2	Pipeline: VLM + YOLOv11 + ResNet-50 + dual-descriptor	§I para 6 (line 29) + contribution 2	§III-A..F (inherited) + §III-F dual-descriptor	Aligned
3	90,282 reports / 182,328 sigs / 758 CPAs	§I para 8 (line 39)	§IV-A..C (inherited)	Aligned
4	Big-4 sub-corpus: 437 CPAs / 150,442 sigs	§I para 8 (line 39)	§III-G (line 19), §IV-D (line 9, line 15)	Aligned
5	Composition decomposition `p_{\text{median}} = 0.35`	§I para 5 (line 31) + contribution 4	§III-I.4 (lines 55–73), §IV-M.1 Table XX (line 266)	Aligned
6	Per-comparison ICCRs 0.0006 / 0.0013 / 0.00014	§I para 6 (line 33) + contribution 5	§III-L.1 (line 196), §IV-M.2 Table XXI (line 280)	Aligned
7	Per-signature ICCR 0.11	§I para 6 (line 33)	§III-L.2 (line 208), §IV-M.3 Table XXII (line 300)	Aligned
8	Per-document ICCR 0.34 (HC+MC)	§I para 6 (line 33)	§III-L.3 (line 233), §IV-M.4 Table XXIII (line 317)	Aligned
9	Firm heterogeneity: Firm A 0.62 vs B/C/D 0.09–0.16	§I para 7 (line 35) + contribution 6	§III-L.4 (line 259), §IV-M.4 (line 325)	Aligned
10	Within-firm 77–99% (any-pair)	§I para 7 (line 35) + contribution 6	§III-L.4 (line 283), §IV-M.5 Table XXV (line 340)	Aligned
11	"Specificity-proxy-anchored screening + HITL, not validated detector" positioning	§I contribution 8 (line 57)	§III-M Table XXVII (line 316) + §V-G/H + §VI item 8	Aligned
12	"No calibrated error rates without ground truth" disclaimer	§I para 5 item (v) (line 25)	§III-M (line 312), §V-H limit 1 (line 113)	Aligned

Observation: Abstract does NOT mention the three-score Spearman convergence (ρ ≥ 0.879). This is by design: the v4 pivot demoted three-score from a headline finding to "internal consistency" because the scores share inputs. §I contribution 7 and §V-E retain it with the demoted caveat. No action needed.

2. §I contributions (8) → body implementation map

#	§I contribution	§III/§IV implementation	§V/§VI loop-back	Status
1	Problem formulation	— (§V-A discusses)	§V-A (line 73), §VI implicit	Aligned
2	End-to-end pipeline	§III-A..F (inherited)	§VI line 145	Aligned
3	Dual-descriptor verification	§III-F + §IV-L backbone ablation	§V-A/B implicit	Aligned
4	Composition decomposition	§III-I.4 + §IV-M.1 Table XX	§V-B (line 81), §VI item 1 (line 147)	Aligned
5	Anchor-based multi-level ICCR	§III-L + §IV-M.2/M.3/M.4 Tables XXI/XXII/XXIII	§V-F (line 99), §VI item 2	Aligned
6	Firm heterogeneity + within-firm collision	§III-L.4 + §IV-M.4/M.5 Tables XXIV/XXV	§V-C (line 83), §VI items 3+4	Aligned
7	K=3 descriptive + three-score convergence	§III-J + §III-K.1 + §IV-E/F/G	§V-D/E (lines 89–97), §VI items 5+6	Aligned
8	Annotation-free positive-anchor + ten-tool ceiling	§III-K.4 + §III-M Table XXVII + §IV-H Table XIV	§V-G (line 105), §VI items 7+8	Aligned

All 8 contributions trace cleanly through §III/§IV implementation and §V/§VI loop-back. No action needed.

3. v3→v4 pivot rhetoric thread

The v4 pivot has five narrative nodes; each must reinforce the others:

Node	Location	Says
Setup: v3.x distributional path	§I para 5 line 31 (Phase 4 prose)	"Earlier work...adopted a distributional path...v4.0 reports a composition decomposition diagnostic that overturns this reading"
Proof: 2×2 factorial composition decomposition	§III-I.4 Scripts 39b–39e (lines 55–73)	Joint firm-mean centring + integer-tie jitter eliminates rejection (`p_{\text{median}} = 0.35`)
Alternative: anchor-based ICCR	§III-L (line 173+)	Replaces distributional thresholds with inter-CPA coincidence-rate calibration at 3 units
Discussion: K=3 stays descriptive	§V-B + §V-D (lines 77–93)	Mixture fits are firm-compositional partitions, not mechanism modes
Conclusion: pivot summary	§VI items 1+5 (line 147)	Demotes K=3 mechanism reading; positions ICCR as the operational calibration

All five nodes use consistent language ("composition + integer artefact"; "descriptive firm-compositional partition"; "no within-population bimodal antimode"). No action needed.

4. K=3 demotion consistency

Verified across 5 locations using consistent descriptor-position language (post round-2 M1 fix):

§III-J line 90 (source of truth): "The 'descriptive position' column replaces v3.x's 'hand-leaning / mixed / replicated' mechanism labels"
§I contribution 7 (line 55): "K=3 mixture demoted from 'three mechanism clusters' to a descriptive firm-compositional partition"
§V-D (line 93): "the K=3 stability supports a descriptive reading...not a three-mechanism latent-class structure"
§VI item 5 (line 147): same demotion language
§IV Tables XVI/XVII column headers: "C1 (low-cos / high-dHash) | C2 (central) | C3 (high-cos / low-dHash)" — descriptor-position labels throughout

grep -n "hand-leaning" in v4 public prose: 0 hits (only internal-strip text). Closed.
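The zero-hit check above can be repeated programmatically. The following is a minimal sketch, not the audit's actual tooling: the function name `find_retired_terms`, the `skip_lines` parameter, and the sample text are all hypothetical, and the only term taken from the audit is "hand-leaning". It assumes the internal-strip text can be identified by line number.

```python
import re

# Hypothetical term list: the audit names "hand-leaning" as a retired label.
RETIRED_TERMS = ["hand-leaning"]


def find_retired_terms(text, terms=RETIRED_TERMS, skip_lines=()):
    """Return (line_number, term) for each retired term found in `text`.

    `skip_lines` is an iterable of 1-based line numbers to ignore, e.g.
    internal draft notes that are stripped at splice time.
    """
    skip = set(skip_lines)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if lineno in skip:
            continue
        for term in terms:
            # Case-insensitive literal match, similar to `grep -n -i -F`.
            if re.search(re.escape(term), line, flags=re.IGNORECASE):
                hits.append((lineno, term))
    return hits


# Made-up sample text, not taken from the paper.
sample = "K=3 is a descriptive partition.\nDraft note: old hand-leaning label.\n"
print(find_retired_terms(sample))                  # one hit, on line 2
print(find_retired_terms(sample, skip_lines=[2]))  # no hits once line 2 is skipped
```

Passing the internal-note line ranges as `skip_lines` mirrors the "only internal-strip text" qualification: a hit counts as a leak only if it falls outside those ranges.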

5. ICCR vs FAR terminology consistency

Verified across all rate-reporting locations (post round-2 + round-5):

Abstract: "inter-CPA coincidence-rate (ICCR)" ✓
§I contribution 5 (line 51): explicit terminology adoption and FAR disclaimer ✓
§III-L.1 (line 185): "Terminological note on 'FAR'" with full disclaimer ✓
§IV-I (line 159): historical "FAR" cited only with the "v3.x terminology" caveat ✓
§V-G heading (line 105): "Inherited Inter-CPA Negative Anchor Reframed as Coincidence Rate" ✓
§V-H limitations: specificity-proxy framing under partially-violated assumption ✓
§VI item 2 (line 147): "explicit terminological replacement of 'FAR' by 'ICCR' given the unsupervised setting" ✓

No public-prose "FAR" leak outside the historical-context caveats. Closed.

6. Numbers consistency audit (cross-section)

Headline numbers cross-referenced between Abstract / §I / §III / §IV:

Claim	Abstract	§I body	§III source	§IV table	Status
Per-comparison ICCR cos `>0.95`	0.0006	0.0006	0.00060	0.00060 Table XXI	Match (rounding consistent)
Per-comparison ICCR dHash `\leq 5`	0.0013	0.0013	0.00129	0.00129 Table XXI	Match
Per-comparison joint	0.00014	0.00014	0.00014	0.00014 Table XXI	Match
Per-signature ICCR	0.11	0.11	0.1102 (Wilson)	0.1102 Table XXII	Match
Per-document ICCR (HC+MC)	0.34	0.34	0.3375	0.3375 Table XXIII	Match
Firm A doc HC+MC	0.62	0.62	0.6201	0.6201 §IV-M.4 line 325	Match
Firms B/C/D doc HC+MC	0.09–0.16	0.09–0.16	0.1600 / 0.1635 / 0.0863	same	Match
Within-firm any-pair	77–99% (rounded)	76.7–83.7% / 98.8%	same	Table XXV	Match
Same-pair within-firm	—	97.0–99.96%	99.96 / 97.7 / 98.2 / 97.0	line 349	Match
Composition `p_{\text{median}}`	0.35	0.35	0.35	0.35 Table XX	Match
Logistic OR	—	0.053 / 0.010 / 0.027	same	Table XXIV	Match
Spearman ρ floor	—	0.879	0.879	0.8794 Table IX	Match (Spearman precision §III/§IV differ at 4th decimal — see §8)

All headline numbers reconcile across sections. Closed.
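As an illustration of what "rounding consistent" means for the table above, the sketch below re-derives each rounded headline figure from the higher-precision value in the §III/§IV columns. The helper `round_half_up` and the `CHECKS` list are hypothetical names introduced here, and the half-up rounding convention is an assumption (the audit does not state which convention the paper uses). Values are copied from the table as strings so that decimal rounding is exact.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str, places: int) -> Decimal:
    """Round a decimal string to `places` decimal places, halves rounding up."""
    quantum = Decimal(1).scaleb(-places)  # places=2 -> Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# (precise value from §III/§IV, rounded headline value, decimal places)
CHECKS = [
    ("0.00060", "0.0006", 4),  # per-comparison ICCR, cosine
    ("0.00129", "0.0013", 4),  # per-comparison ICCR, dHash
    ("0.1102", "0.11", 2),     # per-signature ICCR
    ("0.3375", "0.34", 2),     # per-document ICCR (HC+MC)
    ("0.6201", "0.62", 2),     # Firm A document-level rate
    ("0.0863", "0.09", 2),     # lower end of the B/C/D range
    ("0.1635", "0.16", 2),     # upper end of the B/C/D range
    ("76.7", "77", 0),         # within-firm any-pair, lower end (%)
    ("98.8", "99", 0),         # within-firm any-pair, upper end (%)
]

mismatches = [
    (precise, headline)
    for precise, headline, places in CHECKS
    if round_half_up(precise, places) != Decimal(headline)
]
print(mismatches)  # an empty list means every headline figure is a faithful rounding
```

Using `Decimal` rather than `float` matters for values such as 0.3375, which sits exactly on a rounding boundary and has no exact binary representation.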

7. Limitations vs §III-M Table XXVII coverage

§V-H lists 14 limitations (9 v4-specific + 5 inherited from v3.20.0). Each Table XXVII assumption should be covered by an explicit §V-H item OR be self-evidently descriptive:

Table XXVII tool	Assumption	§V-H coverage
Composition decomposition	Jitter unbiased; Big-4 jittered + centred + jittered evidence	Implicit — covered by general "no signature-level ground truth" frame
Per-comparison ICCR	Inter-CPA pairs are negative anchor (partially violated)	§V-H limit 2 (explicit)
Per-signature ICCR	Same + pool replacement preserves negative-anchor property	§V-H limit 2 (implicit via "specificity-proxy rates under partially-violated assumption")
Per-document ICCR	Same	Same
Firm-heterogeneity logistic	Cluster-robust SE not run	Gap — no §V-H item explicitly flags the naive-SE caveat
Cross-firm hit matrix	Deployed-rule semantics + mode-of-firms tie-break	§V-H limit 2
Alert-rate sensitivity	Descriptive gradient, not formal plateau	§V-H limit 5 (line 121, "alert-rate sensitivity analysis characterises only the HC threshold")
Three-score Spearman	Scores share inputs	§V-H limit 6 (line 123, deployed-rate-excess interpretation) — partial; not the score-independence caveat directly
Pixel-identical positive capture	Tautological (byte-identical ⇒ in HC region)	§V-H limit 4 (line 119, "pixel-identity is a conservative subset")
LOOO firm-level reproducibility	Stability ≠ classification validity; K=3 membership ±12.8 pp	§V-H limit 8 (line 127, "K=3 hard-posterior membership is composition-sensitive")

Gap 1: §V-H does not explicitly flag the logistic-regression naive-SE caveat that Table XXVII row 5 discloses. Worth adding a half-sentence to §V-H or letting Table XXVII carry the disclosure since it's already in print.

Gap 2 (Opus N5 from round-2 audit): §V-H limit 2 discloses the firm-dependent within-firm violation numerically (98.8% at A; 76.7–83.7% at B/C/D) but does not interpret what this means for proxy reliability — namely that Firm A's per-firm ICCR is MORE contaminated by within-firm sharing than B/C/D's, so the per-firm B/C/D rates are closer to clean specificity. This nuance affects interpretation of the headline "firm heterogeneity is decisive" framing.

8. Net-new narrative concerns (audit-surfaced)

Concern A — Phase 4 §I body line 31 cites "Script 39c" for jittered-dHash claim

Issue. Phase 4 prose line 31 says:

"Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual mid/small firm with \geq 500 signatures (10 firms tested in Script 39c)."

Codex round-9 verified that Script 39c on RAW dHash actually REJECTS unimodality in all 10 firms; only the JITTERED variant (codex's read-only spike on the Script 39c substrate) fails to reject. Round-5 corrected §III line 59 + provenance table line 382 to cite the spike attribution, but Phase 4 §I line 31's bare "Script 39c" citation now reads less precisely than the source-of-truth at §III line 59.

Severity. Low. The qualitative claim ("fail to reject in 10 mid/small firms") is correct per codex's own rerun. The provenance attribution is what's slightly off.

Recommended fix. Update Phase 4 line 31 to match §III line 59: "...in every individual mid/small firm with \geq 500 signatures (10 firms tested; cosine: Script 39c per-firm; jittered-dHash: codex-verified read-only spike on Script 39c substrate)." Or simpler: drop the parenthetical "in Script 39c" and let §III carry the precise provenance.

Concern B — Phase 4 §V-B line 81 carries the same jittered-dHash claim without provenance

Issue. §V-B (line 81) says:

"Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual non-Big-4 firm with \geq 500 signatures (10 firms tested)."

No script citation here; the bare "10 firms tested" is technically OK but less precise than §III line 59 after round-5.

Severity. Low.

Recommended fix. Add a §III cross-reference: "...10 firms tested; see §III-I.4 / §III provenance table line for the codex-verified read-only spike." Or leave as-is and let §III line 59 carry the detailed provenance.

Concern C — §III-K.4 line 149 stale cross-reference to v3.x §IV-I

Issue. §III-K item 4 line 149 says:

"The corresponding signature-level inter-CPA negative-anchor ICCR evidence is developed in §III-L.1 (Big-4 sample) and the v3.x §IV-I corpus-wide version (reported under prior 'FAR' terminology)"

After v4 §IV-I was shrunk to a 3-paragraph reframing stub (post round-3), the phrase "v3.x §IV-I corpus-wide version" is misleading — v4 §IV-I now exists, just as a pointer. Opus round-2 N6 flagged this; codex round-9 did not.

Severity. Cosmetic.

Recommended fix. Update line 149 to "§III-L.1 (Big-4 v4 sample) and the inherited corpus-wide v3.x version cited at §IV-I (reported under prior 'FAR' terminology)".

Concern D — Spearman precision mismatch §III vs §IV

Issue. §III-K.1 lines 123–127 report Spearman ρ as 0.963 / 0.889 / 0.879 (3 decimal places); §IV-F Table IX lines 81–87 report 0.9627 / 0.8890 / 0.8794 (4 decimal places). Codex round-8 flagged this as OPEN / COPY-EDIT.

Severity. Cosmetic.

Recommended fix. Standardise on 4 decimal places (matches Script 38 reported precision) in §III + §IV + §V-E + §VI.
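A small sketch of why this fix is safe for the prose: it checks that the 4-decimal values in §IV-F Table IX round to the 3-decimal values in §III-K.1, and that a floor statement of the form "ρ ≥ 0.879" still holds against the 4-decimal minimum. The variable and function names are hypothetical, and half-up rounding is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# Values as quoted in this audit (strings keep the decimal arithmetic exact).
RHO_4DP = ["0.9627", "0.8890", "0.8794"]  # §IV-F Table IX
RHO_3DP = ["0.963", "0.889", "0.879"]     # §III-K.1 before standardisation


def to_places(value: str, places: int) -> Decimal:
    """Round a decimal string to `places` decimal places, halves rounding up."""
    return Decimal(value).quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


# Each 4dp value should round to the 3dp value it replaces.
rounded = [to_places(v, 3) for v in RHO_4DP]
tables_agree = rounded == [Decimal(v) for v in RHO_3DP]

# The prose floor is stated at 3dp; it should hold for the 4dp minimum.
floor_4dp = min(Decimal(v) for v in RHO_4DP)
floor_holds = floor_4dp >= Decimal("0.879")

print(tables_agree, floor_holds, floor_4dp)
```

Under these assumptions the tables can move to 4 decimal places while floor statements quoted at 3 decimal places elsewhere remain accurate.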

9. Splice-readiness gate

Item	Status	Notes
Abstract word count	✓	247 / 250
§I contributions count	✓	8 contributions, all map to body
§II LOOO addition with refs [42]-[44]	✓	Present (post round-1)
§III sub-sections G..M complete	✓	Including Table XXVII numbered
§IV table sequence V–XXVI sequential	✓	Post round-2 cascade
§V sub-sections A..H complete	✓	Post round-2 M4 fix (G→H)
§VI items 1..8 map to §I 1..8	✓	1:1 mapping verified
References [1]–[44] present	✓	44 entries; [42]-[44] = Stone 1974 / Geisser 1975 / Vehtari 2017
Internal draft notes stripped	✗	Splice-time mechanical — Phase 4 line 3 + lines 153-162; §III line 3 + lines 434-447; §IV line 3 + line 365+
"Nine-tool" / "Table XV-B" residue	✗	Splice-time mechanical — only in internal-strip text
Cross-section number consistency	✓	All headline numbers match across Abstract / §I / §III / §IV
Terminology consistency (ICCR / K=3 / less-replication-dominated)	✓	No public-prose leaks
IEEE Access format	✓	Abstract single-paragraph ≤250 words; numbered references; numbered tables

10. Recommended round-6 narrative-consistency patch (15 min, optional)

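Before manuscript-splice, three small text patches would close audit-surfaced Concerns A, C, and D (Concern B may be left as-is, per its recommended fix, so that §III line 59 carries the detailed provenance):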

Concern A: Phase 4 line 31 — narrow the "Script 39c" provenance attribution for the jittered-dHash claim to match §III line 59.
Concern C: §III line 149 — update the "v3.x §IV-I corpus-wide version" wording to reflect v4 §IV-I's reframing-stub status.
Concern D: Standardise Spearman precision to 4 decimal places across §III/§IV/§V.

Optional:

Gap 2 / Opus N5: Add a half-sentence to §V-H limit 2 interpreting the firm-dependent within-firm violation as "Firm A's per-firm ICCR is more contaminated by within-firm sharing than Firms B/C/D's, so the per-firm B/C/D rates are closer to clean specificity than the pooled rate."

None of these is empirical or structural. They are prose-level consistency polishings. Submission can proceed without them, but they would strengthen reviewer-pass robustness.

11. Submission-readiness verdict

Conditionally ready.

The empirical core is sound and reproducible. The Phase 5 panel converged 3/3 in Accept/Minor band. Five fix rounds have closed every reviewer-flagged finding. Numbers are consistent across sections. Terminology is consistent. The v4 pivot's narrative thread reads as a coherent argument from Abstract through Conclusion.

Path A (ship now): proceed directly to manuscript-splice → DOCX export → partner Jimmy review → submission. Risk: the four audit-surfaced concerns above may be flagged by external reviewers as small cosmetic issues. Cost: zero pre-submission work.

Path B (15-min polish first): apply round-6 patches for the four concerns above → re-verify with rg-based grep → manuscript-splice → DOCX → partner. Risk: zero. Cost: 15 min.

Recommendation: Path B. The four concerns are small, narrative-consistency only, and fixing them avoids putting cross-section attribution inconsistencies in front of partner Jimmy / IEEE Access reviewers.

After Path B (or A): proceed to manuscript-splice as the next mechanical step.
