pdf_signature_extraction/paper/narrative_audit_v4.md
gbanyan 8dddc3b87c Apply Phase 5 round-6 narrative-consistency patches + audit artifact
Closes the four audit-surfaced concerns from
paper/narrative_audit_v4.md plus the Opus round-2 N5 interpretive
caveat. All five are prose-level consistency polishings; no
empirical or structural changes.

Concern A (Phase 4 line 31 / §I body): "Script 39c" provenance for
the jittered-dHash claim was less precise than the §III line 59
source-of-truth which (post round-5) attributes the non-Big-4
jittered evidence to a codex-verified read-only spike. Updated §I
to: "cosine: Script 39c; jittered-dHash: Script 39d for Big-4
plus codex-verified read-only spike for ten non-Big-4 firms."

Concern B (Phase 4 line 81 / §V-B): same jittered-dHash claim
without precise provenance. Updated §V-B to match Concern A
attribution + §III-I.4 cross-reference.

Concern C (§III-K.4 line 149): cross-reference to "v3.x §IV-I
corpus-wide version" was stale after v4 §IV-I was shrunk to a
reframing stub. Updated to "§III-L.1 (Big-4 v4 sample) and the
inherited corpus-wide v3.x version cited at §IV-I".

Concern D (Spearman precision): standardized §III-K.1 table at
lines 125-127 to 4 decimal places (0.963/0.889/0.879 ->
0.9627/0.8890/0.8794), matching §IV-F Table IX. Prose floor
language "rho >= 0.879" preserved across Abstract/§I/§V/§VI
since 0.8794 still rounds to 0.879 at 3dp.

Opus N5 / §V-H limit 2 nuance: added a sentence interpreting the
firm-dependent within-firm violation - Firm A's per-firm ICCR is
more contaminated by within-firm sharing than B/C/D's, so the
B/C/D rates of 0.09-0.16 are closer to clean specificity, and the
Firm A vs B/C/D contrast reflects both genuine heterogeneity AND
a firm-dependent proxy-contamination gradient.

Audit artifact paper/narrative_audit_v4.md (~200 lines) captures
the full cross-section coherence check across Abstract / §I /
§III / §IV / §V / §VI:
- Abstract -> body mirror audit (12 claims, all aligned)
- §I 8 contributions -> §III/§IV/§V/§VI mapping (all aligned)
- v3->v4 pivot rhetoric thread (5 nodes, all aligned)
- K=3 demotion / ICCR-FAR / numbers consistency: all verified
- Splice-readiness gate: 10/12 pass + 2 splice-time mechanical

Headline assessment: "Mostly Coherent - submission-ready after
2-3 small patches" (now applied).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:22:22 +08:00

# Paper A v4.0 — Narrative-Thread Audit
Auditor: Claude Opus 4.7 (1M context)
Date: 2026-05-14
Target: paper/v4/paper_a_prose_v4_phase4.md + paper/v4/paper_a_methodology_v4_section_iii.md + paper/v4/paper_a_results_v4_section_iv.md (post round-5, commit 128a914)
Purpose: Coherence check across Abstract / §I / §III / §IV / §V / §VI as a single argument, after Phase 5 AI peer-review panel convergence (3/3 in Accept/Minor band).
## Headline assessment
**Mostly Coherent — submission-ready after 2-3 small narrative-consistency patches.**
The v4 story arc — *"v3.20.0 distributional path turns out to be composition + integer artefact; v4.0 replaces it with anchor-based ICCR + decisive firm heterogeneity; positioning is anchor-calibrated specificity-only screening, not validated detector"* — reads cleanly from Abstract through §VI. The 5 fix rounds + 3 reviewer panels have substantially closed the major framing, terminology, and provenance risks. What remains is narrow narrative-consistency residue between Phase 4 §I/§V prose and §III source-of-truth (3 specific items, all small), plus 1 interpretive caveat that would strengthen §V-H limitation 2.
No empirical reruns required. No structural rewrites required. Submission-readiness gate: **conditionally pass** — recommend a 15-minute round-6 prose-consistency patch before manuscript-splice, then the manuscript is splice-ready.
## 1. Abstract → body mirror audit
The Abstract (Phase 4 line 11, 247 words) makes ~12 distinct claims. Each maps cleanly to a body location:
| # | Abstract claim | §I location | §III/§IV body location | Status |
|---:|---|---|---|---|
| 1 | Non-hand-signed detection problem (regulation + digitization) | §I paras 1-3 (lines 19-21) | — (problem framing) | **Aligned** |
| 2 | Pipeline: VLM + YOLOv11 + ResNet-50 + dual-descriptor | §I para 6 (line 29) + contribution 2 | §III-A..F (inherited) + §III-F dual-descriptor | **Aligned** |
| 3 | 90,282 reports / 182,328 sigs / 758 CPAs | §I para 8 (line 39) | §IV-A..C (inherited) | **Aligned** |
| 4 | Big-4 sub-corpus: 437 CPAs / 150,442 sigs | §I para 8 (line 39) | §III-G (line 19), §IV-D (line 9, line 15) | **Aligned** |
| 5 | Composition decomposition $p_{\text{median}} = 0.35$ | §I para 5 (line 31) + contribution 4 | §III-I.4 (lines 55-73), §IV-M.1 Table XX (line 266) | **Aligned** |
| 6 | Per-comparison ICCRs 0.0006 / 0.0013 / 0.00014 | §I para 6 (line 33) + contribution 5 | §III-L.1 (line 196), §IV-M.2 Table XXI (line 280) | **Aligned** |
| 7 | Per-signature ICCR 0.11 | §I para 6 (line 33) | §III-L.2 (line 208), §IV-M.3 Table XXII (line 300) | **Aligned** |
| 8 | Per-document ICCR 0.34 (HC+MC) | §I para 6 (line 33) | §III-L.3 (line 233), §IV-M.4 Table XXIII (line 317) | **Aligned** |
| 9 | Firm heterogeneity: Firm A 0.62 vs B/C/D 0.09-0.16 | §I para 7 (line 35) + contribution 6 | §III-L.4 (line 259), §IV-M.4 (line 325) | **Aligned** |
| 10 | Within-firm 77-99% (any-pair) | §I para 7 (line 35) + contribution 6 | §III-L.4 (line 283), §IV-M.5 Table XXV (line 340) | **Aligned** |
| 11 | "Specificity-proxy-anchored screening + HITL, not validated detector" positioning | §I contribution 8 (line 57) | §III-M Table XXVII (line 316) + §V-G/H + §VI item 8 | **Aligned** |
| 12 | "No calibrated error rates without ground truth" disclaimer | §I para 5 item (v) (line 25) | §III-M (line 312), §V-H limit 1 (line 113) | **Aligned** |
Observation: Abstract does NOT mention the three-score Spearman convergence ($\rho \geq 0.879$). This is by design — the v4 pivot demoted three-score from a headline finding to "internal consistency" because the scores share inputs. §I contribution 7 and §V-E retain it with the demoted caveat. **No action needed.**
## 2. §I contributions (8) → body implementation map
| # | §I contribution | §III/§IV implementation | §V/§VI loop-back | Status |
|---:|---|---|---|---|
| 1 | Problem formulation | — (§V-A discusses) | §V-A (line 73), §VI implicit | **Aligned** |
| 2 | End-to-end pipeline | §III-A..F (inherited) | §VI line 145 | **Aligned** |
| 3 | Dual-descriptor verification | §III-F + §IV-L backbone ablation | §V-A/B implicit | **Aligned** |
| 4 | Composition decomposition | §III-I.4 + §IV-M.1 Table XX | §V-B (line 81), §VI item 1 (line 147) | **Aligned** |
| 5 | Anchor-based multi-level ICCR | §III-L + §IV-M.2/M.3/M.4 Tables XXI/XXII/XXIII | §V-F (line 99), §VI item 2 | **Aligned** |
| 6 | Firm heterogeneity + within-firm collision | §III-L.4 + §IV-M.4/M.5 Tables XXIV/XXV | §V-C (line 83), §VI items 3+4 | **Aligned** |
| 7 | K=3 descriptive + three-score convergence | §III-J + §III-K.1 + §IV-E/F/G | §V-D/E (lines 89-97), §VI items 5+6 | **Aligned** |
| 8 | Annotation-free positive-anchor + ten-tool ceiling | §III-K.4 + §III-M Table XXVII + §IV-H Table XIV | §V-G (line 105), §VI items 7+8 | **Aligned** |
All 8 contributions trace cleanly through §III/§IV implementation and §V/§VI loop-back. **No action needed.**
## 3. v3→v4 pivot rhetoric thread
The v4 pivot has five narrative nodes; each must reinforce the others:
| Node | Location | Says |
|---|---|---|
| **Setup**: v3.x distributional path | §I para 5 line 31 (Phase 4 prose) | "Earlier work...adopted a distributional path...v4.0 reports a composition decomposition diagnostic that overturns this reading" |
| **Proof**: 2×2 factorial composition decomposition | §III-I.4 Scripts 39b-39e (lines 55-73) | Joint firm-mean centring + integer-tie jitter eliminates rejection ($p_{\text{median}} = 0.35$) |
| **Alternative**: anchor-based ICCR | §III-L (line 173+) | Replaces distributional thresholds with inter-CPA coincidence-rate calibration at 3 units |
| **Discussion**: K=3 stays descriptive | §V-B + §V-D (lines 77-93) | Mixture fits are firm-compositional partitions, not mechanism modes |
| **Conclusion**: pivot summary | §VI items 1+5 (line 147) | Demotes K=3 mechanism reading; positions ICCR as the operational calibration |
All five nodes use consistent language ("composition + integer artefact"; "descriptive firm-compositional partition"; "no within-population bimodal antimode"). **No action needed.**
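The two transformations named in the Proof node, firm-mean centring and integer-tie jitter, can be sketched as follows. This is a minimal illustration on synthetic data, not the code of the scripts cited in §III-I.4: the function names, the toy firms `X` and `Y`, and the uniform half-unit jitter are all assumptions, and the dip test itself is omitted.

```python
import random
from collections import defaultdict
from statistics import fmean


def centre_by_firm(values, firms):
    """Subtract each firm's mean from that firm's values (firm-mean centring)."""
    by_firm = defaultdict(list)
    for v, f in zip(values, firms):
        by_firm[f].append(v)
    means = {f: fmean(vs) for f, vs in by_firm.items()}
    return [v - means[f] for v, f in zip(values, firms)]


def jitter_integers(values, rng):
    """Break integer ties by adding noise drawn from [-0.5, 0.5).

    Assumption: uniform half-unit jitter; the audited scripts may differ.
    """
    return [v + rng.random() - 0.5 for v in values]


# Synthetic integer dHash-like distances for two hypothetical firms.
rng = random.Random(0)
dhash = [3, 4, 4, 5, 9, 10, 10, 11]
firms = ["X", "X", "X", "X", "Y", "Y", "Y", "Y"]

jittered = jitter_integers(dhash, rng)   # removes the integer ties
centred = centre_by_firm(jittered, firms)  # removes between-firm location shifts
```

After both steps, any remaining multimodality cannot come from firm composition or from integer ties, which is the logic the audit attributes to the 2×2 factorial design.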
## 4. K=3 demotion consistency
Verified across 5 locations using consistent descriptor-position language (post round-2 M1 fix):
- §III-J line 90 (source of truth): "The 'descriptive position' column replaces v3.x's 'hand-leaning / mixed / replicated' mechanism labels"
- §I contribution 7 (line 55): "K=3 mixture demoted from 'three mechanism clusters' to a descriptive firm-compositional partition"
- §V-D (line 93): "the K=3 stability supports a descriptive reading...*not* a three-mechanism latent-class structure"
- §VI item 5 (line 147): same demotion language
- §IV Tables XVI/XVII column headers: "C1 (low-cos / high-dHash) | C2 (central) | C3 (high-cos / low-dHash)" — descriptor-position labels throughout
`grep -n "hand-leaning"` in v4 public prose: 0 hits (only internal-strip text). **Closed.**
## 5. ICCR vs FAR terminology consistency
Verified across all rate-reporting locations (post round-2 + round-5):
- Abstract: "inter-CPA coincidence-rate (ICCR)" ✓
- §I contribution 5 (line 51): explicit terminology adoption and FAR disclaimer ✓
- §III-L.1 (line 185): "Terminological note on 'FAR'" with full disclaimer ✓
- §IV-I (line 159): historical "FAR" cited only with the "v3.x terminology" caveat ✓
- §V-G heading (line 105): "Inherited Inter-CPA Negative Anchor Reframed as Coincidence Rate" ✓
- §V-H limitations: specificity-proxy framing under partially-violated assumption ✓
- §VI item 2 (line 147): "explicit terminological replacement of 'FAR' by 'ICCR' given the unsupervised setting" ✓
No public-prose "FAR" leak outside the historical-context caveats. **Closed.**
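A leak check of this kind could be automated roughly as below. This is a hedged sketch only: the function name, the caveat phrases, and the sample text are hypothetical, and the audit does not state how the check was actually run.

```python
import re

FAR_TOKEN = re.compile(r"\bFAR\b")  # whole word, case-sensitive


def uncaveated_far_lines(text, caveats=("v3.x terminology", "Terminological note")):
    """Return 1-based line numbers that mention FAR without any caveat phrase."""
    leaks = []
    for n, line in enumerate(text.splitlines(), start=1):
        if FAR_TOKEN.search(line) and not any(c in line for c in caveats):
            leaks.append(n)
    return leaks


# Hypothetical sample: line 1 is caveated, line 2 leaks, line 3 is ordinary prose.
sample = (
    'historical "FAR" (v3.x terminology) is retained for context\n'
    "the FAR at this threshold is low\n"
    "results so far are stable\n"
)
leaks = uncaveated_far_lines(sample)
```

The case-sensitive whole-word match keeps ordinary uses of "far" from being reported; an empty result on the public prose would correspond to the "Closed" status above.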
## 6. Numbers consistency audit (cross-section)
Headline numbers cross-referenced between Abstract / §I / §III / §IV:
| Claim | Abstract | §I body | §III source | §IV table | Status |
|---|---|---|---|---|---|
| Per-comparison ICCR cos $>0.95$ | 0.0006 | 0.0006 | 0.00060 | 0.00060 Table XXI | **Match** (rounding consistent) |
| Per-comparison ICCR dHash $\leq 5$ | 0.0013 | 0.0013 | 0.00129 | 0.00129 Table XXI | **Match** |
| Per-comparison joint | 0.00014 | 0.00014 | 0.00014 | 0.00014 Table XXI | **Match** |
| Per-signature ICCR | 0.11 | 0.11 | 0.1102 (Wilson) | 0.1102 Table XXII | **Match** |
| Per-document ICCR (HC+MC) | 0.34 | 0.34 | 0.3375 | 0.3375 Table XXIII | **Match** |
| Firm A doc HC+MC | 0.62 | 0.62 | 0.6201 | 0.6201 §IV-M.4 line 325 | **Match** |
| Firms B/C/D doc HC+MC | 0.09-0.16 | 0.09-0.16 | 0.1600 / 0.1635 / 0.0863 | same | **Match** |
| Within-firm any-pair | 77-99% (rounded) | 76.7-83.7% / 98.8% | same | Table XXV | **Match** |
| Same-pair within-firm | — | 97.0-99.96% | 99.96 / 97.7 / 98.2 / 97.0 | line 349 | **Match** |
| Composition $p_{\text{median}}$ | 0.35 | 0.35 | 0.35 | 0.35 Table XX | **Match** |
| Logistic OR | — | 0.053 / 0.010 / 0.027 | same | Table XXIV | **Match** |
| Spearman ρ floor | — | 0.879 | 0.879 | 0.8794 Table IX | **Match** (§III reports 3 decimal places, §IV reports 4; see §8, Concern D) |
All headline numbers reconcile across sections. **Closed.**
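The "rounding consistent" judgement in the table can be checked mechanically. The sketch below is an assumption about how one might do it, not the auditor's procedure: it takes value pairs copied from the table and tests whether each higher-precision figure rounds half-up to its headline figure. The helper name `rounds_to` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounds_to(source, headline):
    """True if `source`, rounded half-up to the headline's precision, equals it."""
    target = Decimal(headline)
    return Decimal(source).quantize(target, rounding=ROUND_HALF_UP) == target


# (claim, higher-precision value, headline value), copied from the table above.
pairs = [
    ("per-comparison cos", "0.00060", "0.0006"),
    ("per-comparison dHash", "0.00129", "0.0013"),
    ("per-signature", "0.1102", "0.11"),
    ("per-document HC+MC", "0.3375", "0.34"),
    ("Firm A doc HC+MC", "0.6201", "0.62"),
    ("Spearman floor", "0.8794", "0.879"),
]
mismatches = [name for name, src, head in pairs if not rounds_to(src, head)]
```

Passing the values as strings keeps the comparison exact in decimal, so a borderline figure is never misjudged because of binary floating-point representation.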
## 7. Limitations vs §III-M Table XXVII coverage
§V-H lists 14 limitations (9 v4-specific + 5 inherited from v3.20.0). Each Table XXVII assumption should be covered by an explicit §V-H item OR be self-evidently descriptive:
| Table XXVII tool | Assumption | §V-H coverage |
|---|---|---|
| Composition decomposition | Jitter unbiased; Big-4 jittered + centred + jittered evidence | Implicit — covered by general "no signature-level ground truth" frame |
| Per-comparison ICCR | Inter-CPA pairs are negative anchor (partially violated) | §V-H limit 2 (explicit) |
| Per-signature ICCR | Same + pool replacement preserves negative-anchor property | §V-H limit 2 (implicit via "specificity-proxy rates under partially-violated assumption") |
| Per-document ICCR | Same | Same |
| Firm-heterogeneity logistic | Cluster-robust SE not run | **Gap** — no §V-H item explicitly flags the naive-SE caveat |
| Cross-firm hit matrix | Deployed-rule semantics + mode-of-firms tie-break | §V-H limit 2 |
| Alert-rate sensitivity | Descriptive gradient, not formal plateau | §V-H limit 5 (line 121, "alert-rate sensitivity analysis characterises only the HC threshold") |
| Three-score Spearman | Scores share inputs | §V-H limit 6 (line 123, deployed-rate-excess interpretation) — partial; not the score-independence caveat directly |
| Pixel-identical positive capture | Tautological (byte-identical ⇒ in HC region) | §V-H limit 4 (line 119, "pixel-identity is a conservative subset") |
| LOOO firm-level reproducibility | Stability ≠ classification validity; K=3 membership ±12.8 pp | §V-H limit 8 (line 127, "K=3 hard-posterior membership is composition-sensitive") |
**Gap 1**: §V-H does not explicitly flag the logistic-regression naive-SE caveat that Table XXVII row 5 discloses. Worth adding a half-sentence to §V-H or letting Table XXVII carry the disclosure since it's already in print.
**Gap 2** (Opus N5 from round-2 audit): §V-H limit 2 discloses the firm-dependent within-firm violation numerically (98.8% at A; 76.7-83.7% at B/C/D) but does not interpret what this means for proxy reliability, namely that Firm A's per-firm ICCR is MORE contaminated by within-firm sharing than B/C/D's, so the per-firm B/C/D rates are closer to clean specificity. This nuance affects interpretation of the headline "firm heterogeneity is decisive" framing.
## 8. Net-new narrative concerns (audit-surfaced)
### Concern A — Phase 4 §I body line 31 cites "Script 39c" for jittered-dHash claim
**Issue.** Phase 4 prose line 31 says:
> "Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual mid/small firm with $\geq 500$ signatures (10 firms tested in Script 39c)."
Codex round-9 verified that Script 39c on RAW dHash actually REJECTS unimodality in all 10 firms; only the JITTERED variant (codex's read-only spike on the Script 39c substrate) fails to reject. Round-5 corrected §III line 59 + provenance table line 382 to cite the spike attribution, but Phase 4 §I line 31's bare "Script 39c" citation now reads less precisely than the source-of-truth at §III line 59.
**Severity.** Low. The qualitative claim ("fail to reject in 10 mid/small firms") is correct per codex's own rerun. The provenance attribution is what's slightly off.
**Recommended fix.** Update Phase 4 line 31 to match §III line 59: "...in every individual mid/small firm with $\geq 500$ signatures (10 firms tested; cosine: Script 39c per-firm; jittered-dHash: codex-verified read-only spike on Script 39c substrate)." Or simpler: drop the parenthetical "in Script 39c" and let §III carry the precise provenance.
### Concern B — Phase 4 §V-B line 81 carries the same jittered-dHash claim without provenance
**Issue.** §V-B (line 81) says:
> "Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual non-Big-4 firm with $\geq 500$ signatures (10 firms tested)."
No script citation here; the bare "10 firms tested" is technically OK but less precise than §III line 59 after round-5.
**Severity.** Low.
**Recommended fix.** Add a §III cross-reference: "...10 firms tested; see §III-I.4 / §III provenance table line 382 for the codex-verified read-only spike." Or leave as-is and let §III line 59 carry the detailed provenance.
### Concern C — §III-K.4 line 149 stale cross-reference to v3.x §IV-I
**Issue.** §III-K item 4 line 149 says:
> "The corresponding signature-level inter-CPA negative-anchor ICCR evidence is developed in §III-L.1 (Big-4 sample) and the v3.x §IV-I corpus-wide version (reported under prior 'FAR' terminology)"
After v4 §IV-I was shrunk to a 3-paragraph reframing stub (post round-3), the phrase "v3.x §IV-I corpus-wide version" is misleading — v4 §IV-I now exists, just as a pointer. Opus round-2 N6 flagged this; codex round-9 did not.
**Severity.** Cosmetic.
**Recommended fix.** Update line 149 to "§III-L.1 (Big-4 v4 sample) and the inherited corpus-wide v3.x version cited at §IV-I (reported under prior 'FAR' terminology)".
### Concern D — Spearman precision mismatch §III vs §IV
**Issue.** §III-K.1 lines 123-127 report Spearman ρ as 0.963 / 0.889 / 0.879 (3 decimal places); §IV-F Table IX lines 81-87 reports 0.9627 / 0.8890 / 0.8794 (4 decimal places). Codex round-8 flagged this as OPEN / COPY-EDIT.
**Severity.** Cosmetic.
**Recommended fix.** Standardise on 4 decimal places (matches Script 38 reported precision) in §III + §IV + §V-E + §VI.
## 9. Splice-readiness gate
| Item | Status | Notes |
|---|---|---|
| Abstract word count | ✓ | 247 / 250 |
| §I contributions count | ✓ | 8 contributions, all map to body |
| §II LOOO addition with refs [42]-[44] | ✓ | Present (post round-1) |
| §III sub-sections G..M complete | ✓ | Including Table XXVII numbered |
| §IV table sequence V-XXVI sequential | ✓ | Post round-2 cascade |
| §V sub-sections A..H complete | ✓ | Post round-2 M4 fix (G→H) |
| §VI items 1..8 map to §I 1..8 | ✓ | 1:1 mapping verified |
| References [1]-[44] present | ✓ | 44 entries; [42]-[44] = Stone 1974 / Geisser 1975 / Vehtari 2017 |
| Internal draft notes stripped | ✗ | **Splice-time mechanical** — Phase 4 line 3 + lines 153-162; §III line 3 + lines 434-447; §IV line 3 + line 365+ |
| "Nine-tool" / "Table XV-B" residue | ✗ | **Splice-time mechanical** — only in internal-strip text |
| Cross-section number consistency | ✓ | All headline numbers match across Abstract / §I / §III / §IV |
| Terminology consistency (ICCR / K=3 / less-replication-dominated) | ✓ | No public-prose leaks |
| IEEE Access format | ✓ | Abstract single-paragraph ≤250 words; numbered references; numbered tables |
## 10. Recommended round-6 narrative-consistency patch (15 min, optional)
Before manuscript-splice, three small text patches would close the audit-surfaced concerns:
1. **Concern A**: Phase 4 line 31 — narrow the "Script 39c" provenance attribution for the jittered-dHash claim to match §III line 59.
2. **Concern C**: §III line 149 — update the "v3.x §IV-I corpus-wide version" wording to reflect v4 §IV-I's reframing-stub status.
3. **Concern D**: Standardise Spearman precision to 4 decimal places across §III/§IV/§V.
Optional:
- **Gap 2 / Opus N5**: Add a half-sentence to §V-H limit 2 interpreting the firm-dependent within-firm violation as "Firm A's per-firm ICCR is more contaminated by within-firm sharing than Firms B/C/D's, so the per-firm B/C/D rates are closer to clean specificity than the pooled rate."
None of these is empirical or structural. They are prose-level consistency polishings. **Submission can proceed without them**, but they would strengthen reviewer-pass robustness.
## 11. Submission-readiness verdict
**Conditionally ready.**
The empirical core is sound and reproducible. The Phase 5 panel converged 3/3 in Accept/Minor band. Five fix rounds have closed every reviewer-flagged finding. Numbers are consistent across sections. Terminology is consistent. The v4 pivot's narrative thread reads as a coherent argument from Abstract through Conclusion.
**Path A (ship now)**: proceed directly to manuscript-splice → DOCX export → partner Jimmy review → submission. Risk: the four audit-surfaced concerns above may be flagged by external reviewers as small cosmetic issues. Cost: zero pre-submission work.
**Path B (15-min polish first)**: apply round-6 patches for the four concerns above → re-verify with `rg`-based grep → manuscript-splice → DOCX → partner. Risk: zero. Cost: 15 min.
**Recommendation**: Path B. The four concerns are small, narrative-consistency only, and fixing them avoids putting cross-section attribution inconsistencies in front of partner Jimmy / IEEE Access reviewers.
After Path B (or A): proceed to manuscript-splice as the next mechanical step.