Files
pdf_signature_extraction/paper/v4
gbanyan 723a3f6eaf Rewrite §III v7: anchor-based ICCR framework + composition-decomp finding
Major §III restructuring after codex rounds 29-34 demolished the
distributional path to thresholds (Scripts 39b-39e prove (cos, dHash)
multimodality is composition-driven + integer-tie artefact).

v4.0 pivots to anchor-based multi-level inter-CPA coincidence-rate
(ICCR) calibration via Scripts 40b, 43, 44, 45, 46:

- §III-G: scope justification rewritten (LOOO + Firm A case study +
  within-firm collision structure; dropped "smallest scope rejects
  unimodality" rationale); added sample-size reconciliation
  (150,442 descriptor-complete vs 150,453 vector-complete; 437
  accountant-level vs 468 all)
- §III-I: new sub-section I.4 composition decomposition (2x2 factorial
  centred + jittered Big-4 pooled dh p=0.35); I.5 conclusion of no
  natural threshold
- §III-J: K=3 recast as firm-compositional descriptive partition
  (not three mechanism clusters); bridge to §III-L.4 cross-firm
  hit matrix added
- §III-K: Score 1 reframed as firm-composition position score
- §III-L: NEW major sub-section — anchor-based threshold calibration
  with L.0 methodology, L.1 per-comparison ICCR (replicates v3
  cos>0.95 -> 0.0006; new dh<=5 -> 0.0013; joint -> 0.00014),
  L.2 pool-normalised per-signature ICCR (any-pair HC 11.02%;
  per-firm A 25.94% vs B/C/D <1.5%), L.3 doc-level ICCR (HC 18%;
  HC+MC 34%), L.4 firm heterogeneity logistic OR 0.01-0.05 +
  cross-firm hit matrix (98-100% within-firm), L.5 alert-rate
  sensitivity (HC threshold locally sensitive not plateau-stable),
  L.6 observed deployed alert rate excess over inter-CPA proxy
- §III-M: NEW sub-section — multi-tool validation strategy under
  unsupervised setting; 9 partial-evidence diagnostics each with
  disclosed untested assumption; positioning as anchor-calibrated
  screening framework with human-in-the-loop review, NOT validated
  forensic detector
- Terminology: "FAR" replaced with "inter-CPA coincidence rate
  (ICCR)" throughout; primary metric name change documented in
  §III-L.0
- Provenance table: ~35 new rows for Scripts 39b-e/40b/43-46;
  "key numerical claims" instead of "every numerical claim"
- Removed v2-v6 internal changelog metadata; v7 draft note added

Codex round-32 SOUND_WITH_QUALIFICATIONS, round-33 GO_WITH_REVISIONS,
round-34 READY_WITH_NARROW_FIXES (all 8 patches applied).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:27:01 +08:00
..