Files
pdf_signature_extraction/signature_analysis
gbanyan 39575cef49 Add script 39: signature-level convergence (SIG_CONVERGENCE_MODERATE)
Phase 1.7 follow-up to Script 38's per-CPA convergence. Tests
whether the convergence holds at signature granularity, preempting
"per-CPA aggregation washes out signal" reviewer attacks.

Three signature-level labels per Big-4 signature (n=150,442):
  L1 PaperA      non_hand iff cos > 0.95 AND dh <= 5
  L2 K=3 perCPA  hard assignment under per-CPA-fit components
  L3 K=3 perSig  hard assignment under fresh signature-level fit

Component comparison (per-CPA vs per-signature K=3):

  Component        Per-CPA cos/dh/wt     Per-Sig cos/dh/wt
  C1 hand-leaning  0.9457/9.17/0.143     0.9280/9.75/0.146
  C2 mixed         0.9558/6.66/0.536     0.9625/6.04/0.582
  C3 replicated    0.9826/2.41/0.321     0.9890/1.27/0.272

  Component drift modest: max |dcos| = 0.018, max |ddh| = 1.15.

Cohen kappa (binary, 1 = replicated):

  PaperA vs K=3 perCPA       kappa = 0.6616  substantial
  PaperA vs K=3 perSig       kappa = 0.5586  moderate
  K=3 perCPA vs K=3 perSig   kappa = 0.8701  almost perfect

Per-firm binary agreement PaperA vs K=3 perCPA:

  Firm A 86.13%, KPMG 77.46%, PwC 82.64%, EY 85.01%.

Verdict: SIG_CONVERGENCE_MODERATE (all kappas >= 0.40; per-CPA
aggregation captures most signature-level structure).

Implication for v4.0: per-CPA K=3 is robust to aggregation level
(kappa = 0.87 vs per-signature fit). The modest disagreement
between K=3 and Paper A's box rule (kappa 0.56-0.66) reflects
different decision geometries -- K=3 posterior soft boundary vs
Paper A rectangle box -- not a fundamental signal disagreement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:07:48 +08:00
..