Correct Firm A framing: replication-dominated, not pure

Interview evidence from multiple Firm A accountants confirms that MOST
use replication (stamping / firm-level e-signing) but a MINORITY may
still hand-sign. Firm A is therefore a "replication-dominated" population,
not a "pure" one. This framing is consistent with:

- 92.5% of Firm A signatures exceed cosine 0.95 (majority replication)
- The long left tail (~7%) captures the minority hand-signers, not scan
  noise or preprocessing artifacts
- Hartigan dip test: Firm A cosine distribution is unimodal with a long
  tail (unimodality not rejected, p=0.17)
- Accountant-level GMM: of 180 Firm A accountants, 139 cluster in C1
  (high-replication) and 32 in C2 (middle band = minority hand-signers)
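
As a minimal sketch of how the first two figures relate, the share above the
cosine cutoff and the left-tail share can be computed from a list of
per-signature cosine values. The function name `replication_split` and the toy
values are hypothetical and are not taken from Scripts 15-19; only the 0.95
cutoff comes from the text above.

```python
from typing import Sequence, Tuple


def replication_split(cosines: Sequence[float],
                      threshold: float = 0.95) -> Tuple[float, float]:
    """Return (share strictly above threshold, share at or below it)."""
    if not cosines:
        raise ValueError("need at least one cosine value")
    above = sum(1 for c in cosines if c > threshold)
    share_above = above / len(cosines)
    return share_above, 1.0 - share_above


# Toy values for illustration only; these are NOT Firm A data.
toy = [0.99, 0.98, 0.97, 0.96, 0.90, 0.99, 0.985, 0.97]
high, tail = replication_split(toy)
print(f"above 0.95: {high:.1%}, left tail: {tail:.1%}")
```

The two shares always sum to 1, so a majority figure and a tail figure quoted
side by side should be complementary up to rounding.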

Updates docstrings and report text in Scripts 15, 16, 18, 19 to match.
Partner v3's "near-universal non-hand-signing" language corrected.

Script 19 regenerated with the updated text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 21:57:16 +08:00
parent fbfab1fa68
commit 68689c9f9b
4 changed files with 37 additions and 6 deletions
@@ -11,9 +11,15 @@ occurring reference populations instead of manual labels:
=> absolute ground truth for replication.
Positive anchor 2: Firm A (Deloitte) signatures
-Interview + visual evidence establishes near-universal non-hand-
-signing across 2013-2023 (see memories 2026-04-08, 2026-04-14).
-We treat Firm A as a strong prior positive.
+Interview evidence from multiple Firm A accountants confirms that
+MOST use replication (stamping / firm-level e-signing) but a
+MINORITY may still hand-sign. Firm A is therefore a
+"replication-dominated" population (not a pure one). We use it as
+a strong prior positive for the majority regime, while noting that
+~7% of Firm A signatures fall below cosine 0.95 consistent with
+the minority hand-signers. This matches the long left tail
+observed in the dip test (Script 15) and the Firm A members who
+land in C2 (middle band) of the accountant-level GMM (Script 18).
Negative anchor: signatures with cosine <= low threshold
Pairs with very low cosine similarity cannot plausibly be pixel
@@ -354,7 +360,11 @@ def main():
f'({int(neg_mask.sum()):,} signatures). Treated as',
' confirmed not-replicated.',
f'* **Firm A anchor:** Deloitte ({int(firm_a_mask.sum()):,} signatures),',
-' near-universally non-hand-signed per partner interviews.',
+' a replication-dominated population per interviews with multiple',
+' Firm A accountants: most use replication (stamping / firm-level',
+' e-signing), but a minority may still hand-sign. Used as a strong',
+' prior positive for the majority regime, with the ~7% below',
+' cosine 0.95 reflecting the minority hand-signers.',
'',
'## Equal Error Rate (EER)',
'',