Apply codex round-22 corrections to §III v3 (Minor -> ready)

Codex gpt-5.5 round 22 returned Minor Revision after v2 closed 3/5 Major findings fully and 2/5 partially. Five narrow fixes applied for v3: 1. Per-firm ranking unanimity corrected (v2:93). The reverse- anchor score ranks Firm D fractionally higher than Firm C (-0.7125 vs -0.7672); only Scores 1 and 3 rank Firm C highest. The unanimity claim was wrong; v3 prose now says all three agree on Firm A as most replication-dominated and on the non-Firm-A Big-4 as more hand-leaning, with a modest disagreement on Firm C vs D ordering. 2. "Smallest scope" / "any single firm" overclaim narrowed (v2:21, v2:43). Script 32 only tested Firm A alone, big4_non_A pooled, and all_non_A pooled -- not Firms B, C, D individually. v3 explicitly says "comparison scopes tested in Script 32" and notes single-firm dip tests for B, C, D were not separately computed. 3. K=3 hard label vs posterior in Spearman correctly attributed (v2:143). Script 38 uses the K=3 posterior P(C1), not the hard label, in the internal-consistency Spearman correlations. v3 §III-L now correctly says the hard label is for the §IV cluster cross-tabulation while the posterior is the continuous Score 1 in §III-K. 4. Provenance source for n=150,442 corrected (v2:17, v2:152). Script 39 directly reports this count in its per-signature K=3 fit; Script 38's report does not. v3 cites Script 39 for this number. 5. "Max fold-to-fold deviation" wording made precise (v2:65, v2:107). The $0.028$ value is the max absolute deviation from the across-fold mean (Script 36 stability summary), not the pairwise across-fold range (which is $0.0376 = 0.9756 - 0.9380$). v3 reports both statistics with explicit definitions. Also: draft note at top now records v2 (round-21) and v3 (round-22) revision lineage. Cross-reference index and open- question block retained as author working checklist (will be removed before manuscript submission per codex e7). Outstanding open questions reduced to 3 (codex round-22 view): - Five-way moderate-confidence band: validate in Big-4 specifically (Phase 3 §IV-F work) or document as inherited from v3.x? - Firm anonymisation policy in §IV-V (procedural) - §IV table numbering (procedural; defer until §IV done) Phase 2 §III draft is now Minor-Revision-quality. Ready for Phase 3 (Results regeneration §IV). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:26:02 +08:00
parent 62a22ceb83
commit c8c7656513
2 changed files with 103 additions and 10 deletions
@@ -1,6 +1,12 @@
-# Section III. Methodology — v4.0 Draft v2 (post codex round-21 revision)
+# Section III. Methodology — v4.0 Draft v3 (post codex rounds 21 + 22)

-> **Draft note (2026-05-12, v2).** This file replaces the §III-G through §III-L block of `paper/paper_a_methodology_v3.md` (v3.20.0). Sub-sections III-A through III-F (Pipeline / Data Collection / Page Identification / Detection / Feature Extraction / Dual-Method Descriptors) are unchanged from v3.20.0 and not reproduced here. v2 incorporates codex gpt-5.5 round-21 review (`paper/codex_review_gpt55_v4_round1.md`); the key revisions are: (i) the inherited five-way per-signature box rule is restored as the **primary operational classifier** (§III-L), (ii) the K=3 Gaussian mixture is positioned as an **accountant-level descriptive characterisation**, not an operational threshold (§III-J), (iii) the "convergent validation" framing is softened to "convergent internal-consistency checks" since the three scores share underlying features (§III-K), (iv) the pixel-identity metric is renamed from FAR to positive-anchor miss rate (§III-K), (v) five empirical/wording slips flagged by the reviewer are corrected. Empirical anchors throughout reference Scripts 32–40 on branch `paper-a-v4-big4`; a provenance table appears at the end of this section listing every numerical claim with its script and report path.
+> **Draft note (2026-05-12, v3).** This file replaces the §III-G through §III-L block of `paper/paper_a_methodology_v3.md` (v3.20.0). Sub-sections III-A through III-F (Pipeline / Data Collection / Page Identification / Detection / Feature Extraction / Dual-Method Descriptors) are unchanged from v3.20.0 and not reproduced here.
+>
+> **v2** incorporated codex gpt-5.5 round-21 review (`paper/codex_review_gpt55_v4_round1.md`, Major Revision); key revisions were: (i) the inherited five-way per-signature box rule restored as the **primary operational classifier** (§III-L), (ii) the K=3 Gaussian mixture positioned as **accountant-level descriptive characterisation** (§III-J), (iii) "convergent validation" softened to "convergent internal-consistency checks" since the three scores share underlying features (§III-K), (iv) the pixel-identity metric renamed from FAR to positive-anchor miss rate (§III-K), (v) five empirical/wording slips corrected.
+>
+> **v3** incorporates codex gpt-5.5 round-22 review (`paper/codex_review_gpt55_v4_round2.md`, Minor Revision); five narrow fixes applied: (1) §III-K per-firm ranking corrected to acknowledge that Score 2 (reverse-anchor) ranks Firm D fractionally above Firm C while Scores 1 and 3 rank Firm C highest; (2) "smallest scope" / "any single firm" language narrowed to "comparison scopes tested in Script 32"; (3) §III-L corrected so the §III-K Spearman correlations are explicitly tied to the K=3 *posterior* P(C1) rather than the K=3 hard label; (4) provenance table source for $n = 150{,}442$ updated to cite Script 39 directly; (5) "max fold-to-fold deviation" wording made precise — the $0.028$ value is the maximum absolute deviation from the across-fold mean, not the pairwise across-fold range ($0.0376$).
+>
+> Empirical anchors throughout reference Scripts 32–40 on branch `paper-a-v4-big4`; a provenance table appears at the end of this section listing every numerical claim with its script and report path.

 ## G. Unit of Analysis and Scope

@@ -14,11 +20,11 @@ We adopt one stipulation about same-CPA pair detectability:

 A1 is plausible for high-volume stamping or firm-level electronic signing workflows but is not guaranteed when (i) the corpus contains only one observed replicated report for a CPA, (ii) multiple template variants are used in parallel, or (iii) scan-stage noise pushes a replicated pair outside the detection regime. A1 is the only assumption the per-signature detector requires to be sensitive to replication.

-**Scope: the Big-4 sub-corpus.** v4.0's primary analyses (§III-I, §III-J, §III-K, all §IV results except §IV-K) are restricted to the four largest accounting firms in Taiwan, pseudonymously labelled Firm A through Firm D throughout the manuscript. The Big-4 sub-corpus comprises 437 CPAs (171 / 112 / 102 / 52 across Firms A through D) with $n_{\text{sig}} \geq 10$, totalling 150,442 signatures with both descriptors available (Script 38 verification). Restricting the analyses to Big-4 is a methodological choice driven by four considerations:
+**Scope: the Big-4 sub-corpus.** v4.0's primary analyses (§III-I, §III-J, §III-K, all §IV results except §IV-K) are restricted to the four largest accounting firms in Taiwan, pseudonymously labelled Firm A through Firm D throughout the manuscript. The Big-4 sub-corpus comprises 437 CPAs (171 / 112 / 102 / 52 across Firms A through D) with $n_{\text{sig}} \geq 10$ (Scripts 36, 38), totalling 150,442 Big-4 signatures with both descriptors available (Script 39 reports the explicit per-signature $n$ used in the signature-level K=3 fit). Restricting the analyses to Big-4 is a methodological choice driven by four considerations:

 1. **Within-pool homogeneity for mixture characterisation.** Pooling Big-4 with mid- and small-firm CPAs introduces a heterogeneous tail of $\sim$249 CPAs distributed across multiple firms with idiosyncratic signing practices and small per-firm samples. The full-sample and Big-4-only calibrations *differ* in their fitted marginal crossings (full-sample published $\overline{\text{cos}}^* = 0.945$, $\overline{\text{dHash}}^* = 8.10$ from v3.x; Big-4-only $\overline{\text{cos}}^* = 0.975$, $\overline{\text{dHash}}^* = 3.76$ from Script 34; bootstrap 95% CIs $[0.974, 0.977]$ / $[3.48, 3.97]$, $n_{\text{boot}} = 500$); the offset is large compared to the Big-4 bootstrap CI half-width of $0.0015$. We report this as a *scope-dependent shift* rather than asserting a causal "mid/small-firm tail distorts" claim.

-2. **Statistical multimodality at the accountant level.** Within the Big-4 sub-corpus, the Hartigan dip test rejects unimodality on both axes (cosine $p = 0.0000$, dHash $p = 0.0000$ in the bootstrap-2000 implementation, i.e., no bootstrap replicate exceeded the observed statistic; reported here as $p < 5 \times 10^{-4}$; Script 34). No such rejection holds within any single firm pooled alone (Firm A: $p_{\text{cos}} = 0.992$, $p_{\text{dHash}} = 0.924$, $n = 171$; Script 32). Pooling Firm A's three Big-4 peers gives $p_{\text{cos}} = 0.998$, $p_{\text{dHash}} = 0.906$ (Script 32, `big4_non_A` subset). Pooling all non-Firm-A CPAs (other Big-4 plus mid/small firms) gives $p_{\text{cos}} = 0.998$, $p_{\text{dHash}} = 0.907$ (Script 32, `all_non_A` subset). Big-4 is the smallest scope at which the dip test supports applying a finite-mixture model to the per-CPA distribution.
+2. **Statistical multimodality at the accountant level.** Within the Big-4 sub-corpus, the Hartigan dip test rejects unimodality on both axes (cosine $p = 0.0000$, dHash $p = 0.0000$ in the bootstrap-2000 implementation, i.e., no bootstrap replicate exceeded the observed statistic; reported here as $p < 5 \times 10^{-4}$; Script 34). No such rejection holds in the comparison scopes tested by Script 32: Firm A pooled alone ($p_{\text{cos}} = 0.992$, $p_{\text{dHash}} = 0.924$, $n = 171$); Firms B + C + D pooled (Script 32, `big4_non_A` subset: $p_{\text{cos}} = 0.998$, $p_{\text{dHash}} = 0.906$); all non-Firm-A pooled (Script 32, `all_non_A` subset: other Big-4 plus mid/small firms; $p_{\text{cos}} = 0.998$, $p_{\text{dHash}} = 0.907$). Among the comparison scopes we evaluated, Big-4 is the smallest scope at which the dip test supports applying a finite-mixture model to the per-CPA distribution; we did not separately test single-firm dip statistics for Firms B, C, or D.

 3. **Reproducibility under leave-one-firm-out cross-validation.** §III-K reports leave-one-firm-out (LOOO) cross-validation of the Big-4 mixture fit. The Big-4 sub-corpus permits a four-fold LOOO at the firm level (one fold per Big-4 firm). No analogous firm-level fold is available outside Big-4 because mid/small firms have CPA counts of $O(1)$–$O(30)$ per firm.

@@ -40,7 +46,7 @@ The reverse-anchor reference centre is at $\overline{\text{cos}} = 0.935$, $\ove

 This section characterises the joint distribution of accountant-level descriptor means $(\overline{\text{cos}}_a, \overline{\text{dHash}}_a)$ across the 437 Big-4 CPAs of §III-G. Three diagnostic procedures are applied: a univariate unimodality test on each marginal axis, a 2D Gaussian mixture fit (developed in §III-J), and a density-smoothness diagnostic.

-**1. Hartigan dip test on each marginal.** We apply the Hartigan & Hartigan dip test [37] to each of the two marginal distributions $\{\overline{\text{cos}}_a\}_{a=1}^{437}$ and $\{\overline{\text{dHash}}_a\}_{a=1}^{437}$, with bootstrap-based $p$-value estimation ($n_{\text{boot}} = 2000$). In both cases no bootstrap replicate exceeded the observed dip statistic, so the empirical $p$-value is bounded above by $5 \times 10^{-4}$; we report this in tables as $p < 5 \times 10^{-4}$ rather than $p = 0$ to reflect the bootstrap resolution (Script 34). For comparison, no rejection of unimodality holds at any narrower scope or in the non-Big-4 population (Script 32): Firm A alone ($p_{\text{cos}} = 0.992$, $p_{\text{dHash}} = 0.924$, $n = 171$); Firms B + C + D pooled ($p_{\text{cos}} = 0.998$, $p_{\text{dHash}} = 0.906$, $n = 266$); all non-Firm-A pooled ($p_{\text{cos}} = 0.998$, $p_{\text{dHash}} = 0.907$, $n = 515$). The dip-test multimodality at the Big-4 level is the empirical justification for fitting a finite-mixture model in §III-J; without it, the mixture would be a forced fit on an essentially unimodal distribution.
+**1. Hartigan dip test on each marginal.** We apply the Hartigan & Hartigan dip test [37] to each of the two marginal distributions $\{\overline{\text{cos}}_a\}_{a=1}^{437}$ and $\{\overline{\text{dHash}}_a\}_{a=1}^{437}$, with bootstrap-based $p$-value estimation ($n_{\text{boot}} = 2000$). In both cases no bootstrap replicate exceeded the observed dip statistic, so the empirical $p$-value is bounded above by $5 \times 10^{-4}$; we report this in tables as $p < 5 \times 10^{-4}$ rather than $p = 0$ to reflect the bootstrap resolution (Script 34). For comparison, no rejection of unimodality holds in the comparison scopes tested in Script 32: Firm A pooled alone ($p_{\text{cos}} = 0.992$, $p_{\text{dHash}} = 0.924$, $n = 171$); Firms B + C + D pooled ($p_{\text{cos}} = 0.998$, $p_{\text{dHash}} = 0.906$, $n = 266$); all non-Firm-A CPAs pooled ($p_{\text{cos}} = 0.998$, $p_{\text{dHash}} = 0.907$, $n = 515$). Single-firm dip tests for Firms B, C, and D were not separately computed; the comparison scopes above sufficed to establish that no narrower-than-Big-4 *tested* scope rejected unimodality. The dip-test multimodality at the Big-4 level is the empirical justification for fitting a finite-mixture model in §III-J; without it, the mixture would be a forced fit on an essentially unimodal distribution.

 **2. Mixture-model evidence.** A 2-component 2D Gaussian Mixture Model (full covariance, $n_{\text{init}} = 15$, fixed seed 42; Script 34) recovers components at $(\overline{\text{cos}}, \overline{\text{dHash}}) = (0.954, 7.14)$, weight $0.689$, and $(0.983, 2.41)$, weight $0.311$. The marginal crossings of the K=2 fit are $\overline{\text{cos}}^* = 0.9755$ and $\overline{\text{dHash}}^* = 3.755$, with bootstrap 95% confidence intervals $[0.9742, 0.9772]$ and $[3.48, 3.97]$ over $n_{\text{boot}} = 500$ resamples. The 3-component fit (§III-J) is BIC-preferred — using the convention that lower BIC is preferred, $\text{BIC}(K{=}3) - \text{BIC}(K{=}2) = -3.48$ (Script 36). The $\Delta$BIC magnitude is small in absolute terms; we do not treat $\Delta\text{BIC} = 3.5$ alone as decisive evidence for K=3, and the operational role of each fit is developed in §III-J and §III-K.

@@ -62,7 +68,7 @@ This section develops the K=2 and K=3 Gaussian mixture fits to the Big-4 account

 $\text{BIC}(K{=}3) = -1111.93$, lower than $K{=}2$ by $3.48$ (mild support for K=3 under standard BIC interpretation but not by itself decisive).

-**Why we report both K=2 and K=3.** Leave-one-firm-out cross-validation (§III-K) shows that K=2 is unstable across folds: holding Firm A out gives a fold rule cos $> 0.938$ AND dHash $\leq 8.79$, while holding any single non-Firm-A Big-4 firm out gives a fold rule near cos $> 0.975$ AND dHash $\leq 3.76$ (Script 36). The max fold-to-fold deviation of the cosine crossing is $0.028$, which is $5.6\times$ the report's $0.005$ across-fold *stability tolerance* (not bootstrap CI; the full-Big-4 bootstrap cosine half-width is the much smaller $0.0015$). K=3 in contrast has a *reproducible component shape*: across the four folds the C1 cosine mean varies by at most $0.005$, the C1 dHash mean by at most $0.96$, and the C1 weight by at most $0.025$ (Script 37). K=3 hard-posterior membership for the held-out firm is more composition-sensitive — for Firm C the held-out C1 rate is $36.3\%$ vs the full-Big-4 baseline of $23.5\%$, an absolute difference of $12.8$ pp; for Firm A the held-out C1 rate is $4.7\%$ vs baseline $0.0\%$; the report's own legend classifies this pattern as `P2_PARTIAL`, with the explicit interpretation that "the C1 cluster exists but membership is not well-predicted by the held-out fit" and "K=3 is not predictively useful as an operational classifier" (Script 37 verdict legend).
+**Why we report both K=2 and K=3.** Leave-one-firm-out cross-validation (§III-K) shows that K=2 is unstable across folds: holding Firm A out gives a fold rule cos $> 0.938$ AND dHash $\leq 8.79$, while holding any single non-Firm-A Big-4 firm out gives a fold rule near cos $> 0.975$ AND dHash $\leq 3.76$ (Script 36). The maximum absolute deviation of the four fold cosine crossings from their across-fold mean is $0.028$ (the corresponding pairwise across-fold range is $0.0376$, from $0.9380$ for the held-out-Firm-A fold to $0.9756$ for the held-out-EY fold; Script 36 stability summary). The $0.028$ value is $5.6\times$ the report's $0.005$ across-fold *stability tolerance* — not the bootstrap CI; the full-Big-4 bootstrap cosine half-width is the much smaller $0.0015$. K=3 in contrast has a *reproducible component shape*: across the four folds the C1 cosine mean varies by at most $0.005$, the C1 dHash mean by at most $0.96$, and the C1 weight by at most $0.025$ (Script 37). K=3 hard-posterior membership for the held-out firm is more composition-sensitive — for Firm C the held-out C1 rate is $36.3\%$ vs the full-Big-4 baseline of $23.5\%$, an absolute difference of $12.8$ pp; for Firm A the held-out C1 rate is $4.7\%$ vs baseline $0.0\%$; the report's own legend classifies this pattern as `P2_PARTIAL`, with the explicit interpretation that "the C1 cluster exists but membership is not well-predicted by the held-out fit" and "K=3 is not predictively useful as an operational classifier" (Script 37 verdict legend).

 We take the joint K=2/K=3 LOOO evidence as supporting the following descriptive claims, all of which are used in §III-K and §V but none of which underwrites a v4.0 operational classifier:

@@ -90,7 +96,7 @@ Pairwise Spearman rank correlations among the three scores, $n = 437$ Big-4 CPAs
 | Score 2 vs Score 3 | $+0.889$ | $< 10^{-149}$ |
 | Score 1 vs Score 2 | $+0.879$ | $< 10^{-142}$ |

-We read this as the strongest internal-consistency signal in v4.0: three different summarisations of the same descriptor pair agree on the per-CPA hand-leaning ranking with $\rho > 0.87$. The three scores agree on the per-firm ordering: by all three scores, Firm A is the most replication-dominated and Firm C is the most hand-leaning of the Big-4 (Script 38 per-firm summary). We do not claim this constitutes external validation of any operational classifier; the inherited box rule is calibrated separately (§III-L), and the convergence above shows that a mixture-derived score and a reverse-anchor score concur with the box rule's per-CPA-aggregated outputs.
+We read this as the strongest internal-consistency signal in v4.0: three different summarisations of the same descriptor pair agree on the per-CPA hand-leaning ranking with $\rho > 0.87$. The three scores agree on placing Firm A as the most replication-dominated and the three non-Firm-A Big-4 firms as more hand-leaning, but they do not all rank the non-Firm-A firms identically: the K=3 posterior P(C1) and the box-rule hand-leaning rate (Scores 1 and 3) place Firm C at the most-hand-leaning end of Big-4 (mean P(C1) $= 0.311$; mean box-rule hand-leaning rate $= 0.790$), while the reverse-anchor cosine percentile (Score 2) places Firm D fractionally higher than Firm C (mean reverse-anchor score $-0.7125$ vs Firm C $-0.7672$, with higher value indicating deeper into the reference left tail). The mean values for Firms B (KPMG) and D (EY) sit between Firms A and C on Scores 1 and 3 (Script 38 per-firm summary). We do not claim this constitutes external validation of any operational classifier; the inherited box rule is calibrated separately (§III-L), and the convergence above shows that a mixture-derived score and a reverse-anchor score concur with the box rule's per-CPA-aggregated outputs on the directional ordering, with a modest disagreement at the most-hand-leaning end between the three non-A Big-4 firms.

 **2. Per-signature consistency (Script 39).** Per-CPA aggregation could in principle reflect averaging across within-CPA heterogeneity rather than coherent within-CPA behaviour. We test this by repeating the K=3 fit at the signature level — fitting a fresh K=3 GMM to the 150,442 Big-4 signature-level $(\text{cos}, \text{dHash}_{\text{indep}})$ points (Script 39) — and comparing labels. The per-CPA and per-signature K=3 fits recover a broadly similar three-component ordering; per-CPA C1 is at $\overline{\text{cos}} = 0.946$, $\overline{\text{dHash}} = 9.17$ vs per-signature C1 at $\overline{\text{cos}} = 0.928$, $\overline{\text{dHash}} = 9.75$ (an absolute cosine drift of $0.018$). Cohen $\kappa$ on the binary collapse (replicated vs not-replicated):

@@ -104,7 +110,7 @@ The Script 39 report verdict is `SIG_CONVERGENCE_MODERATE`. The $\kappa = 0.870$

 **3. Leave-one-firm-out reproducibility (Scripts 36, 37).** Discussed in §III-J above. We summarise the joint result for cross-reference:

- *K=2 LOOO is unstable.* Max fold-to-fold cosine crossing deviation $= 0.028$ across the four folds, against the report's $0.005$ stability tolerance (Script 36). When Firm A is held out, the fold rule classifies $171/171$ of held-out Firm A CPAs as templated; when any non-Firm-A Big-4 firm is held out, the fold rule classifies $0$ of the held-out firm's CPAs as templated. This pattern indicates the K=2 boundary is essentially a Firm-A-vs-others separator rather than a within-Big-4 mechanism boundary.
+- *K=2 LOOO is unstable.* The maximum absolute deviation of the four fold cosine crossings from their across-fold mean is $0.028$, against the report's $0.005$ across-fold stability tolerance (Script 36; pairwise fold range $0.0376$, from $0.9380$ to $0.9756$). When Firm A is held out, the fold rule classifies $171/171$ of held-out Firm A CPAs as templated; when any non-Firm-A Big-4 firm is held out, the fold rule classifies $0$ of the held-out firm's CPAs as templated. This pattern indicates the K=2 boundary is essentially a Firm-A-vs-others separator rather than a within-Big-4 mechanism boundary.

 - *K=3 LOOO is partially stable.* The C1 hand-leaning component shape is reproducible across folds: max deviation from the full-Big-4 baseline is $0.005$ in cosine, $0.96$ in dHash, and $0.025$ in mixture weight (Script 37). Hard-posterior membership remains composition-sensitive — observed absolute differences are $1.8$–$12.8$ pp across the four folds, with the Firm C fold exceeding the report's $5$ pp viability bar; the report's own verdict is `P2_PARTIAL` ("K=3 is not predictively useful as an operational classifier"). We accordingly do not use K=3 hard-posterior membership as an operational label.

@@ -140,7 +146,7 @@ The conventions about these thresholds (cosine 0.95 as an operating point chosen

 **Document-level aggregation.** Each audit report typically carries two certifying-CPA signatures. We aggregate signature-level outcomes to document-level labels using the v3.x worst-case rule: the document inherits the *most-replication-consistent* signature label among the two signatures (rank order: High-confidence $>$ Moderate-confidence $>$ Style-consistency $>$ Uncertain $>$ Likely-hand-signed). The aggregation rule reflects the detection goal of flagging any potentially non-hand-signed report.

-**K=3 as accountant-level characterisation, not classifier.** The K=3 mixture of §III-J is reported in §IV as an accountant-level descriptive summary alongside the per-signature five-way classifier. We do not assign signature-level or document-level labels from the K=3 mixture in any v4.0 result table; the K=3 hard label is used solely for the accountant-level cluster cross-tabulation (Script 35) and the internal-consistency Spearman correlations (§III-K).
+**K=3 as accountant-level characterisation, not classifier.** The K=3 mixture of §III-J is reported in §IV as an accountant-level descriptive summary alongside the per-signature five-way classifier. We do not assign signature-level or document-level labels from the K=3 mixture in any v4.0 result table; the K=3 hard label is used for the accountant-level cluster cross-tabulation (Script 35), and the K=3 *posterior* P(C1) is used (as the continuous Score 1) in the internal-consistency Spearman correlations of §III-K.

 ---

@@ -149,7 +155,7 @@ The conventions about these thresholds (cosine 0.95 as an operating point chosen
 | Claim | Value | Source | Notes |
 |---|---|---|---|
 | Big-4 CPA count, $n_{\text{sig}} \geq 10$ | $437$ ($171/112/102/52$) | Script 36 sample sizes; Script 38 per-firm summary | direct |
-| Big-4 signature count | $150{,}442$ | Script 39 / Script 38 | direct |
+| Big-4 signature count | $150{,}442$ | Script 39 (per-signature K=3 fit explicitly cites this $n$) | direct |
 | Non-Big-4 reference CPA count | $249$ | Script 38 reference population | direct |
 | Big-4 K=2 marginal crossings $(0.9755, 3.755)$ | direct | Script 34; Script 36 §A | direct |
 | Bootstrap 95% CI cosine $[0.9742, 0.9772]$ | direct | Script 34; Script 36 §A | $n_{\text{boot}} = 500$ |