Paper A v3.13: resolve Opus 4.7 round-12 + codex gpt-5.5 round-13 findings
Opus 4.7 max-effort round-12 on v3.12 found 1 MAJOR + 7 MINOR residues;
codex gpt-5.5 xhigh round-13 cross-verified 11/11 RESOLVED and caught
one additional cosine-P95 ambiguity Opus missed (methodology L255).
Total 12 text-only edits across 5 files.
MAJOR M1 - Cosine P95→P7.5 terminology residue at two sites that cite
the v3.12-corrected Section III-L but still wrote "P95" (self-
contradiction). Fix: methodology L165 and results L247 both restated
as "whole-sample Firm A P7.5 heuristic" with the 92.5%/7.5%
complement spelled out.
MINOR findings and fixes:
- m1 Big-4 scope slip: methodology III-H(b) L166 and results IV-H.2
L311 said "every Big-4 auditor-year" but IV-H.2 ranking actually
pools all 4,629 auditor-years across Big-4 and Non-Big-4. Both
sites now say "every auditor-year ... across all firms."
- m2 178 vs 180 Firm A CPA breakdown: intro L54 and conclusion L21
now add "of 180 registered CPAs; 178 after excluding two with
disambiguation ties, Section IV-G.2" parenthetical to avoid the
misleading 180−171=9 reading.
- m3 IV-H.1 A2 citation: results L286 now explicitly invokes the
A2 within-year label-uniformity convention (Section III-G) when
reading the left-tail share as a partner-level "minority of hand-
signers."
- m4 IV-F L177 cross-ref / fold distinction: corrected Section III-H
→ Section III-L anchor, and added explicit note that the 0.95
heuristic is a whole-sample anchor while Table XI thresholds are
calibration-fold-derived (cosine P5 = 0.9407).
- m5 Table XVI (30,222) vs Table XVII (30,226) Firm A count gap:
results L406 now explains the 4-report difference (XVI restricts
to both-signers-Firm-A single-firm two-signer reports; XVII counts
at-least-one-Firm-A signer under the 84,386-document cohort).
- m6 Methodology L156 "four independent quantitative analyses"
actually enumerated 6 items: rephrased as "three primary
independent quantitative analyses plus a fourth strand comprising
three complementary checks."
- m7 Abstract "cluster into three groups" restored the "smoothly-
mixed" qualifier to match Discussion V-B and Conclusion L17.
- Codex-caught residue at methodology L255 ("Median, 1st percentile,
and 95th percentile of signature-level cosine/dHash distributions")
grammatically applied P95 to cosine too. Rewrote as
"cosine median, P1, and P5 (lower-tail) and dHash_indep median
and P95 (upper-tail)" matching Table XI L233 exactly.
No re-computation. All tables (IV-XVIII) and Appendix A numbers
unchanged. Abstract at 249/250 words after smoothly-mixed qualifier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -153,7 +153,7 @@ Rather than treating Firm A as a synthetic or laboratory positive control, we tr
|
||||
The background context for this choice is practitioner knowledge about Firm A's signing practice: industry practice at the firm is widely understood among practitioners to involve reproducing a stored signature image for the majority of certifying partners---originally via administrative stamping workflows and later via firm-level electronic signing systems---while not ruling out that a minority of partners may continue to hand-sign some or all of their reports.
|
||||
We use this only as background context for why Firm A is a plausible calibration candidate; the *evidence* for Firm A's replication-dominated status comes entirely from the paper's own analyses, which do not depend on any claim about signing practice beyond what the audit-report images themselves show.
|
||||
|
||||
We establish Firm A's replication-dominated status through four independent quantitative analyses, each of which can be reproduced from the public audit-report corpus alone:
|
||||
We establish Firm A's replication-dominated status through three primary independent quantitative analyses plus a fourth strand comprising three complementary checks, each of which can be reproduced from the public audit-report corpus alone:
|
||||
|
||||
First, *independent visual inspection* of randomly sampled Firm A reports reveals pixel-identical signature images across different audit engagements and fiscal years for the majority of partners---a physical impossibility under independent hand-signing events.
|
||||
|
||||
@@ -162,8 +162,8 @@ Second, *whole-sample signature-level rates*: 92.5% of Firm A's per-signature be
|
||||
Third, *accountant-level mixture analysis* (Section IV-E): a BIC-selected three-component Gaussian mixture over per-accountant mean cosine and mean dHash places 139 of the 171 Firm A CPAs (with $\geq 10$ signatures) in the high-replication C1 cluster and 32 in the middle-band C2 cluster, directly quantifying the within-firm heterogeneity.
|
||||
|
||||
Fourth, we additionally validate the Firm A benchmark through three complementary analyses reported in Section IV-H. Only the partner-level ranking is fully threshold-free; the longitudinal-stability and intra-report analyses use the operational classifier and are interpreted as consistency checks on its firm-level output:
|
||||
(a) *Longitudinal stability (Section IV-H.1).* The share of Firm A per-signature best-match cosine values below 0.95 is stable at 6-13% across 2013-2023, with the lowest share in 2023. The 0.95 cutoff is the whole-sample Firm A P95 of the per-signature cosine distribution (Section III-L); the substantive finding here is the *temporal stability* of the rate, not the absolute rate at any single year.
|
||||
(b) *Partner-level similarity ranking (Section IV-H.2).* When every Big-4 auditor-year is ranked globally by its per-auditor-year mean best-match cosine, Firm A auditor-years account for 95.9% of the top decile against a baseline share of 27.8% (a 3.5$\times$ concentration ratio), and this over-representation is stable across 2013-2023. This analysis uses only the ordinal ranking and is independent of any absolute cutoff.
|
||||
(a) *Longitudinal stability (Section IV-H.1).* The share of Firm A per-signature best-match cosine values below 0.95 is stable at 6-13% across 2013-2023, with the lowest share in 2023. The 0.95 cutoff is the whole-sample Firm A P7.5 heuristic (Section III-L; 92.5% of whole-sample Firm A signatures exceed this cutoff); the substantive finding here is the *temporal stability* of the rate, not the absolute rate at any single year.
|
||||
(b) *Partner-level similarity ranking (Section IV-H.2).* When every auditor-year is ranked globally by its per-auditor-year mean best-match cosine (across all firms: Big-4 and Non-Big-4), Firm A auditor-years account for 95.9% of the top decile against a baseline share of 27.8% (a 3.5$\times$ concentration ratio), and this over-representation is stable across 2013-2023. This analysis uses only the ordinal ranking and is independent of any absolute cutoff.
|
||||
(c) *Intra-report consistency (Section IV-H.3).* Because each Taiwanese statutory audit report is co-signed by two engagement partners, firm-wide stamping practice predicts that both signers on a given Firm A report should receive the same signature-level label under the classifier. Firm A exhibits 89.9% intra-report agreement against 62-67% at the other Big-4 firms. This test uses the operational classifier and is therefore a *consistency* check on the classifier's firm-level output rather than a threshold-free test; the cross-firm gap (not the absolute rate) is the substantive finding.
|
||||
|
||||
We emphasize that the 92.5% figure is a within-sample consistency check rather than an independent validation of Firm A's status; the validation role is played by the visual inspection, the accountant-level mixture, the three complementary analyses above, and the held-out Firm A fold (which confirms the qualitative replication-dominated framing; fold-level rate differences are disclosed in Section IV-G.2) described in Section III-K.
|
||||
@@ -252,7 +252,7 @@ This anchor is substantially larger than a simple low-similarity-same-CPA negati
|
||||
|
||||
3. **Firm A anchor (replication-dominated prior positive):** Firm A signatures, treated as a majority-positive reference with within-firm heterogeneity in the left tail (consistent with a minority of hand-signers), as evidenced by the 32/171 middle-band share in the accountant-level mixture (Section III-H).
|
||||
Because Firm A is both used for empirical percentile calibration in Section III-H and as a validation anchor, we make the within-Firm-A sampling variance visible by splitting Firm A CPAs randomly (at the CPA level, not the signature level) into a 70% *calibration* fold and a 30% *heldout* fold.
|
||||
Median, 1st percentile, and 95th percentile of signature-level cosine/dHash distributions are derived from the calibration fold only.
|
||||
The calibration-fold percentiles used in thresholding---cosine median, P1, and P5 (lower-tail, since higher cosine indicates greater similarity), and dHash_indep median and P95 (upper-tail, since lower dHash indicates greater similarity)---are derived from the 70% calibration fold only.
|
||||
The heldout fold is used exclusively to report post-hoc capture rates with Wilson 95% confidence intervals.
|
||||
|
||||
4. **Low-similarity same-CPA anchor (supplementary negative):** signatures whose maximum same-CPA cosine similarity is below 0.70.
|
||||
|
||||
Reference in New Issue
Block a user