5717d61dd459da99bd25249c03cd380c0a1e2cc9
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
5717d61dd4 |
Paper A v3.3: apply codex v3.2 peer-review fixes
Codex (gpt-5.4) second-round review recommended 'minor revision'. This commit addresses all issues flagged in that review. ## Structural fixes - dHash calibration inconsistency (codex #1, most important): Clarified in Section III-L that the <=5 and <=15 dHash cutoffs come from the whole-sample Firm A cosine-conditional dHash distribution (median=5, P95=15), not from the calibration-fold independent-minimum dHash distribution (median=2, P95=9) which we report elsewhere as descriptive anchors. Added explicit note about the two dHash conventions and their relationship. - Section IV-H framing (codex #2): Renamed "Firm A Benchmark Validation: Threshold-Independent Evidence" to "Additional Firm A Benchmark Validation" and clarified in the section intro that H.1 uses a fixed 0.95 cutoff, H.2 is fully threshold-free, H.3 uses the calibrated classifier. H.3's concluding sentence now says "the substantive evidence lies in the cross-firm gap" rather than claiming the test is threshold-free. - Table XVI 93,979 typo fixed (codex #3): Corrected to 84,354 total (83,970 same-firm + 384 mixed-firm). - Held-out Firm A denominator 124+54=178 vs 180 (codex #4): Added explicit note that 2 CPAs were excluded due to disambiguation ties in the CPA registry. - Table VIII duplication (codex #5): Removed the duplicate accountant-level-only Table VIII comment; the comprehensive cross-level Table VIII subsumes it. Text now says "accountant-level rows of Table VIII (below)". - Anonymization broken in Tables XIV-XVI (codex #6): Replaced "Deloitte"/"KPMG"/"PwC"/"EY" with "Firm A"/"Firm B"/"Firm C"/ "Firm D" across Tables XIV, XV, XVI. Table and caption language updated accordingly. - Table X unit mismatch (codex #7): Dropped precision, recall, F1 columns. Table now reports FAR (against the inter-CPA negative anchor) with Wilson 95% CIs and FRR (against the byte-identical positive anchor). III-K and IV-G.1 text updated to justify the change. ## Sentence-level fixes - "three independent statistical methods" in Methodology III-A -> "three methodologically distinct statistical methods". - "three independent methods" in Conclusion -> "three methodologically distinct methods". - Abstract "~0.006 converging" now explicitly acknowledges that BD/McCrary produces no significant accountant-level discontinuity. - Conclusion ditto. - Discussion limitation sentence "BD/McCrary should be interpreted at the accountant level for threshold-setting purposes" rewritten to reflect v3.3 result that BD/McCrary is a diagnostic, not a threshold estimator, at the accountant level. - III-H "two analyses" -> "three analyses" (H.1 longitudinal stability, H.2 partner ranking, H.3 intra-report consistency). - Related Work White 1982 overclaim rewritten: "consistent estimators of the pseudo-true parameter that minimizes KL divergence" replaces "guarantees asymptotic recovery". - III-J "behavior is close to discrete" -> "practice is clustered". - IV-D.2 pivot sentence "discreteness of individual behavior yields bimodality" -> "aggregation over signatures reveals clustered (though not sharply discrete) patterns". Target journal remains IEEE Access. Output: Paper_A_IEEE_Access_Draft_v3.docx (395 KB). Codex v3.2 review saved to paper/codex_review_gpt54_v3_2.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9d19ca5a31 |
Paper A v3.1: apply codex peer-review fixes + add Scripts 20/21
Major fixes per codex (gpt-5.4) review: ## Structural fixes - Fixed three-method convergence overclaim: added Script 20 to run KDE antimode, BD/McCrary, and Beta mixture EM on accountant-level means. Accountant-level 1D convergence: KDE antimode=0.973, Beta-2=0.979, LogGMM-2=0.976 (within ~0.006). BD/McCrary finds no transition at accountant level (consistent with smooth clustering, not sharp discontinuity). - Disambiguated Method 1: KDE crossover (between two labeled distributions, used at signature all-pairs level) vs KDE antimode (single-distribution local minimum, used at accountant level). - Addressed Firm A circular validation: Script 21 adds CPA-level 70/30 held-out fold. Calibration thresholds derived from 70% only; heldout rates reported with Wilson 95% CIs (e.g. cos>0.95 heldout=93.61% [93.21%-93.98%]). - Fixed 139+32 vs 180: the split is 139/32 of 171 Firm A CPAs with >=10 signatures (9 CPAs excluded for insufficient sample). Reconciled across intro, results, discussion, conclusion. - Added document-level classification aggregation rule (worst-case signature label determines document label). ## Pixel-identity validation strengthened - Script 21: built ~50,000-pair inter-CPA random negative anchor (replaces the original n=35 same-CPA low-similarity negative which had untenable Wilson CIs). - Added Wilson 95% CI for every FAR in Table X. - Proper EER interpolation (FAR=FRR point) in Table X. - Softened "conservative recall" claim to "non-generalizable subset" language per codex feedback (byte-identical positives are a subset, not a representative positive class). - Added inter-CPA stats: mean=0.762, P95=0.884, P99=0.913. ## Terminology & sentence-level fixes - "statistically independent methods" -> "methodologically distinct methods" throughout (three diagnostics on the same sample are not independent). - "formal bimodality check" -> "unimodality test" (dip test tests H0 of unimodality; rejection is consistent with but not a direct test of bimodality). - "Firm A near-universally non-hand-signed" -> already corrected to "replication-dominated" in prior commit; this commit strengthens that framing with explicit held-out validation. - "discrete-behavior regimes" -> "clustered accountant-level heterogeneity" (BD/McCrary non-transition at accountant level rules out sharp discrete boundaries; the defensible claim is clustered-but-smooth). - Softened White 1982 quasi-MLE claim (no longer framed as a guarantee). - Fixed VLM 1.2% FP overclaim (now acknowledges the 1.2% could be VLM FP or YOLO FN). - Unified "310 byte-identical signatures" language across Abstract, Results, Discussion (previously alternated between pairs/signatures). - Defined min_dhash_independent explicitly in Section III-G. - Fixed table numbering (Table XI heldout added, classification moved to XII, ablation to XIII). - Explained 84,386 vs 85,042 gap (656 docs have only one signature, no pairwise stat). - Made Table IX explicitly a "consistency check" not "validation"; paired it with Table XI held-out rates as the genuine external check. - Defined 0.941 threshold (calibration-fold Firm A cosine P5). - Computed 0.945 Firm A rate exactly (94.52%) instead of interpolated. - Fixed Ref [24] Qwen2.5-VL to full IEEE format (arXiv:2502.13923). ## New artifacts - Script 20: accountant-level three-method threshold analysis - Script 21: expanded validation (inter-CPA anchor, held-out Firm A 70/30) - paper/codex_review_gpt54_v3.md: preserved review feedback Output: Paper_A_IEEE_Access_Draft_v3.docx (391 KB, rebuilt from v3.1 markdown sources). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9b11f03548 |
Paper A v3: full rewrite for IEEE Access with three-method convergence
Major changes from v2: Terminology: - "digitally replicated" -> "non-hand-signed" throughout (per partner v3 feedback and to avoid implicit accusation) - "Firm A near-universal non-hand-signing" -> "replication-dominated" (per interview nuance: most but not all Firm A partners use replication) Target journal: IEEE TAI -> IEEE Access (per NCKU CSIE list) New methodological sections (III.G-III.L + IV.D-IV.G): - Three convergent threshold methods (KDE antimode + Hartigan dip test / Burgstahler-Dichev McCrary / EM-fitted Beta mixture + logit-GMM robustness check) - Explicit unit-of-analysis discussion (signature vs accountant) - Accountant-level 2D Gaussian mixture (BIC-best K=3 found empirically) - Pixel-identity validation anchor (no manual annotation needed) - Low-similarity negative anchor + Firm A replication-dominated anchor New empirical findings integrated: - Firm A signature cosine UNIMODAL (dip p=0.17) - long left tail = minority hand-signers - Full-sample cosine MULTIMODAL but not cleanly bimodal (BIC prefers 3-comp mixture) - signature-level is continuous quality spectrum - Accountant-level mixture trimodal (C1 Deloitte-heavy 139/141, C2 other Big-4, C3 smaller firms). 2-comp crossings cos=0.945, dh=8.10 - Pixel-identity anchor (310 pairs) gives perfect recall at all cosine thresholds - Firm A anchor rates: cos>0.95=92.5%, dual-rule cos>0.95 AND dh<=8=89.95% New discussion section V.B: "Continuous-quality spectrum vs discrete- behavior regimes" - the core interpretive contribution of v3. References added: Hartigan & Hartigan 1985, Burgstahler & Dichev 1997, McCrary 2008, Dempster-Laird-Rubin 1977, White 1982 (refs 37-41). export_v3.py builds Paper_A_IEEE_Access_Draft_v3.docx (462 KB, +40% vs v2 from expanded methodology + results sections). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |