Paper A v3.6: codex round-5 quick-wins cleanup (Minor Revision)

Codex gpt-5.4 round-5 (codex_review_gpt54_v3_5.md) verdict was Minor Revision - all v3.4 round-4 PARTIAL/UNFIXED items now confirmed RESOLVED, including line-by-line recomputation of Table XI z/p matching the manuscript values. This commit cleans the remaining quick-win items: Table IX numerical sync to Script 24 authoritative values - Five count corrections: cos>0.837 (60,405->60,408), cos>0.945 (57,131/94.52% -> 56,836/94.02%, was 295 sigs / 0.50 pp off), cos>0.973 (48,910/80.91% -> 48,028/79.45%, was 882 sigs / 1.46 pp off), cos>0.95 (55,916->55,922), dh<=8 (57,521->57,527), dh<=15 (60,345->60,348), dual (54,373->54,370). - Threshold label cos>0.941 -> cos>0.9407 (use exact calib-fold P5 rather than rounded value). - "dHash_indep <= 5 (calib-fold median-adjacent)" relabeled to "(whole-sample upper-tail of mode)" to match what III-L explains. - Added "(operational dual)" / "(style-consistency boundary)" labels for unambiguous mapping into III-L category definitions. - Removed circularity-language footnote inside the table comment. Circularity overclaim removed paper-wide - Methodology III-K (Section 3 anchor): "we break the resulting circularity" -> "we make the within-Firm-A sampling variance visible". - Results IV-G.2 subsection title: "(breaks calibration-validation circularity)" -> "(within-Firm-A sampling variance disclosure)". - Combined with the v3.5 Abstract / Conclusion edits, no surviving use of circular* anywhere in the paper. export_v3.py title page now single-anonymized - Removed "[Authors removed for double-blind review]" placeholder (IEEE Access uses single-anonymized review). - Replaced with explicit "[AUTHOR NAMES - fill in before submission]" + affiliation placeholder so the requirement is unmissable. - Subtitle now reads "single-anonymized review". III-G stale "cosine-conditional dHash" sentence removed - After the v3.5 III-L rewrite to dh_indep, the sentence at Methodology L131 referencing "cosine-conditional dHash used as a diagnostic elsewhere" no longer described any current paper usage. - Replaced with a positive statement that dh_indep is the dHash statistic used throughout the operational classifier and all reported capture-rate analyses. Abstract trimmed 247 -> 242 words for IEEE 250-word safety margin - "an end-to-end pipeline" -> "a pipeline"; "Unlike signature forgery" -> "Unlike forgery"; "we report" passive recast; small conjunction trims. Outstanding items deferred (require user decision / larger scope): - BD/McCrary either substantiate (Z/p table + bin-width robustness) or demote to supplementary diagnostic. - Visual-inspection protocol disclosure (sample size, rater count, blinding, adjudication rule). - Reproducibility appendix (VLM prompt, HSV thresholds, seeds, EM init / stopping / boundary handling). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 12:41:11 +08:00
parent 12f716ddf1
commit 6946baa096
5 changed files with 193 additions and 19 deletions
@@ -155,20 +155,19 @@ Fig. 3 presents the per-signature cosine and dHash distributions of Firm A compa
 Table IX reports the proportion of Firm A signatures crossing each candidate threshold; these rates play the role of calibration-validation metrics (what fraction of a known replication-dominated population does each threshold capture?).

 <!-- TABLE IX: Firm A Whole-Sample Capture Rates (consistency check, NOT external validation)
-| Rule | Firm A rate | n / N |
+| Rule | Firm A rate | k / N |
 |------|-------------|-------|
-| cosine > 0.837 (all-pairs KDE crossover) | 99.93% | 60,405 / 60,448 |
-| cosine > 0.941 (calibration-fold P5) | 95.08% | 57,473 / 60,448 |
-| cosine > 0.945 (2D GMM marginal crossing) | 94.52% | 57,131 / 60,448 |
-| cosine > 0.95 | 92.51% | 55,916 / 60,448 |
-| cosine > 0.973 (accountant KDE antimode) | 80.91% | 48,910 / 60,448 |
-| dHash_indep ≤ 5 (calib-fold median-adjacent) | 84.20% | 50,897 / 60,448 |
-| dHash_indep ≤ 8 | 95.17% | 57,521 / 60,448 |
-| dHash_indep ≤ 15 | 99.83% | 60,345 / 60,448 |
-| cosine > 0.95 AND dHash_indep ≤ 8 | 89.95% | 54,373 / 60,448 |
+| cosine > 0.837 (all-pairs KDE crossover)              | 99.93% | 60,408 / 60,448 |
+| cosine > 0.9407 (calibration-fold P5)                 | 95.15% | 57,518 / 60,448 |
+| cosine > 0.945 (2D GMM marginal crossing)             | 94.02% | 56,836 / 60,448 |
+| cosine > 0.95                                         | 92.51% | 55,922 / 60,448 |
+| cosine > 0.973 (accountant-level KDE antimode)        | 79.45% | 48,028 / 60,448 |
+| dHash_indep ≤ 5 (whole-sample upper-tail of mode)     | 84.20% | 50,897 / 60,448 |
+| dHash_indep ≤ 8                                       | 95.17% | 57,527 / 60,448 |
+| dHash_indep ≤ 15 (style-consistency boundary)         | 99.83% | 60,348 / 60,448 |
+| cosine > 0.95 AND dHash_indep ≤ 8 (operational dual)  | 89.95% | 54,370 / 60,448 |

-All rates computed exactly from the full Firm A sample (N = 60,448 signatures).
-The threshold 0.941 corresponds to the 5th percentile of the calibration-fold Firm A cosine distribution (see Section IV-G for the held-out validation that addresses the circularity inherent in this whole-sample table).
+All rates computed exactly from the full Firm A sample (N = 60,448 signatures); counts reproduce from `signature_analysis/24_validation_recalibration.py` (whole_firm_a section).
 -->

 Table IX is a whole-sample consistency check rather than an external validation: the thresholds 0.95, dHash median, and dHash 95th percentile are themselves anchored to Firm A via the calibration described in Section III-H.
@@ -204,7 +203,7 @@ Zero FRR against this subset does not establish zero FRR against the broader pos
 Second, the 0.945 / 0.95 / 0.973 thresholds are derived from the Firm A calibration fold or the accountant-level methods rather than from this anchor set, so the FAR values in Table X are post-hoc-fit-free evaluations of thresholds that were not chosen to optimize Table X.
 The very low FAR at the accountant-level thresholds is therefore informative about specificity against a realistic inter-CPA negative population.

-### 2) Held-Out Firm A Validation (breaks calibration-validation circularity)
+### 2) Held-Out Firm A Validation (within-Firm-A sampling variance disclosure)

 We split Firm A CPAs randomly 70 / 30 at the CPA level into a calibration fold (124 CPAs, 45,116 signatures) and a held-out fold (54 CPAs, 15,332 signatures).
 The total of 178 Firm A CPAs differs from the 180 in the Firm A registry by two CPAs whose signatures could not be matched to a single assigned-accountant record because of disambiguation ties in the CPA registry and which we therefore exclude from both folds; this handling is made explicit here and has no effect on the accountant-level mixture analysis of Section IV-E, which uses the $\geq 10$-signature subset of 171 CPAs.