Paper A v3.6: codex round-5 quick-wins cleanup (Minor Revision)

Codex gpt-5.4 round-5 (codex_review_gpt54_v3_5.md) verdict was Minor
Revision - all v3.4 round-4 PARTIAL/UNFIXED items now confirmed
RESOLVED, including line-by-line recomputation of Table XI z/p
matching the manuscript values. This commit cleans the remaining
quick-win items:

Table IX numerical sync to Script 24 authoritative values
- Five count corrections: cos>0.837 (60,405->60,408), cos>0.945
  (57,131/94.52% -> 56,836/94.02%, was 295 sigs / 0.50 pp off),
  cos>0.973 (48,910/80.91% -> 48,028/79.45%, was 882 sigs / 1.46 pp
  off), cos>0.95 (55,916->55,922), dh<=8 (57,521->57,527),
  dh<=15 (60,345->60,348), dual (54,373->54,370).
- Threshold label cos>0.941 -> cos>0.9407 (use exact calib-fold P5
  rather than rounded value).
- "dHash_indep <= 5 (calib-fold median-adjacent)" relabeled to
  "(whole-sample upper-tail of mode)" to match what III-L explains.
- Added "(operational dual)" / "(style-consistency boundary)" labels
  for unambiguous mapping into III-L category definitions.
- Removed circularity-language footnote inside the table comment.

Circularity overclaim removed paper-wide
- Methodology III-K (Section 3 anchor): "we break the resulting
  circularity" -> "we make the within-Firm-A sampling variance
  visible".
- Results IV-G.2 subsection title: "(breaks calibration-validation
  circularity)" -> "(within-Firm-A sampling variance disclosure)".
- Combined with the v3.5 Abstract / Conclusion edits, no surviving
  use of circular* anywhere in the paper.

export_v3.py title page now single-anonymized
- Removed "[Authors removed for double-blind review]" placeholder
  (IEEE Access uses single-anonymized review).
- Replaced with explicit "[AUTHOR NAMES - fill in before submission]"
  + affiliation placeholder so the requirement is unmissable.
- Subtitle now reads "single-anonymized review".

III-G stale "cosine-conditional dHash" sentence removed
- After the v3.5 III-L rewrite to dh_indep, the sentence at
  Methodology L131 referencing "cosine-conditional dHash used as a
  diagnostic elsewhere" no longer described any current paper usage.
- Replaced with a positive statement that dh_indep is the dHash
  statistic used throughout the operational classifier and all
  reported capture-rate analyses.

Abstract trimmed 247 -> 242 words for IEEE 250-word safety margin
- "an end-to-end pipeline" -> "a pipeline"; "Unlike signature
  forgery" -> "Unlike forgery"; "we report" passive recast; small
  conjunction trims.

Outstanding items deferred (require user decision / larger scope):
- BD/McCrary either substantiate (Z/p table + bin-width robustness)
  or demote to supplementary diagnostic.
- Visual-inspection protocol disclosure (sample size, rater count,
  blinding, adjudication rule).
- Reproducibility appendix (VLM prompt, HSV thresholds, seeds, EM
  init / stopping / boundary handling).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-21 12:41:11 +08:00
parent 12f716ddf1
commit 6946baa096
5 changed files with 193 additions and 19 deletions
+12 -13
View File
@@ -155,20 +155,19 @@ Fig. 3 presents the per-signature cosine and dHash distributions of Firm A compa
Table IX reports the proportion of Firm A signatures crossing each candidate threshold; these rates play the role of calibration-validation metrics (what fraction of a known replication-dominated population does each threshold capture?).
<!-- TABLE IX: Firm A Whole-Sample Capture Rates (consistency check, NOT external validation)
| Rule | Firm A rate | n / N |
| Rule | Firm A rate | k / N |
|------|-------------|-------|
| cosine > 0.837 (all-pairs KDE crossover) | 99.93% | 60,405 / 60,448 |
| cosine > 0.941 (calibration-fold P5) | 95.08% | 57,473 / 60,448 |
| cosine > 0.945 (2D GMM marginal crossing) | 94.52% | 57,131 / 60,448 |
| cosine > 0.95 | 92.51% | 55,916 / 60,448 |
| cosine > 0.973 (accountant KDE antimode) | 80.91% | 48,910 / 60,448 |
| dHash_indep ≤ 5 (calib-fold median-adjacent) | 84.20% | 50,897 / 60,448 |
| dHash_indep ≤ 8 | 95.17% | 57,521 / 60,448 |
| dHash_indep ≤ 15 | 99.83% | 60,345 / 60,448 |
| cosine > 0.95 AND dHash_indep ≤ 8 | 89.95% | 54,373 / 60,448 |
| cosine > 0.837 (all-pairs KDE crossover) | 99.93% | 60,408 / 60,448 |
| cosine > 0.9407 (calibration-fold P5) | 95.15% | 57,518 / 60,448 |
| cosine > 0.945 (2D GMM marginal crossing) | 94.02% | 56,836 / 60,448 |
| cosine > 0.95 | 92.51% | 55,922 / 60,448 |
| cosine > 0.973 (accountant-level KDE antimode) | 79.45% | 48,028 / 60,448 |
| dHash_indep ≤ 5 (whole-sample upper-tail of mode) | 84.20% | 50,897 / 60,448 |
| dHash_indep ≤ 8 | 95.17% | 57,527 / 60,448 |
| dHash_indep ≤ 15 (style-consistency boundary) | 99.83% | 60,348 / 60,448 |
| cosine > 0.95 AND dHash_indep ≤ 8 (operational dual) | 89.95% | 54,370 / 60,448 |
All rates computed exactly from the full Firm A sample (N = 60,448 signatures).
The threshold 0.941 corresponds to the 5th percentile of the calibration-fold Firm A cosine distribution (see Section IV-G for the held-out validation that addresses the circularity inherent in this whole-sample table).
All rates computed exactly from the full Firm A sample (N = 60,448 signatures); counts reproduce from `signature_analysis/24_validation_recalibration.py` (whole_firm_a section).
-->
Table IX is a whole-sample consistency check rather than an external validation: the thresholds 0.95, dHash median, and dHash 95th percentile are themselves anchored to Firm A via the calibration described in Section III-H.
@@ -204,7 +203,7 @@ Zero FRR against this subset does not establish zero FRR against the broader pos
Second, the 0.945 / 0.95 / 0.973 thresholds are derived from the Firm A calibration fold or the accountant-level methods rather than from this anchor set, so the FAR values in Table X are post-hoc-fit-free evaluations of thresholds that were not chosen to optimize Table X.
The very low FAR at the accountant-level thresholds is therefore informative about specificity against a realistic inter-CPA negative population.
### 2) Held-Out Firm A Validation (breaks calibration-validation circularity)
### 2) Held-Out Firm A Validation (within-Firm-A sampling variance disclosure)
We split Firm A CPAs randomly 70 / 30 at the CPA level into a calibration fold (124 CPAs, 45,116 signatures) and a held-out fold (54 CPAs, 15,332 signatures).
The total of 178 Firm A CPAs differs from the 180 in the Firm A registry by two CPAs whose signatures could not be matched to a single assigned-accountant record because of disambiguation ties in the CPA registry and which we therefore exclude from both folds; this handling is made explicit here and has no effect on the accountant-level mixture analysis of Section IV-E, which uses the $\geq 10$-signature subset of 171 CPAs.