diff --git a/paper/Paper_A_IEEE_Access_Draft_v3.docx b/paper/Paper_A_IEEE_Access_Draft_v3.docx index bb8067f..9605c47 100644 Binary files a/paper/Paper_A_IEEE_Access_Draft_v3.docx and b/paper/Paper_A_IEEE_Access_Draft_v3.docx differ diff --git a/paper/codex_partner_redpen_audit_v3_19_0.md b/paper/codex_partner_redpen_audit_v3_19_0.md new file mode 100644 index 0000000..3ad4abf --- /dev/null +++ b/paper/codex_partner_redpen_audit_v3_19_0.md @@ -0,0 +1,43 @@ +# Codex Partner Red-Pen Regression Audit (Paper A v3.19.0) + +Scope: focused regression audit of whether the authors' partner red-pen comments on v3.17 have been adequately addressed in the current v3.19.0 manuscript files under `paper/`. This is not a fresh peer review. + +## 1. Overall summary + +For the 11 lettered red-pen items (a-k), my independent count is **7 RESOLVED / 1 IMPROVED / 0 PARTIAL / 0 UNRESOLVED / 3 N/A**. The two broader theme-level issues are **Citation reality: RESOLVED** and **ZH/EN alignment: N/A**. + +My bottom-line assessment is close to Gemini's: the revision substantially addresses the partner's concerns by deleting the most confusing accountant-level GMM / accountant-level BD-McCrary material and by replacing several AI-sounding explanations with more literal, auditable prose. I do not agree with Gemini's fully clean "8 RESOLVED / 3 N/A" verdict, however. The BIC / strict-3-component item is materially improved, but the manuscript still retains "upper bound" wording in the methods and Table VI even though the results correctly call the two-component fit a forced fit. That is a small prose/rationale residue, not a blocking unresolved issue. + +## 2. Item-by-item table + +| Item | Status | Manuscript section addressing it | Brief justification | Disagreement with Gemini audit | +|---|---:|---|---|---| +| Theme 1: Citation reality for refs [5], [16], [21], [22], [25], [27], [37]-[41] | RESOLVED | `paper_a_references_v3.md`; `reference_verification_v3.md` | The current reference list fixes the serious [5] author/title error and includes real, recognizable method references for Hartigan, Burgstahler-Dichev, McCrary, Dempster-Laird-Rubin, and White. The flagged technical references are not hallucinated. Minor citation-polish items from the verification file appear fixed in the current reference list. | No substantive disagreement. One housekeeping note: `reference_verification_v3.md` still describes [5] as a "major problem" in the detailed findings/recommendations because it records the audit history; the actual current reference list is fixed. | +| Theme 3: ZH/EN alignment gap at end of III-H Calibration Reference | N/A | Entire v3.19.0 manuscript | The dual-language zh-TW/en scaffold that produced the partner's "no English alongside?" concern is gone. The current draft is monolingual English for IEEE submission, so there is no remaining bilingual alignment task. | No disagreement. | +| (a) A1 stipulation, "do not understand your description" | RESOLVED | Section III-G, `paper_a_methodology_v3.md` | A1 is now stated as a specific cross-year pair-existence assumption: if replication occurs, at least one same-CPA near-identical pair exists in the observed same-CPA pool. The text also states when A1 may fail. This is much clearer than a vague stipulation. | No disagreement. | +| (h) A1 pair-detectability paragraph red-circled | RESOLVED | Section III-G, `paper_a_methodology_v3.md` | The red-circled assumption is now bounded: it is plausible for high-volume stamping/e-signing, not guaranteed under singletons, multiple templates, or scan noise, and not a within-year uniformity claim. That should answer the partner's concern about over-assumption. | No disagreement. | +| (b) Conservative structural-similarity wording, "a bit roundabout?" | RESOLVED | Section III-G, `paper_a_methodology_v3.md` | The independent-minimum dHash is now defined directly as the minimum Hamming distance to any same-CPA signature and identified as the statistic used in the classifier and capture-rate analyses. The wording is concise enough for re-read. | No disagreement. | +| (c) IV-G validation lead-in, "do not understand why you say this" | RESOLVED | Section IV-G, `paper_a_results_v3.md` | The lead-in now explicitly says Section IV-E capture rates are internally circular because Firm A helped set the thresholds, then explains why the three IV-G analyses are threshold-free or threshold-robust. This directly supplies the missing rationale. | No disagreement. | +| (d) BD/McCrary at accountant level, "cannot understand" | N/A | Removed from current structure | The accountant-level BD/McCrary analysis no longer appears in the live v3.19.0 manuscript. BD/McCrary is now signature-level only and framed as a density-smoothness diagnostic, not an accountant-level threshold device. | No disagreement. | +| (k) Accountant-level aggregation rationale, "why accountant level total, because component?" | N/A | Removed from current structure | The confusing accountant-level component narrative has been deleted. The paper now avoids translating signature-level outputs into accountant-level mechanism assignments except for auditor-year ranking. | No disagreement. | +| (e) 92.6% match rate, "do not understand improvement angle" | RESOLVED | Section III-D, `paper_a_methodology_v3.md`; Table III in Section IV-B | The match rate is now a data-processing coverage metric: 168,755 of 182,328 signatures are CPA-matched, and the unmatched 7.4% are excluded because same-CPA best-match statistics are undefined. The old "improvement" angle is gone. | No disagreement. | +| (f) 0.95 cosine cutoff, "cut-off corresponds to what?" | RESOLVED | Section III-K, `paper_a_methodology_v3.md`; Sections IV-E/F | The text now states that 0.95 corresponds to the whole-sample Firm A P7.5 heuristic: 92.5% of Firm A signatures exceed it and 7.5% fall at or below it. It also distinguishes 0.95 from the calibration-fold P5 = 0.9407 and rounded 0.945 sensitivity cut. | No disagreement. | +| (g) 139/32 C1/C2 split, "too reliant on weighting factor?" | N/A | Removed from current structure | The C1/C2 accountant-level GMM cluster split is gone from the current manuscript. Residual fold-variance wording no longer invokes the 139/32 split. | No disagreement. | +| (i) Hartigan rejection-as-bimodality, "so why?" | RESOLVED | Section III-I.1, `paper_a_methodology_v3.md`; Section IV-D.1 | The text now separates the dip test from component counting: it tests unimodality, does not specify a component count, and is used to decide whether a KDE antimode is meaningful. Section IV-D then explains why Firm A's non-rejection and all-CPA rejection matter. | No disagreement. | +| (j) BIC strict-3-component upper-bound framing, red-circled paragraph | IMPROVED | Section III-I.2/III-I.4, `paper_a_methodology_v3.md`; Section IV-D.3/IV-D.4, `paper_a_results_v3.md` | The results section is much clearer: it labels the 2-component Beta mixture as "A Forced Fit," reports the 3-component BIC preference, and says the Beta/logit disagreement reflects unsupported parametric structure. However, the methods still say the 2-component crossing "should be treated as an upper bound," and Table VI labels one row as "signature-level Beta/KDE upper bound." That residual wording may still prompt "upper bound of what?" from the partner. | I disagree with Gemini's RESOLVED verdict here. The item is not unresolved, but it is only IMPROVED until "upper bound" is either defined in one plain sentence or removed in favor of "forced-fit descriptive reference." | + +## 3. Specific pushback on Gemini's RESOLVED verdict + +Only item **(j)** needs pushback. + +Gemini says the BIC issue is resolved because the results now title the subsection "A Forced Fit" and state that the 2-component structure is not supported. That is true for Section IV-D.3, but not the whole manuscript. Section III-I.2 still says that when BIC prefers three components, "the 2-component crossing should be treated as an upper bound rather than a definitive cut." Section III-I.4 repeats that the 2-component crossing is a forced fit and "should be read as an upper bound," and Table VI contains "signature-level Beta/KDE upper bound." + +For a statistically trained reviewer, this may be defensible shorthand. For the partner's original red-pen concern, it is still slightly too abstract. If the authors keep "upper bound," they should define the bound explicitly. Otherwise the safer fix is to remove the term and call these values "forced-fit descriptive references not used operationally." + +## 4. Smallest residual set before partner re-read + +1. Replace or explain the remaining **"upper bound"** wording in Section III-I.2, Section III-I.4, and Table VI. Suggested direction: "Because the two-component assumption is not supported, we report the crossing only as a forced-fit descriptive reference and do not use it as an operational threshold." + +2. Optional housekeeping: update `reference_verification_v3.md` so its detailed [5] entry no longer reads like an active problem after the reference list has been corrected. This is not a manuscript blocker, but it avoids confusion if the partner or a coauthor opens the verification note. + +No other partner red-pen issue appears to need substantive revision before re-read. diff --git a/paper/paper_a_methodology_v3.md b/paper/paper_a_methodology_v3.md index 1898962..dd9601b 100644 --- a/paper/paper_a_methodology_v3.md +++ b/paper/paper_a_methodology_v3.md @@ -208,7 +208,7 @@ As a robustness check against the Beta parametric form we fit a parallel two-com White's [41] quasi-MLE consistency result justifies interpreting the logit-Gaussian estimates as asymptotic approximations to the best Gaussian-family fit under misspecification; we use the cross-check between Beta and logit-Gaussian crossings as a diagnostic of parametric-form sensitivity rather than as a guarantee of distributional recovery. We fit 2- and 3-component variants of each mixture and report BIC for model selection. -When BIC prefers the 3-component fit, the 2-component assumption itself is a forced fit, and the Bayes-optimal threshold derived from the 2-component crossing should be treated as an upper bound rather than a definitive cut. +When BIC prefers the 3-component fit, the 2-component assumption itself is a forced fit; we report the resulting crossing only as a forced-fit descriptive reference and do not use it as an operational threshold. ### 3) Density-Smoothness Diagnostic: Burgstahler-Dichev / McCrary @@ -228,7 +228,7 @@ The two threshold estimators rest on decreasing-in-strength assumptions: the KDE If the two estimated thresholds were to differ by less than a practically meaningful margin and the BD/McCrary procedure were to identify a sharp transition at the same level, that pattern would constitute convergent evidence for a clean two-mechanism boundary at that location. This is *not* the pattern we observe at the per-signature level. -The two threshold estimators yield crossings spread across a wide range (Section IV-D); the BIC clearly prefers a 3-component over a 2-component Beta fit, indicating that the 2-component crossing is a forced fit and should be read as an upper bound rather than a definitive cut; and the BD/McCrary procedure locates its candidate transition *inside* the non-hand-signed mode rather than between modes (Appendix A). +The two threshold estimators yield crossings spread across a wide range (Section IV-D); the BIC clearly prefers a 3-component over a 2-component Beta fit, indicating that the 2-component crossing is a forced fit reported only as a descriptive reference rather than as an operational threshold; and the BD/McCrary procedure locates its candidate transition *inside* the non-hand-signed mode rather than between modes (Appendix A). We interpret this jointly as evidence that per-signature similarity is a continuous quality spectrum rather than a clean two-mechanism mixture, and we accordingly anchor the operational classifier's cosine cut on whole-sample Firm A percentile heuristics (Section III-K) rather than on a mixture-fit crossing. ## J. Pixel-Identity, Inter-CPA, and Held-Out Firm A Validation (No Manual Annotation) diff --git a/paper/paper_a_results_v3.md b/paper/paper_a_results_v3.md index 0728e84..4fefd83 100644 --- a/paper/paper_a_results_v3.md +++ b/paper/paper_a_results_v3.md @@ -163,7 +163,7 @@ We do not report an Equal Error Rate: EER is meaningful only when the positive a | 0.900 | 0.0250 | [0.0237, 0.0264] | | 0.945 (calibration-fold P5 rounded) | 0.0008 | [0.0006, 0.0011] | | 0.950 (whole-sample Firm A P7.5; operational cut) | 0.0005 | [0.0003, 0.0007] | -| 0.973 (signature-level Beta/KDE upper bound) | 0.0002 | [0.0001, 0.0004] | +| 0.973 (signature-level Beta/KDE forced-fit reference) | 0.0002 | [0.0001, 0.0004] | | 0.979 (signature-level Beta-2 forced-fit crossing) | 0.0001 | [0.0001, 0.0003] | Table note: We do not include FRR against the byte-identical positive anchor as a column here: the byte-identical subset has cosine $\approx 1$ by construction, so FRR against that subset is trivially $0$ at every threshold below $1$ and carries no biometric information beyond verifying that the threshold does not exceed $1$. The conservative-subset FRR role of the byte-identical anchor is instead discussed qualitatively in Section V-F. diff --git a/paper/reference_verification_v3.md b/paper/reference_verification_v3.md index 61cdae4..4b965a4 100644 --- a/paper/reference_verification_v3.md +++ b/paper/reference_verification_v3.md @@ -1,14 +1,17 @@ # Reference Verification — Paper A v3 (41 refs) -Date: 2026-04-27 +Date: 2026-04-27 (initial audit); v3.18 reference list updated to incorporate every fix recorded below. + Method: WebSearch + WebFetch verification of each citation against authoritative sources (publisher pages, DOIs, arXiv, IEEE Xplore, Project Euclid, etc.). -## Summary -- Verified correct: 35/41 -- Minor discrepancies (typos, page numbers, year on early-access vs. issue): 5/41 -- MAJOR PROBLEMS (does not exist, wrong author, wrong title, wrong venue): 1/41 +## Summary (audit history) +- Verified correct on first audit: 35/41 +- Minor discrepancies (typos, page numbers, year on early-access vs. issue): 5/41 — all fixed in v3.18 +- MAJOR PROBLEMS (wrong author): 1/41 — `[5]` Hadjadj et al. → Kao and Wen, fixed in v3.18 -The single major problem is **[5]**, where the paper at the cited venue/article number is real, but the cited authors ("Hadjadj et al.") are wrong — the actual authors are Kao and Wen. None of the statistical-method refs [37]–[41] flagged by the partner are fabricated; all five are bibliographically correct. +The current `paper_a_references_v3.md` reflects every correction listed below. The detailed findings are retained as an audit trail; the live reference list no longer carries any of the recorded errors. + +The single major problem at the time of the audit was **[5]**, where the paper at the cited venue/article number is real, but the cited authors ("Hadjadj et al.") were wrong — the actual authors are Kao and Wen. None of the statistical-method refs [37]–[41] flagged by the partner are fabricated; all five are bibliographically correct. ## Detailed findings