Paper A v3.18.4: address codex GPT-5.5 round-18 self-comparing review findings
Codex round-18 (paper/codex_review_gpt55_v3_18_3.md) caught a falsified provenance claim I introduced in v3.18.3 plus four cleaner narrative items that survived the prior 17 rounds. Verdict was Minor Revision; this commit closes all 5 actionable items. - Harmonize signature_analysis/28_byte_identity_decomposition.py to use accountants.firm (joined on signatures.assigned_accountant) for Firm A membership, matching the convention in 24_validation_recalibration.py. Regenerated reports/byte_identity_decomp/byte_identity_decomposition.json. Cross-firm convergence now reports Firm A 49,389 / 55,922 = 88.32% and Non-Firm-A 27,595 / 65,514 = 42.12% (percentages unchanged at two decimal places; counts now match Table IX exactly). - Replace the Section IV-H.2 reconciliation note. The previous note speculated that the one-record discrepancy was a snapshot/floating-point artifact, which codex round-18 falsified by direct DB queries: the real cause was that script 28 used signatures.excel_firm while Table IX uses accountants.firm. With script 28 now harmonized, Table IX and the cross-firm artifact agree exactly at 55,922; the new note documents the Firm A grouping convention plus the dHash-non-null filter. - Replace residual "known-majority-positive" wording with "replication-dominated" in Introduction (contributions 4 and 6) and Methodology III-I (anchor-rationale paragraph). - Correct Methodology III-G's auditor-year description: the per-signature best-match cosine that feeds each auditor-year mean is computed against the full same-CPA cross-year pool, not within-year only. The aggregation unit is within-year, but the underlying similarity statistic is not. - Add the 145 / 50 / 180 / 35 Firm A byte-decomposition sentence to Results IV-F.1 with explicit pointer to script 28 and the JSON artifact; this resolves the round-18 finding that several manuscript locations cited IV-F.1 for a decomposition that was not actually reported there. - Rebuild Paper_A_IEEE_Access_Draft_v3.docx. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -149,6 +149,7 @@ We report three validation analyses corresponding to the anchors of Section III-
|
||||
### 1) Pixel-Identity Positive Anchor with Inter-CPA Negative Anchor
|
||||
|
||||
Of the 182,328 extracted signatures, 310 have a same-CPA nearest match that is byte-identical after crop and normalization (pixel-identical-to-closest = 1); these form the byte-identity positive anchor---a pair-level proof of image reuse that serves as conservative ground truth for non-hand-signed signatures, subject to the source-template edge case discussed in Section V-G.
|
||||
Within Firm A specifically, 145 of these byte-identical signatures are distributed across 50 distinct partners (of 180 registered Firm A partners), with 35 of the byte-identical pairs spanning different fiscal years; this Firm A decomposition is reproduced by `signature_analysis/28_byte_identity_decomposition.py` and reported in `reports/byte_identity_decomp/byte_identity_decomposition.json` (Appendix B).
|
||||
As the gold-negative anchor we sample 50,000 random cross-CPA signature pairs (inter-CPA cosine: mean $= 0.762$, $P_{95} = 0.884$, $P_{99} = 0.913$, max $= 0.988$).
|
||||
Because the positive and negative anchor populations are constructed from different sampling units (byte-identical same-CPA pairs vs random inter-CPA pairs), their relative prevalence in the combined anchor set is arbitrary, and precision / $F_1$ / recall therefore have no meaningful population interpretation.
|
||||
We accordingly report FAR with Wilson 95% confidence intervals against the large inter-CPA negative anchor in Table X.
|
||||
@@ -370,9 +371,8 @@ We note that because the non-hand-signed thresholds are themselves calibrated to
|
||||
|
||||
### 2) Cross-Firm Comparison of Dual-Descriptor Convergence
|
||||
|
||||
Among the 65,515 non-Firm-A signatures with per-signature best-match cosine $> 0.95$, 42.12% have $\text{dHash}_\text{indep} \leq 5$, compared to 88.32% of the 55,921 Firm A signatures meeting the same cosine condition---a $\sim 2.1\times$ difference that the structural-verification layer makes visible.
|
||||
The Firm A denominator here (55,921) differs by a single signature from Table IX's cosine-only count (55,922) because the two artifacts were materialized from successive snapshots of the underlying database: Table IX is rendered from `validation_recalibration.json` produced earlier in the analysis pipeline, while the cross-firm decomposition is rendered from `byte_identity_decomposition.json` produced more recently after a downstream feature recomputation that shifted exactly one borderline Firm A signature from `cos > 0.95` to `cos = 0.95...` at floating-point precision.
|
||||
The one-record drift does not affect any reported rate to two decimal places; we retain both values to make the snapshot provenance explicit.
|
||||
Among the 65,514 non-Firm-A signatures with per-signature best-match cosine $> 0.95$, 42.12% have $\text{dHash}_\text{indep} \leq 5$, compared to 88.32% of the 55,922 Firm A signatures meeting the same cosine condition---a $\sim 2.1\times$ difference that the structural-verification layer makes visible.
|
||||
The Firm A denominator (55,922) matches Table IX exactly: both Table IX and the cross-firm decomposition define Firm A membership via the CPA registry (`accountants.firm`), and the cross-firm analysis additionally requires a non-null independent-min dHash record, which all 55,922 Firm A cosine-eligible signatures have in the current database.
|
||||
This cross-firm gap is consistent with firm-wide non-hand-signing practice at Firm A versus partner-specific or per-engagement replication at other firms; it complements the partner-level ranking (Section IV-G.2) and intra-report consistency (Section IV-G.3) findings.
|
||||
Counts and percentages are reproduced by `signature_analysis/28_byte_identity_decomposition.py` and reported in `reports/byte_identity_decomp/byte_identity_decomposition.json` (see Appendix B for the table-to-script provenance map).
|
||||
|
||||
|
||||
Reference in New Issue
Block a user