Paper A v3.18.4: address codex GPT-5.5 round-18 self-comparing review findings

Codex round-18 (paper/codex_review_gpt55_v3_18_3.md) caught a falsified provenance claim I introduced in v3.18.3 plus four cleaner narrative items that survived the prior 17 rounds. Verdict was Minor Revision; this commit closes all 5 actionable items.

- Harmonize signature_analysis/28_byte_identity_decomposition.py to use accountants.firm (joined on signatures.assigned_accountant) for Firm A membership, matching the convention in 24_validation_recalibration.py. Regenerated reports/byte_identity_decomp/byte_identity_decomposition.json. Cross-firm convergence now reports Firm A 49,389 / 55,922 = 88.32% and Non-Firm-A 27,595 / 65,514 = 42.12% (percentages unchanged at two decimal places; counts now match Table IX exactly).

- Replace the Section IV-H.2 reconciliation note. The previous note speculated that the one-record discrepancy was a snapshot/floating-point artifact, which codex round-18 falsified by direct DB queries: the real cause was that script 28 used signatures.excel_firm while Table IX uses accountants.firm. With script 28 now harmonized, Table IX and the cross-firm artifact agree exactly at 55,922; the new note documents the Firm A grouping convention plus the dHash-non-null filter.

- Replace residual "known-majority-positive" wording with "replication-dominated" in Introduction (contributions 4 and 6) and Methodology III-I (anchor-rationale paragraph).

- Correct Methodology III-G's auditor-year description: the per-signature best-match cosine that feeds each auditor-year mean is computed against the full same-CPA cross-year pool, not within-year only. The aggregation unit is within-year, but the underlying similarity statistic is not.

- Add the 145 / 50 / 180 / 35 Firm A byte-decomposition sentence to Results IV-F.1 with explicit pointer to script 28 and the JSON artifact; this resolves the round-18 finding that several manuscript locations cited IV-F.1 for a decomposition that was not actually reported there.

- Rebuild Paper_A_IEEE_Access_Draft_v3.docx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
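
The grouping mismatch described in the message can be illustrated with a minimal sketch. It is not the code of script 28 or script 24: the in-memory schema, the `accountants.name` join key, the `dhash` column name, and the four toy rows are all assumptions made for illustration. Only `signatures.assigned_accountant`, `signatures.excel_firm`, `accountants.firm`, and the reported counts come from the commit message.

```python
import sqlite3


def build_toy_db():
    """Build a hypothetical in-memory schema; the real tables may differ."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accountants (name TEXT PRIMARY KEY, firm TEXT);
        CREATE TABLE signatures (
            id INTEGER PRIMARY KEY,
            assigned_accountant TEXT,
            excel_firm TEXT,
            dhash TEXT
        );
        INSERT INTO accountants VALUES ('cpa1', 'Firm A'), ('cpa2', 'Firm B');
        INSERT INTO signatures VALUES
            (1, 'cpa1', 'Firm A', 'abc'),
            (2, 'cpa1', 'Firm B', 'def'),  -- excel_firm disagrees with accountants.firm
            (3, 'cpa2', 'Firm B', 'ghi'),
            (4, 'cpa1', 'Firm A', NULL);   -- dropped by the dHash-non-null filter
        """
    )
    return conn


def firm_a_counts(conn):
    """Count Firm A signatures under both grouping conventions."""
    by_accountant = conn.execute(
        """
        SELECT COUNT(*)
        FROM signatures AS s
        JOIN accountants AS a ON a.name = s.assigned_accountant
        WHERE a.firm = 'Firm A' AND s.dhash IS NOT NULL
        """
    ).fetchone()[0]
    by_excel = conn.execute(
        """
        SELECT COUNT(*)
        FROM signatures
        WHERE excel_firm = 'Firm A' AND dhash IS NOT NULL
        """
    ).fetchone()[0]
    return by_accountant, by_excel


def pct(numerator, denominator):
    """Percentage rounded to two decimal places, as reported in the message."""
    return round(100.0 * numerator / denominator, 2)


counts = firm_a_counts(build_toy_db())
print(counts)                # the two conventions disagree on the toy rows
print(pct(49389, 55922))     # Firm A share reported in the message
print(pct(27595, 65514))     # Non-Firm-A share reported in the message
```

On the toy rows the two conventions disagree by one record, which is the kind of discrepancy the reconciliation note now attributes to the grouping column rather than to a snapshot or floating-point artifact.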
2026-04-27 20:59:07 +08:00
parent 26b934c429
commit 6b64eabbfb
5 changed files with 21 additions and 14 deletions
@@ -70,11 +70,11 @@ The contributions of this paper are summarized as follows:

 3. **Dual-descriptor verification.** We demonstrate that combining deep-feature cosine similarity with perceptual hashing resolves the fundamental ambiguity between style consistency and image reproduction, and we validate the backbone choice through an ablation study comparing three feature-extraction architectures.

-4. **Percentile-anchored operational threshold.** We anchor the operational classifier's cosine cut on the whole-sample Firm A P7.5 percentile (cos $> 0.95$), a transparent and reproducible reference drawn from a known-majority-positive population, and complement it with dHash structural cuts derived from the same reference distribution. Operational thresholds are therefore explained by an empirical reference rather than asserted.
+4. **Percentile-anchored operational threshold.** We anchor the operational classifier's cosine cut on the whole-sample Firm A P7.5 percentile (cos $> 0.95$), a transparent and reproducible reference drawn from a replication-dominated reference population, and complement it with dHash structural cuts derived from the same reference distribution. Operational thresholds are therefore explained by an empirical reference rather than asserted.

 5. **Distributional characterisation of per-signature similarity.** We apply three statistical diagnostics---a Hartigan dip test, an EM-fitted Beta mixture with logit-Gaussian robustness check, and a Burgstahler-Dichev / McCrary density-smoothness procedure---to characterise the shape of the per-signature similarity distribution. The three diagnostics jointly find that per-signature similarity forms a continuous quality spectrum, which both motivates the percentile-based operational anchor over a mixture-fit crossing and is itself a substantive finding for the document-forensics literature on similarity-threshold selection.

-6. **Replication-dominated calibration methodology.** We introduce a calibration strategy using a known-majority-positive reference group, distinguishing *replication-dominated* from *replication-pure* anchors; and we validate classification using byte-level pixel identity as an annotation-free gold positive, requiring no manual labeling.
+6. **Replication-dominated calibration methodology.** We introduce a calibration strategy using a replication-dominated reference group, distinguishing *replication-dominated* from *replication-pure* anchors; and we validate classification using byte-level pixel identity as an annotation-free gold positive, requiring no manual labeling.

 7. **Large-scale empirical analysis.** We report findings from the analysis of over 90,000 audit reports spanning a decade, providing the first large-scale empirical evidence on non-hand-signing practices in financial reporting under a methodology designed for peer-review defensibility.
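
The percentile-anchored cut in contribution 4 can be sketched as follows. This is a hedged illustration, not the paper's pipeline: the reference values are synthetic, the linear-interpolation percentile rule is an assumption, and the helper names `percentile` and `classify` are hypothetical. The paper's actual anchor is the whole-sample Firm A P7.5, reported as cos > 0.95.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence is undefined")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def classify(cosines, cut):
    """Flag each similarity that lies strictly above the anchored cut."""
    return [c > cut for c in cosines]


# Synthetic stand-in for the reference population's per-signature cosines.
reference = [0.90 + 0.001 * i for i in range(101)]

cut = percentile(reference, 7.5)   # anchor the cut at the reference P7.5
flags = classify(reference, cut)

print(round(cut, 4))
print(sum(flags), "of", len(flags), "reference values exceed the cut")
```

Anchoring on a low percentile of the reference group means that, by construction, roughly 92.5% of that group falls above the cut, which is why the threshold can be explained by the reference distribution rather than asserted.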