Add codex GPT-5.5 round-17 independent peer review artifact

paper/codex_review_gpt55_v3_18_2.md: 16.7 KB / 133 lines. Codex re-audited v3.18.2 against its own round-16 review and the live scripts/JSON. Verdict: Minor Revision (did not regress to Accept simply because v3.18.2 addressed the round-16 findings; instead caught three new issues introduced by the v3.18.2 edits themselves, including four fabricated JSON paths in Appendix B and residual "single dominant mechanism" phrasing not yet softened). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Paper A v3.18.3: address codex GPT-5.5 round-17 self-comparing review findings
2026-04-27 20:45:54 +08:00 · 2026-04-27 20:45:54 +08:00
7 changed files with 146 additions and 11 deletions
@@ -0,0 +1,133 @@
 # Independent Peer Review (Round 17) - Paper A v3.18.2
 Reviewer role: independent peer reviewer for IEEE Access Regular Paper.
 Manuscript reviewed: "Replication-Dominated Calibration" - CPA signature analysis, v3.18.2, commit `7990dab` on `yolo-signature-pipeline`.
 Audit basis: manuscript sections under `paper/`, scripts under `signature_analysis/`, prior round-16 review `paper/codex_review_gpt55_v3_18_1.md`, and generated reports under `/Volumes/NV2/PDF-Processing/signature-analysis/reports/`.
 ## 1. Overall Verdict: Minor Revision
 I recommend **Minor Revision**, not unconditional Accept.
 The v3.18.2 revision fixes the most important round-16 empirical problem: the cross-firm dual-descriptor convergence claim is no longer the erroneous `11.3%` vs `58.7%` / `5x` statement. The new script `signature_analysis/28_byte_identity_decomposition.py` and JSON artifact reproduce the corrected values: among signatures with cosine `> 0.95`, non-Firm-A has `27,596 / 65,515 = 42.12%` with `dHash_indep <= 5`, while Firm A has `49,388 / 55,921 = 88.32%`, a `~2.1x` gap. The byte-identity decomposition is also now reproducible: `145` Firm A byte-identical signatures, `50` distinct partners, `180` registered Firm A partners, and `35` cross-year matches.
 The revision also resolves most stale section references and improves provenance. However, I found three remaining issues that should be corrected before IEEE Access submission:
 1. The Appendix B provenance map overclaims: several mapped report artifacts do not exist at the stated paths in the available report tree.
 2. Some mechanism-identification language was softened in Results but remains too strong in Methodology and Discussion, especially "consistent with a single dominant mechanism."
 3. A few exact method/performance claims remain unverifiable from packaged artifacts, especially YOLO validation metrics, VLM prompt/settings, HSV thresholds, runtime, and some extraction/document-type details.
 These are Minor because they do not overturn the central empirical findings, but they affect reproducibility and narrative discipline.
 ## 2. Re-audit of Round-16 Findings
 | Round-16 finding | v3.18.2 status | Re-audit notes |
 |---|---|---|
 | Mechanism-identification overclaim from dip-test non-rejection | **PARTIAL** | Results IV-D.1 now correctly says Firm A "fails to reject unimodality." But Methodology III-H still says the distribution is "consistent with a single dominant mechanism (non-hand-signing)," and Discussion V-C says "consistent with a single dominant mechanism plus residual within-firm heterogeneity." A dip-test non-rejection plus left tail does not identify a single mechanism; the joint evidence supports a replication-dominated benchmark, not a mechanism count. |
 | Stale IV-F / IV-G references after retitling | **LARGELY RESOLVED** | I did not find the old round-16 pattern of IV-F references pointing to the new IV-G validation analyses. The current IV-F/IV-G references are mostly correct. Minor remaining issue: Introduction and conclusion still cite byte-identity as Section IV-F.1 although the detailed `145/50/180/35` decomposition itself is not reported in Section IV-F.1, only in III-H/V-C/Appendix B. |
 | Practitioner knowledge as load-bearing evidence | **PARTIAL** | III-H now explicitly says practitioner knowledge is "non-load-bearing," which is good. But Introduction still says Firm A is "widely recognized within the audit profession" and III-H says "widely held within the audit profession" without a citation or source. This is acceptable only as motivation; I would soften or cite. |
 | 0.941 / 0.945 / 0.9407 ambiguity | **RESOLVED** | III-K and IV-F.3 now correctly distinguish the operational 0.95 cut, the nearby rounded sensitivity cut 0.945, and calibration-fold P5 = 0.9407. |
 | Incorrect cross-firm dual-convergence claim | **RESOLVED** | The prior `11.3%` vs `58.7%` / `5x` claim is gone from current manuscript files. The replacement `42.12%` vs `88.32%` / `~2.1x` matches the new JSON artifact. |
 | Byte-identity decomposition was unverifiable | **RESOLVED with packaging caveat** | New script and JSON reproduce `145/50/180/35`. Caveat: the manuscript says reports are under the project's `reports/` tree, but the actual artifact I inspected is under `/Volumes/NV2/PDF-Processing/signature-analysis/reports/...`, not under this repo's `reports/` path. |
 | Legacy EER/FRR/Precision/F1 script comments | **RESOLVED enough** | Scripts 19 and 21 now label EER/FRR/Precision/F1 as legacy / diagnostic-only and state that the manuscript omits them. Some functions still emit those sections if run, but the conceptual warning is explicit. |
 ## 3. New Empirical-Claim Audit
 Status definitions: **VERIFIED** = matches script/report or arithmetic; **PARTIAL** = broadly supported but wording/provenance needs cleanup; **UNVERIFIABLE** = plausible but not traceable in the available artifacts; **SUSPICIOUS** = overphrased or internally inconsistent. I found no new fabricated core result.
 | Claim | Status | Audit basis / notes |
 |---|---|---|
 | 90,282 PDFs, 2013-2023, Taiwan | VERIFIED | Consistent across manuscript. Raw scraping log not audited. |
 | 86,072 VLM-positive documents; 12 corrupted PDFs; final 86,071 | VERIFIED | Internally consistent in III-C. |
 | 182,328 extracted signatures; 168,755 CPA-matched; 13,573 unmatched | VERIFIED | Matches manuscript counts and downstream `168,740` after singleton exclusion. |
 | 758 CPAs, >50 firms, 15 document types, 86.4% standard audit reports | PARTIAL | 758/>50 are stable manuscript counts. I did not find a direct packaged JSON for 15 document types / 86.4%. |
 | Qwen2.5-VL 32B, 180 DPI, first-quartile scan, temperature 0 | UNVERIFIABLE | Method claim not contradicted, but prompt/config/log artifact not inspected. |
 | YOLO 500 annotated pages, 425/75 split, 100 epochs | PARTIAL | Method is clear; no training log audited. |
 | YOLO precision 0.97-0.98, recall 0.95-0.98, mAP metrics | UNVERIFIABLE | Table II remains unsupported by a visible training-results artifact. |
 | 43.1 docs/sec with 8 workers | UNVERIFIABLE | Runtime claim still lacks a visible timing log. |
 | Same-CPA best-match N = 168,740, 15 fewer than matched due to singleton CPAs | VERIFIED | Matches dip-test report and script logic. |
 | ResNet-50 ImageNet-1K V2, 2048-d, L2 normalized | VERIFIED | Consistent with methods and ablation script. |
 | All-pairs intra/inter distribution N = 41,352,824 / 500,000; KDE crossover 0.837; Cohen's d = 0.669 | VERIFIED | Supported by formal-statistical script/report, although Appendix B points to the wrong JSON path. |
 | Firm A dip result N=60,448, dip=0.0019, p=0.169 | VERIFIED | `/reports/dip_test/dip_test_results.json`. |
 | Firm A dHash dip result N=60,448, dip=0.1051, p<0.001 | VERIFIED | Same JSON. |
 | All-CPA cosine/dHash dip results N=168,740, p<0.001 | VERIFIED | Same JSON. |
 | "p = 0.17 at n >= 10 signatures" in III-H | SUSPICIOUS | The `n >= 10` filter applies to accountant-level aggregates in script 15, not the Firm A signature-level dip test. The Firm A dip test uses N=60,448 signatures. |
 | "single dominant mechanism" language | SUSPICIOUS | Still too mechanistic for the statistics; use "dominant high-similarity regime" or "consistent with replication-dominated framing." |
 | BD/McCrary transition instability and values in Appendix A | VERIFIED | `/reports/bd_sensitivity/bd_sensitivity.json`; table values match. |
 | Beta mixture Delta BIC = 381 for Firm A; 10,175 full sample; forced crossings 0.977/0.999 | VERIFIED | `/reports/beta_mixture/beta_mixture_results.json`. |
 | Firm A whole-sample rates in Table IX | VERIFIED | `/reports/validation_recalibration/validation_recalibration.json` and pixel-validation JSON: e.g., cos>0.95 `55,922/60,448 = 92.51%`, dual `54,370/60,448 = 89.95%`. |
 | 310 byte-identical positives | VERIFIED | `/reports/pixel_validation/pixel_validation_results.json`. |
 | Byte-identity decomposition `145 / 50 / 180 / 35` | VERIFIED | New `/reports/byte_identity_decomp/byte_identity_decomposition.json`. The script counts Firm A signatures whose nearest same-CPA match is byte-identical; the "35" is a cross-year nearest-match count, not necessarily a deduplicated unordered pair count. |
 | Table X FAR against 50,000 inter-CPA negatives | VERIFIED | `/reports/expanded_validation/expanded_validation_results.json`. |
 | Omission of EER/FRR/precision/F1 in manuscript | VERIFIED | Manuscript now explains why these are not meaningful for Table X. |
 | Firm A 70/30 split: 124 CPAs/45,116 signatures vs 54 CPAs/15,332 | VERIFIED | `/reports/validation_recalibration/validation_recalibration.json`. |
 | Two CPAs excluded from split due to disambiguation ties | UNVERIFIABLE | Plausible; I did not find a report field documenting those two ties. |
 | Table XI rates/z-tests | VERIFIED | Values match recalibration JSON, including corrected `z=-3.19` for cos>0.9407. |
 | Table XII sensitivity counts and +1.19 pp Firm A shift | VERIFIED | Recalibration JSON supports counts and `0.89945` vs `0.91138`. |
 | Table XIII per-year Firm A left-tail rates | PARTIAL | Values are internally coherent, but Appendix B points to `reports/deloitte_distribution/deloitte_distribution_results.json`, which does not exist in the inspected report tree. |
 | Tables XIV/XV partner ranking values | VERIFIED | `/reports/partner_ranking/partner_ranking_results.json`. |
 | Table XVI intra-report agreement | VERIFIED | `/reports/intra_report/intra_report_results.json`. |
 | Table XVII document-level classification counts | VERIFIED with path caveat | Counts match manuscript arithmetic and available PDF verdict artifacts, but Appendix B points to `reports/pdf_level/pdf_level_results.json`, which does not exist. Existing files include `pdf_signature_verdicts.json`, CSV/XLSX, and report markdown at report root. |
 | Cross-firm dual-descriptor convergence `42.12%` vs `88.32%` | VERIFIED | New JSON: non-Firm-A `27,596/65,515`, Firm A `49,388/55,921`. Note this Firm A denominator differs by one from Table IX's cosine-only `55,922`, so the text should specify the additional filters used by script 28. |
 | Ablation Table XVIII | PARTIAL | The script exists and `/Volumes/NV2/PDF-Processing/signature-analysis/ablation/ablation_results.json` exists, but Appendix B incorrectly maps it to `reports/ablation/ablation_results.json`. |
 | Appendix B claim that all report files are committed alongside scripts in the project's `reports/` tree | SUSPICIOUS | In the current workspace there is no repo-root `reports/` directory. Several paths named in Appendix B are missing even in the absolute report tree. |
 ## 4. Methodological Rigor
 The core methodology remains credible for an IEEE Access Regular Paper. The strongest elements are:
 - The paper separates operational calibration from distributional characterization. This is essential because the per-signature diagnostics do not converge to a clean two-class threshold.
 - The dual-descriptor design is well motivated: cosine captures high-level similarity, while independent-minimum dHash provides a structural near-duplicate check.
 - The byte-identical positive anchor is a valid conservative subset, and the inter-CPA negative anchor gives meaningful specificity/FAR estimates.
 - The held-out Firm A fold is now framed as within-Firm-A sampling-variance disclosure rather than full external validation.
 - The new script 28 closes the most important prior provenance gap for byte identity and cross-firm convergence.
 Remaining rigor concerns:
 1. **Provenance packaging is still inconsistent.** Appendix B says scripts and reports live under the project's `reports/` tree. In this workspace there is no repo-root `reports/` directory, and the actual artifacts are under `/Volumes/NV2/PDF-Processing/signature-analysis/reports/`. More importantly, the Appendix B paths for formal statistical results, Deloitte/Firm-A distribution results, PDF-level results, and ablation results are wrong or missing.
 2. **The Firm A prior remains partly socially sourced.** The text says practitioner knowledge is non-load-bearing, but the Introduction still relies rhetorically on "widely recognized." The empirical case can stand without that phrase.
 3. **The dip-test interpretation remains slightly overextended.** Failure to reject unimodality supports "no clear multimodal split"; it does not show a single mechanism. The byte-identity and ranking evidence do more of the work.
 4. **The `n >= 10` parenthetical in III-H is likely misplaced.** It should not be attached to the Firm A signature-level dip result unless the authors can show the exact filtering.
 5. **Several engineering details remain under-specified for full reproducibility:** VLM prompt/parse rule, HSV red-stamp thresholds, training log for YOLO metrics, and exact runtime environment for throughput.
 ## 5. Narrative Discipline
 The narrative is substantially more disciplined than v3.18.1, but a few overclaims remain.
 Recommended softening:
 - Replace "detects such non-hand-signed signatures" in the Abstract with "classifies signatures by evidence of non-hand-signing" or "detects replication-consistent signatures." The pipeline does not observe the signing workflow directly.
 - Replace "consistent with a single dominant mechanism (non-hand-signing)" in III-H and "single dominant mechanism plus residual..." in V-C with "consistent with a dominant high-similarity regime plus residual heterogeneity."
 - Replace "widely recognized / widely held within the audit profession" with either a citation or a purely methodological framing: "Firm A was selected as a candidate calibration reference; its benchmark status is evaluated using image evidence below."
 - Be careful with "known-majority-positive population." The empirical evidence supports replication-dominated, but "known" implies a source of ground truth outside the image evidence.
 The corrected cross-firm claim is narratively better. The old `5x` story was both wrong and too dramatic; the new `~2.1x` gap is still meaningful and more defensible.
 ## 6. IEEE Access Fit
 The paper fits IEEE Access well. It is application-driven, computationally substantial, and methodologically relevant to document forensics, audit analytics, and computer vision. The novelty is not a new neural architecture; it is the calibration and validation strategy for a real archival corpus with limited ground truth. That is a legitimate IEEE Access contribution.
 The remaining issues are editorial/reproducibility issues rather than grounds for rejection. IEEE Access reviewers are likely to value the added Appendix B provenance map, but they will also notice if the mapped paths do not exist. Fixing those paths, or bundling the missing JSON/Markdown reports, is important before submission.
 ## 7. Specific Actionable Revisions
 1. **Fix Appendix B provenance paths.** In the inspected report tree, these Appendix B artifacts are missing at the stated paths:
   - `reports/formal_statistical/formal_statistical_results.json` (available alternative appears to be `reports/formal_statistical_data.json`)
   - `reports/deloitte_distribution/deloitte_distribution_results.json` (only figures were present)
   - `reports/pdf_level/pdf_level_results.json` (available alternatives include `reports/pdf_signature_verdicts.json`, CSV/XLSX, and markdown)
   - `reports/ablation/ablation_results.json` (actual path appears to be `/Volumes/NV2/PDF-Processing/signature-analysis/ablation/ablation_results.json`)
 2. **Either commit/copy the report tree into the repo or state the absolute artifact root.** The user-facing manuscript says `reports/...`; the current repo root has no `reports/` directory.
 3. **Remove the remaining "single dominant mechanism" phrasing.** Use "dominant high-similarity regime" instead.
 4. **Fix the III-H parenthetical "p = 0.17 at n >= 10 signatures."** The signature-level dip test is N=60,448; the `n >= 10` rule belongs to accountant-level aggregates.
 5. **Clarify the `55,921` denominator in IV-H.2.** It differs by one from Table IX's `55,922` cosine-only Firm A count. Add that script 28 conditions on `assigned_accountant IS NOT NULL` and `min_dhash_independent IS NOT NULL`, or reconcile the one-record discrepancy.
 6. **Add or cite artifacts for still-unverifiable operational claims.** At minimum: YOLO training metrics/logs, VLM prompt/config, HSV thresholds, throughput log, and document-type breakdown.
 7. **Soften "widely recognized/widely held" practitioner wording or cite it.** The current "non-load-bearing" sentence helps, but uncited professional-knowledge claims are still exposed.
 8. **Keep the impact statement archived or revise before reuse.** The archive note correctly warns that "distinguishes genuinely hand-signed signatures from reproduced ones" would overstate the evidence.
 Bottom line: v3.18.2 materially improves the paper and fixes the round-16 empirical error. I would not block submission on the central results, but I would require the provenance/path cleanup and the remaining mechanism-language softening before calling it Accept.
@@ -36,27 +36,27 @@ Raw per-bin $Z$ sequences and $p$-values for every (variant, bin-width) panel ar
 # Appendix B. Table-to-Script Provenance
-For reproducibility, the following table maps each numerical table in Section IV to the analysis script that produces its underlying values and to the JSON / Markdown report file emitted by that script. Scripts referenced are under `signature_analysis/` and reports under the project's `reports/` tree.
+For reproducibility, the following table maps each numerical table in Section IV to the analysis script that produces its underlying values and to the report file emitted by that script. Scripts are under `signature_analysis/`. Report artifact paths below are listed relative to the project's analysis report root, which is `/Volumes/NV2/PDF-Processing/signature-analysis/` in our local deployment; replicators should rebase the paths to whatever report root they configure when invoking the scripts.
 <!-- TABLE B.I: Manuscript table → reproduction artifact
 | Manuscript table | Generating script | Report artifact |
 |------------------|-------------------|-----------------|
-| Table III (extraction results) | `02_extract_features.py`; `09_pdf_signature_verdict.py` | extraction logs (supplementary) |
+| Table III (extraction results) | `02_extract_features.py`; `09_pdf_signature_verdict.py` | `reports/extraction_methodology.md`; `reports/pdf_signature_verdicts.json` |
-| Table IV (intra/inter all-pairs cosine statistics) | `10_formal_statistical_analysis.py` | `reports/formal_statistical/formal_statistical_results.json` |
+| Table IV (intra/inter all-pairs cosine statistics) | `10_formal_statistical_analysis.py` | `reports/formal_statistical_data.json`; `reports/formal_statistical_report.md` |
 | Table V (Hartigan dip test) | `15_hartigan_dip_test.py` | `reports/dip_test/dip_test_results.json` |
 | Table VI (signature-level threshold-estimator summary) | `17_beta_mixture_em.py`; `25_bd_mccrary_sensitivity.py` | `reports/beta_mixture/beta_mixture_results.json`; `reports/bd_sensitivity/bd_sensitivity.json` |
 | Table IX (Firm A whole-sample capture rates) | `19_pixel_identity_validation.py`; `24_validation_recalibration.py` | `reports/pixel_validation/pixel_validation_results.json`; `reports/validation_recalibration/validation_recalibration.json` |
 | Table X (cosine threshold sweep, FAR vs inter-CPA negatives) | `21_expanded_validation.py` | `reports/expanded_validation/expanded_validation_results.json` |
 | Table XI (held-out vs calibration Firm A capture rates) | `24_validation_recalibration.py` | `reports/validation_recalibration/validation_recalibration.json` |
 | Table XII (operational-cut sensitivity 0.95 vs 0.945) | `24_validation_recalibration.py` | `reports/validation_recalibration/validation_recalibration.json` |
-| Table XIII (Firm A per-year cosine distribution) | `13_deloitte_distribution_analysis.py` | `reports/deloitte_distribution/deloitte_distribution_results.json` |
+| Table XIII (Firm A per-year cosine distribution) | `13_deloitte_distribution_analysis.py` | derived from `reports/accountant_similarity_analysis.json` filtered to Firm A; figures in `reports/figures/` |
 | Tables XIV / XV (partner-level similarity ranking) | `22_partner_ranking.py` | `reports/partner_ranking/partner_ranking_results.json` |
 | Table XVI (intra-report classification agreement) | `23_intra_report_consistency.py` | `reports/intra_report/intra_report_results.json` |
-| Table XVII (document-level five-way classification) | `09_pdf_signature_verdict.py`; `12_generate_pdf_level_report.py` | `reports/pdf_level/pdf_level_results.json` |
+| Table XVII (document-level five-way classification) | `09_pdf_signature_verdict.py`; `12_generate_pdf_level_report.py` | `reports/pdf_signature_verdicts.json`; `reports/pdf_signature_verdict_report.md` (CSV / XLSX bulk reports also at `reports/`) |
-| Table XVIII (backbone ablation) | `paper/ablation_backbone_comparison.py` | `reports/ablation/ablation_results.json` |
+| Table XVIII (backbone ablation) | `paper/ablation_backbone_comparison.py` | `ablation/ablation_results.json` (sibling of `reports/`) |
 | Table A.I (BD/McCrary bin-width sensitivity) | `25_bd_mccrary_sensitivity.py` | `reports/bd_sensitivity/bd_sensitivity.json` |
 | Byte-identity decomposition (145 / 50 / 180 / 35; Section IV-F.1) | `28_byte_identity_decomposition.py` | `reports/byte_identity_decomp/byte_identity_decomposition.json` |
 | Cross-firm dual-descriptor convergence (Section IV-H.2) | `28_byte_identity_decomposition.py` | `reports/byte_identity_decomp/byte_identity_decomposition.json` |
 -->
-The table-to-script mapping above is intended as a navigation aid for replicators. All scripts run deterministically under the fixed random seeds documented in the supplementary materials; report files are committed alongside the scripts so that each numerical claim in Section IV traces to a specific JSON field rather than to an undocumented intermediate computation.
+The table-to-script mapping above is intended as a navigation aid for replicators. All scripts run deterministically under the fixed random seeds documented in the supplementary materials; the artifact paths above were verified against the local deployment at the time of submission, and any reviewer reproduction step should re-emit the artifacts from the listed scripts rather than depend on the absolute path layout.
@@ -38,7 +38,7 @@ Two convergent strands of evidence support the replication-dominated framing.
 First, the byte-level pair evidence: 145 Firm A signatures (from 50 distinct partners of 180 registered) have a byte-identical same-CPA match in a different audit report, with 35 of these matches spanning different fiscal years.
 Independent hand-signing cannot produce byte-identical images across distinct reports, so these pairs directly establish image reuse within Firm A as a concrete, threshold-free phenomenon, and the 50/180 partner spread shows that replication is widespread rather than confined to a handful of CPAs.
 Second, the signature-level distributional evidence: Firm A's per-signature cosine distribution is unimodal long-tail (Hartigan dip test $p = 0.17$) rather than a tight single peak; 92.5% of Firm A signatures exceed cosine 0.95, with the remaining 7.5% forming the left tail.
-The unimodal-long-tail *shape*, not the precise 92.5 / 7.5 split, is the structural evidence: it is consistent with a single dominant mechanism plus residual within-firm heterogeneity, and a noise-only explanation of the left tail would predict a shrinking share as scan/PDF technology matured over 2013--2023, which is not what we observe (Section IV-G.1).
+The unimodal-long-tail *shape*, not the precise 92.5 / 7.5 split, is the structural evidence: it is consistent with a dominant high-similarity regime plus residual within-firm heterogeneity, and a noise-only explanation of the left tail would predict a shrinking share as scan/PDF technology matured over 2013--2023, which is not what we observe (Section IV-G.1).
 Two additional checks, reported in Section IV-G, are robust to threshold choice and complement the two primary strands:
 the held-out Firm A 70/30 validation (Section IV-F.2) gives capture rates on a non-calibration Firm A subset that sit in the same replication-dominated regime as the calibration fold across the full range of operating rules (extreme rules are statistically indistinguishable; operational rules in the 85--95% band differ between folds by 1--5 percentage points, reflecting within-Firm-A heterogeneity in replication intensity rather than a generalization failure), and the threshold-independent partner-ranking analysis (Section IV-G.2) shows that Firm A auditor-years occupy 95.9% of the top decile of similarity-ranked auditor-years against a 27.8% baseline share---a 3.5$\times$ concentration ratio that uses only ordinal ranking and is independent of any absolute cutoff.
@@ -49,7 +49,7 @@ Perceptual hashing (specifically, difference hashing) encodes structural-level i
 By requiring convergent evidence from both descriptors, we can differentiate *style consistency* (high cosine but divergent dHash) from *image reproduction* (high cosine with low dHash), resolving an ambiguity that neither descriptor can address alone.
 A second distinctive feature is our framing of the calibration reference.
-One major Big-4 accounting firm in Taiwan (hereafter "Firm A") is widely recognized within the audit profession as making substantial use of non-hand-signing for the majority of its certifying partners, while not ruling out that a minority may continue to hand-sign some reports.
+One major Big-4 accounting firm in Taiwan (hereafter "Firm A") was selected as a candidate calibration reference based on practitioner-knowledge motivation; its benchmark status is then evaluated using the image evidence reported in this paper, not asserted by the practitioner-knowledge motivation itself.
 We therefore treat Firm A as a *replication-dominated* calibration reference rather than a pure positive class.
 This framing is important because the statistical signature of a replication-dominated population is visible in our data: Firm A's per-signature cosine distribution is unimodal with a long left tail (Hartigan dip $p = 0.17$), 92.5% of Firm A signatures exceed cosine 0.95 with the remaining 7.5% forming the left tail, and 145 Firm A signatures across 50 distinct partners are byte-identical to a same-CPA match in a different audit report (35 spanning different fiscal years).
 Adopting the replication-dominated framing---rather than a near-universal framing that would have to absorb the 7.5% residual as noise---ensures internal coherence between the byte-level pixel-identity evidence and the signature-level distributional shape.
@@ -143,7 +143,7 @@ The intra-report consistency analysis in Section IV-G.3 is a firm-level homogene
 A distinctive aspect of our methodology is the use of Firm A---a major Big-4 accounting firm in Taiwan---as an empirical calibration reference.
 Rather than treating Firm A as a synthetic or laboratory positive control, we treat it as a naturally occurring *replication-dominated population*: a CPA population whose aggregate signing behavior is dominated by non-hand-signing but is not a pure positive class.
-Practitioner knowledge motivated treating Firm A as a candidate calibration reference: it is widely held within the audit profession that the firm reproduces a stored signature image for the majority of certifying partners---originally via administrative stamping workflows and later via firm-level electronic signing systems---while not ruling out that a minority of partners may continue to hand-sign some or all of their reports.
+Practitioner knowledge motivated treating Firm A as a candidate calibration reference: the firm is understood within the audit profession to reproduce a stored signature image for the majority of certifying partners---originally via administrative stamping workflows and later via firm-level electronic signing systems---while not ruling out that a minority of partners may continue to hand-sign some or all of their reports.
 This practitioner background is *non-load-bearing* in our analysis: the evidentiary basis used in this paper is the observable image evidence reported below---byte-identical same-CPA pairs, the Firm A per-signature similarity distribution, partner-ranking concentration, and intra-report consistency---which does not depend on any claim about signing practice beyond what the audit-report images themselves show.
 We establish Firm A's replication-dominated status through two primary independent quantitative analyses plus a third strand comprising three complementary checks, each of which can be reproduced from the public audit-report corpus alone:
@@ -151,7 +151,7 @@ We establish Firm A's replication-dominated status through two primary independe
 First, *automated byte-level pair analysis* (Section IV-F.1; reproduced by `signature_analysis/28_byte_identity_decomposition.py` with output in `reports/byte_identity_decomp/byte_identity_decomposition.json`) identifies 145 Firm A signatures that are byte-identical to at least one other same-CPA signature from a different audit report, distributed across 50 distinct Firm A partners (of 180 registered); 35 of these byte-identical matches span different fiscal years.
 Byte-identity implies pixel-identity by construction, and independent hand-signing cannot produce pixel-identical images across distinct reports---these pairs therefore establish image reuse as a concrete, threshold-free phenomenon within Firm A and confirm that replication is widespread (50 of 180 registered partners) rather than confined to a handful of CPAs.
-Second, *signature-level distributional evidence*: Firm A's per-signature best-match cosine distribution is unimodal with a long left tail (Hartigan dip test $p = 0.17$ at $n \geq 10$ signatures; Section IV-D), consistent with a single dominant mechanism (non-hand-signing) plus residual within-firm heterogeneity rather than two cleanly separated mechanisms.
+Second, *signature-level distributional evidence*: Firm A's per-signature best-match cosine distribution fails to reject unimodality (Hartigan dip test $p = 0.17$, $N = 60{,}448$ Firm A signatures; Section IV-D) and exhibits a long left tail, consistent with a dominant high-similarity regime plus residual within-firm heterogeneity rather than two cleanly separated mechanisms.
 92.5% of Firm A's per-signature best-match cosine similarities exceed 0.95 and the remaining 7.5% form the long left tail (we do not disaggregate partner-level mechanism here; see Section III-G for the scope of claims).
 The unimodal-long-tail shape, not the precise 92.5/7.5 split, is the structural evidence: it predicts that Firm A is replication-dominated rather than a clean two-class population, and a noise-only explanation of the left tail would predict a shrinking share as scan/PDF technology matured over 2013--2023, which is not what we observe (Section IV-G.1).
@@ -371,6 +371,8 @@ We note that because the non-hand-signed thresholds are themselves calibrated to
 ### 2) Cross-Firm Comparison of Dual-Descriptor Convergence
 Among the 65,515 non-Firm-A signatures with per-signature best-match cosine $> 0.95$, 42.12% have $\text{dHash}_\text{indep} \leq 5$, compared to 88.32% of the 55,921 Firm A signatures meeting the same cosine condition---a $\sim 2.1\times$ difference that the structural-verification layer makes visible.
 The Firm A denominator here (55,921) differs by a single signature from Table IX's cosine-only count (55,922) because the two artifacts were materialized from successive snapshots of the underlying database: Table IX is rendered from `validation_recalibration.json` produced earlier in the analysis pipeline, while the cross-firm decomposition is rendered from `byte_identity_decomposition.json` produced more recently after a downstream feature recomputation that shifted exactly one borderline Firm A signature from `cos > 0.95` to `cos = 0.95...` at floating-point precision.
 The one-record drift does not affect any reported rate to two decimal places; we retain both values to make the snapshot provenance explicit.
 This cross-firm gap is consistent with firm-wide non-hand-signing practice at Firm A versus partner-specific or per-engagement replication at other firms; it complements the partner-level ranking (Section IV-G.2) and intra-report consistency (Section IV-G.3) findings.
 Counts and percentages are reproduced by `signature_analysis/28_byte_identity_decomposition.py` and reported in `reports/byte_identity_decomp/byte_identity_decomposition.json` (see Appendix B for the table-to-script provenance map).