From 1e37d344ea4790225c95cc2bb91bdec570dcbc55 Mon Sep 17 00:00:00 2001
From: gbanyan <gbanyan.huang@gmail.com>
Date: Mon, 27 Apr 2026 20:59:07 +0800
Subject: [PATCH] Add codex GPT-5.5 round-18 independent peer review artifact

paper/codex_review_gpt55_v3_18_3.md: 12.5 KB / 127 lines. Codex re-audited
v3.18.3 against its own round-17 review, the live filesystem (verified
all 17 Appendix B paths exist), and the SQLite database. Verdict: Minor
Revision; the round-18 finding was that the v3.18.3 reconciliation note
for 55,921 vs 55,922 was empirically false (DB query showed the cause
was accountants.firm vs signatures.excel_firm field mismatch, not
floating-point/snapshot drift).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 paper/codex_review_gpt55_v3_18_3.md | 127 ++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)
 create mode 100644 paper/codex_review_gpt55_v3_18_3.md

diff --git a/paper/codex_review_gpt55_v3_18_3.md b/paper/codex_review_gpt55_v3_18_3.md
new file mode 100644
index 0000000..bcf65bc
--- /dev/null
+++ b/paper/codex_review_gpt55_v3_18_3.md
@@ -0,0 +1,127 @@
+# Independent Peer Review (Round 18) - Paper A v3.18.3
+
+Reviewer role: independent peer reviewer for IEEE Access Regular Paper.
+Manuscript reviewed: "Replication-Dominated Calibration" - CPA signature analysis, v3.18.3, commits `f1c2537` + `26b934c` on `yolo-signature-pipeline`.
+Audit basis: manuscript sections under `paper/`, prior round-16 and round-17 reviews, scripts under `signature_analysis/`, the current SQLite/report artifacts under `/Volumes/NV2/PDF-Processing/signature-analysis/`, and direct filesystem checks of Appendix B paths.
+
+## 1. Overall Verdict: Minor Revision
+
+I recommend **Minor Revision**, not Accept.
+
+v3.18.3 resolves the main round-17 provenance problem: the four fabricated Appendix B paths have been replaced with paths that exist in the available report tree, and the manuscript now explicitly states the local report root (`/Volumes/NV2/PDF-Processing/signature-analysis/`) plus the fact that the ablation artifact is a sibling of `reports/`. The prior "single dominant mechanism" wording is also removed from the main Methodology/Discussion passages, and the mistaken "p = 0.17 at n >= 10 signatures" parenthetical is fixed.
+
+However, the new reconciliation note for the `55,921` vs `55,922` Firm A cosine-only counts is not supported by the current artifacts. The manuscript attributes the one-record difference to successive database snapshots and a downstream floating-point shift of one borderline Firm A signature. Direct database checks indicate a different cause: Table IX is based on Firm A membership from `accountants.firm`, whereas `signature_analysis/28_byte_identity_decomposition.py` groups Firm A by `signatures.excel_firm`. In the current database, one signature with `cos > 0.95` belongs to an accountant whose registry firm is Firm A but whose `excel_firm` field is not Firm A. Thus the new note acknowledges the arithmetic discrepancy but introduces a false provenance explanation.
+
+This is Minor rather than Major because the one-record difference has negligible numerical effect and does not overturn the central findings. It should still be corrected before submission because v3.18.3 was specifically intended to repair provenance discipline.
+
+## 2. Re-audit of Round-17 Findings
+
+| Round-17 finding | v3.18.3 status | Re-audit notes |
+|---|---|---|
+| Appendix B provenance paths overclaimed / several did not exist | **RESOLVED** | All listed Appendix B report artifacts now exist when rebased to `/Volumes/NV2/PDF-Processing/signature-analysis/`. The replacement paths for formal statistics, Firm A per-year data, PDF verdicts, ablation, and byte decomposition are real. |
+| Residual "single dominant mechanism" wording | **RESOLVED enough** | The exact phrase is gone from Methodology III-H and Discussion V-C. Current wording uses "dominant high-similarity regime plus residual within-firm heterogeneity," which is more defensible. |
+| III-H "p = 0.17 at n >= 10 signatures" parenthetical | **RESOLVED** | The current text correctly reports the signature-level dip result as `p = 0.17`, `N = 60,448` Firm A signatures. The `n >= 10` filter is no longer attached to that claim. |
+| "Widely recognized / widely held" practitioner wording | **RESOLVED enough** | Introduction now frames Firm A as selected by practitioner-knowledge motivation and evaluated by image evidence. III-H says "is understood within the audit profession" but immediately marks this as non-load-bearing. A citation would still be cleaner, but this is no longer a submission blocker. |
+| 55,921 vs 55,922 Firm A cosine-only count discrepancy | **PARTIAL / NEW ERROR** | The manuscript now acknowledges the discrepancy, but the explanation appears wrong. Current DB evidence points to different Firm A attribution fields (`accountants.firm` vs `signatures.excel_firm`), not a snapshot/floating-point shift. |
+| Still-unverifiable operational details: YOLO logs, VLM prompt/config, HSV thresholds, throughput log | **UNRESOLVED but not new** | These remain plausible method claims, but I did not find dedicated artifacts establishing them. This is acceptable for main-paper review only if the supplement includes training/config/runtime logs. |
+| Section reference for `145/50/180/35` byte decomposition | **PARTIAL** | Appendix B now maps the decomposition to script 28, but the main results Section IV-F.1 still reports only the all-sample 310 byte-identical signatures, not the Firm A `145/50/180/35` decomposition. Several locations still cite Section IV-F.1 for a decomposition that is actually in III-H / V-C / Appendix B. |
+
+## 3. Appendix B Path Verification
+
+I checked every Appendix B artifact path directly against the filesystem. Rebased to `/Volumes/NV2/PDF-Processing/signature-analysis/`, all listed artifacts exist:
+
+| Appendix B artifact | Exists? |
+|---|---|
+| `reports/extraction_methodology.md` | Yes |
+| `reports/pdf_signature_verdicts.json` | Yes |
+| `reports/formal_statistical_data.json` | Yes |
+| `reports/formal_statistical_report.md` | Yes |
+| `reports/dip_test/dip_test_results.json` | Yes |
+| `reports/beta_mixture/beta_mixture_results.json` | Yes |
+| `reports/bd_sensitivity/bd_sensitivity.json` | Yes |
+| `reports/pixel_validation/pixel_validation_results.json` | Yes |
+| `reports/validation_recalibration/validation_recalibration.json` | Yes |
+| `reports/expanded_validation/expanded_validation_results.json` | Yes |
+| `reports/accountant_similarity_analysis.json` | Yes |
+| `reports/figures/` | Yes |
+| `reports/partner_ranking/partner_ranking_results.json` | Yes |
+| `reports/intra_report/intra_report_results.json` | Yes |
+| `reports/pdf_signature_verdict_report.md` | Yes |
+| `ablation/ablation_results.json` | Yes |
+| `reports/byte_identity_decomp/byte_identity_decomposition.json` | Yes |
+
+The path replacements are real. The only caveat is semantic rather than filesystem-level: Table XIII is described as "derived from `reports/accountant_similarity_analysis.json` filtered to Firm A; figures in `reports/figures/`." That is acceptable as provenance if the supplement documents the filter/query used for the table.
+
+## 4. Empirical-Claim Audit
+
+I focused on claims introduced or changed by v3.18.3.
+
+**Verified**
+
+- Appendix B path replacements exist in the actual report tree.
+- `reports/byte_identity_decomp/byte_identity_decomposition.json` exists and reports:
+  - Firm A byte-identical signatures: `145`
+  - distinct Firm A partners: `50`
+  - registered Firm A partners: `180`
+  - cross-year byte-identical matches: `35`
+- The same JSON reports cross-firm dual convergence:
+  - Firm A: `49,388 / 55,921 = 88.32%`
+  - Non-Firm-A: `27,596 / 65,515 = 42.12%`
+- `validation_recalibration.json` reports Table IX's Firm A `cos > 0.95` count as `55,922 / 60,448 = 92.51%`.
+
+**New / Incorrect**
+
+- The new Results IV-H.2 reconciliation note says the `55,921` vs `55,922` discrepancy comes from successive snapshots and one borderline Firm A signature shifting from `cos > 0.95` to `cos = 0.95...` at floating-point precision. I could not reproduce that explanation.
+- Direct SQLite checks on the current database show:
+  - Firm A by `accountants.firm`, `cos > 0.95`: `55,922`
+  - Firm A by `signatures.excel_firm`, `cos > 0.95`: `55,921`
+  - exactly one `cos > 0.95` signature has `accountants.firm = Firm A` but `signatures.excel_firm != Firm A`.
+- The discrepant row I saw was `signature_id = 37768`, `assigned_accountant = 徐文亞`, `excel_firm = 黃毅民`, `max_similarity_to_same_accountant = 0.978511691093445`, `min_dhash_independent = 0`. That is not a `cos = 0.95...` borderline case.
+
+The corrected explanation should be along the lines of: Table IX uses accountant-registry Firm A membership, while script 28's cross-firm decomposition uses the `excel_firm` field; one above-threshold signature differs between those two firm-attribution fields. Alternatively, change script 28 to use the same `accountants.firm` join as the validation artifacts and regenerate the JSON.
+
+**Still only partially supported**
+
+- YOLO validation metrics, VLM prompt/settings, HSV red-removal thresholds, and 43.1 docs/sec throughput remain method claims without visible log/config artifacts in the inspected report tree.
+- The exclusion of two Firm A CPAs from the held-out split due to disambiguation ties remains plausible but is not directly documented in a report field.
+- The 15 document types / 86.4% standard audit-report breakdown remains plausible but was not traced to a packaged table.
+
+## 5. Methodological + Narrative Discipline
+
+The narrative is materially cleaner than v3.18.2. The manuscript now keeps the central inference where it belongs: the evidence supports a replication-dominated calibration population and a continuous similarity-quality spectrum, not a directly observed signing workflow or a clean two-mechanism mixture.
+
+The remaining narrative issues are narrow:
+
+1. **Fix the new count-reconciliation note.** The current note is too specific and appears empirically false. Do not invoke successive snapshots or a floating-point boundary shift unless that can be shown from archived artifacts. The current evidence points to a firm-attribution-field mismatch.
+
+2. **Clarify Firm A membership consistently.** Several scripts use `accountants.firm`; script 28 uses `signatures.excel_firm`. Both may be defensible for different questions, but the paper must state which field defines Firm A in each table or harmonize the scripts.
+
+3. **Remove or soften remaining "known-majority-positive" phrasing.** The term appears in the Introduction, Methodology, Discussion, and Conclusion. The paper's better phrase is "replication-dominated reference population." "Known" still implies external ground truth stronger than the paper can document.
+
+4. **Correct the auditor-year / cross-year pooling description.** Methodology III-G says the auditor-year ranking is a "deliberately within-year aggregation that avoids cross-year pooling." But the same section and Results IV-G.2 state that each signature's best match is computed against the full same-CPA cross-year pool. The aggregation is by auditor-year, but the underlying similarity statistic is cross-year. Replace "avoids cross-year pooling" with "aggregates signatures within each auditor-year while using the full same-CPA pool for each signature's best-match statistic."
+
+5. **Align the byte-decomposition section reference.** If the `145/50/180/35` decomposition is meant to be a Results claim, put a sentence in IV-F.1 or cite Appendix B directly. As written, Section IV-F.1 reports the 310 all-sample byte-identical signatures, not the Firm A decomposition.
+
+## 6. IEEE Access Fit
+
+The paper remains a good IEEE Access fit. It is application-driven, computationally substantial, and methodologically relevant to document forensics, audit analytics, and computer vision. The contribution is not a novel neural architecture; it is a defensible calibration and validation strategy for a large archival corpus with limited ground truth.
+
+The remaining problems are reproducibility/provenance polish, not a collapse of the empirical core. Still, IEEE Access reviewers may scrutinize the supplement and table provenance. v3.18.3's Appendix B is now much stronger, but the newly added reconciliation note should be corrected because it is exactly the kind of precise provenance statement that reviewers can audit.
+
+## 7. Specific Actionable Revisions
+
+1. Replace the IV-H.2 `55,921` vs `55,922` explanation. Either:
+   - harmonize script 28 to use `accountants.firm` like `validation_recalibration.py` and regenerate the byte-decomposition JSON; or
+   - keep the current script 28 output and state that the one-record difference arises from `accountants.firm` versus `signatures.excel_firm` Firm A attribution.
+
+2. Add a short note in Appendix B or the script 28 report defining the Firm A grouping field for each artifact.
+
+3. Replace "known-majority-positive" with "replication-dominated" or "candidate replication-dominated" unless an external citation/ground-truth source is supplied.
+
+4. Revise Methodology III-G's auditor-year sentence so it does not claim the ranking avoids cross-year pooling.
+
+5. Add the `145/50/180/35` Firm A byte-decomposition sentence to Results IV-F.1, or cite Appendix B directly instead of Section IV-F.1 when discussing that decomposition.
+
+6. If time permits before submission, include supplementary logs/configs for YOLO metrics, VLM prompt/settings, HSV thresholds, and throughput. These are not central-result blockers, but they would strengthen the reproducibility package.
+
+Bottom line: v3.18.3 successfully fixes the fabricated Appendix B paths and most of the narrative overclaiming from round 17. The manuscript should not be accepted until the new count-reconciliation explanation and the auditor-year pooling wording are corrected, but the required changes are small and localized.