From 1e37d344ea4790225c95cc2bb91bdec570dcbc55 Mon Sep 17 00:00:00 2001
From: gbanyan
Date: Mon, 27 Apr 2026 20:59:07 +0800
Subject: [PATCH] Add codex GPT-5.5 round-18 independent peer review artifact

paper/codex_review_gpt55_v3_18_3.md: 12.5 KB / 128 lines. Codex re-audited
v3.18.3 against its own round-17 review, the live filesystem (verified all
17 Appendix B paths exist), and the SQLite database. Verdict: Minor
Revision; the round-18 finding was that the v3.18.3 reconciliation note for
55,921 vs 55,922 was empirically false (DB query showed the cause was
accountants.firm vs signatures.excel_firm field mismatch, not
floating-point/snapshot drift).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 paper/codex_review_gpt55_v3_18_3.md | 127 ++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)
 create mode 100644 paper/codex_review_gpt55_v3_18_3.md

diff --git a/paper/codex_review_gpt55_v3_18_3.md b/paper/codex_review_gpt55_v3_18_3.md
new file mode 100644
index 0000000..bcf65bc
--- /dev/null
+++ b/paper/codex_review_gpt55_v3_18_3.md
@@ -0,0 +1,127 @@
+# Independent Peer Review (Round 18) - Paper A v3.18.3
+
+Reviewer role: independent peer reviewer for IEEE Access Regular Paper.
+Manuscript reviewed: "Replication-Dominated Calibration" - CPA signature analysis, v3.18.3, commits `f1c2537` + `26b934c` on `yolo-signature-pipeline`.
+Audit basis: manuscript sections under `paper/`, prior round-16 and round-17 reviews, scripts under `signature_analysis/`, the current SQLite/report artifacts under `/Volumes/NV2/PDF-Processing/signature-analysis/`, and direct filesystem checks of Appendix B paths.
+
+## 1. Overall Verdict: Minor Revision
+
+I recommend **Minor Revision**, not Accept.
+
+v3.18.3 resolves the main round-17 provenance problem: the four fabricated Appendix B paths have been replaced with paths that exist in the available report tree, and the manuscript now explicitly states the local report root (`/Volumes/NV2/PDF-Processing/signature-analysis/`) plus the fact that the ablation artifact is a sibling of `reports/`. The prior "single dominant mechanism" wording is also removed from the main Methodology/Discussion passages, and the mistaken "p = 0.17 at n >= 10 signatures" parenthetical is fixed.
+
+However, the new reconciliation note for the `55,921` vs `55,922` Firm A cosine-only counts is not supported by the current artifacts. The manuscript attributes the one-record difference to successive database snapshots and a downstream floating-point shift of one borderline Firm A signature. Direct database checks indicate a different cause: Table IX is based on Firm A membership from `accountants.firm`, whereas `signature_analysis/28_byte_identity_decomposition.py` groups Firm A by `signatures.excel_firm`. In the current database, one signature above `cos > 0.95` belongs to an accountant whose registry firm is Firm A but whose `excel_firm` field is not Firm A. Thus the new note fixes the arithmetic discrepancy but introduces a false provenance explanation.
+
+This is Minor rather than Major because the one-record drift has negligible numerical effect and does not overturn the central findings. It should still be corrected before submission because v3.18.3 was specifically intended to repair provenance discipline.
+
+## 2. Re-audit of Round-17 Findings
+
+| Round-17 finding | v3.18.3 status | Re-audit notes |
+|---|---|---|
+| Appendix B provenance paths overclaimed / several did not exist | **RESOLVED** | All listed Appendix B report artifacts now exist when rebased to `/Volumes/NV2/PDF-Processing/signature-analysis/`. The replacement paths for formal statistics, Firm A per-year data, PDF verdicts, ablation, and byte decomposition are real. |
+| Residual "single dominant mechanism" wording | **RESOLVED enough** | The exact phrase is gone from Methodology III-H and Discussion V-C. Current wording uses "dominant high-similarity regime plus residual within-firm heterogeneity," which is more defensible. |
+| III-H "p = 0.17 at n >= 10 signatures" parenthetical | **RESOLVED** | The current text correctly reports the signature-level dip result as `p = 0.17`, `N = 60,448` Firm A signatures. The `n >= 10` filter is no longer attached to that claim. |
+| "Widely recognized / widely held" practitioner wording | **RESOLVED enough** | Introduction now frames Firm A as selected by practitioner-knowledge motivation and evaluated by image evidence. III-H says "is understood within the audit profession" but immediately marks this as non-load-bearing. A citation would still be cleaner, but this is no longer a submission blocker. |
+| 55,921 vs 55,922 Firm A cosine-only count discrepancy | **PARTIAL / NEW ERROR** | The manuscript now acknowledges the discrepancy, but the explanation appears wrong. Current DB evidence points to different Firm A attribution fields (`accountants.firm` vs `signatures.excel_firm`), not a snapshot/floating-point shift. |
+| Still-unverifiable operational details: YOLO logs, VLM prompt/config, HSV thresholds, throughput log | **UNRESOLVED but not new** | These remain plausible method claims, but I did not find dedicated artifacts establishing them. This is acceptable for main-paper review only if the supplement includes training/config/runtime logs. |
+| Section reference for `145/50/180/35` byte decomposition | **PARTIAL** | Appendix B now maps the decomposition to script 28, but the main results Section IV-F.1 still reports only the all-sample 310 byte-identical signatures, not the Firm A `145/50/180/35` decomposition. Several locations still cite Section IV-F.1 for a decomposition that is actually in III-H / V-C / Appendix B. |
+
+## 3. Appendix B Path Verification
+
+I checked every Appendix B artifact path directly against the filesystem. Rebased to `/Volumes/NV2/PDF-Processing/signature-analysis/`, all listed artifacts exist:
+
+| Appendix B artifact | Exists? |
+|---|---|
+| `reports/extraction_methodology.md` | Yes |
+| `reports/pdf_signature_verdicts.json` | Yes |
+| `reports/formal_statistical_data.json` | Yes |
+| `reports/formal_statistical_report.md` | Yes |
+| `reports/dip_test/dip_test_results.json` | Yes |
+| `reports/beta_mixture/beta_mixture_results.json` | Yes |
+| `reports/bd_sensitivity/bd_sensitivity.json` | Yes |
+| `reports/pixel_validation/pixel_validation_results.json` | Yes |
+| `reports/validation_recalibration/validation_recalibration.json` | Yes |
+| `reports/expanded_validation/expanded_validation_results.json` | Yes |
+| `reports/accountant_similarity_analysis.json` | Yes |
+| `reports/figures/` | Yes |
+| `reports/partner_ranking/partner_ranking_results.json` | Yes |
+| `reports/intra_report/intra_report_results.json` | Yes |
+| `reports/pdf_signature_verdict_report.md` | Yes |
+| `ablation/ablation_results.json` | Yes |
+| `reports/byte_identity_decomp/byte_identity_decomposition.json` | Yes |
+
+The path replacements are real. The only caveat is semantic rather than filesystem-level: Table XIII is described as "derived from `reports/accountant_similarity_analysis.json` filtered to Firm A; figures in `reports/figures/`." That is acceptable as provenance if the supplement documents the filter/query used for the table.
+
+## 4. Empirical-Claim Audit
+
+I focused on claims introduced or changed by v3.18.3.
+
+**Verified**
+
+- Appendix B path replacements exist in the actual report tree.
+- `reports/byte_identity_decomp/byte_identity_decomposition.json` exists and reports:
+  - Firm A byte-identical signatures: `145`
+  - distinct Firm A partners: `50`
+  - registered Firm A partners: `180`
+  - cross-year byte-identical matches: `35`
+- The same JSON reports cross-firm dual convergence:
+  - Firm A: `49,388 / 55,921 = 88.32%`
+  - Non-Firm-A: `27,596 / 65,515 = 42.12%`
+- `validation_recalibration.json` reports Table IX's Firm A `cos > 0.95` count as `55,922 / 60,448 = 92.51%`.
+
+**New / Incorrect**
+
+- The new Results IV-H.2 reconciliation note says the `55,921` vs `55,922` discrepancy comes from successive snapshots and one borderline Firm A signature shifting from `cos > 0.95` to `cos = 0.95...` at floating-point precision. I could not reproduce that explanation.
+- Direct SQLite checks on the current database show:
+  - Firm A by `accountants.firm`, `cos > 0.95`: `55,922`
+  - Firm A by `signatures.excel_firm`, `cos > 0.95`: `55,921`
+  - exactly one `cos > 0.95` signature has `accountants.firm = Firm A` but `signatures.excel_firm != Firm A`.
+- The discrepant row I saw was `signature_id = 37768`, `assigned_accountant = 徐文亞`, `excel_firm = 黃毅民`, `max_similarity_to_same_accountant = 0.978511691093445`, `min_dhash_independent = 0`. That is not a `cos = 0.95...` borderline case.
+
+The corrected explanation should be along the lines of: Table IX uses accountant-registry Firm A membership, while script 28's cross-firm decomposition uses the `excel_firm` field; one above-threshold signature differs between those two firm-attribution fields. Alternatively, change script 28 to use the same `accountants.firm` join as the validation artifacts and regenerate the JSON.
+
+**Still only partially supported**
+
+- YOLO validation metrics, VLM prompt/settings, HSV red-removal thresholds, and 43.1 docs/sec throughput remain method claims without visible log/config artifacts in the inspected report tree.
+- The two Firm A CPAs excluded from the held-out split due to disambiguation ties remain plausible but not directly documented in a report field.
+- The 15 document types / 86.4% standard audit-report breakdown remains plausible but was not traced to a packaged table.
+
+## 5. Methodological + Narrative Discipline
+
+The narrative is materially cleaner than v3.18.2. The manuscript now keeps the central inference where it belongs: the evidence supports a replication-dominated calibration population and a continuous similarity-quality spectrum, not a directly observed signing workflow or a clean two-mechanism mixture.
+
+The remaining narrative issues are narrow:
+
+1. **Fix the new count-reconciliation note.** The current note is too specific and appears empirically false. Do not invoke successive snapshots or a floating-point boundary shift unless that can be shown from archived artifacts. The current evidence points to a firm-attribution-field mismatch.
+
+2. **Clarify Firm A membership consistently.** Several scripts use `accountants.firm`; script 28 uses `signatures.excel_firm`. Both may be defensible for different questions, but the paper must state which field defines Firm A in each table or harmonize the scripts.
+
+3. **Remove or soften remaining "known-majority-positive" phrasing.** The term appears in the Introduction, Methodology, Discussion, and Conclusion. The paper's better phrase is "replication-dominated reference population." "Known" still implies external ground truth stronger than the paper can document.
+
+4. **Correct the auditor-year / cross-year pooling description.** Methodology III-G says the auditor-year ranking is a "deliberately within-year aggregation that avoids cross-year pooling." But the same section and Results IV-G.2 state that each signature's best match is computed against the full same-CPA cross-year pool. The aggregation is by auditor-year, but the underlying similarity statistic is cross-year. Replace "avoids cross-year pooling" with "aggregates signatures within each auditor-year while using the full same-CPA pool for each signature's best-match statistic."
+
+5. **Align the byte-decomposition section reference.** If the `145/50/180/35` decomposition is meant to be a Results claim, put a sentence in IV-F.1 or cite Appendix B directly. As written, Section IV-F.1 reports the 310 all-sample byte-identical signatures, not the Firm A decomposition.
+
+## 6. IEEE Access Fit
+
+The paper remains a good IEEE Access fit. It is application-driven, computationally substantial, and methodologically relevant to document forensics, audit analytics, and computer vision. The contribution is not a novel neural architecture; it is a defensible calibration and validation strategy for a large archival corpus with limited ground truth.
+
+The remaining problems are reproducibility/provenance polish, not a collapse of the empirical core. Still, IEEE Access reviewers may scrutinize the supplement and table provenance. v3.18.3's Appendix B is now much stronger, but the newly added reconciliation note should be corrected because it is exactly the kind of precise provenance statement that reviewers can audit.
+
+## 7. Specific Actionable Revisions
+
+1. Replace the IV-H.2 `55,921` vs `55,922` explanation. Either:
+   - harmonize script 28 to use `accountants.firm` like `validation_recalibration.py` and regenerate the byte-decomposition JSON; or
+   - keep the current script 28 output and state that the one-record difference arises from `accountants.firm` versus `signatures.excel_firm` Firm A attribution.
+
+2. Add a short note in Appendix B or the script 28 report defining the Firm A grouping field for each artifact.
+
+3. Replace "known-majority-positive" with "replication-dominated" or "candidate replication-dominated" unless an external citation/ground-truth source is supplied.
+
+4. Revise Methodology III-G's auditor-year sentence so it does not claim the ranking avoids cross-year pooling.
+
+5. Add the `145/50/180/35` Firm A byte-decomposition sentence to Results IV-F.1, or cite Appendix B directly instead of Section IV-F.1 when discussing that decomposition.
+
+6. If time permits before submission, include supplementary logs/configs for YOLO metrics, VLM prompt/settings, HSV thresholds, and throughput. These are not central-result blockers, but they would strengthen the reproducibility package.
+
+Bottom line: v3.18.3 successfully fixes the fabricated Appendix B paths and most narrative overclaim from round 17. The manuscript should not be accepted until the new count-reconciliation explanation and the auditor-year pooling wording are corrected, but the required changes are small and localized.
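
Note on the Section 4 finding: the firm-attribution check that the review describes (Firm A counted by `accountants.firm` versus by `signatures.excel_firm`) could be reproduced along the following lines. This is only a minimal sketch on toy data, not the queries the reviewer actually ran. The column names are taken from the review text, but the table layout, the join key (`accountants.name = signatures.assigned_accountant`), the `"Firm A"` label, and all rows are assumptions made for illustration.

```python
import sqlite3

FIRM_A = "Firm A"  # placeholder label; the real firm identifier is an assumption
THRESHOLD = 0.95   # cosine cutoff quoted in the review


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with an assumed, simplified schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE accountants (name TEXT PRIMARY KEY, firm TEXT);
        CREATE TABLE signatures (
            signature_id INTEGER PRIMARY KEY,
            assigned_accountant TEXT,
            excel_firm TEXT,
            max_similarity_to_same_accountant REAL
        );
        """
    )
    con.executemany(
        "INSERT INTO accountants VALUES (?, ?)",
        [("cpa_1", "Firm A"), ("cpa_2", "Firm A"), ("cpa_3", "Firm B")],
    )
    con.executemany(
        "INSERT INTO signatures VALUES (?, ?, ?, ?)",
        [
            (1, "cpa_1", "Firm A", 0.99),
            (2, "cpa_1", "Firm A", 0.90),  # below the cutoff, counted by neither query
            (3, "cpa_2", "Firm B", 0.98),  # registry says Firm A, excel_firm disagrees
            (4, "cpa_3", "Firm B", 0.97),
        ],
    )
    return con


def count_by_registry(con: sqlite3.Connection) -> int:
    """Count above-cutoff signatures whose accountant is registered at Firm A."""
    sql = """
        SELECT COUNT(*) FROM signatures AS s
        JOIN accountants AS a ON a.name = s.assigned_accountant
        WHERE a.firm = ? AND s.max_similarity_to_same_accountant > ?
    """
    return con.execute(sql, (FIRM_A, THRESHOLD)).fetchone()[0]


def count_by_excel_firm(con: sqlite3.Connection) -> int:
    """Count above-cutoff signatures whose excel_firm field is Firm A."""
    sql = """
        SELECT COUNT(*) FROM signatures
        WHERE excel_firm = ? AND max_similarity_to_same_accountant > ?
    """
    return con.execute(sql, (FIRM_A, THRESHOLD)).fetchone()[0]


def mismatched_ids(con: sqlite3.Connection) -> list[int]:
    """List above-cutoff signatures where the two attribution fields disagree."""
    sql = """
        SELECT s.signature_id FROM signatures AS s
        JOIN accountants AS a ON a.name = s.assigned_accountant
        WHERE a.firm = ?
          AND s.excel_firm IS NOT ?
          AND s.max_similarity_to_same_accountant > ?
        ORDER BY s.signature_id
    """
    return [row[0] for row in con.execute(sql, (FIRM_A, FIRM_A, THRESHOLD))]


if __name__ == "__main__":
    con = build_toy_db()
    print(count_by_registry(con), count_by_excel_firm(con), mismatched_ids(con))
```

On the toy rows the two counts differ by one and the third query isolates the row responsible, which is the same pattern the review reports for the `55,922` versus `55,921` counts. `IS NOT` is used instead of `!=` so that a NULL `excel_firm` would also be flagged as a mismatch.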