Add codex GPT-5.5 round-18 independent peer review artifact

paper/codex_review_gpt55_v3_18_3.md: 12.5 KB / 128 lines. Codex re-audited v3.18.3 against its own round-17 review, the live filesystem (verified that all 17 Appendix B paths exist), and the SQLite database. Verdict: Minor Revision. The round-18 finding is that the v3.18.3 reconciliation note for 55,921 vs 55,922 is empirically false: a DB query shows the cause is an accountants.firm vs signatures.excel_firm field mismatch, not floating-point/snapshot drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:59:07 +08:00
parent 6b64eabbfb
commit 1e37d344ea
1 changed files with 127 additions and 0 deletions
@@ -0,0 +1,127 @@
 # Independent Peer Review (Round 18) - Paper A v3.18.3
 Reviewer role: independent peer reviewer for IEEE Access Regular Paper.
 Manuscript reviewed: "Replication-Dominated Calibration" - CPA signature analysis, v3.18.3, commits `f1c2537` + `26b934c` on `yolo-signature-pipeline`.
 Audit basis: manuscript sections under `paper/`, prior round-16 and round-17 reviews, scripts under `signature_analysis/`, the current SQLite/report artifacts under `/Volumes/NV2/PDF-Processing/signature-analysis/`, and direct filesystem checks of Appendix B paths.
 ## 1. Overall Verdict: Minor Revision
 I recommend **Minor Revision**, not Accept.
 v3.18.3 resolves the main round-17 provenance problem: the four fabricated Appendix B paths have been replaced with paths that exist in the available report tree, and the manuscript now explicitly states the local report root (`/Volumes/NV2/PDF-Processing/signature-analysis/`) plus the fact that the ablation artifact is a sibling of `reports/`. The prior "single dominant mechanism" wording is also removed from the main Methodology/Discussion passages, and the mistaken "p = 0.17 at n >= 10 signatures" parenthetical is fixed.
 However, the new reconciliation note for the `55,921` vs `55,922` Firm A cosine-only counts is not supported by the current artifacts. The manuscript attributes the one-record difference to successive database snapshots and a downstream floating-point shift of one borderline Firm A signature. Direct database checks indicate a different cause: Table IX is based on Firm A membership from `accountants.firm`, whereas `signature_analysis/28_byte_identity_decomposition.py` groups Firm A by `signatures.excel_firm`. In the current database, one signature above `cos > 0.95` belongs to an accountant whose registry firm is Firm A but whose `excel_firm` field is not Firm A. Thus the new note fixes the arithmetic discrepancy but introduces a false provenance explanation.
 This is Minor rather than Major because the one-record difference has negligible numerical effect and does not overturn the central findings. It should still be corrected before submission because v3.18.3 was specifically intended to repair provenance discipline.
 ## 2. Re-audit of Round-17 Findings
 | Round-17 finding | v3.18.3 status | Re-audit notes |
 |---|---|---|
 | Appendix B provenance paths overclaimed / several did not exist | **RESOLVED** | All listed Appendix B report artifacts now exist when rebased to `/Volumes/NV2/PDF-Processing/signature-analysis/`. The replacement paths for formal statistics, Firm A per-year data, PDF verdicts, ablation, and byte decomposition are real. |
 | Residual "single dominant mechanism" wording | **RESOLVED enough** | The exact phrase is gone from Methodology III-H and Discussion V-C. Current wording uses "dominant high-similarity regime plus residual within-firm heterogeneity," which is more defensible. |
 | III-H "p = 0.17 at n >= 10 signatures" parenthetical | **RESOLVED** | The current text correctly reports the signature-level dip result as `p = 0.17`, `N = 60,448` Firm A signatures. The `n >= 10` filter is no longer attached to that claim. |
 | "Widely recognized / widely held" practitioner wording | **RESOLVED enough** | Introduction now frames Firm A as selected by practitioner-knowledge motivation and evaluated by image evidence. III-H says "is understood within the audit profession" but immediately marks this as non-load-bearing. A citation would still be cleaner, but this is no longer a submission blocker. |
 | 55,921 vs 55,922 Firm A cosine-only count discrepancy | **PARTIAL / NEW ERROR** | The manuscript now acknowledges the discrepancy, but the explanation appears wrong. Current DB evidence points to different Firm A attribution fields (`accountants.firm` vs `signatures.excel_firm`), not a snapshot/floating-point shift. |
 | Still-unverifiable operational details: YOLO logs, VLM prompt/config, HSV thresholds, throughput log | **UNRESOLVED but not new** | These remain plausible method claims, but I did not find dedicated artifacts establishing them. This is acceptable for main-paper review only if the supplement includes training/config/runtime logs. |
 | Section reference for `145/50/180/35` byte decomposition | **PARTIAL** | Appendix B now maps the decomposition to script 28, but the main results Section IV-F.1 still reports only the all-sample 310 byte-identical signatures, not the Firm A `145/50/180/35` decomposition. Several locations still cite Section IV-F.1 for a decomposition that is actually in III-H / V-C / Appendix B. |
 ## 3. Appendix B Path Verification
 I checked every Appendix B artifact path directly against the filesystem. Rebased to `/Volumes/NV2/PDF-Processing/signature-analysis/`, all listed artifacts exist:
 | Appendix B artifact | Exists? |
 |---|---|
 | `reports/extraction_methodology.md` | Yes |
 | `reports/pdf_signature_verdicts.json` | Yes |
 | `reports/formal_statistical_data.json` | Yes |
 | `reports/formal_statistical_report.md` | Yes |
 | `reports/dip_test/dip_test_results.json` | Yes |
 | `reports/beta_mixture/beta_mixture_results.json` | Yes |
 | `reports/bd_sensitivity/bd_sensitivity.json` | Yes |
 | `reports/pixel_validation/pixel_validation_results.json` | Yes |
 | `reports/validation_recalibration/validation_recalibration.json` | Yes |
 | `reports/expanded_validation/expanded_validation_results.json` | Yes |
 | `reports/accountant_similarity_analysis.json` | Yes |
 | `reports/figures/` | Yes |
 | `reports/partner_ranking/partner_ranking_results.json` | Yes |
 | `reports/intra_report/intra_report_results.json` | Yes |
 | `reports/pdf_signature_verdict_report.md` | Yes |
 | `ablation/ablation_results.json` | Yes |
 | `reports/byte_identity_decomp/byte_identity_decomposition.json` | Yes |
 The path replacements are real. The only caveat is semantic rather than filesystem-level: Table XIII is described as "derived from `reports/accountant_similarity_analysis.json` filtered to Firm A; figures in `reports/figures/`." That is acceptable as provenance if the supplement documents the filter/query used for the table.
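A check of this kind can be scripted in a few lines. The following is a minimal sketch, assuming the report root stated in the manuscript; `missing_artifacts` is an illustrative helper name, not a function from the `signature_analysis/` scripts.

```python
from pathlib import Path

# Relative artifact paths, copied from the Appendix B table above.
APPENDIX_B_ARTIFACTS = [
    "reports/extraction_methodology.md",
    "reports/pdf_signature_verdicts.json",
    "reports/formal_statistical_data.json",
    "reports/formal_statistical_report.md",
    "reports/dip_test/dip_test_results.json",
    "reports/beta_mixture/beta_mixture_results.json",
    "reports/bd_sensitivity/bd_sensitivity.json",
    "reports/pixel_validation/pixel_validation_results.json",
    "reports/validation_recalibration/validation_recalibration.json",
    "reports/expanded_validation/expanded_validation_results.json",
    "reports/accountant_similarity_analysis.json",
    "reports/figures/",
    "reports/partner_ranking/partner_ranking_results.json",
    "reports/intra_report/intra_report_results.json",
    "reports/pdf_signature_verdict_report.md",
    "ablation/ablation_results.json",
    "reports/byte_identity_decomp/byte_identity_decomposition.json",
]


def missing_artifacts(root, relative_paths):
    """Return the relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


if __name__ == "__main__":
    # Report root as stated in the manuscript; adjust on another machine.
    root = "/Volumes/NV2/PDF-Processing/signature-analysis/"
    missing = missing_artifacts(root, APPENDIX_B_ARTIFACTS)
    found = len(APPENDIX_B_ARTIFACTS) - len(missing)
    print(f"{found} of {len(APPENDIX_B_ARTIFACTS)} artifacts found")
    for rel in missing:
        print("MISSING:", rel)
```

An existence check of this kind only confirms that the files are present; it says nothing about whether their contents match the tables that cite them.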
 ## 4. Empirical-Claim Audit
 I focused on claims introduced or changed by v3.18.3.
 **Verified**
 - Appendix B path replacements exist in the actual report tree.
 - `reports/byte_identity_decomp/byte_identity_decomposition.json` exists and reports:
  - Firm A byte-identical signatures: `145`
  - distinct Firm A partners: `50`
  - registered Firm A partners: `180`
  - cross-year byte-identical matches: `35`
 - The same JSON reports cross-firm dual convergence:
  - Firm A: `49,388 / 55,921 = 88.32%`
  - Non-Firm-A: `27,596 / 65,515 = 42.12%`
 - `validation_recalibration.json` reports Table IX's Firm A `cos > 0.95` count as `55,922 / 60,448 = 92.51%`.
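The quoted percentages follow from the quoted counts. A short arithmetic check, using only the numbers listed above (the dictionary labels are illustrative):

```python
# Numerator/denominator pairs copied from the two JSON artifacts quoted above.
RATIOS = {
    "Firm A dual convergence": (49_388, 55_921),
    "Non-Firm-A dual convergence": (27_596, 65_515),
    "Table IX Firm A cos > 0.95": (55_922, 60_448),
}


def percent(numerator, denominator):
    """Percentage rounded to two decimals, as reported in the artifacts."""
    return round(100 * numerator / denominator, 2)


for label, (num, den) in RATIOS.items():
    print(f"{label}: {num:,} / {den:,} = {percent(num, den)}%")
```

The first and third rows use different Firm A counts (`55,921` as a denominator, `55,922` as a numerator), which is the one-record discrepancy examined next.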
 **New / Incorrect**
 - The new Results IV-H.2 reconciliation note says the `55,921` vs `55,922` discrepancy comes from successive snapshots and one borderline Firm A signature shifting from `cos > 0.95` to `cos = 0.95...` at floating-point precision. I could not reproduce that explanation.
 - Direct SQLite checks on the current database show:
  - Firm A by `accountants.firm`, `cos > 0.95`: `55,922`
  - Firm A by `signatures.excel_firm`, `cos > 0.95`: `55,921`
  - exactly one `cos > 0.95` signature has `accountants.firm = Firm A` but `signatures.excel_firm != Firm A`.
 - The discrepant row I saw was `signature_id = 37768`, `assigned_accountant = 徐文亞`, `excel_firm = 黃毅民`, `max_similarity_to_same_accountant = 0.978511691093445`, `min_dhash_independent = 0`. That is not a `cos = 0.95...` borderline case.
 The corrected explanation should be along the lines of: Table IX uses accountant-registry Firm A membership, while script 28's cross-firm decomposition uses the `excel_firm` field; one above-threshold signature differs between those two firm-attribution fields. Alternatively, change script 28 to use the same `accountants.firm` join as the validation artifacts and regenerate the JSON.
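The two counting rules can be compared side by side. The sketch below is a toy illustration on an in-memory database, not the query run for this review: the table and column names `accountants.firm`, `signatures.excel_firm`, `signature_id`, `assigned_accountant`, and `max_similarity_to_same_accountant` are those named above, while the `accountants.name` join key, the helper names, and all rows are assumptions made for the example.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory database with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accountants (name TEXT PRIMARY KEY, firm TEXT);
        CREATE TABLE signatures (
            signature_id INTEGER PRIMARY KEY,
            assigned_accountant TEXT,
            excel_firm TEXT,
            max_similarity_to_same_accountant REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO accountants VALUES (?, ?)",
        [("cpa_1", "Firm A"), ("cpa_2", "Firm A"), ("cpa_3", "Firm B")],
    )
    conn.executemany(
        "INSERT INTO signatures VALUES (?, ?, ?, ?)",
        [
            (1, "cpa_1", "Firm A", 0.99),
            (2, "cpa_1", "Firm A", 0.93),   # below threshold
            (3, "cpa_2", "Firm B", 0.978),  # registry says Firm A, excel_firm does not
            (4, "cpa_3", "Firm B", 0.97),
        ],
    )
    return conn


def firm_counts(conn, firm, threshold=0.95):
    """Return (count by registry firm, count by excel_firm, mismatched ids)."""
    by_registry = conn.execute(
        """
        SELECT COUNT(*) FROM signatures s
        JOIN accountants a ON a.name = s.assigned_accountant  -- assumed join key
        WHERE a.firm = ? AND s.max_similarity_to_same_accountant > ?
        """,
        (firm, threshold),
    ).fetchone()[0]
    by_excel = conn.execute(
        """
        SELECT COUNT(*) FROM signatures s
        WHERE s.excel_firm = ? AND s.max_similarity_to_same_accountant > ?
        """,
        (firm, threshold),
    ).fetchone()[0]
    # Note: `!=` skips rows whose excel_firm is NULL; handle those separately.
    mismatched = conn.execute(
        """
        SELECT s.signature_id FROM signatures s
        JOIN accountants a ON a.name = s.assigned_accountant
        WHERE a.firm = ? AND s.excel_firm != ?
          AND s.max_similarity_to_same_accountant > ?
        ORDER BY s.signature_id
        """,
        (firm, firm, threshold),
    ).fetchall()
    return by_registry, by_excel, [row[0] for row in mismatched]


if __name__ == "__main__":
    print(firm_counts(build_toy_db(), "Firm A"))
```

On the toy rows the registry-based count exceeds the `excel_firm`-based count by exactly the one mismatched signature, which mirrors the pattern described above. On real data the reverse mismatch (`excel_firm` is Firm A but the registry firm is not) should be checked as well, since the net difference between the two counts depends on both directions.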
 **Still only partially supported**
 - YOLO validation metrics, VLM prompt/settings, HSV red-removal thresholds, and 43.1 docs/sec throughput remain method claims without visible log/config artifacts in the inspected report tree.
 - The two Firm A CPAs excluded from the held-out split due to disambiguation ties remain plausible but not directly documented in a report field.
 - The 15 document types / 86.4% standard audit-report breakdown remains plausible but was not traced to a packaged table.
 ## 5. Methodological + Narrative Discipline
 The narrative is materially cleaner than v3.18.2. The manuscript now keeps the central inference where it belongs: the evidence supports a replication-dominated calibration population and a continuous similarity-quality spectrum, not a directly observed signing workflow or a clean two-mechanism mixture.
 The remaining narrative issues are narrow:
 1. **Fix the new count-reconciliation note.** The current note is too specific and appears empirically false. Do not invoke successive snapshots or a floating-point boundary shift unless that can be shown from archived artifacts. The current evidence points to a firm-attribution-field mismatch.
 2. **Clarify Firm A membership consistently.** Several scripts use `accountants.firm`; script 28 uses `signatures.excel_firm`. Both may be defensible for different questions, but the paper must state which field defines Firm A in each table or harmonize the scripts.
 3. **Remove or soften remaining "known-majority-positive" phrasing.** The term appears in the Introduction, Methodology, Discussion, and Conclusion. The paper's better phrase is "replication-dominated reference population." "Known" still implies external ground truth stronger than the paper can document.
 4. **Correct the auditor-year / cross-year pooling description.** Methodology III-G says the auditor-year ranking is a "deliberately within-year aggregation that avoids cross-year pooling." But the same section and Results IV-G.2 state that each signature's best match is computed against the full same-CPA cross-year pool. The aggregation is by auditor-year, but the underlying similarity statistic is cross-year. Replace "avoids cross-year pooling" with "aggregates signatures within each auditor-year while using the full same-CPA pool for each signature's best-match statistic."
 5. **Align the byte-decomposition section reference.** If the `145/50/180/35` decomposition is meant to be a Results claim, put a sentence in IV-F.1 or cite Appendix B directly. As written, Section IV-F.1 reports the 310 all-sample byte-identical signatures, not the Firm A decomposition.
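The distinction in item 4 is easier to see on a toy case. The sketch below is illustrative only: the two-dimensional vectors, the function names, and the use of a mean as the per-auditor-year summary are assumptions, not the manuscript's actual features or ranking statistic. It shows how a statistic can be grouped by auditor-year while each signature's best match is still drawn from the full same-CPA pool.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def best_match_per_signature(records):
    """For each (cpa, year, vector) record, return the best cosine against
    every other record of the same CPA, regardless of year."""
    best = []
    for i, (cpa, _, vec) in enumerate(records):
        pool = [
            cosine(vec, other)
            for j, (other_cpa, _, other) in enumerate(records)
            if other_cpa == cpa and j != i
        ]
        best.append(max(pool) if pool else None)
    return best


def auditor_year_means(records):
    """Aggregate the cross-year best-match values within each auditor-year."""
    grouped = defaultdict(list)
    for (cpa, year, _), value in zip(records, best_match_per_signature(records)):
        if value is not None:
            grouped[(cpa, year)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


if __name__ == "__main__":
    # Hypothetical records: one signature in 2019, two in 2020.
    toy = [
        ("cpa_1", 2019, (1.0, 0.0)),
        ("cpa_1", 2020, (1.0, 0.0)),
        ("cpa_1", 2020, (0.0, 1.0)),
    ]
    print(auditor_year_means(toy))
```

In the toy data the single 2019 signature has no within-year partner, yet it still receives a best match of 1.0 from a 2020 signature. The grouping is within-year; the underlying similarity statistic is not.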
 ## 6. IEEE Access Fit
 The paper remains a good IEEE Access fit. It is application-driven, computationally substantial, and methodologically relevant to document forensics, audit analytics, and computer vision. The contribution is not a novel neural architecture; it is a defensible calibration and validation strategy for a large archival corpus with limited ground truth.
 The remaining problems are reproducibility/provenance polish, not a collapse of the empirical core. Still, IEEE Access reviewers may scrutinize the supplement and table provenance. v3.18.3's Appendix B is now much stronger, but the newly added reconciliation note should be corrected because it is exactly the kind of precise provenance statement that reviewers can audit.
 ## 7. Specific Actionable Revisions
 1. Replace the IV-H.2 `55,921` vs `55,922` explanation. Either:
   - harmonize script 28 to use `accountants.firm` like `validation_recalibration.py` and regenerate the byte-decomposition JSON; or
   - keep the current script 28 output and state that the one-record difference arises from `accountants.firm` versus `signatures.excel_firm` Firm A attribution.
 2. Add a short note in Appendix B or the script 28 report defining the Firm A grouping field for each artifact.
 3. Replace "known-majority-positive" with "replication-dominated" or "candidate replication-dominated" unless an external citation/ground-truth source is supplied.
 4. Revise Methodology III-G's auditor-year sentence so it does not claim the ranking avoids cross-year pooling.
 5. Add the `145/50/180/35` Firm A byte-decomposition sentence to Results IV-F.1, or cite Appendix B directly instead of Section IV-F.1 when discussing that decomposition.
 6. If time permits before submission, include supplementary logs/configs for YOLO metrics, VLM prompt/settings, HSV thresholds, and throughput. These are not central-result blockers, but they would strengthen the reproducibility package.
 Bottom line: v3.18.3 successfully fixes the fabricated Appendix B paths and most of the narrative overclaiming from round 17. The manuscript should not be accepted until the new count-reconciliation explanation and the auditor-year pooling wording are corrected, but the required changes are small and localized.