pdf_signature_extraction/paper/codex_review_gpt55_v3_18_3.md
gbanyan 1e37d344ea Add codex GPT-5.5 round-18 independent peer review artifact
paper/codex_review_gpt55_v3_18_3.md: 12.5 KB / 128 lines. Codex re-audited
v3.18.3 against its own round-17 review, the live filesystem (verified
all 17 Appendix B paths exist), and the SQLite database. Verdict: Minor
Revision; the round-18 finding was that the v3.18.3 reconciliation note
for 55,921 vs 55,922 was empirically false (DB query showed the cause
was accountants.firm vs signatures.excel_firm field mismatch, not
floating-point/snapshot drift).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:59:07 +08:00


Independent Peer Review (Round 18) - Paper A v3.18.3

Reviewer role: independent peer reviewer for IEEE Access Regular Paper.

Manuscript reviewed: "Replication-Dominated Calibration" - CPA signature analysis, v3.18.3, commits f1c2537 + 26b934c on yolo-signature-pipeline.

Audit basis: manuscript sections under paper/, prior round-16 and round-17 reviews, scripts under signature_analysis/, the current SQLite/report artifacts under /Volumes/NV2/PDF-Processing/signature-analysis/, and direct filesystem checks of Appendix B paths.

1. Overall Verdict: Minor Revision

I recommend Minor Revision, not Accept.

v3.18.3 resolves the main round-17 provenance problem: the four fabricated Appendix B paths have been replaced with paths that exist in the available report tree, and the manuscript now explicitly states the local report root (/Volumes/NV2/PDF-Processing/signature-analysis/) plus the fact that the ablation artifact is a sibling of reports/. The prior "single dominant mechanism" wording is also removed from the main Methodology/Discussion passages, and the mistaken "p = 0.17 at n >= 10 signatures" parenthetical is fixed.

However, the new reconciliation note for the 55,921 vs 55,922 Firm A cosine-only counts is not supported by the current artifacts. The manuscript attributes the one-record difference to successive database snapshots and a downstream floating-point shift of one borderline Firm A signature. Direct database checks indicate a different cause: Table IX is based on Firm A membership from accountants.firm, whereas signature_analysis/28_byte_identity_decomposition.py groups Firm A by signatures.excel_firm. In the current database, one signature above cos > 0.95 belongs to an accountant whose registry firm is Firm A but whose excel_firm field is not Firm A. Thus the new note fixes the arithmetic discrepancy but introduces a false provenance explanation.

This is Minor rather than Major because the one-record difference has negligible numerical effect and does not overturn the central findings. It should still be corrected before submission because v3.18.3 was specifically intended to repair provenance discipline.

2. Re-audit of Round-17 Findings

| Round-17 finding | v3.18.3 status | Re-audit notes |
| --- | --- | --- |
| Appendix B provenance paths overclaimed / several did not exist | RESOLVED | All listed Appendix B report artifacts now exist when rebased to /Volumes/NV2/PDF-Processing/signature-analysis/. The replacement paths for formal statistics, Firm A per-year data, PDF verdicts, ablation, and byte decomposition are real. |
| Residual "single dominant mechanism" wording | RESOLVED enough | The exact phrase is gone from Methodology III-H and Discussion V-C. Current wording uses "dominant high-similarity regime plus residual within-firm heterogeneity," which is more defensible. |
| III-H "p = 0.17 at n >= 10 signatures" parenthetical | RESOLVED | The current text correctly reports the signature-level dip result as p = 0.17, N = 60,448 Firm A signatures. The n >= 10 filter is no longer attached to that claim. |
| "Widely recognized / widely held" practitioner wording | RESOLVED enough | Introduction now frames Firm A as selected by practitioner-knowledge motivation and evaluated by image evidence. III-H says "is understood within the audit profession" but immediately marks this as non-load-bearing. A citation would still be cleaner, but this is no longer a submission blocker. |
| 55,921 vs 55,922 Firm A cosine-only count discrepancy | PARTIAL / NEW ERROR | The manuscript now acknowledges the discrepancy, but the explanation appears wrong. Current DB evidence points to different Firm A attribution fields (accountants.firm vs signatures.excel_firm), not a snapshot/floating-point shift. |
| Still-unverifiable operational details: YOLO logs, VLM prompt/config, HSV thresholds, throughput log | UNRESOLVED but not new | These remain plausible method claims, but I did not find dedicated artifacts establishing them. This is acceptable for main-paper review only if the supplement includes training/config/runtime logs. |
| Section reference for 145/50/180/35 byte decomposition | PARTIAL | Appendix B now maps the decomposition to script 28, but the main results Section IV-F.1 still reports only the all-sample 310 byte-identical signatures, not the Firm A 145/50/180/35 decomposition. Several locations still cite Section IV-F.1 for a decomposition that is actually in III-H / V-C / Appendix B. |

3. Appendix B Path Verification

I checked every Appendix B artifact path directly against the filesystem. Rebased to /Volumes/NV2/PDF-Processing/signature-analysis/, all listed artifacts exist:

| Appendix B artifact | Exists? |
| --- | --- |
| reports/extraction_methodology.md | Yes |
| reports/pdf_signature_verdicts.json | Yes |
| reports/formal_statistical_data.json | Yes |
| reports/formal_statistical_report.md | Yes |
| reports/dip_test/dip_test_results.json | Yes |
| reports/beta_mixture/beta_mixture_results.json | Yes |
| reports/bd_sensitivity/bd_sensitivity.json | Yes |
| reports/pixel_validation/pixel_validation_results.json | Yes |
| reports/validation_recalibration/validation_recalibration.json | Yes |
| reports/expanded_validation/expanded_validation_results.json | Yes |
| reports/accountant_similarity_analysis.json | Yes |
| reports/figures/ | Yes |
| reports/partner_ranking/partner_ranking_results.json | Yes |
| reports/intra_report/intra_report_results.json | Yes |
| reports/pdf_signature_verdict_report.md | Yes |
| ablation/ablation_results.json | Yes |
| reports/byte_identity_decomp/byte_identity_decomposition.json | Yes |

The path replacements are real. The only caveat is semantic rather than filesystem-level: Table XIII is described as "derived from reports/accountant_similarity_analysis.json filtered to Firm A; figures in reports/figures/." That is acceptable as provenance if the supplement documents the filter/query used for the table.
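The filesystem check described in this section can be repeated with a few lines of Python. This is a minimal sketch, not the procedure actually used for this review: the helper name `missing_artifacts` is hypothetical, and the relative paths are simply copied from the Appendix B listing above.

```python
from pathlib import Path

# Relative paths copied from the Appendix B listing in this review.
APPENDIX_B_ARTIFACTS = [
    "reports/extraction_methodology.md",
    "reports/pdf_signature_verdicts.json",
    "reports/formal_statistical_data.json",
    "reports/formal_statistical_report.md",
    "reports/dip_test/dip_test_results.json",
    "reports/beta_mixture/beta_mixture_results.json",
    "reports/bd_sensitivity/bd_sensitivity.json",
    "reports/pixel_validation/pixel_validation_results.json",
    "reports/validation_recalibration/validation_recalibration.json",
    "reports/expanded_validation/expanded_validation_results.json",
    "reports/accountant_similarity_analysis.json",
    "reports/figures/",
    "reports/partner_ranking/partner_ranking_results.json",
    "reports/intra_report/intra_report_results.json",
    "reports/pdf_signature_verdict_report.md",
    "ablation/ablation_results.json",
    "reports/byte_identity_decomp/byte_identity_decomposition.json",
]


def missing_artifacts(root, relative_paths):
    """Return the relative paths that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


if __name__ == "__main__":
    # Report root as stated in the manuscript; adjust for another machine.
    root = "/Volumes/NV2/PDF-Processing/signature-analysis/"
    missing = missing_artifacts(root, APPENDIX_B_ARTIFACTS)
    found = len(APPENDIX_B_ARTIFACTS) - len(missing)
    print(f"{found} of {len(APPENDIX_B_ARTIFACTS)} artifacts found under {root}")
    for rel in missing:
        print("MISSING:", rel)
```

An existence check of this kind confirms only that the paths resolve. It says nothing about whether each file contains the numbers attributed to it, which is the semantic caveat noted above for Table XIII.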

4. Empirical-Claim Audit

I focused on claims introduced or changed by v3.18.3.

Verified

  • Appendix B path replacements exist in the actual report tree.
  • reports/byte_identity_decomp/byte_identity_decomposition.json exists and reports:
    • Firm A byte-identical signatures: 145
    • distinct Firm A partners: 50
    • registered Firm A partners: 180
    • cross-year byte-identical matches: 35
  • The same JSON reports cross-firm dual convergence:
    • Firm A: 49,388 / 55,921 = 88.32%
    • Non-Firm-A: 27,596 / 65,515 = 42.12%
  • validation_recalibration.json reports Table IX's Firm A cos > 0.95 count as 55,922 / 60,448 = 92.51%.
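The percentages in the verified items can be recomputed directly from the quoted counts. The sketch below hard-codes the numbers from this review rather than reading the JSON artifacts, and the function names are illustrative only.

```python
# Counts and stated percentages copied from the review text; the real JSON
# files are not read here.
REPORTED = {
    "Firm A dual convergence": (49_388, 55_921, 88.32),
    "Non-Firm-A dual convergence": (27_596, 65_515, 42.12),
    "Table IX Firm A cos > 0.95": (55_922, 60_448, 92.51),
}


def percent(numerator, denominator):
    """Percentage rounded to two decimals, matching the reporting precision."""
    return round(100.0 * numerator / denominator, 2)


def check_reported(reported):
    """Return {label: recomputed_percent} for entries that disagree with the stated value."""
    mismatches = {}
    for label, (num, den, stated) in reported.items():
        recomputed = percent(num, den)
        if abs(recomputed - stated) > 0.005:
            mismatches[label] = recomputed
    return mismatches


print(check_reported(REPORTED))
```

An empty dictionary means every stated percentage matches its ratio at two-decimal precision. The check also shows that the 55,921 and 55,922 counts come from different artifacts, which is where the reconciliation problem discussed next begins.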

New / Incorrect

  • The new Results IV-H.2 reconciliation note says the 55,921 vs 55,922 discrepancy comes from successive snapshots and one borderline Firm A signature shifting from cos > 0.95 to cos = 0.95... at floating-point precision. I could not reproduce that explanation.
  • Direct SQLite checks on the current database show:
    • Firm A by accountants.firm, cos > 0.95: 55,922
    • Firm A by signatures.excel_firm, cos > 0.95: 55,921
    • exactly one cos > 0.95 signature has accountants.firm = Firm A but signatures.excel_firm != Firm A.
  • The discrepant row I saw was signature_id = 37768, assigned_accountant = 徐文亞, excel_firm = 黃毅民, max_similarity_to_same_accountant = 0.978511691093445, min_dhash_independent = 0. That is not a cos = 0.95... borderline case.

The corrected explanation should be along the lines of: Table IX uses accountant-registry Firm A membership, while script 28's cross-firm decomposition uses the excel_firm field; one above-threshold signature differs between those two firm-attribution fields. Alternatively, change script 28 to use the same accountants.firm join as the validation artifacts and regenerate the JSON.
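The attribution-field comparison behind this finding can be sketched as below. The table and column names (accountants.firm, signatures.excel_firm, assigned_accountant, max_similarity_to_same_accountant) come from this review. The join key (`accountants.name = signatures.assigned_accountant`), the toy rows, and the function names are assumptions for illustration, not the project's actual schema or queries.

```python
import sqlite3

FIRM_A = "Firm A"  # placeholder label; the real database stores the actual firm name


def build_toy_db():
    """In-memory stand-in for the real database (schema partly assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accountants (name TEXT PRIMARY KEY, firm TEXT);
        CREATE TABLE signatures (
            signature_id INTEGER PRIMARY KEY,
            assigned_accountant TEXT,
            excel_firm TEXT,
            max_similarity_to_same_accountant REAL
        );
    """)
    conn.executemany("INSERT INTO accountants VALUES (?, ?)", [
        ("cpa_1", FIRM_A), ("cpa_2", FIRM_A), ("cpa_3", "Firm B"),
    ])
    conn.executemany("INSERT INTO signatures VALUES (?, ?, ?, ?)", [
        (1, "cpa_1", FIRM_A, 0.99),
        (2, "cpa_1", FIRM_A, 0.93),     # below the 0.95 threshold
        (3, "cpa_2", "Firm B", 0.978),  # registry says Firm A, excel_firm disagrees
        (4, "cpa_3", "Firm B", 0.97),
    ])
    return conn


def count_by_registry(conn, firm, threshold=0.95):
    """Count above-threshold signatures whose accountant's registry firm is `firm`."""
    return conn.execute(
        """SELECT COUNT(*) FROM signatures s
           JOIN accountants a ON a.name = s.assigned_accountant
           WHERE a.firm = ? AND s.max_similarity_to_same_accountant > ?""",
        (firm, threshold),
    ).fetchone()[0]


def count_by_excel_firm(conn, firm, threshold=0.95):
    """Count above-threshold signatures whose excel_firm field is `firm`."""
    return conn.execute(
        """SELECT COUNT(*) FROM signatures
           WHERE excel_firm = ? AND max_similarity_to_same_accountant > ?""",
        (firm, threshold),
    ).fetchone()[0]


def attribution_mismatches(conn, firm, threshold=0.95):
    """List above-threshold rows where the two firm fields disagree (NULLs included)."""
    return conn.execute(
        """SELECT s.signature_id, s.assigned_accountant, s.excel_firm,
                  s.max_similarity_to_same_accountant
           FROM signatures s
           JOIN accountants a ON a.name = s.assigned_accountant
           WHERE a.firm = ? AND s.excel_firm IS NOT ?
             AND s.max_similarity_to_same_accountant > ?
           ORDER BY s.signature_id""",
        (firm, firm, threshold),
    ).fetchall()


conn = build_toy_db()
print(count_by_registry(conn, FIRM_A), count_by_excel_firm(conn, FIRM_A))
print(attribution_mismatches(conn, FIRM_A))
```

On the toy data the registry-based count is 2 and the excel_firm-based count is 1, with a single mismatching row well above the threshold. That is the same pattern as the 55,922 vs 55,921 difference reported above. If the real schema matches these assumptions, the same three queries would either confirm the attribution-field explanation or rule it out.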

Still only partially supported

  • YOLO validation metrics, VLM prompt/settings, HSV red-removal thresholds, and 43.1 docs/sec throughput remain method claims without visible log/config artifacts in the inspected report tree.
  • The two Firm A CPAs excluded from the held-out split due to disambiguation ties remain plausible but not directly documented in a report field.
  • The 15 document types / 86.4% standard audit-report breakdown remains plausible but was not traced to a packaged table.

5. Methodological + Narrative Discipline

The narrative is materially cleaner than v3.18.2. The manuscript now keeps the central inference where it belongs: the evidence supports a replication-dominated calibration population and a continuous similarity-quality spectrum, not a directly observed signing workflow or a clean two-mechanism mixture.

The remaining narrative issues are narrow:

  1. Fix the new count-reconciliation note. The current note is too specific and appears empirically false. Do not invoke successive snapshots or a floating-point boundary shift unless that can be shown from archived artifacts. The current evidence points to a firm-attribution-field mismatch.

  2. Clarify Firm A membership consistently. Several scripts use accountants.firm; script 28 uses signatures.excel_firm. Both may be defensible for different questions, but the paper must state which field defines Firm A in each table or harmonize the scripts.

  3. Remove or soften remaining "known-majority-positive" phrasing. The term appears in the Introduction, Methodology, Discussion, and Conclusion. The paper's better phrase is "replication-dominated reference population." "Known" still implies external ground truth stronger than the paper can document.

  4. Correct the auditor-year / cross-year pooling description. Methodology III-G says the auditor-year ranking is a "deliberately within-year aggregation that avoids cross-year pooling." But the same section and Results IV-G.2 state that each signature's best match is computed against the full same-CPA cross-year pool. The aggregation is by auditor-year, but the underlying similarity statistic is cross-year. Replace "avoids cross-year pooling" with "aggregates signatures within each auditor-year while using the full same-CPA pool for each signature's best-match statistic."

  5. Align the byte-decomposition section reference. If the 145/50/180/35 decomposition is meant to be a Results claim, put a sentence in IV-F.1 or cite Appendix B directly. As written, Section IV-F.1 reports the 310 all-sample byte-identical signatures, not the Firm A decomposition.
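The distinction raised in item 4 is easier to see on a toy example. The sketch below assumes that signatures are feature vectors compared by cosine similarity and that the auditor-year score is the mean of per-signature best-match values. The manuscript's actual aggregation statistic is not specified in this review, so the mean, the data layout, and all names are assumptions.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def best_match(sig, pool):
    """Highest cosine between sig and any other signature in pool, or None."""
    scores = [cosine(sig["vec"], other["vec"]) for other in pool if other["id"] != sig["id"]]
    return max(scores) if scores else None


def auditor_year_means(signatures, cross_year_pool=True):
    """Mean best-match score per (cpa, year).

    cross_year_pool=True  -> each best match searches the CPA's full cross-year pool.
    cross_year_pool=False -> each best match searches only the same CPA and year.
    """
    by_cpa = defaultdict(list)
    for sig in signatures:
        by_cpa[sig["cpa"]].append(sig)

    scores = defaultdict(list)
    for sig in signatures:
        pool = by_cpa[sig["cpa"]]
        if not cross_year_pool:
            pool = [other for other in pool if other["year"] == sig["year"]]
        best = best_match(sig, pool)
        if best is not None:
            scores[(sig["cpa"], sig["year"])].append(best)
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


# Toy data: the 2020 signature is identical to one of the two 2021 signatures.
TOY = [
    {"id": 1, "cpa": "X", "year": 2020, "vec": (1.0, 0.0)},
    {"id": 2, "cpa": "X", "year": 2021, "vec": (1.0, 0.0)},
    {"id": 3, "cpa": "X", "year": 2021, "vec": (0.0, 1.0)},
]

print(auditor_year_means(TOY, cross_year_pool=True))
print(auditor_year_means(TOY, cross_year_pool=False))
```

In both modes the grouping key is the auditor-year, but the scores differ because the cross-year pool lets the 2020 signature match its 2021 duplicate. A ranking built the first way is aggregated within each auditor-year, yet it does not avoid cross-year pooling, which is why the III-G wording needs to change.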

6. IEEE Access Fit

The paper remains a good IEEE Access fit. It is application-driven, computationally substantial, and methodologically relevant to document forensics, audit analytics, and computer vision. The contribution is not a novel neural architecture; it is a defensible calibration and validation strategy for a large archival corpus with limited ground truth.

The remaining problems are reproducibility/provenance polish, not a collapse of the empirical core. Still, IEEE Access reviewers may scrutinize the supplement and table provenance. v3.18.3's Appendix B is now much stronger, but the newly added reconciliation note should be corrected because it is exactly the kind of precise provenance statement that reviewers can audit.

7. Specific Actionable Revisions

  1. Replace the IV-H.2 55,921 vs 55,922 explanation. Either:

    • harmonize script 28 to use accountants.firm like validation_recalibration.py and regenerate the byte-decomposition JSON; or
    • keep the current script 28 output and state that the one-record difference arises from accountants.firm versus signatures.excel_firm Firm A attribution.

  2. Add a short note in Appendix B or the script 28 report defining the Firm A grouping field for each artifact.

  3. Replace "known-majority-positive" with "replication-dominated" or "candidate replication-dominated" unless an external citation/ground-truth source is supplied.

  4. Revise Methodology III-G's auditor-year sentence so it does not claim the ranking avoids cross-year pooling.

  5. Add the 145/50/180/35 Firm A byte-decomposition sentence to Results IV-F.1, or cite Appendix B directly instead of Section IV-F.1 when discussing that decomposition.

  6. If time permits before submission, include supplementary logs/configs for YOLO metrics, VLM prompt/settings, HSV thresholds, and throughput. These are not central-result blockers, but they would strengthen the reproducibility package.

Bottom line: v3.18.3 successfully fixes the fabricated Appendix B paths and most of the narrative overclaiming from round 17. The manuscript should not be accepted until the new count-reconciliation explanation and the auditor-year pooling wording are corrected, but the required changes are small and localized.