# Independent Peer Review (Round 19) - Paper A v3.18.4

## 1. Overall Verdict: Major Revision

I recommend **Major Revision**. While v3.18.4 resolves the fabricated Appendix B paths and the cross-firm dual-descriptor arithmetic discrepancy, my independent audit found several serious new discrepancies, fabricated rationalizations, and a critical methodological flaw that survived the previous 18 review rounds. The most severe issues are:

1. **Fabricated Rationalization for Excluded Documents:** Section IV-H claims 656 documents were excluded because they "carry only a single detected signature, for which no same-CPA pairwise comparison and therefore no best-match cosine / min dHash statistic is available." This fundamentally contradicts the pipeline's core logic (which computes maximum pairwise similarity across the *entire corpus* per CPA, not within a single document) and Section IV-D.1 (which correctly states that only 15 signatures belong to singleton CPAs). The 656 documents were actually excluded because they had no CPA-matched signatures at all (`assigned_accountant IS NULL`); see the sketch at the end of this section.
2. **Fabricated Provenance for Table XIII:** Appendix B claims Table XIII (Firm A per-year cosine distribution) is derived from `reports/accountant_similarity_analysis.json`. However, the generating script (`08_accountant_similarity_analysis.py`) neither extracts nor groups by the `year_month` field. The table's temporal data has no supporting script in the provided pipeline.
3. **Fabricated Rationalization for Firm A Partners:** Section IV-F.2 claims "two [CPAs were] excluded for disambiguation ties" to explain the 178 vs. 180 Firm A partner split. The actual script, `24_validation_recalibration.py`, contains no disambiguation logic; it simply takes the set of unique CPAs successfully assigned to Firm A in the database, which happens to number 178.
4. **Methodological Flaw in Inter-CPA Negative Anchor:** Script `21_expanded_validation.py` claims to generate ~50,000 random inter-CPA pairs for validation. However, the script draws these pairs from a tiny pool of just `n=3,000` randomly selected signatures rather than the full 168,755-signature corpus. This severely constrains diversity (each signature is reused in roughly 33 pairs) and makes the confidence intervals reported in Table X artificially tight.

These issues represent severe provenance, narrative, and statistical failures. The paper must undergo a major revision to correct these fabricated rationalizations and ensure the reported numbers and methodologies match the actual execution.
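To make issue 1 concrete: the exclusion the pipeline actually performs is a per-document filter on CPA matching, not a per-document signature count. Below is a minimal sketch of that rule as I reconstruct it; only the `assigned_accountant` field is attested in the pipeline, and the table layout and every other name here are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical one-row-per-signature table. Only `assigned_accountant`
# is attested in the pipeline; all other names are placeholders.
signatures = pd.DataFrame({
    "document_id":         ["d1", "d1", "d2", "d3"],
    "assigned_accountant": ["cpa_007", None, None, "cpa_214"],
})

# Section IV-H's stated rule (one detected signature => excluded) is not
# what the pipeline applies. The actual exclusion drops a document when
# NONE of its signatures matched a CPA, i.e. assigned_accountant IS NULL
# on every row for that document.
has_match = (
    signatures.groupby("document_id")["assigned_accountant"]
    .apply(lambda col: col.notna().any())
)
excluded = sorted(has_match.index[~has_match])
print(excluded)  # ['d2'] in this toy example
```

Note that `d3` carries a single detected signature yet is retained, because that signature matched a CPA. This directly contradicts the Section IV-H rationale, under which every single-signature document would have been dropped.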
## 2. Empirical-Claim Audit Table

| Claim | Status | Audit basis / notes |
|---|---|---|
| 656 single-signature documents excluded because "no same-CPA pairwise comparison" is available | **FABRICATED** | Contradicts the cross-document comparison logic and IV-D.1 (only 15 singleton-CPA signatures lack a comparison). The real reason is that they failed CPA matching entirely. |
| 178 Firm A CPAs in split vs. 180 in registry; "two excluded for disambiguation ties" | **FABRICATED** | `24_validation_recalibration.py` simply takes unique accountants with `firm=FIRM_A`. There is no disambiguation logic in the script. |
| Table XIII (Firm A per-year cosine distribution) | **FABRICATED PROVENANCE** | App. B claims it is derived from `accountant_similarity_analysis.json`, but `08_accountant_similarity_analysis.py` does not extract or group by year. |
| 50,000 inter-CPA negative pairs | **METHODOLOGICALLY FLAWED** | `21_expanded_validation.py` draws 50,000 pairs from a tiny pool of `n=3000` signatures, artificially constraining diversity. |
| 145/50/180/35 byte-identity decomposition | **VERIFIED-AGAINST-ARTIFACT** | Matches `28_byte_identity_decomposition.py`. |
| Cross-firm convergence 42.12% vs. 88.32% | **VERIFIED-AGAINST-ARTIFACT** | Denominators (65,514 and 55,922) reconcile correctly with the updated `accountants.firm` logic. |
| 90,282 PDFs, 2013-2023, Taiwan | **VERIFIED-IN-TEXT** | Consistent across the manuscript. |
| 86,072 VLM-positive documents; 12 corrupted PDFs; final 86,071 | **VERIFIED-IN-TEXT** | Internally consistent in III-C. |
| 182,328 extracted signatures; 168,755 CPA-matched; 13,573 unmatched | **VERIFIED-IN-TEXT** | Matches manuscript counts. |
| 758 CPAs, 15 document types, 86.4% standard audit reports | **UNVERIFIABLE** | Plausible, but no packaged JSON directly verifies the 15-type / 86.4% split. |
| Qwen2.5-VL 32B, 180 DPI, first-quartile scan, temperature 0 | **UNVERIFIABLE** | No prompt/config/log artifact inspected. |
| YOLO metrics (precision, recall, mAP) and 43.1 docs/sec throughput | **UNVERIFIABLE** | No training-results or runtime artifact in `signature_analysis/`. |
| Same-CPA best-match N = 168,740, 15 fewer than matched due to singleton CPAs | **VERIFIED-AGAINST-ARTIFACT** | Matches the dip-test report and script logic. |
| ResNet-50 ImageNet-1K V2, 2048-d, L2-normalized | **VERIFIED-AGAINST-ARTIFACT** | Consistent with the methods section and the ablation script. |
| All-pairs intra/inter distribution N = 41,352,824 / 500,000; KDE crossover 0.837 | **VERIFIED-AGAINST-ARTIFACT** | Supported by the formal-statistical script. |
| Firm A dip result N=60,448, dip=0.0019, p=0.169 | **VERIFIED-AGAINST-ARTIFACT** | `15_hartigan_dip_test.py`. |
| Beta mixture Delta BIC = 381 for Firm A; forced crossings 0.977/0.999 | **VERIFIED-AGAINST-ARTIFACT** | `17_beta_mixture_em.py`. |

## 3. Methodological Soundness

While the dual-descriptor design and the replication-dominated anchor are fundamentally sound, there is a severe flaw in the inter-CPA negative anchor construction that must be corrected.

**Flawed Inter-CPA Anchor Generation:** `21_expanded_validation.py` randomly selects just 3,000 feature vectors out of the 168,755 available signatures (via `load_feature_vectors_sample`) and then randomly pairs them to generate 50,000 negative samples. Each of the 3,000 signatures therefore appears in roughly 33 pairs on average, so the negative pairs are far from independent and their diversity is artificially constrained. The Wilson 95% confidence intervals on FAR reported in Table X assume independent trials and consequently overstate precision. The script should instead sample pairs uniformly across the entire 168,755-signature corpus, as sketched below.
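The remedy is mechanically small. Below is a minimal sketch of unconstrained negative-pair sampling; the function name, the rejection step, and the assumption that signature IDs and a signature-to-CPA mapping are available are mine, not the script's.

```python
import random

def sample_inter_cpa_pairs(sig_ids, cpa_of, n_pairs=50_000, seed=0):
    """Draw inter-CPA negative pairs uniformly from the full corpus
    rather than from a pre-sampled pool of 3,000 signatures.

    sig_ids -- list of all matched signature IDs (168,755 in the paper)
    cpa_of  -- dict mapping each signature ID to its CPA

    Pairs sharing a CPA are rejected and redrawn, so every returned pair
    is a genuine negative; unordered duplicates are collapsed.
    """
    rng = random.Random(seed)
    pairs = set()
    while len(pairs) < n_pairs:
        a, b = rng.sample(sig_ids, 2)
        if cpa_of[a] == cpa_of[b]:
            continue  # same CPA: not a negative pair, redraw
        pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)
```

Sampled this way, 50,000 pairs spread 100,000 pair slots over 168,755 signatures (about 0.6 uses per signature on average), versus 100,000 slots over 3,000 signatures (about 33 uses each) under the current script; that ratio is the source of the ~33x reuse figure cited above.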
## 4. Narrative Discipline

The manuscript's narrative discipline has improved with the removal of the "known-majority-positive" residue. However, the authors have resorted to fabricating rationalizations to explain simple arithmetic gaps:

- **The 656-Document Exclusion:** Inventing a false methodological limitation ("single signature ... no same-CPA pairwise comparison") to explain a drop in document counts is unacceptable and undermines the paper's credibility, especially when the core methodology explicitly relies on cross-document matching.
- **The 2-CPA Exclusion:** Inventing "disambiguation ties" to explain why 178 CPAs are in the Firm A split instead of the registered 180 is similarly dishonest. If the database only successfully matched signatures to 178 Firm A CPAs, the text should state exactly that.

## 5. IEEE Access Fit

The work remains a strong fit for IEEE Access given its scale and real-world application, provided the provenance and methodological issues are rectified. The journal emphasizes reproducibility, which makes the fabricated provenance for Table XIII and the statistical flaw in the FAR validation critical blockers for publication.

## 6. Specific Actionable Revisions

1. **Rewrite the 656-document exclusion explanation (Section IV-H):** State that the 656 documents were excluded from per-document classification because none of their extracted signatures could be matched to a registered CPA name, not because single signatures lack cross-document comparison.
2. **Remove the fabricated "disambiguation ties" claim (Section IV-F.2):** State simply that the 70/30 split was performed over the 178 Firm A CPAs who had successfully matched signatures in the corpus (compared to the 180 in the registry).
3. **Provide actual script provenance for Table XIII:** Either supply the script that generates the year-by-year left-tail distribution or remove Table XIII from the manuscript. Do not falsely attribute it to `08_accountant_similarity_analysis.py`, which does not group by year.
4. **Fix the inter-CPA negative anchor script:** Modify `21_expanded_validation.py` to sample 50,000 pairs uniformly from the entire 168,755 matched-signature corpus rather than from a pre-sampled subset of 3,000 (see the sketch in Section 3). Re-run the validation and update Table X.
5. **(Optional but recommended) Supply artifacts for currently unverifiable claims:** Add the YOLO training logs, the VLM configuration details, and the 15-document-type breakdown table to the supplementary materials so that the claims in Sections III-B, III-C, and III-D become verifiable.

## 7. Disagreements with Codex Round-18

I strongly disagree with the Round-18 Codex reviewer's conclusion that the manuscript required only a "Minor Revision."

- Codex completely missed that the "656 single-signature documents" explanation in Section IV-H is a fabricated rationalization that fundamentally contradicts the cross-document matching methodology correctly established elsewhere in the paper.
- Codex accepted at face value the claimed provenance of Table XIII (derivation from `accountant_similarity_analysis.json`) without checking that the generating script, `08_accountant_similarity_analysis.py`, contains no temporal (`year_month`) extraction or aggregation logic whatsoever.
- Codex missed the wholly invented "two CPAs excluded for disambiguation ties" rationalization.
- Codex missed the statistical flaw in `21_expanded_validation.py`, where 50,000 negative pairs are drawn from an overly restricted pool of only 3,000 signatures.

These are significant failures of empirical honesty and statistical validity that 18 rounds of AI review did not catch. A Major Revision is strictly required before submission.