# Review Handoff: Abstract and Introduction

Date: 2026-05-15
Target manuscript: `paper/paper_a_v4_combined.md`
Scope reviewed: Abstract and Introduction only

## Overall Assessment

The Abstract and Introduction are substantively strong and defensible. The current argument is clear:

- Regulations require CPA attestation, but digitized PDF workflows make stored-signature reuse operationally easy.
- The problem is not signature forgery; identity is not in dispute. The target is detecting possible image-level reproduction by the legitimate signer or firm workflow.
- The paper avoids claiming validated forensic detection and instead frames the system as an anchor-calibrated screening framework under unsupervised constraints.
- The strongest methodological move is replacing unsupported distributional "natural threshold" logic with anchor-based inter-CPA coincidence-rate (ICCR) calibration.

Recommended disposition: Minor Revision for prose and narrative complexity, not for core empirical weakness.

## Main Reviewer Concern

The Introduction currently explains the methodology shift too explicitly as a research-process or version-history pivot. This is useful internally, but in the submitted paper it may increase complexity and invite reviewers to focus on why earlier versions used a different framing. The final manuscript should explain the final methodological choice, not the internal research journey.

Keep:

- The descriptor distribution does not support a stable within-population bimodal antimode.
- Apparent multimodality is explained by firm composition and integer mass-point artefacts.
- Mixture fits are descriptive, not threshold-generating.
- Operational rules are characterized using anchor-based ICCR at multiple units.

Reduce or remove:

- "Earlier work in this lineage..."
- "v4.0 contribution..."
- "overturns this reading..."
- "inherited Paper A v3.x..."
- Internal script-heavy provenance in the Introduction.
Detailed provenance belongs in Methodology, Results, Appendix, or reproducibility notes, not in the opening narrative.

## Suggested Rewrite Direction for Introduction Pivot Paragraph

Current issue location: around `paper/paper_a_v4_combined.md`, Introduction paragraph beginning with "The methodological reframing relative to earlier versions..."

Recommended replacement direction:

```text
A key empirical finding is that the descriptor distributions do not support a within-population natural threshold. The apparent multimodality in the Big-4 accountant-level distribution is explained by between-firm location shifts and integer mass-point artefacts on the dHash axis. After firm-mean centring and integer-tie jitter, the pooled dHash dip-test rejection disappears. Within-firm diagnostics likewise do not reveal a stable bimodal antimode. We therefore treat mixture fits as descriptive summaries of firm-compositional structure rather than threshold-generating mechanisms, and calibrate the deployed operating rules using inter-CPA coincidence-rate anchors.
```

This preserves the methodological defense while removing the internal v3-to-v4 story.

## Abstract-Specific Comments

The Abstract is strong but very dense. It is currently optimized for technical reviewers rather than broad readability. That may be acceptable for IEEE Access, but the first sentence has a small grammar/style issue.

Suggested edit:

```text
Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization makes it feasible to reuse a stored signature image across reports -- through administrative stamping or firm-level electronic signing -- thereby undermining individualized attestation.
```

Reason:

- The current wording, "digitization makes reusing ... undermining ...", is grammatically awkward.
- The suggested version makes the causal relation explicit.

No need to remove the final limitation sentence.
The sentence "not as a validated forensic detector; no calibrated error rates..." is important and should remain.

## Introduction-Specific Comments

### 1. Keep the legal framing but avoid legal overclaiming

The sentence saying non-hand-signed workflows "may fall within the literal statutory requirement" is acceptable because it is cautious. Do not strengthen it into a legal conclusion.

Preferred style:

- "may fall within"
- "raises substantive concerns"
- "may not represent meaningful individual attestation"

Avoid:

- "violates"
- "illegal"
- "non-compliant"
- "fraudulent"

### 2. Preserve the forgery distinction

The distinction between non-hand-signing detection and signature forgery detection is one of the strongest conceptual contributions. Keep it prominent.

Key idea to preserve:

- Forgery detection asks whether the signer is genuine.
- This paper asks whether the signing act was repeated for each document or a stored image was reused.

### 3. Reduce script/provenance detail in the Introduction

The current paragraph references scripts such as Script 39c and Script 39d. This makes the Introduction read like an internal review memo.

Recommendation:

- Remove or simplify script references from the Introduction.
- Keep exact script provenance in Methodology, Results, Appendix B, or supplementary material.

Specific risk:

- The current parenthetical "10 firms tested in Script 39c" is imprecise for jittered-dHash. Script 39c raw dHash tests reject unimodality; the non-Big-4 jittered-dHash no-rejection statement depends on a codex-verified read-only spike on the same substrate.

Safer Introduction wording:

```text
Within-firm diagnostics likewise fail to reveal stable bimodal structure after accounting for integer ties, including in eligible mid/small-firm checks.
```

If provenance must remain:

```text
Within-firm signature-level cosine checks fail to reject in eligible firms, and corresponding jittered-dHash checks fail to reject in Big-4 firms and in a read-only spike on the same mid/small-firm substrate.
```

### 4. Avoid presenting the Introduction as a Results section

The Introduction currently contains many detailed numbers. Some are necessary because the paper is methodological, but the v4 pivot paragraphs are numerically heavy.

Keep headline numbers:

- Dataset size: 90,282 reports, 182,328 signatures, 758 CPAs.
- Big-4 scope: 437 CPAs, 150,442 signatures.
- Key ICCR levels: per-comparison, per-signature, per-document.
- Firm heterogeneity: Firm A 0.62 vs Firms B/C/D 0.09-0.16.

Consider moving or reducing:

- Full script-specific details.
- Too many parenthetical rule semantics in the Introduction.
- Repeated mentions of inherited/v3/v4 framing.

## Recommended Minimum Patch List

1. Fix the Abstract's first-sentence grammar:

   ```text
   digitization makes it feasible to reuse...
   ```

2. Rewrite the Introduction paragraph that begins with "The methodological reframing relative to earlier versions..." so it describes the final methodological rationale rather than v3-to-v4 revision history.

3. Remove or narrow `Script 39c` provenance in the Introduction because the raw vs jittered dHash distinction is subtle and currently risky.

4. Replace internal-version language across the Introduction:

   - Replace "v4.0 adopts..." with "We adopt..."
   - Replace "Earlier work in this lineage..." with "A distributional-threshold approach would be inappropriate here because..."
   - Replace "inherited Paper A v3.x five-way box rule" with "the deployed five-way box rule" unless historical provenance is essential.

5. Preserve limitation language:

   - The paper should continue to say it is not a validated forensic detector.
   - The paper should continue to say calibrated error rates cannot be reported without signature-level ground truth.
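For reference while checking these patches, the per-comparison ICCR unit listed among the headline numbers can be made concrete with a toy sketch. This is illustrative only: the data, the 1-D descriptors, and the `flag` closure are hypothetical stand-ins for the deployed cosine/dHash rule and the manuscript's anchor construction.

```python
from itertools import combinations

def per_comparison_iccr(signatures, flag):
    """Toy per-comparison ICCR: among all pairs of signatures drawn from
    different CPAs, the fraction that the screening rule flags as
    coincident. `signatures` is a list of (cpa_id, descriptor) tuples."""
    cross_pairs = [
        (a, b)
        for (cpa_a, a), (cpa_b, b) in combinations(signatures, 2)
        if cpa_a != cpa_b  # inter-CPA anchors: same-CPA pairs excluded
    ]
    if not cross_pairs:
        return 0.0
    flagged = sum(1 for a, b in cross_pairs if flag(a, b))
    return flagged / len(cross_pairs)

# Hypothetical 1-D descriptors; the closure stands in for the real
# cosine/dHash conjunction.
sigs = [("cpa1", 0.10), ("cpa1", 0.11), ("cpa2", 0.12), ("cpa3", 0.90)]
rate = per_comparison_iccr(sigs, flag=lambda a, b: abs(a - b) < 0.05)
print(rate)  # 2 of 5 cross-CPA pairs flagged -> 0.4
```

The per-signature and per-document units would aggregate the same flags by signature and by report before taking the rate; only the per-comparison unit is sketched here.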
## Reviewer Bottom Line

The paper should not hide that the distributional threshold path failed; that is actually a methodological strength. But it should present this as a final empirical finding and design rationale, not as a visible research-history correction.

Recommended framing:

```text
Because the observed distribution does not provide a defensible natural threshold, we use ICCR calibration to characterize the deployed operating rules under explicit unsupervised assumptions.
```

This is cleaner, less complex, and more reviewer-facing than the current v3-to-v4 narrative.

## Additional Framing Issue: Are We Giving Thresholds or Not?

A likely reviewer confusion point is whether the paper provides a concrete classifier threshold or merely explains why no defensible threshold can be derived. The intended answer should be explicit:

- The paper does provide a concrete, reproducible operational classifier.
- The paper does not claim that this classifier is ground-truth-optimal.
- The paper does not claim that the operating thresholds are natural antimodes in the descriptor distribution.
- The paper's calibration contribution is to characterize the deployed rule's inter-CPA coincidence behavior under unsupervised assumptions.

Recommended high-level framing:

```text
We use a fixed, pre-specified five-way operating rule. The present calibration does not derive an optimal threshold; instead, it quantifies the rule's inter-CPA coincidence behavior at per-comparison, per-signature, and per-document units under explicit unsupervised assumptions.
```

Plain-language interpretation:

```text
We have an explicit, reproducible five-way operating rule; the paper does not claim these thresholds are optimal or natural cut points, but instead uses ICCR to quantify the rule's specificity-proxy behavior in the absence of signature-level ground truth.
```

## Concrete Threshold Language to Make Visible

The manuscript should not bury the actual operating thresholds.
Somewhere early in Methodology, and preferably summarized in the Introduction, make the rule explicit:

```text
High-confidence non-hand-signed: cosine > 0.95 AND dHash <= 5.
Moderate-confidence non-hand-signed: cosine > 0.95 AND 5 < dHash <= 15.
Other outcomes follow the fixed five-way box rule.
```

If space allows, add a compact sentence:

```text
Thus, the system has explicit decision rules; what remains uncalibrated in the absence of signature-level labels is their true false-positive and false-negative error rates.
```

This directly answers the reviewer question: "Do the authors actually have a classifier?"

## Rewrite Style Recommendation

Avoid language that sounds like the authors are unable to provide thresholds:

- Avoid: "No threshold can be derived."
- Avoid: "The distribution does not support classification."
- Avoid: "We cannot determine a threshold."

Use language that distinguishes operational thresholds from statistically natural or supervised-optimal thresholds:

- Prefer: "The deployed thresholds are operational rules rather than natural antimodes."
- Prefer: "We characterize these rules with ICCR rather than claiming supervised error rates."
- Prefer: "The absence of a distributional antimode motivates anchor-based calibration, not threshold-free analysis."
- Prefer: "The system is a concrete screening classifier with explicit unsupervised calibration limits."

## Reviewer-Facing Answer to the Threshold Question

If the manuscript needs one sentence that resolves the ambiguity, use:

```text
The system therefore uses explicit operating thresholds, but the evidentiary claim attached to those thresholds is limited: they define a reproducible screening rule whose coincidence behavior can be estimated under inter-CPA anchors, not a validated forensic decision boundary with calibrated error rates.
```

This should be the guiding style for the Abstract, the Introduction, and the start of Methodology.
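For quick internal checking, the two explicit branches of the rule above can be sketched as a small function. This is a sketch only: the function name, the label strings, and the single collapsed residual bucket are illustrative assumptions, not the manuscript's implementation of the full five-way box rule.

```python
def classify_pair(cosine: float, dhash_dist: int) -> str:
    """Sketch of the two explicit non-hand-signed branches of the
    five-way operating rule; the remaining outcomes are collapsed
    into one placeholder bucket here."""
    if cosine > 0.95 and dhash_dist <= 5:
        return "high-confidence non-hand-signed"
    if cosine > 0.95 and 5 < dhash_dist <= 15:
        return "moderate-confidence non-hand-signed"
    # The deployed box rule distinguishes three further outcomes;
    # "other" stands in for them in this sketch.
    return "other"

# Near-identical embedding and near-identical perceptual hash:
print(classify_pair(0.97, 3))   # high-confidence branch
print(classify_pair(0.97, 9))   # moderate-confidence branch
print(classify_pair(0.80, 3))   # falls through to the residual bucket
```

Writing the rule this way also makes the boundary semantics unambiguous (strict `>` on cosine, inclusive `<=` on the dHash cutoffs), which the prose version of the rule should state as well.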
## Readability Risk: Too Many Diagnostics Can Look Like Methodological Overbuilding

The manuscript's multi-method statistical design increases rigor, but it also creates a readability risk. In the current form, some sections may feel like a defensive accumulation of diagnostics rather than a clean research design.

Reviewer risk:

- The reader may ask: "Are the authors using many methods because the core classifier is unclear?"
- The reader may miss the simple main claim because the paper introduces too many caveats and validation tools early.
- The paper may look like "we used many methods, therefore credible" instead of "each method answers one necessary question."

Recommended main-thread sentence:

```text
We deploy a fixed five-way screening rule and characterize its unsupervised reliability limits using ICCR, after showing that the descriptor distribution does not support a natural threshold.
```

Plain-language interpretation:

```text
We have an explicit five-way screening rule; we first show that a natural distributional cut point cannot serve as the threshold, and then use ICCR to describe the rule's reliability limits in the absence of labelled data.
```

All methods and diagnostics should serve this main thread.

## Core vs Supporting Diagnostics

Treat the following as core and keep them prominent:

- End-to-end pipeline: VLM -> YOLO -> ResNet -> cosine/dHash.
- Explicit five-way operating rule.
- Composition decomposition showing why the descriptor distribution does not yield a natural threshold.
- ICCR calibration at three units: per-comparison, per-signature, per-document.
- Firm heterogeneity and within-firm collision concentration.
- Ground-truth limitation and no true error-rate claim.

Treat the following as supporting diagnostics and avoid letting them dominate the main narrative:

- K=2 / K=3 mixture fits.
- Three-score Spearman convergence.
- Leave-one-firm-out reproducibility.
- BD/McCrary sensitivity.
- Ten-tool validation table.
- Pixel-identity positive anchor, especially because it is close to tautological for the high-confidence rule.
These supporting diagnostics can stay, but they should be framed as robustness checks, assumption checks, or supplementary evidence, not as independent central contributions.

## Suggested Manuscript Structure for Clarity

Recommended structure for the Methodology / Results narrative:

1. Core Method. Describe the pipeline, descriptor construction, and five-way rule.
2. Why the Threshold Is Operational Rather Than Natural. Use the composition decomposition only. Avoid over-explaining K=3, BD/McCrary, or historical mixture logic here.
3. How the Rule Is Calibrated Without Ground Truth. Explain ICCR and the three reporting units: per-comparison, per-signature, per-document.
4. What the Calibration Reveals. Report firm heterogeneity and within-firm collision concentration.
5. Supporting Diagnostics. Place K=3, Spearman convergence, leave-one-firm-out reproducibility, BD/McCrary, and pixel-identity checks here as supporting evidence.

## Rewrite Style for Multi-Method Sections

Avoid:

```text
We apply a multi-tool validation framework consisting of ten diagnostics...
```

This can sound like methodological stacking.

Prefer:

```text
Each supporting diagnostic addresses a specific failure mode: composition artefacts, inter-CPA coincidence, pool-size effects, firm heterogeneity, or positive-anchor capture.
```

Avoid:

```text
The conjunction of ten tools constitutes validation...
```

Prefer:

```text
Together, these diagnostics define the limits of what can be supported without signature-level ground truth.
```

Avoid presenting auxiliary diagnostics before the reader understands the classifier. Preferred order:

```text
Rule first. Then why not natural threshold. Then ICCR calibration. Then robustness.
```

## Reviewer-Facing Principle

The paper should not read as:

```text
We used many methods, so the result is credible.
```

It should read as:

```text
We use one explicit screening rule. Each statistical diagnostic answers one necessary question about how that rule should be interpreted under unsupervised constraints.
```

This distinction is important for readability and reviewer trust.
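As a closing aside, the composition-decomposition argument that runs through this memo (between-firm location shifts masquerading as bimodality, removed by firm-mean centring and integer-tie jitter) can be illustrated with a fully synthetic toy sketch. All numbers below are invented, the dip test itself is omitted, and the two-firm setup is a deliberately extreme caricature of the Big-4 structure.

```python
import random
from statistics import mean

random.seed(0)

# Synthetic integer dHash distances for two firms at different
# locations: pooled, they form two separated clusters with an apparent
# antimode between them, even though each firm is unimodal.
firm_a = [random.choice([3, 4, 5]) for _ in range(200)]
firm_b = [random.choice([12, 13, 14]) for _ in range(200)]

# The "gap" in the pooled data is purely between-firm by construction.
print(max(firm_a) < min(firm_b))  # True

# Firm-mean centring removes the location shift ...
centred = ([x - mean(firm_a) for x in firm_a]
           + [x - mean(firm_b) for x in firm_b])
print(abs(mean(centred)) < 1e-9)  # True: no between-firm separation left

# ... and integer-tie jitter breaks the mass points that distort
# dip-test behavior on discrete axes.
jittered = [x + random.uniform(-0.5, 0.5) for x in centred]
print(len(set(jittered)) == len(jittered))  # True: ties broken
```

A real diagnostic would then run Hartigan's dip test on `jittered`; the point of the sketch is only that centring and jitter are cheap, order-of-lines transformations, not model fits.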