Phase 6 round-2 reviewer revisions: §III-H.1 promotion + framing alignment

Structural:

- Promote the operational classifier definition from §III-L.0 to a new §III-H.1, so the reader meets the five-way HC/MC/HSC/UN/LH rule before the §III-I/J/K diagnostic chain instead of ~130 lines after. §III-L renamed to "Anchor-Based Threshold Calibration"; §III-L.0 retains only calibration methodology, the three units of analysis, any-pair semantics, and the FAR terminological note. §III-L.7 deleted (redundant with §III-J).
- Reorganise §V-H Limitations into Primary / Secondary / Documented features / Engineering groupings (was a flat 14-item list).
- Reframe §III-M from "ten-tool unsupervised-validation collection" to "each diagnostic addresses one specific unsupervised failure mode"; rename "What v4.0 does/does not claim" → "Limits / Scope of the present analysis"; retitle Table XXVII.

Framing alignment (cross-section):

- Strip all v3.x / v4.0 / v3.20 / v4-new / inherited lineage labels from rendered text (Abstract, Intro, §II, §III, §IV, §V, §VI, Appendix, Impact).
- Replace "Paper A" rule references with "deployed" rule references.
- Soften "validation" to "characterise" / "check" / "screening label" / "consistency check" / "support"; "verdict" → "screening label".
- Remove codex-verified spike claims (non-Big-4 jittered dHash, Big-4 pooled cosine after firm-mean centring). Only formally scripted evidence (Scripts 39b–39e) retained; non-Big-4 evidence framed as corroborating raw-axis cosine, not as calibration evidence.
- Strip script-provenance parentheticals from the Introduction; defer Script 39c internal references and similar to Methodology / Appendix.

Numerical / table fixes:

- §III-C document-count arithmetic: 12 corrupted → 13 corrupted/unreadable, verified against the sqlite DB and total-pdf/ folder counts (90,282 - 4,198 no-sig - 13 corrupted = 86,071 → 85,042 with detections → 182,328 sigs → 168,755 CPA-matched). Table I shows VLM-positive (86,084) and processed-for-extraction (86,071) as separate rows.
- Wilson 95% CIs added for joint-rule ICCR rows in Table XXI / methodology table ([0.00011, 0.00018] and [0.00008, 0.00014]).
- Unit error fixed: 0.3856 pp / 0.4431 pp → 0.3856 (38.6 pp) / 0.4431 (44.3 pp).

Smaller revisions:

- Pipeline framing: "detecting" → "screening" in Abstract / Intro / Conclusion for consistency with the unsupervised-screening positioning.
- "hard ground-truth subset" → "conservative hard-positive subset" throughout.
- §III-F SSIM / pixel-comparison rebuttal compressed from ~15 lines to 4; design-level argument deferred to supplementary materials.
- "stakeholders can adopt / can derive thresholds" → "alternative operating points can be characterised by inverting" (less prescriptive).
- "the same mechanism extending in milder form to Firms B/C/D" → "similar, milder production-related reuse patterns at Firms B/C/D" (mechanism claim softened).
- Appendix A "non-hand-signed mode" / "two-mechanism mixture" lineage language aligned with v4 framing.

Appendix B:

- Rebuilt as a redirect-only stub. The HTML-commented obsolete table mapping (Table IX–XVIII labels with FAR / capture-rate / validation language) is removed; replaced with a short paragraph pointing to supplementary materials for full table-to-script provenance.

Cross-references:

- All §III-L references for the rule definition retargeted to §III-H.1; references for calibration still point to §III-L.
- §III-H references for byte-level Firm A evidence / non-Big-4 reverse anchor retargeted to §III-H.2.

Artefacts:

- Combined manuscript regenerated: paper_a_v4_combined.md, 1314 lines (was 1346 pre-review).
- Two review handoff documents added: paper/review_handoff_abstract_intro_20260515.md, paper/review_handoff_body_20260515.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Review Handoff: Abstract and Introduction
Date: 2026-05-15
Target manuscript: `paper/paper_a_v4_combined.md`
Scope reviewed: Abstract and Introduction only
## Overall Assessment

The Abstract and Introduction are substantively strong and defensible. The current argument is clear:

- Regulations require CPA attestation, but digitized PDF workflows make stored-signature reuse operationally easy.
- The problem is not signature forgery; identity is not in dispute. The target is detecting possible image-level reproduction by the legitimate signer or firm workflow.
- The paper avoids claiming validated forensic detection and instead frames the system as an anchor-calibrated screening framework under unsupervised constraints.
- The strongest methodological move is replacing unsupported distributional "natural threshold" logic with anchor-based inter-CPA coincidence-rate (ICCR) calibration.

Recommended disposition: Minor Revision for prose and narrative complexity, not for core empirical weakness.
## Main Reviewer Concern

The Introduction currently explains the methodology shift too explicitly as a research-process or version-history pivot. This is useful internally, but in the submitted paper it may increase complexity and invite reviewers to focus on why earlier versions used a different framing.

The final manuscript should explain the final methodological choice, not the internal research journey.

Keep:

- The descriptor distribution does not support a stable within-population bimodal antimode.
- Apparent multimodality is explained by firm composition and integer mass-point artefacts.
- Mixture fits are descriptive, not threshold-generating.
- Operational rules are characterized using anchor-based ICCR at multiple units.

Reduce or remove:

- "Earlier work in this lineage..."
- "v4.0 contribution..."
- "overturns this reading..."
- "inherited Paper A v3.x..."
- Internal script-heavy provenance in the Introduction.

Detailed provenance belongs in Methodology, Results, Appendix, or reproducibility notes, not in the opening narrative.
## Suggested Rewrite Direction for Introduction Pivot Paragraph

Current issue location: in `paper/paper_a_v4_combined.md`, the Introduction paragraph beginning with "The methodological reframing relative to earlier versions..."

Recommended replacement direction:

```text
A key empirical finding is that the descriptor distributions do not support a within-population natural threshold. The apparent multimodality in the Big-4 accountant-level distribution is explained by between-firm location shifts and integer mass-point artefacts on the dHash axis. After firm-mean centring and integer-tie jitter, the pooled dHash dip-test rejection disappears. Within-firm diagnostics likewise do not reveal a stable bimodal antimode. We therefore treat mixture fits as descriptive summaries of firm-compositional structure rather than threshold-generating mechanisms, and calibrate the deployed operating rules using inter-CPA coincidence-rate anchors.
```

This preserves the methodological defense while removing the internal v3-to-v4 story.
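If a concrete illustration helps the Methodology rewrite, the centring-and-jitter preprocessing named in the replacement paragraph can be sketched in a few lines. This is a minimal numpy sketch under assumed names: `centre_and_jitter`, the jitter width, and the seed are placeholders for illustration, not the project's actual Script 39b-39e implementation.

```python
import numpy as np

def centre_and_jitter(dhash, firm_ids, jitter_width=0.5, seed=0):
    """Firm-mean centring followed by integer-tie jitter.

    dhash    : integer dHash distances (1-D sequence)
    firm_ids : firm label per observation (same length)
    Returns centred, jittered values suitable for a unimodality
    diagnostic such as a dip test.
    """
    dhash = np.asarray(dhash, dtype=float)
    firm_ids = np.asarray(firm_ids)
    centred = dhash.copy()
    # Remove between-firm location shifts so pooled multimodality
    # cannot be a pure firm-composition artefact.
    for firm in np.unique(firm_ids):
        mask = firm_ids == firm
        centred[mask] -= centred[mask].mean()
    # Break integer mass points with small uniform jitter so the
    # diagnostic is not driven by exact ties on the dHash axis.
    rng = np.random.default_rng(seed)
    return centred + rng.uniform(-jitter_width, jitter_width, size=centred.size)
```

On the returned values one would then run the unimodality diagnostic itself (not shown here), which is where the pooled dip-test rejection disappears in the replacement paragraph's account.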
## Abstract-Specific Comments

The Abstract is strong but very dense. It is currently optimized for technical reviewers rather than broad readability. That may be acceptable for IEEE Access, but the first sentence has a small grammar/style issue.

Suggested edit:

```text
Regulations require Certified Public Accountants (CPAs) to attest each audit report with a signature, but digitization makes it feasible to reuse a stored signature image across reports -- through administrative stamping or firm-level electronic signing -- thereby undermining individualized attestation.
```

Reason:

- The current wording, "digitization makes reusing ... undermining ...", is grammatically awkward.
- The suggested version makes the causal relation explicit.

There is no need to remove the final limitation sentence. The sentence "not as a validated forensic detector; no calibrated error rates..." is important and should remain.
## Introduction-Specific Comments

### 1. Keep the legal framing but avoid legal overclaiming

The sentence saying non-hand-signed workflows "may fall within the literal statutory requirement" is acceptable because it is cautious. Do not strengthen it into a legal conclusion.

Preferred style:

- "may fall within"
- "raises substantive concerns"
- "may not represent meaningful individual attestation"

Avoid:

- "violates"
- "illegal"
- "non-compliant"
- "fraudulent"
### 2. Preserve the forgery distinction

The distinction between non-hand-signing detection and signature forgery detection is one of the strongest conceptual contributions. Keep it prominent.

Key idea to preserve:

- Forgery detection asks whether the signer is genuine.
- This paper asks whether the signing act was repeated for each document or a stored image was reused.
### 3. Reduce script/provenance detail in the Introduction

The current paragraph references scripts such as Script 39c and Script 39d. This makes the Introduction read like an internal review memo.

Recommendation:

- Remove or simplify script references in the Introduction.
- Keep exact script provenance in Methodology, Results, Appendix B, or supplementary material.

Specific risk:

- The current parenthetical "10 firms tested in Script 39c" is imprecise for jittered dHash. Script 39c's raw dHash tests reject unimodality; the non-Big-4 jittered-dHash no-rejection statement depends on a codex-verified read-only spike on the same substrate.

Safer Introduction wording:

```text
Within-firm diagnostics likewise fail to reveal stable bimodal structure after accounting for integer ties, including in eligible mid/small-firm checks.
```

If provenance must remain:

```text
Within-firm signature-level cosine checks fail to reject in eligible firms, and corresponding jittered-dHash checks fail to reject in Big-4 firms and in a read-only spike on the same mid/small-firm substrate.
```
### 4. Avoid presenting the Introduction as a Results section

The Introduction currently contains many detailed numbers. Some are necessary because the paper is methodological, but the v4 pivot paragraphs are numerically heavy.

Keep headline numbers:

- Dataset size: 90,282 reports, 182,328 signatures, 758 CPAs.
- Big-4 scope: 437 CPAs, 150,442 signatures.
- Key ICCR levels: per-comparison, per-signature, per-document.
- Firm heterogeneity: Firm A 0.62 vs Firms B/C/D 0.09-0.16.

Consider moving or reducing:

- Full script-specific details.
- Too many parenthetical rule semantics in the Introduction.
- Repeated mentions of inherited/v3/v4 framing.
## Recommended Minimum Patch List

1. Fix the Abstract's first-sentence grammar:

   ```text
   digitization makes it feasible to reuse...
   ```

2. Rewrite the Introduction paragraph that begins with "The methodological reframing relative to earlier versions..." so it describes the final methodological rationale rather than v3-to-v4 revision history.

3. Remove or narrow the `Script 39c` provenance in the Introduction, because the raw-vs-jittered dHash distinction is subtle and currently risky.

4. Replace internal-version language across the Introduction:

   - Replace "v4.0 adopts..." with "We adopt..."
   - Replace "Earlier work in this lineage..." with "A distributional-threshold approach would be inappropriate here because..."
   - Replace "inherited Paper A v3.x five-way box rule" with "the deployed five-way box rule" unless historical provenance is essential.

5. Preserve limitation language:

   - The paper should continue to say it is not a validated forensic detector.
   - The paper should continue to say calibrated error rates cannot be reported without signature-level ground truth.
## Reviewer Bottom Line

The paper should not hide that the distributional-threshold path failed; that is actually a methodological strength. But it should present this as a final empirical finding and design rationale, not as a visible research-history correction.

Recommended framing:

```text
Because the observed distribution does not provide a defensible natural threshold, we use ICCR calibration to characterize the deployed operating rules under explicit unsupervised assumptions.
```

This is cleaner, less complex, and more reviewer-facing than the current v3-to-v4 narrative.
## Additional Framing Issue: Are We Giving Thresholds or Not?

A likely reviewer confusion point is whether the paper provides a concrete classifier threshold or merely explains why no defensible threshold can be derived.

The intended answer should be explicit:

- The paper does provide a concrete, reproducible operational classifier.
- The paper does not claim that this classifier is ground-truth-optimal.
- The paper does not claim that the operating thresholds are natural antimodes in the descriptor distribution.
- The paper's calibration contribution is to characterize the deployed rule's inter-CPA coincidence behavior under unsupervised assumptions.

Recommended high-level framing:

```text
We use a fixed, pre-specified five-way operating rule. The present calibration does not derive an optimal threshold; instead, it quantifies the rule's inter-CPA coincidence behavior at per-comparison, per-signature, and per-document units under explicit unsupervised assumptions.
```

Plain-language interpretation:

```text
We have an explicit, reproducible five-way operating rule. The paper does not claim these thresholds are optimal or natural cut-points; rather, in the absence of signature-level ground truth, it uses ICCR to quantify the rule's specificity-proxy behavior.
```
## Concrete Threshold Language to Make Visible

The manuscript should not bury the actual operating thresholds. Somewhere early in Methodology, and preferably summarized in the Introduction, make the rule explicit:

```text
High-confidence non-hand-signed: cosine > 0.95 AND dHash <= 5.
Moderate-confidence non-hand-signed: cosine > 0.95 AND 5 < dHash <= 15.
Other outcomes follow the fixed five-way box rule.
```

If space allows, add a compact sentence:

```text
Thus, the system has explicit decision rules; what remains uncalibrated in the absence of signature-level labels is their true false-positive and false-negative error rate.
```

This directly answers the reviewer question: "Do the authors actually have a classifier?"
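For concreteness, the two published boxes above reduce to a few lines of code. This is a hedged sketch, not the deployed implementation: the label strings follow the HC/MC abbreviations used in this revision round, and the catch-all branch is a placeholder because the thresholds for the remaining three labels are not reproduced in this memo.

```python
def screening_label(cosine: float, dhash: int) -> str:
    """Sketch of the published screening boxes.

    Only the high- and moderate-confidence boxes are stated in this
    memo; the remaining labels of the deployed five-way rule
    (HSC / UN / LH) are collapsed into a placeholder here.
    """
    if cosine > 0.95 and dhash <= 5:
        return "HC"   # high-confidence non-hand-signed
    if cosine > 0.95 and 5 < dhash <= 15:
        return "MC"   # moderate-confidence non-hand-signed
    return "OTHER"    # HSC / UN / LH per the fixed box rule (not shown)
```

Making the rule visible in this form also supports the memo's point that the classifier is explicit even though its error rates are uncalibrated.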
## Rewrite Style Recommendation

Avoid language that sounds like the authors are unable to provide thresholds:

- Avoid: "No threshold can be derived."
- Avoid: "The distribution does not support classification."
- Avoid: "We cannot determine a threshold."

Use language that distinguishes operational thresholds from statistically natural or supervised-optimal thresholds:

- Prefer: "The deployed thresholds are operational rules rather than natural antimodes."
- Prefer: "We characterize these rules with ICCR rather than claiming supervised error rates."
- Prefer: "The absence of a distributional antimode motivates anchor-based calibration, not threshold-free analysis."
- Prefer: "The system is a concrete screening classifier with explicit unsupervised calibration limits."
## Reviewer-Facing Answer to the Threshold Question

If the manuscript needs one sentence that resolves the ambiguity, use:

```text
The system therefore uses explicit operating thresholds, but the evidentiary claim attached to those thresholds is limited: they define a reproducible screening rule whose coincidence behavior can be estimated under inter-CPA anchors, not a validated forensic decision boundary with calibrated error rates.
```

This should be the guiding style for the Abstract, the Introduction, and the start of Methodology.
## Readability Risk: Too Many Diagnostics Can Look Like Methodological Overbuilding

The manuscript's multi-method statistical design increases rigor, but it also creates a readability risk. In its current form, some sections may feel like a defensive accumulation of diagnostics rather than a clean research design.

Reviewer risk:

- The reader may ask: "Are the authors using many methods because the core classifier is unclear?"
- The reader may miss the simple main claim because the paper introduces too many caveats and validation tools early.
- The paper may look like "we used many methods, therefore credible" instead of "each method answers one necessary question."

Recommended main-thread sentence:

```text
We deploy a fixed five-way screening rule and characterize its unsupervised reliability limits using ICCR, after showing that the descriptor distribution does not support a natural threshold.
```

Plain-language interpretation:

```text
We have an explicit five-way screening rule. We first show that a natural distributional cut-point cannot serve as the threshold, and then use ICCR to describe the rule's reliability limits in the absence of labeled data.
```

All methods and diagnostics should serve this main thread.
## Core vs Supporting Diagnostics

Treat the following as core and keep them prominent:

- End-to-end pipeline: VLM -> YOLO -> ResNet -> cosine/dHash.
- Explicit five-way operating rule.
- Composition decomposition showing why the descriptor distribution does not yield a natural threshold.
- ICCR calibration at three units: per-comparison, per-signature, per-document.
- Firm heterogeneity and within-firm collision concentration.
- Ground-truth limitation and no true error-rate claim.

Treat the following as supporting diagnostics and avoid letting them dominate the main narrative:

- K=2 / K=3 mixture fits.
- Three-score Spearman convergence.
- Leave-one-firm-out reproducibility.
- BD/McCrary sensitivity.
- Ten-tool validation table.
- Pixel-identity positive anchor, especially because it is close to tautological for the high-confidence rule.

These supporting diagnostics can stay, but they should be framed as robustness checks, assumption checks, or supplementary evidence, not as independent central contributions.
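The two descriptor comparisons named in the core pipeline (cosine similarity over embeddings, Hamming distance over dHash values) are themselves simple; a minimal sketch follows. It assumes the descriptors have already been extracted and says nothing about the ResNet embedding or dHash construction steps, and the function names are illustrative, not the project's.

```python
import math

def dhash_distance(a: int, b: int) -> int:
    """Bit-level Hamming distance between two 64-bit dHash integers."""
    return bin(a ^ b).count("1")

def cosine_similarity(u, v) -> float:
    """Cosine similarity between two embedding vectors (e.g. from ResNet)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)
```

Keeping these two operations visible next to the five-way rule reinforces the "rule first" ordering recommended elsewhere in this memo.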
## Suggested Manuscript Structure for Clarity

Recommended structure for the Methodology / Results narrative:

1. Core Method

   Describe the pipeline, descriptor construction, and the five-way rule.

2. Why the Threshold Is Operational Rather Than Natural

   Use the composition decomposition only. Avoid over-explaining K=3, BD/McCrary, or historical mixture logic here.

3. How the Rule Is Calibrated Without Ground Truth

   Explain ICCR and the three reporting units: per-comparison, per-signature, per-document.

4. What the Calibration Reveals

   Report firm heterogeneity and within-firm collision concentration.

5. Supporting Diagnostics

   Place K=3, Spearman convergence, LOOO, BD/McCrary, and pixel-identity checks here as supporting evidence.
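Under the any-pair semantics mentioned in the revision summary, the three reporting units in step 3 amount to an aggregation over flagged inter-CPA comparisons, and the Wilson interval added to the ICCR tables is a standard score interval. The sketch below is an assumption-laden illustration: the tuple layout, function names, and any-pair aggregation are our reading of the memo, not the paper's actual scripts.

```python
import math
from collections import defaultdict

def wilson_ci(k, n, z=1.96):
    """Wilson 95% score interval for a binomial proportion k/n."""
    if n == 0:
        return (0.0, 0.0)
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)

def iccr(comparisons):
    """ICCR at three units from flagged inter-CPA comparisons.

    comparisons: iterable of (sig_a, sig_b, doc_a, doc_b, flagged)
    for cross-CPA pairs. A signature or document counts as hit if
    any of its pairs is flagged (any-pair semantics).
    """
    n_pairs = n_flagged = 0
    sig_hit, doc_hit = defaultdict(bool), defaultdict(bool)
    for sa, sb, da, db, flagged in comparisons:
        n_pairs += 1
        n_flagged += bool(flagged)
        for s in (sa, sb):
            sig_hit[s] |= bool(flagged)
        for d in (da, db):
            doc_hit[d] |= bool(flagged)
    return {
        "per_comparison": n_flagged / n_pairs,
        "per_signature": sum(sig_hit.values()) / len(sig_hit),
        "per_document": sum(doc_hit.values()) / len(doc_hit),
    }
```

Reporting all three units side by side, each with a Wilson interval, is what lets the manuscript present ICCR as a specificity proxy rather than a validated error rate.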
## Rewrite Style for Multi-Method Sections

Avoid:

```text
We apply a multi-tool validation framework consisting of ten diagnostics...
```

This can sound like methodological stacking.

Prefer:

```text
Each supporting diagnostic addresses a specific failure mode: composition artefacts, inter-CPA coincidence, pool-size effects, firm heterogeneity, or positive-anchor capture.
```

Avoid:

```text
The conjunction of ten tools constitutes validation...
```

Prefer:

```text
Together, these diagnostics define the limits of what can be supported without signature-level ground truth.
```

Avoid presenting auxiliary diagnostics before the reader understands the classifier.

Preferred order:

```text
Rule first. Then why not natural threshold. Then ICCR calibration. Then robustness.
```
## Reviewer-Facing Principle

The paper should not read as:

```text
We used many methods, so the result is credible.
```

It should read as:

```text
We use one explicit screening rule. Each statistical diagnostic answers one necessary question about how that rule should be interpreted under unsupervised constraints.
```

This distinction is important for readability and reviewer trust.