Paper A v3.12: resolve Gemini 3.1 Pro round-11 full-paper review findings
Round-11 Gemini 3.1 Pro fresh full-paper review (Minor Revision) surfaced four issues that the prior 10 rounds (codex gpt-5.4 x4, codex gpt-5.5 x1, Gemini 3.1 Pro x2, Opus 4.7 x1, paragraph-level v3.11 review) all missed:

1. MAJOR - Percentile-terminology contradiction between Section III-L L290 and Section III-H L160.
   III-L called 0.95 the "whole-sample Firm A P95" of the per-signature best-match cosine distribution, but III-H states 92.5% of Firm A signatures exceed 0.95. Under standard bottom-up percentile convention this makes 0.95 the P7.5, not the P95; Table XI calibration-fold data (Firm A cosine median = 0.9862, P5 = 0.9407) confirms true P95 is near 0.998.
   Fix: rewrote III-L L290 to state 0.95 corresponds to approximately the whole-sample Firm A P7.5 with the 92.5%/7.5% complement stated explicitly. dHash P95 claims elsewhere (Table XI, L229/L233) were already correct under standard convention and are unchanged.

2. MINOR - Firm A CPA count inconsistency.
   Discussion V-C L44 said "Nine additional Firm A CPAs are excluded from the GMM for having fewer than 10 signatures" but Results IV-G.2 L216 defines 178 valid Firm A CPAs (180 registry minus 2 disambiguation-excluded); 178 - 171 = 7.
   Fix: corrected to "seven are outside the GMM" with explicit 178-baseline and cross-reference to IV-G.2.

3. MINOR - Table XVI mixed-firm handling broken promise.
   Results L355-356 previously said "mixed-firm reports are reported separately" but Table XVI only lists single-firm rows summing to exactly 83,970, and no subsequent prose reports the 384 mixed-firm agreement rate.
   Fix: rewrote L355-356 to state Table XVI covers the 83,970 single-firm reports only and that the 384 mixed-firm reports (0.46%) are excluded because firm-level agreement is not well defined when the two signers are at different firms.

4. MINOR - Contribution-count structural inconsistency.
   Introduction enumerates seven contributions, Conclusion opens with "Our contributions are fourfold."
   Fix: rewrote the Conclusion lead to "The seven numbered contributions listed in Section I can be grouped into four broader methodological themes," making the grouping explicit.

No re-computation. All tables (IV-XVIII) and Appendix A numbers unchanged. Abstract unchanged (still 248/250 words).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
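The arithmetic behind findings 1-3 can be checked with a short sketch. It is illustrative only and not part of the commit: the helper name `percentile_rank_of_cutoff` is hypothetical, every input is a figure quoted in the message above, and nothing is recomputed from the underlying signature data.

```python
# Sanity checks for the figures quoted in the commit message.
# Assumption: all inputs are copied from the message, not derived from data.


def percentile_rank_of_cutoff(share_above: float) -> float:
    """Bottom-up percentile rank of a cutoff, given the share of values above it."""
    return round(100.0 * (1.0 - share_above), 6)


# Finding 1: 92.5% of Firm A signatures exceed cosine 0.95,
# so under the bottom-up convention 0.95 sits near P7.5, not P95.
cutoff_rank = percentile_rank_of_cutoff(0.925)

# Calibration-fold figures from Table XI as quoted in the message:
# the cutoff should fall between P5 and the median if it is near P7.5.
table_xi_p5, table_xi_median = 0.9407, 0.9862
cutoff_between_p5_and_median = table_xi_p5 < 0.95 < table_xi_median

# Finding 2: 180 registered CPAs minus 2 disambiguation exclusions,
# of which 171 enter the accountant-level GMM.
valid_cpas = 180 - 2
outside_gmm = valid_cpas - 171

# Finding 3: single-firm plus mixed-firm two-signature reports.
single_firm_reports, mixed_firm_reports = 83_970, 384
total_reports = single_firm_reports + mixed_firm_reports
mixed_share_pct = round(100.0 * mixed_firm_reports / total_reports, 2)

print(cutoff_rank, cutoff_between_p5_and_median)
print(valid_cpas, outside_gmm)
print(total_reports, mixed_share_pct)
```

The Table XI comparison is only a plausibility check, since those figures come from the calibration fold while the 92.5% share is a whole-sample statistic.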
@@ -5,7 +5,7 @@
 We have presented an end-to-end AI pipeline for detecting non-hand-signed auditor signatures in financial audit reports at scale.
 Applied to 90,282 audit reports from Taiwanese publicly listed companies spanning 2013--2023, our system extracted and analyzed 182,328 CPA signatures using a combination of VLM-based page identification, YOLO-based signature detection, deep feature extraction, and dual-descriptor similarity verification, with threshold selection placed on a statistically principled footing through two methodologically distinct threshold estimators and a density-smoothness diagnostic applied at two analysis levels.
 
-Our contributions are fourfold.
+The seven numbered contributions listed in Section I can be grouped into four broader methodological themes, summarized below.
 
 First, we argued that non-hand-signing detection is a distinct problem from signature forgery detection, requiring analytical tools focused on the upper tail of intra-signer similarity rather than inter-signer discriminability.
 
@@ -41,7 +41,7 @@ Three convergent strands of evidence support the replication-dominated framing.
 First, the visual-inspection evidence: randomly sampled Firm A reports exhibit pixel-identical signature images across different audit engagements and fiscal years for the majority of partners---a physical impossibility under independent hand-signing events.
 Second, the signature-level statistical evidence: Firm A's per-signature cosine distribution is unimodal long-tail rather than a tight single peak; 92.5% of Firm A signatures exceed cosine 0.95, with the remaining 7.5% forming the left tail.
 Third, the accountant-level evidence: of the 171 Firm A CPAs with enough signatures ($\geq 10$) to enter the accountant-level GMM, 32 (19%) fall into the middle-band C2 cluster rather than the high-replication C1 cluster---consistent with within-firm heterogeneity in signing practice (spanning a minority of hand-signers, CPAs undergoing mid-sample mechanism transitions, and CPAs whose pooled coordinates reflect mixed-quality replication) rather than a pure replication population.
-Nine additional Firm A CPAs are excluded from the GMM for having fewer than 10 signatures, so we cannot place them in a cluster from the cross-sectional analysis alone.
+Of the 178 valid Firm A CPAs (the 180 registered CPAs minus two excluded for disambiguation ties in the registry; Section IV-G.2), seven are outside the GMM for having fewer than 10 signatures, so we cannot place them in a cluster from the cross-sectional analysis alone.
 The held-out Firm A 70/30 validation (Section IV-G.2) gives capture rates on a non-calibration Firm A subset that sit in the same replication-dominated regime as the calibration fold across the full range of operating rules (extreme rules are statistically indistinguishable; operational rules in the 85–95% band differ between folds by 1–5 percentage points, reflecting within-Firm-A heterogeneity in replication intensity rather than a generalization failure).
 The accountant-level GMM (Section IV-E) and the threshold-independent partner-ranking analysis (Section IV-H.2) are the cross-checks that are robust to fold-level sampling variance.
 
@@ -287,7 +287,7 @@ High feature-level similarity without structural corroboration---consistent with
 5. **Likely hand-signed:** Cosine below the all-pairs KDE crossover threshold.
 
 We note three conventions about the thresholds.
-First, the cosine cutoff $0.95$ is the whole-sample Firm A P95 of the per-signature best-match cosine distribution (chosen for its transparent percentile interpretation in the whole-sample reference distribution), and the cosine crossover $0.837$ is the all-pairs intra/inter KDE crossover; both are derived from whole-sample distributions rather than from the 70% calibration fold, so the classifier inherits its operational cosine cuts from the whole-sample Firm A and all-pairs distributions.
+First, the cosine cutoff $0.95$ corresponds to approximately the whole-sample Firm A P7.5 of the per-signature best-match cosine distribution---that is, 92.5% of whole-sample Firm A signatures exceed this cutoff and 7.5% fall at or below it (Section III-H)---chosen as a round-number lower-tail boundary whose complement (92.5% above) has a transparent interpretation in the whole-sample reference distribution; the cosine crossover $0.837$ is the all-pairs intra/inter KDE crossover; both are derived from whole-sample distributions rather than from the 70% calibration fold, so the classifier inherits its operational cosine cuts from the whole-sample Firm A and all-pairs distributions.
 Section IV-G.2 reports both calibration-fold and held-out-fold capture rates for this classifier so that fold-level sampling variance is visible.
 Second, the dHash cutoffs $\leq 5$ and $> 15$ are chosen from the whole-sample Firm A $\text{dHash}_\text{indep}$ distribution: $\leq 5$ captures the upper tail of the high-similarity mode (whole-sample Firm A median $\text{dHash}_\text{indep} = 2$, P75 $\approx 4$, so $\leq 5$ is the band immediately above median), while $> 15$ marks the regime in which independent-minimum structural similarity is no longer indicative of image reproduction.
 Third, the three accountant-level 1D estimators (KDE antimode $0.973$, Beta-2 crossing $0.979$, logit-GMM-2 crossing $0.976$) and the accountant-level 2D GMM marginal ($0.945$) are *not* the operational thresholds of this classifier: they are the *convergent external reference* that supports the choice of signature-level operational cut.
@@ -352,8 +352,8 @@ Taiwanese statutory audit reports are co-signed by two engagement partners (a pr
 Under firm-wide stamping practice at a given firm, both signers on the same report should receive the same signature-level classification.
 Disagreement between the two signers on a report is informative about whether the stamping practice is firm-wide or partner-specific.
 
-For each report with exactly two signatures and complete per-signature data (83,970 reports assigned to a single firm, plus 384 reports with one signer per firm in the mixed-firm buckets for 84,354 total), we classify each signature using the dual-descriptor rules of Section III-L and record whether the two classifications agree.
-Table XVI reports per-firm intra-report agreement (firm-assignment defined by the firm identity of both signers; mixed-firm reports are reported separately).
+For each report with exactly two signatures and complete per-signature data (84,354 reports total: 83,970 single-firm reports, in which both signers are at the same firm, and 384 mixed-firm reports, in which the two signers are at different firms), we classify each signature using the dual-descriptor rules of Section III-L and record whether the two classifications agree.
+Table XVI reports per-firm intra-report agreement for the 83,970 single-firm reports only (firm-assignment defined by the common firm identity of both signers); the 384 mixed-firm reports (0.46% of the 2-signature corpus) are excluded from the intra-report analysis because firm-level agreement is not well defined when the two signers are at different firms.
 
 <!-- TABLE XVI: Intra-Report Classification Agreement by Firm
 | Firm | Total 2-signer reports | Both non-hand-signed | Both uncertain | Both style | Both hand-signed | Mixed | Agreement rate |