Paper A v3.8: resolve Gemini 3.1 Pro round-6 independent-review findings
Gemini round-6 (paper/gemini_review_v3_7.md) gave Minor Revision but
flagged three issues that five rounds of codex review had missed.
This commit addresses all three.
BLOCKER: Accountant-level BD/McCrary null is a power artifact, not
proof of smoothness (Gemini Issue 1)
- At N=686 accountants the BD/McCrary test has limited statistical
power; interpreting a failure-to-reject as affirmative proof of
smoothness is a Type II error risk.
- Discussion V-B: "itself diagnostic of smoothness" replaced with
"failure-to-reject rather than a failure of the method ---
informative alongside the other evidence but subject to the power
caveat in Section V-G".
- Discussion V-G (Sixth limitation): added a power-aware paragraph
naming N=686 explicitly and clarifying that the substantive claim
of smoothly-mixed clustering rests on the JOINT weight of dip
test + BIC-selected GMM + BD null, not on BD alone.
- Results IV-D.1 and IV-E: reframe accountant-level null as
"consistent with --- not affirmative proof of" clustered-but-
smoothly-mixed, citing V-G for the power caveat.
- Appendix A interpretation paragraph: explicit inferential-asymmetry
sentence ("consistency is what the BD null delivers, not
affirmative proof"); "itself evidence for" removed.
- Conclusion: "consistent with clustered but smoothly mixed"
rephrased with explicit power caveat ("at N = 686 the test has
limited power and cannot affirmatively establish smoothness").
MAJOR: Table X FRR / EER was tautological reviewer-bait
(Gemini Issue 2)
- Byte-identical positive anchor has cosine approx 1 by construction,
so FRR against that subset is trivially 0 at every threshold
below 1 and any EER calculation is arithmetic tautology, not
biometric performance.
- Results IV-G.1: removed EER row; dropped FRR column from Table X;
added a table note explaining the omission and directing readers
to Section V-F for the conservative-subset discussion.
- Methodology III-K: removed the EER / FRR-against-byte-identical
reporting clause; clarified that FAR against inter-CPA negatives
is the primary reported quantity.
- Table X is now FAR + Wilson 95% CI only, which is the quantity
that actually carries empirical content on this anchor design.
MINOR: Document-level worst-case aggregation narrative (Gemini
Issue 3) + 15-signature delta (Gemini spot-check)
- Results IV-I: added two sentences explicitly noting that the
document-level percentages reflect the Section III-L worst-case
aggregation rule (a report with one stamped + one hand-signed
signature inherits the most-replication-consistent label), and
cross-referencing Section IV-H.3 / Table XVI for the mixed-report
composition that qualifies the headline percentages.
- Results IV-D: added a one-sentence footnote explaining that the
15-signature delta between the Table III CPA-matched count
(168,755) and the all-pairs analyzed count (168,740) is due to
CPAs with exactly one signature, for whom no same-CPA pairwise
best-match statistic exists.
Abstract remains 243 words, comfortably under the IEEE Access
250-word cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,120 @@
# Independent Peer Review: Paper A (v3.7)

**Target Venue:** IEEE Access (Regular Paper)

**Date:** April 21, 2026

**Reviewer:** Gemini CLI (6th Round Independent Review)

---
## 1. Overall Verdict

**Verdict: Minor Revision**

**Rationale:**

The manuscript presents a methodologically rigorous, highly sophisticated, and large-scale empirical analysis of non-hand-signed auditor signatures. Analyzing over 180,000 signatures from 90,282 audit reports is an impressive feat, and the pipeline architecture combining VLM prescreening, YOLO detection, and ResNet-50 feature extraction is fundamentally sound. The "replication-dominated" calibration strategy—validated across both intra-firm consistency metrics and held-out cross-validation folds—is a significant contribution to document forensics, where ground-truth labeling is scarce and expensive. Furthermore, the dual-descriptor approach (cosine similarity for semantic features, dHash for structural features) effectively resolves the ambiguity between stylistic consistency and mechanical image reproduction. The demotion of the Burgstahler-Dichev / McCrary (BD/McCrary) test to a density-smoothness diagnostic, supported by the new Appendix A, is analytically correct.

However, approaching this manuscript with a fresh perspective reveals three distinct methodological blind spots that previous review rounds missed. Specifically, the manuscript overclaims what the underpowered accountant-level BD/McCrary test can establish, it presents a mathematically tautological False Rejection Rate (FRR) evaluation that borders on reviewer-bait, and it lacks narrative guardrails around its document-level aggregation metrics. Resolving these localized issues will not alter the paper's conclusions but will significantly harden the manuscript against aggressive peer review, making it fully submission-ready for IEEE Access.
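A minimal sketch of the structural descriptor named above may help readers unfamiliar with dHash; the 8x9 grid, pure-Python arrays, and single-pixel perturbation are illustrative assumptions, not the paper's implementation (which would first resize the actual signature crop):

```python
def dhash(gray, hash_size=8):
    """Difference hash: sign of the horizontal gradient per pixel pair.

    Assumes `gray` is a 2D grayscale array already resized to
    hash_size rows by (hash_size + 1) columns.
    """
    return [1 if left < right else 0
            for row in gray
            for left, right in zip(row, row[1:])]


def hamming(a, b):
    """Hamming distance between two bit lists: the dHash dissimilarity."""
    return sum(x != y for x, y in zip(a, b))


# Two near-identical 8x9 "images": one perturbed pixel flips one gradient
# bit, so the structural distance stays tiny.
img1 = [[(r * 9 + c) % 17 for c in range(9)] for r in range(8)]
img2 = [row[:] for row in img1]
img2[0][0] = 99
print(hamming(dhash(img1), dhash(img2)))  # prints 1 (out of 64 bits)
```

Byte-identical crops give distance 0 by construction, which is why pairing a structural hash with cosine similarity helps separate mechanical image reproduction from mere stylistic consistency.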
---
## 2. Scientific Soundness Audit

### Three-Level Framework Coherence

The separation of the analysis into signature-level, accountant-level, and auditor-year units is intellectually rigorous and highly defensible. By strictly separating the *pixel-level output quality* (signature level) from the *aggregate behavioral regime* (accountant level), the authors successfully avoid the ecological fallacy of assuming that because an individual practitioner acts in a binary fashion (hand-signing vs. stamping), the aggregate distribution of signature pixels must be neatly bimodal. The evidence compellingly demonstrates that the data forms a continuous quality degradation spectrum at the pixel level.

### Firm A 'Replication-Dominated' Framing

This is perhaps the strongest conceptual pillar of the paper. Assuming that Firm A acts as a "pure" positive class would inevitably force the thresholding model to interpret the long left tail of the cosine distribution as algorithmic noise or pipeline error. The explicit validation of Firm A as "replication-dominated but not pure"—quantified elegantly by the 139/32 split between high-replication and middle-band clusters in the accountant-level Gaussian Mixture Model (Section IV-E)—logically resolves the 92.5% capture rate without overclaiming. It is a highly defensible stance.

### BD/McCrary Demotion

Moving the BD/McCrary test from a co-equal threshold estimator to a "density-smoothness diagnostic" is the correct scientific decision. Appendix A empirically demonstrates that the test behaves exactly as one would expect when applied to a large ($N > 60,000$), smooth, heavy-tailed distribution: it detects localized non-linearities caused by histogram binning resolution rather than true mechanistic discontinuities. The theoretical tension is resolved by this demotion.

### Statistical Choices

The statistical foundations of the paper are appropriate and well-applied:

* **Beta/Logit-Gaussian Mixtures:** Fitting Beta mixtures via the EM algorithm is perfectly suited for bounded cosine similarity data $[0,1]$, and the logit-Gaussian cross-check serves as an excellent robustness measure against parametric misspecification.
* **Hartigan Dip Test:** The use of the dip test provides a rigorous, non-parametric verification of unimodality/multimodality.
* **Wilson Confidence Intervals:** Utilizing Wilson score intervals for the held-out validation metrics (Table XI) correctly models binomial variance, preventing zero-bound confidence interval collapse.
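A minimal stdlib sketch of the Wilson score interval credited above, applied to the held-out-fold counts from Table XI (14,035 of 15,332); the function name and rounding are illustrative:

```python
from math import sqrt


def wilson(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (95% at z=1.96)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half


# Held-out fold from Table XI: unlike the Wald interval, the Wilson interval
# stays strictly inside (0, 1) even for proportions near the boundary.
lo, hi = wilson(14_035, 15_332)
print(f"91.54% capture, 95% CI [{lo:.4f}, {hi:.4f}]")
```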
---
## 3. Numerical Consistency Cross-Check

An exhaustive spot-check of the manuscript's arithmetic, table values, and cited numbers reveals practically flawless internal consistency. The scripts supporting the pipeline operate exactly as claimed.

* **Table VIII:** The reported accountant-level threshold band (KDE antimode: 0.973, Beta-2: 0.979, logit-GMM-2: 0.976) matches the narrative text precisely.
* **Table IX:** The proportion of Firm A captures under the dual rule ($54,370 / 60,448 = 89.945\%$) correctly rounds to the reported $89.95\%$.
* **Table XI:** The calibration fold's operational dual rule yields $40,335 / 45,116 = 89.402\%$ (reported $89.40\%$), and the held-out fold yields $14,035 / 15,332 = 91.540\%$ (reported $91.54\%$).
* **Table XII:** The column sums for $N = 168,740$ match perfectly. Furthermore, the delta column balances precisely to zero ($+2,294 + 6,095 + 119 - 8,508 + 0 = 0$).
* **Table XIV:** Top 10% Firm A occupancy is $443 / 462 = 95.88\%$ (reported $95.9\%$), against a baseline of $1,287 / 4,629 = 27.80\%$ (reported $27.8\%$).
* **Table XVI:** Firm A's intra-report agreement is correctly calculated as $(26,435 + 734 + 4) / 30,222 = 89.91\%$.

**Minor Narrative Clarification Required:**

In Table III, total extracted signatures are reported as $182,328$, with $168,755$ successfully matched to CPAs. However, Table V and Table XII utilize $N = 168,740$ signatures for the all-pairs best-match analysis. This delta of $15$ signatures is accounted for by CPAs who possess exactly *one* signature in the entire database, rendering a "same-CPA pairwise comparison" impossible. While logically sound to anyone analyzing the pipeline closely, this microscopic $15$-signature discrepancy is exactly the kind of arithmetic artifact that distracts meticulous reviewers.

*Recommendation:* Add a one-sentence footnote or parenthetical to Section IV-D explicitly stating this $15$-signature delta is due to single-signature CPAs lacking a pairwise match.
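The checks above can be replayed mechanically. This sketch re-derives each ratio from the counts quoted in this section only (no new data); the 0.01-point tolerance is an assumption about the tables' rounding convention:

```python
# (count, denominator, reported percentage) taken verbatim from the review.
checks = {
    "Table IX capture":   (54_370, 60_448, 89.95),
    "Table XI calib":     (40_335, 45_116, 89.40),
    "Table XI held-out":  (14_035, 15_332, 91.54),
    "Table XIV top-10%":  (443, 462, 95.88),
    "Table XIV baseline": (1_287, 4_629, 27.80),
    "Table XVI agree":    (26_435 + 734 + 4, 30_222, 89.91),
}
for name, (k, n, reported) in checks.items():
    assert abs(100 * k / n - reported) < 0.01, name

# Table XII's delta column balances, and the Table III vs Table V/XII
# signature counts differ by exactly the 15 single-signature CPAs.
assert 2_294 + 6_095 + 119 - 8_508 + 0 == 0
assert 168_755 - 168_740 == 15
print("all cross-checks pass")
```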
---
## 4. Appendix A Validity

The addition of Appendix A successfully and empirically justifies the main-text demotion of the BD/McCrary test.

**Strengths:**

The argument demonstrating that the BD/McCrary transitions drift monotonically with bin width (e.g., Firm A cosine drifting across 0.987 $\rightarrow$ 0.985 $\rightarrow$ 0.980 $\rightarrow$ 0.975) is brilliant. Coupled with the observation that the Z-statistics inflate superlinearly with bin width (from $|Z| \sim 9$ at bin 0.003 to $|Z| \sim 106$ at bin 0.015), the appendix convincingly shows that the test is interacting with the local curvature of a heavily populated continuous distribution rather than identifying a discrete, mechanistic boundary. Table A.I is arithmetically consistent with the script's logic.

**Weaknesses:**

The interpretation paragraph overstates the implications of the accountant-level null finding. It claims that the lack of a transition at the accountant level ($N=686$) is a "robust finding that survives the bin-width sweep." As detailed in Section 6 below, a non-finding surviving a bin-width sweep in a small sample is largely a function of low statistical power, not definitive proof of a smoothly-mixed boundary.
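The bin-width inflation described under Strengths can be reproduced qualitatively with a stylized simulation. Everything below is synthetic and simplified: the sample is an invented smooth, heavy-tailed similarity-like distribution with no true discontinuity, and the neighbor-difference Z is a crude stand-in for the paper's BD/McCrary implementation, not its actual code:

```python
import math
import random

random.seed(0)
# Synthetic smooth, heavy-tailed sample concentrated just below 1.0
# (half-normal plus exponential distance from 1.0; no density jump).
xs = [1.0 - abs(random.gauss(0, 0.02)) - random.expovariate(80)
      for _ in range(60_000)]


def max_abs_z(data, width, lo=0.8):
    """Largest |Z| comparing each bin count with the mean of its neighbors."""
    nbins = int((1.0 - lo) / width)
    counts = [0] * nbins
    for x in data:
        if x >= lo:
            counts[min(int((x - lo) / width), nbins - 1)] += 1
    zmax = 0.0
    for i in range(1, nbins - 1):
        expected = (counts[i - 1] + counts[i + 1]) / 2
        var = counts[i] + (counts[i - 1] + counts[i + 1]) / 4
        if var > 0:
            zmax = max(zmax, abs(counts[i] - expected) / math.sqrt(var))
    return zmax


# Wider bins average over more curvature of the smooth density, so the
# "discontinuity" statistic inflates even though no jump exists.
for width in (0.003, 0.005, 0.015):
    print(width, round(max_abs_z(xs, width), 1))
```

On this synthetic sample the widest bin yields a far larger $|Z|$ than the narrowest, mirroring the appendix's $|Z| \sim 9 \rightarrow 106$ inflation in kind, though not in exact magnitude.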
---
## 5. IEEE Access Submission Readiness

The manuscript is in excellent shape for submission to IEEE Access.

* **Scope Fit:** High. The paper sits squarely at the intersection of applied AI, document forensics, and interdisciplinary data science, which is a core demographic for IEEE Access.
* **Abstract Length:** The abstract is approximately 234 words, comfortably satisfying the stringent $\leq 250$ word limit requirement.
* **Formatting & Structure:** The document adheres to standard IEEE double-column formatting conventions (Roman numeral sections, appropriate table/figure references).
* **Anonymization:** Properly handled. Author placeholders, affiliation blocks, and correspondence emails are appropriately bracketed for single-anonymized peer review.
* **Desk-Return Risks:** Very low. The inclusion of the ablation study (Table XVIII) and explicit baseline comparisons ensures the paper meets the journal's expectations for methodological validation.

---
## 6. Novel Issues and Methodological Blind Spots

While the previous review rounds improved the manuscript significantly, habituation has allowed three specific narrative and statistical blind spots to persist. These are prime targets for reviewer pushback.

### Issue 1: The Accountant-Level BD/McCrary Null is a Power Artifact, not Proof of Smoothness

In Section V-B and Appendix A, the authors claim that because the BD/McCrary test yields no significant transition at the accountant level, this "pattern is consistent with a clustered but smoothly mixed accountant-level distribution." Furthermore, Section V-B states that this non-transition is "itself diagnostic of smoothness rather than a failure of the method."

**The Critique:** The McCrary (2008) test relies on local linear regression smoothing. The variance of the estimator scales inversely with $N \cdot h$ (where $h$ is the bandwidth). With a sample size of only $N=686$ accountants, the test is severely underpowered and lacks the statistical capacity to reject the null of smoothness unless the discontinuity is a sheer cliff. Asserting that a failure to reject the null affirmatively *proves* the null is true (smoothness) is a fundamental statistical fallacy (a Type II error risk).

*Impact:* Statistically literate reviewers will immediately flag this as an overclaim. The demotion of the test to a diagnostic is correct, but interpreting the null at $N=686$ as definitive proof of smoothness is flawed.
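The power point can be made concrete with a hedged simulation. The flanking-bin Z below is a crude stand-in for the full local-linear McCrary estimator, and the roughly 20% density jump, bandwidth, and trial counts are invented purely for illustration:

```python
import math
import random

random.seed(1)


def sample(n):
    # 45% of mass uniform on [0, 0.5), 55% on [0.5, 1): a genuine density
    # jump at the 0.5 cutoff (density 0.9 below vs 1.1 above).
    return [random.uniform(0.0, 0.5) if random.random() < 0.45
            else random.uniform(0.5, 1.0) for _ in range(n)]


def jump_z(data, cut=0.5, h=0.05):
    """Z statistic for the count imbalance in the two bins flanking `cut`."""
    below = sum(1 for x in data if cut - h <= x < cut)
    above = sum(1 for x in data if cut <= x < cut + h)
    return (above - below) / math.sqrt(above + below)


trials = 100
small = sum(abs(jump_z(sample(686))) > 1.96 for _ in range(trials)) / trials
large = sum(abs(jump_z(sample(60_000))) > 1.96 for _ in range(trials)) / trials
print(f"rejection rate at N=686: {small:.2f}; at N=60,000: {large:.2f}")
```

At the accountant-level sample size the genuine jump is detected only rarely, while at the signature-level sample size it is detected essentially always: exactly the asymmetry that makes the accountant-level null uninformative as affirmative proof of smoothness.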
### Issue 2: Tautological Presentation of FRR and EER (Table X)

Table X presents a False Rejection Rate (FRR) computed against a "byte-identical" positive anchor. It reports an FRR of $0.000$ for thresholds like 0.95 and 0.973, and subsequently reports an Equal Error Rate (EER) of $\approx 0$ at cosine = 0.990.

**The Critique:** By definition, byte-identical signatures have a cosine similarity asymptotically approaching 1.0 (modulo minor float/cropping artifacts). Evaluating a similarity threshold of 0.95 against inputs that are mathematically defined to score near 1.0 yields a 0% FRR trivially. It is a tautology. While the text in Section V-F attempts to caveat this ("perfect recall against this subset therefore does not generalize"), presenting it as a formal column in Table X with an EER calculation treats it as a standard biometric evaluation. There are no crossing error distributions here to warrant an EER.

*Impact:* This is reviewer-bait. Reviewers from the biometric or forensics domains will argue that an EER of 0 is artificially constructed. The true scientific value of Table X is purely the empirical False Acceptance Rate (FAR) derived from the 50,000 inter-CPA negatives.
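A short demonstration of the tautology, using toy embedding vectors (the values are invented; only the identity structure matters):

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


emb = [0.3, -1.2, 0.7, 2.0]       # toy stand-in for one signature embedding
pairs = [(emb, list(emb))] * 310  # 310 byte-identical positive pairs

for threshold in (0.95, 0.973, 0.990):
    frr = sum(cosine(a, b) < threshold for a, b in pairs) / len(pairs)
    print(threshold, frr)  # FRR is 0.0 at every threshold below 1.0
```

The only quantity in such a table with empirical content is therefore the FAR over the inter-CPA negatives, whose scores are not fixed by construction.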
### Issue 3: Document-Level Worst-Case Aggregation Narrative

Section IV-I reports that 35.0% of documents are classified as "High-confidence non-hand-signed" and 43.8% as "Moderate-confidence." This relies on the worst-case rule defined in Section III-L (if one signature on a dual-signed report is stamped, the whole document inherits that label).

**The Critique:** While this "worst-case" aggregation is highly practical for building an operational regulatory auditing tool (flagging the report for review), the narrative in IV-I presents these percentages without reminding the reader that a document might contain a mix of genuine and stamped signatures. Without immediate context, stating that nearly 80% of the market's reports are non-hand-signed invites the ecological fallacy that *both* partners are stamping.

*Impact:* A brief narrative safeguard is missing. Section IV-I must briefly cross-reference the intra-report agreement findings (Table XVI) to remind the reader of the composition of these documents, mitigating the risk that the reader misinterprets the document-level severity.
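An illustrative sketch of the Section III-L worst-case rule as described above; the label names and severity ordering are invented stand-ins, not the paper's taxonomy:

```python
# Hypothetical severity ordering for the worst-case aggregation rule.
SEVERITY = {"hand_signed": 0, "moderate_nonhand": 1, "high_nonhand": 2}


def document_label(signature_labels):
    """Worst case: the report inherits its most severe signature label."""
    return max(signature_labels, key=SEVERITY.__getitem__)


# A dual-signed report with one genuine and one stamped signature is
# flagged at the stamped level, even though half its signatures are genuine.
print(document_label(["hand_signed", "high_nonhand"]))  # -> high_nonhand
```

This is why the headline document-level percentages need the Table XVI cross-reference: a flagged report is not evidence that every signature on it is stamped.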
---
## 7. Final Recommendation and v3.8 Action Items

The manuscript is exceptionally strong but requires a few surgical narrative adjustments to remove reviewer-bait and statistical overclaims. I recommend a **Minor Revision** encompassing the following ranked action items.

### BLOCKER (Must Fix for Submission)

1. **Revise the interpretation of the accountant-level BD/McCrary null.**

    * *Action:* In Section V-B, Section VI (Conclusion), and Appendix A, remove any explicit claims that the null affirmatively proves "smoothly mixed" boundaries.
    * *Replacement Phrasing:* Reframe this finding to acknowledge statistical power. For example: *"We fail to find evidence of a discontinuity at the accountant level. While this is consistent with smoothly mixed clusters, it also reflects the limited statistical power of the BD/McCrary test at smaller sample sizes ($N=686$), reinforcing its role as a diagnostic rather than a definitive estimator."*

### MAJOR (Highly Recommended to Prevent Desk-Reject/Major Revision)

2. **Reframe Table X to eliminate the tautological FRR/EER presentation.**

    * *Action:* Remove the Equal Error Rate (EER) calculation entirely. Add an explicit, prominent table note to Table X stating that FRR is computed against a definitionally extreme subset (byte-identical signatures), making the $0.000$ values an expected mathematical boundary check rather than an empirical discovery of real-world recall. Emphasize that the primary contribution of Table X is the FAR evaluation against the large inter-CPA negative anchor.

### MINOR (Quick Wins for Readability and Precision)

3. **Contextualize the Document-Level Aggregation (Section IV-I).**

    * *Action:* When presenting the 35.0% / 43.8% document-level figures in Section IV-I, explicitly remind the reader of the worst-case aggregation rule. Add a single sentence cross-referencing Table XVI's mixed-report rates to ensure the reader understands the internal composition of these flagged documents.

4. **Clarify the 15-Signature Delta (Section IV-D / Table XII).**

    * *Action:* Add a one-sentence clarification explaining that the delta between the 168,755 CPA-matched signatures (Table III) and the 168,740 signatures analyzed in the all-pairs distributions (Table V / Table XII) corresponds to CPAs who have exactly one signature in the corpus, making intra-CPA pairwise comparison impossible. This will preempt arithmetic nitpicking by reviewers.
@@ -34,10 +34,12 @@ The $Z$ statistics also inflate superlinearly with the bin width (Firm A cosine
 Both features are characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity.
 
 Second, at the accountant level---the unit we rely on for primary threshold inference (Sections III-H, III-J, IV-E)---the procedure produces no significant transition at two of three cosine bin widths and two of three dHash bin widths, and the one marginal transition it does produce ($Z_\text{below} = -2.00$ in the dHash sweep at bin width $1.0$) sits exactly at the critical value for $\alpha = 0.05$.
 This pattern is itself informative: it is consistent with *clustered-but-smoothly-mixed* accountant-level aggregates, in which the between-cluster boundary is gradual enough that a discontinuity-based test cannot reject the smoothness null at conventional significance.
+We stress the inferential asymmetry here: *consistency* with smoothly-mixed clustering is what the BD null delivers, not *affirmative proof* of smoothness.
+At $N = 686$ accountants the BD/McCrary test has limited statistical power and can typically reject only sharp cliff-type discontinuities; failure to reject the smoothness null therefore constrains the data only to distributions whose between-cluster transitions are gradual *enough* to escape the test's sensitivity at that sample size.
+We read this as reinforcing---not establishing---the clustered-but-smoothly-mixed interpretation derived from the GMM fit and the dip-test evidence.
 
-Taken together, Table A.I shows (i) that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located *inside* the non-hand-signed mode rather than between modes---and (ii) that the accountant-level BD/McCrary null is a robust finding that survives the bin-width sweep.
+Taken together, Table A.I shows (i) that the signature-level BD/McCrary transitions are not a threshold in the usual sense---they are histogram-resolution-dependent local density anomalies located *inside* the non-hand-signed mode rather than between modes---and (ii) that the accountant-level BD/McCrary null persists across the bin-width sweep, consistent with but not alone sufficient to establish the clustered-but-smoothly-mixed interpretation discussed in Section V-B and limitation-caveated in Section V-G.
 Both observations support the main-text decision to use BD/McCrary as a density-smoothness diagnostic rather than as a threshold estimator.
-The accountant-level threshold band reported in Table VIII ($\text{cosine} \approx 0.975$ from the convergence of the KDE antimode, the Beta-2 crossing, and the logit-GMM-2 crossing) is therefore not adjusted to include any BD/McCrary location, and the absence of a BD transition at the accountant level is reported as itself evidence for the clustered-but-smooth interpretation in Section V-B.
+The accountant-level threshold band reported in Table VIII ($\text{cosine} \approx 0.975$ from the convergence of the KDE antimode, the Beta-2 crossing, and the logit-GMM-2 crossing) is therefore not adjusted to include any BD/McCrary location.
 
 Raw per-bin $Z$ sequences and $p$-values for every (variant, bin-width) panel are available in the supplementary materials (`reports/bd_sensitivity/bd_sensitivity.json`) produced by `signature_analysis/25_bd_mccrary_sensitivity.py`.
@@ -13,8 +13,8 @@ Second, we showed that combining cosine similarity of deep embeddings with diffe
 Third, we introduced a convergent threshold framework combining two methodologically distinct estimators---KDE antimode (with a Hartigan unimodality test) and an EM-fitted Beta mixture (with a logit-Gaussian robustness check)---together with a Burgstahler-Dichev / McCrary density-smoothness diagnostic.
 Applied at both the signature and accountant levels, this framework surfaced an informative structural asymmetry: at the per-signature level the distribution is a continuous quality spectrum for which no two-mechanism mixture provides a good fit, whereas at the per-accountant level BIC cleanly selects a three-component mixture and the KDE antimode together with the Beta-mixture and logit-Gaussian estimators agree within $\sim 0.006$ at cosine $\approx 0.975$.
-The Burgstahler-Dichev / McCrary test, by contrast, finds no significant transition at the accountant level, consistent with clustered but smoothly mixed rather than sharply discrete accountant-level heterogeneity.
-The substantive reading is therefore narrower than "discrete behavior": *pixel-level output quality* is continuous and heavy-tailed, and *accountant-level aggregate behavior* is clustered with smooth cluster boundaries.
+The Burgstahler-Dichev / McCrary test, by contrast, finds no significant transition at the accountant level; at $N = 686$ accountants the test has limited power and cannot affirmatively establish smoothness, but its non-transition is consistent with the smoothly-mixed cluster boundaries implied by the accountant-level GMM.
+The substantive reading is therefore narrower than "discrete behavior": *pixel-level output quality* is continuous and heavy-tailed, and *accountant-level aggregate behavior* is clustered into three recognizable groups whose inter-cluster boundaries are gradual rather than sharp.
 
 Fourth, we introduced a *replication-dominated* calibration methodology---explicitly distinguishing replication-dominated from replication-pure calibration anchors and validating classification against a byte-level pixel-identity anchor (310 byte-identical signatures) paired with a $\sim$50,000-pair inter-CPA negative anchor.
 To document the within-firm sampling variance of using the calibration firm as its own validation reference, we split the firm's CPAs 70/30 at the CPA level and report capture rates on both folds with Wilson 95% confidence intervals; extreme rules agree across folds while rules in the operational 85-95% capture band differ by 1-5 percentage points, reflecting within-firm heterogeneity in replication intensity rather than generalization failure.
@@ -25,11 +25,12 @@ Replication quality varies continuously with scan equipment, PDF compression, st
|
|||||||
At the per-accountant aggregate level the picture partly reverses.
|
At the per-accountant aggregate level the picture partly reverses.
|
||||||
The distribution of per-accountant mean cosine (and mean dHash) rejects unimodality, a BIC-selected three-component Gaussian mixture cleanly separates (C1) a high-replication cluster dominated by Firm A, (C2) a middle band shared by the other Big-4 firms, and (C3) a hand-signed-tendency cluster dominated by smaller domestic firms, and the three 1D threshold methods applied at the accountant level produce mutually consistent estimates (KDE antimode $= 0.973$, Beta-2 crossing $= 0.979$, logit-GMM-2 crossing $= 0.976$).
|
The distribution of per-accountant mean cosine (and mean dHash) rejects unimodality, a BIC-selected three-component Gaussian mixture cleanly separates (C1) a high-replication cluster dominated by Firm A, (C2) a middle band shared by the other Big-4 firms, and (C3) a hand-signed-tendency cluster dominated by smaller domestic firms, and the three 1D threshold methods applied at the accountant level produce mutually consistent estimates (KDE antimode $= 0.973$, Beta-2 crossing $= 0.979$, logit-GMM-2 crossing $= 0.976$).
|
||||||
The BD/McCrary test, however, does not produce a significant transition at the accountant level either, in contrast to the signature level.
|
The BD/McCrary test, however, does not produce a significant transition at the accountant level either, in contrast to the signature level.
|
||||||
This pattern is consistent with a clustered *but smoothly mixed* accountant-level distribution rather than with a sharp density discontinuity: accountant-level means cluster into three recognizable groups, yet the transitions between them are gradual rather than discrete at the bin resolution BD/McCrary requires.
|
+This pattern is consistent with a clustered *but smoothly mixed* accountant-level distribution rather than with a sharp density discontinuity: accountant-level means cluster into three recognizable groups, yet the test fails to reject the smoothness null at the sample size available ($N = 686$), and the GMM cluster boundaries appear gradual rather than sheer.
+We caveat this interpretation in Section V-G: the BD null alone cannot affirmatively establish smoothness---only fail to falsify it---and our substantive claim of smoothly-mixed clustering rests on the joint weight of the GMM fit, the dip test, and the BD null rather than on the BD null alone.

The substantive interpretation we take from this evidence is therefore narrower than a "discrete-behaviour" claim: *pixel-level output quality* is continuous and heavy-tailed, and *accountant-level aggregate behaviour* is clustered (three recognizable groups) but not sharply discrete.

The accountant-level mixture is a useful classifier of firm-and-practitioner-level signing regimes; individual behaviour may still transition or mix over time within a practitioner, and our cross-sectional analysis does not rule this out.

-Methodologically, the implication is that the three 1D methods are meaningfully applied at the accountant level for threshold estimation, while the BD/McCrary non-transition at the same level is itself diagnostic of smoothness rather than a failure of the method.
+Methodologically, the implication is that the two threshold estimators (KDE antimode, Beta mixture with logit-Gaussian robustness) are meaningfully applied at the accountant level for threshold estimation, while the BD/McCrary non-transition at the same level is a failure-to-reject rather than a failure of the method---informative alongside the other evidence but subject to the power caveat recorded in Section V-G.
## C. Firm A as a Replication-Dominated, Not Pure, Population
@@ -103,7 +104,8 @@ Extending the accountant-level analysis to auditor-year units is a natural next
Sixth, the BD/McCrary transition estimates fall inside rather than between modes for the per-signature cosine distribution, and the test produces no significant transition at all at the accountant level.
In our application, therefore, BD/McCrary contributes diagnostic information about local density-smoothness rather than an independent accountant-level threshold estimate; that role is played by the KDE antimode and the two mixture-based estimators.
-The BD/McCrary results remain informative as a robustness check---their non-transition at the accountant level is consistent with the dip-test and Beta-mixture evidence that accountant-level clustering is smooth rather than sharply discontinuous.
+We emphasize that the accountant-level BD/McCrary null is *consistent with*---not affirmative proof of---smoothly mixed cluster boundaries: the BD/McCrary test is known to have limited statistical power at modest sample sizes, and with $N = 686$ accountants in our analysis the test cannot reliably detect anything less than a sharp cliff-type density discontinuity.
+Failure to reject the smoothness null at this sample size therefore reinforces BD/McCrary's role as a diagnostic rather than a definitive estimator; the substantive claim of smoothly-mixed accountant-level clustering rests on the joint weight of the dip-test and Beta-mixture evidence together with the BD null, not on the BD null alone.
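The power intuition behind this caveat can be sketched with a stylized simulation. The windowed-count check below is a crude stand-in, not the actual BD/McCrary local-linear estimator, and all values are synthetic: it compares counts just below vs just above a candidate cut with a proportion z-test, showing that at $N \approx 686$ only a cliff-type drop is reliably detectable.

```python
import math
import random

def detects_break(xs, cut, width=0.05, z_crit=1.96):
    # Stylized density-discontinuity check (not the BD/McCrary estimator):
    # compare counts in equal-width windows just below vs just above `cut`.
    below = sum(cut - width <= x < cut for x in xs)
    above = sum(cut <= x < cut + width for x in xs)
    n = below + above
    if n == 0:
        return False
    z = (above / n - 0.5) / math.sqrt(0.25 / n)
    return abs(z) > z_crit

random.seed(0)
# Sharp cliff at 0.95: density drops by roughly 10x above the cut.
cliff = [random.uniform(0.85, 0.95) for _ in range(650)] + \
        [random.uniform(0.95, 1.05) for _ in range(36)]
# Smooth two-cluster mixture at the same N = 686: symmetric around 0.95,
# so there is no discontinuity for the check to find.
smooth = [random.gauss(0.93, 0.02) if random.random() < 0.5
          else random.gauss(0.97, 0.02) for _ in range(686)]

cliff_flag = detects_break(cliff, 0.95)    # the ~10x drop is detectable even at this N
smooth_flag = detects_break(smooth, 0.95)  # typically not flagged at this sample size
```

The point is directional, not quantitative: a non-rejection on the smooth sample says nothing affirmative, which is exactly the inferential asymmetry the text records.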
Finally, the legal and regulatory implications of our findings depend on jurisdictional definitions of "signature" and "signing."
Whether non-hand-signing of a CPA's own stored signature constitutes a violation of signing requirements is a legal question that our technical analysis can inform but cannot resolve.
@@ -244,7 +244,8 @@ The heldout fold is used exclusively to report post-hoc capture rates with Wilso
4. **Low-similarity same-CPA anchor (supplementary negative):** signatures whose maximum same-CPA cosine similarity is below 0.70.
This anchor is retained for continuity with prior work but is small in our dataset ($n = 35$) and is reported only as a supplementary reference; its confidence intervals are too wide for quantitative inference.
-From these anchors we report FAR with Wilson 95% confidence intervals (against the inter-CPA negative anchor) and FRR (against the byte-identical positive anchor), together with the Equal Error Rate (EER) interpolated at the threshold where FAR $=$ FRR, following biometric-verification reporting conventions [3].
+From these anchors we report FAR with Wilson 95% confidence intervals against the inter-CPA negative anchor.
+We do not report an Equal Error Rate or FRR column against the byte-identical positive anchor, because byte-identical pairs have cosine $\approx 1$ by construction and any FRR computed against that subset is trivially $0$ at every threshold below $1$; the conservative-subset role of the byte-identical anchor is instead discussed qualitatively in Section V-F.
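A minimal sketch of the two quantities makes the tautology concrete. The cosine values below are illustrative synthetic data, not the paper's, and `wilson_ci` is a standard score-interval implementation:

```python
import math

def wilson_ci(k, n, z=1.96):
    # Wilson 95% score interval for a binomial proportion k/n.
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)

def far(neg_cosines, threshold):
    # False-accept rate: share of negative (inter-CPA) pairs at/above threshold,
    # with its Wilson 95% interval.
    k = sum(c >= threshold for c in neg_cosines)
    lo, hi = wilson_ci(k, len(neg_cosines))
    return k / len(neg_cosines), (lo, hi)

# Byte-identical positives sit at cosine ~ 1.0 by construction, so FRR (the
# share of positives *below* the threshold) is 0 at any threshold < 1 --
# the arithmetic tautology described in the text, not biometric performance.
positives = [1.0] * 310  # illustrative count matching the anchor size
frr = sum(c < 0.979 for c in positives) / len(positives)
```

Note that even $k = 0$ accepted negatives yields a nonzero Wilson upper bound, which is why the text reports intervals rather than point rates alone.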
Precision and $F_1$ are not meaningful in this anchor-based evaluation because the positive and negative anchors are constructed from different sampling units (intra-CPA byte-identical pairs vs random inter-CPA pairs), so their relative prevalence in the combined set is an arbitrary construction rather than a population parameter; we therefore omit precision and $F_1$ from Table X.
The 70/30 held-out Firm A fold of Section IV-G.2 additionally reports capture rates with Wilson 95% confidence intervals computed within the held-out fold, which is a valid population for rate inference.
We additionally draw a small stratified sample (30 signatures across high-confidence replication, borderline, style-only, pixel-identical, and likely-genuine strata) for manual visual sanity inspection; this sample is used only for spot-check and does not contribute to reported metrics.
@@ -56,6 +56,7 @@ A Cohen's $d$ of 0.669 indicates a medium effect size [29], confirming that the
## D. Hartigan Dip Test: Unimodality at the Signature Level

Applying the Hartigan & Hartigan dip test [37] to the per-signature best-match distributions reveals a critical structural finding (Table V).
+The $N = 168{,}740$ count used in Table V and downstream all-pairs analyses (Tables XII, XVIII) is $15$ signatures smaller than the $168{,}755$ CPA-matched count reported in Table III: these $15$ signatures belong to CPAs with exactly one signature in the entire corpus, for whom no same-CPA pairwise best-match statistic can be computed, and are therefore excluded from all same-CPA similarity analyses.
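The exclusion rule behind that count difference is mechanical and can be sketched directly (a hypothetical helper, not the paper's pipeline; the toy corpus below is illustrative):

```python
from collections import defaultdict

def split_by_pairability(sig_cpa_pairs):
    # A signature whose CPA has only one signature in the corpus has no
    # same-CPA pair, hence no best-match statistic; such signatures are
    # excluded from all same-CPA similarity analyses (the 168,755 -> 168,740 drop).
    counts = defaultdict(int)
    for _sig, cpa in sig_cpa_pairs:
        counts[cpa] += 1
    eligible = sum(n for n in counts.values() if n >= 2)
    excluded = sum(n for n in counts.values() if n == 1)
    return eligible, excluded

# Toy corpus: CPA "a" has three signatures, CPA "b" is a singleton.
pairs = [("s1", "a"), ("s2", "a"), ("s3", "a"), ("s4", "b")]
eligible, excluded = split_by_pairability(pairs)
```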
<!-- TABLE V: Hartigan Dip Test Results
| Distribution | N | dip | p-value | Verdict (α=0.05) |
@@ -81,8 +82,9 @@ Applying the BD/McCrary procedure (Section III-I.3) to the per-signature cosine
Two cautions, however, prevent us from treating these signature-level transitions as thresholds.
First, the cosine transition at 0.985 lies *inside* the non-hand-signed mode rather than at the separation between two mechanisms, consistent with the dip-test finding that per-signature cosine is not cleanly bimodal.
Second, Appendix A documents that the signature-level transition locations are not bin-width-stable (Firm A cosine drifts across 0.987, 0.985, 0.980, 0.975 as the bin width is widened from 0.003 to 0.015, and full-sample dHash transitions drift across 2, 10, 9 as bin width grows from 1 to 3), which is characteristic of a histogram-resolution artifact rather than of a genuine density discontinuity between two mechanisms.
-At the accountant level the test does not produce a significant transition in either the cosine-mean or the dHash-mean distribution, and this null is robust across the Appendix-A bin-width sweep.
+At the accountant level the test does not produce a significant transition in either the cosine-mean or the dHash-mean distribution, and this null persists across the Appendix-A bin-width sweep.
-We therefore read the BD/McCrary pattern as evidence that accountant-level aggregates are clustered-but-smoothly-mixed rather than sharply discontinuous, and we use BD/McCrary as a density-smoothness diagnostic rather than as an independent threshold estimator.
+We read this accountant-level pattern as *consistent with*---not affirmative proof of---clustered-but-smoothly-mixed aggregates: at $N = 686$ accountants the BD/McCrary test has limited statistical power, so a non-rejection of the smoothness null does not by itself establish smoothness (Section V-G).
+We therefore use BD/McCrary as a density-smoothness diagnostic rather than as an independent threshold estimator, and the substantive claim of smoothly-mixed accountant clustering rests on the joint evidence of the dip test, the BIC-selected GMM, and the BD null.
### 2) Beta Mixture at Signature Level: A Forced Fit
@@ -147,7 +149,7 @@ Table VIII summarizes the threshold estimates produced by the two threshold esti
| Firm A calibration-fold dHash_indep median | — | 2 |
-->
-At the accountant level the two threshold estimators (KDE antimode and Beta-2 crossing) together with the logit-Gaussian robustness crossing converge to a cosine threshold of $\approx 0.975 \pm 0.003$ and a dHash threshold of $\approx 3.8 \pm 0.4$; the BD/McCrary density-smoothness diagnostic produces no significant transition at the same level (and this null is robust across Appendix A's bin-width sweep), consistent with clustered-but-smoothly-mixed accountant-level aggregates.
+At the accountant level the two threshold estimators (KDE antimode and Beta-2 crossing) together with the logit-Gaussian robustness crossing converge to a cosine threshold of $\approx 0.975 \pm 0.003$ and a dHash threshold of $\approx 3.8 \pm 0.4$; the BD/McCrary density-smoothness diagnostic produces no significant transition at the same level (a null that persists across Appendix A's bin-width sweep), which is *consistent with*---though, at $N = 686$, not sufficient to affirmatively establish---clustered-but-smoothly-mixed accountant-level aggregates.
This is the accountant-level convergence we rely on for the primary threshold interpretation; the two-dimensional GMM marginal crossings (cosine $= 0.945$, dHash $= 8.10$) differ because they reflect joint (cosine, dHash) covariance structure, and we report them as a secondary cross-check.
The signature-level estimates are reported for completeness and as diagnostic evidence of the continuous-spectrum asymmetry (Section IV-D.2) rather than as primary classification boundaries.
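A minimal sketch of the KDE-antimode idea on synthetic accountant-level means (the bandwidth, grid, and sample are illustrative choices, not the paper's settings): the estimator places the threshold at the interior density minimum between the two cluster modes.

```python
import math

def kde(xs, h):
    # Gaussian kernel density estimate with fixed bandwidth h.
    c = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    return lambda t: c * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs)

def kde_antimode(xs, lo, hi, h, steps=400):
    # Grid-search the interior density minimum (antimode) between the modes;
    # the antimode is the natural cluster-separating threshold.
    f = kde(xs, h)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=f)

# Synthetic bimodal accountant means: a hand-signed-like cluster spread
# around 0.90 and a replication-like cluster at 0.99.
sample = [0.90 + 0.005 * ((i % 5) - 2) for i in range(60)] + [0.99] * 60
threshold = kde_antimode(sample, 0.90, 0.99, h=0.005)
```

On this synthetic sample the antimode lands between the clusters (near 0.95); on the paper's data the same construction is what yields the $\approx 0.975$ accountant-level cosine threshold.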
@@ -185,23 +187,26 @@ We report three validation analyses corresponding to the anchors of Section III-
Of the 182,328 extracted signatures, 310 have a same-CPA nearest match that is byte-identical after crop and normalization (pixel-identical-to-closest = 1); these form the gold-positive anchor.
As the gold-negative anchor we sample 50,000 random cross-CPA signature pairs (inter-CPA cosine: mean $= 0.762$, $P_{95} = 0.884$, $P_{99} = 0.913$, max $= 0.988$).
Because the positive and negative anchor populations are constructed from different sampling units (byte-identical same-CPA pairs vs random inter-CPA pairs), their relative prevalence in the combined anchor set is arbitrary, and precision / $F_1$ / recall therefore have no meaningful population interpretation.
-We accordingly report FAR with Wilson 95% confidence intervals against the large inter-CPA negative anchor and FRR against the byte-identical positive anchor in Table X; these two error rates are well defined within their respective anchor populations.
+We accordingly report FAR with Wilson 95% confidence intervals against the large inter-CPA negative anchor in Table X.
-The Equal-Error-Rate point, interpolated at FAR $=$ FRR, is located at cosine $= 0.990$ with EER $\approx 0$, which is trivially small because every byte-identical positive falls at cosine very close to 1.
+The primary quantity reported by Table X is FAR: the probability that a random pair of signatures from *different* CPAs exceeds the candidate threshold.
+We do not report an Equal Error Rate: EER is meaningful only when the positive and negative error-rate curves cross in a nontrivial interior region, but byte-identical positives all sit at cosine $\approx 1$ by construction, so FRR against that subset is trivially $0$ at every threshold below $1$; an EER calculation against this anchor would be an arithmetic tautology rather than a measure of biometric performance, and we therefore omit it.
-<!-- TABLE X: Cosine Threshold Sweep (positives = 310 byte-identical signatures; negatives = 50,000 inter-CPA pairs)
+<!-- TABLE X: Cosine Threshold Sweep — FAR Against 50,000 Inter-CPA Negative Pairs
-| Threshold | FAR | FAR 95% Wilson CI | FRR (byte-identical) |
+| Threshold | FAR | FAR 95% Wilson CI |
-|-----------|-----|-------------------|----------------------|
+|-----------|-----|-------------------|
-| 0.837 (all-pairs KDE crossover) | 0.2062 | [0.2027, 0.2098] | 0.000 |
+| 0.837 (all-pairs KDE crossover) | 0.2062 | [0.2027, 0.2098] |
-| 0.900 | 0.0233 | [0.0221, 0.0247] | 0.000 |
+| 0.900 | 0.0233 | [0.0221, 0.0247] |
-| 0.945 (2D GMM marginal) | 0.0008 | [0.0006, 0.0011] | 0.000 |
+| 0.945 (2D GMM marginal) | 0.0008 | [0.0006, 0.0011] |
-| 0.950 | 0.0007 | [0.0005, 0.0009] | 0.000 |
+| 0.950 | 0.0007 | [0.0005, 0.0009] |
-| 0.973 (accountant KDE antimode) | 0.0003 | [0.0002, 0.0004] | 0.000 |
+| 0.973 (accountant KDE antimode) | 0.0003 | [0.0002, 0.0004] |
-| 0.979 (accountant Beta-2) | 0.0002 | [0.0001, 0.0004] | 0.000 |
+| 0.979 (accountant Beta-2) | 0.0002 | [0.0001, 0.0004] |
+Table note: We do not include FRR against the byte-identical positive anchor as a column here: the byte-identical subset has cosine $\approx 1$ by construction, so FRR against that subset is trivially $0$ at every threshold below $1$ and carries no biometric information beyond verifying that the threshold does not exceed $1$. The conservative-subset FRR role of the byte-identical anchor is instead discussed qualitatively in Section V-F.
-->
Two caveats apply.
-First, the gold-positive anchor is a *conservative subset* of the true non-hand-signed population: it captures only those non-hand-signed signatures whose nearest match happens to be byte-identical, not those that are near-identical but not bytewise identical.
+First, the byte-identical positive anchor referenced above is a *conservative subset* of the true non-hand-signed population: it captures only those non-hand-signed signatures whose nearest match happens to be byte-identical, not those that are near-identical but not bytewise identical.
-Zero FRR against this subset does not establish zero FRR against the broader positive class, and the reported FRR should therefore be interpreted as a lower-bound calibration check on the classifier's ability to catch the clearest positives rather than a generalizable miss rate.
+A would-be FRR computed against this subset is definitionally $0$ at every threshold below $1$ (since byte-identical pairs have cosine $\approx 1$), so such an FRR is a mathematical boundary check rather than an empirical miss-rate estimate; we discuss the generalization limits of this conservative-subset framing in Section V-F.
Second, the 0.945 / 0.95 / 0.973 thresholds are derived from the Firm A calibration fold or the accountant-level methods rather than from this anchor set, so the FAR values in Table X are post-hoc-fit-free evaluations of thresholds that were not chosen to optimize Table X.
The very low FAR at the accountant-level thresholds is therefore informative about specificity against a realistic inter-CPA negative population.
@@ -371,6 +376,8 @@ We note that this test uses the calibrated classifier of Section III-L rather th
Table XVII presents the final classification results under the dual-descriptor framework with Firm A-calibrated thresholds for 84,386 documents.
The document count (84,386) differs from the 85,042 documents with any YOLO detection (Table III) because 656 documents carry only a single detected signature, for which no same-CPA pairwise comparison and therefore no best-match cosine / min dHash statistic is available; those documents are excluded from the classification reported here.
+We emphasize that the document-level proportions below reflect the *worst-case aggregation rule* of Section III-L: a report carrying one stamped signature and one hand-signed signature is labeled with the most-replication-consistent of the two signature-level verdicts.
+Document-level rates therefore bound the share of reports in which *at least one* signature is non-hand-signed rather than the share in which *both* are; the intra-report agreement analysis of Section IV-H.3 (Table XVI) reports how frequently the two co-signers share the same signature-level label within each firm, so that readers can judge what fraction of the non-hand-signed document-level share corresponds to fully non-hand-signed reports versus mixed reports.
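The worst-case aggregation rule fits in a few lines. The verdict labels below are hypothetical stand-ins for the Section III-L categories, ordered from most to least replication-consistent:

```python
# Hypothetical verdict labels, ordered most -> least replication-consistent.
ORDER = ["replication", "borderline", "hand_signed"]

def document_verdict(signature_verdicts):
    # Worst-case rule: a document inherits the most replication-consistent
    # verdict among its detected signatures, so a mixed report counts toward
    # the "at least one non-hand-signed signature" bound.
    return min(signature_verdicts, key=ORDER.index)

# A report with one hand-signed and one replicated signature is labeled
# by the replicated one under this rule.
mixed = document_verdict(["hand_signed", "replication"])
```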
<!-- TABLE XVII: Document-Level Classification (Dual-Descriptor: Cosine + dHash)
| Verdict | N (PDFs) | % | Firm A | Firm A % |