Files
pdf_signature_extraction/paper/paper_a_discussion_v3.md
T
gbanyan 12637cd413 Phase 6 manuscript splice (2/2): §IV / §V / §VI spliced
Lands v4.0 §IV / §V / §VI content into v3.20.0 master sub-files.
Strips internal close-out checklists, draft notes, and open-questions
blocks at splice. Completes the Phase 6 manuscript-master file
assembly.

§IV Results (paper_a_results_v3.md):
- §IV-A..C: kept v3.20.0 inherited content (experimental setup,
  detection performance, all-pairs distribution); added v4 scope
  note (Big-4 primary) at the §IV header
- §IV-D..K: replaced v3.20.0 §IV-D..H with v4.0 §IV-D..K (Big-4
  distributional / mixture / convergence / LOOO / pixel-identity /
  inter-CPA reference / five-way classification / full-dataset
  robustness)
- §IV-L: renumbered v3.20.0 §IV-I (backbone ablation) content to
  match v4's "§IV-L inherited from v3.20.0 §IV-I" reframing
- §IV-M: appended v4.0 ICCR calibration tables (XX-XXVI):
  composition decomposition, per-comparison/per-signature/
  per-document ICCRs, firm heterogeneity + cross-firm hit matrix,
  alert-rate sensitivity
- §III-K ablation cross-ref updated to §IV-L (was §IV-I)
- Phase 3 close-out checklist (lines 365+) stripped

§V Discussion (paper_a_discussion_v3.md):
- Replaced v3.20.0 §V with v4.0 §V (8 sub-sections A-H):
  A. Distinct problem framing
  B. Continuous quality spectrum + composition-driven multimodality
  C. Firm A as templated end (case study, not anchor)
  D. K=2 / K=3 descriptive partitions
  E. Three-score convergent internal-consistency
  F. Anchor-based multi-level calibration
  G. Pixel-identity hard positive anchor + ICCR reframing
  H. Limitations (14 items: 9 v4-specific + 5 inherited from v3.x)

§VI Conclusion (paper_a_conclusion_v3.md):
- Replaced v3.20.0 §VI with v4.0 §VI (8 contribution items mirroring
  §I contributions; 4-direction future work).

Known splice-time issue (deferred to typesetting): §IV table numbering
is sequential by label (V, VI, ..., XXVI) but Table XIX (document-level
worst-case) appears physically before Tables XVI/XVII/XVIII in §IV-J
narrative flow. IEEE Access typesetters typically normalize table order
during typesetting; we accept the in-file ordering quirk to preserve
the §IV-J narrative arc (per-signature -> document-level worst-case ->
K=3 cross-tab). Renumbering to strictly-ascending physical order would
require renaming Tables XVI/XVII/XVIII -> XVII/XVIII/XIX with
downstream cross-reference updates; deferred unless partner Jimmy
review or IEEE Access submission portal flags it.

Manuscript splice complete. Working drafts in paper/v4/ retained as
archive of the round-by-round Phase 5 fix history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:43:41 +08:00

18 KiB
Raw Blame History

V. Discussion

A. Non-Hand-Signing Detection as a Distinct Problem

Non-hand-signing differs from forgery in that the questioned signature is produced by its legitimate signer's own stored image rather than by an impostor. The detection problem is therefore framed around intra-signer image reproduction rather than inter-signer imitation. This framing has analytical consequences. The within-CPA signature distribution is the analytical population of interest; the cross-CPA inter-class distribution is a reference against which intra-CPA similarity is interpreted, not the population to be modelled. This contrasts with most prior offline signature verification work, which treats genuine-versus-forged as the central two-class problem.

B. Per-Signature Similarity is a Continuous Quality Spectrum; the Accountant-Level Multimodality is Composition-Driven

A central empirical finding of v3.x was that per-signature similarity does not admit a clean two-mechanism mixture: dip-test fails to reject unimodality at the signature level for Firm A, BIC prefers a 3-component fit, and BD/McCrary candidate transitions lie inside the high-similarity mode rather than between modes. v4.0 strengthens and extends this signature-level reading.

The Big-4 accountant-level descriptor distribution does reject unimodality on both marginals at p < 5 \times 10^{-4} (Script 34). v4.0's composition decomposition (§III-I.4; Scripts 39b39e) shows that this rejection is fully attributable to two non-mechanistic sources: (a) between-firm location-shift effects on both axes — Firm A's mean dHash of 2.73 versus Firms B/C/D's 6.46, 7.39, 7.21 creates a multi-peaked pooled distribution that any single firm's distribution lacks — and (b) integer mass-point artefacts on the integer-valued dHash axis, which inflate the dip statistic against a continuous-density null. A 2×2 factorial diagnostic applied to the Big-4 pooled dHash (firm-mean centring × uniform integer jitter [-0.5, +0.5], 5 jitter seeds) shows that the dip test fails to reject (p_{\text{median}} = 0.35, 0/5 seeds reject) when both corrections are applied; either correction alone leaves the rejection in place. Within-firm signature-level cosine and jittered-dHash dip tests fail to reject in every individual Big-4 firm and in every individual non-Big-4 firm with \geq 500 signatures tested (cosine: Scripts 39b/39c; jittered-dHash: Script 39d for Big-4 plus codex-verified read-only spike for the ten non-Big-4 firms; see §III-I.4). The descriptor distributions therefore lack a within-population bimodal antimode that could anchor an operational threshold. The K=2 / K=3 mixture fits are retained in §III-J as descriptive partitions of the joint Big-4 distribution that reflect firm-compositional structure, not as inferential evidence for two or three latent mechanism modes.

C. Firm A as the Templated End of Big-4 (Case Study, Not Calibration Anchor)

Firm A is empirically the firm whose CPAs are most concentrated in the high-cosine, low-dHash corner of the Big-4 descriptor plane. In the Big-4 K=3 hard-posterior assignment (now interpreted as a firm-compositional position assignment; §III-J), Firm A accounts for 0\% of C1 (low-cos / high-dHash position) and 82.5\% of C3 (high-cos / low-dHash position); the opposite pattern holds at Firm C, which has the highest C1 concentration at 23.5\%. Firm A also accounts for 145 of the 262 byte-identical signatures in the Big-4 byte-identical anchor of §IV-H (with Firm B 8, Firm C 107, Firm D 2). The additional v3.x finding that the 145 Firm A pixel-identical signatures span 50 distinct Firm A partners (of 180 registered), with 35 byte-identical matches across different fiscal years, is inherited from v3.20.0 §IV-F.1 / Script 28 / Appendix B byte-decomposition output and was not regenerated in v4.0's spike scripts; we retain those numbers by reference.

In v4.0 we treat Firm A as a templated-end case study rather than as the calibration anchor for the operational threshold. Firm A enters the Big-4 anchor-based ICCR calibration on equal footing with the other three Big-4 firms (§III-L). The cross-firm hit matrix of §III-L.4 strengthens this framing: under the deployed any-pair rule, within-firm collision concentration is 98.8\% at Firm A and $76.7$83.7\% at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$99.96\% within-firm across all four firms). Firm A's high per-document HC$+$MC alarm rate of 0.62 (versus Firms B/C/D's $0.09$0.16) reflects high inter-CPA collision concentration under the deployed rule on real same-CPA pools, consistent with firm-specific template, stamp, or document-production reuse — though the inter-CPA-anchor analysis alone is not diagnostic of deliberate template sharing. The byte-level evidence of v3.x §IV-F.1 (Firm A's 145 pixel-identical signatures across \sim 50 distinct partners) provides direct evidence that firm-level template reuse does occur at Firm A; the within-firm collision pattern at all four Big-4 firms is consistent with that mechanism extending in milder form to Firms B/C/D.

D. K=2 / K=3 as Descriptive Firm-Compositional Partitions

Leave-one-firm-out cross-validation of the Big-4 mixture fit reveals a sharp contrast between K=2 and K=3 behaviour. K=2 is unstable: across-fold cosine-crossing deviation is 0.028, and holding Firm A out gives a fold rule (cos > 0.938, dHash \leq 8.79) that classifies 100\% of held-out Firm A in the upper component, while holding any non-Firm-A Big-4 firm out gives a fold rule near (cos > 0.975, dHash \leq 3.76) that classifies 0\% of the held-out firm in the upper component. The K=2 boundary is essentially a Firm-A-vs-others separator — direct evidence that the K=2 partition reflects firm-compositional rather than mechanistic structure.

K=3 in contrast has a reproducible component shape at the descriptor-position level: across the four folds the C1 (low-cos / high-dHash) component cosine mean varies by at most 0.005, the dHash mean by at most 0.96, and the weight by at most 0.023. Hard-posterior membership for the held-out firm is composition-sensitive (absolute differences $1.8$12.8 pp across folds). Together with the §III-I.4 composition decomposition (no within-population bimodal antimode), the K=3 stability supports a descriptive reading: the Big-4 descriptor plane has a reproducible three-region partition that reflects how firm-compositional weight is distributed across the descriptor space, not a three-mechanism latent-class structure. We accordingly do not use K=3 hard-posterior membership as an operational classifier; we use it as the accountant-level descriptive summary that complements the deployed signature-level five-way classifier of §III-L.

E. Three-Score Convergent Internal-Consistency

Three feature-derived scores agree on the per-CPA descriptor-position ranking at Spearman \rho \geq 0.879: the K=3 mixture posterior (a firm-compositional position score, not a mechanism cluster posterior); the reverse-anchor cosine percentile under a non-Big-4 reference distribution; and the inherited Paper A box-rule less-replication-dominated rate. The three scores are not statistically independent measurements — they are deterministic functions of the same per-CPA descriptor pair — so the convergence is documented as internal consistency rather than external validation against an independent ground truth (which the corpus does not provide for the hand-signed class). The strength of the convergence (all pairwise |\rho| > 0.87) and its persistence at the signature level (Cohen \kappa = 0.87 between per-CPA-fit and per-signature-fit K=3 binary labels) are nevertheless informative: per-CPA aggregation does not collapse the broad three-region ordering, and three different summarisations of the descriptor space produce broadly concordant per-CPA rankings, with a residual non-Firm-A disagreement (the reverse-anchor cosine percentile ranks Firm D fractionally above Firm C, while the mixture posterior and the box-rule rate rank Firm C highest among non-Firm-A firms).

F. Anchor-Based Multi-Level Calibration

The operational specificity of the deployed five-way classifier is characterised at three units of analysis (§III-L), all against the same inter-CPA negative-anchor coincidence-rate proxy. The per-comparison ICCR replicates v3.x's per-comparison rate (cos$>0.95 \to 0.00060$) and extends it to the structural dimension (dHash$\leq 5 \to 0.00129$; joint \to 0.00014). The pool-normalised per-signature ICCR captures the deployed rule's effective per-signature rate under inter-CPA candidate-pool replacement (0.1102 pooled Big-4 any-pair HC), exposing that the per-comparison rate is not the deployed-rule rate at the per-signature classifier level: the deployed classifier takes max-cosine and min-dHash over a same-CPA pool of size n_{\text{pool}}, so the inter-CPA-equivalent rate scales approximately as 1 - (1 - p_{\text{pair}})^{n_{\text{pool}}} in the independence limit. The per-document ICCR aggregates to operational alarm-rate units: HC alone 0.18, the operational HC$+$MC alarm 0.34.

Two additional findings refine the calibration story. First, the per-pair conditional ICCR for dHash$\leq 5$ given cos$>0.95$ is 0.234 (Wilson 95% [0.190, 0.285]): given the cosine gate, the structural dimension provides further per-comparison specificity at \sim 4.3\times refinement. Second, the alert-rate sensitivity analysis (§III-L.5; Script 46) shows the inherited HC threshold is locally sensitive rather than plateau-stable (local gradient \approx 25\times the median for cosine, \approx 3.8\times for dHash); stakeholders requiring different specificity-alert-yield operating points can derive thresholds by inverting the ICCR curves (a tighter rule cos$>0.95$ AND dHash$\leq 3$ on the same-pair joint gives per-signature ICCR \approx 0.045). The MC/HSC sub-band boundary at dHash$=15$, by contrast, is plateau-like (local-to-median ratio \approx 0.08), consistent with high-dHash-tail saturation.

G. Pixel-Identity as a Hard Positive Anchor; Inherited Inter-CPA Negative Anchor Reframed as Coincidence Rate

The only hard ground-truth subset in the corpus is pixel-identical signatures: those whose nearest same-CPA match is byte-identical after crop and normalisation. Independent hand-signing cannot produce byte-identical images, so these signatures are conservative-subset ground truth for the replicated class. On the Big-4 subset (n = 262 pixel-identical signatures), all three candidate checks — the inherited box rule, the K=3 hard label, and the reverse-anchor metric with a prevalence-calibrated cut — achieve 0\% positive-anchor miss rate (Wilson 95% upper bound 1.45\%). We caution that this result is necessary but not sufficient: for the box rule it is close to tautological, because byte-identical neighbours have cosine \approx 1 and dHash \approx 0, well inside the rule's high-confidence region. The corresponding signature-level negative anchor evidence is developed in §III-L.1 above (v4 spike: cos$>0.95$ per-comparison ICCR = 0.00060, replicating v3.20.0's reported 0.0005 under prior "FAR" terminology). We frame the per-comparison rate as a specificity proxy under the assumption that inter-CPA pairs constitute a clean negative anchor, and we document in §III-L.4 that this assumption is partially violated by within-firm cross-CPA template-like collision structures.

H. Limitations

Several limitations should be transparent. The first nine are v4.0-specific; the last five are inherited from v3.20.0 §V-G and still apply to the v4.0 pipeline.

No signature-level ground truth; no true error rates reportable. The corpus does not contain labelled hand-signed or replicated classes at the signature level. We therefore cannot report False Rejection Rate, sensitivity, recall, Equal Error Rate, ROC-AUC, precision, or positive predictive value against ground truth. All quantitative rates reported in §III-L are inter-CPA negative-anchor coincidence rates (ICCRs) under the assumption that inter-CPA pairs constitute a clean negative anchor; this is a specificity proxy, not a calibrated specificity (§III-M).

Inter-CPA negative-anchor assumption is partially violated and the violation is firm-dependent. The cross-firm hit matrix of §III-L.4 shows that under the deployed any-pair rule, within-firm collision concentration is 98.8\% at Firm A and $76.7$83.7\% at Firms B/C/D (the stricter same-pair joint event saturates at $97.0$99.96\% within-firm across all four firms), consistent with firm-specific template, stamp, or document-production reuse. The inter-CPA-as-negative assumption is therefore not exactly satisfied — some inter-CPA pairs may share firm-level templates rather than being independent random matches. Our reported per-comparison ICCRs are best read as specificity-proxy rates under a partially-violated assumption, not as calibrated FARs. Because the violation is firm-dependent, Firm A's per-firm ICCR is more contaminated by within-firm sharing than Firms B/C/D's; the per-firm B/C/D rates of $0.09$0.16 are therefore closer to a clean specificity estimate than the pooled rate, and the Firm A vs Firms B/C/D contrast reflects both genuine firm heterogeneity and a firm-dependent proxy-contamination gradient.

Scope. The v4.0 primary analyses are scoped to the Big-4 sub-corpus. We did not perform the full per-signature pool-normalised ICCR analysis at the full n = 686 scope; the §IV-K full-dataset Spearman re-run shows the K=3 + box-rule rank-convergence is preserved at n = 686 but does not validate the Big-4 operational ICCRs, the LOOO firm-fold structure, or the five-way operational classifier at the broader scope.

Pixel-identity is a conservative subset. Byte-identical pairs are the easiest replicated cases, and for the inherited box rule the positive-anchor miss rate against byte-identical pairs is close to tautological (byte-identical \Rightarrow cosine \approx 1, dHash \approx 0, well inside the high-confidence box). A score that fails the pixel-identity check would be disqualified, but passing the check does not guarantee correct behaviour on the broader replicated population (e.g., re-stamped or noisy-template-variant signatures).

Inherited rule components are not separately v4-validated. The five-way classifier's moderate-confidence band (cos > 0.95 AND 5 < \text{dHash} \leq 15), the style-consistency band (\text{dHash} > 15), and the document-level worst-case aggregation rule retain their v3.20.0 calibration and capture-rate evidence; v4.0's anchor-based ICCR calibration covers the binary high-confidence sub-rule (and its tightening alternatives such as dHash$\leq 3$), and the alert-rate sensitivity analysis (§III-L.5) characterises only the HC threshold. The MC and HSC sub-band boundaries are not separately re-validated by v4.0's diagnostic battery.

Deployed-rate excess is not a presumed true-positive rate. The $\sim 44$-pp per-document gap between the observed deployed alert rate (HC: 0.62 on real same-CPA pools) and the inter-CPA proxy rate (HC: 0.18) cannot be interpreted as a presumed true-positive rate without additional assumptions that §III-M shows are unsafe (consistent within-CPA signing can exceed inter-CPA similarity at the cosine axis; within-firm template sharing inflates the inter-CPA proxy baseline). The gap is best read as a same-CPA repeatability signal.

A1 pair-detectability stipulation. The per-signature detector requires at least one same-CPA pair to be near-identical when a CPA uses image replication. A1 is plausible for high-volume stamping or firm-level electronic signing but not guaranteed when a corpus contains only one observed replicated report for a CPA, multiple template variants used in parallel, or scan-stage noise that pushes a replicated pair outside the detection regime.

K=3 hard-posterior membership is composition-sensitive. The K=3 hard-posterior membership for any single firm varies by up to 12.8 pp across LOOO folds. This is documented as a composition-sensitivity band rather than failure, but it means K=3 hard labels are not used as v4.0 operational classifier output; they are reported only as accountant-level descriptive characterisation.

No partner-level mechanism attribution. v4.0 reports population-level patterns; it does not perform partner-level mechanism attribution or report-level claims of intent. The signature-level outputs are signature-level quantities throughout. The within-firm cross-CPA collision concentration of §III-L.4 is consistent with template-like reuse but is not by itself diagnostic of deliberate sharing.

Transferred ImageNet features (inherited from v3.20.0). The ResNet-50 feature extractor uses pre-trained ImageNet weights without signature-domain fine-tuning. While our backbone-ablation study (§IV-L, inherited from v3.20.0 §IV-I) and prior literature support the effectiveness of transferred ImageNet features for signature comparison, a signature-domain fine-tuned feature extractor could improve discriminative performance.

Red-stamp HSV preprocessing artifacts (inherited from v3.20.0). The red stamp removal preprocessing uses simple HSV color-space filtering, which may introduce artifacts where handwritten strokes overlap with red seal impressions. Blended pixels are replaced with white, potentially creating small gaps in signature strokes that could reduce dHash similarity. This bias would push classifications toward false negatives rather than false positives.

Longitudinal scan / PDF / compression confounds (inherited from v3.20.0). Scanning equipment, PDF generation software, and compression algorithms may have changed over the 20132023 study period, potentially affecting similarity measurements. While cosine similarity and dHash are designed to be robust to such variations, longitudinal confounds cannot be entirely excluded.

Source-exemplar misattribution in max/min pair logic (inherited from v3.20.0). The max-cosine / min-dHash detection logic treats both ends of a near-identical same-CPA pair as non-hand-signed. In the rare case where one of the two documents contains a genuinely hand-signed exemplar that was subsequently reused as a stamping or e-signature template, the pair correctly identifies image reuse but misattributes non-hand-signed status to the source exemplar. This affects at most one source document per template variant per CPA and is not expected to be common.

Legal and regulatory interpretation (inherited from v3.20.0). Whether non-hand-signing of a CPA's own stored signature constitutes a violation of signing requirements is a jurisdiction-specific legal question. Our technical analysis can inform such determinations but cannot resolve them.