Paper A v3.18.2: address codex GPT-5.5 round-16 Minor-Revision findings

Codex independent peer review (paper/codex_review_gpt55_v3_18_1.md) audited
empirical claims against scripts/JSON reports rather than rubber-stamping
prior Accept verdicts. Verdict: Minor Revision. This commit addresses every
flagged item.

- Soften mechanism-identification language (Results IV-D.1, Discussion B):
  per-signature cosine "fails to reject unimodality" rather than "reflects a
  single dominant generative mechanism"; framing tied to joint evidence.
- Replace overly absolute "single stored image" with multi-template phrasing
  in Introduction and Methodology III-A.
- Reframe Methodology III-H so practitioner knowledge is non-load-bearing;
  evidentiary basis is the paper's own image evidence.
- Fix stale section cross-references after the v3.18 retitling: IV-F.* ->
  IV-G.* in 11 locations across methodology and results.
- Fix 0.941 / 0.945 / 0.9407 wording in Methodology III-K to use the
  calibration-fold P5 = 0.9407 and the rounded sensitivity cut 0.945.
- Soften "sharp discontinuity" in Results IV-G.3 to "23-28 percentage-point
  gap consistent with firm-wide non-hand-signing practice".
- Soften Conclusion's "directly generalizable" with explicit conditions on
  analogous anchors and artifact-generation physics.
- Add Appendix B: table-to-script provenance map (15 manuscript tables
  mapped to generating scripts and JSON report artifacts).
- New script signature_analysis/28_byte_identity_decomposition.py produces
  reproducible artifacts for two previously unverified claims:
  (a) 145 / 50 / 180 / 35 Firm A byte-identity decomposition (verified);
  (b) cross-firm dual-descriptor convergence -- corrected from the previous
      manuscript text "non-Firm-A 11.3% vs Firm A 58.7% (5x)" to the
      database-verified "non-Firm-A 42.12% vs Firm A 88.32% (~2.1x)".
- Clarify scripts 19 / 21 docstrings: legacy EER / FRR / Precision / F1
  helpers are retained for diagnostic use only and are NOT cited as
  biometric performance in the paper. Remove "interview evidence" wording.
- Rebuild Paper_A_IEEE_Access_Draft_v3.docx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:23:08 +08:00
parent cb77f481ec
commit 4bb7aa9189
9 changed files with 299 additions and 53 deletions
@@ -8,39 +8,40 @@ occurring reference populations instead of manual labels:
Positive anchor 1: pixel_identical_to_closest = 1
Two signature images byte-identical after crop/resize.
Mathematically impossible to arise from independent hand-signing
=> absolute ground truth for replication.
=> pair-level proof of image reuse and a CONSERVATIVE-SUBSET
ground truth for non-hand-signing (only those whose nearest
same-CPA match happens to be byte-identical).
Positive anchor 2: Firm A (Deloitte) signatures
Interview evidence from multiple Firm A accountants confirms that
MOST use replication (stamping / firm-level e-signing) but a
MINORITY may still hand-sign. Firm A is therefore a
"replication-dominated" population (not a pure one). We use it as
a strong prior positive for the majority regime, while noting that
~7% of Firm A signatures fall below cosine 0.95 consistent with
the minority hand-signers. This matches the long left tail
observed in the dip test (Script 15) and the Firm A members who
land in C2 (middle band) of the accountant-level GMM (Script 18).
Positive anchor 2: Firm A signatures
Treated in the manuscript as a REPLICATION-DOMINATED population
based on the paper's own image evidence: the byte-level pair
analysis, the Firm A per-signature similarity distribution, the
partner-ranking concentration, and the intra-report consistency
gap. Approximately 7% of Firm A signatures fall below cosine
0.95, forming the long left tail observed in the dip test
(Script 15).
Negative anchor: signatures with cosine <= low threshold
Pairs with very low cosine similarity cannot plausibly be pixel
duplicates, so they serve as absolute negatives.
duplicates, so they serve as a conservative supplementary
negative reference.
Metrics reported:
- FAR/FRR/EER using the pixel-identity anchor as the gold positive
and low-similarity pairs as the gold negative.
- Precision/Recall/F1 at cosine and dHash thresholds from Scripts
15/16/17/18.
Metrics computed (legacy; NOT all reported in the manuscript):
- FAR against the inter-CPA negative anchor is the primary metric
reported (Table X). The byte-identical positive anchor has cosine
~= 1 by construction, so FRR / EER / Precision / F1 against that
subset are arithmetic tautologies (FRR is trivially 0 below
threshold 1) and are intentionally OMITTED from Table X. Legacy
EER/FRR/precision/F1 helper functions remain in this script for
diagnostic use only and their outputs are NOT cited as biometric
performance in the paper.
- Convergence with Firm A anchor (what fraction of Firm A signatures
are correctly classified at each threshold).
Small visual sanity sample (30 pairs) is exported for spot-check, but
metrics are derived entirely from pixel and Firm A evidence.
Output:
reports/pixel_validation/pixel_validation_report.md
reports/pixel_validation/pixel_validation_results.json
reports/pixel_validation/roc_cosine.png, roc_dhash.png
reports/pixel_validation/sanity_sample.csv
"""
import sqlite3
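The docstring's point that FRR against the byte-identical anchor is an arithmetic tautology can be illustrated with a minimal sketch. The helper name `far_frr` and the toy score lists below are hypothetical and are not taken from Script 19; the sketch only assumes the decision rule "flag a signature when cosine > threshold".

```python
def far_frr(pos_scores, neg_scores, threshold):
    """Return (FAR, FRR) for the rule 'flag when score > threshold'."""
    false_accepts = sum(1 for s in neg_scores if s > threshold)
    false_rejects = sum(1 for s in pos_scores if s <= threshold)
    return false_accepts / len(neg_scores), false_rejects / len(pos_scores)


# Hypothetical toy data: byte-identical positives have cosine 1.0 by
# construction; the negatives stand in for inter-CPA pair similarities.
byte_identical_pos = [1.0] * 20
inter_cpa_neg = [0.41, 0.55, 0.62, 0.70, 0.78, 0.83, 0.88, 0.91, 0.96, 0.97]

for t in (0.80, 0.90, 0.95):
    far, frr = far_frr(byte_identical_pos, inter_cpa_neg, t)
    print(f"threshold={t:.2f}  FAR={far:.2f}  FRR={frr:.2f}")
```

In this toy setting FRR stays at zero for every threshold below 1, so only FAR carries information, which is consistent with the decision to report FAR alone.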
+24 -11
@@ -2,26 +2,39 @@
"""
Script 21: Expanded Validation with Larger Negative Anchor + Held-out Firm A
============================================================================
Addresses codex review weaknesses of Script 19's pixel-identity validation:
Addresses three weaknesses of Script 19's pixel-identity validation:
(a) Negative anchor of n=35 (cosine<0.70) is too small to give
meaningful FAR confidence intervals.
(b) Pixel-identical positive anchor is an easy subset, not
representative of the broader positive class.
(c) Firm A is both the calibration anchor and the validation anchor
(circular).
(b) Pixel-identical positive anchor is a CONSERVATIVE SUBSET of the
true non-hand-signed class, not representative of the broader
positive class. Recall against this subset is therefore a
lower-bound calibration check, not a generalizable recall
estimate.
(c) Firm A is both the calibration anchor and a validation anchor
(circular). The 70/30 fold split makes within-Firm-A sampling
variance visible without claiming external validation.
This script:
1. Constructs a large inter-CPA negative anchor (~50,000 pairs) by
randomly sampling pairs from different CPAs. Inter-CPA high
similarity is highly unlikely to arise from legitimate signing.
2. Splits Firm A CPAs 70/30 into CALIBRATION and HELDOUT folds.
Re-derives signature-level / accountant-level thresholds from the
calibration fold only, then reports all metrics (including Firm A
anchor rates) on the heldout fold.
3. Computes proper EER (FAR = FRR interpolated) in addition to
metrics at canonical thresholds.
4. Computes 95% Wilson confidence intervals for each FAR/FRR.
Re-derives signature-level thresholds from the calibration fold
only, then reports capture rates on the heldout fold.
3. Computes 95% Wilson confidence intervals for FAR at canonical
thresholds (Table X in the manuscript).
Legacy / diagnostic-only metrics:
Helper functions for EER, Precision, Recall, F1, and FRR remain in
this script for backward compatibility. The manuscript intentionally
OMITS these metrics from Table X because the byte-identical positive
anchor has cosine ~= 1 by construction (so FRR / EER are arithmetic
tautologies) and because positive and negative anchors are
constructed from different sampling units, making prevalence
arbitrary (so Precision and F1 have no meaningful population
interpretation). Only FAR against the large inter-CPA negative
anchor is reported as a biometric metric in the paper.
Output:
reports/expanded_validation/expanded_validation_report.md
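The 95% Wilson interval mentioned in step 3 of the Script 21 docstring can be sketched as follows. This is a standard textbook formulation, not the script's own code (which is not shown in this diff); the function name `wilson_interval` and the example counts are assumptions for illustration.

```python
import math


def wilson_interval(k, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion k/n.

    z defaults to the two-sided 95% normal quantile.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: 25 false accepts among 50,000 inter-CPA pairs.
low, high = wilson_interval(25, 50_000)
print(f"FAR = {25 / 50_000:.5f}, 95% Wilson CI = [{low:.5f}, {high:.5f}]")
```

Unlike the normal-approximation interval, the Wilson interval remains well behaved when the observed count is zero or very small, which is the typical situation for FAR against a large negative anchor.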
@@ -0,0 +1,204 @@
#!/usr/bin/env python3
"""
Script 28: Byte-Identity Decomposition + Cross-Firm Dual-Descriptor Convergence
================================================================================
Produces two reproducible artifacts cited in the manuscript that previously
lacked dedicated provenance (codex review v3.18.1 items #7 and #8):
(#7) Byte-identical Firm A signature decomposition:
- Total Firm A signatures with pixel_identical_to_closest = 1
- Number of distinct Firm A partners they span
- Number of partners in the registry (denominator)
- Number of byte-identical pairs that span DIFFERENT fiscal years
(#8) Cross-firm dual-descriptor convergence:
- Among signatures with cosine > 0.95 (per-signature best-match),
the fraction with min_dhash_independent <= 5, broken out by
Firm A vs Non-Firm-A.
Output:
/Volumes/NV2/PDF-Processing/signature-analysis/reports/byte_identity_decomp/
byte_identity_decomposition.json
byte_identity_decomposition.md
These figures are intended to be cited from the paper (Section IV-F.1 for #7;
Section IV-H.2 for #8) so that every quantitative claim in the manuscript
traces to a specific JSON field.
"""
import json
import sqlite3
from datetime import datetime
from pathlib import Path
DB = '/Volumes/NV2/PDF-Processing/signature-analysis/signature_analysis.db'
OUT = Path('/Volumes/NV2/PDF-Processing/signature-analysis/reports/'
'byte_identity_decomp')
OUT.mkdir(parents=True, exist_ok=True)
FIRM_A = '勤業眾信聯合'
def byte_identity_decomposition(conn):
    """Codex item #7: 145 / 50 / 180 / 35 decomposition."""
    cur = conn.cursor()
    cur.execute("""
        SELECT COUNT(DISTINCT name)
        FROM accountants
        WHERE firm = ?
    """, (FIRM_A,))
    n_registered_partners = cur.fetchone()[0]
    cur.execute("""
        WITH byte_pairs AS (
            SELECT s1.signature_id AS sig_a,
                   s1.assigned_accountant AS partner,
                   s1.year_month AS ym_a,
                   s2.year_month AS ym_b
            FROM signatures s1
            JOIN signatures s2 ON s1.closest_match_file = s2.image_filename
            WHERE s1.pixel_identical_to_closest = 1
              AND s1.excel_firm = ?
        )
        SELECT
            COUNT(*) AS total_pixel_identical_firm_a,
            COUNT(DISTINCT partner) AS partners_with_pixel_identical,
            SUM(CASE WHEN substr(ym_a,1,4) <> substr(ym_b,1,4) THEN 1 ELSE 0 END)
                AS cross_year_pairs
        FROM byte_pairs
    """, (FIRM_A,))
    n_total, n_partners, n_cross_year = cur.fetchone()
    return {
        'definition': (
            'Among Firm A signatures whose nearest same-CPA match is '
            'byte-identical after crop and normalization '
            '(pixel_identical_to_closest = 1), this section reports the '
            'count, the distinct-partner spread, the registry denominator, '
            'and the subset whose byte-identical match is in a different '
            'fiscal year.'
        ),
        'firm_label': 'Firm A',
        'n_pixel_identical_firm_a_signatures': n_total,
        'n_distinct_partners_with_pixel_identical': n_partners,
        'n_registered_partners_in_firm_a': n_registered_partners,
        'partner_coverage_share': round(n_partners / n_registered_partners, 4),
        'n_cross_year_byte_identical_pairs': n_cross_year,
    }

def cross_firm_dual_convergence(conn):
    """Codex item #8: per-signature dual-descriptor convergence by firm."""
    cur = conn.cursor()
    cur.execute("""
        SELECT
            CASE WHEN excel_firm = ? THEN 'Firm A' ELSE 'Non-Firm-A' END
                AS firm_group,
            COUNT(*) AS n_signatures_above_095,
            SUM(CASE WHEN min_dhash_independent <= 5 THEN 1 ELSE 0 END)
                AS n_dhash_le_5
        FROM signatures
        WHERE max_similarity_to_same_accountant > 0.95
          AND assigned_accountant IS NOT NULL
          AND min_dhash_independent IS NOT NULL
        GROUP BY firm_group
        ORDER BY firm_group
    """, (FIRM_A,))
    rows = cur.fetchall()
    by_group = {}
    for firm_group, n_above, n_dhash in rows:
        by_group[firm_group] = {
            'n_signatures_above_cosine_095': n_above,
            'n_dhash_indep_le_5': n_dhash,
            'pct_dhash_indep_le_5': round(100.0 * n_dhash / n_above, 2),
        }
    return {
        'definition': (
            'Per-signature best-match cosine > 0.95 AND assigned_accountant '
            'IS NOT NULL AND min_dhash_independent IS NOT NULL. The reported '
            'percentage is the share of these signatures whose independent '
            'min dHash to any same-CPA signature is <= 5.'
        ),
        'unit_of_observation': 'signature',
        'cosine_threshold': 0.95,
        'dhash_indep_threshold': 5,
        'by_firm_group': by_group,
    }

def write_markdown(payload, path):
    bid = payload['byte_identity_decomposition']
    cf = payload['cross_firm_dual_convergence']
    lines = []
    lines.append('# Byte-Identity Decomposition + Cross-Firm Dual-Descriptor '
                 'Convergence')
    lines.append('')
    lines.append(f"Generated at: {payload['generated_at']}")
    lines.append('')
    lines.append('## 1. Byte-Identity Decomposition (Firm A)')
    lines.append('')
    lines.append(bid['definition'])
    lines.append('')
    lines.append('| Quantity | Value |')
    lines.append('|----------|-------|')
    lines.append(f"| Pixel-identical Firm A signatures | "
                 f"{bid['n_pixel_identical_firm_a_signatures']} |")
    lines.append(f"| Distinct Firm A partners with at least one such pair | "
                 f"{bid['n_distinct_partners_with_pixel_identical']} |")
    lines.append(f"| Registered Firm A partners | "
                 f"{bid['n_registered_partners_in_firm_a']} |")
    lines.append(f"| Partner coverage share | "
                 f"{bid['partner_coverage_share']:.3f} |")
    lines.append(f"| Pairs whose byte-identical match spans different fiscal "
                 f"years | {bid['n_cross_year_byte_identical_pairs']} |")
    lines.append('')
    lines.append('## 2. Cross-Firm Dual-Descriptor Convergence')
    lines.append('')
    lines.append(cf['definition'])
    lines.append('')
    lines.append('| Firm group | N signatures with cosine > 0.95 | '
                 'N with dHash_indep <= 5 | % with dHash_indep <= 5 |')
    lines.append('|------------|--------------------------------:|'
                 '------------------------:|------------------------:|')
    for grp in ('Firm A', 'Non-Firm-A'):
        g = cf['by_firm_group'][grp]
        lines.append(f"| {grp} | "
                     f"{g['n_signatures_above_cosine_095']:,} | "
                     f"{g['n_dhash_indep_le_5']:,} | "
                     f"{g['pct_dhash_indep_le_5']:.2f}% |")
    path.write_text('\n'.join(lines) + '\n', encoding='utf-8')

def main():
    conn = sqlite3.connect(DB)
    try:
        payload = {
            'generated_at': datetime.now().isoformat(timespec='seconds'),
            'database_path': DB,
            'firm_a_label': FIRM_A,
            'byte_identity_decomposition': byte_identity_decomposition(conn),
            'cross_firm_dual_convergence': cross_firm_dual_convergence(conn),
        }
    finally:
        conn.close()
    json_path = OUT / 'byte_identity_decomposition.json'
    json_path.write_text(json.dumps(payload, indent=2, ensure_ascii=False),
                         encoding='utf-8')
    print(f'Wrote {json_path}')
    md_path = OUT / 'byte_identity_decomposition.md'
    write_markdown(payload, md_path)
    print(f'Wrote {md_path}')


if __name__ == '__main__':
    main()
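The filter and grouping used by the cross-firm convergence query can be exercised without the production database. The sketch below runs the same WHERE / GROUP BY logic against an in-memory SQLite table; the column names follow the query in Script 28, while the firm labels, the rows, and the helper name `dual_convergence` are hypothetical toy values.

```python
import sqlite3

TOY_FIRM_A = 'toy-firm-a'  # hypothetical label, not the real firm string

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE signatures (
        excel_firm TEXT,
        assigned_accountant TEXT,
        max_similarity_to_same_accountant REAL,
        min_dhash_independent INTEGER
    )
""")
conn.executemany("INSERT INTO signatures VALUES (?, ?, ?, ?)", [
    (TOY_FIRM_A, 'p1', 0.99, 2),
    (TOY_FIRM_A, 'p1', 0.97, 4),
    (TOY_FIRM_A, 'p2', 0.96, 9),
    (TOY_FIRM_A, 'p2', 0.90, 1),     # excluded: cosine not above 0.95
    ('other',    'q1', 0.98, 3),
    ('other',    'q1', 0.96, 12),
    ('other',    None, 0.99, 1),     # excluded: no assigned accountant
    ('other',    'q2', 0.97, None),  # excluded: no independent dHash
])


def dual_convergence(conn, firm_a):
    """Percent of cosine > 0.95 signatures with dHash <= 5, per firm group."""
    cur = conn.execute("""
        SELECT CASE WHEN excel_firm = ? THEN 'Firm A' ELSE 'Non-Firm-A' END
                   AS firm_group,
               COUNT(*),
               SUM(CASE WHEN min_dhash_independent <= 5 THEN 1 ELSE 0 END)
        FROM signatures
        WHERE max_similarity_to_same_accountant > 0.95
          AND assigned_accountant IS NOT NULL
          AND min_dhash_independent IS NOT NULL
        GROUP BY firm_group
        ORDER BY firm_group
    """, (firm_a,))
    return {group: round(100.0 * k / n, 2) for group, n, k in cur.fetchall()}


result = dual_convergence(conn, TOY_FIRM_A)
conn.close()
print(result)
```

A toy table like this makes the exclusion rules easy to spot-check: rows failing any of the three WHERE conditions drop out of both the numerator and the denominator.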