Paper A v3.18.2: address codex GPT-5.5 round-16 Minor-Revision findings

Codex independent peer review (paper/codex_review_gpt55_v3_18_1.md) audited
empirical claims against scripts/JSON reports rather than rubber-stamping
prior Accept verdicts. Verdict: Minor Revision. This commit addresses every
flagged item.

- Soften mechanism-identification language (Results IV-D.1, Discussion B):
  per-signature cosine "fails to reject unimodality" rather than "reflects a
  single dominant generative mechanism"; framing tied to joint evidence.
- Replace the overly absolute "single stored image" with multi-template
  phrasing in Introduction and Methodology III-A.
- Reframe Methodology III-H so practitioner knowledge is non-load-bearing;
  the evidentiary basis is the paper's own image evidence.
- Fix stale section cross-references after the v3.18 retitling:
  IV-F.* -> IV-G.* in 11 locations across methodology and results.
- Fix 0.941 / 0.945 / 0.9407 wording in Methodology III-K to use the
  calibration-fold P5 = 0.9407 and the rounded sensitivity cut 0.945.
- Soften "sharp discontinuity" in Results IV-G.3 to "23-28 percentage-point
  gap consistent with firm-wide non-hand-signing practice".
- Soften the Conclusion's "directly generalizable" with explicit conditions
  on analogous anchors and artifact-generation physics.
- Add Appendix B: table-to-script provenance map (15 manuscript tables
  mapped to generating scripts and JSON report artifacts).
- New script signature_analysis/28_byte_identity_decomposition.py produces
  reproducible artifacts for two previously unverified claims:
  (a) 145 / 50 / 180 / 35 Firm A byte-identity decomposition (verified);
  (b) cross-firm dual-descriptor convergence -- corrected from the previous
      manuscript text "non-Firm-A 11.3% vs Firm A 58.7% (5x)" to the
      database-verified "non-Firm-A 42.12% vs Firm A 88.32% (~2.1x)".
- Clarify scripts 19 / 21 docstrings: legacy EER / FRR / Precision / F1
  helpers are retained for diagnostic use only and are NOT cited as
  biometric performance in the paper. Remove "interview evidence" wording.
- Rebuild Paper_A_IEEE_Access_Draft_v3.docx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:23:08 +08:00
parent cb77f481ec
commit 4bb7aa9189
9 changed files with 299 additions and 53 deletions
@@ -8,39 +8,40 @@ occurring reference populations instead of manual labels:
  Positive anchor 1:  pixel_identical_to_closest = 1
      Two signature images byte-identical after crop/resize.
      Mathematically impossible to arise from independent hand-signing
-      => absolute ground truth for replication.
+      => pair-level proof of image reuse and a CONSERVATIVE-SUBSET
+      ground truth for non-hand-signing (only those whose nearest
+      same-CPA match happens to be byte-identical).

-  Positive anchor 2:  Firm A (Deloitte) signatures
-      Interview evidence from multiple Firm A accountants confirms that
-      MOST use replication (stamping / firm-level e-signing) but a
-      MINORITY may still hand-sign. Firm A is therefore a
-      "replication-dominated" population (not a pure one). We use it as
-      a strong prior positive for the majority regime, while noting that
-      ~7% of Firm A signatures fall below cosine 0.95 consistent with
-      the minority hand-signers. This matches the long left tail
-      observed in the dip test (Script 15) and the Firm A members who
-      land in C2 (middle band) of the accountant-level GMM (Script 18).
+  Positive anchor 2:  Firm A signatures
+      Treated in the manuscript as a REPLICATION-DOMINATED population
+      based on the paper's own image evidence: the byte-level pair
+      analysis, the Firm A per-signature similarity distribution, the
+      partner-ranking concentration, and the intra-report consistency
+      gap. Approximately 7% of Firm A signatures fall below cosine
+      0.95, forming the long left tail observed in the dip test
+      (Script 15).

  Negative anchor:    signatures with cosine <= low threshold
      Pairs with very low cosine similarity cannot plausibly be pixel
-      duplicates, so they serve as absolute negatives.
+      duplicates, so they serve as a conservative supplementary
+      negative reference.

-Metrics reported:
-  - FAR/FRR/EER using the pixel-identity anchor as the gold positive
-    and low-similarity pairs as the gold negative.
-  - Precision/Recall/F1 at cosine and dHash thresholds from Scripts
-    15/16/17/18.
+Metrics computed (legacy; NOT all reported in the manuscript):
+  - FAR against the inter-CPA negative anchor is the primary metric
+    reported (Table X). The byte-identical positive anchor has cosine
+    ~= 1 by construction, so FRR / EER / Precision / F1 against that
+    subset are arithmetic tautologies (FRR is trivially 0 below
+    threshold 1) and are intentionally OMITTED from Table X. Legacy
+    EER/FRR/precision/F1 helper functions remain in this script for
+    diagnostic use only and their outputs are NOT cited as biometric
+    performance in the paper.
  - Convergence with Firm A anchor (what fraction of Firm A signatures
    are correctly classified at each threshold).

-Small visual sanity sample (30 pairs) is exported for spot-check, but
-metrics are derived entirely from pixel and Firm A evidence.
-
 Output:
  reports/pixel_validation/pixel_validation_report.md
  reports/pixel_validation/pixel_validation_results.json
  reports/pixel_validation/roc_cosine.png, roc_dhash.png
-  reports/pixel_validation/sanity_sample.csv
 """

 import sqlite3
@@ -2,26 +2,39 @@
 """
 Script 21: Expanded Validation with Larger Negative Anchor + Held-out Firm A
 ============================================================================
-Addresses codex review weaknesses of Script 19's pixel-identity validation:
+Addresses three weaknesses of Script 19's pixel-identity validation:

  (a) Negative anchor of n=35 (cosine<0.70) is too small to give
      meaningful FAR confidence intervals.
-  (b) Pixel-identical positive anchor is an easy subset, not
-      representative of the broader positive class.
-  (c) Firm A is both the calibration anchor and the validation anchor
-      (circular).
+  (b) Pixel-identical positive anchor is a CONSERVATIVE SUBSET of the
+      true non-hand-signed class, not representative of the broader
+      positive class. Recall against this subset is therefore a
+      lower-bound calibration check, not a generalizable recall
+      estimate.
+  (c) Firm A is both the calibration anchor and a validation anchor
+      (circular). The 70/30 fold split makes within-Firm-A sampling
+      variance visible without claiming external validation.

 This script:
  1. Constructs a large inter-CPA negative anchor (~50,000 pairs) by
     randomly sampling pairs from different CPAs. Inter-CPA high
     similarity is highly unlikely to arise from legitimate signing.
  2. Splits Firm A CPAs 70/30 into CALIBRATION and HELDOUT folds.
-     Re-derives signature-level / accountant-level thresholds from the
-     calibration fold only, then reports all metrics (including Firm A
-     anchor rates) on the heldout fold.
-  3. Computes proper EER (FAR = FRR interpolated) in addition to
-     metrics at canonical thresholds.
-  4. Computes 95% Wilson confidence intervals for each FAR/FRR.
+     Re-derives signature-level thresholds from the calibration fold
+     only, then reports capture rates on the heldout fold.
+  3. Computes 95% Wilson confidence intervals for FAR at canonical
+     thresholds (Table X in the manuscript).
+
+Legacy / diagnostic-only metrics:
+  Helper functions for EER, Precision, Recall, F1, and FRR remain in
+  this script for backward compatibility. The manuscript intentionally
+  OMITS these metrics from Table X because the byte-identical positive
+  anchor has cosine ~= 1 by construction (so FRR / EER are arithmetic
+  tautologies) and because positive and negative anchors are
+  constructed from different sampling units, making prevalence
+  arbitrary (so Precision and F1 have no meaningful population
+  interpretation). Only FAR against the large inter-CPA negative
+  anchor is reported as a biometric metric in the paper.

 Output:
  reports/expanded_validation/expanded_validation_report.md
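
Note on step 3 of the Script 21 docstring (95% Wilson confidence intervals for FAR): the following is a minimal sketch of a Wilson score interval, assuming FAR is the share of negative-anchor pairs whose similarity exceeds a threshold. The function name `wilson_interval` and all counts are hypothetical illustrations and are not taken from Script 21 or from the study data.

```python
import math


def wilson_interval(k, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion k/n.

    k: number of false accepts (hypothetical), n: number of negative pairs,
    z: standard-normal quantile (default corresponds to a 95% interval).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts for illustration only.
small_lo, small_hi = wilson_interval(0, 35)        # tiny negative anchor
large_lo, large_hi = wilson_interval(25, 50_000)   # large inter-CPA anchor
print(f"n=35:     95% CI = [{small_lo:.4f}, {small_hi:.4f}]")
print(f"n=50,000: 95% CI = [{large_lo:.5f}, {large_hi:.5f}]")
```

With zero false accepts out of 35 pairs the upper bound is still close to 0.10, which illustrates weakness (a): a negative anchor that small cannot support a tight FAR interval, whereas a sample on the order of 50,000 pairs can.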
@@ -0,0 +1,204 @@
+#!/usr/bin/env python3
+"""
+Script 28: Byte-Identity Decomposition + Cross-Firm Dual-Descriptor Convergence
+================================================================================
+Produces two reproducible artifacts cited in the manuscript that previously
+lacked dedicated provenance (codex review v3.18.1 items #7 and #8):
+
+  (#7) Byte-identical Firm A signature decomposition:
+       - Total Firm A signatures with pixel_identical_to_closest = 1
+       - Number of distinct Firm A partners they span
+       - Number of partners in the registry (denominator)
+       - Number of byte-identical pairs that span DIFFERENT fiscal years
+
+  (#8) Cross-firm dual-descriptor convergence:
+       - Among signatures with cosine > 0.95 (per-signature best-match),
+         the fraction with min_dhash_independent <= 5, broken out by
+         Firm A vs Non-Firm-A.
+
+Output:
+  /Volumes/NV2/PDF-Processing/signature-analysis/reports/byte_identity_decomp/
+      byte_identity_decomposition.json
+      byte_identity_decomposition.md
+
+These figures are intended to be cited from the paper (Section IV-F.1 for #7;
+Section IV-H.2 for #8) so that every quantitative claim in the manuscript
+traces to a specific JSON field.
+"""
+
+import json
+import sqlite3
+from datetime import datetime
+from pathlib import Path
+
+DB = '/Volumes/NV2/PDF-Processing/signature-analysis/signature_analysis.db'
+OUT = Path('/Volumes/NV2/PDF-Processing/signature-analysis/reports/'
+           'byte_identity_decomp')
+OUT.mkdir(parents=True, exist_ok=True)
+
+FIRM_A = '勤業眾信聯合'
+
+
+def byte_identity_decomposition(conn):
+    """Codex item #7: 145 / 50 / 180 / 35 decomposition."""
+    cur = conn.cursor()
+
+    cur.execute("""
+        SELECT COUNT(DISTINCT name)
+        FROM accountants
+        WHERE firm = ?
+    """, (FIRM_A,))
+    n_registered_partners = cur.fetchone()[0]
+
+    cur.execute("""
+        WITH byte_pairs AS (
+          SELECT s1.signature_id AS sig_a,
+                 s1.assigned_accountant AS partner,
+                 s1.year_month AS ym_a,
+                 s2.year_month AS ym_b
+          FROM signatures s1
+          JOIN signatures s2 ON s1.closest_match_file = s2.image_filename
+          WHERE s1.pixel_identical_to_closest = 1
+            AND s1.excel_firm = ?
+        )
+        SELECT
+          COUNT(*) AS total_pixel_identical_firm_a,
+          COUNT(DISTINCT partner) AS partners_with_pixel_identical,
+          SUM(CASE WHEN substr(ym_a,1,4) <> substr(ym_b,1,4) THEN 1 ELSE 0 END)
+            AS cross_year_pairs
+        FROM byte_pairs
+    """, (FIRM_A,))
+    n_total, n_partners, n_cross_year = cur.fetchone()
+
+    return {
+        'definition': (
+            'Among Firm A signatures whose nearest same-CPA match is '
+            'byte-identical after crop and normalization '
+            '(pixel_identical_to_closest = 1), this section reports the '
+            'count, the distinct-partner spread, the registry denominator, '
+            'and the subset whose byte-identical match is in a different '
+            'fiscal year.'
+        ),
+        'firm_label': 'Firm A',
+        'n_pixel_identical_firm_a_signatures': n_total,
+        'n_distinct_partners_with_pixel_identical': n_partners,
+        'n_registered_partners_in_firm_a': n_registered_partners,
+        'partner_coverage_share': round(n_partners / n_registered_partners, 4),
+        'n_cross_year_byte_identical_pairs': n_cross_year,
+    }
+
+
+def cross_firm_dual_convergence(conn):
+    """Codex item #8: per-signature dual-descriptor convergence by firm."""
+    cur = conn.cursor()
+
+    cur.execute("""
+        SELECT
+          CASE WHEN excel_firm = ? THEN 'Firm A' ELSE 'Non-Firm-A' END
+            AS firm_group,
+          COUNT(*) AS n_signatures_above_095,
+          SUM(CASE WHEN min_dhash_independent <= 5 THEN 1 ELSE 0 END)
+            AS n_dhash_le_5
+        FROM signatures
+        WHERE max_similarity_to_same_accountant > 0.95
+          AND assigned_accountant IS NOT NULL
+          AND min_dhash_independent IS NOT NULL
+        GROUP BY firm_group
+        ORDER BY firm_group
+    """, (FIRM_A,))
+
+    rows = cur.fetchall()
+    by_group = {}
+    for firm_group, n_above, n_dhash in rows:
+        by_group[firm_group] = {
+            'n_signatures_above_cosine_095': n_above,
+            'n_dhash_indep_le_5': n_dhash,
+            'pct_dhash_indep_le_5': round(100.0 * n_dhash / n_above, 2),
+        }
+
+    return {
+        'definition': (
+            'Per-signature best-match cosine > 0.95 AND assigned_accountant '
+            'IS NOT NULL AND min_dhash_independent IS NOT NULL. The reported '
+            'percentage is the share of these signatures whose independent '
+            'min dHash to any same-CPA signature is <= 5.'
+        ),
+        'unit_of_observation': 'signature',
+        'cosine_threshold': 0.95,
+        'dhash_indep_threshold': 5,
+        'by_firm_group': by_group,
+    }
+
+
+def write_markdown(payload, path):
+    bid = payload['byte_identity_decomposition']
+    cf = payload['cross_firm_dual_convergence']
+
+    lines = []
+    lines.append('# Byte-Identity Decomposition + Cross-Firm Dual-Descriptor '
+                 'Convergence')
+    lines.append('')
+    lines.append(f"Generated at: {payload['generated_at']}")
+    lines.append('')
+
+    lines.append('## 1. Byte-Identity Decomposition (Firm A)')
+    lines.append('')
+    lines.append(bid['definition'])
+    lines.append('')
+    lines.append('| Quantity | Value |')
+    lines.append('|----------|-------|')
+    lines.append(f"| Pixel-identical Firm A signatures | "
+                 f"{bid['n_pixel_identical_firm_a_signatures']} |")
+    lines.append(f"| Distinct Firm A partners with at least one such pair | "
+                 f"{bid['n_distinct_partners_with_pixel_identical']} |")
+    lines.append(f"| Registered Firm A partners | "
+                 f"{bid['n_registered_partners_in_firm_a']} |")
+    lines.append(f"| Partner coverage share | "
+                 f"{bid['partner_coverage_share']:.3f} |")
+    lines.append(f"| Pairs whose byte-identical match spans different fiscal "
+                 f"years | {bid['n_cross_year_byte_identical_pairs']} |")
+    lines.append('')
+
+    lines.append('## 2. Cross-Firm Dual-Descriptor Convergence')
+    lines.append('')
+    lines.append(cf['definition'])
+    lines.append('')
+    lines.append('| Firm group | N signatures with cosine > 0.95 | '
+                 'N with dHash_indep <= 5 | % with dHash_indep <= 5 |')
+    lines.append('|------------|--------------------------------:|'
+                 '------------------------:|------------------------:|')
+    # Iterate only over groups the query returned (avoids a KeyError).
+    for grp, g in sorted(cf['by_firm_group'].items()):
+        lines.append(f"| {grp} | "
+                     f"{g['n_signatures_above_cosine_095']:,} | "
+                     f"{g['n_dhash_indep_le_5']:,} | "
+                     f"{g['pct_dhash_indep_le_5']:.2f}% |")
+
+    path.write_text('\n'.join(lines) + '\n', encoding='utf-8')
+
+
+def main():
+    conn = sqlite3.connect(DB)
+    try:
+        payload = {
+            'generated_at': datetime.now().isoformat(timespec='seconds'),
+            'database_path': DB,
+            'firm_a_label': FIRM_A,
+            'byte_identity_decomposition': byte_identity_decomposition(conn),
+            'cross_firm_dual_convergence': cross_firm_dual_convergence(conn),
+        }
+    finally:
+        conn.close()
+
+    json_path = OUT / 'byte_identity_decomposition.json'
+    json_path.write_text(json.dumps(payload, indent=2, ensure_ascii=False),
+                         encoding='utf-8')
+    print(f'Wrote {json_path}')
+
+    md_path = OUT / 'byte_identity_decomposition.md'
+    write_markdown(payload, md_path)
+    print(f'Wrote {md_path}')
+
+
+if __name__ == '__main__':
+    main()
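
The dual-descriptor convergence query in Script 28 can be exercised without the production database. The sketch below runs the same filter and grouping against an in-memory SQLite table; the column names are copied from the script, while the firm labels (`FIRM_A`, `OTHER`), the helper name `convergence_by_group`, and every row are made up for illustration and are not data from the study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE signatures (
        excel_firm TEXT,
        assigned_accountant TEXT,
        max_similarity_to_same_accountant REAL,
        min_dhash_independent INTEGER
    )
""")

# Made-up rows for illustration only.
rows = [
    ("FIRM_A", "p1", 0.99, 2),
    ("FIRM_A", "p1", 0.97, 4),
    ("FIRM_A", "p2", 0.96, 9),
    ("FIRM_A", "p2", 0.90, 1),     # excluded: cosine not above 0.95
    ("OTHER", "q1", 0.98, 3),
    ("OTHER", "q2", 0.97, 12),
    ("OTHER", None, 0.99, 0),      # excluded: no assigned accountant
    ("OTHER", "q3", 0.96, None),   # excluded: no independent dHash value
]
conn.executemany("INSERT INTO signatures VALUES (?, ?, ?, ?)", rows)


def convergence_by_group(conn, firm_a):
    """Share of high-cosine signatures with dHash <= 5, by firm group."""
    cur = conn.execute("""
        SELECT
          CASE WHEN excel_firm = ? THEN 'Firm A' ELSE 'Non-Firm-A' END
            AS firm_group,
          COUNT(*),
          SUM(CASE WHEN min_dhash_independent <= 5 THEN 1 ELSE 0 END)
        FROM signatures
        WHERE max_similarity_to_same_accountant > 0.95
          AND assigned_accountant IS NOT NULL
          AND min_dhash_independent IS NOT NULL
        GROUP BY firm_group
        ORDER BY firm_group
    """, (firm_a,))
    return {
        group: {"n": n, "n_le_5": k, "pct": round(100.0 * k / n, 2)}
        for group, n, k in cur.fetchall()
    }


result = convergence_by_group(conn, "FIRM_A")
for group, stats in result.items():
    print(group, stats)
conn.close()
```

A fixture like this makes the three exclusion conditions in the WHERE clause visible, and it could serve as a regression check that the reported percentages use the filtered count as the denominator.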