pdf_signature_extraction

Author	SHA1	Message	Date
gbanyan	1e8466f7a8	Paper A v4.3: unify Firm A positioning to out-of-sample templated-end target Finishes the BCD re-anchor chassis (audit critique #1): Firm A was inconsistently framed as both a "within-Big-4 case study" and an "out-of-sample target". Harmonised to a single label, "out-of-sample templated-end target" (held out of the calibration negative anchor; scored against the normative Firms-B/C/D baseline), across: - §I contribution #3 (title + body) - §III-H.2 (opening trio BCD/Firm-A/non-Big-4; sub-header; role sentence) - §V-C body (removed the dual case-study/out-of-sample phrasing) (§V-C header already fixed in ac3372d.) Zero "case study" wording remains; no numbers changed. codex gpt-5.5 focused check: all consistency items PASS, no new findings. Also restore the BCD+non-Big-4 joint ICCR Wilson CI [0.000001, 0.000015] to the §IV-M Table XXI note (three-scope CI symmetry; the one MINOR completeness gap surfaced by a codex old-vs-new content diff, which otherwise confirmed no substantive content was dropped by the trim). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 02:11:09 +08:00
gbanyan	ac3372d2d2	Paper A v4.3: restructure §III baseline-first + deep trim (audit #3/#4) #3 Reorder §III around "establish normative baseline → show who deviates": new order A–O with I=Normative Baseline + inter-CPA coincidence floor (old L.0–L.3), J=Firm-Level Deviation (old L.4+L.6), K=Why the distribution gives no threshold (old §I distributional + L.5), L=K=3 partition (old §J), M=Convergent checks (old §K), N=limits (old §M), O=data source (old §N). ~170 cross-refs remapped (two-pass tokenized), incl. 5 spelled-out "Section III-X" refs. #4 Deep trim: §III 10,960→8,461w (−23%). Removed §III↔§IV-M and §III↔§IV-F table duplication (Results keeps canonical tables; §III keeps method+headline+pointer); condensed distributional diagnostics; consolidated repeated caveat. No locked number changed. Also: §V-C header "Case Study"→"Out-of-Sample Target"; abstract 251→250w; housekeeping (rm superseded draft_section_L_bcd.md + v4.0 pandoc docx, remove stale OCR/handoff docs, gitignore .serena/). codex gpt-5.5 review: 0 BLOCKER / 3 MAJOR / 3 MINOR; 3 MAJOR fixed (§III-J.2 observed-vs-counterfactual transition, §III-M table pointers κ→XI/pixel→XIV, §III-N stale tightening figures); 2nd pass clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 01:49:09 +08:00
gbanyan	17156516a0	§III-L.0: reword e-signature-adoption rationale as industry background (no interview citation) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 21:52:05 +08:00
gbanyan	1eb323e959	Paper A v4.2: re-anchor primary calibration to clean BCD 2013-2019 baseline - Restrict the calibration negative anchor to Firms B/C/D, fiscal years 2013-2019 (pre-electronic-signature hand-signing period); B/C/D adopted e-signing post-2020 at staggered times, so 2013-2019 is the construct-clean baseline. Firm A scored across its full 2013-2023 record against it. - New locked numbers (codex-audited, Scripts 54/55): per-comparison HC floor 0.000010; per-signature HC floor 0.0059 [boot 0.0045-0.0073]; per-document HC 0.0117 / HC+MC 0.1753; per-firm HC+MC B 0.162 / C 0.225 / D 0.089. Firm A observed 0.817 = ~139x the clean floor (was ~70x on all-period BCD); Firm A out-of-sample vs clean pool 0.0001 (below floor -> never resembles genuine hand-signing). BCD 2020+ robustness: per-sig 0.0105, per-comparison 0.000036 (~2x pre-2020) quantifies the e-signing contamination. - Propagated through abstract / Sec. I / III-L / IV-M / V / conclusion; 0.837 crossover kept corpus-wide; ABCD retained as contamination comparison. - Grounded the 2013-2019 choice on data (floor drift) + e-sign-adoption background, not on in-text interview claims (double-blind). - Add Scripts 54 (temporal floor stability) and 55 (BCD 2013-2019 primary calibration + Firm A scoring). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 21:30:06 +08:00
gbanyan	3c7fcc010f	Paper A v4.1: BCD-baseline reframe + screening positioning + trim - Re-anchor inter-CPA coincidence-rate (ICCR) calibration on a normative non-Firm-A baseline (Firms B/C/D); Firm A held out as an out-of-sample target. Locked canonical numbers (codex-audited; Scripts 46/52/53): per-comparison HC 0.00014->0.000018, per-signature HC 0.0116, per-document HC+MC 0.34->0.1905; KDE crossover 0.837 retained corpus-wide. - Reposition as an operator-tunable, semi-automated screening/triage framework (title -> "Automated Screening..."): HC = high-specificity operating point; MC band demoted to low-specificity advisory; Firm A = demonstration that the screening surfaces a templated end, audit-quality implications deferred. - Apply codex prose-review fixes: triage-neutral five-way labels, soften mechanism/specificity wording, supersede MC claim-strength, update stale Appendix script references (40b/43/45 -> 46/52/53). - Trim pass: compress Sec. V discussion + Sec. III echoes (27.7k -> 26.8k words); no substantive content removed. - Add analysis scripts 45-53 (firm-year trends; BCD-only ICCR recompute; canonical-sampler locked numbers; Firm-A out-of-sample; BCD regression + cross-firm hit matrix). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 19:35:10 +08:00
gbanyan	becce857e1	Phase 6 round-7 codex 3-axis review fixes: 11 MAJOR + 5 MINOR Codex GPT-5.5 3-axis peer review (paper/codex_review_gpt55_v4_round_3axis.md) identified 11 MAJOR + 5 MINOR + 0 BLOCKER on three axes: (1) abstract/body tone consistency, (2) methodology clarity / v3 residue, (3) no implicit within-CPA or cross-year signature-consistency assumptions. 13 patches applied across 4 source files; mirrored in paper_a_v4_combined.md. Axis 1 (tone consistency between abstract and body): - S I L33: "resolves the ambiguity" -> "provides complementary evidence for screening cases where ... hypotheses diverge" - S I L35: "disproves the distributional-threshold path" -> "does not support the distributional-threshold path" - S I L37 / S V-F L29: "characterise the deployed five-way classifier at three units" -> "characterise the deployed HC sub-rule and document-level HC+MC alarm derived from the five-way classifier at three units" (consistent with S V-H which says only HC sub-rule and HC+MC alarm are re-characterised by the present ICCR battery) - S I L39 / S V-C / S III-L.4: "consistent with firm-specific template, stamp, or document-production reuse mechanisms" -> "consistent with -- but does not independently establish -- firm-level template-like reuse, digitisation-pipeline homogeneity, or signing-style homogeneity, which descriptor-only data cannot separate (S V-H)" (mirrors abstract) Axis 2 (methodology clarity / v3 residue): - S III-G: added unit-bridge sentence distinguishing "descriptor-summary units" (signature/accountant) from "operational reporting units" (per-comparison/per-signature/per-document, S III-L) - S III-H.2: "The calibration distinguishes two reference populations" -> "The supporting diagnostics use two reference populations" with explicit "neither is the calibration anchor" - S III-L.1: "specificity" -> "ICCR refinement" - S III-L.2: added "descriptive intuition, not an independence assumption used for estimation" caveat after the 1-(1-p)^n form Axis 3 (no implicit signature-consistency assumptions): - S III-F: hand-signing motivation rewritten as working hypothesis that "the classifier does not require ... to hold for all CPAs" - S III-G A1: added "A1 does not assume temporal stability of handwriting or scanning workflow within or across years" - S III-H.1: added label-caveat paragraph (operational rule outputs, not validated ground-truth classes); HC "strong replication evidence" -> "image-similarity evidence consistent with replication"; HSC "consistent with a CPA who signs very consistently" -> "mechanism not resolved by descriptor data alone"; LH explicitly owns that cross-year handwriting drift, scanner workflow change, or template variant rotation can also yield low max-cosine within a same-CPA pool - S III-L.6 / S IV-M.6: "same-CPA repeatability signal" -> "observed same-CPA-pool excess ... not attributed to within-CPA handwriting repeatability" Deferred (structural, not single-sentence patch): codex S III-I.2 / S III-J K=2/K=3 deduplication; codex S III-K LOOO / S III-J duplication. Both are MINOR stylistic redundancies, not reviewer-rejection risks. DOCX rebuilt via export_v3.py; v4.0_20260515 file refreshed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 03:11:53 +08:00
gbanyan	3672c9343e	Phase 6 round-6: soften firm-heterogeneity framing, fix DOCX table render Framing softening (per partner tone decision: own the limitation rather than defend the strong claim). Abstract: "Firm heterogeneity is decisive ... consistent with firm-level template-like reuse" -> "The framework surfaces pronounced firm-level heterogeneity ... consistent with firm-level template-like reuse but not independently diagnostic, since descriptor-only data cannot separate reuse from digitisation-pipeline or signing-style homogeneity within a firm; we report it as a scope limitation rather than a mechanism finding." S V-H Limitations: new bullet "Mechanism attribution for the firm-level heterogeneity is not identifiable from descriptor-only data." enumerates three non-mutually-exclusive firm-level mechanisms (template-like reuse / digitisation-pipeline homogeneity / signing-style homogeneity), notes the (cosine, dHash) descriptor pair is by construction indifferent to which mechanism generated a near-identical pair, and lists what additional data would be needed for attribution. S VI Conclusion items (3) and (4): "firm heterogeneity quantification" -> "firm-level heterogeneity surfaced by the framework ... reported as a framework-discriminative observation rather than a mechanism finding"; item (4) expanded from template/stamp/document-production reuse alone to the three-mechanism scope, with explicit "not independently establishing" and S V-H cross-reference. DOCX export fix (export_v3.py): add missing LaTeX-to-Unicode tokens (\checkmark, \lvert/\rvert, \lVert/\rVert, \in, \notin, \max, \min, \log, \ln, \exp, \bullet) that were silently dropping content from Table III rows 2-4 (integer-jitter robustness check marks empty) and Table XVIII drift column (\|Delta\| empty). Rebuild Paper_A_IEEE_Access_Draft_v3.docx via export_v3.py and install copy as Paper_A_IEEE_Access_Draft_v4.0_20260515.docx (replaces prior pandoc-built v4 DOCX which had empty cells in every table header with LaTeX math and inconsistent column widths). All 43 tables now have non-empty cells with sub/superscript runs. Mirrored in paper_a_v4_combined.md for consistency with the single-file combined source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 02:16:05 +08:00
gbanyan	4efbb7f4b8	Phase 6 round-5 codex-review fixes: minor + nit cleanup Codex round-4 verification (commit `bbb662a`) disposition: Minor revision. All prior blockers and majors confirmed resolved. Three small items remained. MINOR (1 of 2 addressed in markdown source): - Appendix rendered AFTER References (combined L1132 vs L1227), but IEEE convention places appendices BEFORE references. Swapped concatenation order in combined-file regeneration: abstract -> intro -> related_work -> methodology -> results -> discussion -> conclusion -> APPENDIX -> REFERENCES -> declarations -> impact. Combined file now has Appendix A at L1132 and References at L1194. MINOR (deferred to typesetting): - Table A.II is prose-heavy for IEEE double-column layout. This is a table-formatting concern for the LaTeX/DOCX export step (table*, smaller font, or column-break adjustments), not a markdown-source issue. Documenting as a known typesetting consideration for the export pipeline. NIT: - Table A.II referenced "§IV-M.4 footnote" but the content at §IV-M.4 L1007 is inline prose, not a footnote. Changed to "(§IV-M.4)". Artefacts: - Combined manuscript regenerated: paper_a_v4_combined.md, 1316 lines. - Appendix A.1 (BD/McCrary) + A.2 (Diagnostic Summary) precede References. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 20:05:24 +08:00
gbanyan	bbb662acd1	Phase 6 round-4 codex-review fixes: blocker + 2 majors + minors Codex round-3 verification (commit `9e68f2e`) flagged remaining items: BLOCKER: - Appendix A Table A.I was inside an HTML comment but visible prose at L1268 and L1275 referenced it as if it rendered. Un-commented Table A.I (same fix pattern as Tables I-IV in round-3). MAJOR: - Table XXVII (diagnostic summary) appeared in §III-M at L593 before Tables III-XXVI in §IV, breaking sequential IEEE-style numbering. Moved the table to Appendix A as Table A.II (new §A.2 "Diagnostic Summary"). §III-M now references "Appendix A Table A.II" instead of hosting the table inline; §VI Conclusion contribution (8) updated similarly. Appendix A heading generalised to "Supplementary Diagnostic Detail" with §A.1 (BD/McCrary) and §A.2 (Diagnostic Summary). - §III-M L614 rate-definition conflation: the sentence "per-signature and per-document rates ($0.11$ and $0.34$ respectively under the deployed any-pair HC + MC alarm)" mixed unit labels. Rewrote to label each rate by its actual unit and source table: per-signature any-pair HC ICCR ($0.11$; Table XXII) and per-document HC+MC alarm-rate ICCR ($0.34$; Table XXIII). MINORS: - L400 §III-L stale ref retargeted: "operational signature-level classifier of §III-L" -> "of §III-H.1 calibrated in §III-L". - L663 §III-L stale ref retargeted: "KDE crossover used in per-document classification (Section III-L)" -> "(Section III-H.1)". - L1146 Conclusion "corpus-universal" overstated generality -> "across the tested eligible scopes" (non-Big-4 evidence is cosine-axis only, not full calibration scope). - L1024 Results §IV-M.4 footnote: np.argmax / np.unique implementation detail moved to "alphabetically-ordered tie-break ... full implementation detail in the supplementary materials" (less script-internal prose). Artefacts: - Combined manuscript regenerated: paper_a_v4_combined.md, 1316 lines. - Final main-text table sequence: I, II, III, ..., XXVI (26 tables, all sequential in rendered order). Appendix tables: A.I, A.II. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 19:59:56 +08:00
gbanyan	9e68f2e1d3	Phase 6 round-3 codex-review fixes: blockers + majors + minors Resolved Codex review (gpt-5.5 xhigh) findings against `b6913d2`. BLOCKERS: - Appendix B reference mismatch: rewrote all main-text "Appendix B" references to "supplementary materials" since Appendix B is now a redirect stub. Affected the SSIM design-argument pointer, threshold provenance, byte-level decomposition, MC band capture-rate, and backbone-ablation table references across §III-F / §III-H.1 / §III-H.2 / §III-K / §III-L.4 / §III-M / §IV-F / §IV-J / §IV-K / §IV-L / §V-C / §V-H. - Table rendering: un-commented Tables I-IV (Dataset Summary, YOLO Detection, Extraction Results, Cosine Distribution Statistics) which were inside HTML comment blocks and would not have rendered in the submission. - Table numbering out of order: Table XIX appeared before Tables XVI-XVIII. Renumbered XIX -> XVI (document-level worst-case counts), XVI -> XVII (Firm x K=3 cross-tab), XVII -> XVIII (K=3 component comparison), XVIII -> XIX (Spearman correlation). Cross-references updated in §IV-J / §IV-K and §V-C. - Table V mis-citation: §IV-C said "KDE crossover ... (Table V)" but Table V is the dip test. Dropped the (Table V) tag; crossover is a textual finding. - Submission cleanup: wrapped the archived Impact Statement section heading and body inside the existing HTML comment (was rendering). Funding placeholder wrapped in HTML comment with a TO-DO note (won't render but is preserved as reminder). MAJORS: - Line 1077 numerical conflation: rewrote the §V-C / §III-L.4 paragraph that labelled Firm A's per-document HC+MC inter-CPA proxy ICCR of 0.6201 as a rate "on real same-CPA pools." 0.6201 is a counterfactual proxy under inter-CPA candidate-pool replacement, not the observed rate. Added explicit disambig: the corresponding observed rate from Table XVI (formerly XIX) is 97.5% HC+MC for Firm A; the proxy and observed rates measure different quantities. - Residual "validation" language softened: "Dual-descriptor verification" -> "Dual-descriptor similarity"; "we validate the backbone choice" -> "we support the backbone choice"; "pixel-identity validation" -> "pixel-identity positive-anchor check"; "## M. Validation Strategy and Limitations under Unsupervised Setting" -> "## M. Unsupervised Diagnostic Strategy and Limits". - "Specificity behaviour" overclaim: "characterises the cosine threshold's specificity behaviour" -> "specificity-proxy behaviour" (methodology §III-L.0 and discussion §V-F). - "Prior published / prior calibration" ambiguity: replaced "prior published per-comparison rate" with "the corpus-wide rate reported in §IV-I"; replaced "(prior published operating point)" with "(alternative operating point from supplementary calibration evidence)" in Tables XXI; replaced "prior reporting and the existing literature" with "the existing literature and the supplementary calibration evidence." MINORS: - Line 116 Bayes-optimal qualifier: "the local density minimum ... is the Bayes-optimal decision boundary under equal priors" -> "In idealized two-class mixture settings with equal priors and equal misclassification costs, the local density minimum ... coincides with the Bayes-optimal decision boundary." - Stale section refs: §V-G for the fine-tuning caveat retargeted to §V-H Engineering-level caveats (where it lives after the §V-H reorganisation); §III-L for the worst-case rule retargeted to §III-H.1; "Section IV-D.2" (nonexistent) retargeted to "Section IV-D Table VI." - Abstract / Introduction "after pool-size adjustment": separated the document-level D2 proxy ICCR claim from the per-signature logistic regression claim. Now: "Per-document D2 inter-CPA proxy ICCRs differ by an order of magnitude across firms ... a per-signature logistic regression confirms the firm gap persists after pool-size control." NIT: - Related Work HTML comment "(see paper_a_references_v3.md for full list)" -> "(full list in the References section)"; removes the version-coded filename reference from the source. Artefacts: - Combined manuscript regenerated: paper_a_v4_combined.md, 1312 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 18:28:14 +08:00
gbanyan	b6913d2f93	Phase 6 round-2 reviewer revisions: §III-H.1 promotion + framing alignment Structural: - Promote operational classifier definition from §III-L.0 to new §III-H.1, so the reader meets the five-way HC/MC/HSC/UN/LH rule before the §III-I/J/K diagnostic chain instead of ~130 lines after. §III-L renamed to "Anchor-Based Threshold Calibration"; §III-L.0 retains only calibration methodology, three units of analysis, any-pair semantics, and the FAR terminological note. §III-L.7 deleted (redundant with §III-J). - Reorganise §V-H Limitations into Primary / Secondary / Documented features / Engineering groupings (was a flat 14-item list). - Reframe §III-M from "ten-tool unsupervised-validation collection" to "each diagnostic addresses one specific unsupervised failure mode"; rename "What v4.0 does/does not claim" → "Limits / Scope of the present analysis"; retitle Table XXVII. Framing alignment (cross-section): - Strip all v3.x / v4.0 / v3.20 / v4-new / inherited lineage labels from rendered text (Abstract, Intro, §II, §III, §IV, §V, §VI, Appendix, Impact). - Replace "Paper A" rule references with "deployed" rule references. - Soften "validation" to "characterise" / "check" / "screening label" / "consistency check" / "support"; "verdict" → "screening label". - Remove codex-verified spike claims (non-Big-4 jittered dHash, Big-4 pooled cosine after firm-mean centring). Only formally scripted evidence (Scripts 39b–39e) retained; non-Big-4 evidence framed as corroborating raw-axis cosine, not as calibration evidence. - Strip script-provenance parentheticals from Introduction; defer Script 39c internal references and similar to Methodology / Appendix. Numerical / table fixes: - §III-C document-count arithmetic: 12 corrupted → 13 corrupted/unreadable, verified against sqlite DB and total-pdf/ folder counts (90,282 - 4,198 no-sig - 13 corrupted = 86,071 → 85,042 with detections → 182,328 sigs → 168,755 CPA-matched). Table I shows VLM-positive (86,084) and processed-for-extraction (86,071) as separate rows. - Wilson 95% CIs added for joint-rule ICCR rows in Table XXI / methodology table ([0.00011, 0.00018] and [0.00008, 0.00014]). - Unit error fixed: 0.3856 pp / 0.4431 pp → 0.3856 (38.6 pp) / 0.4431 (44.3 pp). Smaller revisions: - Pipeline framing: "detecting" → "screening" in Abstract / Intro / Conclusion for consistency with the unsupervised-screening positioning. - "hard ground-truth subset" → "conservative hard-positive subset" throughout. - §III-F SSIM / pixel-comparison rebuttal compressed from ~15 lines to 4; design-level argument deferred to supplementary materials. - "stakeholders can adopt / can derive thresholds" → "alternative operating points can be characterised by inverting" (less prescriptive). - "the same mechanism extending in milder form to Firms B/C/D" → "similar, milder production-related reuse patterns at Firms B/C/D" (mechanism claim softened). - Appendix A "non-hand-signed mode" / "two-mechanism mixture" lineage language aligned with v4 framing. Appendix B: - Rebuilt as a redirect-only stub. The HTML-commented obsolete table mapping (Table IX–XVIII labels with FAR / capture-rate / validation language) is removed; replaced with a short paragraph pointing to supplementary materials for full table-to-script provenance. Cross-references: - All §III-L references for the rule definition retargeted to §III-H.1; references for calibration still point to §III-L. - §III-H references for byte-level Firm A evidence / non-Big-4 reverse anchor retargeted to §III-H.2. Artefacts: - Combined manuscript regenerated: paper_a_v4_combined.md, 1314 lines (was 1346 pre-review). - Two review handoff documents added: paper/review_handoff_abstract_intro_20260515.md paper/review_handoff_body_20260515.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 18:07:31 +08:00

11 Commits
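
The §III-C document-count chain quoted in the b6913d2f93 message can be checked mechanically. The following is a minimal Python sketch, not part of the repository: the function and variable names are hypothetical, and the only inputs are the figures stated in that commit message. The intermediate value is assumed to correspond to the "VLM-positive" row of Table I because the two numbers coincide.

```python
def count_chain(total_pdfs: int, no_sig: int, corrupted: int) -> tuple[int, int]:
    """Return (after removing no-sig documents, after also removing corrupted ones).

    Names are illustrative; the subtraction order follows the commit message
    "90,282 - 4,198 no-sig - 13 corrupted = 86,071".
    """
    vlm_positive = total_pdfs - no_sig      # assumed to match Table I "VLM-positive"
    processed = vlm_positive - corrupted    # "processed-for-extraction"
    return vlm_positive, processed


# Figures copied from the commit message.
vlm_positive, processed = count_chain(90_282, 4_198, 13)

# Later stages are reported counts, not derived values; we only check that
# each stage is no larger than the one it is drawn from.
docs_with_detections = 85_042
signatures = 182_328
cpa_matched = 168_755

print(vlm_positive, processed)
print(docs_with_detections <= processed, cpa_matched <= signatures)
```

With the old figure of 12 corrupted documents the second value would be off by one, which is the discrepancy that commit reports fixing.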