pdf_signature_extraction

Author	SHA1	Message	Date
gbanyan	f1c253768a	Paper A v3.18.3: address codex GPT-5.5 round-17 self-comparing review findings Codex round-17 (paper/codex_review_gpt55_v3_18_2.md) re-audited v3.18.2 and flagged three new issues introduced by the v3.18.2 edits themselves plus items it had partially RESOLVED but not fully cleaned up. Verdict still Minor Revision; this commit closes the new findings. - Fix Appendix B provenance paths: replace four fabricated paths (formal_statistical/, deloitte_distribution/, pdf_level/, ablation/) with the actual artifact paths verified in the local report tree. - Acknowledge that the report tree is at /Volumes/NV2/PDF-Processing/... and reviewers should rebase to their own report root rather than rely on absolute paths. - Remove residual "single dominant mechanism" wording from Methodology III-H (third primary evidence sentence) and Discussion V-C. - Fix Methodology III-H Hartigan dip-test parenthetical: "p = 0.17 at n >= 10 signatures" wrongly attached the accountant-level filter to the signature-level dip; corrected to "p = 0.17, N = 60,448 Firm A signatures". - Soften Introduction Firm A motivation: replace "widely recognized within the audit profession as making substantial use of non-hand-signing for the majority of its certifying partners" with a methodology-first framing that defers to the image evidence reported in the paper. - Soften Methodology III-H "widely held within the audit profession" wording (kept as motivation, marked clearly as non-load-bearing in the next sentence). - Reconcile 55,921 vs 55,922 Firm A cosine-only counts in Section IV-H.2: document explicitly that the one-record drift comes from successive DB snapshots used to materialize Table IX vs the new script-28 artifact; no rate at two decimal places is affected. - Rebuild Paper_A_IEEE_Access_Draft_v3.docx. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:45:54 +08:00
gbanyan	4bb7aa9189	Paper A v3.18.2: address codex GPT-5.5 round-16 Minor-Revision findings Codex independent peer review (paper/codex_review_gpt55_v3_18_1.md) audited empirical claims against scripts/JSON reports rather than rubber-stamping prior Accept verdicts. Verdict: Minor Revision. This commit addresses every flagged item. - Soften mechanism-identification language (Results IV-D.1, Discussion B): per-signature cosine "fails to reject unimodality" rather than "reflects a single dominant generative mechanism"; framing tied to joint evidence. - Replace overabsolute "single stored image" with multi-template phrasing in Introduction and Methodology III-A. - Reframe Methodology III-H so practitioner knowledge is non-load-bearing; evidentiary basis is the paper's own image evidence. - Fix stale section cross-references after the v3.18 retitling: IV-F.* -> IV-G.* in 11 locations across methodology and results. - Fix 0.941 / 0.945 / 0.9407 wording in Methodology III-K to use the calibration-fold P5 = 0.9407 and the rounded sensitivity cut 0.945. - Soften "sharp discontinuity" in Results IV-G.3 to "23-28 percentage-point gap consistent with firm-wide non-hand-signing practice". - Soften Conclusion's "directly generalizable" with explicit conditions on analogous anchors and artifact-generation physics. - Add Appendix B: table-to-script provenance map (15 manuscript tables mapped to generating scripts and JSON report artifacts). - New script signature_analysis/28_byte_identity_decomposition.py produces reproducible artifacts for two previously-unverified claims: (a) 145 / 50 / 180 / 35 Firm A byte-identity decomposition (verified); (b) cross-firm dual-descriptor convergence -- corrected from the previous manuscript text "non-Firm-A 11.3% vs Firm A 58.7% (5x)" to the database-verified "non-Firm-A 42.12% vs Firm A 88.32% (~2.1x)". - Clarify scripts 19 / 21 docstrings: legacy EER / FRR / Precision / F1 helpers are retained for diagnostic use only and are NOT cited as biometric performance in the paper. Remove "interview evidence" wording. - Rebuild Paper_A_IEEE_Access_Draft_v3.docx. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:23:08 +08:00
gbanyan	16e90bab20	Paper A v3.18: remove accountant-level + replication-dominated calibration + Gemini 2.5 Pro review minor fixes Major changes (per partner red-pen + user decision): - Delete entire accountant-level analysis (III.J, IV.E, Tables VI/VII/VIII, Fig 4) -- cross-year pooling assumption unjustified, removes the implicit "habitually stamps = always stamps" reading. - Renumber sections III.J/K/L (was K/L/M) and IV.E/F/G/H/I (was F/G/H/I/J). - Title: "Three-Method Convergent Thresholding" -> "Replication-Dominated Calibration" (the three diagnostics do NOT converge at signature level). - Operational cosine cut anchored on whole-sample Firm A P7.5 (cos > 0.95). - Three statistical diagnostics (Hartigan/Beta/BD-McCrary) reframed as descriptive characterisation, not threshold estimators. - Firm A replication-dominated framing: 3 evidence strands -> 2. - Discussion limitation list: drop accountant-level cross-year pooling and BD/McCrary diagnostic; add auditor-year longitudinal tracking as future work. - Tone-shift: "we do not claim / do not derive" -> "we find / motivates". Reference verification (independent web-search audit of all 41 refs): - Fix [5] author hallucination: Hadjadj et al. -> Kao & Wen (real authors of Appl. Sci. 10:11:3716; report at paper/reference_verification_v3.md). - Polish [16] [21] [22] [25] (year/volume/page-range/model-name). Gemini 2.5 Pro peer review (Minor Revision verdict, A-F all positive): - Neutralize script-path references in tables/appendix -> "supplementary materials". - Move conflict-of-interest declaration from III-L to new Declarations section before References (paper_a_declarations_v3.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 17:43:09 +08:00
gbanyan	fcce58aff0	Paper A v3.8: resolve Gemini 3.1 Pro round-6 independent-review findings Gemini round-6 (paper/gemini_review_v3_7.md) gave Minor Revision but flagged three issues that five rounds of codex review had missed. This commit addresses all three. BLOCKER: Accountant-level BD/McCrary null is a power artifact, not proof of smoothness (Gemini Issue 1) - At N=686 accountants the BD/McCrary test has limited statistical power; interpreting a failure-to-reject as affirmative proof of smoothness is a Type II error risk. - Discussion V-B: "itself diagnostic of smoothness" replaced with "failure-to-reject rather than a failure of the method --- informative alongside the other evidence but subject to the power caveat in Section V-G". - Discussion V-G (Sixth limitation): added a power-aware paragraph naming N=686 explicitly and clarifying that the substantive claim of smoothly-mixed clustering rests on the JOINT weight of dip test + BIC-selected GMM + BD null, not on BD alone. - Results IV-D.1 and IV-E: reframe accountant-level null as "consistent with --- not affirmative proof of" clustered-but- smoothly-mixed, citing V-G for the power caveat. - Appendix A interpretation paragraph: explicit inferential-asymmetry sentence ("consistency is what the BD null delivers, not affirmative proof"); "itself evidence for" removed. - Conclusion: "consistent with clustered but smoothly mixed" rephrased with explicit power caveat ("at N = 686 the test has limited power and cannot affirmatively establish smoothness"). MAJOR: Table X FRR / EER was tautological reviewer-bait (Gemini Issue 2) - Byte-identical positive anchor has cosine approx 1 by construction, so FRR against that subset is trivially 0 at every threshold below 1 and any EER calculation is arithmetic tautology, not biometric performance. - Results IV-G.1: removed EER row; dropped FRR column from Table X; added a table note explaining the omission and directing readers to Section V-F for the conservative-subset discussion. - Methodology III-K: removed the EER / FRR-against-byte-identical reporting clause; clarified that FAR against inter-CPA negatives is the primary reported quantity. - Table X is now FAR + Wilson 95% CI only, which is the quantity that actually carries empirical content on this anchor design. MINOR: Document-level worst-case aggregation narrative (Gemini Issue 3) + 15-signature delta (Gemini spot-check) - Results IV-I: added two sentences explicitly noting that the document-level percentages reflect the Section III-L worst-case aggregation rule (a report with one stamped + one hand-signed signature inherits the most-replication-consistent label), and cross-referencing Section IV-H.3 / Table XVI for the mixed-report composition that qualifies the headline percentages. - Results IV-D: added a one-sentence footnote explaining that the 15-signature delta between the Table III CPA-matched count (168,755) and the all-pairs analyzed count (168,740) is due to CPAs with exactly one signature, for whom no same-CPA pairwise best-match statistic exists. Abstract remains 243 words, comfortably under the IEEE Access 250-word cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 14:47:48 +08:00
gbanyan	552b6b80d4	Paper A v3.7: demote BD/McCrary to density-smoothness diagnostic; add Appendix A Implements codex gpt-5.4 recommendation (paper/codex_bd_mccrary_opinion.md, "option (c) hybrid"): demote BD/McCrary in the main text from a co-equal threshold estimator to a density-smoothness diagnostic, and add a bin-width sensitivity appendix as an audit trail. Why: the bin-width sweep (Script 25) confirms that at the signature level the BD transition drifts monotonically with bin width (Firm A cosine: 0.987 -> 0.985 -> 0.980 -> 0.975 as bin width widens 0.003 -> 0.015; full-sample dHash transitions drift from 2 to 10 to 9 across bin widths 1 / 2 / 3) and Z statistics inflate superlinearly with bin width, both characteristic of a histogram-resolution artifact. At the accountant level the BD null is robust across the sweep. The paper's earlier "three methodologically distinct estimators" framing therefore could not be defended to an IEEE Access reviewer once the sweep was run. Added - signature_analysis/25_bd_mccrary_sensitivity.py: bin-width sweep across 6 variants (Firm A / full-sample / accountant-level, each cosine + dHash_indep) and 3-4 bin widths per variant. Reports Z_below, Z_above, p-values, and number of significant transitions per cell. Writes reports/bd_sensitivity/bd_sensitivity.{json,md}. - paper/paper_a_appendix_v3.md: new "Appendix A. BD/McCrary Bin-Width Sensitivity" with Table A.I (all 20 sensitivity cells) and interpretation linking the empirical pattern to the main-text framing decision. - export_v3.py: appendix inserted into SECTIONS between conclusion and references. - paper/codex_bd_mccrary_opinion.md: codex gpt-5.4 recommendation captured verbatim for audit trail. Main-text reframing - Abstract: "three methodologically distinct estimators" -> "two estimators plus a Burgstahler-Dichev/McCrary density- smoothness diagnostic". Trimmed to 243 words. - Introduction: related-work summary, pipeline step 5, accountant- level convergence sentence, contribution 4, and section-outline line all updated. Contribution 4 renamed to "Convergent threshold framework with a smoothness diagnostic". - Methodology III-I: section renamed to "Convergent Threshold Determination with a Density-Smoothness Diagnostic". "Method 2: BD/McCrary Discontinuity" converted to "Density-Smoothness Diagnostic" in a new subsection; Method 3 (Beta mixture) renumbered to Method 2. Subsections 4 and 5 updated to refer to "two threshold estimators" with BD as diagnostic. - Methodology III-A pipeline overview: "three methodologically distinct statistical methods" -> "two methodologically distinct threshold estimators complemented by a density-smoothness diagnostic". - Methodology III-L: "three-method analysis" -> "accountant-level threshold analysis (KDE antimode, Beta-2 crossing, logit-Gaussian robustness crossing)". - Results IV-D.1 heading: "BD/McCrary Discontinuity" -> "BD/McCrary Density-Smoothness Diagnostic". Prose now notes the Appendix-A bin-width instability explicitly. - Results IV-E: Table VIII restructured to label BD rows "(diagnostic only; bin-unstable)" and "(diagnostic; null across Appendix A)". Summary sentence rewritten to frame BD null as evidence for clustered-but-smoothly-mixed rather than as a convergence failure. Table cosine P5 row corrected from 0.941 to 0.9407 to match III-K. - Results IV-G.3 and IV-I.2: "three-method convergence/thresholds" -> "accountant-level convergent thresholds" (clarifies the 3 converging estimates are KDE antimode, Beta-2, logit-Gaussian, not KDE/BD/Beta). - Discussion V-B: "three-method framework" -> "convergent threshold framework". - Conclusion: "three methodologically distinct methods" -> "two threshold estimators and a density-smoothness diagnostic"; contribution 3 restated; future-work sentence updated. - Impact Statement (archived): "three methodologically distinct threshold-selection methods" -> "two methodologically distinct threshold estimators plus a density-smoothness diagnostic" so the archived text is internally consistent if reused. Discussion V-B / V-G already framed BD as a diagnostic in v3.5 (unchanged in this commit). The reframing therefore brings Abstract / Introduction / Methodology / Results / Conclusion into alignment with the Discussion framing that codex had already endorsed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 14:32:50 +08:00

5 Commits