pdf_signature_extraction

gbanyan/pdf_signature_extraction

Fork 0

Commit Graph

Author	SHA1	Message	Date
gbanyan	fcce58aff0	Paper A v3.8: resolve Gemini 3.1 Pro round-6 independent-review findings Gemini round-6 (paper/gemini_review_v3_7.md) gave Minor Revision but flagged three issues that five rounds of codex review had missed. This commit addresses all three. BLOCKER: Accountant-level BD/McCrary null is a power artifact, not proof of smoothness (Gemini Issue 1) - At N=686 accountants the BD/McCrary test has limited statistical power; interpreting a failure-to-reject as affirmative proof of smoothness is a Type II error risk. - Discussion V-B: "itself diagnostic of smoothness" replaced with "failure-to-reject rather than a failure of the method --- informative alongside the other evidence but subject to the power caveat in Section V-G". - Discussion V-G (Sixth limitation): added a power-aware paragraph naming N=686 explicitly and clarifying that the substantive claim of smoothly-mixed clustering rests on the JOINT weight of dip test + BIC-selected GMM + BD null, not on BD alone. - Results IV-D.1 and IV-E: reframe accountant-level null as "consistent with --- not affirmative proof of" clustered-but- smoothly-mixed, citing V-G for the power caveat. - Appendix A interpretation paragraph: explicit inferential-asymmetry sentence ("consistency is what the BD null delivers, not affirmative proof"); "itself evidence for" removed. - Conclusion: "consistent with clustered but smoothly mixed" rephrased with explicit power caveat ("at N = 686 the test has limited power and cannot affirmatively establish smoothness"). MAJOR: Table X FRR / EER was tautological reviewer-bait (Gemini Issue 2) - Byte-identical positive anchor has cosine approx 1 by construction, so FRR against that subset is trivially 0 at every threshold below 1 and any EER calculation is arithmetic tautology, not biometric performance. - Results IV-G.1: removed EER row; dropped FRR column from Table X; added a table note explaining the omission and directing readers to Section V-F for the conservative-subset discussion. - Methodology III-K: removed the EER / FRR-against-byte-identical reporting clause; clarified that FAR against inter-CPA negatives is the primary reported quantity. - Table X is now FAR + Wilson 95% CI only, which is the quantity that actually carries empirical content on this anchor design. MINOR: Document-level worst-case aggregation narrative (Gemini Issue 3) + 15-signature delta (Gemini spot-check) - Results IV-I: added two sentences explicitly noting that the document-level percentages reflect the Section III-L worst-case aggregation rule (a report with one stamped + one hand-signed signature inherits the most-replication-consistent label), and cross-referencing Section IV-H.3 / Table XVI for the mixed-report composition that qualifies the headline percentages. - Results IV-D: added a one-sentence footnote explaining that the 15-signature delta between the Table III CPA-matched count (168,755) and the all-pairs analyzed count (168,740) is due to CPAs with exactly one signature, for whom no same-CPA pairwise best-match statistic exists. Abstract remains 243 words, comfortably under the IEEE Access 250-word cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 14:47:48 +08:00
gbanyan	552b6b80d4	Paper A v3.7: demote BD/McCrary to density-smoothness diagnostic; add Appendix A Implements codex gpt-5.4 recommendation (paper/codex_bd_mccrary_opinion.md, "option (c) hybrid"): demote BD/McCrary in the main text from a co-equal threshold estimator to a density-smoothness diagnostic, and add a bin-width sensitivity appendix as an audit trail. Why: the bin-width sweep (Script 25) confirms that at the signature level the BD transition drifts monotonically with bin width (Firm A cosine: 0.987 -> 0.985 -> 0.980 -> 0.975 as bin width widens 0.003 -> 0.015; full-sample dHash transitions drift from 2 to 10 to 9 across bin widths 1 / 2 / 3) and Z statistics inflate superlinearly with bin width, both characteristic of a histogram-resolution artifact. At the accountant level the BD null is robust across the sweep. The paper's earlier "three methodologically distinct estimators" framing therefore could not be defended to an IEEE Access reviewer once the sweep was run. Added - signature_analysis/25_bd_mccrary_sensitivity.py: bin-width sweep across 6 variants (Firm A / full-sample / accountant-level, each cosine + dHash_indep) and 3-4 bin widths per variant. Reports Z_below, Z_above, p-values, and number of significant transitions per cell. Writes reports/bd_sensitivity/bd_sensitivity.{json,md}. - paper/paper_a_appendix_v3.md: new "Appendix A. BD/McCrary Bin-Width Sensitivity" with Table A.I (all 20 sensitivity cells) and interpretation linking the empirical pattern to the main-text framing decision. - export_v3.py: appendix inserted into SECTIONS between conclusion and references. - paper/codex_bd_mccrary_opinion.md: codex gpt-5.4 recommendation captured verbatim for audit trail. Main-text reframing - Abstract: "three methodologically distinct estimators" -> "two estimators plus a Burgstahler-Dichev/McCrary density- smoothness diagnostic". Trimmed to 243 words. - Introduction: related-work summary, pipeline step 5, accountant- level convergence sentence, contribution 4, and section-outline line all updated. Contribution 4 renamed to "Convergent threshold framework with a smoothness diagnostic". - Methodology III-I: section renamed to "Convergent Threshold Determination with a Density-Smoothness Diagnostic". "Method 2: BD/McCrary Discontinuity" converted to "Density-Smoothness Diagnostic" in a new subsection; Method 3 (Beta mixture) renumbered to Method 2. Subsections 4 and 5 updated to refer to "two threshold estimators" with BD as diagnostic. - Methodology III-A pipeline overview: "three methodologically distinct statistical methods" -> "two methodologically distinct threshold estimators complemented by a density-smoothness diagnostic". - Methodology III-L: "three-method analysis" -> "accountant-level threshold analysis (KDE antimode, Beta-2 crossing, logit-Gaussian robustness crossing)". - Results IV-D.1 heading: "BD/McCrary Discontinuity" -> "BD/McCrary Density-Smoothness Diagnostic". Prose now notes the Appendix-A bin-width instability explicitly. - Results IV-E: Table VIII restructured to label BD rows "(diagnostic only; bin-unstable)" and "(diagnostic; null across Appendix A)". Summary sentence rewritten to frame BD null as evidence for clustered-but-smoothly-mixed rather than as a convergence failure. Table cosine P5 row corrected from 0.941 to 0.9407 to match III-K. - Results IV-G.3 and IV-I.2: "three-method convergence/thresholds" -> "accountant-level convergent thresholds" (clarifies the 3 converging estimates are KDE antimode, Beta-2, logit-Gaussian, not KDE/BD/Beta). - Discussion V-B: "three-method framework" -> "convergent threshold framework". - Conclusion: "three methodologically distinct methods" -> "two threshold estimators and a density-smoothness diagnostic"; contribution 3 restated; future-work sentence updated. - Impact Statement (archived): "three methodologically distinct threshold-selection methods" -> "two methodologically distinct threshold estimators plus a density-smoothness diagnostic" so the archived text is internally consistent if reused. Discussion V-B / V-G already framed BD as a diagnostic in v3.5 (unchanged in this commit). The reframing therefore brings Abstract / Introduction / Methodology / Results / Conclusion into alignment with the Discussion framing that codex had already endorsed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 14:32:50 +08:00

Author

SHA1

Message

Date

gbanyan

fcce58aff0

Paper A v3.8: resolve Gemini 3.1 Pro round-6 independent-review findings

Gemini round-6 (paper/gemini_review_v3_7.md) gave Minor Revision but
flagged three issues that five rounds of codex review had missed.
This commit addresses all three.

BLOCKER: Accountant-level BD/McCrary null is a power artifact, not
proof of smoothness (Gemini Issue 1)
- At N=686 accountants the BD/McCrary test has limited statistical
  power; interpreting a failure-to-reject as affirmative proof of
  smoothness is a Type II error risk.
- Discussion V-B: "itself diagnostic of smoothness" replaced with
  "failure-to-reject rather than a failure of the method ---
  informative alongside the other evidence but subject to the power
  caveat in Section V-G".
- Discussion V-G (Sixth limitation): added a power-aware paragraph
  naming N=686 explicitly and clarifying that the substantive claim
  of smoothly-mixed clustering rests on the JOINT weight of dip
  test + BIC-selected GMM + BD null, not on BD alone.
- Results IV-D.1 and IV-E: reframe accountant-level null as
  "consistent with --- not affirmative proof of" clustered-but-
  smoothly-mixed, citing V-G for the power caveat.
- Appendix A interpretation paragraph: explicit inferential-asymmetry
  sentence ("consistency is what the BD null delivers, not
  affirmative proof"); "itself evidence for" removed.
- Conclusion: "consistent with clustered but smoothly mixed"
  rephrased with explicit power caveat ("at N = 686 the test has
  limited power and cannot affirmatively establish smoothness").

MAJOR: Table X FRR / EER was tautological reviewer-bait
(Gemini Issue 2)
- Byte-identical positive anchor has cosine approx 1 by construction,
  so FRR against that subset is trivially 0 at every threshold
  below 1 and any EER calculation is arithmetic tautology, not
  biometric performance.
- Results IV-G.1: removed EER row; dropped FRR column from Table X;
  added a table note explaining the omission and directing readers
  to Section V-F for the conservative-subset discussion.
- Methodology III-K: removed the EER / FRR-against-byte-identical
  reporting clause; clarified that FAR against inter-CPA negatives
  is the primary reported quantity.
- Table X is now FAR + Wilson 95% CI only, which is the quantity
  that actually carries empirical content on this anchor design.

MINOR: Document-level worst-case aggregation narrative (Gemini
Issue 3) + 15-signature delta (Gemini spot-check)
- Results IV-I: added two sentences explicitly noting that the
  document-level percentages reflect the Section III-L worst-case
  aggregation rule (a report with one stamped + one hand-signed
  signature inherits the most-replication-consistent label), and
  cross-referencing Section IV-H.3 / Table XVI for the mixed-report
  composition that qualifies the headline percentages.
- Results IV-D: added a one-sentence footnote explaining that the
  15-signature delta between the Table III CPA-matched count
  (168,755) and the all-pairs analyzed count (168,740) is due to
  CPAs with exactly one signature, for whom no same-CPA pairwise
  best-match statistic exists.

Abstract remains 243 words, comfortably under the IEEE Access
250-word cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-21 14:47:48 +08:00

gbanyan

552b6b80d4

Paper A v3.7: demote BD/McCrary to density-smoothness diagnostic; add Appendix A

Implements codex gpt-5.4 recommendation (paper/codex_bd_mccrary_opinion.md,
"option (c) hybrid"): demote BD/McCrary in the main text from a co-equal
threshold estimator to a density-smoothness diagnostic, and add a
bin-width sensitivity appendix as an audit trail.

Why: the bin-width sweep (Script 25) confirms that at the signature
level the BD transition drifts monotonically with bin width (Firm A
cosine: 0.987 -> 0.985 -> 0.980 -> 0.975 as bin width widens 0.003 ->
0.015; full-sample dHash transitions drift from 2 to 10 to 9 across
bin widths 1 / 2 / 3) and Z statistics inflate superlinearly with bin
width, both characteristic of a histogram-resolution artifact. At the
accountant level the BD null is robust across the sweep. The paper's
earlier "three methodologically distinct estimators" framing therefore
could not be defended to an IEEE Access reviewer once the sweep was
run.

Added
- signature_analysis/25_bd_mccrary_sensitivity.py: bin-width sweep
  across 6 variants (Firm A / full-sample / accountant-level, each
  cosine + dHash_indep) and 3-4 bin widths per variant. Reports
  Z_below, Z_above, p-values, and number of significant transitions
  per cell. Writes reports/bd_sensitivity/bd_sensitivity.{json,md}.
- paper/paper_a_appendix_v3.md: new "Appendix A. BD/McCrary Bin-Width
  Sensitivity" with Table A.I (all 20 sensitivity cells) and
  interpretation linking the empirical pattern to the main-text
  framing decision.
- export_v3.py: appendix inserted into SECTIONS between conclusion
  and references.
- paper/codex_bd_mccrary_opinion.md: codex gpt-5.4 recommendation
  captured verbatim for audit trail.

Main-text reframing
- Abstract: "three methodologically distinct estimators" ->
  "two estimators plus a Burgstahler-Dichev/McCrary density-
  smoothness diagnostic". Trimmed to 243 words.
- Introduction: related-work summary, pipeline step 5, accountant-
  level convergence sentence, contribution 4, and section-outline
  line all updated. Contribution 4 renamed to "Convergent threshold
  framework with a smoothness diagnostic".
- Methodology III-I: section renamed to "Convergent Threshold
  Determination with a Density-Smoothness Diagnostic". "Method 2:
  BD/McCrary Discontinuity" converted to "Density-Smoothness
  Diagnostic" in a new subsection; Method 3 (Beta mixture) renumbered
  to Method 2. Subsections 4 and 5 updated to refer to "two threshold
  estimators" with BD as diagnostic.
- Methodology III-A pipeline overview: "three methodologically
  distinct statistical methods" -> "two methodologically distinct
  threshold estimators complemented by a density-smoothness
  diagnostic".
- Methodology III-L: "three-method analysis" -> "accountant-level
  threshold analysis (KDE antimode, Beta-2 crossing, logit-Gaussian
  robustness crossing)".
- Results IV-D.1 heading: "BD/McCrary Discontinuity" ->
  "BD/McCrary Density-Smoothness Diagnostic". Prose now notes the
  Appendix-A bin-width instability explicitly.
- Results IV-E: Table VIII restructured to label BD rows
  "(diagnostic only; bin-unstable)" and "(diagnostic; null across
  Appendix A)". Summary sentence rewritten to frame BD null as
  evidence for clustered-but-smoothly-mixed rather than as a
  convergence failure. Table cosine P5 row corrected from 0.941 to
  0.9407 to match III-K.
- Results IV-G.3 and IV-I.2: "three-method convergence/thresholds"
  -> "accountant-level convergent thresholds" (clarifies the 3
  converging estimates are KDE antimode, Beta-2, logit-Gaussian,
  not KDE/BD/Beta).
- Discussion V-B: "three-method framework" -> "convergent threshold
  framework".
- Conclusion: "three methodologically distinct methods" -> "two
  threshold estimators and a density-smoothness diagnostic";
  contribution 3 restated; future-work sentence updated.
- Impact Statement (archived): "three methodologically distinct
  threshold-selection methods" -> "two methodologically distinct
  threshold estimators plus a density-smoothness diagnostic" so the
  archived text is internally consistent if reused.

Discussion V-B / V-G already framed BD as a diagnostic in v3.5
(unchanged in this commit). The reframing therefore brings Abstract /
Introduction / Methodology / Results / Conclusion into alignment with
the Discussion framing that codex had already endorsed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-21 14:32:50 +08:00

2 Commits