Commit Graph

5 Commits

Author SHA1 Message Date
gbanyan f1c253768a Paper A v3.18.3: address codex GPT-5.5 round-17 self-comparing review findings
Codex round-17 (paper/codex_review_gpt55_v3_18_2.md) re-audited v3.18.2 and
flagged three new issues introduced by the v3.18.2 edits themselves plus
items it had partially RESOLVED but not fully cleaned up. Verdict still
Minor Revision; this commit closes the new findings.

- Fix Appendix B provenance paths: replace four fabricated paths
  (formal_statistical/*, deloitte_distribution/*, pdf_level/*, ablation/*)
  with the actual artifact paths verified in the local report tree.
- Acknowledge that the report tree is at /Volumes/NV2/PDF-Processing/...
  and reviewers should rebase to their own report root rather than rely on
  absolute paths.
- Remove residual "single dominant mechanism" wording from Methodology
  III-H (third primary evidence sentence) and Discussion V-C.
- Fix Methodology III-H Hartigan dip-test parenthetical: "p = 0.17 at
  n >= 10 signatures" wrongly attached the accountant-level filter to the
  signature-level dip; corrected to "p = 0.17, N = 60,448 Firm A
  signatures".
- Soften Introduction Firm A motivation: replace "widely recognized
  within the audit profession as making substantial use of non-hand-signing
  for the majority of its certifying partners" with a methodology-first
  framing that defers to the image evidence reported in the paper.
- Soften Methodology III-H "widely held within the audit profession"
  wording (kept as motivation, marked clearly as non-load-bearing in the
  next sentence).
- Reconcile 55,921 vs 55,922 Firm A cosine-only counts in Section IV-H.2:
  document explicitly that the one-record drift comes from successive DB
  snapshots used to materialize Table IX vs the new script-28 artifact;
  no rate at two decimal places is affected.
- Rebuild Paper_A_IEEE_Access_Draft_v3.docx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:45:54 +08:00
gbanyan 4bb7aa9189 Paper A v3.18.2: address codex GPT-5.5 round-16 Minor-Revision findings
Codex independent peer review (paper/codex_review_gpt55_v3_18_1.md) audited
empirical claims against scripts/JSON reports rather than rubber-stamping
prior Accept verdicts. Verdict: Minor Revision. This commit addresses every
flagged item.

- Soften mechanism-identification language (Results IV-D.1, Discussion B):
  per-signature cosine "fails to reject unimodality" rather than "reflects a
  single dominant generative mechanism"; framing tied to joint evidence.
- Replace overabsolute "single stored image" with multi-template phrasing
  in Introduction and Methodology III-A.
- Reframe Methodology III-H so practitioner knowledge is non-load-bearing;
  evidentiary basis is the paper's own image evidence.
- Fix stale section cross-references after the v3.18 retitling: IV-F.* ->
  IV-G.* in 11 locations across methodology and results.
- Fix 0.941 / 0.945 / 0.9407 wording in Methodology III-K to use the
  calibration-fold P5 = 0.9407 and the rounded sensitivity cut 0.945.
- Soften "sharp discontinuity" in Results IV-G.3 to "23-28 percentage-point
  gap consistent with firm-wide non-hand-signing practice".
- Soften Conclusion's "directly generalizable" with explicit conditions on
  analogous anchors and artifact-generation physics.
- Add Appendix B: table-to-script provenance map (15 manuscript tables
  mapped to generating scripts and JSON report artifacts).
- New script signature_analysis/28_byte_identity_decomposition.py produces
  reproducible artifacts for two previously-unverified claims:
  (a) 145 / 50 / 180 / 35 Firm A byte-identity decomposition (verified);
  (b) cross-firm dual-descriptor convergence -- corrected from the previous
      manuscript text "non-Firm-A 11.3% vs Firm A 58.7% (5x)" to the
      database-verified "non-Firm-A 42.12% vs Firm A 88.32% (~2.1x)".
- Clarify scripts 19 / 21 docstrings: legacy EER / FRR / Precision / F1
  helpers are retained for diagnostic use only and are NOT cited as
  biometric performance in the paper. Remove "interview evidence" wording.
- Rebuild Paper_A_IEEE_Access_Draft_v3.docx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:23:08 +08:00
gbanyan 16e90bab20 Paper A v3.18: remove accountant-level + replication-dominated calibration + Gemini 2.5 Pro review minor fixes
Major changes (per partner red-pen + user decision):
- Delete entire accountant-level analysis (III.J, IV.E, Tables VI/VII/VIII,
  Fig 4) -- cross-year pooling assumption unjustified, removes the implicit
  "habitually stamps = always stamps" reading.
- Renumber sections III.J/K/L (was K/L/M) and IV.E/F/G/H/I (was F/G/H/I/J).
- Title: "Three-Method Convergent Thresholding" -> "Replication-Dominated
  Calibration" (the three diagnostics do NOT converge at signature level).
- Operational cosine cut anchored on whole-sample Firm A P7.5 (cos > 0.95).
- Three statistical diagnostics (Hartigan/Beta/BD-McCrary) reframed as
  descriptive characterisation, not threshold estimators.
- Firm A replication-dominated framing: 3 evidence strands -> 2.
- Discussion limitation list: drop accountant-level cross-year pooling and
  BD/McCrary diagnostic; add auditor-year longitudinal tracking as future work.
- Tone-shift: "we do not claim / do not derive" -> "we find / motivates".

Reference verification (independent web-search audit of all 41 refs):
- Fix [5] author hallucination: Hadjadj et al. -> Kao & Wen (real authors of
  Appl. Sci. 10:11:3716; report at paper/reference_verification_v3.md).
- Polish [16] [21] [22] [25] (year/volume/page-range/model-name).

Gemini 2.5 Pro peer review (Minor Revision verdict, A-F all positive):
- Neutralize script-path references in tables/appendix -> "supplementary
  materials".
- Move conflict-of-interest declaration from III-L to new Declarations
  section before References (paper_a_declarations_v3.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:43:09 +08:00
gbanyan fcce58aff0 Paper A v3.8: resolve Gemini 3.1 Pro round-6 independent-review findings
Gemini round-6 (paper/gemini_review_v3_7.md) gave Minor Revision but
flagged three issues that five rounds of codex review had missed.
This commit addresses all three.

BLOCKER: Accountant-level BD/McCrary null is a power artifact, not
proof of smoothness (Gemini Issue 1)
- At N=686 accountants the BD/McCrary test has limited statistical
  power; interpreting a failure-to-reject as affirmative proof of
  smoothness is a Type II error risk.
- Discussion V-B: "itself diagnostic of smoothness" replaced with
  "failure-to-reject rather than a failure of the method ---
  informative alongside the other evidence but subject to the power
  caveat in Section V-G".
- Discussion V-G (Sixth limitation): added a power-aware paragraph
  naming N=686 explicitly and clarifying that the substantive claim
  of smoothly-mixed clustering rests on the JOINT weight of dip
  test + BIC-selected GMM + BD null, not on BD alone.
- Results IV-D.1 and IV-E: reframe accountant-level null as
  "consistent with --- not affirmative proof of" clustered-but-
  smoothly-mixed, citing V-G for the power caveat.
- Appendix A interpretation paragraph: explicit inferential-asymmetry
  sentence ("consistency is what the BD null delivers, not
  affirmative proof"); "itself evidence for" removed.
- Conclusion: "consistent with clustered but smoothly mixed"
  rephrased with explicit power caveat ("at N = 686 the test has
  limited power and cannot affirmatively establish smoothness").

MAJOR: Table X FRR / EER was tautological reviewer-bait
(Gemini Issue 2)
- Byte-identical positive anchor has cosine approx 1 by construction,
  so FRR against that subset is trivially 0 at every threshold
  below 1 and any EER calculation is arithmetic tautology, not
  biometric performance.
- Results IV-G.1: removed EER row; dropped FRR column from Table X;
  added a table note explaining the omission and directing readers
  to Section V-F for the conservative-subset discussion.
- Methodology III-K: removed the EER / FRR-against-byte-identical
  reporting clause; clarified that FAR against inter-CPA negatives
  is the primary reported quantity.
- Table X is now FAR + Wilson 95% CI only, which is the quantity
  that actually carries empirical content on this anchor design.

MINOR: Document-level worst-case aggregation narrative (Gemini
Issue 3) + 15-signature delta (Gemini spot-check)
- Results IV-I: added two sentences explicitly noting that the
  document-level percentages reflect the Section III-L worst-case
  aggregation rule (a report with one stamped + one hand-signed
  signature inherits the most-replication-consistent label), and
  cross-referencing Section IV-H.3 / Table XVI for the mixed-report
  composition that qualifies the headline percentages.
- Results IV-D: added a one-sentence footnote explaining that the
  15-signature delta between the Table III CPA-matched count
  (168,755) and the all-pairs analyzed count (168,740) is due to
  CPAs with exactly one signature, for whom no same-CPA pairwise
  best-match statistic exists.

Abstract remains 243 words, comfortably under the IEEE Access
250-word cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:47:48 +08:00
gbanyan 552b6b80d4 Paper A v3.7: demote BD/McCrary to density-smoothness diagnostic; add Appendix A
Implements codex gpt-5.4 recommendation (paper/codex_bd_mccrary_opinion.md,
"option (c) hybrid"): demote BD/McCrary in the main text from a co-equal
threshold estimator to a density-smoothness diagnostic, and add a
bin-width sensitivity appendix as an audit trail.

Why: the bin-width sweep (Script 25) confirms that at the signature
level the BD transition drifts monotonically with bin width (Firm A
cosine: 0.987 -> 0.985 -> 0.980 -> 0.975 as bin width widens 0.003 ->
0.015; full-sample dHash transitions drift from 2 to 10 to 9 across
bin widths 1 / 2 / 3) and Z statistics inflate superlinearly with bin
width, both characteristic of a histogram-resolution artifact. At the
accountant level the BD null is robust across the sweep. The paper's
earlier "three methodologically distinct estimators" framing therefore
could not be defended to an IEEE Access reviewer once the sweep was
run.

Added
- signature_analysis/25_bd_mccrary_sensitivity.py: bin-width sweep
  across 6 variants (Firm A / full-sample / accountant-level, each
  cosine + dHash_indep) and 3-4 bin widths per variant. Reports
  Z_below, Z_above, p-values, and number of significant transitions
  per cell. Writes reports/bd_sensitivity/bd_sensitivity.{json,md}.
- paper/paper_a_appendix_v3.md: new "Appendix A. BD/McCrary Bin-Width
  Sensitivity" with Table A.I (all 20 sensitivity cells) and
  interpretation linking the empirical pattern to the main-text
  framing decision.
- export_v3.py: appendix inserted into SECTIONS between conclusion
  and references.
- paper/codex_bd_mccrary_opinion.md: codex gpt-5.4 recommendation
  captured verbatim for audit trail.

Main-text reframing
- Abstract: "three methodologically distinct estimators" ->
  "two estimators plus a Burgstahler-Dichev/McCrary density-
  smoothness diagnostic". Trimmed to 243 words.
- Introduction: related-work summary, pipeline step 5, accountant-
  level convergence sentence, contribution 4, and section-outline
  line all updated. Contribution 4 renamed to "Convergent threshold
  framework with a smoothness diagnostic".
- Methodology III-I: section renamed to "Convergent Threshold
  Determination with a Density-Smoothness Diagnostic". "Method 2:
  BD/McCrary Discontinuity" converted to "Density-Smoothness
  Diagnostic" in a new subsection; Method 3 (Beta mixture) renumbered
  to Method 2. Subsections 4 and 5 updated to refer to "two threshold
  estimators" with BD as diagnostic.
- Methodology III-A pipeline overview: "three methodologically
  distinct statistical methods" -> "two methodologically distinct
  threshold estimators complemented by a density-smoothness
  diagnostic".
- Methodology III-L: "three-method analysis" -> "accountant-level
  threshold analysis (KDE antimode, Beta-2 crossing, logit-Gaussian
  robustness crossing)".
- Results IV-D.1 heading: "BD/McCrary Discontinuity" ->
  "BD/McCrary Density-Smoothness Diagnostic". Prose now notes the
  Appendix-A bin-width instability explicitly.
- Results IV-E: Table VIII restructured to label BD rows
  "(diagnostic only; bin-unstable)" and "(diagnostic; null across
  Appendix A)". Summary sentence rewritten to frame BD null as
  evidence for clustered-but-smoothly-mixed rather than as a
  convergence failure. Table cosine P5 row corrected from 0.941 to
  0.9407 to match III-K.
- Results IV-G.3 and IV-I.2: "three-method convergence/thresholds"
  -> "accountant-level convergent thresholds" (clarifies the 3
  converging estimates are KDE antimode, Beta-2, logit-Gaussian,
  not KDE/BD/Beta).
- Discussion V-B: "three-method framework" -> "convergent threshold
  framework".
- Conclusion: "three methodologically distinct methods" -> "two
  threshold estimators and a density-smoothness diagnostic";
  contribution 3 restated; future-work sentence updated.
- Impact Statement (archived): "three methodologically distinct
  threshold-selection methods" -> "two methodologically distinct
  threshold estimators plus a density-smoothness diagnostic" so the
  archived text is internally consistent if reused.

Discussion V-B / V-G already framed BD as a diagnostic in v3.5
(unchanged in this commit). The reframing therefore brings Abstract /
Introduction / Methodology / Results / Conclusion into alignment with
the Discussion framing that codex had already endorsed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:32:50 +08:00