paper-a-v4-big4
11 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1e8466f7a8 |
Paper A v4.3: unify Firm A positioning to out-of-sample templated-end target
Finishes the BCD re-anchor chassis (audit critique #1): Firm A was inconsistently framed as both a "within-Big-4 case study" and an "out-of-sample target". Harmonised to a single label, "out-of-sample templated-end target" (held out of the calibration negative anchor; scored against the normative Firms-B/C/D baseline), across:
- §I contribution #3 (title + body)
- §III-H.2 (opening trio BCD/Firm-A/non-Big-4; sub-header; role sentence)
- §V-C body (removed the dual case-study/out-of-sample phrasing)
(§V-C header already fixed in ac3372d.)
Zero "case study" wording remains; no numbers changed. codex gpt-5.5 focused check: all consistency items PASS, no new findings.
Also restore the BCD+non-Big-4 joint ICCR Wilson CI [0.000001, 0.000015] to the §IV-M Table XXI note (three-scope CI symmetry; the one MINOR completeness gap surfaced by a codex old-vs-new content diff, which otherwise confirmed no substantive content was dropped by the trim).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
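The Wilson CI restored in this commit is a score interval on a binomial proportion. A minimal sketch of that interval follows, assuming a plain 95% two-sided form; the function name `wilson_interval` and the example counts are hypothetical, since the commit reports only the resulting bounds and not the underlying counts or the script that produced them.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.959964 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts, for illustration only (not taken from the paper).
low, high = wilson_interval(5, 1_000_000)
```

Unlike the normal-approximation interval, the Wilson form stays inside [0, 1] and remains usable for very small rates, which is the regime the reported bounds sit in.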
||
|
|
ac3372d2d2 |
Paper A v4.3: restructure §III baseline-first + deep trim (audit #3/#4)
#3 Reorder §III around "establish normative baseline → show who deviates": new order A–O with:
- I=Normative Baseline + inter-CPA coincidence floor (old L.0–L.3)
- J=Firm-Level Deviation (old L.4+L.6)
- K=Why the distribution gives no threshold (old §I distributional + L.5)
- L=K=3 partition (old §J)
- M=Convergent checks (old §K)
- N=limits (old §M)
- O=data source (old §N)
~170 cross-refs remapped (two-pass tokenized), incl. 5 spelled-out "Section III-X" refs.
#4 Deep trim: §III 10,960→8,461w (−23%). Removed §III↔§IV-M and §III↔§IV-F table duplication (Results keeps canonical tables; §III keeps method+headline+pointer); condensed distributional diagnostics; consolidated repeated caveat. No locked number changed.
Also: §V-C header "Case Study"→"Out-of-Sample Target"; abstract 251→250w; housekeeping (rm superseded draft_section_L_bcd.md + v4.0 pandoc docx, remove stale OCR/handoff docs, gitignore .serena/).
codex gpt-5.5 review: 0 BLOCKER / 3 MAJOR / 3 MINOR; 3 MAJOR fixed (§III-J.2 observed-vs-counterfactual transition, §III-M table pointers κ→XI/pixel→XIV, §III-N stale tightening figures); 2nd pass clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
17156516a0 |
§III-L.0: reword e-signature-adoption rationale as industry background (no interview citation)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
1eb323e959 |
Paper A v4.2: re-anchor primary calibration to clean BCD 2013-2019 baseline
- Restrict the calibration negative anchor to Firms B/C/D, fiscal years 2013-2019 (pre-electronic-signature hand-signing period); B/C/D adopted e-signing post-2020 at staggered times, so 2013-2019 is the construct-clean baseline. Firm A scored across its full 2013-2023 record against it.
- New locked numbers (codex-audited, Scripts 54/55): per-comparison HC floor 0.000010; per-signature HC floor 0.0059 [boot 0.0045-0.0073]; per-document HC 0.0117 / HC+MC 0.1753; per-firm HC+MC B 0.162 / C 0.225 / D 0.089. Firm A observed 0.817 = ~139x the clean floor (was ~70x on all-period BCD); Firm A out-of-sample vs clean pool 0.0001 (below floor -> never resembles genuine hand-signing). BCD 2020+ robustness: per-sig 0.0105, per-comparison 0.000036 (~2x pre-2020) quantifies the e-signing contamination.
- Propagated through abstract / Sec. I / III-L / IV-M / V / conclusion; 0.837 crossover kept corpus-wide; ABCD retained as contamination comparison.
- Grounded the 2013-2019 choice on data (floor drift) + e-sign-adoption background, not on in-text interview claims (double-blind).
- Add Scripts 54 (temporal floor stability) and 55 (BCD 2013-2019 primary calibration + Firm A scoring).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
3c7fcc010f |
Paper A v4.1: BCD-baseline reframe + screening positioning + trim
- Re-anchor inter-CPA coincidence-rate (ICCR) calibration on a normative non-Firm-A baseline (Firms B/C/D); Firm A held out as an out-of-sample target. Locked canonical numbers (codex-audited; Scripts 46/52/53): per-comparison HC 0.00014->0.000018, per-signature HC 0.0116, per-document HC+MC 0.34->0.1905; KDE crossover 0.837 retained corpus-wide.
- Reposition as an operator-tunable, semi-automated screening/triage framework (title -> "Automated Screening..."): HC = high-specificity operating point; MC band demoted to low-specificity advisory; Firm A = demonstration that the screening surfaces a templated end, audit-quality implications deferred.
- Apply codex prose-review fixes: triage-neutral five-way labels, soften mechanism/specificity wording, supersede MC claim-strength, update stale Appendix script references (40b/43/45 -> 46/52/53).
- Trim pass: compress Sec. V discussion + Sec. III echoes (27.7k -> 26.8k words); no substantive content removed.
- Add analysis scripts 45-53 (firm-year trends; BCD-only ICCR recompute; canonical-sampler locked numbers; Firm-A out-of-sample; BCD regression + cross-firm hit matrix).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
becce857e1 |
Phase 6 round-7 codex 3-axis review fixes: 11 MAJOR + 5 MINOR
Codex GPT-5.5 3-axis peer review (paper/codex_review_gpt55_v4_round_3axis.md) identified 11 MAJOR + 5 MINOR + 0 BLOCKER on three axes: (1) abstract/body tone consistency, (2) methodology clarity / v3 residue, (3) no implicit within-CPA or cross-year signature-consistency assumptions. 13 patches applied across 4 source files; mirrored in paper_a_v4_combined.md.
Axis 1 (tone consistency between abstract and body):
- S I L33: "resolves the ambiguity" -> "provides complementary evidence for screening cases where ... hypotheses diverge"
- S I L35: "disproves the distributional-threshold path" -> "does not support the distributional-threshold path"
- S I L37 / S V-F L29: "characterise the deployed five-way classifier at three units" -> "characterise the deployed HC sub-rule and document-level HC+MC alarm derived from the five-way classifier at three units" (consistent with S V-H which says only HC sub-rule and HC+MC alarm are re-characterised by the present ICCR battery)
- S I L39 / S V-C / S III-L.4: "consistent with firm-specific template, stamp, or document-production reuse mechanisms" -> "consistent with -- but does not independently establish -- firm-level template-like reuse, digitisation-pipeline homogeneity, or signing-style homogeneity, which descriptor-only data cannot separate (S V-H)" (mirrors abstract)
Axis 2 (methodology clarity / v3 residue):
- S III-G: added unit-bridge sentence distinguishing "descriptor-summary units" (signature/accountant) from "operational reporting units" (per-comparison/per-signature/per-document, S III-L)
- S III-H.2: "The calibration distinguishes two reference populations" -> "The supporting diagnostics use two reference populations" with explicit "neither is the calibration anchor"
- S III-L.1: "specificity" -> "ICCR refinement"
- S III-L.2: added "descriptive intuition, not an independence assumption used for estimation" caveat after the 1-(1-p)^n form
Axis 3 (no implicit signature-consistency assumptions):
- S III-F: hand-signing motivation rewritten as working hypothesis that "the classifier does not require ... to hold for all CPAs"
- S III-G A1: added "A1 does not assume temporal stability of handwriting or scanning workflow within or across years"
- S III-H.1: added label-caveat paragraph (operational rule outputs, not validated ground-truth classes); HC "strong replication evidence" -> "image-similarity evidence consistent with replication"; HSC "consistent with a CPA who signs very consistently" -> "mechanism not resolved by descriptor data alone"; LH explicitly owns that cross-year handwriting drift, scanner workflow change, or template variant rotation can also yield low max-cosine within a same-CPA pool
- S III-L.6 / S IV-M.6: "same-CPA repeatability signal" -> "observed same-CPA-pool excess ... not attributed to within-CPA handwriting repeatability"
Deferred (structural, not single-sentence patch): codex S III-I.2 / S III-J K=2/K=3 deduplication; codex S III-K LOOO / S III-J duplication. Both are MINOR stylistic redundancies, not reviewer-rejection risks.
DOCX rebuilt via export_v3.py; v4.0_20260515 file refreshed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
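The 1-(1-p)^n form mentioned under Axis 2 can be illustrated numerically. This is a sketch of the descriptive intuition only: the function name `any_pair_rate` and the example values are hypothetical, and, as the commit itself stresses, the form is not an independence assumption used for estimation.

```python
import math


def any_pair_rate(p: float, n: int) -> float:
    """Chance of at least one hit in n comparisons, each hitting with rate p.

    Evaluates 1 - (1 - p)**n, which treats the n comparisons as independent;
    the commit describes this as intuition only, not as an estimator.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    if p == 1.0:
        return 1.0 if n > 0 else 0.0
    # log1p/expm1 keep precision when p is tiny and n is large.
    return -math.expm1(n * math.log1p(-p))


# Hypothetical illustration: a small per-comparison rate compounds
# noticeably once many comparisons are pooled.
example = any_pair_rate(1e-5, 1_000)
```

For small p the result is close to n·p, which is why a per-comparison rate and a per-signature or per-document rate can differ by orders of magnitude.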
||
|
|
3672c9343e |
Phase 6 round-6: soften firm-heterogeneity framing, fix DOCX table render
Framing softening (per partner tone decision: own the limitation rather than defend the strong claim).
Abstract: "Firm heterogeneity is decisive ... consistent with firm-level template-like reuse" -> "The framework surfaces pronounced firm-level heterogeneity ... consistent with firm-level template-like reuse but not independently diagnostic, since descriptor-only data cannot separate reuse from digitisation-pipeline or signing-style homogeneity within a firm; we report it as a scope limitation rather than a mechanism finding."
S V-H Limitations: new bullet "Mechanism attribution for the firm-level heterogeneity is not identifiable from descriptor-only data." enumerates three non-mutually-exclusive firm-level mechanisms (template-like reuse / digitisation-pipeline homogeneity / signing-style homogeneity), notes the (cosine, dHash) descriptor pair is by construction indifferent to which mechanism generated a near-identical pair, and lists what additional data would be needed for attribution.
S VI Conclusion items (3) and (4): "firm heterogeneity quantification" -> "firm-level heterogeneity surfaced by the framework ... reported as a framework-discriminative observation rather than a mechanism finding"; item (4) expanded from template/stamp/document-production reuse alone to the three-mechanism scope, with explicit "not independently establishing" and S V-H cross-reference.
DOCX export fix (export_v3.py): add missing LaTeX-to-Unicode tokens (\checkmark, \lvert/\rvert, \lVert/\rVert, \in, \notin, \max, \min, \log, \ln, \exp, \bullet) that were silently dropping content from Table III rows 2-4 (integer-jitter robustness check marks empty) and Table XVIII drift column (|Delta| empty). Rebuild Paper_A_IEEE_Access_Draft_v3.docx via export_v3.py and install copy as Paper_A_IEEE_Access_Draft_v4.0_20260515.docx (replaces prior pandoc-built v4 DOCX which had empty cells in every table header with LaTeX math and inconsistent column widths). All 43 tables now have non-empty cells with sub/superscript runs.
Mirrored in paper_a_v4_combined.md for consistency with the single-file combined source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
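The kind of LaTeX-to-Unicode token table described in the DOCX export fix can be sketched as follows. This is not the code in export_v3.py, which is not shown on this page; the names `TOKENS` and `latex_to_unicode` and the chosen Unicode targets are assumptions made for illustration.

```python
import re

# Hypothetical mapping for the tokens named in the commit message.
TOKENS = {
    "checkmark": "\u2713",  # check mark
    "lvert": "|",
    "rvert": "|",
    "lVert": "\u2016",      # double vertical line
    "rVert": "\u2016",
    "in": "\u2208",         # element of
    "notin": "\u2209",      # not an element of
    "max": "max",
    "min": "min",
    "log": "log",
    "ln": "ln",
    "exp": "exp",
    "bullet": "\u2022",
}

# Match a whole control word so that \in never fires inside \infty.
_COMMAND = re.compile(r"\\([A-Za-z]+)")


def latex_to_unicode(text: str) -> str:
    """Replace known LaTeX control words; leave unknown ones untouched."""
    def replace(match: re.Match) -> str:
        return TOKENS.get(match.group(1), match.group(0))
    return _COMMAND.sub(replace, text)
```

Leaving unknown commands in place, rather than deleting them, makes a missing token visible in the rendered table instead of producing the silently empty cells the commit describes.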
||
|
|
4efbb7f4b8 |
Phase 6 round-5 codex-review fixes: minor + nit cleanup
Codex round-4 verification (commit
|
||
|
|
bbb662acd1 |
Phase 6 round-4 codex-review fixes: blocker + 2 majors + minors
Codex round-3 verification (commit
|
||
|
|
9e68f2e1d3 |
Phase 6 round-3 codex-review fixes: blockers + majors + minors
Resolved Codex review (gpt-5.5 xhigh) findings against
|
||
|
|
b6913d2f93 |
Phase 6 round-2 reviewer revisions: §III-H.1 promotion + framing alignment
Structural:
- Promote operational classifier definition from §III-L.0 to new §III-H.1, so the reader meets the five-way HC/MC/HSC/UN/LH rule before the §III-I/J/K diagnostic chain instead of ~130 lines after. §III-L renamed to "Anchor-Based Threshold Calibration"; §III-L.0 retains only calibration methodology, three units of analysis, any-pair semantics, and the FAR terminological note. §III-L.7 deleted (redundant with §III-J).
- Reorganise §V-H Limitations into Primary / Secondary / Documented features / Engineering groupings (was a flat 14-item list).
- Reframe §III-M from "ten-tool unsupervised-validation collection" to "each diagnostic addresses one specific unsupervised failure mode"; rename "What v4.0 does/does not claim" → "Limits / Scope of the present analysis"; retitle Table XXVII.
Framing alignment (cross-section):
- Strip all v3.x / v4.0 / v3.20 / v4-new / inherited lineage labels from rendered text (Abstract, Intro, §II, §III, §IV, §V, §VI, Appendix, Impact).
- Replace "Paper A" rule references with "deployed" rule references.
- Soften "validation" to "characterise" / "check" / "screening label" / "consistency check" / "support"; "verdict" → "screening label".
- Remove codex-verified spike claims (non-Big-4 jittered dHash, Big-4 pooled cosine after firm-mean centring). Only formally scripted evidence (Scripts 39b–39e) retained; non-Big-4 evidence framed as corroborating raw-axis cosine, not as calibration evidence.
- Strip script-provenance parentheticals from Introduction; defer Script 39c internal references and similar to Methodology / Appendix.
Numerical / table fixes:
- §III-C document-count arithmetic: 12 corrupted → 13 corrupted/unreadable, verified against sqlite DB and total-pdf/ folder counts (90,282 - 4,198 no-sig - 13 corrupted = 86,071 → 85,042 with detections → 182,328 sigs → 168,755 CPA-matched). Table I shows VLM-positive (86,084) and processed-for-extraction (86,071) as separate rows.
- Wilson 95% CIs added for joint-rule ICCR rows in Table XXI / methodology table ([0.00011, 0.00018] and [0.00008, 0.00014]).
- Unit error fixed: 0.3856 pp / 0.4431 pp → 0.3856 (38.6 pp) / 0.4431 (44.3 pp).
Smaller revisions:
- Pipeline framing: "detecting" → "screening" in Abstract / Intro / Conclusion for consistency with the unsupervised-screening positioning.
- "hard ground-truth subset" → "conservative hard-positive subset" throughout.
- §III-F SSIM / pixel-comparison rebuttal compressed from ~15 lines to 4; design-level argument deferred to supplementary materials.
- "stakeholders can adopt / can derive thresholds" → "alternative operating points can be characterised by inverting" (less prescriptive).
- "the same mechanism extending in milder form to Firms B/C/D" → "similar, milder production-related reuse patterns at Firms B/C/D" (mechanism claim softened).
- Appendix A "non-hand-signed mode" / "two-mechanism mixture" lineage language aligned with v4 framing.
Appendix B:
- Rebuilt as a redirect-only stub. The HTML-commented obsolete table mapping (Table IX–XVIII labels with FAR / capture-rate / validation language) is removed; replaced with a short paragraph pointing to supplementary materials for full table-to-script provenance.
Cross-references:
- All §III-L references for the rule definition retargeted to §III-H.1; references for calibration still point to §III-L.
- §III-H references for byte-level Firm A evidence / non-Big-4 reverse anchor retargeted to §III-H.2.
Artefacts:
- Combined manuscript regenerated: paper_a_v4_combined.md, 1314 lines (was 1346 pre-review).
- Two review handoff documents added: paper/review_handoff_abstract_intro_20260515.md, paper/review_handoff_body_20260515.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
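The §III-C document-count arithmetic corrected in this commit can be re-derived mechanically. The sketch below uses only the figures quoted in the message; the variable names are hypothetical, and reading VLM-positive as "total minus no-signature" is an assumption that happens to reproduce the quoted 86,084.

```python
# Figures quoted in the commit message.
TOTAL_PDFS = 90_282
NO_SIGNATURE = 4_198
CORRUPTED_OR_UNREADABLE = 13

# Assumed reading: VLM-positive = documents not flagged as no-signature.
vlm_positive = TOTAL_PDFS - NO_SIGNATURE
processed_for_extraction = vlm_positive - CORRUPTED_OR_UNREADABLE

# Downstream funnel stages as quoted; each should be no larger than the
# stage it is drawn from (signatures are counted per signature, not per
# document, so they are compared only with the CPA-matched subset).
WITH_DETECTIONS = 85_042
SIGNATURES = 182_328
CPA_MATCHED = 168_755

funnel_is_consistent = (
    vlm_positive == 86_084
    and processed_for_extraction == 86_071
    and WITH_DETECTIONS <= processed_for_extraction
    and CPA_MATCHED <= SIGNATURES
)
```

With the earlier figure of 12 corrupted files the second subtraction would give 86,072, so the check fails unless the count is 13, which matches the correction described above.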