Commit Graph

15 Commits

Author SHA1 Message Date
gbanyan 980295d5bd Update §IV v3.3: soften §IV-D/E framing + rename §IV-I + add §IV-M
- §IV-D opening: note that the accountant-level dip rejection is
  fully explained by between-firm composition + integer ties per
  §III-I.4 (Scripts 39b-e), no longer "the empirical justification
  for fitting a mixture model"
- §IV-E Tables VII/VIII: K=2/K=3 component labels changed from
  "hand-leaning / mixed / replicated" to position-on-plane labels
  per §III-J recasting
- §IV-I retitled "Inter-CPA Pair-Level Coincidence Rate"; v3.x's
  "FAR" terminology retroactively reframed; references §IV-M for
  the v4 Big-4 spike (Script 40b)
- New §IV-M (7 tables XIX-XXV): v4-new anchor-based ICCR
  calibration results consolidated — composition decomposition
  (Scripts 39b-e), pair-level ICCR sweep (Script 40b), pool-
  normalised per-signature ICCR (Script 43), document-level
  ICCR by alarm definition (Script 45), firm-heterogeneity
  logistic regression + cross-firm hit matrix (Script 44),
  alert-rate sensitivity (Script 46)
- Header bumped to v3.3 (post codex rounds 21-34)

Companion to §III v7 commit 723a3f6 and Phase 4 prose v3 commit
b33e20d.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:18:59 +08:00
gbanyan b33e20d479 Rewrite Phase 4 prose v3: Abstract / §I / §V / §VI to match §III v7
Major Phase 4 prose update aligning narrative with the §III v7
anchor-based ICCR framework (codex rounds 29-34):

- Abstract (247 words, under 250 limit): replaced K=3 mixture +
  natural-threshold framing with composition decomposition +
  multi-level ICCR + firm heterogeneity. Positioning as
  specificity-proxy-anchored screening framework.

- §I Introduction:
  * Methodological-design paragraph rewritten (no natural threshold;
    multi-level reporting; per-firm stratification; unsupervised
    disclosure)
  * Two new paragraphs documenting composition decomposition
    overturning distributional path, and anchor-based three-unit
    ICCR calibration
  * Firm heterogeneity + within-firm collision concentration as
    central findings
  * Contribution list rewritten (8 items): composition decomposition
    disproves natural threshold (NEW #4); multi-level ICCR
    calibration (NEW #5); firm heterogeneity quantification (NEW #6);
    K=3 demoted to descriptive partition (#7); multi-tool validation
    ceiling positioning (#8)

- §V Discussion:
  * §V-B retitled "composition-driven multimodality"; 2x2 factorial
    decomposition reported
  * §V-C Firm A reframed: position contrast + within-firm collision
    pattern, not "templated-end calibration anchor"
  * §V-D K=2/K=3 reframed as descriptive firm-compositional
    partitions (no "mechanism boundary" language)
  * §V-E three-score convergence reinterpreted as descriptor-position
    ranking, not hand-leaning mechanism ranking
  * §V-F (new title) Anchor-based multi-level calibration with all
    three units of analysis
  * §V-G expanded to 9 v4-specific limitations (no signature-level
    ground truth; assumption-violation; scope; conservative-subset;
    inherited rule components; deployed-rate excess not TPR; A1
    stipulation; K=3 composition sensitivity; no partner-level
    mechanism attribution) plus 5 inherited limitations

- §VI Conclusion: 8-point contribution list mirroring §I; 4 future
  work directions including within-firm collision-mechanism
  disambiguation and audit-quality companion analysis.

- Header draft-note updated to v3 (post codex rounds 26-34);
  Phase 4 v2 changelog moved to CHANGELOG.md placeholder.

Companion to §III v7 commit 723a3f6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:10:04 +08:00
gbanyan 723a3f6eaf Rewrite §III v7: anchor-based ICCR framework + composition-decomp finding
Major §III restructuring after codex rounds 29-34 demolished the
distributional path to thresholds (Scripts 39b-39e prove (cos, dHash)
multimodality is composition-driven + integer-tie artefact).

v4.0 pivots to anchor-based multi-level inter-CPA coincidence-rate
(ICCR) calibration via Scripts 40b, 43, 44, 45, 46:

- §III-G: scope justification rewritten (LOOO + Firm A case study +
  within-firm collision structure; dropped "smallest scope rejects
  unimodality" rationale); added sample-size reconciliation
  (150,442 descriptor-complete vs 150,453 vector-complete; 437
  accountant-level vs 468 all)
- §III-I: new sub-section I.4 composition decomposition (2x2 factorial
  centred + jittered Big-4 pooled dh p=0.35); I.5 conclusion of no
  natural threshold
- §III-J: K=3 recast as firm-compositional descriptive partition
  (not three mechanism clusters); bridge to §III-L.4 cross-firm
  hit matrix added
- §III-K: Score 1 reframed as firm-composition position score
- §III-L: NEW major sub-section — anchor-based threshold calibration
  with L.0 methodology, L.1 per-comparison ICCR (replicates v3
  cos>0.95 -> 0.0006; new dh<=5 -> 0.0013; joint -> 0.00014),
  L.2 pool-normalised per-signature ICCR (any-pair HC 11.02%;
  per-firm A 25.94% vs B/C/D <1.5%), L.3 doc-level ICCR (HC 18%;
  HC+MC 34%), L.4 firm heterogeneity logistic OR 0.01-0.05 +
  cross-firm hit matrix (98-100% within-firm), L.5 alert-rate
  sensitivity (HC threshold locally sensitive not plateau-stable),
  L.6 observed deployed alert rate excess over inter-CPA proxy
- §III-M: NEW sub-section — multi-tool validation strategy under
  unsupervised setting; 9 partial-evidence diagnostics each with
  disclosed untested assumption; positioning as anchor-calibrated
  screening framework with human-in-the-loop review, NOT validated
  forensic detector
- Terminology: "FAR" replaced with "inter-CPA coincidence rate
  (ICCR)" throughout; primary metric name change documented in
  §III-L.0
- Provenance table: ~35 new rows for Scripts 39b-e/40b/43-46;
  "key numerical claims" instead of "every numerical claim"
- Removed v2-v6 internal changelog metadata; v7 draft note added

Codex round-32 SOUND_WITH_QUALIFICATIONS, round-33 GO_WITH_REVISIONS,
round-34 READY_WITH_NARROW_FIXES (all 8 patches applied).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:27:01 +08:00
gbanyan 6db5d635f5 Apply codex round-27 narrow fixes; Phase 4 prose v2.1
Codex round 27 returned Minor Revision: 10/11 Major + 14/15 Minor
CLOSED. Two narrow residuals applied:

  1. §V-F line 99 'all three candidate classifiers' replaced with
     'all three candidate checks' with explicit enumeration
     (the inherited box rule, the K=3 hard label, and the
     prevalence-calibrated reverse-anchor cut). Keeps the K=3
     hard label explicitly descriptive rather than operational.

  2. Close-out checklist's stale '~235 words' abstract claim
     updated to the verified 243-244 word count.

Deferred to manuscript-assembly time (not blockers for Phase 5
cross-AI peer review):
  - §II [42]-[44] citation finalisation (placeholders are
    transparent in the current draft state).
  - Internal draft notes and close-out checklists (these
    explicitly help reviewers track the convergence cycle).
  - Manuscript-level lint pass (last step before submission
    packaging).

Closure summary across 7 codex rounds (21-27):
  - Empirical: ALL Major + Minor findings CLOSED on the
    §III/§IV/Phase 4 substantive content.
  - Packaging: 2 OPEN items (§II citations, internal notes)
    intentionally deferred to manuscript-assembly time.

Phase 5 readiness: substantively YES. The §III v6 + §IV v3.2 +
Phase 4 v2.1 is converged for cross-AI peer review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
2026-05-13 00:15:35 +08:00
gbanyan 918d55154a Abstract trim: 253 -> 245 words (within IEEE Access 250-word target)
Six minor edits to reduce word count:
- 'a YOLOv11 detector localizes signatures' -> 'YOLOv11 localizes
  signatures'
- 'filed in Taiwan over 2013-2023' -> 'Taiwan audit reports
  (2013-2023)'
- 'statistical analysis is scoped to the Big-4 sub-corpus
  (437 CPAs, 150,442 signatures)' -> 'analysis is scoped to the
  Big-4 sub-corpus (437 CPAs; 150,442 signatures)'
- 'Wilson 95% upper bound 1.45%' -> 'Wilson upper bound 1.45%'
- 'cross-scope check (n = 686) preserves the K=3 + box-rule
  Spearman convergence with drift 0.007' -> 'check (n = 686)
  preserves the K=3 + box-rule Spearman convergence (drift
  0.007)'

All numerical anchors preserved. Phase 4 prose v2 now within
IEEE Access 250-word abstract limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
2026-05-12 23:57:01 +08:00
gbanyan 10c82fd446 Apply codex round-26 corrections to Phase 4 prose v2
Codex round 26 returned Major Revision on Phase 4 v1: 9 Major
findings + 12 Minor + reviewer-attack vulnerabilities. v2
applies all flagged corrections.

Abstract changes:
  - "Three independent feature-derived scores" -> "Three
    feature-derived scores ... not statistically independent
    because all three are functions of the same descriptor
    pair". Names the operational output as the inherited
    five-way classifier.
  - Trimmed from 277 to ~245 words to stay within IEEE Access
    250-word limit while keeping all numerical anchors.

§I Introduction:
  - Line 29 cross-ref §III-D -> §III-G through §III-J
    (§III-D was wrong; the methodology lives in §III-G/I/J).
  - Big-4 scope claim narrowed: "neither any single firm pooled
    alone nor the broader full-dataset variant rejects" -> "none
    of the narrower comparison scopes tested in Script 32
    rejects" with explicit enumeration (Firm A pooled alone;
    Firms B+C+D pooled; all non-Firm-A pooled).
  - "Three independent feature-derived scores" -> "Three
    feature-derived scores ... not statistically independent".
  - Contribution 4 "not at narrower scopes" -> "not in the
    narrower comparison scopes tested".
  - Contribution 8 "demonstrating pipeline reproducibility at
    multiple scopes" -> narrowed to "K=3 + box-rule
    rank-convergence reproduces at full n=686; does not
    re-validate operational thresholds / LOOO / five-way / pixel
    identity at the broader scope".
  - "external validation" softened to "annotation-free
    validation" in methodological-safeguards paragraph.
  - "(5)–(8)" pipeline stage list updated with corrected
    section references.
  - "Published box rule" -> "inherited Paper A box rule".
  - Added Big-4 pixel-identity per-firm breakdown (145/8/107/2)
    in §I body for completeness.

§II Related Work:
  - Replaced placeholder with explicit defer-to-master statement:
    v3.20.0 §II is inherited substantively unchanged in the master
    manuscript; only the LOOO addition is reproduced here.
  - "[add citation]" replaced with placeholder references
    [42] Stone 1974, [43] Geisser 1975, [44] Vehtari et al. 2017
    explicitly marked as draft references to be finalised at
    copy-edit time.
  - LOOO addition reframed: composition-sensitivity band on the
    mixture characterisation, not on the operational classifier.

§V Discussion:
  - §V-B "v4.0 inherits and confirms" softened to "v4.0 inherits
    this signature-level reading and remains consistent with it
    (no signature-level diagnostic was newly run in v4)".
  - §V-B "some CPAs are templated, some are hand-leaning, some
    are mixed" rewritten as component-membership wording: "some
    CPAs' observed signatures place their per-CPA means in the
    templated/mixed/hand-leaning region of the descriptor plane".
  - §V-B within-CPA unimodality explanation softened from
    "produces" to "can be jointly consistent" with explicit
    §III-G cross-ref.
  - §V-C Firm A byte-level provenance: 145 pixel-identical
    signatures verified in Script 40; 50 partners / 35 cross-year
    explicitly inherited from v3 / Script 28 not regenerated in
    v4 spikes.
  - §V-C "anchors §IV-H's positive-anchor miss-rate" -> "is the
    largest of the four Big-4 subsets, with full anchor pooling
    Firm A 145, Firm B 8, Firm C 107, Firm D 2".
  - §V-E "published box rule" -> "inherited Paper A box rule";
    "produce the same per-CPA ranking" -> "broadly concordant
    rankings, with residual non-Firm-A disagreement".
  - §V-G limitations expanded from 7 to 12 items: restored the
    5 v3.20.0 inherited limitations (transferred ImageNet
    features, HSV stamp-removal artifacts, longitudinal scan
    confounds, source-exemplar misattribution, legal
    interpretation).
  - §V-G scope limitation: removed unsupported "narrower or
    broader scopes" full-dataset dip-test claim.

§VI Conclusion:
  - Names operational output: "inherited Paper A five-way
    per-signature classifier with worst-case document-level
    aggregation".
  - "Cross-scope pipeline reproducibility" -> "K=3 + box-rule
    rank-convergence reproduces at full n=686; does not
    re-validate operational thresholds, LOOO, five-way classifier,
    or pixel-identity at the broader scope".
  - Future-work direction 3 explicitly qualifies the within-Big-4
    contrast as "accountant-level descriptive features of the K=3
    mixture, not validated mechanism-level claims and not
    currently linked to audit-quality outcomes".

Round 26 closure post-v2:
  - All 9 Major findings: CLOSED in v2 prose body.
  - All 12 Minor findings: CLOSED in v2 prose body.
  - Phase 5 readiness: should now move from Partial to Yes
    pending codex round 27 verification.

Provenance: codex round-26 confirmed 17/17 numerical claims in
Phase 4 v1 (only finding #5, the scope-test wording, was an
overclaim rather than a numerical error). v2 keeps all confirmed
numerics and narrows only the scope-test wording.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:50:09 +08:00
gbanyan e36c49d2d8 Add Phase 4 prose draft v1 (Abstract + I + II + V + VI)
Phase 4 first-pass draft replacing the v3.20.0 Abstract,
§I Introduction, §II Related Work, §V Discussion, and §VI
Conclusion blocks with the Big-4 reframed v4.0 prose. Single
consolidated file at paper/v4/paper_a_prose_v4_phase4.md.

Structure:
  Abstract  (~235 words, IEEE Access target <= 250)
  §I Introduction  (8-item contributions list updated for v4)
  §II Related Work  (mostly inherited; LOOO citation added)
  §V Discussion  (7 sub-sections: A-G covering distinct-problem
                  framing, accountant-level multimodality,
                  Firm A as templated-end case study, K=2
                  firm-mass conflation, K=3 reproducible shape,
                  three-score internal-consistency, pixel-
                  identity + inter-CPA validation, limitations)
  §VI Conclusion + Future Work  (4 future directions)

Key reframing decisions baked into the prose:
  - Abstract leads with Big-4 scope + dip-test multimodality +
    K=3 reproducibility + three-score convergence + 0% miss
    rate + full-dataset robustness.
  - §I positions the Big-4 sub-corpus scope as the
    methodologically privileged calibration unit ("smallest
    tested scope at which a finite-mixture model is
    statistically supportable").
  - §I-Contribution-4: Big-4 scope as substantive methodological
    finding (was v3.x "percentile-anchored operational
    threshold").
  - §I-Contribution-5: K=3 mixture as descriptive (was v3.x
    "distributional characterisation" framing).
  - §I-Contribution-6: three-score convergent internal-
    consistency (NEW in v4).
  - §I-Contribution-8: full-dataset robustness as light
    secondary scope (NEW in v4).
  - §V-D: explicit "K=2 is firm-mass driven; K=3 is
    reproducible in shape" framing — preempts the LOOO
    reviewer attack vector codex round 23 first flagged.
  - §V-G Limitations: seven explicit limitations including no
    signature-level hand-signed ground truth, pixel-identity
    conservative subset, MC band not separately v4-validated.
  - §VI Future Work: four directions including a Paper B
    placeholder for audit-quality companion analysis.

The technical §III v6 + §IV v3.2 are the foundation; this Phase
4 draft aligns the narrative with the codex-converged
methodology and results.

6 close-out items flagged at end of file (word-count check,
contribution count, LOOO citation, limitations grouping, Paper B
cross-ref, draft note stripping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:46:19 +08:00
gbanyan 6ba128ded4 Apply codex round-25 final polish: §III v6 + §IV v3.2
Codex round 25 returned Minor Revision: round-24's empirical and
cross-reference issues mostly CLOSED. Remaining items were all
partner-facing cosmetic / internal-notes hygiene.

§III v6 polish:
  1. §III:11 v5 changelog reprint of real firm names removed
     ("real firm names 'EY' and 'KPMG'" -> "real firm names/aliases")
     -- this was a self-regression I introduced in v5 while
     documenting the v5 anonymisation fix.
  2. §III:14 empirical anchor range updated:
     "Scripts 32-40" -> "Scripts 32-42" (includes Scripts 41 + 42).
  3. New v6 changelog entry added under the draft note documenting
     the round-25 fixes.
  4. Draft note version stamp refreshed: v5 -> v6.

§IV v3.2 polish:
  1. §IV draft note rewritten and version label corrected:
     "Draft v3" -> "Draft v3.2"; "post codex rounds 21-23" ->
     "post codex rounds 21-25". The v3 -> v3.1 -> v3.2 lineage is
     now recorded.
  2. §IV close-out checklist item 2 rewritten to remove residual
     "Tables IV-XVIII" wording. v3.2 explicitly states: v4 table
     sequence is Tables V-XVIII plus Table XV-B; no v4 Table IV
     is printed; the inherited v3.20.0 Table IV (per-firm
     detection counts) remains a v3.x reference only.

Verification:
  - Strict-case grep for KPMG / Deloitte / PwC / EY (with word
    boundaries) + Chinese firm names: ZERO matches in either
    file. Anonymisation is now complete throughout the
    manuscript body AND internal notes.

Round 25 closure post-polish:
  Major:     all CLOSED (round 24 Major 1 table numbering: now
             fully explicit V-XVIII + XV-B with v4 Table IV
             absent; Major 4 anonymisation: §III:11 leak removed)
  Minor:     all CLOSED (weight drift 0.023 confirmed across 4
             sites; cos <= 0.837 confirmed across 2 sites; n=686
             provenance row confirmed)
  Editorial: 1 still PARTIAL (internal draft notes + Phase 3
             close-out checklist remain in the files but
             explicitly marked "internal -- remove before
             submission"; these are author working artefacts
             intentionally retained until submission packaging)

Phase 4 readiness: technically Yes; the §III/§IV technical
content is converged across 5 codex review rounds. Internal
notes will be stripped at submission packaging time. Ready to
proceed to Phase 4 (Abstract/Intro/Discussion/Conclusion prose).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:36:16 +08:00
gbanyan 6d2eddb6e8 Apply codex round-24 final cleanup: §III v5 + §IV v3.1
Codex round 24 returned Minor Revision: 3 Major CLOSED + 3 Major
PARTIAL + 4 Minor CLOSED + 2 Minor PARTIAL + 4 Editorial CLOSED
+ 1 Editorial OPEN. All 7 narrow residual fixes were §III-side
(I applied §IV fixes thoroughly in v3 but didn't mirror them to
§III v4).

§III v5 fixes:

  1. Anonymisation leak repaired:
     - "held-out-EY fold" -> "held-out-Firm-D fold" (L71)
     - "Firms B (KPMG) and D (EY)" -> "Firms B and D" (L99)
  2. K=3 LOOO weight drift 0.025 -> 0.023 at three sites
     (L71, L115, L173 provenance table). Matches Script 37 max
     C1 weight deviation and §IV v3 line 139.
  3. §III-K positive-anchor paragraph cross-ref repaired:
     "v3.x inter-CPA negative anchor (§III-J inherited; Table X)"
     -> "(§IV-I, inheriting v3.20.0 §IV-F.1 Table X)".
  4. §III-L five-way Likely-hand-signed band made inclusive:
     "Cosine below the all-pairs KDE crossover threshold." ->
     "Cosine at or below the all-pairs KDE crossover threshold
     (cos <= 0.837)." Matches Script 42 and §IV:19.
  5. Open question 1's pointer changed from current §IV-F (which
     is Convergent Internal-Consistency Checks) to v3.20.0
     Tables IX/XI/XII/XII-B + §IV-J descriptive proportions.
  6. Provenance table: new row for full-dataset n=686 citing
     Script 41 fulldataset_report.md.
  7. Draft-note header refreshed: v3 -> v5; v4 -> v5 etc.;
     "internal -- remove before submission" tag added.

§IV v3.1 fixes:

  - Close-out checklist L262 stale "codex round 23" wording
    updated to "rounds 21-24 and before partner Jimmy review".
  - Close-out item 4 "in this v2" stale wording -> "in this v3".
  - New item 5 added: internal author notes (this checklist +
    §III cross-reference index + both files' draft-note headers)
    are author working artefacts and should be moved/stripped
    before partner / submission packaging.

Round 24 finding summary post-v5/v3.1:
  Major:     3 CLOSED, 3 -> CLOSED (anonymisation + cross-ref +
             table numbering note residuals)
  Minor:     4 CLOSED, 2 -> CLOSED (weight drift 0.025 -> 0.023;
             low-cosine inclusivity cos <= 0.837)
  Editorial: 4 CLOSED, 1 PARTIAL (draft notes remain visible but
             explicitly marked as internal-only "remove before
             submission")

Phase 4 readiness: pending decision on whether to do one more
codex verification round (round 25) before drafting Abstract /
Intro / Discussion / Conclusion prose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:26:14 +08:00
gbanyan ce33156238 Apply codex round-23 corrections: §IV v3 + §III v4
Codex round 23 returned Major Revision on §IV v2: 6 Major + 6
Minor + 5 Editorial findings. Codex confirmed the spike-script
provenance is mostly sound -- no scripts needed rerunning -- so
v3 applies presentation-level fixes only.

Decisions baked in:
  - Anonymisation: maintain Firm A-D pseudonyms throughout the
    manuscript body; remove (Deloitte) / (KPMG) / (PwC) / (EY)
    parentheticals from all v4 §IV tables.
  - Table numbering: v4 tables use fresh V-XVIII (plus Table XV-B);
    inherited v3.x tables are cited only as "v3.20.0 Table N" with
    the original v3 number, NOT renumbered into the v4 sequence.

§IV v3 changes:
  1. Detection denominator rewritten: 86,072 VLM-positive / 12
     corrupted / 86,071 YOLO-processed / 85,042 with-detections /
     182,328 signatures (matches v3.x §IV-B exact wording).
  2. All v4 table labels stripped of "(revised:" / "(NEW:"
     prefixes; replaced with clean "Table N. <descriptor>." form.
  3. Real firm names removed from all tables: 4 replace_all edits.
  4. Line 211 MC-ordering claim removed: MC occupancy is no longer
     described as "consistent with the §III-K Spearman convergence"
     because MC fraction is not monotone in per-CPA hand-leaning
     ranking. New language: descriptive only, with Firm D / Firm B
     ordering counterexample stated.
  5. Line 184 81.70% vs 82.46% qualified as "qualitative
     alignment, not like-for-like consistency check" (different
     units: per-signature class vs per-CPA hard cluster).
  6. Line 43 BD-transition "histogram-resolution artefacts"
     softened to "scope-dependent and not used operationally";
     no specific bin-width artefact claim without sensitivity
     sweep evidence.
  7. K=3 LOOO C1 weight drift corrected: 0.025 -> 0.023 (matches
     Script 37 max deviation 0.0235 / rounded 0.023).
  8. Seed coverage in §IV-A updated: "Scripts 32-42" (was
     "Scripts 32-41", missed Script 42).
  9. Low-cosine cutoff inclusivity: cos < 0.837 -> cos <= 0.837
     (matches Script 42 rule definition).
  10. "round-22 Light scope" process note removed from
      manuscript prose in §IV-K.
  11. §IV-L ablation pointer corrected: v3.20.0 §IV-I (was
      §IV-H.3); v3.20.0 Table XVIII clarified as different from
      v4 Table XVIII.
  12. Line 75 "Component recovery verified across Scripts 35,
      37, 38" rewritten: "the full-fit baseline is reproduced
      in Scripts 35, 37, 38" with explicit note that Script 37
      LOOO fold-specific components differ by design.
  13. Line 110 grammar: "This convergent-checks evidence" ->
      "These convergence checks".
  14. Draft note marked "internal -- remove before submission".

§III v4 changes (cross-reference cleanup):
  1. Line 13 cross-reference repaired: "§IV-D, §IV-F, §IV-G"
     (which are now accountant-level v4 analyses) replaced with
     accurate signature-level references (§IV-J for five-way
     counts; §IV-I for inherited inter-CPA FAR).
  2. Line 23 cross-reference repaired: "all §IV results except
     §IV-K" replaced with explicit list of v4-new vs inherited
     sub-sections.
  3. Line 109 cross-reference repaired: moderate-band capture-
     rate evidence cited as "v3.20.0 Tables IX, XI, XII, XII-B"
     (was "§IV-F", which is now Convergent Internal-Consistency
     Checks, not capture-rate).
  4. Line 131 "without recalibration" claim narrowed: §III-K's
     convergent-checks evidence is now scoped to the binary
     high-confidence rule only; the moderate-confidence band,
     style-consistency band, and document-level aggregation
     are retained by reference to v3.20.0 calibration, not
     claimed as v4.0-validated.

Outstanding open questions: 3 procedural items remain (§IV
table numbering finalisation, §IV-A-C content audit, Phase 4
prose); no methodology blockers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:03:33 +08:00
gbanyan 453f1d8768 Phase 3 close-out: Script 42 + §IV draft v2 (Table XV filled)
Script 42 tabulates the §III-L five-way per-signature classifier
output on the Big-4 sub-corpus (n=150,442 signatures classified)
and aggregates to document-level (n=75,233 unique PDFs) under
the worst-case rule.

Per-signature five-way overall (Table XV):

  HC  74,593  49.58%  high-confidence non-hand-signed
  MC  39,817  26.47%  moderate-confidence non-hand-signed
  HSC    314   0.21%  high style consistency
  UN  35,480  23.58%  uncertain
  LH     238   0.16%  likely hand-signed

Per-firm five-way (% within firm):

  Firm A (Deloitte)  HC 81.70%, MC 10.76%, UN 7.42%
  Firm B (KPMG)      HC 34.56%, MC 35.88%, UN 29.09%
  Firm C (PwC)       HC 23.75%, MC 41.44%, UN 34.21%
  Firm D (EY)        HC 24.51%, MC 29.33%, UN 45.65%

Document-level (Table XV-B, NEW):

  HC  46,857  62.28%
  MC  19,667  26.14%
  HSC    167   0.22%
  UN   8,524  11.33%
  LH      18   0.02%
  Total 75,233 unique Big-4 PDFs (single-firm 74,854; mixed-firm 379)

§IV v2 changes vs v1:
  - Table XV populated with Script 42 counts
  - Table XV-B (NEW): document-level worst-case counts
  - Per-firm five-way breakdown (% within firm) added
  - Per-firm document-level breakdown added
  - Document-level paragraph in §IV-J updated to reference Table XV-B
  - Phase 3 close-out checklist: item 1 (Table XV TBD) and item 4
    (document-level counts) marked RESOLVED; remaining items reduced
    from 5 to 3 (renumbering, content audit, codex open-questions)

The per-firm pattern is consistent with the §III-K Spearman-and-
cluster ordering: Firm A's signatures concentrate in HC (81.7%),
the three non-Firm-A firms have markedly lower HC and substantially
higher Uncertain rates (29-46%), with Firm D having the highest
Uncertain rate of the Big-4 -- consistent with the reverse-anchor
score (§III-K Score 2) ranking Firm D fractionally above Firm C in
the hand-leaning direction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:45:22 +08:00
gbanyan 165b3ab384 Add Phase 3 §IV draft v1 (Big-4 reframe + light §IV-K robustness)
Section IV expands from 8 sub-sections in v3.20.0 to 12
sub-sections (A through L) to mirror the §III-G..L lineage.

Sub-section structure:
  A Experimental Setup (inherited)
  B Signature Detection Performance (inherited)
  C All-Pairs Intra-vs-Inter Class Distribution (inherited; corpus-wide)
  D Big-4 Accountant-Level Distributional Characterisation (NEW)
    - Table V revised: Big-4 dip-test
    - Table VI revised: BD/McCrary diagnostic
  E Big-4 K=2 / K=3 Mixture Fits (NEW)
    - Table VII revised: K=2 components + bootstrap CIs
    - Table VIII revised: K=3 components
  F Convergent Internal-Consistency Checks (NEW)
    - Table IX revised: 3-score per-CPA Spearman
    - Table X revised: per-firm summary
    - Table XI revised: per-signature Cohen kappa
  G Leave-One-Firm-Out Reproducibility (NEW)
    - Table XII revised: K=2 LOOO across 4 folds
    - Table XIII revised: K=3 LOOO
  H Pixel-Identity Positive-Anchor Miss Rate
    - Table XIV revised: 0% miss rate, n=262
  I Inter-CPA Negative-Anchor FAR (inherited from v3.x §IV-F.1)
  J Five-Way Per-Signature + Document-Level Classification
    - Table XV: per-signature category counts (TBD; close-out task)
    - Table XVI NEW: firm x K=3 cluster cross-tab
  K Full-Dataset Robustness (NEW; light scope per author choice)
    - Table XVII NEW: K=3 component comparison Big-4 vs full
    - Table XVIII NEW: Spearman drift |0.0069|
  L Feature Backbone Ablation (inherited from v3.x §IV-H.3)

5 close-out items flagged at end of draft: per-signature category
counts on Big-4 subset (Table XV), table renumbering, §IV-A-C
content audit, document-level worst-case aggregation counts on
Big-4 subset, codex round-22 open questions resolved
(moderate-band inherited; firm anonymisation maintained;
table numbering set provisionally).

Empirical anchors: Scripts 32-41 on this branch. Script 41
(committed in previous commit) supplies the §IV-K Light
scope numbers; all other tables draw from Scripts 32-40
already on the branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:35:37 +08:00
gbanyan c8c7656513 Apply codex round-22 corrections to §III v3 (Minor -> ready)
Codex gpt-5.5 round 22 returned Minor Revision after v2 closed
3/5 Major findings fully and 2/5 partially. Five narrow fixes
applied for v3:

  1. Per-firm ranking unanimity corrected (v2:93). The reverse-
     anchor score ranks Firm D fractionally higher than Firm C
     (-0.7125 vs -0.7672); only Scores 1 and 3 rank Firm C
     highest. The unanimity claim was wrong; v3 prose now says
     all three agree on Firm A as most replication-dominated
     and on the non-Firm-A Big-4 as more hand-leaning, with a
     modest disagreement on Firm C vs D ordering.

  2. "Smallest scope" / "any single firm" overclaim narrowed
     (v2:21, v2:43). Script 32 only tested Firm A alone, big4_non_A
     pooled, and all_non_A pooled -- not Firms B, C, D individually.
     v3 explicitly says "comparison scopes tested in Script 32"
     and notes single-firm dip tests for B, C, D were not
     separately computed.

  3. K=3 hard label vs posterior in Spearman correctly
     attributed (v2:143). Script 38 uses the K=3 posterior P(C1),
     not the hard label, in the internal-consistency Spearman
     correlations. v3 §III-L now correctly says the hard label
     is for the §IV cluster cross-tabulation while the posterior
     is the continuous Score 1 in §III-K.

  4. Provenance source for n=150,442 corrected (v2:17, v2:152).
     Script 39 directly reports this count in its per-signature
     K=3 fit; Script 38's report does not. v3 cites Script 39 for
     this number.

  5. "Max fold-to-fold deviation" wording made precise (v2:65,
     v2:107). The $0.028$ value is the max absolute deviation
     from the across-fold mean (Script 36 stability summary), not
     the pairwise across-fold range (which is $0.0376 = 0.9756 -
     0.9380$). v3 reports both statistics with explicit
     definitions.

Also: draft note at top now records v2 (round-21) and v3
(round-22) revision lineage. Cross-reference index and open-
question block retained as author working checklist (will be
removed before manuscript submission per codex e7).

Outstanding open questions reduced to 3 (codex round-22 view):
  - Five-way moderate-confidence band: validate in Big-4 specifically
    (Phase 3 §IV-F work) or document as inherited from v3.x?
  - Firm anonymisation policy in §IV-V (procedural)
  - §IV table numbering (procedural; defer until §IV done)

Phase 2 §III draft is now Minor-Revision-quality. Ready for
Phase 3 (Results regeneration §IV).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:26:02 +08:00
gbanyan 62a22ceb83 Revise §III v4.0 draft per codex round-21 review (Major Revision -> v2)
Codex gpt-5.5 xhigh review of v1 draft returned Major Revision with
5 Major findings + 7 Minor + editorial nits. v2 addresses all of
them.

Key v2 changes:

  1. Primary classifier declared: inherited v3.x five-way per-signature
     box rule. K=3 mixture is demoted to accountant-level descriptive
     characterisation (Script 35 / Script 38 footing), explicitly NOT
     used to assign signature- or document-level labels.

  2. §III-J reframed as "Mixture Model and Accountant-Level
     Characterisation" (was "Mixture Model and Operational Threshold
     Derivation"). K=3 LOOO P2_PARTIAL verdict surfaced in prose
     including the "not predictively useful as an operational
     classifier" interpretation from the Script 37 verdict legend.

  3. §III-K renamed "Convergent Internal-Consistency Checks" (was
     "Convergent Validation") with explicit caveat that the three
     scores share underlying features and are not statistically
     independent measurements.

  4. §III-H reverse-anchor paragraph rewritten: the directional
     error in v1 (the non-Big-4 reference described as a "more-
     replicated-population baseline") is corrected -- the reference
     is in fact in the LESS-replicated regime relative to Big-4,
     and the score measures deviation in the hand-leaning direction.

  5. Pixel-identity metric renamed from "FAR" to "positive-anchor
     miss rate" with explicit conservative-subset caveat
     ("near-tautological for the box rule because byte-identical
     => cosine ~1 / dHash ~0").

  6. §III-L title changed to "Signature- and Document-Level
     Classification" (was "Per-Document Classification") and
     reorganised so the per-signature five-way rule + document-level
     worst-case aggregation are both clearly under this section.

  7. Empirical slips corrected:
     - K=2 LOOO comparison: now correctly says "5.6x the stability
       tolerance 0.005" rather than "5.6x the bootstrap CI half-width";
       full-Big-4 bootstrap half-width 0.0015 cited separately.
     - all-non-Firm-A dip: now correctly (0.998, 0.907), not "p > 0.99".
     - BD/McCrary: now narrowed to Big-4 scope (Script 34 null), with
       Script 32 dHash transitions for non-Big-4 subsets noted but
       not used as operational thresholds.
     - Firm A byte-identical "50 partners of 180 registered, 35
       cross-year" -- now explicitly inherited from v3.x §IV-F.1 /
       Script 28 / Appendix B; provenance row in the new table flags
       this as inherited, not v4-regenerated.
     - "mid/small-firm tail actively pulling" -> "the full-sample and
       Big-4-only calibrations differ" (causal language softened).
     - $\Delta\text{BIC}$ sign convention: explicit "lower BIC is
       preferred; BIC(K=3) - BIC(K=2) = -3.48".

  8. Editorial nits applied:
     - "failure rate" -> "box-rule hand-leaning rate"
     - "boundary moves modestly" -> "membership remains
       composition-sensitive"
     - "calibration uncertainty band +/- 5-13 pp" -> "observed absolute
       differences of 1.8-12.8 pp, with Firm C exceeding the 5 pp
       viability bar"
     - "strongest single methodology-validation signal" -> "strongest
       internal-consistency signal"
     - "the same component structure recovers" -> "a broadly similar
       three-component ordering recovers"
     - Cross-reference index marked as author checklist (remove
       before submission).

  9. New provenance table at end of §III mapping every numerical claim
     to (script, source, direct/derived/inherited).

  10. Open questions reduced from 5 to 3 (codex resolved questions 2,
      3, 4 with concrete answers); remaining 3 are forward-looking
      (5-way moderate band, pseudonym consistency, table numbering).

Also commits: paper/codex_review_gpt55_v4_round1.md (codex review
artifact, 143 lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:49:59 +08:00
gbanyan a06e9456e6 Add Phase 2 §III-G..L methodology rewrite (v4.0 draft)
Single consolidated draft of Section III sub-sections G through L,
replacing the v3.20.0 §III-G..L block with the Big-4 reframe.

Sub-sections (note: G/H/I/J/K/L written together to keep cross-
references coherent; user originally requested G/I/J/L only but
H rewrite and new K were required for cohesion):

  G Unit of Analysis and Scope
    -- accountant unit defined; Big-4 scope justified by
       within-pool homogeneity, dip-test multimodality,
       LOOO feasibility.
  H Reference Populations
    -- Firm A pivots from "calibration anchor" to "templated-end
       case study"; non-Big-4 added as reverse-anchor reference.
  I Distributional Characterisation
    -- dip-test multimodality at Big-4 level (p < 1e-4 both axes);
       BD/McCrary null as honest density-smoothness diagnostic.
  J Mixture Model and Operational Threshold Derivation
    -- K=2 vs K=3 fits reported; K=3 selected with rationale
       deferred to §III-K LOOO evidence.
  K Convergent Validation (NEW in v4.0)
    -- three-lens Spearman convergence (rho >= 0.879);
       per-signature K=3 fit (kappa = 0.870 vs per-CPA);
       K=2 LOOO UNSTABLE / K=3 LOOO PARTIAL;
       pixel-identity FAR 0% on 262 ground-truth signatures.
  L Per-Document Classification
    -- inherits v3.x five-way box rule for continuity;
       K=3 alternative output documented.

Includes: cross-reference index, script-to-section evidence map
(linking each empirical claim to the spike Script 32-40 commit),
and 5 open questions flagged at the end for partner / reviewer
review of this draft.

Output: paper/v4/paper_a_methodology_v4_section_iii.md (single
file replacing the v3.20.0 §III-G..L block on this branch only;
v3.20.0 paper/paper_a_methodology_v3.md left untouched).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:15:36 +08:00