165b3ab384237b8946beef3798c2bd4ebca6d301
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
165b3ab384 |
Add Phase 3 §IV draft v1 (Big-4 reframe + light §IV-K robustness)
Section IV expands from 8 sub-sections in v3.20.0 to 12
sub-sections (A through L) to mirror the §III-G..L lineage.
Sub-section structure:
A Experimental Setup (inherited)
B Signature Detection Performance (inherited)
C All-Pairs Intra-vs-Inter Class Distribution (inherited; corpus-wide)
D Big-4 Accountant-Level Distributional Characterisation (NEW)
- Table V revised: Big-4 dip-test
- Table VI revised: BD/McCrary diagnostic
E Big-4 K=2 / K=3 Mixture Fits (NEW)
- Table VII revised: K=2 components + bootstrap CIs
- Table VIII revised: K=3 components
F Convergent Internal-Consistency Checks (NEW)
- Table IX revised: 3-score per-CPA Spearman
- Table X revised: per-firm summary
- Table XI revised: per-signature Cohen kappa
G Leave-One-Firm-Out Reproducibility (NEW)
- Table XII revised: K=2 LOOO across 4 folds
- Table XIII revised: K=3 LOOO
H Pixel-Identity Positive-Anchor Miss Rate
- Table XIV revised: 0% miss rate, n=262
I Inter-CPA Negative-Anchor FAR (inherited from v3.x §IV-F.1)
J Five-Way Per-Signature + Document-Level Classification
- Table XV: per-signature category counts (TBD; close-out task)
- Table XVI NEW: firm x K=3 cluster cross-tab
K Full-Dataset Robustness (NEW; light scope per author choice)
- Table XVII NEW: K=3 component comparison Big-4 vs full
- Table XVIII NEW: Spearman drift |0.0069|
L Feature Backbone Ablation (inherited from v3.x §IV-H.3)
5 close-out items flagged at end of draft: per-signature category
counts on Big-4 subset (Table XV), table renumbering, §IV-A-C
content audit, document-level worst-case aggregation counts on
Big-4 subset, codex round-22 open questions resolved
(moderate-band inherited; firm anonymisation maintained;
table numbering set provisionally).
Empirical anchors: Scripts 32-41 on this branch. Script 41
(committed in previous commit) supplies the §IV-K Light
scope numbers; all other tables draw from Scripts 32-40
already on the branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c8c7656513 |
Apply codex round-22 corrections to §III v3 (Minor -> ready)
Codex gpt-5.5 round 22 returned Minor Revision after v2 closed
3/5 Major findings fully and 2/5 partially. Five narrow fixes
applied for v3:
1. Per-firm ranking unanimity corrected (v2:93). The reverse-
anchor score ranks Firm D fractionally higher than Firm C
(-0.7125 vs -0.7672); only Scores 1 and 3 rank Firm C
highest. The unanimity claim was wrong; v3 prose now says
all three agree on Firm A as most replication-dominated
and on the non-Firm-A Big-4 as more hand-leaning, with a
modest disagreement on Firm C vs D ordering.
2. "Smallest scope" / "any single firm" overclaim narrowed
(v2:21, v2:43). Script 32 only tested Firm A alone, big4_non_A
pooled, and all_non_A pooled -- not Firms B, C, D individually.
v3 explicitly says "comparison scopes tested in Script 32"
and notes single-firm dip tests for B, C, D were not
separately computed.
3. K=3 hard label vs posterior in Spearman correctly
attributed (v2:143). Script 38 uses the K=3 posterior P(C1),
not the hard label, in the internal-consistency Spearman
correlations. v3 §III-L now correctly says the hard label
is for the §IV cluster cross-tabulation while the posterior
is the continuous Score 1 in §III-K.
4. Provenance source for n=150,442 corrected (v2:17, v2:152).
Script 39 directly reports this count in its per-signature
K=3 fit; Script 38's report does not. v3 cites Script 39 for
this number.
5. "Max fold-to-fold deviation" wording made precise (v2:65,
v2:107). The $0.028$ value is the max absolute deviation
from the across-fold mean (Script 36 stability summary), not
the pairwise across-fold range (which is $0.0376 = 0.9756 -
0.9380$). v3 reports both statistics with explicit
definitions.
Also: draft note at top now records v2 (round-21) and v3
(round-22) revision lineage. Cross-reference index and open-
question block retained as author working checklist (will be
removed before manuscript submission per codex e7).
Outstanding open questions reduced to 3 (codex round-22 view):
- Five-way moderate-confidence band: validate in Big-4 specifically
(Phase 3 §IV-F work) or document as inherited from v3.x?
- Firm anonymisation policy in §IV-V (procedural)
- §IV table numbering (procedural; defer until §IV done)
Phase 2 §III draft is now Minor-Revision-quality. Ready for
Phase 3 (Results regeneration §IV).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
62a22ceb83 |
Revise §III v4.0 draft per codex round-21 review (Major Revision -> v2)
Codex gpt-5.5 xhigh review of v1 draft returned Major Revision with
5 Major findings + 7 Minor + editorial nits. v2 addresses all of
them.
Key v2 changes:
1. Primary classifier declared: inherited v3.x five-way per-signature
box rule. K=3 mixture is demoted to accountant-level descriptive
characterisation (Script 35 / Script 38 footing), explicitly NOT
used to assign signature- or document-level labels.
2. §III-J reframed as "Mixture Model and Accountant-Level
Characterisation" (was "Mixture Model and Operational Threshold
Derivation"). K=3 LOOO P2_PARTIAL verdict surfaced in prose
including the "not predictively useful as an operational
classifier" interpretation from the Script 37 verdict legend.
3. §III-K renamed "Convergent Internal-Consistency Checks" (was
"Convergent Validation") with explicit caveat that the three
scores share underlying features and are not statistically
independent measurements.
4. §III-H reverse-anchor paragraph rewritten: the directional
error in v1 (the non-Big-4 reference described as a "more-
replicated-population baseline") is corrected -- the reference
is in fact in the LESS-replicated regime relative to Big-4,
and the score measures deviation in the hand-leaning direction.
5. Pixel-identity metric renamed from "FAR" to "positive-anchor
miss rate" with explicit conservative-subset caveat
("near-tautological for the box rule because byte-identical
=> cosine ~1 / dHash ~0").
6. §III-L title changed to "Signature- and Document-Level
Classification" (was "Per-Document Classification") and
reorganised so the per-signature five-way rule + document-level
worst-case aggregation are both clearly under this section.
7. Empirical slips corrected:
- K=2 LOOO comparison: now correctly says "5.6x the stability
tolerance 0.005" rather than "5.6x the bootstrap CI half-width";
full-Big-4 bootstrap half-width 0.0015 cited separately.
- all-non-Firm-A dip: now correctly (0.998, 0.907), not "p > 0.99".
- BD/McCrary: now narrowed to Big-4 scope (Script 34 null), with
Script 32 dHash transitions for non-Big-4 subsets noted but
not used as operational thresholds.
- Firm A byte-identical "50 partners of 180 registered, 35
cross-year" -- now explicitly inherited from v3.x §IV-F.1 /
Script 28 / Appendix B; provenance row in the new table flags
this as inherited, not v4-regenerated.
- "mid/small-firm tail actively pulling" -> "the full-sample and
Big-4-only calibrations differ" (causal language softened).
- $\Delta\text{BIC}$ sign convention: explicit "lower BIC is
preferred; BIC(K=3) - BIC(K=2) = -3.48".
8. Editorial nits applied:
- "failure rate" -> "box-rule hand-leaning rate"
- "boundary moves modestly" -> "membership remains
composition-sensitive"
- "calibration uncertainty band +/- 5-13 pp" -> "observed absolute
differences of 1.8-12.8 pp, with Firm C exceeding the 5 pp
viability bar"
- "strongest single methodology-validation signal" -> "strongest
internal-consistency signal"
- "the same component structure recovers" -> "a broadly similar
three-component ordering recovers"
- Cross-reference index marked as author checklist (remove
before submission).
9. New provenance table at end of §III mapping every numerical claim
to (script, source, direct/derived/inherited).
10. Open questions reduced from 5 to 3 (codex resolved questions 2,
3, 4 with concrete answers); remaining 3 are forward-looking
(5-way moderate band, pseudonym consistency, table numbering).
Also commits: paper/codex_review_gpt55_v4_round1.md (codex review
artifact, 143 lines).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a06e9456e6 |
Add Phase 2 §III-G..L methodology rewrite (v4.0 draft)
Single consolidated draft of Section III sub-sections G through L,
replacing the v3.20.0 §III-G..L block with the Big-4 reframe.
Sub-sections (note: G/H/I/J/K/L written together to keep cross-
references coherent; user originally requested G/I/J/L only but
H rewrite and new K were required for cohesion):
G Unit of Analysis and Scope
-- accountant unit defined; Big-4 scope justified by
within-pool homogeneity, dip-test multimodality,
LOOO feasibility.
H Reference Populations
-- Firm A pivots from "calibration anchor" to "templated-end
case study"; non-Big-4 added as reverse-anchor reference.
I Distributional Characterisation
-- dip-test multimodality at Big-4 level (p < 1e-4 both axes);
BD/McCrary null as honest density-smoothness diagnostic.
J Mixture Model and Operational Threshold Derivation
-- K=2 vs K=3 fits reported; K=3 selected with rationale
deferred to §III-K LOOO evidence.
K Convergent Validation (NEW in v4.0)
-- three-lens Spearman convergence (rho >= 0.879);
per-signature K=3 fit (kappa = 0.870 vs per-CPA);
K=2 LOOO UNSTABLE / K=3 LOOO PARTIAL;
pixel-identity FAR 0% on 262 ground-truth signatures.
L Per-Document Classification
-- inherits v3.x five-way box rule for continuity;
K=3 alternative output documented.
Includes: cross-reference index, script-to-section evidence map
(linking each empirical claim to the spike Script 32-40 commit),
and 5 open questions flagged at the end for partner / reviewer
review of this draft.
Output: paper/v4/paper_a_methodology_v4_section_iii.md (single
file replacing the v3.20.0 §III-G..L block on this branch only;
v3.20.0 paper/paper_a_methodology_v3.md left untouched).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|