pdf_signature_extraction

Author	SHA1	Message	Date
gbanyan	6db5d635f5	Apply codex round-27 narrow fixes; Phase 4 prose v2.1	2026-05-13 00:15:35 +08:00
Codex round 27 returned Minor Revision: 10/11 Major + 14/15 Minor CLOSED. Two narrow residuals applied:
1. §V-F line 99 'all three candidate classifiers' replaced with 'all three candidate checks' with explicit enumeration (the inherited box rule, the K=3 hard label, and the prevalence-calibrated reverse-anchor cut). Keeps the K=3 hard label explicitly descriptive rather than operational.
2. Close-out checklist's stale '~235 words' abstract claim updated to the verified 243-244 word count.

Deferred to manuscript-assembly time (not blockers for Phase 5 cross-AI peer review):
- §II [42]-[44] citation finalisation (placeholders are transparent in the current draft state).
- Internal draft notes and close-out checklists (these explicitly help reviewers track the convergence cycle).
- Manuscript-level lint pass (last step before submission packaging).

Closure summary across 7 codex rounds (21-27):
- Empirical: ALL Major + Minor findings CLOSED on the §III/§IV/Phase 4 substantive content.
- Packaging: 2 OPEN items (§II citations, internal notes) intentionally deferred to manuscript-assembly time.

Phase 5 readiness: substantively YES. The §III v6 + §IV v3.2 + Phase 4 v2.1 is converged for cross-AI peer review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gbanyan	918d55154a	Abstract trim: 253 -> 245 words (within IEEE Access 250-word target)	2026-05-12 23:57:01 +08:00
Six minor edits to reduce word count:
- 'a YOLOv11 detector localizes signatures' -> 'YOLOv11 localizes signatures'
- 'filed in Taiwan over 2013-2023' -> 'Taiwan audit reports (2013-2023)'
- 'statistical analysis is scoped to the Big-4 sub-corpus (437 CPAs, 150,442 signatures)' -> 'analysis is scoped to the Big-4 sub-corpus (437 CPAs; 150,442 signatures)'
- 'Wilson 95% upper bound 1.45%' -> 'Wilson upper bound 1.45%'
- 'cross-scope check (n = 686) preserves the K=3 + box-rule Spearman convergence with drift 0.007' -> 'check (n = 686) preserves the K=3 + box-rule Spearman convergence (drift 0.007)'

All numerical anchors preserved. Phase 4 prose v2 now within IEEE Access 250-word abstract limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gbanyan	10c82fd446	Apply codex round-26 corrections to Phase 4 prose v2	2026-05-12 23:50:09 +08:00
Codex round 26 returned Major Revision on Phase 4 v1: 9 Major findings + 12 Minor + reviewer-attack vulnerabilities. v2 applies all flagged corrections.

Abstract changes:
- "Three independent feature-derived scores" -> "Three feature-derived scores ... not statistically independent because all three are functions of the same descriptor pair". Names the operational output as the inherited five-way classifier.
- Trimmed from 277 to ~245 words to stay within IEEE Access 250-word limit while keeping all numerical anchors.

§I Introduction:
- Line 29 cross-ref §III-D -> §III-G through §III-J (§III-D was wrong; the methodology lives in §III-G/I/J).
- Big-4 scope claim narrowed: "neither any single firm pooled alone nor the broader full-dataset variant rejects" -> "none of the narrower comparison scopes tested in Script 32 rejects" with explicit enumeration (Firm A pooled alone; Firms B+C+D pooled; all non-Firm-A pooled).
- "Three independent feature-derived scores" -> "Three feature-derived scores ... not statistically independent".
- Contribution 4 "not at narrower scopes" -> "not in the narrower comparison scopes tested".
- Contribution 8 "demonstrating pipeline reproducibility at multiple scopes" -> narrowed to "K=3 + box-rule rank-convergence reproduces at full n=686; does not re-validate operational thresholds / LOOO / five-way / pixel identity at the broader scope".
- "external validation" softened to "annotation-free validation" in methodological-safeguards paragraph.
- "(5)–(8)" pipeline stage list updated with corrected section references.
- "Published box rule" -> "inherited Paper A box rule".
- Added Big-4 pixel-identity per-firm breakdown (145/8/107/2) in §I body for completeness.

§II Related Work:
- Replaced placeholder with explicit defer-to-master statement: v3.20.0 §II is inherited substantively unchanged in the master manuscript; only the LOOO addition is reproduced here.
- "[add citation]" replaced with placeholder references [42] Stone 1974, [43] Geisser 1975, [44] Vehtari et al. 2017 explicitly marked as draft references to be finalised at copy-edit time.
- LOOO addition reframed: composition-sensitivity band on the mixture characterisation, not on the operational classifier.

§V Discussion:
- §V-B "v4.0 inherits and confirms" softened to "v4.0 inherits this signature-level reading and remains consistent with it (no signature-level diagnostic was newly run in v4)".
- §V-B "some CPAs are templated, some are hand-leaning, some are mixed" rewritten as component-membership wording: "some CPAs' observed signatures place their per-CPA means in the templated/mixed/hand-leaning region of the descriptor plane".
- §V-B within-CPA unimodality explanation softened from "produces" to "can be jointly consistent" with explicit §III-G cross-ref.
- §V-C Firm A byte-level provenance: 145 pixel-identical signatures verified in Script 40; 50 partners / 35 cross-year explicitly inherited from v3 / Script 28 not regenerated in v4 spikes.
- §V-C "anchors §IV-H's positive-anchor miss-rate" -> "is the largest of the four Big-4 subsets, with full anchor pooling Firm A 145, Firm B 8, Firm C 107, Firm D 2".
- §V-E "published box rule" -> "inherited Paper A box rule"; "produce the same per-CPA ranking" -> "broadly concordant rankings, with residual non-Firm-A disagreement".
- §V-G limitations expanded from 7 to 12 items: restored the 5 v3.20.0 inherited limitations (transferred ImageNet features, HSV stamp-removal artifacts, longitudinal scan confounds, source-exemplar misattribution, legal interpretation).
- §V-G scope limitation: removed unsupported "narrower or broader scopes" full-dataset dip-test claim.

§VI Conclusion:
- Names operational output: "inherited Paper A five-way per-signature classifier with worst-case document-level aggregation".
- "Cross-scope pipeline reproducibility" -> "K=3 + box-rule rank-convergence reproduces at full n=686; does not re-validate operational thresholds, LOOO, five-way classifier, or pixel-identity at the broader scope".
- Future-work direction 3 explicitly qualifies the within-Big-4 contrast as "accountant-level descriptive features of the K=3 mixture, not validated mechanism-level claims and not currently linked to audit-quality outcomes".

Round 26 closure post-v2:
- All 9 Major findings: CLOSED in v2 prose body.
- All 12 Minor findings: CLOSED in v2 prose body.
- Phase 5 readiness: should now move from Partial to Yes pending codex round 27 verification.

Provenance: codex round-26 confirmed 17/17 numerical claims in Phase 4 v1 (only finding #5, the scope-test wording, was an overclaim rather than a numerical error). v2 keeps all confirmed numerics and narrows only the scope-test wording.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gbanyan	e36c49d2d8	Add Phase 4 prose draft v1 (Abstract + I + II + V + VI)	2026-05-12 22:46:19 +08:00
Phase 4 first-pass draft replacing the v3.20.0 Abstract, §I Introduction, §II Related Work, §V Discussion, and §VI Conclusion blocks with the Big-4 reframed v4.0 prose. Single consolidated file at paper/v4/paper_a_prose_v4_phase4.md.

Structure:
  Abstract (~235 words, IEEE Access target <= 250)
  §I Introduction (8-item contributions list updated for v4)
  §II Related Work (mostly inherited; LOOO citation added)
  §V Discussion (7 sub-sections: A-G covering distinct-problem framing, accountant-level multimodality, Firm A as templated-end case study, K=2 firm-mass conflation, K=3 reproducible shape, three-score internal-consistency, pixel-identity + inter-CPA validation, limitations)
  §VI Conclusion + Future Work (4 future directions)

Key reframing decisions baked into the prose:
- Abstract leads with Big-4 scope + dip-test multimodality + K=3 reproducibility + three-score convergence + 0% miss rate + full-dataset robustness.
- §I positions the Big-4 sub-corpus scope as the methodologically privileged calibration unit ("smallest tested scope at which a finite-mixture model is statistically supportable").
- §I-Contribution-4: Big-4 scope as substantive methodological finding (was v3.x "percentile-anchored operational threshold").
- §I-Contribution-5: K=3 mixture as descriptive (was v3.x "distributional characterisation" framing).
- §I-Contribution-6: three-score convergent internal-consistency (NEW in v4).
- §I-Contribution-8: full-dataset robustness as light secondary scope (NEW in v4).
- §V-D: explicit "K=2 is firm-mass driven; K=3 is reproducible in shape" framing, which preempts the LOOO reviewer attack vector codex round 23 first flagged.
- §V-G Limitations: seven explicit limitations including no signature-level hand-signed ground truth, pixel-identity conservative subset, MC band not separately v4-validated.
- §VI Future Work: four directions including a Paper B placeholder for audit-quality companion analysis.

The technical §III v6 + §IV v3.2 are the foundation; this Phase 4 draft aligns the narrative with the codex-converged methodology and results.

6 close-out items flagged at end of file (word-count check, contribution count, LOOO citation, limitations grouping, Paper B cross-ref, draft note stripping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

4 Commits
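
The "Wilson upper bound 1.45%" quoted in the abstract-trim commit can be sanity-checked with a few lines of Python. The sketch below rests on an assumption the log does not state outright: that the bound is computed for zero misses among the pooled pixel-identical anchors, whose per-firm counts (145/8/107/2) appear in the round-26 commit. The function name `wilson_upper` is illustrative and is not taken from the repository's scripts.

```python
import math


def wilson_upper(misses: int, n: int, z: float = 1.959964) -> float:
    """Upper limit of the two-sided Wilson score interval for a proportion."""
    p_hat = misses / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z * z / (4 * n * n)
    )
    return centre + half_width


# Per-firm pixel-identical anchor counts quoted in the commit log (Firms A-D).
anchors = [145, 8, 107, 2]
n_total = sum(anchors)

# Assumption: the reported 0% miss rate means 0 misses over all pooled anchors.
upper = wilson_upper(0, n_total)
print(f"n = {n_total}, Wilson 95% upper bound = {upper:.2%}")
```

With zero misses the bound reduces to z^2 / (n + z^2), so it depends only on the anchor count; under the pooling assumption above this reproduces the 1.45% figure, which suggests (but does not prove) that the abstract's bound is based on n = 262.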