Paper A v3.18.4: address codex GPT-5.5 round-18 self-comparing review findings

Codex round-18 (paper/codex_review_gpt55_v3_18_3.md) caught a falsified
provenance claim that I introduced in v3.18.3, plus four narrative
cleanup items that survived the prior 17 rounds. The verdict was Minor
Revision; this commit closes all five actionable items.

- Harmonize signature_analysis/28_byte_identity_decomposition.py to use
  accountants.firm (joined on signatures.assigned_accountant) for Firm A
  membership, matching the convention in 24_validation_recalibration.py.
  Regenerated reports/byte_identity_decomp/byte_identity_decomposition.json.
  Cross-firm convergence now reports Firm A 49,389 / 55,922 = 88.32% and
  Non-Firm-A 27,595 / 65,514 = 42.12% (percentages unchanged at two
  decimal places; counts now match Table IX exactly).
- Replace the Section IV-H.2 reconciliation note. The previous note
  speculated that the one-record discrepancy was a snapshot/floating-point
  artifact, which codex round-18 falsified by direct DB queries: the real
  cause was that script 28 used signatures.excel_firm while Table IX uses
  accountants.firm. With script 28 now harmonized, Table IX and the
  cross-firm artifact agree exactly at 55,922; the new note documents the
  Firm A grouping convention plus the dHash-non-null filter.
- Replace residual "known-majority-positive" wording with
  "replication-dominated" in Introduction (contributions 4 and 6) and
  Methodology III-I (anchor-rationale paragraph).
- Correct Methodology III-G's auditor-year description: the per-signature
  best-match cosine that feeds each auditor-year mean is computed against
  the full same-CPA cross-year pool, not within-year only. The aggregation
  unit is within-year, but the underlying similarity statistic is not.
- Add the 145 / 50 / 180 / 35 Firm A byte-decomposition sentence to
  Results IV-F.1 with explicit pointer to script 28 and the JSON artifact;
  this resolves the round-18 finding that several manuscript locations
  cited IV-F.1 for a decomposition that was not actually reported there.
- Rebuild Paper_A_IEEE_Access_Draft_v3.docx.
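
The III-G correction above can be illustrated with a minimal sketch. It
assumes each signature is a plain feature vector tagged with a year; the
function names and toy vectors are hypothetical and are not taken from
the analysis scripts. Each signature's best-match cosine is taken over
the CPA's full cross-year pool, and only the averaging is done per year.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def auditor_year_means(vectors, years):
    """Per-year mean of best-match cosine for ONE CPA (hypothetical helper).

    The best match for each signature is searched over every other
    signature of the same CPA, regardless of year. Only the final mean
    is grouped by year. Requires at least two signatures.
    """
    best = [
        max(cosine(u, v) for j, v in enumerate(vectors) if j != i)
        for i, u in enumerate(vectors)
    ]
    by_year = defaultdict(list)
    for year, score in zip(years, best):
        by_year[year].append(score)
    return {year: sum(s) / len(s) for year, s in by_year.items()}


# Toy data (invented): one 2020 signature whose only close match is in 2021.
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_years = [2020, 2021, 2021]
means = auditor_year_means(toy_vectors, toy_years)
```

In the toy data the 2020 signature has no within-year partner at all, yet
its auditor-year mean is well defined because the match comes from 2021.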

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:59:07 +08:00
parent 26b934c429
commit 6b64eabbfb
5 changed files with 21 additions and 14 deletions
@@ -16,6 +16,12 @@ lacked dedicated provenance (codex review v3.18.1 items #7 and #8):
the fraction with min_dhash_independent <= 5, broken out by
Firm A vs Non-Firm-A.
+Firm A membership is defined throughout via accountants.firm (the CPA
+registry firm) joined on signatures.assigned_accountant. This matches
+the convention used by signature_analysis/24_validation_recalibration.py
+and the validation_recalibration JSON, so counts are directly comparable
+to Tables IX / XI / XII.
Output:
/Volumes/NV2/PDF-Processing/signature-analysis/reports/byte_identity_decomp/
byte_identity_decomposition.json
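
The membership convention described in this docstring can be sketched
against an in-memory SQLite database. This is a minimal illustration, not
the script itself: the table and column names follow the diff, but the
reduced schema, the helper name cross_firm_counts, the placeholder label
"Firm A", and all rows are assumptions made up for the example.

```python
import sqlite3

FIRM_A = "Firm A"  # placeholder label; the real constant is not shown here


def cross_firm_counts(conn, firm_a=FIRM_A):
    """Group by registry firm (accountants.firm), not signatures.excel_firm."""
    cur = conn.execute(
        """
        SELECT
            CASE WHEN a.firm = ? THEN 'Firm A' ELSE 'Non-Firm-A' END
                AS firm_group,
            COUNT(*) AS n_signatures_above_095,
            SUM(CASE WHEN s.min_dhash_independent <= 5 THEN 1 ELSE 0 END)
                AS n_dhash_le_5
        FROM signatures s
        JOIN accountants a ON s.assigned_accountant = a.name
        WHERE s.max_similarity_to_same_accountant > 0.95
          AND s.min_dhash_independent IS NOT NULL
        GROUP BY firm_group
        ORDER BY firm_group
        """,
        (firm_a,),
    )
    return {group: (n, k) for group, n, k in cur.fetchall()}


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accountants (name TEXT PRIMARY KEY, firm TEXT);
    CREATE TABLE signatures (
        assigned_accountant TEXT,
        excel_firm TEXT,
        max_similarity_to_same_accountant REAL,
        min_dhash_independent INTEGER
    );
    INSERT INTO accountants VALUES ('cpa1', 'Firm A'), ('cpa2', 'Firm B');
    INSERT INTO signatures VALUES
        ('cpa1', 'Firm A', 0.99, 2),
        ('cpa1', 'Firm B', 0.97, 7),    -- excel_firm disagrees with registry
        ('cpa2', 'Firm B', 0.96, 3),
        ('cpa2', 'Firm B', 0.90, 1),    -- below the cosine threshold
        ('cpa1', 'Firm A', 0.98, NULL); -- dropped by the dHash filter
    """
)

counts = cross_firm_counts(conn)

# Same filters, but grouped the old way via signatures.excel_firm.
legacy_firm_a = conn.execute(
    """
    SELECT COUNT(*) FROM signatures
    WHERE excel_firm = ?
      AND max_similarity_to_same_accountant > 0.95
      AND min_dhash_independent IS NOT NULL
    """,
    (FIRM_A,),
).fetchone()[0]
```

On this toy data the registry-based grouping counts two Firm A
signatures while the excel_firm grouping counts one, which is the kind of
one-record disagreement that mixing the two conventions can produce.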
@@ -57,9 +63,10 @@ def byte_identity_decomposition(conn):
s1.year_month AS ym_a,
s2.year_month AS ym_b
FROM signatures s1
+JOIN accountants a ON s1.assigned_accountant = a.name
JOIN signatures s2 ON s1.closest_match_file = s2.image_filename
WHERE s1.pixel_identical_to_closest = 1
-AND s1.excel_firm = ?
+AND a.firm = ?
)
SELECT
COUNT(*) AS total_pixel_identical_firm_a,
@@ -94,15 +101,15 @@ def cross_firm_dual_convergence(conn):
cur.execute("""
SELECT
-CASE WHEN excel_firm = ? THEN 'Firm A' ELSE 'Non-Firm-A' END
+CASE WHEN a.firm = ? THEN 'Firm A' ELSE 'Non-Firm-A' END
AS firm_group,
COUNT(*) AS n_signatures_above_095,
-SUM(CASE WHEN min_dhash_independent <= 5 THEN 1 ELSE 0 END)
+SUM(CASE WHEN s.min_dhash_independent <= 5 THEN 1 ELSE 0 END)
AS n_dhash_le_5
-FROM signatures
-WHERE max_similarity_to_same_accountant > 0.95
-AND assigned_accountant IS NOT NULL
-AND min_dhash_independent IS NOT NULL
+FROM signatures s
+JOIN accountants a ON s.assigned_accountant = a.name
+WHERE s.max_similarity_to_same_accountant > 0.95
+AND s.min_dhash_independent IS NOT NULL
GROUP BY firm_group
ORDER BY firm_group
""", (FIRM_A,))