pdf_signature_extraction/paper/build_docx.sh
gbanyan 939a348da4 Add Paper A (IEEE TAI) complete draft with Firm A-calibrated dual-method classification
Paper draft includes all sections (Abstract through Conclusion), 36 references,
and supporting scripts. Key methodology: Cosine similarity + dHash dual-method
verification with thresholds calibrated against known-replication firm (Firm A).

Includes:
- 8 section markdown files (paper_a_*.md)
- Ablation study script (ResNet-50 vs VGG-16 vs EfficientNet-B0)
- Recalibrated classification script (84,386 PDFs, 5-tier system)
- Figure generation and Word export scripts
- Citation renumbering script ([1]-[36])
- Signature analysis pipeline (12 steps)
- YOLO extraction scripts

Three rounds of AI review completed (GPT-5.4, Claude Opus 4.6, Gemini 3 Pro).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:05:33 +08:00


#!/bin/bash
# Build complete Paper A Word document from section markdown files
# Uses pandoc with embedded figures
PAPER_DIR="/Volumes/NV2/pdf_recognize/paper"
FIG_DIR="/Volumes/NV2/PDF-Processing/signature-analysis/paper_figures"
OUTPUT="$PAPER_DIR/Paper_A_IEEE_TAI_Draft_v2.docx"
# Create combined markdown with title page
cat > "$PAPER_DIR/_combined.md" << 'TITLEEOF'
---
title: "Automated Detection of Digitally Replicated Signatures in Large-Scale Financial Audit Reports"
author: "[Authors removed for double-blind review]"
date: ""
geometry: margin=1in
fontsize: 11pt
---
TITLEEOF
# Append each section, separated by blank lines, with HTML comment blocks removed
for section in \
  paper_a_abstract.md \
  paper_a_impact_statement.md \
  paper_a_introduction.md \
  paper_a_related_work.md \
  paper_a_methodology.md \
  paper_a_results.md \
  paper_a_discussion.md \
  paper_a_conclusion.md \
  paper_a_references.md
do
  echo "" >> "$PAPER_DIR/_combined.md"
  # Strip HTML comments (assumed to start at the beginning of a line and end at
  # the end of a line). One-line comments are deleted first; otherwise the range
  # /^<!--/,/-->$/ would stay open past them and delete ordinary body text.
  sed -e '/^<!--.*-->$/d' -e '/^<!--/,/-->$/d' "$PAPER_DIR/$section" >> "$PAPER_DIR/_combined.md"
  echo "" >> "$PAPER_DIR/_combined.md"
done
# Insert figure images after the lines that reference them.
# These commands use BSD/macOS sed syntax (sed -i '' and multi-line a\ text).
# Fig 1 after the line containing "Fig. 1 illustrates the overall architecture."
sed -i '' "/Fig\. 1 illustrates the overall architecture\./a\\
\\
![Fig. 1. Pipeline architecture for automated signature replication detection.]($FIG_DIR/fig1_pipeline.png){width=100%}\\
" "$PAPER_DIR/_combined.md"
# Fig 2 after the line starting "Fig. 2 presents the cosine"
sed -i '' "/^Fig. 2 presents the cosine/a\\
\\
![Fig. 2. Cosine similarity distributions: intra-class vs. inter-class. KDE crossover at 0.837.]($FIG_DIR/fig2_intra_inter_kde.png){width=60%}\\
" "$PAPER_DIR/_combined.md"
# Fig 3 after "Fig. 3 presents"
sed -i '' "/^Fig. 3 presents/a\\
\\
![Fig. 3. Per-signature best-match cosine similarity: Firm A vs. other CPAs.]($FIG_DIR/fig3_firm_a_calibration.png){width=60%}\\
" "$PAPER_DIR/_combined.md"
# Fig 4 after the ablation paragraph ("To validate the choice of ResNet-50 ... we conducted")
sed -i '' "/^To validate the choice of ResNet-50.*we conducted/a\\
\\
![Fig. 4. Ablation study: backbone comparison.]($FIG_DIR/fig4_ablation.png){width=100%}\\
" "$PAPER_DIR/_combined.md"
# Build with pandoc. No --reference-doc is passed, so pandoc's built-in default
# reference.docx supplies the Word styles (/dev/null is not a valid .docx and
# cannot serve as a reference document).
if ! pandoc "$PAPER_DIR/_combined.md" \
    -f markdown \
    -o "$OUTPUT" \
    --wrap=none; then
  echo "pandoc failed; combined markdown left at $PAPER_DIR/_combined.md" >&2
  exit 1
fi
# Clean up
rm -f "$PAPER_DIR/_combined.md"
echo "Output: $OUTPUT"
ls -lh "$OUTPUT"
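A `sed` range address such as `/^<!--/,/-->$/d` only tests its end address on the lines after the one that opened the range, so a comment that opens and closes on a single line keeps the range open and deletes ordinary text up to the next line ending in `-->`. A minimal sketch of a safer comment stripper, assuming comments always begin at the start of a line and end at the end of a line; `strip_html_comments` is a hypothetical helper name, not part of the original script:

```bash
#!/bin/bash
# Sketch: strip whole-line HTML comment blocks from markdown read on stdin.
# Assumption: every comment starts at the beginning of a line ("<!--") and
# ends at the end of a line ("-->"); inline comments are left untouched.
# strip_html_comments is a hypothetical name, not from the build script.
strip_html_comments() {
  # 1st expression: delete one-line comments. "d" ends the cycle, so the
  #   range in the 2nd expression never starts on such a line.
  # 2nd expression: delete multi-line blocks, opener through closer.
  sed -e '/^<!--.*-->$/d' -e '/^<!--/,/-->$/d'
}

# Demonstration: a one-line comment and a multi-line comment around body text.
printf '%s\n' '# Title' '<!-- note -->' 'Kept paragraph.' '<!--' 'draft' '-->' 'Also kept.' \
  | strip_html_comments
```

With the range expression alone, `Kept paragraph.` would be deleted as well, because the one-line comment above it opens a range that only closes at the later `-->` line.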
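The figure insertions edit `_combined.md` in place with `sed -i ''`, which is BSD/macOS syntax; GNU sed has no separate suffix argument and reads the `''` as its script. A sketch of a portable alternative that writes through a temporary file with `awk`, assuming one image block should follow every matching line (as sed's `a\` does); `insert_after_match` is a hypothetical helper name:

```bash
#!/bin/bash
# Sketch: append a blank line, a markdown image line, and a blank line after
# every line matching an extended regex (ERE), without sed -i, so it behaves
# the same with BSD and GNU tools. insert_after_match is a hypothetical name.
insert_after_match() {
  local file=$1 pattern=$2 text=$3 tmp
  tmp=$(mktemp) || return 1
  # Pass values through the environment: awk -v would process backslash
  # escapes, so the "\." in a regex such as '^Fig\. 2' could lose its backslash.
  PAT=$pattern TXT=$text awk '
    { print }
    $0 ~ ENVIRON["PAT"] { print ""; print ENVIRON["TXT"]; print "" }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back rather than mv, so the target file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demonstration on a throwaway file.
demo=$(mktemp)
printf '%s\n' 'Fig. 2 presents the cosine similarity distributions.' 'Next paragraph.' > "$demo"
insert_after_match "$demo" '^Fig\. 2 presents the cosine' '![Fig. 2. Caption.](fig2.png){width=60%}'
cat "$demo"
rm -f "$demo"
```

In the build script, a call such as `insert_after_match "$PAPER_DIR/_combined.md" '^Fig\. 3 presents' "![Fig. 3. Per-signature best-match cosine similarity: Firm A vs. other CPAs.]($FIG_DIR/fig3_firm_a_calibration.png){width=60%}"` could stand in for one `sed -i ''` command.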
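To build against a Word template (for example, a journal style file), `--reference-doc` has to point to an existing .docx. A sketch that adds the option only when such a file exists, assuming the template path is supplied by the caller (one way to obtain an editable starting point is `pandoc -o custom-reference.docx --print-default-data-file reference.docx`); `build_pandoc_args`, `PANDOC_ARGS`, and `REF_DOC` are hypothetical names:

```bash
#!/bin/bash
# Sketch: collect pandoc arguments in a bash array, adding --reference-doc only
# when the given template file exists. build_pandoc_args and PANDOC_ARGS are
# hypothetical names, not part of the original build script.
build_pandoc_args() {
  local input=$1 output=$2 ref_doc=${3:-}
  PANDOC_ARGS=("$input" -f markdown -o "$output" --wrap=none)
  # Only add the template if a non-empty path to a regular file was supplied.
  if [ -n "$ref_doc" ] && [ -f "$ref_doc" ]; then
    PANDOC_ARGS+=("--reference-doc=$ref_doc")
  fi
}

# Demonstration without running pandoc: show the arguments that would be used.
build_pandoc_args _combined.md Paper.docx /nonexistent/template.docx
printf '%s\n' "${PANDOC_ARGS[@]}"
```

The build step would then run `build_pandoc_args "$PAPER_DIR/_combined.md" "$OUTPUT" "${REF_DOC:-}"` followed by `pandoc "${PANDOC_ARGS[@]}"`; with no template configured, pandoc uses its built-in reference.docx.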