# Paper Outline: Expert-Augmented LLM Ideation

## Suggested Titles

1. **"Breaking Semantic Gravity: Expert-Augmented LLM Ideation for Enhanced Creativity"**
2. "Beyond Interpolation: Multi-Expert Perspectives for Combinatorial Innovation"
3. "Escaping the Relevance Trap: Structured Expert Frameworks for Creative AI"
4. "From Crowd to Expert: Simulating Diverse Perspectives for LLM-Based Ideation"

---

## Abstract (Draft)

Large Language Models (LLMs) are increasingly used for creative ideation, yet they exhibit a phenomenon we term "semantic gravity" - the tendency to generate outputs clustered around high-probability regions of their training distribution. This limits the novelty and diversity of generated ideas. We investigate two complementary strategies to overcome this limitation: (1) **attribute decomposition**, which structures the problem space before creative exploration, and (2) **expert perspective transformation**, which conditions LLM generation on simulated domain-expert viewpoints. Through a 2×2 factorial experiment comparing Direct generation, Expert-Only, Attribute-Only, and Full Pipeline (both factors combined), we demonstrate that each factor independently improves semantic diversity, with the combination producing super-additive effects. Our Full Pipeline achieves [X]% higher semantic diversity and [Y]% lower patent overlap compared to direct generation. We contribute a theoretical framework explaining LLM creativity limitations and an open-source system for innovation ideation.

---

## 1. Introduction

### 1.1 The Promise and Problem of LLM Creativity
- LLMs widely adopted for creative tasks
- Initial enthusiasm: infinite idea generation
- Emerging concern: quality and diversity issues

### 1.2 The Semantic Gravity Problem
- Define the phenomenon
- Why it occurs (statistical learning, mode collapse)
- Why it matters (innovation requires novelty)

### 1.3 Our Solution: Expert-Augmented Ideation
- Brief overview of the approach
- Key insight: expert perspectives as semantic "escape velocity"
- Contributions preview

### 1.4 Paper Organization
- Roadmap for the rest of the paper

---

## 2. Related Work

### 2.1 Theoretical Foundations
- Semantic distance and creativity (Mednick, 1962)
- Conceptual blending theory (Fauconnier & Turner)
- Design fixation (Jansson & Smith)
- Constraint-based creativity

### 2.2 LLM Limitations in Creative Generation
- Design fixation from AI (CHI 2024)
- Dual mechanisms: inspiration vs. fixation
- Bias and pattern perpetuation

### 2.3 Persona-Based Prompting
- PersonaFlow (2024)
- BILLY persona vectors (2025)
- Quantifying persona effects (ACL 2024)

### 2.4 Creativity Support Tools
- Wisdom of crowds approaches
- Human-AI collaboration in ideation
- Evaluation methods (CAT, semantic distance)

### 2.5 Positioning Our Work

**Key distinction from PersonaFlow (closest related work)**:

```
PersonaFlow:  Query → Experts → Ideas (no problem structure)
Our approach: Query → Attributes → (Attributes × Experts) → Ideas
```

- PersonaFlow applies experts to whole query; we apply experts to decomposed attributes
- PersonaFlow cannot isolate what helps; our 2×2 factorial design tests each factor
- We hypothesize attribute decomposition **amplifies** expert effectiveness (interaction effect)
- PersonaFlow showed experts help; we test whether **structuring the problem first** makes experts more effective

---

## 3. System Design

### 3.1 Overview
- Pipeline diagram
- Design rationale

### 3.2 Attribute Decomposition
- Category analysis (dynamic vs. fixed)
- Attribute generation per category
- DAG relationship mapping

### 3.3 Expert Team Generation
- Expert sources: LLM-generated, curated, external databases
- Diversity optimization strategies
- Domain coverage considerations

### 3.4 Expert Transformation
- Conditioning mechanism
- Keyword generation
- Description generation
- Parallel processing for efficiency

### 3.5 Semantic Deduplication
- Embedding-based approach
- LLM-based approach
- Threshold selection

### 3.6 Novelty Validation
- Patent search integration
- Overlap scoring

---

## 4. Experiments

### 4.1 Research Questions
- RQ1: Does attribute decomposition improve semantic diversity?
- RQ2: Does expert perspective transformation improve semantic diversity?
- RQ3: Is there an interaction effect between the two factors?
- RQ4: Which combination produces the highest patent novelty?
- RQ5: How do expert sources (LLM vs Curated vs External) affect quality?
- RQ6: What is the hallucination/nonsense rate of context-free keyword generation?

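RQ1 through RQ3 map onto three contrasts over the four factorial cell means. The sketch below is one hedged way to compute them descriptively before running the ANOVA; the function name `factorial_contrasts`, the dictionary layout, and the assumption of one diversity score per query are illustrative choices, not part of the planned analysis pipeline, and the numbers in the usage note are placeholders rather than results.

```python
from statistics import mean


def factorial_contrasts(scores):
    """Descriptive main effects and interaction for the 2x2 design.

    `scores` maps each condition name (as used in the abstract) to a list
    of semantic-diversity scores, assumed here to be one score per query.
    This only summarizes cell means; significance still needs the ANOVA.
    """
    m = {name: mean(values) for name, values in scores.items()}
    direct, expert = m["Direct"], m["Expert-Only"]
    attribute, full = m["Attribute-Only"], m["Full Pipeline"]
    return {
        # RQ1: conditions with attributes minus conditions without
        "attribute_main_effect": (attribute + full) / 2 - (direct + expert) / 2,
        # RQ2: conditions with experts minus conditions without
        "expert_main_effect": (expert + full) / 2 - (direct + attribute) / 2,
        # RQ3: difference of differences; a positive value would be
        # consistent with a super-additive combination
        "interaction": (full - attribute) - (expert - direct),
    }


if __name__ == "__main__":
    placeholder = {
        "Direct": [0.2, 0.4],
        "Expert-Only": [0.4, 0.6],
        "Attribute-Only": [0.3, 0.5],
        "Full Pipeline": [0.7, 0.9],
    }
    print(factorial_contrasts(placeholder))
```

An interaction contrast near zero would indicate that the two factors combine additively, which is the alternative that RQ3 has to rule out.
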
#### 4.1.1 Design Note: Context-Free Keyword Generation

Our system intentionally excludes the original query during keyword generation:
- Stage 1: Expert sees attribute only (e.g., "wood" + "accountant"), NOT the query ("chair")
- Stage 2: Expert applies keyword to original query with context
- Rationale: Maximize semantic distance for novelty
- Risk: Some ideas may be too distant (nonsense/hallucination)
- RQ6 investigates this tradeoff

### 4.2 Experimental Setup

#### 4.2.1 Dataset
- 30 queries for ideation (see experimental_protocol.md)
- Selection criteria: diverse domains, complexity levels
- Categories: everyday objects, technology/tools, services/systems

#### 4.2.2 Conditions (2×2 Factorial Design plus Control)

| Condition | Attributes | Experts | Description |
|-----------|------------|---------|-------------|
| **C1: Direct** | ❌ | ❌ | Baseline: "Generate 20 creative ideas for [query]" |
| **C2: Expert-Only** | ❌ | ✅ | Expert personas generate for whole query |
| **C3: Attribute-Only** | ✅ | ❌ | Decompose query, direct generate per attribute |
| **C4: Full Pipeline** | ✅ | ✅ | Decompose query, experts generate per attribute |
| **C5: Random-Perspective** | ❌ | (random) | Control: 4 random words as "perspectives" |

#### 4.2.3 Controls
- Same LLM (specify version)
- Same temperature settings
- Same total idea count per condition (20 ideas)

### 4.3 Metrics

#### 4.3.1 Semantic Diversity
- Mean pairwise cosine distance between embeddings
- Cluster distribution analysis
- Silhouette score for idea clustering

#### 4.3.2 Novelty
- Patent overlap rate
- Semantic distance from query centroid

#### 4.3.3 Quality (Human Evaluation)
- Novelty rating (1-7 Likert)
- Usefulness rating (1-7 Likert)
- Creativity rating (1-7 Likert)
- **Relevance rating (1-7 Likert) - for RQ6**
- Interrater reliability (Cronbach's alpha)

#### 4.3.4 Nonsense/Hallucination Analysis (RQ6) - Three Methods

| Method | Metric | Purpose |
|--------|--------|---------|
| Automatic | Semantic distance threshold (>0.85) | Fast screening |
| LLM-as-Judge | GPT-4 relevance score (1-3) | Scalable evaluation |
| Human | Relevance rating (1-7 Likert) | Gold standard validation |

Triangulate all three methods to validate findings.

### 4.4 Procedure
- Idea generation process
- Evaluation process
- Statistical analysis methods

---

## 5. Results

### 5.1 Main Effect of Attribute Decomposition (RQ1)
- Compare: (Attribute-Only + Full Pipeline) vs (Direct + Expert-Only)
- Quantitative results
- Statistical significance (ANOVA main effect)

### 5.2 Main Effect of Expert Perspectives (RQ2)
- Compare: (Expert-Only + Full Pipeline) vs (Direct + Attribute-Only)
- Quantitative results
- Statistical significance (ANOVA main effect)

### 5.3 Interaction Effect (RQ3)
- 2×2 interaction analysis
- Visualization: interaction plot
- Evidence for super-additive vs additive effects

### 5.4 Patent Novelty (RQ4)
- Overlap rates by condition
- Full Pipeline vs other conditions
- Examples of high-novelty ideas

### 5.5 Expert Source Comparison (RQ5)
- LLM-generated vs curated vs external
- Unconventionality metrics
- Restricted to the conditions that use experts (Expert-Only and Full Pipeline)

### 5.6 Control Condition Analysis
- Expert-Only vs Random-Perspective
- Tests whether expert knowledge matters beyond arbitrary perspective cues

### 5.7 Hallucination/Nonsense Analysis (RQ6)
- Nonsense rate by condition (LLM-as-judge)
- Semantic distance threshold analysis
- Novelty-usefulness tradeoff visualization
- Is the context-free design worth the hallucination cost?

### 5.8 Human Evaluation Results
- Rating distributions by condition
- 2×2 pattern in human judgments
- Correlation with automatic metrics

---

## 6. Discussion

### 6.1 Interpreting the Results
- Why each factor contributes independently
- The interaction: why attributes amplify expert effectiveness
- Theoretical explanation via conceptual blending

### 6.2 Theoretical Implications
- Semantic gravity as framework for LLM creativity
- Two complementary escape mechanisms
- Structured decomposition as "scaffolding" for creative exploration

### 6.3 Practical Implications
- When to use multi-expert approach
- Expert selection strategies
- Integration with existing workflows

### 6.4 Limitations
- LLM-specific results may not generalize
- Patent overlap as proxy for true novelty
- Human evaluation subjectivity
- Single-language experiments

### 6.5 Future Work
- Cross-cultural creativity
- Domain-specific expert optimization
- Real-world deployment studies
- Integration with other creativity techniques

---

## 7. Conclusion
- Summary of contributions
- Key takeaways
- Broader impact

---

## Appendices

### A. Prompt Templates
- Expert generation prompts
- Keyword generation prompts
- Description generation prompts

### B. Full Experimental Results
- Complete data tables
- Additional visualizations

### C. Expert Source Details
- Curated occupation list
- DBpedia/Wikidata query details

### D. Human Evaluation Protocol
- Instructions for raters
- Example ratings
- Training materials

---

## Target Venues

### Tier 1 (Recommended)
1. **CHI** - ACM Conference on Human Factors in Computing Systems
   - Strong fit: creativity support tools, human-AI collaboration
   - Deadline: typically September
2. **CSCW** - ACM Conference on Computer-Supported Cooperative Work
   - Good fit: collaborative ideation, crowd wisdom
   - Deadline: typically April/January
3. **Creativity & Cognition** - ACM Conference
   - Perfect fit: computational creativity focus
   - Smaller but specialized venue

### Tier 2 (Alternative)
4. **DIS** - ACM Designing Interactive Systems
   - Good fit: design ideation tools
5. **UIST** - ACM Symposium on User Interface Software and Technology
   - If system/interaction focus emphasized
6. **ICCC** - International Conference on Computational Creativity
   - Specialized computational creativity venue

### Journal Options
1. **International Journal of Human-Computer Studies (IJHCS)**
2. **ACM Transactions on Computer-Human Interaction (TOCHI)**
3. **Design Studies**
4. **Creativity Research Journal**

---

## Timeline Checklist

- [ ] Finalize experimental design
- [ ] Collect/select query dataset
- [ ] Run all experimental conditions
- [ ] Compute automatic metrics
- [ ] Design human evaluation study
- [ ] Recruit evaluators
- [ ] Conduct human evaluation
- [ ] Statistical analysis
- [ ] Write first draft
- [ ] Internal review
- [ ] Revision
- [ ] Submit
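
For the "Compute automatic metrics" item, the primary diversity metric from Section 4.3.1 (mean pairwise cosine distance between idea embeddings) can be sketched in a few lines. This is a minimal standard-library illustration, assuming each idea has already been embedded as a list of floats by some embedding model; the function names are hypothetical and the outline does not specify an embedding model or library.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_cosine_distance(embeddings):
    """Average cosine distance over all unordered pairs of embeddings.

    Higher values mean the ideas in one condition are more spread out
    in embedding space.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real idea embeddings.
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(mean_pairwise_cosine_distance(toy))
```

With 20 ideas per condition this averages over 190 pairs, so the pure-Python loop is fast enough; a vectorized version would only matter for much larger idea sets.
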