Paper Outline: Expert-Augmented LLM Ideation
Suggested Titles
- "Breaking Semantic Gravity: Expert-Augmented LLM Ideation for Enhanced Creativity"
- "Beyond Interpolation: Multi-Expert Perspectives for Combinatorial Innovation"
- "Escaping the Relevance Trap: Structured Expert Frameworks for Creative AI"
- "From Crowd to Expert: Simulating Diverse Perspectives for LLM-Based Ideation"
Abstract (Draft)
Large Language Models (LLMs) are increasingly used for creative ideation, yet they exhibit a phenomenon we term "semantic gravity": the tendency to generate outputs clustered around high-probability regions of their training distribution. This limits the novelty and diversity of generated ideas. We investigate two complementary strategies to overcome this limitation: (1) attribute decomposition, which structures the problem space before creative exploration, and (2) expert perspective transformation, which conditions LLM generation on simulated domain-expert viewpoints. Through a 2×2 factorial experiment comparing Direct generation, Expert-Only, Attribute-Only, and Full Pipeline (both factors combined) conditions, we demonstrate that each factor independently improves semantic diversity and that their combination produces super-additive effects. Our Full Pipeline achieves [X]% higher semantic diversity and [Y]% lower patent overlap than direct generation. We contribute a theoretical framework explaining LLM creativity limitations and an open-source system for innovation ideation.
1. Introduction
1.1 The Promise and Problem of LLM Creativity
- LLMs widely adopted for creative tasks
- Initial enthusiasm: infinite idea generation
- Emerging concern: quality and diversity issues
1.2 The Semantic Gravity Problem
- Define the phenomenon
- Why it occurs (statistical learning, mode collapse)
- Why it matters (innovation requires novelty)
1.3 Our Solution: Expert-Augmented Ideation
- Brief overview of the approach
- Key insight: expert perspectives as semantic "escape velocity"
- Contributions preview
1.4 Paper Organization
- Roadmap for the rest of the paper
2. Related Work
2.1 Theoretical Foundations
- Semantic distance and creativity (Mednick, 1962)
- Conceptual blending theory (Fauconnier & Turner, 2002)
- Design fixation (Jansson & Smith, 1991)
- Constraint-based creativity
2.2 LLM Limitations in Creative Generation
- Design fixation from AI (CHI 2024)
- Dual mechanisms: inspiration vs. fixation
- Bias and pattern perpetuation
2.3 Persona-Based Prompting
- PersonaFlow (2024)
- BILLY persona vectors (2025)
- Quantifying persona effects (ACL 2024)
2.4 Creativity Support Tools
- Wisdom of crowds approaches
- Human-AI collaboration in ideation
- Evaluation methods: Consensual Assessment Technique (CAT), semantic distance
2.5 Positioning Our Work
Key distinction from PersonaFlow (closest related work):
PersonaFlow: Query → Experts → Ideas (no problem structure)
Our approach: Query → Attributes → (Attributes × Experts) → Ideas
- PersonaFlow applies experts to whole query; we apply experts to decomposed attributes
- PersonaFlow's design cannot isolate which component drives improvement; our 2×2 factorial design tests each factor separately
- We hypothesize that attribute decomposition amplifies expert effectiveness (an interaction effect)
- PersonaFlow showed that experts help; we test whether structuring the problem first makes experts more effective
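The structural difference between the two pipelines can be made concrete by enumerating the generation units each one produces. The sketch below is a minimal illustration, assuming one LLM call per unit; the function names and the example attributes and experts are hypothetical, and the PersonaFlow side reflects only the characterization above, not its actual implementation.

```python
from itertools import product


def query_expert_units(query: str, experts: list[str]) -> list[tuple[str, str]]:
    """PersonaFlow-style (as characterized above): each expert sees the whole query."""
    return [(query, expert) for expert in experts]


def attribute_expert_units(
    query: str, attributes: list[str], experts: list[str]
) -> list[tuple[str, str, str]]:
    """Attributes x Experts: every (attribute, expert) pair becomes its own unit."""
    return [(query, attr, expert) for attr, expert in product(attributes, experts)]


# Hypothetical example inputs.
experts = ["accountant", "marine biologist", "choreographer"]
attributes = ["wood", "four legs", "backrest"]

print(len(query_expert_units("chair", experts)))                  # 3
print(len(attribute_expert_units("chair", attributes, experts)))  # 9
```

The cross product is also what makes the factorial comparison possible: collapsing the expert axis corresponds to Attribute-Only, and collapsing the attribute axis corresponds to Expert-Only.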
3. System Design
3.1 Overview
- Pipeline diagram
- Design rationale
3.2 Attribute Decomposition
- Category analysis (dynamic vs. fixed)
- Attribute generation per category
- DAG relationship mapping
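One way to represent the attribute DAG is a mapping from each attribute to the attributes it depends on, validated with a topological sort so that cyclic relations are caught before generation. This is a sketch under assumptions: the categories, attributes, and the meaning of an edge ("depends on") are hypothetical, and it relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical decomposition of "chair": category -> attributes.
categories = {
    "material": ["wood", "steel"],
    "structure": ["four legs", "backrest"],
    "function": ["seating", "stacking"],
}

# Hypothetical relations: attribute -> attributes it depends on (its predecessors).
depends_on = {
    "stacking": {"four legs", "steel"},
    "backrest": {"four legs"},
    "seating": {"backrest"},
}


def attribute_order(categories, depends_on):
    """Return all attributes in dependency order; raises CycleError on cyclic relations."""
    graph = {attr: set() for attrs in categories.values() for attr in attrs}
    for attr, deps in depends_on.items():
        graph.setdefault(attr, set()).update(deps)
    return list(TopologicalSorter(graph).static_order())


order = attribute_order(categories, depends_on)
print(order)  # predecessors always appear before the attributes that depend on them

try:
    attribute_order({"misc": ["x", "y"]}, {"x": {"y"}, "y": {"x"}})
except CycleError:
    print("cycle detected: relations must be revised before generation")
```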
3.3 Expert Team Generation
- Expert sources: LLM-generated, curated, external databases
- Diversity optimization strategies
- Domain coverage considerations
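As one possible diversity optimization strategy, the sketch below uses greedy max-min (farthest-point) selection: it repeatedly adds the candidate expert whose closest already-chosen expert is farthest away. This is a standard heuristic swapped in for illustration, not necessarily the strategy the system uses. It assumes each candidate has a non-zero embedding vector (for example, of its description); the names and toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def select_diverse_experts(candidates, k):
    """Greedy max-min selection over a dict of expert name -> embedding."""
    names = list(candidates)
    if k >= len(names):
        return names
    chosen = [names[0]]  # seed with the first candidate (arbitrary but deterministic)
    while len(chosen) < k:
        # Pick the unchosen candidate whose nearest chosen expert is farthest away.
        best = max(
            (n for n in names if n not in chosen),
            key=lambda n: min(cosine_distance(candidates[n], candidates[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical embeddings: "auditor" is nearly identical to "accountant".
candidates = {
    "accountant": [1.0, 0.0, 0.0],
    "auditor": [0.95, 0.05, 0.0],
    "choreographer": [0.0, 1.0, 0.0],
    "marine biologist": [0.0, 0.0, 1.0],
}
print(select_diverse_experts(candidates, 3))
```

The near-duplicate "auditor" is skipped because it adds almost no distance from "accountant"; domain coverage constraints could be layered on top, for example by requiring at least one expert per domain before the greedy step.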
3.4 Expert Transformation
- Conditioning mechanism
- Keyword generation
- Description generation
- Parallel processing for efficiency
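Parallel processing of expert transformations can be sketched with `asyncio`: one coroutine per (attribute, expert) unit, with a semaphore capping concurrent LLM calls. This is a minimal sketch, assuming an async LLM client; `fake_llm`, the prompt text, and the concurrency limit are hypothetical stand-ins.

```python
import asyncio


async def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an async LLM client call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"keyword for <{prompt}>"


async def transform_all(units, max_concurrency=8):
    """Run one expert transformation per (attribute, expert) unit, bounded by a semaphore."""
    sem = asyncio.Semaphore(max_concurrency)

    async def run(attribute, expert):
        async with sem:  # at most max_concurrency calls in flight
            prompt = f"As a professional {expert}, give one keyword inspired by '{attribute}'."
            return attribute, expert, await fake_llm(prompt)

    # gather returns results in the same order as the input units
    return await asyncio.gather(*(run(a, e) for a, e in units))


units = [("wood", "accountant"), ("wood", "choreographer"), ("backrest", "accountant")]
results = asyncio.run(transform_all(units, max_concurrency=2))
for attribute, expert, keyword in results:
    print(attribute, expert, keyword)
```

Bounding concurrency rather than firing every call at once is a common way to stay within provider rate limits while still overlapping network latency.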
3.5 Semantic Deduplication
- Embedding-based approach
- LLM-based approach
- Threshold selection
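The embedding-based approach can be sketched as a greedy pass that keeps an idea only if its cosine similarity to every idea kept so far is below a threshold. The threshold value of 0.9 and the toy vectors are assumptions for illustration; the actual threshold is what the selection step above would determine.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def deduplicate(ideas, embeddings, threshold=0.9):
    """Keep an idea only if it is below `threshold` similarity to all kept ideas."""
    kept, kept_vecs = [], []
    for idea, vec in zip(ideas, embeddings):
        if all(cosine_similarity(vec, k) < threshold for k in kept_vecs):
            kept.append(idea)
            kept_vecs.append(vec)
    return kept


ideas = ["ledger-shaped armrest", "ledger armrest with tabs", "kelp-fiber seat"]
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]  # hypothetical embeddings
print(deduplicate(ideas, embeddings))  # ['ledger-shaped armrest', 'kelp-fiber seat']
```

Because the pass is greedy, the result depends on input order: the first member of each near-duplicate group survives. Sweeping the threshold against a small hand-labeled set of duplicate pairs is one way to approach threshold selection.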
3.6 Novelty Validation
- Patent search integration
- Overlap scoring
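A simple overlap score, assuming the patent search returns embeddings for the top retrieved patents per idea, is the similarity to the closest retrieved patent; the condition-level overlap rate is then the fraction of ideas whose score meets a threshold. Both the scoring rule and the 0.8 threshold are illustrative assumptions, not the system's confirmed method.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def overlap_score(idea_vec, patent_vecs):
    """Similarity to the closest retrieved patent (0.0 if nothing was retrieved)."""
    return max((cosine_similarity(idea_vec, p) for p in patent_vecs), default=0.0)


def overlap_rate(idea_vecs, patents_per_idea, threshold=0.8):
    """Fraction of ideas whose closest patent meets the (assumed) overlap threshold."""
    flags = [overlap_score(v, ps) >= threshold for v, ps in zip(idea_vecs, patents_per_idea)]
    return sum(flags) / len(flags)


# Hypothetical: idea 1 closely matches a retrieved patent; idea 2 retrieved nothing.
idea_vecs = [[1.0, 0.0], [0.0, 1.0]]
patents_per_idea = [[[1.0, 0.1]], []]
print(overlap_rate(idea_vecs, patents_per_idea))  # 0.5
```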
4. Experiments
4.1 Research Questions
- RQ1: Does attribute decomposition improve semantic diversity?
- RQ2: Does expert perspective transformation improve semantic diversity?
- RQ3: Is there an interaction effect between the two factors?
- RQ4: Which combination produces the highest patent novelty?
- RQ5: How do expert sources (LLM-generated vs. curated vs. external) affect idea quality?
- RQ6: What is the hallucination/nonsense rate of context-free keyword generation?
4.1.1 Design Note: Context-Free Keyword Generation
Our system intentionally excludes the original query during keyword generation:
- Stage 1: The expert sees only the attribute (e.g., attribute "wood" with expert "accountant"), NOT the query ("chair")
- Stage 2: The expert applies the generated keyword to the original query, now with full context
- Rationale: Maximize semantic distance for novelty
- Risk: Some ideas may be too distant (nonsense/hallucination)
- RQ6 investigates this tradeoff
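The two-stage design can be expressed as two prompt builders, where only the second one receives the query. The wording below is hypothetical (the actual templates belong in Appendix A), and "depreciation" is an invented stage-1 keyword used only to show the data flow.

```python
def stage1_prompt(attribute: str, expert: str) -> str:
    """Stage 1: the expert sees only the attribute; the query is deliberately withheld."""
    return (
        f"Adopt the perspective of a professional {expert}. "
        f"List one keyword that the concept '{attribute}' brings to mind."
    )


def stage2_prompt(query: str, attribute: str, expert: str, keyword: str) -> str:
    """Stage 2: the expert applies the stage-1 keyword to the original query, in context."""
    return (
        f"Adopt the perspective of a professional {expert}. Apply the idea '{keyword}' "
        f"(derived from the attribute '{attribute}') to propose a new design for a {query}. "
        f"Describe it in two sentences."
    )


p1 = stage1_prompt("wood", "accountant")
p2 = stage2_prompt("chair", "wood", "accountant", "depreciation")
assert "chair" not in p1  # the context-free property that RQ6 investigates
print(p2)
```

Keeping the query out of stage 1 is what pushes the keyword away from the query's high-probability neighborhood; stage 2 is where relevance is restored, and where an overly distant keyword would surface as nonsense.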
4.2 Experimental Setup
4.2.1 Dataset
- 30 queries for ideation (see experimental_protocol.md)
- Selection criteria: diverse domains and varying complexity levels
- Categories: everyday objects, technology/tools, services/systems
4.2.2 Conditions (2×2 Factorial Design + Control)
| Condition | Attributes | Experts | Description |
|---|---|---|---|
| C1: Direct | ❌ | ❌ | Baseline: "Generate 20 creative ideas for [query]" |
| C2: Expert-Only | ❌ | ✅ | Expert personas generate for whole query |
| C3: Attribute-Only | ✅ | ❌ | Decompose query, generate directly per attribute |
| C4: Full Pipeline | ✅ | ✅ | Decompose query, experts generate per attribute |
| C5: Random-Perspective | ❌ | (random) | Control: 4 random words as "perspectives" |
4.2.3 Controls
- Same LLM (specify model and version)
- Same temperature settings
- Same total idea count per condition (20 ideas)
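Holding the total at 20 ideas means that conditions with many generation units (e.g., the Full Pipeline's attributes × experts) must split the budget. A minimal sketch of an even split, assuming the hypothetical case of 3 attributes × 4 experts (4 matching the number of random perspectives in C5):

```python
def allocate_quota(total: int, n_units: int) -> list[int]:
    """Split `total` ideas as evenly as possible across `n_units` generation units."""
    base, remainder = divmod(total, n_units)
    # The first `remainder` units each receive one extra idea.
    return [base + 1 if i < remainder else base for i in range(n_units)]


# Hypothetical: 3 attributes x 4 experts = 12 units in the Full Pipeline condition.
print(allocate_quota(20, 12))  # [2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1]
# Expert-Only with 4 experts on the whole query.
print(allocate_quota(20, 4))   # [5, 5, 5, 5]
```

When there are more units than ideas, some units receive zero; over-generating and then randomly subsampling to 20 is one alternative that keeps every unit represented before sampling.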
4.3 Metrics
4.3.1 Semantic Diversity
- Mean pairwise cosine distance between embeddings
- Cluster distribution analysis
- Silhouette score for idea clustering
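Mean pairwise cosine distance averages 1 − cos(u, v) over all unordered pairs of idea embeddings in a condition. A dependency-free sketch, with toy two-dimensional vectors standing in for real embeddings:

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_distance(embeddings):
    """Average cosine distance over all unordered pairs (requires at least 2 embeddings)."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


ideas = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(round(mean_pairwise_distance(ideas), 4))  # 0.5286
```

For the silhouette score, scikit-learn's `sklearn.metrics.silhouette_score(X, labels, metric="cosine")` computes the same quantity given cluster labels from any clustering step, so the distance definition stays consistent across the two metrics.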
4.3.2 Novelty
- Patent overlap rate
- Semantic distance from query centroid
4.3.3 Quality (Human Evaluation)
- Novelty rating (1-7 Likert)
- Usefulness rating (1-7 Likert)
- Creativity rating (1-7 Likert)
- Relevance rating (1-7 Likert) - for RQ6
- Interrater reliability (Cronbach's alpha)
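Treating raters as the "items", Cronbach's alpha is α = k/(k − 1) · (1 − Σ σ²_rater / σ²_total), where k is the number of raters and σ²_total is the variance of each idea's summed score. A sketch using sample variances throughout, with hypothetical ratings:

```python
from statistics import variance


def cronbach_alpha(ratings):
    """ratings[i][j] = score rater j gave idea i (needs >= 2 raters and >= 2 ideas)."""
    k = len(ratings[0])
    rater_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)


# Hypothetical 1-7 Likert ratings: 3 ideas, 2 raters.
ratings = [[2, 3], [4, 5], [6, 5]]
print(round(cronbach_alpha(ratings), 3))  # 0.857
```

An intraclass correlation coefficient (ICC) is a common alternative for rater agreement and may be worth reporting alongside alpha.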
4.3.4 Nonsense/Hallucination Analysis (RQ6) - Three Methods
| Method | Metric | Purpose |
|---|---|---|
| Automatic | Semantic distance threshold (>0.85) | Fast screening |
| LLM-as-Judge | GPT-4 relevance score (1-3) | Scalable evaluation |
| Human | Relevance rating (1-7 Likert) | Gold standard validation |
Triangulate all three methods to validate findings
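One way to triangulate is to reduce each method to a binary "nonsense" flag per idea and measure pairwise agreement with Cohen's kappa. Only the >0.85 distance threshold comes from the table above; the LLM-judge rule (score 1 = irrelevant), the human cutoff (mean relevance ≤ 2), and all data values are hypothetical.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of boolean flags."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


distances = [0.91, 0.40, 0.88, 0.30, 0.95, 0.20]  # hypothetical per-idea semantic distances
judge_scores = [1, 3, 2, 3, 1, 2]                 # hypothetical LLM-as-judge relevance (1-3)
human_means = [1.7, 5.3, 4.0, 6.1, 2.0, 4.8]      # hypothetical mean human relevance (1-7)

auto_flag = [d > 0.85 for d in distances]         # threshold from the table above
judge_flag = [s == 1 for s in judge_scores]       # assumption: 1 = irrelevant
human_flag = [m <= 2.0 for m in human_means]      # assumption: cutoff at 2 on the 1-7 scale

print(round(cohens_kappa(auto_flag, judge_flag), 3))   # 0.667
print(round(cohens_kappa(judge_flag, human_flag), 3))  # 1.0
```

Ideas where the methods disagree (here, the third idea: distant but judged partially relevant) are the most informative cases for deciding whether the automatic threshold is too aggressive.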
4.4 Procedure
- Idea generation process
- Evaluation process
- Statistical analysis methods
5. Results
5.1 Main Effect of Attribute Decomposition (RQ1)
- Compare: (Attribute-Only + Full Pipeline) vs (Direct + Expert-Only)
- Quantitative results
- Statistical significance (ANOVA main effect)
5.2 Main Effect of Expert Perspectives (RQ2)
- Compare: (Expert-Only + Full Pipeline) vs (Direct + Attribute-Only)
- Quantitative results
- Statistical significance (ANOVA main effect)
5.3 Interaction Effect (RQ3)
- 2×2 interaction analysis
- Visualization: interaction plot
- Evidence for super-additive vs additive effects
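The main-effect comparisons in 5.1 and 5.2 and the interaction in 5.3 correspond to simple contrasts on the four cell means. The interaction contrast (C4 − C3) − (C2 − C1) is positive when the expert gain is larger with attributes present, i.e., super-additive. The cell means below are hypothetical placeholders, not results.

```python
def factorial_effects(c1, c2, c3, c4):
    """Main effects and interaction contrast from 2x2 cell means.

    c1 = Direct, c2 = Expert-Only, c3 = Attribute-Only, c4 = Full Pipeline.
    """
    attribute_main = (c3 + c4) / 2 - (c1 + c2) / 2
    expert_main = (c2 + c4) / 2 - (c1 + c3) / 2
    # Positive -> experts help more when attributes are present (super-additive).
    interaction = (c4 - c3) - (c2 - c1)
    return attribute_main, expert_main, interaction


# Hypothetical mean pairwise distances per condition.
effects = factorial_effects(0.30, 0.38, 0.36, 0.50)
print(tuple(round(x, 2) for x in effects))  # (0.09, 0.11, 0.06)
```

These point estimates say nothing about significance on their own; in a two-way ANOVA (e.g., `statsmodels` `ols("diversity ~ C(attributes) * C(experts)", data)` followed by `anova_lm`), the interaction term tests whether this contrast differs from zero.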
5.4 Patent Novelty (RQ4)
- Overlap rates by condition
- Full Pipeline vs other conditions
- Examples of high-novelty ideas
5.5 Expert Source Comparison (RQ5)
- LLM-generated vs curated vs external
- Unconventionality metrics
- Restricted to conditions that use experts (C2 Expert-Only, C4 Full Pipeline)
5.6 Control Condition Analysis
- Expert-Only vs Random-Perspective
- Validates expert knowledge matters
5.7 Hallucination/Nonsense Analysis (RQ6)
- Nonsense rate by condition (LLM-as-judge)
- Semantic distance threshold analysis
- Novelty-usefulness tradeoff visualization
- Is the context-free design worth the hallucination cost?
5.8 Human Evaluation Results
- Rating distributions by condition
- 2×2 pattern in human judgments
- Correlation with automatic metrics
6. Discussion
6.1 Interpreting the Results
- Why each factor contributes independently
- The interaction: why attributes amplify expert effectiveness
- Theoretical explanation via conceptual blending
6.2 Theoretical Implications
- Semantic gravity as framework for LLM creativity
- Two complementary escape mechanisms
- Structured decomposition as "scaffolding" for creative exploration
6.3 Practical Implications
- When to use multi-expert approach
- Expert selection strategies
- Integration with existing workflows
6.4 Limitations
- Results obtained with specific LLMs may not generalize to other models
- Patent overlap as proxy for true novelty
- Human evaluation subjectivity
- Single-language experiments
6.5 Future Work
- Cross-cultural creativity
- Domain-specific expert optimization
- Real-world deployment studies
- Integration with other creativity techniques
7. Conclusion
- Summary of contributions
- Key takeaways
- Broader impact
Appendices
A. Prompt Templates
- Expert generation prompts
- Keyword generation prompts
- Description generation prompts
B. Full Experimental Results
- Complete data tables
- Additional visualizations
C. Expert Source Details
- Curated occupation list
- DBpedia/Wikidata query details
D. Human Evaluation Protocol
- Instructions for raters
- Example ratings
- Training materials
Target Venues
Tier 1 (Recommended)
- CHI - ACM Conference on Human Factors in Computing Systems
  - Strong fit: creativity support tools, human-AI collaboration
  - Deadline: typically September
- CSCW - ACM Conference on Computer-Supported Cooperative Work and Social Computing
  - Good fit: collaborative ideation, crowd wisdom
  - Deadline: typically April/January
- Creativity & Cognition - ACM Conference on Creativity & Cognition (C&C)
  - Perfect fit: computational creativity focus
  - Smaller but specialized venue
Tier 2 (Alternative)
- DIS - ACM Conference on Designing Interactive Systems
  - Good fit: design ideation tools
- UIST - ACM Symposium on User Interface Software and Technology
  - Good fit if the system/interaction contribution is emphasized
- ICCC - International Conference on Computational Creativity
  - Specialized computational creativity venue
Journal Options
- International Journal of Human-Computer Studies (IJHCS)
- ACM Transactions on Computer-Human Interaction (TOCHI)
- Design Studies
- Creativity Research Journal
Timeline Checklist
- Finalize experimental design
- Collect/select query dataset
- Run all experimental conditions
- Compute automatic metrics
- Design human evaluation study
- Recruit evaluators
- Conduct human evaluation
- Statistical analysis
- Write first draft
- Internal review
- Revision
- Submit