# Paper Outline: Expert-Augmented LLM Ideation

## Suggested Titles

1. **"Breaking Semantic Gravity: Expert-Augmented LLM Ideation for Enhanced Creativity"**
2. "Beyond Interpolation: Multi-Expert Perspectives for Combinatorial Innovation"
3. "Escaping the Relevance Trap: Structured Expert Frameworks for Creative AI"
4. "From Crowd to Expert: Simulating Diverse Perspectives for LLM-Based Ideation"

---

## Abstract (Draft)

Large Language Models (LLMs) are increasingly used for creative ideation, yet they exhibit a phenomenon we term "semantic gravity": the tendency to generate outputs clustered around high-probability regions of their training distribution. This clustering limits the novelty and diversity of generated ideas. We investigate two complementary strategies to overcome this limitation: (1) **attribute decomposition**, which structures the problem space before creative exploration, and (2) **expert perspective transformation**, which conditions LLM generation on simulated domain-expert viewpoints. Through a 2×2 factorial experiment comparing Direct generation, Expert-Only, Attribute-Only, and Full Pipeline (both factors combined), we demonstrate that each factor independently improves semantic diversity, with the combination producing super-additive effects. Our Full Pipeline achieves [X]% higher semantic diversity and [Y]% lower patent overlap compared to direct generation. We contribute a theoretical framework explaining LLM creativity limitations and an open-source system for innovation ideation.

---

## 1. Introduction

### 1.1 The Promise and Problem of LLM Creativity
- LLMs widely adopted for creative tasks
- Initial enthusiasm: infinite idea generation
- Emerging concern: quality and diversity issues

### 1.2 The Semantic Gravity Problem
- Define the phenomenon
- Why it occurs (statistical learning, mode collapse)
- Why it matters (innovation requires novelty)

### 1.3 Our Solution: Expert-Augmented Ideation
- Brief overview of the approach
- Key insight: expert perspectives as semantic "escape velocity"
- Contributions preview

### 1.4 Paper Organization
- Roadmap for the rest of the paper

---

## 2. Related Work

### 2.1 Theoretical Foundations
- Semantic distance and creativity (Mednick, 1962)
- Conceptual blending theory (Fauconnier & Turner)
- Design fixation (Jansson & Smith)
- Constraint-based creativity

### 2.2 LLM Limitations in Creative Generation
- Design fixation from AI (CHI 2024)
- Dual mechanisms: inspiration vs. fixation
- Bias and pattern perpetuation

### 2.3 Persona-Based Prompting
- PersonaFlow (2024)
- BILLY persona vectors (2025)
- Quantifying persona effects (ACL 2024)

### 2.4 Creativity Support Tools
- Wisdom of crowds approaches
- Human-AI collaboration in ideation
- Evaluation methods (CAT, semantic distance)

### 2.5 Positioning Our Work

**Key distinction from PersonaFlow (closest related work)**:
```
PersonaFlow:  Query → Experts → Ideas (no problem structure)
Our approach: Query → Attributes → (Attributes × Experts) → Ideas
```

- PersonaFlow applies experts to whole query; we apply experts to decomposed attributes
- PersonaFlow cannot isolate what helps; our 2×2 factorial design tests each factor
- We hypothesize attribute decomposition **amplifies** expert effectiveness (interaction effect)
- PersonaFlow showed experts help; we test whether **structuring the problem first** makes experts more effective
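
The difference between the two flows can be sketched as a difference in which (input, expert) pairs get sent to the model. This is a minimal illustration, not the system's actual code: the function names, the attribute list, and the second expert are hypothetical, and only "chair", "wood", and "accountant" come from the examples in this outline.

```python
from itertools import product


def expert_only_tasks(query, experts):
    """Whole-query flow: each expert is paired with the undecomposed query."""
    return [(query, expert) for expert in experts]


def full_pipeline_tasks(attributes, experts):
    """Attribute x expert flow: each expert is paired with each attribute."""
    return list(product(attributes, experts))


# Hypothetical example values for illustration only.
query = "chair"
attributes = ["wood", "four legs", "seating"]
experts = ["accountant", "marine biologist"]

whole_query = expert_only_tasks(query, experts)
decomposed = full_pipeline_tasks(attributes, experts)

print(len(whole_query))  # one task per expert
print(len(decomposed))   # one task per (attribute, expert) pair
```

Because the decomposed flow produces more generation tasks than the whole-query flow, the total idea count has to be equalized across conditions before diversity is compared.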

---

## 3. System Design

### 3.1 Overview
- Pipeline diagram
- Design rationale

### 3.2 Attribute Decomposition
- Category analysis (dynamic vs. fixed)
- Attribute generation per category
- DAG relationship mapping

### 3.3 Expert Team Generation
- Expert sources: LLM-generated, curated, external databases
- Diversity optimization strategies
- Domain coverage considerations

### 3.4 Expert Transformation
- Conditioning mechanism
- Keyword generation
- Description generation
- Parallel processing for efficiency

### 3.5 Semantic Deduplication
- Embedding-based approach
- LLM-based approach
- Threshold selection
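
One way the embedding-based approach could work is a greedy pass that keeps an idea only if it is not too similar to anything already kept. The sketch below assumes ideas have already been embedded as non-zero vectors; the function names and the 0.9 similarity threshold are illustrative assumptions, not values from the system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def deduplicate(ideas, embeddings, threshold=0.9):
    """Greedy deduplication: keep an idea only if its similarity to every
    previously kept idea is below `threshold` (an assumed default)."""
    kept = []  # indices of ideas retained so far
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [ideas[i] for i in kept]


# Toy 2-D vectors standing in for real sentence embeddings.
ideas = ["idea A", "idea A, reworded", "idea B"]
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(deduplicate(ideas, vectors))
```

The greedy pass is order-dependent: the first idea in a near-duplicate group is the one that survives. Threshold selection trades off removing paraphrases against discarding genuinely distinct ideas.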

### 3.6 Novelty Validation
- Patent search integration
- Overlap scoring

---

## 4. Experiments

### 4.1 Research Questions
- RQ1: Does attribute decomposition improve semantic diversity?
- RQ2: Does expert perspective transformation improve semantic diversity?
- RQ3: Is there an interaction effect between the two factors?
- RQ4: Which combination produces the highest patent novelty?
- RQ5: How do expert sources (LLM vs Curated vs External) affect quality?
- RQ6: What is the hallucination/nonsense rate of context-free keyword generation?

#### 4.1.1 Design Note: Context-Free Keyword Generation
Our system intentionally excludes the original query during keyword generation:
- Stage 1: Expert sees attribute only (e.g., "wood" + "accountant"), NOT the query ("chair")
- Stage 2: Expert applies keyword to original query with context
- Rationale: Maximize semantic distance for novelty
- Risk: Some ideas may be too distant (nonsense/hallucination)
- RQ6 investigates this tradeoff
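
The two stages above can be sketched as two prompt builders, where only the second one receives the query. The prompt wording, the function names, and the example keyword "depreciation" are hypothetical; the actual templates belong in Appendix A.

```python
def keyword_prompt(attribute, expert):
    """Stage 1: build a prompt from the attribute and expert role only.
    The original query is deliberately not a parameter here."""
    return (
        f"Role: {expert}. From this professional perspective, "
        f"list keywords associated with the attribute '{attribute}'."
    )


def description_prompt(query, expert, keyword):
    """Stage 2: reintroduce the query and apply a Stage 1 keyword to it."""
    return (
        f"Role: {expert}. Apply the concept '{keyword}' to '{query}' "
        f"and describe one concrete idea."
    )


stage1 = keyword_prompt("wood", "accountant")
# "depreciation" is a made-up keyword an accountant persona might return.
stage2 = description_prompt("chair", "accountant", "depreciation")

print(stage1)
print(stage2)
```

Keeping the query out of the Stage 1 function signature makes the context-free constraint structural rather than a matter of prompt discipline.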

### 4.2 Experimental Setup

#### 4.2.1 Dataset
- 30 queries for ideation (see experimental_protocol.md)
- Selection criteria: diverse domains, complexity levels
- Categories: everyday objects, technology/tools, services/systems

#### 4.2.2 Conditions (2×2 Factorial Design)
| Condition | Attributes | Experts | Description |
|-----------|------------|---------|-------------|
| **C1: Direct** | ❌ | ❌ | Baseline: "Generate 20 creative ideas for [query]" |
| **C2: Expert-Only** | ❌ | ✅ | Expert personas generate for whole query |
| **C3: Attribute-Only** | ✅ | ❌ | Decompose query, direct generate per attribute |
| **C4: Full Pipeline** | ✅ | ✅ | Decompose query, experts generate per attribute |
| **C5: Random-Perspective** | ❌ | (random) | Control: 4 random words as "perspectives" |

#### 4.2.3 Controls
- Same LLM model (specify version)
- Same temperature settings
- Same total idea count per condition (20 ideas)

### 4.3 Metrics

#### 4.3.1 Semantic Diversity
- Mean pairwise cosine distance between embeddings
- Cluster distribution analysis
- Silhouette score for idea clustering
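
The first metric above can be computed directly from the idea embeddings. This is a minimal sketch using plain Python lists as stand-ins for real embedding vectors; the function names are illustrative, and the embedding model is left unspecified.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_distance(embeddings):
    """Average cosine distance over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


# Toy vectors: two identical ideas and one orthogonal idea.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(mean_pairwise_distance(toy))
```

With 20 ideas per condition this averages over 190 pairs per query, so the quadratic cost is negligible.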

#### 4.3.2 Novelty
- Patent overlap rate
- Semantic distance from query centroid

#### 4.3.3 Quality (Human Evaluation)
- Novelty rating (1-7 Likert)
- Usefulness rating (1-7 Likert)
- Creativity rating (1-7 Likert)
- **Relevance rating (1-7 Likert) - for RQ6**
- Interrater reliability (Cronbach's alpha)
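
For the reliability check, Cronbach's alpha can be computed by treating each rater as an "item" and each idea as a case: alpha = k/(k-1) × (1 − Σ per-rater variance / variance of summed scores). The sketch below assumes every rater scored the same ideas in the same order; the function name and the toy ratings are illustrative.

```python
from statistics import variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha with raters as items.

    `ratings_by_rater` is a list of equal-length lists, one per rater,
    where position i in each list is that rater's score for idea i.
    """
    k = len(ratings_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    # Sum of each rater's sample variance across ideas.
    sum_rater_var = sum(variance(scores) for scores in ratings_by_rater)
    # Variance of the per-idea total score summed over raters.
    totals = [sum(scores) for scores in zip(*ratings_by_rater)]
    return (k / (k - 1)) * (1 - sum_rater_var / variance(totals))


# Toy ratings: rater 2 is uniformly one point harsher than rater 1.
toy_ratings = [[2, 3, 4, 5], [1, 2, 3, 4]]
print(cronbach_alpha(toy_ratings))
```

Alpha measures consistency of rank ordering, not absolute agreement: a rater who is uniformly harsher still yields a high alpha, so an agreement statistic may be worth reporting alongside it.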

#### 4.3.4 Nonsense/Hallucination Analysis (RQ6) - Three Methods
| Method | Metric | Purpose |
|--------|--------|---------|
| Automatic | Semantic distance threshold (>0.85) | Fast screening |
| LLM-as-Judge | GPT-4 relevance score (1-3) | Scalable evaluation |
| Human | Relevance rating (1-7 Likert) | Gold standard validation |

Triangulate all three to validate findings
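
One possible way to triangulate is a majority vote over the three signals. The sketch below is only an assumption about how the methods might be combined: the 0.85 distance threshold comes from the table, while the LLM-judge cutoff, the human-rating cutoff, the two-of-three rule, and the function name are all hypothetical.

```python
def flag_nonsense(distance, llm_score, human_rating,
                  distance_threshold=0.85, llm_cutoff=1, human_cutoff=2):
    """Flag an idea as likely nonsense when at least two methods agree.

    distance:     semantic distance of the idea from the query
    llm_score:    LLM-as-judge relevance score on the 1-3 scale
    human_rating: human relevance rating on the 1-7 Likert scale
    """
    votes = [
        distance > distance_threshold,  # automatic screen (threshold from table)
        llm_score <= llm_cutoff,        # assumed cutoff: lowest judge score
        human_rating <= human_cutoff,   # assumed cutoff: bottom of Likert scale
    ]
    return sum(votes) >= 2


print(flag_nonsense(0.90, 1, 6))  # automatic and LLM judge agree
print(flag_nonsense(0.90, 3, 6))  # only the automatic screen fires
```

Cases where the automatic screen fires alone are the interesting ones for RQ6, since they are distant from the query but still judged relevant.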

### 4.4 Procedure
- Idea generation process
- Evaluation process
- Statistical analysis methods

---

## 5. Results

### 5.1 Main Effect of Attribute Decomposition (RQ1)
- Compare: (Attribute-Only + Full Pipeline) vs (Direct + Expert-Only)
- Quantitative results
- Statistical significance (ANOVA main effect)

### 5.2 Main Effect of Expert Perspectives (RQ2)
- Compare: (Expert-Only + Full Pipeline) vs (Direct + Attribute-Only)
- Quantitative results
- Statistical significance (ANOVA main effect)

### 5.3 Interaction Effect (RQ3)
- 2×2 interaction analysis
- Visualization: interaction plot
- Evidence for super-additive vs additive effects
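
The super-additive versus additive question reduces to the sign of the 2×2 interaction contrast: the gain from experts when attributes are present, minus the gain from experts when they are absent. The sketch below computes only that point estimate; the function name is illustrative, the scores are obviously artificial toy numbers rather than results, and a significance test would still require the ANOVA interaction term.

```python
from statistics import mean


def interaction_contrast(direct, expert_only, attribute_only, full_pipeline):
    """Difference-of-differences for a 2x2 design.

    Each argument is a list of per-query scores (e.g. semantic diversity)
    for one condition. A positive result suggests a super-additive pattern,
    zero an additive one, and a negative result a sub-additive one.
    """
    expert_gain_without_attributes = mean(expert_only) - mean(direct)
    expert_gain_with_attributes = mean(full_pipeline) - mean(attribute_only)
    return expert_gain_with_attributes - expert_gain_without_attributes


# Artificial toy scores chosen for easy arithmetic, not experimental data.
contrast = interaction_contrast(
    direct=[1.0, 1.0],
    expert_only=[2.0, 2.0],
    attribute_only=[2.0, 2.0],
    full_pipeline=[4.0, 4.0],
)
print(contrast)
```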

### 5.4 Patent Novelty (RQ4)
- Overlap rates by condition
- Full Pipeline vs other conditions
- Examples of high-novelty ideas

### 5.5 Expert Source Comparison (RQ5)
- LLM-generated vs curated vs external
- Unconventionality metrics
- Compared only within the conditions that use experts (Expert-Only and Full Pipeline)

### 5.6 Control Condition Analysis
- Expert-Only vs Random-Perspective
- Validates that expert knowledge matters

### 5.7 Hallucination/Nonsense Analysis (RQ6)
- Nonsense rate by condition (LLM-as-judge)
- Semantic distance threshold analysis
- Novelty-usefulness tradeoff visualization
- Is the context-free design worth the hallucination cost?

### 5.8 Human Evaluation Results
- Rating distributions by condition
- 2×2 pattern in human judgments
- Correlation with automatic metrics

---

## 6. Discussion

### 6.1 Interpreting the Results
- Why each factor contributes independently
- The interaction: why attributes amplify expert effectiveness
- Theoretical explanation via conceptual blending

### 6.2 Theoretical Implications
- Semantic gravity as framework for LLM creativity
- Two complementary escape mechanisms
- Structured decomposition as "scaffolding" for creative exploration

### 6.3 Practical Implications
- When to use multi-expert approach
- Expert selection strategies
- Integration with existing workflows

### 6.4 Limitations
- LLM-specific results may not generalize
- Patent overlap as proxy for true novelty
- Human evaluation subjectivity
- Single-language experiments

### 6.5 Future Work
- Cross-cultural creativity
- Domain-specific expert optimization
- Real-world deployment studies
- Integration with other creativity techniques

---

## 7. Conclusion

- Summary of contributions
- Key takeaways
- Broader impact

---

## Appendices

### A. Prompt Templates
- Expert generation prompts
- Keyword generation prompts
- Description generation prompts

### B. Full Experimental Results
- Complete data tables
- Additional visualizations

### C. Expert Source Details
- Curated occupation list
- DBpedia/Wikidata query details

### D. Human Evaluation Protocol
- Instructions for raters
- Example ratings
- Training materials

---

## Target Venues

### Tier 1 (Recommended)
1. **CHI** - ACM Conference on Human Factors in Computing Systems
   - Strong fit: creativity support tools, human-AI collaboration
   - Deadline: typically September

2. **CSCW** - ACM Conference on Computer-Supported Cooperative Work
   - Good fit: collaborative ideation, crowd wisdom
   - Deadline: typically April/January

3. **Creativity & Cognition** - ACM Conference
   - Perfect fit: computational creativity focus
   - Smaller but specialized venue

### Tier 2 (Alternative)
4. **DIS** - ACM Designing Interactive Systems
   - Good fit: design ideation tools

5. **UIST** - ACM Symposium on User Interface Software and Technology
   - If system/interaction focus emphasized

6. **ICCC** - International Conference on Computational Creativity
   - Specialized computational creativity venue

### Journal Options
1. **International Journal of Human-Computer Studies (IJHCS)**
2. **ACM Transactions on Computer-Human Interaction (TOCHI)**
3. **Design Studies**
4. **Creativity Research Journal**

---

## Timeline Checklist

- [ ] Finalize experimental design
- [ ] Collect/select query dataset
- [ ] Run all experimental conditions
- [ ] Compute automatic metrics
- [ ] Design human evaluation study
- [ ] Recruit evaluators
- [ ] Conduct human evaluation
- [ ] Statistical analysis
- [ ] Write first draft
- [ ] Internal review
- [ ] Revision
- [ ] Submit