# Paper Outline: Expert-Augmented LLM Ideation
## Suggested Titles
1. **"Breaking Semantic Gravity: Expert-Augmented LLM Ideation for Enhanced Creativity"**
2. "Beyond Interpolation: Multi-Expert Perspectives for Combinatorial Innovation"
3. "Escaping the Relevance Trap: Structured Expert Frameworks for Creative AI"
4. "From Crowd to Expert: Simulating Diverse Perspectives for LLM-Based Ideation"
---
## Abstract (Draft)
Large Language Models (LLMs) are increasingly used for creative ideation, yet they exhibit a phenomenon we term "semantic gravity" - the tendency to generate outputs clustered around high-probability regions of their training distribution. This limits the novelty and diversity of generated ideas. We investigate two complementary strategies to overcome this limitation: (1) **attribute decomposition**, which structures the problem space before creative exploration, and (2) **expert perspective transformation**, which conditions LLM generation on simulated domain-expert viewpoints. Through a 2×2 factorial experiment comparing Direct generation, Expert-Only, Attribute-Only, and Full Pipeline (both factors combined), we demonstrate that each factor independently improves semantic diversity, with the combination producing super-additive effects. Our Full Pipeline achieves [X]% higher semantic diversity and [Y]% lower patent overlap compared to direct generation. We contribute a theoretical framework explaining LLM creativity limitations and an open-source system for innovation ideation.
---
## 1. Introduction
### 1.1 The Promise and Problem of LLM Creativity
- LLMs widely adopted for creative tasks
- Initial enthusiasm: infinite idea generation
- Emerging concern: quality and diversity issues
### 1.2 The Semantic Gravity Problem
- Define the phenomenon
- Why it occurs (statistical learning, mode collapse)
- Why it matters (innovation requires novelty)
### 1.3 Our Solution: Expert-Augmented Ideation
- Brief overview of the approach
- Key insight: expert perspectives as semantic "escape velocity"
- Contributions preview
### 1.4 Paper Organization
- Roadmap for the rest of the paper
---
## 2. Related Work
### 2.1 Theoretical Foundations
- Semantic distance and creativity (Mednick, 1962)
- Conceptual blending theory (Fauconnier & Turner, 2002)
- Design fixation (Jansson & Smith, 1991)
- Constraint-based creativity
### 2.2 LLM Limitations in Creative Generation
- Design fixation from AI (CHI 2024)
- Dual mechanisms: inspiration vs. fixation
- Bias and pattern perpetuation
### 2.3 Persona-Based Prompting
- PersonaFlow (2024)
- BILLY persona vectors (2025)
- Quantifying persona effects (ACL 2024)
### 2.4 Creativity Support Tools
- Wisdom of crowds approaches
- Human-AI collaboration in ideation
- Evaluation methods (CAT, semantic distance)
### 2.5 Positioning Our Work
**Key distinction from PersonaFlow (closest related work)**:
```
PersonaFlow: Query → Experts → Ideas (no problem structure)
Our approach: Query → Attributes → (Attributes × Experts) → Ideas
```
- PersonaFlow applies experts to whole query; we apply experts to decomposed attributes
- PersonaFlow cannot isolate what helps; our 2×2 factorial design tests each factor
- We hypothesize attribute decomposition **amplifies** expert effectiveness (interaction effect)
- PersonaFlow showed experts help; we test whether **structuring the problem first** makes experts more effective
---
## 3. System Design
### 3.1 Overview
- Pipeline diagram
- Design rationale
### 3.2 Attribute Decomposition
- Category analysis (dynamic vs. fixed)
- Attribute generation per category
- DAG relationship mapping
### 3.3 Expert Team Generation
- Expert sources: LLM-generated, curated, external databases
- Diversity optimization strategies
- Domain coverage considerations
### 3.4 Expert Transformation
- Conditioning mechanism
- Keyword generation
- Description generation
- Parallel processing for efficiency
### 3.5 Semantic Deduplication
- Embedding-based approach
- LLM-based approach
- Threshold selection
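The embedding-based approach above can be sketched as a greedy filter: keep an idea only if it stays below a cosine-similarity threshold against everything already kept. This is a minimal sketch, not the system's implementation; the `embed` step is assumed to happen upstream, and the 0.9 threshold is illustrative (Section 3.5's "threshold selection" is exactly the question of tuning it).

```python
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def deduplicate(ideas, embeddings, threshold=0.9):
    """Greedy semantic dedup: an idea survives only if it is below the
    similarity threshold against every idea already kept.
    `threshold` is an illustrative value, not the system's setting."""
    kept, kept_vecs = [], []
    for idea, vec in zip(ideas, embeddings):
        if all(cosine_sim(vec, kv) < threshold for kv in kept_vecs):
            kept.append(idea)
            kept_vecs.append(vec)
    return kept
```

The LLM-based alternative would instead ask a judge model whether two ideas are paraphrases; the embedding route is cheaper but sensitive to the threshold choice.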
### 3.6 Novelty Validation
- Patent search integration
- Overlap scoring
---
## 4. Experiments
### 4.1 Research Questions
- RQ1: Does attribute decomposition improve semantic diversity?
- RQ2: Does expert perspective transformation improve semantic diversity?
- RQ3: Is there an interaction effect between the two factors?
- RQ4: Which combination produces the highest patent novelty?
- RQ5: How do expert sources (LLM vs Curated vs External) affect quality?
- RQ6: What is the hallucination/nonsense rate of context-free keyword generation?
#### 4.1.1 Design Note: Context-Free Keyword Generation
Our system intentionally excludes the original query during keyword generation:
- Stage 1: Expert sees attribute only (e.g., "wood" + "accountant"), NOT the query ("chair")
- Stage 2: Expert applies keyword to original query with context
- Rationale: Maximize semantic distance for novelty
- Risk: Some ideas may be too distant (nonsense/hallucination)
- RQ6 investigates this tradeoff
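The two-stage flow above can be sketched as follows. This is an illustrative sketch only: `llm` stands in for any chat-completion call, and the prompt wordings are hypothetical, not the system's actual templates. The key property it demonstrates is that the original query never appears in the Stage 1 prompt.

```python
def stage1_keyword(llm, attribute, expert):
    """Stage 1: the expert sees only the attribute, never the query,
    so the keyword is pulled toward the expert's domain.
    `llm` is a placeholder for any prompt -> completion callable."""
    prompt = (f"You are a {expert}. Given the attribute '{attribute}', "
              f"name one concept from your field that it evokes. "
              f"Reply with a single keyword.")
    return llm(prompt)

def stage2_idea(llm, query, attribute, keyword, expert):
    """Stage 2: the keyword is re-grounded in the original query,
    with full context restored."""
    prompt = (f"As a {expert}, combine the concept '{keyword}' "
              f"(derived from the attribute '{attribute}') with "
              f"'{query}'. Describe one novel idea in one sentence.")
    return llm(prompt)
```

Because Stage 1 is blind to the query, some keywords will be irrecoverably distant; RQ6's nonsense-rate analysis measures how often Stage 2 fails to re-ground them.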
### 4.2 Experimental Setup
#### 4.2.1 Dataset
- 30 queries for ideation (see experimental_protocol.md)
- Selection criteria: diverse domains, complexity levels
- Categories: everyday objects, technology/tools, services/systems
#### 4.2.2 Conditions (2×2 Factorial Design)
| Condition | Attributes | Experts | Description |
|-----------|------------|---------|-------------|
| **C1: Direct** | ❌ | ❌ | Baseline: "Generate 20 creative ideas for [query]" |
| **C2: Expert-Only** | ❌ | ✅ | Expert personas generate for whole query |
| **C3: Attribute-Only** | ✅ | ❌ | Decompose query, direct generate per attribute |
| **C4: Full Pipeline** | ✅ | ✅ | Decompose query, experts generate per attribute |
| **C5: Random-Perspective** | ❌ | (random) | Control: 4 random words as "perspectives" |
#### 4.2.3 Controls
- Same LLM model (specify version)
- Same temperature settings
- Same total idea count per condition (20 ideas)
### 4.3 Metrics
#### 4.3.1 Semantic Diversity
- Mean pairwise cosine distance between embeddings
- Cluster distribution analysis
- Silhouette score for idea clustering
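The primary diversity metric, mean pairwise cosine distance, is straightforward to state precisely. A minimal pure-Python sketch (in practice embeddings would come from a sentence-embedding model and the computation would be vectorized):

```python
from itertools import combinations
from math import sqrt

def mean_pairwise_distance(embeddings):
    """Semantic diversity of an idea set: mean cosine distance
    (1 - cosine similarity) over all unordered pairs of idea
    embeddings. Higher values indicate a more diverse set."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (sqrt(sum(x * x for x in a)) *
                      sqrt(sum(y * y for y in b)))
    pairs = list(combinations(embeddings, 2))
    return sum(1 - cos(a, b) for a, b in pairs) / len(pairs)
```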
#### 4.3.2 Novelty
- Patent overlap rate
- Semantic distance from query centroid
#### 4.3.3 Quality (Human Evaluation)
- Novelty rating (1-7 Likert)
- Usefulness rating (1-7 Likert)
- Creativity rating (1-7 Likert)
- **Relevance rating (1-7 Likert) - for RQ6**
- Interrater reliability (Cronbach's alpha)
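For the interrater-reliability check, Cronbach's alpha treats each rater as an "item" in the classic formula, alpha = k/(k-1) * (1 - sum of per-rater variances / variance of item totals). A minimal sketch using population variances throughout (a real analysis might use a stats package instead):

```python
def _var(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(ratings):
    """ratings[r][i] = score from rater r on item i (aligned items).
    alpha = k/(k-1) * (1 - sum(var per rater) / var(totals per item))."""
    k = len(ratings)
    n = len(ratings[0])
    rater_vars = sum(_var(r) for r in ratings)
    totals = [sum(ratings[r][i] for r in range(k)) for i in range(n)]
    return k / (k - 1) * (1 - rater_vars / _var(totals))
```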
#### 4.3.4 Nonsense/Hallucination Analysis (RQ6) - Three Methods
| Method | Metric | Purpose |
|--------|--------|---------|
| Automatic | Semantic distance threshold (>0.85) | Fast screening |
| LLM-as-Judge | GPT-4 relevance score (1-3) | Scalable evaluation |
| Human | Relevance rating (1-7 Likert) | Gold standard validation |
We triangulate all three methods to cross-validate the nonsense-rate findings.
### 4.4 Procedure
- Idea generation process
- Evaluation process
- Statistical analysis methods
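The main-effect and interaction comparisons planned for Sections 5.1–5.3 reduce to contrasts of the four cell means of the 2×2 design. A minimal sketch of those contrasts (pure Python; the full analysis would report ANOVA F-tests, e.g. via a stats package, rather than raw contrasts):

```python
def factorial_effects(cells):
    """cells maps (attributes_on, experts_on) boolean pairs to lists of
    per-run diversity scores. Returns (attribute main effect, expert
    main effect, interaction) as contrasts of cell means. A positive
    interaction indicates a super-additive combination."""
    mean = lambda xs: sum(xs) / len(xs)
    m = {k: mean(v) for k, v in cells.items()}
    attr_effect = ((m[(True, False)] + m[(True, True)])
                   - (m[(False, False)] + m[(False, True)])) / 2
    expert_effect = ((m[(False, True)] + m[(True, True)])
                     - (m[(False, False)] + m[(True, False)])) / 2
    interaction = ((m[(True, True)] - m[(True, False)])
                   - (m[(False, True)] - m[(False, False)]))
    return attr_effect, expert_effect, interaction
```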
---
## 5. Results
### 5.1 Main Effect of Attribute Decomposition (RQ1)
- Compare: (Attribute-Only + Full Pipeline) vs (Direct + Expert-Only)
- Quantitative results
- Statistical significance (ANOVA main effect)
### 5.2 Main Effect of Expert Perspectives (RQ2)
- Compare: (Expert-Only + Full Pipeline) vs (Direct + Attribute-Only)
- Quantitative results
- Statistical significance (ANOVA main effect)
### 5.3 Interaction Effect (RQ3)
- 2×2 interaction analysis
- Visualization: interaction plot
- Evidence for super-additive vs additive effects
### 5.4 Patent Novelty (RQ4)
- Overlap rates by condition
- Full Pipeline vs other conditions
- Examples of high-novelty ideas
### 5.5 Expert Source Comparison (RQ5)
- LLM-generated vs curated vs external
- Unconventionality metrics
- Analysis restricted to expert-enabled conditions (Expert-Only, Full Pipeline)
### 5.6 Control Condition Analysis
- Expert-Only vs Random-Perspective
- Validates expert knowledge matters
### 5.7 Hallucination/Nonsense Analysis (RQ6)
- Nonsense rate by condition (LLM-as-judge)
- Semantic distance threshold analysis
- Novelty-usefulness tradeoff visualization
- Is the context-free design worth the hallucination cost?
### 5.8 Human Evaluation Results
- Rating distributions by condition
- 2×2 pattern in human judgments
- Correlation with automatic metrics
---
## 6. Discussion
### 6.1 Interpreting the Results
- Why each factor contributes independently
- The interaction: why attributes amplify expert effectiveness
- Theoretical explanation via conceptual blending
### 6.2 Theoretical Implications
- Semantic gravity as framework for LLM creativity
- Two complementary escape mechanisms
- Structured decomposition as "scaffolding" for creative exploration
### 6.3 Practical Implications
- When to use multi-expert approach
- Expert selection strategies
- Integration with existing workflows
### 6.4 Limitations
- LLM-specific results may not generalize
- Patent overlap as proxy for true novelty
- Human evaluation subjectivity
- Single-language experiments
### 6.5 Future Work
- Cross-cultural creativity
- Domain-specific expert optimization
- Real-world deployment studies
- Integration with other creativity techniques
---
## 7. Conclusion
- Summary of contributions
- Key takeaways
- Broader impact
---
## Appendices
### A. Prompt Templates
- Expert generation prompts
- Keyword generation prompts
- Description generation prompts
### B. Full Experimental Results
- Complete data tables
- Additional visualizations
### C. Expert Source Details
- Curated occupation list
- DBpedia/Wikidata query details
### D. Human Evaluation Protocol
- Instructions for raters
- Example ratings
- Training materials
---
## Target Venues
### Tier 1 (Recommended)
1. **CHI** - ACM Conference on Human Factors in Computing Systems
- Strong fit: creativity support tools, human-AI collaboration
- Deadline: typically September
2. **CSCW** - ACM Conference on Computer-Supported Cooperative Work
- Good fit: collaborative ideation, crowd wisdom
- Deadline: typically April/January
3. **Creativity & Cognition** - ACM Conference
- Perfect fit: computational creativity focus
- Smaller but specialized venue
### Tier 2 (Alternative)
4. **DIS** - ACM Designing Interactive Systems
- Good fit: design ideation tools
5. **UIST** - ACM Symposium on User Interface Software and Technology
- If system/interaction focus emphasized
6. **ICCC** - International Conference on Computational Creativity
- Specialized computational creativity venue
### Journal Options
1. **International Journal of Human-Computer Studies (IJHCS)**
2. **ACM Transactions on Computer-Human Interaction (TOCHI)**
3. **Design Studies**
4. **Creativity Research Journal**
---
## Timeline Checklist
- [ ] Finalize experimental design
- [ ] Collect/select query dataset
- [ ] Run all experimental conditions
- [ ] Compute automatic metrics
- [ ] Design human evaluation study
- [ ] Recruit evaluators
- [ ] Conduct human evaluation
- [ ] Statistical analysis
- [ ] Write first draft
- [ ] Internal review
- [ ] Revision
- [ ] Submit