chore: save local changes

2026-01-05 22:32:08 +08:00
parent bc281b8e0a
commit ec48709755
42 changed files with 5576 additions and 254 deletions

research/README.md
# Research: Expert-Augmented LLM Ideation
This folder contains research materials for the academic paper on the novelty-seeking system.
## Files
| File | Description |
|------|-------------|
| `literature_review.md` | Comprehensive literature review covering semantic distance theory, conceptual blending, design fixation, LLM limitations, and related work |
| `references.md` | 55+ academic references with links to papers |
| `theoretical_framework.md` | The "Semantic Gravity" theoretical model and testable hypotheses |
| `paper_outline.md` | Complete paper structure, experimental design, and target venues |
## Key Theoretical Contribution
**"Semantic Gravity"**: LLMs exhibit a tendency to generate outputs clustered around high-probability regions of their training distribution, limiting creative novelty. Expert perspectives provide "escape velocity" to break free from this gravity.
## Core Hypotheses
1. **H1**: Multi-expert generation → higher semantic diversity
2. **H2**: Multi-expert generation → lower patent overlap (higher novelty)
3. **H3**: Diversity increases with expert count (diminishing returns ~4-6)
4. **H4**: Expert source affects unconventionality of ideas
## Target Venues
- **CHI** (ACM Conference on Human Factors in Computing Systems)
- **CSCW** (ACM Conference on Computer-Supported Cooperative Work and Social Computing)
- **Creativity & Cognition** (ACM Conference)
- **IJHCS** (International Journal of Human-Computer Studies)
## Next Steps
1. Design concrete experiment protocol
2. Add measurement code to existing system
3. Collect experimental data
4. Conduct human evaluation
5. Write and submit paper

# Experimental Protocol: Expert-Augmented LLM Ideation
## Executive Summary
This document outlines a comprehensive experimental design to test the hypothesis that multi-expert LLM-based ideation produces more diverse and novel ideas than direct LLM generation.
---
## 1. Research Questions
| ID | Research Question |
|----|-------------------|
| **RQ1** | Does multi-expert generation produce higher semantic diversity than direct LLM generation? |
| **RQ2** | Does multi-expert generation produce ideas with lower patent overlap (higher novelty)? |
| **RQ3** | What is the optimal number of experts for maximizing diversity? |
| **RQ4** | How do different expert sources (LLM vs Curated vs DBpedia) affect idea quality? |
| **RQ5** | Does structured attribute decomposition enhance the multi-expert effect? |
---
## 2. Experimental Design Overview
### 2.1 Design Type
**Mixed Design**: Between-subjects for main conditions × Within-subjects for queries
### 2.2 Variables
#### Independent Variables (Manipulated)
| Variable | Levels | Your System Parameter |
|----------|--------|----------------------|
| **Generation Method** | 5 levels (see conditions) | Condition-dependent |
| **Expert Count** | 1, 2, 4, 6, 8 | `expert_count` |
| **Expert Source** | LLM, Curated, DBpedia | `expert_source` |
| **Attribute Structure** | With/Without decomposition | Pipeline inclusion |
#### Dependent Variables (Measured)
| Variable | Measurement Method |
|----------|-------------------|
| **Semantic Diversity** | Mean pairwise cosine distance (embeddings) |
| **Cluster Spread** | Number of clusters, silhouette score |
| **Patent Novelty** | 1 - (ideas with patent match / total ideas) |
| **Semantic Distance** | Distance from query centroid |
| **Human Novelty Rating** | 7-point Likert scale |
| **Human Usefulness Rating** | 7-point Likert scale |
| **Human Creativity Rating** | 7-point Likert scale |
#### Control Variables (Held Constant)
| Variable | Fixed Value |
|----------|-------------|
| LLM Model | Qwen3:8b (or specify) |
| Temperature | 0.7 |
| Total Ideas per Query | 20 |
| Keywords per Expert | 1 |
| Deduplication | Disabled for raw comparison |
| Language | English (for patent search) |
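The held-constant parameters above can be pinned in one place so every condition runs with identical settings. A minimal sketch; the `RunConfig` name and field names are illustrative, not part of the existing system:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunConfig:
    """Control variables held constant across all experimental runs."""
    llm_model: str = "qwen3:8b"
    temperature: float = 0.7
    ideas_per_query: int = 20
    keywords_per_expert: int = 1
    deduplication: bool = False   # disabled for raw comparison
    language: str = "en"          # English, for patent search
```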
---
## 3. Experimental Conditions
### 3.1 Main Study: Generation Method Comparison
| Condition | Description | Implementation |
|-----------|-------------|----------------|
| **C1: Direct** | Direct LLM generation | Prompt: "Generate 20 creative ideas for [query]" |
| **C2: Single-Expert** | 1 expert × 20 ideas | `expert_count=1`, `keywords_per_expert=20` |
| **C3: Multi-Expert-4** | 4 experts × 5 ideas each | `expert_count=4`, `keywords_per_expert=5` |
| **C4: Multi-Expert-8** | 8 experts × 2-3 ideas each | `expert_count=8`, `keywords_per_expert=2-3` |
| **C5: Random-Perspective** | 4 random words as "perspectives" | Custom prompt with random nouns |
### 3.2 Expert Count Study
| Condition | Expert Count | Ideas per Expert |
|-----------|--------------|------------------|
| **E1** | 1 | 20 |
| **E2** | 2 | 10 |
| **E4** | 4 | 5 |
| **E6** | 6 | 3-4 |
| **E8** | 8 | 2-3 |
### 3.3 Expert Source Study
| Condition | Source | Implementation |
|-----------|--------|----------------|
| **S-LLM** | LLM-generated | `expert_source=ExpertSource.LLM` |
| **S-Curated** | Curated 210 occupations | `expert_source=ExpertSource.CURATED` |
| **S-DBpedia** | DBpedia 2164 occupations | `expert_source=ExpertSource.DBPEDIA` |
| **S-Random** | Random word "experts" | Custom implementation |
---
## 4. Query Dataset
### 4.1 Design Principles
- **Diversity**: Cover multiple domains (consumer products, technology, services, abstract concepts)
- **Complexity Variation**: Simple objects to complex systems
- **Familiarity Variation**: Common items to specialized equipment
- **Cultural Neutrality**: Concepts understandable across cultures
### 4.2 Query Set (30 Queries)
#### Category A: Everyday Objects (10)
| ID | Query | Complexity |
|----|-------|------------|
| A1 | Chair | Low |
| A2 | Umbrella | Low |
| A3 | Backpack | Low |
| A4 | Coffee mug | Low |
| A5 | Bicycle | Medium |
| A6 | Refrigerator | Medium |
| A7 | Smartphone | Medium |
| A8 | Running shoes | Medium |
| A9 | Kitchen knife | Low |
| A10 | Desk lamp | Low |
#### Category B: Technology & Tools (10)
| ID | Query | Complexity |
|----|-------|------------|
| B1 | Solar panel | Medium |
| B2 | Electric vehicle | High |
| B3 | 3D printer | High |
| B4 | Drone | Medium |
| B5 | Smart thermostat | Medium |
| B6 | Noise-canceling headphones | Medium |
| B7 | Water purifier | Medium |
| B8 | Wind turbine | High |
| B9 | Robotic vacuum | Medium |
| B10 | Wearable fitness tracker | Medium |
#### Category C: Services & Systems (10)
| ID | Query | Complexity |
|----|-------|------------|
| C1 | Food delivery service | Medium |
| C2 | Online education platform | High |
| C3 | Healthcare appointment system | High |
| C4 | Public transportation | High |
| C5 | Hotel booking system | Medium |
| C6 | Personal finance app | Medium |
| C7 | Grocery shopping experience | Medium |
| C8 | Parking solution | Medium |
| C9 | Elderly care service | High |
| C10 | Waste management system | High |
### 4.3 Sample Size Justification
Based on [CHI meta-study on effect sizes](https://dl.acm.org/doi/10.1145/3706598.3713671):
- **Queries**: 30 (crossed with conditions)
- **Expected effect size**: d = 0.5 (medium)
- **Power target**: 80%
- **For automatic metrics**: 30 queries × 5 conditions × 20 ideas = 3,000 ideas
- **For human evaluation**: Subset of 10 queries × 3 conditions × 20 ideas = 600 ideas
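The power target can be sanity-checked with a standard normal-approximation formula for a two-sample comparison. This is a rough sketch; a full analysis would account for the mixed design and crossed queries:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate per-group n for a two-sample test:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, rounded up."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)
```

With d = 0.5 and 80% power this gives roughly 63 observations per condition, comfortably below the 30 queries × 20 ideas available per condition.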
---
## 5. Automatic Metrics Collection
### 5.1 Semantic Diversity Metrics
#### 5.1.1 Mean Pairwise Distance (Primary)
```python
import numpy as np
from typing import List, Tuple

def compute_mean_pairwise_distance(ideas: List[str], embedding_model: str) -> Tuple[float, float]:
    """
    Compute mean cosine distance between all idea pairs.
    Higher = more diverse. Returns (mean, std) over all pairs.
    """
    embeddings = get_embeddings(ideas, model=embedding_model)
    n = len(embeddings)
    distances = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = 1 - cosine_similarity(embeddings[i], embeddings[j])
            distances.append(dist)
    return np.mean(distances), np.std(distances)
```
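Here `get_embeddings` and `cosine_similarity` are assumed project utilities. For reference, the similarity itself reduces to a one-liner; a minimal numpy sketch:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```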
#### 5.1.2 Cluster Analysis
```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from typing import List

def compute_cluster_metrics(ideas: List[str], embedding_model: str) -> dict:
    """
    Analyze idea clustering patterns.
    """
    embeddings = get_embeddings(ideas, model=embedding_model)
    # Find optimal k using silhouette score
    silhouette_scores = []
    for k in range(2, min(len(ideas), 10)):
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
        labels = kmeans.fit_predict(embeddings)
        score = silhouette_score(embeddings, labels)
        silhouette_scores.append((k, score))
    best_k, best_score = max(silhouette_scores, key=lambda x: x[1])
    return {
        'optimal_clusters': best_k,
        'silhouette_score': best_score,
        'cluster_distribution': compute_cluster_sizes(embeddings, best_k)
    }
```
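The `compute_cluster_sizes` helper is referenced but not defined; one possible sketch, assuming scikit-learn's `KMeans` as above:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def compute_cluster_sizes(embeddings, k: int) -> dict:
    """Cluster embeddings with k-means and return {cluster_label: size}."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(np.asarray(embeddings))
    return dict(Counter(labels.tolist()))
```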
#### 5.1.3 Semantic Distance from Query
```python
import numpy as np
from typing import List

def compute_query_distance(query: str, ideas: List[str], embedding_model: str) -> dict:
    """
    Measure how far ideas are from the original query.
    Higher = more novel/distant.
    """
    query_emb = get_embedding(query, model=embedding_model)
    idea_embs = get_embeddings(ideas, model=embedding_model)
    distances = [1 - cosine_similarity(query_emb, e) for e in idea_embs]
    return {
        'mean_distance': np.mean(distances),
        'max_distance': np.max(distances),
        'min_distance': np.min(distances),
        'std_distance': np.std(distances)
    }
```
### 5.2 Patent Novelty Metrics
#### 5.2.1 Patent Overlap Rate
```python
from typing import List

def compute_patent_novelty(ideas: List[str], query: str) -> dict:
    """
    Search patents for each idea and compute overlap rate.
    Uses the existing patent_search_service; `query` is kept for logging context.
    """
    matches = 0
    match_details = []
    for idea in ideas:
        result = patent_search_service.search(idea)
        if result.has_match:
            matches += 1
            match_details.append({
                'idea': idea,
                'patent': result.best_match
            })
    return {
        'novelty_rate': 1 - (matches / len(ideas)),
        'match_count': matches,
        'total_ideas': len(ideas),
        'match_details': match_details
    }
```
### 5.3 Metrics Summary Table
| Metric | Formula | Interpretation |
|--------|---------|----------------|
| **Mean Pairwise Distance** | avg(1 - cos_sim(i, j)) for all pairs | Higher = more diverse |
| **Silhouette Score** | Cluster cohesion vs separation | Higher = clearer clusters |
| **Optimal Cluster Count** | argmax(silhouette) | More clusters = more themes |
| **Query Distance** | 1 - cos_sim(query, idea) | Higher = farther from original |
| **Patent Novelty Rate** | 1 - (matches / total) | Higher = more novel |
---
## 6. Human Evaluation Protocol
### 6.1 Participants
#### 6.1.1 Recruitment
- **Platform**: Prolific, MTurk, or domain experts
- **Sample Size**: 60 evaluators (20 per condition group)
- **Criteria**:
- Native English speakers
- Bachelor's degree or higher
- Attention check pass rate > 80%
#### 6.1.2 Compensation
- $15/hour equivalent
- ~30 minutes per session
- Bonus for high-quality ratings
### 6.2 Rating Scales
#### 6.2.1 Novelty (7-point Likert)
```
How novel/surprising is this idea?
1 = Not at all novel (very common/obvious)
4 = Moderately novel
7 = Extremely novel (never seen before)
```
#### 6.2.2 Usefulness (7-point Likert)
```
How useful/practical is this idea?
1 = Not at all useful (impractical)
4 = Moderately useful
7 = Extremely useful (highly practical)
```
#### 6.2.3 Creativity (7-point Likert)
```
How creative is this idea overall?
1 = Not at all creative
4 = Moderately creative
7 = Extremely creative
```
### 6.3 Procedure
1. **Introduction** (5 min)
- Study purpose (without revealing hypotheses)
- Rating scale explanation
- Practice with 3 example ideas
2. **Training** (5 min)
- Rate 5 calibration ideas with feedback
- Discuss edge cases
3. **Main Evaluation** (20 min)
- Rate 30 ideas (randomized order)
- 3 attention check items embedded
- Break after 15 ideas
4. **Debriefing** (2 min)
- Demographics
- Open-ended feedback
### 6.4 Quality Control
| Check | Threshold | Action |
|-------|-----------|--------|
| Attention checks | < 2/3 correct | Exclude |
| Completion time | < 10 min | Flag for review |
| Variance in ratings | All same score | Exclude |
| Inter-rater reliability | Cronbach's α < 0.7 | Review ratings |
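The per-evaluator rules in the table can be encoded directly. A small sketch with a hypothetical `qc_status` helper; thresholds are taken from the table above:

```python
def qc_status(attention_correct: int, attention_total: int,
              minutes: float, ratings: list) -> str:
    """Apply the exclusion/flag rules from the quality-control table."""
    if attention_correct / attention_total < 2 / 3:
        return "exclude"                 # failed attention checks
    if len(set(ratings)) == 1:
        return "exclude"                 # zero variance: all same score
    if minutes < 10:
        return "flag"                    # suspiciously fast completion
    return "ok"
```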
### 6.5 Analysis Plan
#### 6.5.1 Reliability
- Cronbach's alpha for each scale
- ICC (Intraclass Correlation) for inter-rater agreement
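Cronbach's alpha has a closed form that is easy to compute without a stats package; a minimal numpy sketch:

```python
import numpy as np

def cronbach_alpha(ratings) -> float:
    """Cronbach's alpha; rows = rated items, columns = raters (or scale items)."""
    r = np.asarray(ratings, dtype=float)
    k = r.shape[1]
    item_var = r.var(axis=0, ddof=1).sum()      # sum of per-column variances
    total_var = r.sum(axis=1).var(ddof=1)       # variance of row totals
    return k / (k - 1) * (1 - item_var / total_var)
```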
#### 6.5.2 Main Analysis
- Mixed-effects ANOVA: Condition × Query
- Post-hoc: Tukey HSD for pairwise comparisons
- Effect sizes: Cohen's d
#### 6.5.3 Correlation with Automatic Metrics
- Pearson correlation: Human ratings vs semantic diversity
- Regression: Predict human ratings from automatic metrics
---
## 7. Experimental Procedure
### 7.1 Phase 1: Idea Generation
```
For each query Q in QuerySet:
    For each condition C in Conditions:
        If C == "Direct":
            ideas = direct_llm_generation(Q, n=20)
        Elif C == "Single-Expert":
            expert = generate_expert(Q, n=1)
            ideas = expert_transformation(Q, expert, ideas_per_expert=20)
        Elif C == "Multi-Expert-4":
            experts = generate_experts(Q, n=4)
            ideas = expert_transformation(Q, experts, ideas_per_expert=5)
        Elif C == "Multi-Expert-8":
            experts = generate_experts(Q, n=8)
            ideas = expert_transformation(Q, experts, ideas_per_expert=2-3)
        Elif C == "Random-Perspective":
            perspectives = random.sample(RANDOM_WORDS, 4)
            ideas = perspective_generation(Q, perspectives, ideas_per=5)
        Store(Q, C, ideas)
```
### 7.2 Phase 2: Automatic Metrics
```
For each (Q, C, ideas) in Results:
    metrics = {
        'diversity': compute_mean_pairwise_distance(ideas),
        'clusters': compute_cluster_metrics(ideas),
        'query_distance': compute_query_distance(Q, ideas),
        'patent_novelty': compute_patent_novelty(ideas, Q)
    }
    Store(Q, C, metrics)
```
### 7.3 Phase 3: Human Evaluation
```
# Sample selection
selected_queries = random.sample(QuerySet, 10)
selected_conditions = ["Direct", "Multi-Expert-4", "Multi-Expert-8"]

# Create evaluation set
evaluation_items = []
For each Q in selected_queries:
    For each C in selected_conditions:
        ideas = Get(Q, C)
        For each idea in ideas:
            evaluation_items.append((Q, C, idea))

# Randomize and assign to evaluators
random.shuffle(evaluation_items)
assignments = assign_to_evaluators(evaluation_items, n_evaluators=60)

# Collect ratings
ratings = collect_human_ratings(assignments)
```
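`assign_to_evaluators` is left abstract above; one simple balanced scheme (a hypothetical implementation: shuffle once, then deal round-robin so evaluator loads differ by at most one item):

```python
import random

def assign_to_evaluators(items: list, n_evaluators: int, seed: int = 0) -> list:
    """Shuffle items, then deal them round-robin across evaluators."""
    rng = random.Random(seed)           # fixed seed for a reproducible assignment
    pool = items[:]
    rng.shuffle(pool)
    return [pool[i::n_evaluators] for i in range(n_evaluators)]
```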
### 7.4 Phase 4: Analysis
```
# Automatic metrics analysis
Run ANOVA: diversity ~ condition + query + condition:query
Run post-hoc: Tukey HSD for condition pairs
Compute effect sizes

# Human ratings analysis
Check reliability: Cronbach's alpha, ICC
Run mixed-effects model: rating ~ condition + (1|evaluator) + (1|query)
Compute correlations: human vs automatic metrics

# Visualization
Plot: Diversity by condition (box plots)
Plot: t-SNE of idea embeddings colored by condition
Plot: Expert count vs diversity curve
```
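The core of `diversity ~ condition` is a one-way F test. A self-contained numpy sketch of the F statistic; the full analysis adds the query factor, interaction, and post-hoc tests via a stats package:

```python
import numpy as np

def one_way_anova_F(groups) -> float:
    """F statistic for a one-way ANOVA across condition groups."""
    gs = [np.asarray(g, dtype=float) for g in groups]
    grand = np.concatenate(gs).mean()
    k = len(gs)
    n = sum(len(g) for g in gs)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in gs)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in gs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```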
---
## 8. Implementation Checklist
### 8.1 Code to Implement
- [ ] `experiments/generate_ideas.py` - Idea generation for all conditions
- [ ] `experiments/compute_metrics.py` - Automatic metric computation
- [ ] `experiments/export_for_evaluation.py` - Prepare human evaluation set
- [ ] `experiments/analyze_results.py` - Statistical analysis
- [ ] `experiments/visualize.py` - Generate figures
### 8.2 Data Files to Create
- [ ] `data/queries.json` - 30 queries with metadata
- [ ] `data/random_words.json` - Random perspective words
- [ ] `data/generated_ideas/` - Raw idea outputs
- [ ] `data/metrics/` - Computed metric results
- [ ] `data/human_ratings/` - Collected ratings
### 8.3 Analysis Outputs
- [ ] `results/diversity_by_condition.csv`
- [ ] `results/patent_novelty_by_condition.csv`
- [ ] `results/human_ratings_summary.csv`
- [ ] `results/statistical_tests.txt`
- [ ] `figures/` - All visualizations
---
## 9. Expected Results & Hypotheses
### 9.1 Primary Hypotheses
| Hypothesis | Prediction | Metric |
|------------|------------|--------|
| **H1** | Multi-Expert-4 > Single-Expert > Direct | Semantic diversity |
| **H2** | Multi-Expert-8 ≈ Multi-Expert-4 (diminishing returns) | Semantic diversity |
| **H3** | Multi-Expert > Direct | Patent novelty rate |
| **H4** | LLM experts > Curated > DBpedia | Unconventionality |
| **H5** | With attributes > Without attributes | Overall diversity |
### 9.2 Expected Effect Sizes
Based on related work:
- Diversity increase: d = 0.5-0.8 (medium to large)
- Patent novelty increase: 20-40% improvement
- Human creativity rating: d = 0.3-0.5 (small to medium)
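Effect sizes like those above can be computed with the pooled-standard-deviation form of Cohen's d; a minimal sketch:

```python
import numpy as np

def cohens_d(a, b) -> float:
    """Cohen's d between two samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return float((a.mean() - b.mean()) / pooled)
```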
### 9.3 Potential Confounds
| Confound | Mitigation |
|----------|-----------|
| Query difficulty | Crossed design (all queries × all conditions) |
| LLM variability | Multiple runs, fixed seed where possible |
| Evaluator bias | Randomized presentation, blinding |
| Order effects | Counterbalancing in human evaluation |
---
## 10. Timeline
| Week | Activity |
|------|----------|
| 1-2 | Implement idea generation scripts |
| 3 | Generate all ideas (5 conditions × 30 queries) |
| 4 | Compute automatic metrics |
| 5 | Design and pilot human evaluation |
| 6-7 | Run human evaluation (60 participants) |
| 8 | Analyze results |
| 9-10 | Write paper |
| 11 | Internal review |
| 12 | Submit |
---
## 11. Appendix: Direct Generation Prompt
For baseline condition C1 (Direct LLM generation):
```
You are a creative innovation consultant. Generate 20 unique and creative ideas
for improving or reimagining a [QUERY].
Requirements:
- Each idea should be distinct and novel
- Ideas should range from incremental improvements to radical innovations
- Consider different aspects: materials, functions, user experiences, contexts
- Provide a brief (15-30 word) description for each idea
Output format:
1. [Idea keyword]: [Description]
2. [Idea keyword]: [Description]
...
20. [Idea keyword]: [Description]
```
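Output in this numbered format can be parsed with a small regex. A sketch; the `parse_ideas` helper is illustrative, not part of the existing system:

```python
import re

def parse_ideas(text: str):
    """Extract (keyword, description) pairs from 'N. keyword: description' lines."""
    pattern = re.compile(r"^\s*\d+\.\s*(.+?):\s*(.+)$", re.MULTILINE)
    return [(kw.strip(), desc.strip()) for kw, desc in pattern.findall(text)]
```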
---
## 12. Appendix: Random Perspective Words
For condition C5 (Random-Perspective), sample from:
```json
[
"ocean", "mountain", "forest", "desert", "cave",
"microscope", "telescope", "kaleidoscope", "prism", "lens",
"butterfly", "elephant", "octopus", "eagle", "ant",
"sunrise", "thunderstorm", "rainbow", "fog", "aurora",
"clockwork", "origami", "mosaic", "symphony", "ballet",
"ancient", "futuristic", "organic", "crystalline", "liquid",
"whisper", "explosion", "rhythm", "silence", "echo"
]
```
This tests whether ANY perspective shift helps, or if EXPERT perspectives specifically matter.

# Literature Review: Expert-Augmented LLM Ideation
## 1. Core Directly-Related Work
### 1.1 Wisdom of Crowds via Role Assumption
**Bringing the Wisdom of the Crowd to an Individual by Having the Individual Assume Different Roles** (ACM C&C 2017)
Groups of people tend to generate more diverse ideas than individuals because each group member brings a different perspective. This study showed it's possible to help individuals think more like a group by asking them to approach a problem from different perspectives. In an experiment with 54 crowd workers, participants who assumed different expert roles came up with more creative ideas than those who did not.
**Gap for our work**: This was human-based role-playing. Our system automates this with LLM-powered expert perspectives.
### 1.2 PersonaFlow: LLM-Simulated Expert Perspectives
**PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation** (2024)
PersonaFlow provides multiple perspectives by using LLMs to simulate domain-specific experts. User studies showed it increased the perceived relevance and creativity of ideated research directions and promoted users' critical thinking activities without increasing perceived cognitive load.
**Gap for our work**: PersonaFlow focuses on research ideation. Our system applies to product/innovation ideation with structured attribute decomposition.
### 1.3 PopBlends: Conceptual Blending with LLMs
**PopBlends: Strategies for Conceptual Blending with Large Language Models** (CHI 2023)
PopBlends automatically suggests conceptual blends using both traditional knowledge extraction and LLMs. Studies showed people found twice as many blend suggestions with the system, with half the mental demand.
**Gap for our work**: We structure blending through expert domain knowledge rather than direct concept pairing.
### 1.4 BILLY: Persona Vector Merging
**BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation** (2025)
Proposes fusing persona vectors in activation space to steer LLM output towards multiple perspectives simultaneously, requiring only a single additive operation during inference.
**Gap for our work**: We use sequential multi-expert generation rather than vector fusion, allowing more explicit control and interpretability.
---
## 2. Theoretical Foundations
### 2.1 Semantic Distance Theory
**Core Insight** (Mednick, 1962): Creative thinking involves connecting weakly related, remote concepts in semantic memory. The farther one "moves away" from a conventional idea, the more creative the new idea will likely be.
**Key Research**:
- Semantic distance plays an important role in the creative process
- A more "flexible" semantic memory structure (higher connectivity, shorter distances) facilitates creative idea generation
- Quantitative measures using LSA and semantic networks can objectively examine creative output
- Divergent Semantic Integration (DSI) correlates strongly with human creativity ratings (72% variance explained)
**Application to Our Work**: Expert perspectives force semantic "jumps" to distant domains that LLMs wouldn't naturally traverse.
```
Without Expert: "Chair" → furniture, sitting, comfort (short semantic distance)
With Expert: "Chair" + Marine Biologist → pressure, buoyancy, coral (long semantic distance)
```
### 2.2 Conceptual Blending Theory
**Core Insight** (Fauconnier & Turner, 2002): Creative products emerge from blending elements of two input spaces into a novel integrated space.
**Key Research**:
- Blending process: (1) find connecting concept between inputs, (2) map elements that can be blended
- Generative AI demonstrates ability to blend and integrate concepts (bisociation)
- Trisociation (three-concept blending) is being used for AI-augmented idea generation
- Conceptual blending provides terminology for describing creative products
**Limitation**: Blending theory doesn't explain where inputs originate - the "inspiration problem."
**Application to Our Work**: Each expert provides a distinct "input space" enabling systematic multi-space blending. Our attribute decomposition provides structured inputs for blending.
### 2.3 Design Fixation
**Core Insight** (Jansson & Smith, 1991): Design fixation is "blind adherence to a set of ideas or concepts limiting the output of conceptual design."
**Key Research**:
- Fixation results from categorical knowledge organization around prototypes
- Accessing prototypes requires less cognitive effort than processing exemplars
- Diverse teams, model-making, and facilitation help prevent fixation
- Reflecting on prior fixation episodes is most effective prevention
**Neural Evidence**: fMRI studies show distinct patterns during fixated vs. creative ideation.
**Application to Our Work**: LLMs exhibit "semantic fixation" on high-probability outputs. Expert perspectives break this by forcing activation of non-prototype knowledge.
### 2.4 Constraint-Based Creativity
**Core Insight**: Paradoxically, constraints can enhance creativity by pushing beyond the path of least resistance.
**Key Research**:
- Constraints push people to search for more distant ideas in semantic memory
- Extreme constraints may require different types of creative problem-solving
- Not all constraints promote creativity for all individuals/tasks
- A "constraint-leveraging mindset" can be developed through experience
**Application to Our Work**: Expert role = productive constraint that expands rather than limits creative space. The expert perspective forces exploration of non-obvious solution spaces.
---
## 3. LLM Limitations in Creative Generation
### 3.1 Design Fixation from AI
**The Effects of Generative AI on Design Fixation and Divergent Thinking** (CHI 2024)
Key finding: AI exposure during ideation leads to HIGHER fixation. Compared to a baseline with no AI assistance, participants who used AI produced:
- Fewer ideas
- Less variety
- Lower originality
### 3.2 Dual Mechanisms: Inspiration vs. Fixation
**Inspiration Booster or Creative Fixation?** (Nature Humanities & Social Sciences, 2025)
- LLMs help in **simple** creative tasks (inspiration stimulation)
- LLMs **hurt** in **complex** creative tasks (creative fixation)
**Application to Our Work**: Our structured decomposition manages complexity, while multi-expert approach maintains inspiration benefits.
### 3.3 Statistical Pattern Perpetuation
**Bias and Fairness in Large Language Models: A Survey** (MIT Press, 2024)
LLMs learn, perpetuate, and amplify patterns from training data. This applies to creative outputs - LLMs generate what is statistically common/expected.
### 3.4 Generalization Bias
**Generalization Bias in LLM Summarization** (Royal Society, 2025)
LLMs' overgeneralization tendency produces outputs that lack sufficient empirical support. This suggests a bias toward "safe" middle-ground outputs rather than novel extremes.
---
## 4. Role-Playing and Perspective-Taking
### 4.1 Creativity Enhancement
Research on tabletop role-playing games (TTRPGs) demonstrates:
- Significant positive impact on creativity potential through divergent thinking
- TTRPG players exhibit significantly higher creativity than non-players
- Perspective-taking is closely linked to empathy and cognitive flexibility
### 4.2 Therapeutic and Educational Applications
- Role-playing develops perspective-taking, storytelling, creativity, and self-expression
- Physiological, emotional, and mental well-being from play enables creative ideation
- Play signals psychological safety, which is essential for creativity
### 4.3 Design Research Applications
- Role-playing stimulates creativity by exploring alternative solutions
- Offers safe environment to explore failure modes and challenge assumptions
- Well-suited for early-stage ideation and empathy-critical moments
---
## 5. Creativity Support Tools (CSTs)
### 5.1 Current State
- CSTs primarily support **divergent** thinking
- **Convergent** thinking often neglected
- Ideal CST should offer tailored support for both
### 5.2 AI as Creative Partner
- Collaborative ideation systems expose users to different ideas
- Competing theories on when/whether such exposure helps
- Tool-mediated expert activity view: computers as "mediating artifacts people act through"
### 5.3 Evaluation Methods
**Consensual Assessment Technique (CAT)**:
- Pool of experts independently evaluate artifacts
- Creative if high evaluations + high interrater reliability (Cronbach's alpha > 0.7)
**Semantic Distance Measures**:
- SemDis platform for automated creativity assessment
- Overcomes labor cost and subjectivity of human rating
- Uses NLP to quantify semantic relatedness
---
## 6. Our Theoretical Contribution
### The "Semantic Gravity" Problem
```
Direct LLM Generation:
P(idea | query)
→ Samples from high-probability region
→ Ideas cluster around training distribution modes
→ "Semantic gravity" pulls toward conventional associations
```
### Expert Transformation Solution
```
Conditioned Generation:
P(idea | query, expert)
→ Expert perspective activates distant semantic regions
→ Forces conceptual blending across domains
→ Breaks design fixation through productive constraints
```
### Multi-Expert Aggregation
```
Diverse Experts → Semantic Coverage
→ "Inner crowd" wisdom without actual crowd
→ Systematic exploration of idea space
→ Deduplication ensures non-redundant novelty
```
### Theoretical Model
1. **Attribute Decomposition**: Structures the problem space (categories, attributes)
2. **Expert Perspectives**: Forces semantic jumps to distant domains
3. **Multi-Expert Aggregation**: Achieves crowd-like diversity individually
4. **Deduplication**: Ensures generated ideas are truly distinct
5. **Patent Validation**: Grounds novelty in real-world uniqueness

research/paper_outline.md
# Paper Outline: Expert-Augmented LLM Ideation
## Suggested Titles
1. **"Breaking Semantic Gravity: Expert-Augmented LLM Ideation for Enhanced Creativity"**
2. "Beyond Interpolation: Multi-Expert Perspectives for Combinatorial Innovation"
3. "Escaping the Relevance Trap: Structured Expert Frameworks for Creative AI"
4. "From Crowd to Expert: Simulating Diverse Perspectives for LLM-Based Ideation"
---
## Abstract (Draft)
Large Language Models (LLMs) are increasingly used for creative ideation, yet they exhibit a phenomenon we term "semantic gravity" - the tendency to generate outputs clustered around high-probability regions of their training distribution. This limits the novelty and diversity of generated ideas. We propose a multi-expert transformation framework that systematically activates diverse semantic regions by conditioning LLM generation on simulated expert perspectives. Our system decomposes concepts into structured attributes, generates ideas through multiple domain-expert viewpoints, and employs semantic deduplication to ensure genuine diversity. Through experiments comparing multi-expert generation against direct LLM generation and single-expert baselines, we demonstrate that our approach produces ideas with [X]% higher semantic diversity and [Y]% lower patent overlap. We contribute a theoretical framework explaining LLM creativity limitations and an open-source system for innovation ideation.
---
## 1. Introduction
### 1.1 The Promise and Problem of LLM Creativity
- LLMs widely adopted for creative tasks
- Initial enthusiasm: infinite idea generation
- Emerging concern: quality and diversity issues
### 1.2 The Semantic Gravity Problem
- Define the phenomenon
- Why it occurs (statistical learning, mode collapse)
- Why it matters (innovation requires novelty)
### 1.3 Our Solution: Expert-Augmented Ideation
- Brief overview of the approach
- Key insight: expert perspectives as semantic "escape velocity"
- Contributions preview
### 1.4 Paper Organization
- Roadmap for the rest of the paper
---
## 2. Related Work
### 2.1 Theoretical Foundations
- Semantic distance and creativity (Mednick, 1962)
- Conceptual blending theory (Fauconnier & Turner)
- Design fixation (Jansson & Smith)
- Constraint-based creativity
### 2.2 LLM Limitations in Creative Generation
- Design fixation from AI (CHI 2024)
- Dual mechanisms: inspiration vs. fixation
- Bias and pattern perpetuation
### 2.3 Persona-Based Prompting
- PersonaFlow (2024)
- BILLY persona vectors (2025)
- Quantifying persona effects (ACL 2024)
### 2.4 Creativity Support Tools
- Wisdom of crowds approaches
- Human-AI collaboration in ideation
- Evaluation methods (CAT, semantic distance)
### 2.5 Positioning Our Work
- Gap: No end-to-end system combining structured decomposition + multi-expert transformation + deduplication
- Distinction from PersonaFlow: product innovation focus, attribute structure
---
## 3. System Design
### 3.1 Overview
- Pipeline diagram
- Design rationale
### 3.2 Attribute Decomposition
- Category analysis (dynamic vs. fixed)
- Attribute generation per category
- DAG relationship mapping
### 3.3 Expert Team Generation
- Expert sources: LLM-generated, curated, external databases
- Diversity optimization strategies
- Domain coverage considerations
### 3.4 Expert Transformation
- Conditioning mechanism
- Keyword generation
- Description generation
- Parallel processing for efficiency
### 3.5 Semantic Deduplication
- Embedding-based approach
- LLM-based approach
- Threshold selection
### 3.6 Novelty Validation
- Patent search integration
- Overlap scoring
---
## 4. Experiments
### 4.1 Research Questions
- RQ1: Does multi-expert generation increase semantic diversity?
- RQ2: Does multi-expert generation reduce patent overlap?
- RQ3: What is the optimal number of experts?
- RQ4: How do expert sources affect output quality?
### 4.2 Experimental Setup
#### 4.2.1 Dataset
- N concepts/queries for ideation
- Selection criteria (diverse domains, complexity levels)
#### 4.2.2 Conditions
| Condition | Description |
|-----------|-------------|
| Baseline | Direct LLM: "Generate 20 creative ideas for X" |
| Single-Expert | 1 expert × 20 ideas |
| Multi-Expert-4 | 4 experts × 5 ideas each |
| Multi-Expert-8 | 8 experts × 2-3 ideas each (matched to ~20 total) |
| Random-Perspective | 4 random words as "perspectives" |
#### 4.2.3 Controls
- Same LLM model (specify version)
- Same temperature settings
- Same total idea count per condition
### 4.3 Metrics
#### 4.3.1 Semantic Diversity
- Mean pairwise cosine distance between embeddings
- Cluster distribution analysis
- Silhouette score for idea clustering
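
The first metric can be sketched in a few lines, assuming the idea texts have already been embedded with some sentence-embedding model; the 2-D toy vectors below are stand-ins for real embeddings:

```python
from itertools import combinations
from math import sqrt

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def mean_pairwise_distance(embeddings):
    """Semantic diversity of an idea set: mean pairwise cosine distance."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)

# Toy "embeddings": a tight cluster vs. a spread-out set.
clustered = [[1.0, 0.0], [0.99, 0.05], [0.98, 0.1]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(mean_pairwise_distance(clustered) < mean_pairwise_distance(spread))  # True
```

In practice one would use a library implementation over the full embedding matrix; this sketch only fixes the definition of the metric.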
#### 4.3.2 Novelty
- Patent overlap rate
- Semantic distance from query centroid
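
The centroid-distance metric admits a direct sketch, again assuming precomputed embeddings; the vectors and labels below are illustrative stand-ins:

```python
from math import sqrt

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def novelty_scores(query_vec, idea_vecs):
    """Per-idea novelty: cosine distance from the query embedding."""
    return [cosine_distance(query_vec, v) for v in idea_vecs]

query = [1.0, 0.0]   # stand-in embedding of the query, e.g. "chair"
near = [0.95, 0.1]   # a conventional idea ("ergonomic office chair")
far = [0.2, 0.9]     # a semantically distant idea ("pressure-adaptive seating")
n_near, n_far = novelty_scores(query, [near, far])
print(n_near < n_far)  # True
```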
#### 4.3.3 Quality (Human Evaluation)
- Novelty rating (1-7 Likert)
- Usefulness rating (1-7 Likert)
- Creativity rating (1-7 Likert)
- Interrater reliability (e.g., intraclass correlation or Krippendorff's alpha)
### 4.4 Procedure
- Idea generation process
- Evaluation process
- Statistical analysis methods
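
One defensible choice of statistical test, sketched here, is a nonparametric permutation test on per-query diversity scores; the scores below are invented purely for illustration:

```python
import random

def permutation_test(sample_a, sample_b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    A nonparametric option when per-query scores are few and not
    necessarily normally distributed.
    """
    rng = random.Random(seed)
    observed = abs(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Invented per-query diversity scores for two conditions.
multi = [0.62, 0.58, 0.65, 0.60, 0.63]
base = [0.41, 0.44, 0.39, 0.43, 0.40]
print(permutation_test(multi, base) < 0.05)  # True
```

With more than two conditions, an omnibus test (e.g., Kruskal-Wallis) with corrected pairwise follow-ups would be the analogous choice.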
---
## 5. Results
### 5.1 Semantic Diversity (RQ1)
- Quantitative results
- Visualization (t-SNE/UMAP of idea embeddings)
- Statistical significance tests
### 5.2 Patent Novelty (RQ2)
- Overlap rates by condition
- Examples of high-novelty ideas
### 5.3 Expert Count Analysis (RQ3)
- Diversity vs. expert count curve
- Diminishing returns analysis
- Optimal expert count recommendation
### 5.4 Expert Source Comparison (RQ4)
- LLM-generated vs. curated vs. random
- Unconventionality metrics
### 5.5 Human Evaluation Results
- Rating distributions
- Condition comparisons
- Correlation with automatic metrics
---
## 6. Discussion
### 6.1 Interpreting the Results
- Why multi-expert works
- The role of structured decomposition
- Deduplication importance
### 6.2 Theoretical Implications
- Semantic gravity as framework for LLM creativity
- Expert perspectives as productive constraints
- Inner crowd wisdom
### 6.3 Practical Implications
- When to use multi-expert approach
- Expert selection strategies
- Integration with existing workflows
### 6.4 Limitations
- LLM-specific results may not generalize
- Patent overlap as proxy for true novelty
- Human evaluation subjectivity
- Single-language experiments
### 6.5 Future Work
- Cross-cultural creativity
- Domain-specific expert optimization
- Real-world deployment studies
- Integration with other creativity techniques
---
## 7. Conclusion
- Summary of contributions
- Key takeaways
- Broader impact
---
## Appendices
### A. Prompt Templates
- Expert generation prompts
- Keyword generation prompts
- Description generation prompts
### B. Full Experimental Results
- Complete data tables
- Additional visualizations
### C. Expert Source Details
- Curated occupation list
- DBpedia/Wikidata query details
### D. Human Evaluation Protocol
- Instructions for raters
- Example ratings
- Training materials
---
## Target Venues
### Tier 1 (Recommended)
1. **CHI** - ACM Conference on Human Factors in Computing Systems
- Strong fit: creativity support tools, human-AI collaboration
- Deadline: typically September
2. **CSCW** - ACM Conference on Computer-Supported Cooperative Work
- Good fit: collaborative ideation, crowd wisdom
- Deadline: typically April/January
3. **Creativity & Cognition** - ACM Conference
- Perfect fit: computational creativity focus
- Smaller but specialized venue
### Tier 2 (Alternative)
4. **DIS** - ACM Designing Interactive Systems
- Good fit: design ideation tools
5. **UIST** - ACM Symposium on User Interface Software and Technology
- If system/interaction focus emphasized
6. **ICCC** - International Conference on Computational Creativity
- Specialized computational creativity venue
### Journal Options
1. **International Journal of Human-Computer Studies (IJHCS)**
2. **ACM Transactions on Computer-Human Interaction (TOCHI)**
3. **Design Studies**
4. **Creativity Research Journal**
---
## Timeline Checklist
- [ ] Finalize experimental design
- [ ] Collect/select query dataset
- [ ] Run all experimental conditions
- [ ] Compute automatic metrics
- [ ] Design human evaluation study
- [ ] Recruit evaluators
- [ ] Conduct human evaluation
- [ ] Statistical analysis
- [ ] Write first draft
- [ ] Internal review
- [ ] Revision
- [ ] Submit
---

*File: `research/references.md`*

# References
## Core Related Work
1. **Siangliulue, P., Arnold, K. C., Gajos, K. Z., & Dow, S. P.** (2017). Bringing the Wisdom of the Crowd to an Individual by Having the Individual Assume Different Roles. *Proceedings of the 2017 ACM SIGCHI Conference on Creativity and Cognition (C&C '17)*, 131-141.
- https://dl.acm.org/doi/10.1145/3059454.3059467
2. **Liu, Y., Sharma, A., et al.** (2024). PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation. *arXiv preprint*.
- https://arxiv.org/html/2409.12538v1
- https://www.semanticscholar.org/paper/PersonaFlow:-Designing-LLM-Simulated-Expert-for-Liu-Sharma/eb0c224be9191e39452f20b2cbb886b5ecc4f57b
3. **Wang, S., Petridis, S., Kwon, T., Ma, X., & Chilton, L. B.** (2023). PopBlends: Strategies for Conceptual Blending with Large Language Models. *Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems*.
- https://dl.acm.org/doi/10.1145/3544548.3580948
4. **BILLY Authors** (2025). BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation. *arXiv preprint*.
- https://arxiv.org/html/2510.10157v1
---
## Semantic Distance & Creative Cognition
5. **Mednick, S. A.** (1962). The associative basis of the creative process. *Psychological Review, 69*(3), 220-232.
- (Classic foundational paper)
6. **Kenett, Y. N., & Faust, M.** (2019). Going the Extra Creative Mile: The Role of Semantic Distance in Creativity Theory, Research, and Measurement. *The Cambridge Handbook of the Neuroscience of Creativity*.
- https://www.cambridge.org/core/books/abs/cambridge-handbook-of-the-neuroscience-of-creativity/going-the-extra-creative-mile-the-role-of-semantic-distance-in-creativity-theory-research-and-measurement/3AD9143E69A463F85F2D8CC8940425CA
7. **Beaty, R. E., & Johnson, D. R.** (2021). Automating creativity assessment with SemDis: An open platform for computing semantic distance. *Behavior Research Methods, 53*, 757-780.
- https://link.springer.com/article/10.3758/s13428-020-01453-w
8. **What can quantitative measures of semantic distance tell us about creativity?** (2018). *Current Opinion in Behavioral Sciences*.
- https://www.sciencedirect.com/science/article/abs/pii/S2352154618301098
9. **Semantic Memory and Creativity: The Costs and Benefits of Semantic Memory Structure in Generating Original Ideas** (2023). *PMC*.
- https://pmc.ncbi.nlm.nih.gov/articles/PMC10128864/
10. **The Role of Semantic Associations as a Metacognitive Cue in Creative Idea Generation** (2023). *PMC*.
- https://pmc.ncbi.nlm.nih.gov/articles/PMC10141130/
---
## Conceptual Blending Theory
11. **Fauconnier, G., & Turner, M.** (2002). *The Way We Think: Conceptual Blending and the Mind's Hidden Complexities*. Basic Books.
12. **Conceptual Blending** - Wikipedia Overview
- https://en.wikipedia.org/wiki/Conceptual_blending
13. **Pereira, F. C.** (2007). *Creativity and Artificial Intelligence: A Conceptual Blending Approach*. Mouton de Gruyter.
- https://dl.acm.org/doi/10.5555/1557446
- https://www.researchgate.net/publication/332711522_Creativity_and_Artificial_Intelligence_A_Conceptual_Blending_Approach
14. **Confalonieri, R., et al.** (2018). A computational framework for conceptual blending. *Artificial Intelligence, 256*, 105-129.
- https://www.sciencedirect.com/science/article/pii/S000437021730142X
15. **Trisociation with AI for Creative Idea Generation** (2025). *California Management Review*.
- https://cmr.berkeley.edu/2025/01/trisociation-with-ai-for-creative-idea-generation/
---
## Design Fixation & Constraint-Based Creativity
16. **Jansson, D. G., & Smith, S. M.** (1991). Design fixation. *Design Studies, 12*(1), 3-11.
- (Classic foundational paper)
17. **Design Fixation: A Cognitive Model**. *Design Society*.
- https://www.designsociety.org/download-publication/25504/design_fixation_a_cognitive_model
18. **Crilly, N.** (2019). Research Design Fixation. *Cambridge Repository*.
- https://www.repository.cam.ac.uk/bitstreams/2c002015-8771-4694-ad48-0e4b52008bdf/download
19. **Using fMRI to deepen our understanding of design fixation** (2020). *Design Science, Cambridge Core*.
- https://www.cambridge.org/core/journals/design-science/article/using-fmri-to-deepen-our-understanding-of-design-fixation/2DD81FEE8ED682F6DFF415BF2948EFA6
20. **Acar, O. A., Tarakci, M., & van Knippenberg, D.** (2019). Creativity and Innovation Under Constraints: A Cross-Disciplinary Integrative Review. *Journal of Management, 45*(1), 96-121.
- https://journals.sagepub.com/doi/full/10.1177/0149206318805832
21. **Cromwell, J. R.** (2024). How combinations of constraint affect creativity: A new typology of creative problem solving in organizations. *Organizational Psychology Review*.
- https://journals.sagepub.com/doi/10.1177/20413866231202031
22. **Creativity from constraints: Theory and applications to education** (2022). *Thinking Skills and Creativity*.
- https://www.sciencedirect.com/science/article/abs/pii/S1871187122001870
---
## LLM Limitations in Creative Generation
23. **Wadinambiarachchi, S., et al.** (2024). The Effects of Generative AI on Design Fixation and Divergent Thinking. *Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems*.
- https://dl.acm.org/doi/full/10.1145/3613904.3642919
- https://arxiv.org/html/2403.11164v1
24. **Inspiration booster or creative fixation? The dual mechanisms of LLMs in shaping individual creativity in tasks of different complexity** (2025). *Humanities and Social Sciences Communications (Nature)*.
- https://www.nature.com/articles/s41599-025-05867-9
25. **Gallegos, I. O., et al.** (2024). Bias and Fairness in Large Language Models: A Survey. *Computational Linguistics, 50*(3), 1097-1179. MIT Press.
- https://direct.mit.edu/coli/article/50/3/1097/121961/Bias-and-Fairness-in-Large-Language-Models-A
26. **Generalization bias in large language model summarization of scientific research** (2025). *Royal Society Open Science, 12*(4).
- https://royalsocietypublishing.org/rsos/article/12/4/241776/235656/Generalization-bias-in-large-language-model
27. **LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models** (2025). *arXiv*.
- https://arxiv.org/html/2505.19240v1
---
## Persona Prompting & Multi-Agent Systems
28. **Quantifying the Persona Effect in LLM Simulations** (2024). *ACL 2024*.
- https://aclanthology.org/2024.acl-long.554.pdf
- https://www.emergentmind.com/topics/persona-effect-in-llm-simulations
29. **Two Tales of Persona in LLMs: A Survey of Role-Playing** (2024). *EMNLP Findings*.
- https://aclanthology.org/2024.findings-emnlp.969.pdf
30. **LLM Generated Persona is a Promise with a Catch** (2024). *Semantic Scholar*.
- https://www.semanticscholar.org/paper/LLM-Generated-Persona-is-a-Promise-with-a-Catch-Li-Chen/3ea29481ec11d1568fde727d236f71e44e4e2ad0
31. **Using AI for User Representation: An Analysis of 83 Persona Prompts** (2025). *arXiv*.
- https://arxiv.org/html/2508.13047v1
32. **Scaffolding Creativity: How Divergent and Convergent Personas Shape AI-Assisted Ideation** (2025). *arXiv*.
- https://arxiv.org/pdf/2510.26490
---
## Role-Playing & Perspective-Taking
33. **Chung, T. S.** (2013). Table-top role playing game and creativity. *Thinking Skills and Creativity, 8*, 56-71.
- https://www.researchgate.net/publication/257701334_Table-top_role_playing_game_and_creativity
34. **The effect of tabletop role-playing games on the creative potential and emotional creativity of Taiwanese college students** (2015). *Thinking Skills and Creativity*.
- https://www.researchgate.net/publication/284013184_The_effect_of_tabletop_role-playing_games_on_the_creative_potential_and_emotional_creativity_of_Taiwanese_college_students
35. **Psychology and Role-Playing Games** (2019). *ResearchGate*.
- https://www.researchgate.net/publication/331758159_Psychology_and_Role-Playing_Games
36. **Role Playing and Perspective Taking: An Educational Point of View** (2020). *ResearchGate*.
- https://www.researchgate.net/publication/346610467_Role_Playing_and_Perspective_Taking_An_Educational_Point_of_View
---
## Creativity Support Tools & Evaluation
37. **Lamb, C., Brown, D. G., & Clarke, C. L. A.** (2018). Evaluating Computational Creativity: An Interdisciplinary Tutorial. *ACM Computing Surveys, 51*(2), Article 28.
- https://dl.acm.org/doi/10.1145/3167476
38. **Evaluating Creativity in Computational Co-Creative Systems** (2018). *ResearchGate*.
- https://www.researchgate.net/publication/326646917_Evaluating_Creativity_in_Computational_Co-Creative_Systems
39. **The Intersection of Users, Roles, Interactions, and Technologies in Creativity Support Tools** (2021). *DIS '21*.
- https://dl.acm.org/doi/10.1145/3461778.3462050
40. **What Counts as 'Creative' Work? Articulating Four Epistemic Positions in Creativity-Oriented HCI Research** (2024). *CHI '24*.
- https://dl.acm.org/doi/10.1145/3613904.3642854
41. **Colton, S., & Wiggins, G. A.** (2012). Computational Creativity: The Final Frontier? *ECAI 2012*.
- https://link.springer.com/article/10.1007/s00354-020-00116-w
---
## AI-Augmented Design & Ideation
42. **The effect of AI-based inspiration on human design ideation** (2023). *CoDesign*.
- https://www.tandfonline.com/doi/full/10.1080/21650349.2023.2167124
43. **A Hybrid Prototype Method Combining Physical Models and Generative AI to Support Creativity in Conceptual Design** (2024). *ACM TOCHI*.
- https://dl.acm.org/doi/10.1145/3689433
44. **Artificial intelligence for design education: A conceptual approach to enhance students' divergent and convergent thinking** (2025). *IJTDE*.
- https://link.springer.com/article/10.1007/s10798-025-09964-3
45. **The Ideation Compass: Supporting interdisciplinary creative dialogues with real time visualization** (2022). *CoDesign*.
- https://www.tandfonline.com/doi/full/10.1080/21650349.2022.2142674
46. **Guiding data-driven design ideation by knowledge distance** (2021). *Knowledge-Based Systems*.
- https://www.sciencedirect.com/science/article/abs/pii/S0950705121001362
---
## CHI/CSCW Related Papers
47. **Chan, J., Dang, S., & Dow, S. P.** (2016). Improving Crowd Innovation with Expert Facilitation. *CSCW '16*.
48. **Koch, J., et al.** (2020). ImageSense: An Intelligent Collaborative Ideation Tool to Support Diverse Human-Computer Partnerships. *CSCW '20*.
49. **Yu, L., Kittur, A., & Kraut, R. E.** (2014). Distributed Analogical Idea Generation: Inventing with Crowds. *CHI '14*.
50. **Crowdboard** (2017). *C&C '17*.
- https://dl.acm.org/doi/10.1145/3059454.3059477
51. **Collaborative Creativity** (2011). *CHI '11*.
- https://dl.acm.org/doi/10.1145/1978942.1979214
52. **Beyond Automation: How UI/UX Designers Perceive AI as a Creative Partner in the Divergent Thinking Stages** (2025). *arXiv*.
- https://arxiv.org/html/2501.18778
---
## Additional Resources
53. **Automatic Scoring of Metaphor Creativity with Large Language Models** (2024). *Creativity Research Journal*.
- https://www.tandfonline.com/doi/full/10.1080/10400419.2024.2326343
54. **Surowiecki, J.** (2004). *The Wisdom of Crowds*. Doubleday.
- https://en.wikipedia.org/wiki/The_Wisdom_of_Crowds
55. **Research: When Used Correctly, LLMs Can Unlock More Creative Ideas** (2025). *Harvard Business Review*.
- https://hbr.org/2025/12/research-when-used-correctly-llms-can-unlock-more-creative-ideas

---
# Theoretical Framework: Expert-Augmented LLM Ideation
## The Core Problem: LLM "Semantic Gravity"
### What is Semantic Gravity?
When LLMs generate creative ideas directly, they exhibit a phenomenon we term "semantic gravity": the tendency of outputs to cluster around high-probability regions of the training distribution.
```
Direct LLM Generation:
Input: "Generate creative ideas for a chair"
LLM Process:
P(idea | "chair") → samples from training distribution
Result:
- "Ergonomic office chair" (high probability)
- "Foldable portable chair" (high probability)
- "Eco-friendly bamboo chair" (moderate probability)
Problem:
→ Ideas cluster in predictable semantic neighborhoods
→ Limited exploration of distant conceptual spaces
→ "Creative" outputs are interpolations, not extrapolations
```
### Why Does This Happen?
1. **Statistical Pattern Learning**: LLMs learn co-occurrence patterns from training data
2. **Mode Collapse**: When asked to be "creative," LLMs sample from the distribution of "creative ideas" they've seen
3. **Relevance Trap**: Strong associations dominate weak ones (chair→furniture >> chair→marine biology)
4. **Prototype Bias**: Outputs gravitate toward category prototypes, not edge cases
---
## The Solution: Expert Perspective Transformation
### Theoretical Basis
Our approach draws from three key theoretical foundations:
#### 1. Semantic Distance Theory (Mednick, 1962)
> "Creative thinking involves connecting weakly related, remote concepts in semantic memory."
**Key insight**: Creativity correlates with semantic distance. The farther the conceptual "jump," the more creative the result.
**Our application**: Expert perspectives force semantic jumps that LLMs wouldn't naturally make.
```
Without Expert:
"Chair" → furniture, sitting, comfort, design
Semantic distance: SHORT
With Marine Biologist Expert:
"Chair" → underwater pressure, coral structure, buoyancy, bioluminescence
Semantic distance: LONG
Result: Novel ideas like "pressure-adaptive seating" or "coral-inspired structural support"
```
#### 2. Conceptual Blending Theory (Fauconnier & Turner, 2002)
> "Creative products emerge from blending elements of two input spaces into a novel integrated space."
**The blending process**:
1. Input Space 1: The target concept (e.g., "chair")
2. Input Space 2: The expert's domain knowledge (e.g., marine biology)
3. Generic Space: Abstract structure shared by both
4. Blended Space: Novel integration of elements from both inputs
**Our application**: Each expert provides a distinct input space for systematic blending.
```
┌─────────────────┐       ┌─────────────────┐
│    Input 1      │       │    Input 2      │
│    "Chair"      │       │ Marine Biology  │
│  - support      │       │  - pressure     │
│  - sitting      │       │  - buoyancy     │
│  - comfort      │       │  - adaptation   │
└────────┬────────┘       └────────┬────────┘
         │                         │
         └────────────┬────────────┘
                      │
                      ▼
           ┌─────────────────────┐
           │    Blended Space    │
           │  Novel Chair Ideas  │
           │  - pressure-adapt   │
           │  - buoyant support  │
           │  - bio-adaptive     │
           └─────────────────────┘
```
#### 3. Design Fixation Breaking (Jansson & Smith, 1991)
> "Design fixation is blind adherence to initial ideas, limiting creative output."
**Fixation occurs because**:
- Knowledge is organized around category prototypes
- Prototypes require less cognitive effort to access
- Initial examples anchor subsequent ideation
**Our application**: Expert perspectives act as "defixation triggers" by activating non-prototype knowledge.
```
Without Intervention:
Prototype: "standard four-legged chair"
Fixation: Variations on four-legged design
With Expert Intervention:
Archaeologist: "Ancient people sat differently..."
Dance Therapist: "Seating affects movement expression..."
Fixation Broken: Entirely new seating paradigms explored
```
---
## The Multi-Expert Aggregation Model
### From "Wisdom of Crowds" to "Inner Crowd"
Research shows that groups generate more diverse ideas because each member brings different perspectives. Our system simulates this "crowd wisdom" through multiple expert personas:
```
Traditional Crowd:
Person 1 → Ideas from perspective 1
Person 2 → Ideas from perspective 2
Person 3 → Ideas from perspective 3
Aggregation → Diverse idea pool
Our "Inner Crowd":
LLM + Expert 1 Persona → Ideas from perspective 1
LLM + Expert 2 Persona → Ideas from perspective 2
LLM + Expert 3 Persona → Ideas from perspective 3
Aggregation → Diverse idea pool (simulated crowd)
```
### Why Multiple Experts Work
1. **Coverage**: Different experts activate different semantic regions
2. **Redundancy Reduction**: Deduplication removes overlapping ideas
3. **Diversity by Design**: Expert selection can be optimized for maximum diversity
4. **Diminishing Returns**: Beyond ~4-6 experts, marginal diversity gains decrease
---
## The Complete Pipeline
### Stage 1: Attribute Decomposition
**Purpose**: Structure the problem space before creative exploration
```
Input: "Innovative chair design"
Output:
Categories: [Material, Function, Usage, User Group]
Material: [wood, metal, fabric, composite]
Function: [support, comfort, mobility, storage]
Usage: [office, home, outdoor, medical]
User Group: [children, elderly, professionals, athletes]
```
**Theoretical basis**: Structured decomposition prevents premature fixation on holistic solutions.
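
A minimal sketch of what this stage might emit; the `Decomposition` class and its fields are illustrative, not the system's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Decomposition:
    query: str
    categories: dict[str, list[str]]  # category name -> attribute values
    # Directed attribute relationships (the DAG mapping), as edge pairs.
    edges: list[tuple[str, str]] = field(default_factory=list)

chair = Decomposition(
    query="Innovative chair design",
    categories={
        "Material": ["wood", "metal", "fabric", "composite"],
        "Function": ["support", "comfort", "mobility", "storage"],
        "Usage": ["office", "home", "outdoor", "medical"],
        "User Group": ["children", "elderly", "professionals", "athletes"],
    },
    # Hypothetical dependencies, e.g. material choice constrains comfort.
    edges=[("wood", "comfort"), ("mobility", "outdoor")],
)
print(len(chair.categories))  # 4
```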
### Stage 2: Expert Team Generation
**Purpose**: Assemble diverse perspectives for maximum semantic coverage
```
Strategies:
1. LLM-Generated: Query-specific, prioritizes unconventional experts
2. Curated: Pre-selected high-quality occupations
3. External Sources: DBpedia, Wikidata for broad coverage
Diversity Optimization:
- Domain spread (arts, science, trades, services)
- Expertise level variation
- Cultural/geographic diversity
```
### Stage 3: Expert Transformation
**Purpose**: Apply each expert's perspective to each attribute
```
For each (attribute, expert) pair:
Input: "Chair comfort" + "Marine Biologist"
LLM Prompt:
"As a marine biologist, how might you reimagine
chair comfort using principles from your field?"
Output: Keywords + Descriptions
- "Pressure-distributed seating inspired by deep-sea fish"
- "Buoyancy-assisted support reducing pressure points"
```
### Stage 4: Deduplication
**Purpose**: Ensure idea set is truly diverse, not just numerous
```
Methods:
1. Embedding-based: Fast cosine similarity clustering
2. LLM-based: Semantic pairwise comparison (more accurate)
Output:
- Unique ideas grouped by similarity
- Representative idea selected from each cluster
- Diversity metrics computed
```
### Stage 5: Novelty Validation
**Purpose**: Ground novelty in real-world uniqueness
```
Process:
- Search patent databases for similar concepts
- Compute overlap scores
- Flag ideas with high existing coverage
Output:
- Novelty score per idea
- Patent overlap rate for idea set
```
---
## Testable Hypotheses
### H1: Semantic Diversity
> Multi-expert generation produces higher semantic diversity than single-expert or direct generation.
**Measurement**: Mean pairwise cosine distance between idea embeddings
### H2: Novelty
> Ideas from multi-expert generation have lower patent overlap than direct generation.
**Measurement**: Percentage of ideas with existing patent matches
### H3: Expert Count Effect
> Semantic diversity increases with expert count, with diminishing returns beyond 4-6 experts.
**Measurement**: Diversity vs. expert count curve
### H4: Expert Source Effect
> LLM-generated experts produce more unconventional ideas than curated/database experts.
**Measurement**: Semantic distance from query centroid
### H5: Fixation Breaking
> Multi-expert approach produces more ideas outside the top-3 semantic clusters than direct generation.
**Measurement**: Cluster distribution analysis
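
The H5 measurement reduces to counting ideas that fall outside the largest semantic clusters. In the real analysis the labels would come from clustering idea embeddings (e.g., k-means); here they are given directly for illustration:

```python
from collections import Counter

def outside_top_k_fraction(cluster_labels, k=3):
    """Fraction of ideas falling outside the k largest clusters."""
    counts = Counter(cluster_labels)
    top = {label for label, _ in counts.most_common(k)}
    outside = sum(1 for lab in cluster_labels if lab not in top)
    return outside / len(cluster_labels)

# 10 ideas: clusters 0-2 dominate, clusters 3-4 are the long tail.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 3, 4]
print(outside_top_k_fraction(labels))  # 0.2
```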
---
## Expected Contributions
1. **Theoretical**: Formalization of "semantic gravity" as LLM creativity limitation
2. **Methodological**: Expert-augmented ideation pipeline with evaluation framework
3. **Empirical**: Quantitative evidence for multi-expert creativity enhancement
4. **Practical**: Open-source system for innovation ideation
---
## Positioning Against Related Work
| Approach | Limitation | Our Advantage |
|----------|------------|---------------|
| Direct LLM generation | Semantic gravity, fixation | Expert-forced semantic jumps |
| Human brainstorming | Cognitive fatigue, social dynamics | Tireless LLM generation |
| PersonaFlow (2024) | Research-focused, no attribute structure | Product innovation, structured decomposition |
| PopBlends (2023) | Two-concept blending only | Multi-expert, multi-attribute blending |
| BILLY (2025) | Vector fusion less interpretable | Sequential generation, explicit control |