chore: save local changes

2026-01-05 22:32:08 +08:00
parent bc281b8e0a
commit ec48709755
42 changed files with 5576 additions and 254 deletions

research/README.md
# Research: Expert-Augmented LLM Ideation
This folder contains research materials for the academic paper on the novelty-seeking system.
## Files
| File | Description |
|------|-------------|
| `literature_review.md` | Comprehensive literature review covering semantic distance theory, conceptual blending, design fixation, LLM limitations, and related work |
| `references.md` | 55+ academic references with links to papers |
| `theoretical_framework.md` | The "Semantic Gravity" theoretical model and testable hypotheses |
| `paper_outline.md` | Complete paper structure, experimental design, and target venues |
## Key Theoretical Contribution
**"Semantic Gravity"**: LLMs exhibit a tendency to generate outputs clustered around high-probability regions of their training distribution, limiting creative novelty. Expert perspectives provide "escape velocity" to break free from this gravity.
## Core Hypotheses
1. **H1**: Multi-expert generation → higher semantic diversity
2. **H2**: Multi-expert generation → lower patent overlap (higher novelty)
3. **H3**: Diversity increases with expert count (diminishing returns ~4-6)
4. **H4**: Expert source affects unconventionality of ideas
## Target Venues
- **CHI** (ACM Conference on Human Factors in Computing Systems)
- **CSCW** (ACM Conference on Computer-Supported Cooperative Work and Social Computing)
- **Creativity & Cognition** (ACM Conference)
- **IJHCS** (International Journal of Human-Computer Studies)
## Next Steps
1. Design concrete experiment protocol
2. Add measurement code to existing system
3. Collect experimental data
4. Conduct human evaluation
5. Write and submit paper

# Experimental Protocol: Expert-Augmented LLM Ideation
## Executive Summary
This document outlines a comprehensive experimental design to test the hypothesis that multi-expert LLM-based ideation produces more diverse and novel ideas than direct LLM generation.
---
## 1. Research Questions
| ID | Research Question |
|----|-------------------|
| **RQ1** | Does multi-expert generation produce higher semantic diversity than direct LLM generation? |
| **RQ2** | Does multi-expert generation produce ideas with lower patent overlap (higher novelty)? |
| **RQ3** | What is the optimal number of experts for maximizing diversity? |
| **RQ4** | How do different expert sources (LLM vs Curated vs DBpedia) affect idea quality? |
| **RQ5** | Does structured attribute decomposition enhance the multi-expert effect? |
---
## 2. Experimental Design Overview
### 2.1 Design Type
**Mixed Design**: Between-subjects for main conditions × Within-subjects for queries
### 2.2 Variables
#### Independent Variables (Manipulated)
| Variable | Levels | Your System Parameter |
|----------|--------|----------------------|
| **Generation Method** | 5 levels (see conditions) | Condition-dependent |
| **Expert Count** | 1, 2, 4, 6, 8 | `expert_count` |
| **Expert Source** | LLM, Curated, DBpedia | `expert_source` |
| **Attribute Structure** | With/Without decomposition | Pipeline inclusion |
#### Dependent Variables (Measured)
| Variable | Measurement Method |
|----------|-------------------|
| **Semantic Diversity** | Mean pairwise cosine distance (embeddings) |
| **Cluster Spread** | Number of clusters, silhouette score |
| **Patent Novelty** | 1 - (ideas with patent match / total ideas) |
| **Semantic Distance** | Distance from query centroid |
| **Human Novelty Rating** | 7-point Likert scale |
| **Human Usefulness Rating** | 7-point Likert scale |
| **Human Creativity Rating** | 7-point Likert scale |
#### Control Variables (Held Constant)
| Variable | Fixed Value |
|----------|-------------|
| LLM Model | Qwen3:8b (or specify) |
| Temperature | 0.7 |
| Total Ideas per Query | 20 |
| Keywords per Expert | 1 |
| Deduplication | Disabled for raw comparison |
| Language | English (for patent search) |
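The held-constant parameters above can be pinned in one place so every condition runs with identical settings. A minimal sketch; the `RunConfig` name and field names are illustrative, not part of the existing system:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunConfig:
    """Control variables held constant across all experimental runs."""
    llm_model: str = "qwen3:8b"
    temperature: float = 0.7
    ideas_per_query: int = 20
    keywords_per_expert: int = 1
    deduplication: bool = False   # disabled for raw comparison
    language: str = "en"          # English, for patent search
```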
---
## 3. Experimental Conditions
### 3.1 Main Study: Generation Method Comparison
| Condition | Description | Implementation |
|-----------|-------------|----------------|
| **C1: Direct** | Direct LLM generation | Prompt: "Generate 20 creative ideas for [query]" |
| **C2: Single-Expert** | 1 expert × 20 ideas | `expert_count=1`, `keywords_per_expert=20` |
| **C3: Multi-Expert-4** | 4 experts × 5 ideas each | `expert_count=4`, `keywords_per_expert=5` |
| **C4: Multi-Expert-8** | 8 experts × 2-3 ideas each | `expert_count=8`, `keywords_per_expert=2-3` |
| **C5: Random-Perspective** | 4 random words as "perspectives" | Custom prompt with random nouns |
### 3.2 Expert Count Study
| Condition | Expert Count | Ideas per Expert |
|-----------|--------------|------------------|
| **E1** | 1 | 20 |
| **E2** | 2 | 10 |
| **E4** | 4 | 5 |
| **E6** | 6 | 3-4 |
| **E8** | 8 | 2-3 |
### 3.3 Expert Source Study
| Condition | Source | Implementation |
|-----------|--------|----------------|
| **S-LLM** | LLM-generated | `expert_source=ExpertSource.LLM` |
| **S-Curated** | Curated 210 occupations | `expert_source=ExpertSource.CURATED` |
| **S-DBpedia** | DBpedia 2164 occupations | `expert_source=ExpertSource.DBPEDIA` |
| **S-Random** | Random word "experts" | Custom implementation |
---
## 4. Query Dataset
### 4.1 Design Principles
- **Diversity**: Cover multiple domains (consumer products, technology, services, abstract concepts)
- **Complexity Variation**: Simple objects to complex systems
- **Familiarity Variation**: Common items to specialized equipment
- **Cultural Neutrality**: Concepts understandable across cultures
### 4.2 Query Set (30 Queries)
#### Category A: Everyday Objects (10)
| ID | Query | Complexity |
|----|-------|------------|
| A1 | Chair | Low |
| A2 | Umbrella | Low |
| A3 | Backpack | Low |
| A4 | Coffee mug | Low |
| A5 | Bicycle | Medium |
| A6 | Refrigerator | Medium |
| A7 | Smartphone | Medium |
| A8 | Running shoes | Medium |
| A9 | Kitchen knife | Low |
| A10 | Desk lamp | Low |
#### Category B: Technology & Tools (10)
| ID | Query | Complexity |
|----|-------|------------|
| B1 | Solar panel | Medium |
| B2 | Electric vehicle | High |
| B3 | 3D printer | High |
| B4 | Drone | Medium |
| B5 | Smart thermostat | Medium |
| B6 | Noise-canceling headphones | Medium |
| B7 | Water purifier | Medium |
| B8 | Wind turbine | High |
| B9 | Robotic vacuum | Medium |
| B10 | Wearable fitness tracker | Medium |
#### Category C: Services & Systems (10)
| ID | Query | Complexity |
|----|-------|------------|
| C1 | Food delivery service | Medium |
| C2 | Online education platform | High |
| C3 | Healthcare appointment system | High |
| C4 | Public transportation | High |
| C5 | Hotel booking system | Medium |
| C6 | Personal finance app | Medium |
| C7 | Grocery shopping experience | Medium |
| C8 | Parking solution | Medium |
| C9 | Elderly care service | High |
| C10 | Waste management system | High |
### 4.3 Sample Size Justification
Based on [CHI meta-study on effect sizes](https://dl.acm.org/doi/10.1145/3706598.3713671):
- **Queries**: 30 (crossed with conditions)
- **Expected effect size**: d = 0.5 (medium)
- **Power target**: 80%
- **For automatic metrics**: 30 queries × 5 conditions × 20 ideas = 3,000 ideas
- **For human evaluation**: Subset of 10 queries × 3 conditions × 20 ideas = 600 ideas
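The power target can be sanity-checked with a standard normal-approximation formula for a two-sample comparison. This is a rough sketch; a full analysis would account for the mixed design and crossed queries:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate per-group n for a two-sample test:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, rounded up."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)
```

With d = 0.5 and 80% power this gives roughly 63 observations per condition, comfortably below the 30 queries × 20 ideas available per condition.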
---
## 5. Automatic Metrics Collection
### 5.1 Semantic Diversity Metrics
#### 5.1.1 Mean Pairwise Distance (Primary)
```python
import numpy as np
from typing import List, Tuple

def compute_mean_pairwise_distance(ideas: List[str], embedding_model: str) -> Tuple[float, float]:
    """
    Compute mean cosine distance between all idea pairs.
    Higher = more diverse. Returns (mean, std) over all pairs.
    """
    embeddings = get_embeddings(ideas, model=embedding_model)
    n = len(embeddings)
    distances = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = 1 - cosine_similarity(embeddings[i], embeddings[j])
            distances.append(dist)
    return np.mean(distances), np.std(distances)
```
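Here `get_embeddings` and `cosine_similarity` are assumed project utilities. For reference, the similarity itself reduces to a one-liner; a minimal numpy sketch:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```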
#### 5.1.2 Cluster Analysis
```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from typing import List

def compute_cluster_metrics(ideas: List[str], embedding_model: str) -> dict:
    """
    Analyze idea clustering patterns.
    """
    embeddings = get_embeddings(ideas, model=embedding_model)
    # Find optimal k using silhouette score
    silhouette_scores = []
    for k in range(2, min(len(ideas), 10)):
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
        labels = kmeans.fit_predict(embeddings)
        score = silhouette_score(embeddings, labels)
        silhouette_scores.append((k, score))
    best_k, best_score = max(silhouette_scores, key=lambda x: x[1])
    return {
        'optimal_clusters': best_k,
        'silhouette_score': best_score,
        'cluster_distribution': compute_cluster_sizes(embeddings, best_k)
    }
```
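The `compute_cluster_sizes` helper is referenced but not defined; one possible sketch, assuming scikit-learn's `KMeans` as above:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def compute_cluster_sizes(embeddings, k: int) -> dict:
    """Cluster embeddings with k-means and return {cluster_label: size}."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(np.asarray(embeddings))
    return dict(Counter(labels.tolist()))
```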
#### 5.1.3 Semantic Distance from Query
```python
import numpy as np
from typing import List

def compute_query_distance(query: str, ideas: List[str], embedding_model: str) -> dict:
    """
    Measure how far ideas are from the original query.
    Higher = more novel/distant.
    """
    query_emb = get_embedding(query, model=embedding_model)
    idea_embs = get_embeddings(ideas, model=embedding_model)
    distances = [1 - cosine_similarity(query_emb, e) for e in idea_embs]
    return {
        'mean_distance': np.mean(distances),
        'max_distance': np.max(distances),
        'min_distance': np.min(distances),
        'std_distance': np.std(distances)
    }
```
### 5.2 Patent Novelty Metrics
#### 5.2.1 Patent Overlap Rate
```python
from typing import List

def compute_patent_novelty(ideas: List[str], query: str) -> dict:
    """
    Search patents for each idea and compute overlap rate.
    Uses the existing patent_search_service; `query` is kept for logging context.
    """
    matches = 0
    match_details = []
    for idea in ideas:
        result = patent_search_service.search(idea)
        if result.has_match:
            matches += 1
            match_details.append({
                'idea': idea,
                'patent': result.best_match
            })
    return {
        'novelty_rate': 1 - (matches / len(ideas)),
        'match_count': matches,
        'total_ideas': len(ideas),
        'match_details': match_details
    }
```
### 5.3 Metrics Summary Table
| Metric | Formula | Interpretation |
|--------|---------|----------------|
| **Mean Pairwise Distance** | avg(1 - cos_sim(i, j)) for all pairs | Higher = more diverse |
| **Silhouette Score** | Cluster cohesion vs separation | Higher = clearer clusters |
| **Optimal Cluster Count** | argmax(silhouette) | More clusters = more themes |
| **Query Distance** | 1 - cos_sim(query, idea) | Higher = farther from original |
| **Patent Novelty Rate** | 1 - (matches / total) | Higher = more novel |
---
## 6. Human Evaluation Protocol
### 6.1 Participants
#### 6.1.1 Recruitment
- **Platform**: Prolific, MTurk, or domain experts
- **Sample Size**: 60 evaluators (20 per condition group)
- **Criteria**:
- Native English speakers
- Bachelor's degree or higher
- Attention check pass rate > 80%
#### 6.1.2 Compensation
- $15/hour equivalent
- ~30 minutes per session
- Bonus for high-quality ratings
### 6.2 Rating Scales
#### 6.2.1 Novelty (7-point Likert)
```
How novel/surprising is this idea?
1 = Not at all novel (very common/obvious)
4 = Moderately novel
7 = Extremely novel (never seen before)
```
#### 6.2.2 Usefulness (7-point Likert)
```
How useful/practical is this idea?
1 = Not at all useful (impractical)
4 = Moderately useful
7 = Extremely useful (highly practical)
```
#### 6.2.3 Creativity (7-point Likert)
```
How creative is this idea overall?
1 = Not at all creative
4 = Moderately creative
7 = Extremely creative
```
### 6.3 Procedure
1. **Introduction** (5 min)
- Study purpose (without revealing hypotheses)
- Rating scale explanation
- Practice with 3 example ideas
2. **Training** (5 min)
- Rate 5 calibration ideas with feedback
- Discuss edge cases
3. **Main Evaluation** (20 min)
- Rate 30 ideas (randomized order)
- 3 attention check items embedded
- Break after 15 ideas
4. **Debriefing** (2 min)
- Demographics
- Open-ended feedback
### 6.4 Quality Control
| Check | Threshold | Action |
|-------|-----------|--------|
| Attention checks | < 2/3 correct | Exclude |
| Completion time | < 10 min | Flag for review |
| Variance in ratings | All same score | Exclude |
| Inter-rater reliability | Cronbach's α < 0.7 | Review ratings |
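The per-evaluator rules in the table can be encoded directly. A small sketch with a hypothetical `qc_status` helper; thresholds are taken from the table above:

```python
def qc_status(attention_correct: int, attention_total: int,
              minutes: float, ratings: list) -> str:
    """Apply the exclusion/flag rules from the quality-control table."""
    if attention_correct / attention_total < 2 / 3:
        return "exclude"                 # failed attention checks
    if len(set(ratings)) == 1:
        return "exclude"                 # zero variance: all same score
    if minutes < 10:
        return "flag"                    # suspiciously fast completion
    return "ok"
```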
### 6.5 Analysis Plan
#### 6.5.1 Reliability
- Cronbach's alpha for each scale
- ICC (Intraclass Correlation) for inter-rater agreement
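Cronbach's alpha has a closed form that is easy to compute without a stats package; a minimal numpy sketch:

```python
import numpy as np

def cronbach_alpha(ratings) -> float:
    """Cronbach's alpha; rows = rated items, columns = raters (or scale items)."""
    r = np.asarray(ratings, dtype=float)
    k = r.shape[1]
    item_var = r.var(axis=0, ddof=1).sum()      # sum of per-column variances
    total_var = r.sum(axis=1).var(ddof=1)       # variance of row totals
    return k / (k - 1) * (1 - item_var / total_var)
```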
#### 6.5.2 Main Analysis
- Mixed-effects ANOVA: Condition × Query
- Post-hoc: Tukey HSD for pairwise comparisons
- Effect sizes: Cohen's d
#### 6.5.3 Correlation with Automatic Metrics
- Pearson correlation: Human ratings vs semantic diversity
- Regression: Predict human ratings from automatic metrics
---
## 7. Experimental Procedure
### 7.1 Phase 1: Idea Generation
```
For each query Q in QuerySet:
    For each condition C in Conditions:
        If C == "Direct":
            ideas = direct_llm_generation(Q, n=20)
        Elif C == "Single-Expert":
            expert = generate_expert(Q, n=1)
            ideas = expert_transformation(Q, expert, ideas_per_expert=20)
        Elif C == "Multi-Expert-4":
            experts = generate_experts(Q, n=4)
            ideas = expert_transformation(Q, experts, ideas_per_expert=5)
        Elif C == "Multi-Expert-8":
            experts = generate_experts(Q, n=8)
            ideas = expert_transformation(Q, experts, ideas_per_expert=2-3)
        Elif C == "Random-Perspective":
            perspectives = random.sample(RANDOM_WORDS, 4)
            ideas = perspective_generation(Q, perspectives, ideas_per=5)
        Store(Q, C, ideas)
```
### 7.2 Phase 2: Automatic Metrics
```
For each (Q, C, ideas) in Results:
    metrics = {
        'diversity': compute_mean_pairwise_distance(ideas),
        'clusters': compute_cluster_metrics(ideas),
        'query_distance': compute_query_distance(Q, ideas),
        'patent_novelty': compute_patent_novelty(ideas, Q)
    }
    Store(Q, C, metrics)
```
### 7.3 Phase 3: Human Evaluation
```
# Sample selection
selected_queries = random.sample(QuerySet, 10)
selected_conditions = ["Direct", "Multi-Expert-4", "Multi-Expert-8"]

# Create evaluation set
evaluation_items = []
For each Q in selected_queries:
    For each C in selected_conditions:
        ideas = Get(Q, C)
        For each idea in ideas:
            evaluation_items.append((Q, C, idea))

# Randomize and assign to evaluators
random.shuffle(evaluation_items)
assignments = assign_to_evaluators(evaluation_items, n_evaluators=60)

# Collect ratings
ratings = collect_human_ratings(assignments)
```
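`assign_to_evaluators` is left abstract above; one simple balanced scheme (a hypothetical implementation: shuffle once, then deal round-robin so evaluator loads differ by at most one item):

```python
import random

def assign_to_evaluators(items: list, n_evaluators: int, seed: int = 0) -> list:
    """Shuffle items, then deal them round-robin across evaluators."""
    rng = random.Random(seed)           # fixed seed for a reproducible assignment
    pool = items[:]
    rng.shuffle(pool)
    return [pool[i::n_evaluators] for i in range(n_evaluators)]
```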
### 7.4 Phase 4: Analysis
```
# Automatic metrics analysis
Run ANOVA: diversity ~ condition + query + condition:query
Run post-hoc: Tukey HSD for condition pairs
Compute effect sizes

# Human ratings analysis
Check reliability: Cronbach's alpha, ICC
Run mixed-effects model: rating ~ condition + (1|evaluator) + (1|query)
Compute correlations: human vs automatic metrics

# Visualization
Plot: Diversity by condition (box plots)
Plot: t-SNE of idea embeddings colored by condition
Plot: Expert count vs diversity curve
```
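The core of `diversity ~ condition` is a one-way F test. A self-contained numpy sketch of the F statistic; the full analysis adds the query factor, interaction, and post-hoc tests via a stats package:

```python
import numpy as np

def one_way_anova_F(groups) -> float:
    """F statistic for a one-way ANOVA across condition groups."""
    gs = [np.asarray(g, dtype=float) for g in groups]
    grand = np.concatenate(gs).mean()
    k = len(gs)
    n = sum(len(g) for g in gs)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in gs)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in gs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```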
---
## 8. Implementation Checklist
### 8.1 Code to Implement
- [ ] `experiments/generate_ideas.py` - Idea generation for all conditions
- [ ] `experiments/compute_metrics.py` - Automatic metric computation
- [ ] `experiments/export_for_evaluation.py` - Prepare human evaluation set
- [ ] `experiments/analyze_results.py` - Statistical analysis
- [ ] `experiments/visualize.py` - Generate figures
### 8.2 Data Files to Create
- [ ] `data/queries.json` - 30 queries with metadata
- [ ] `data/random_words.json` - Random perspective words
- [ ] `data/generated_ideas/` - Raw idea outputs
- [ ] `data/metrics/` - Computed metric results
- [ ] `data/human_ratings/` - Collected ratings
### 8.3 Analysis Outputs
- [ ] `results/diversity_by_condition.csv`
- [ ] `results/patent_novelty_by_condition.csv`
- [ ] `results/human_ratings_summary.csv`
- [ ] `results/statistical_tests.txt`
- [ ] `figures/` - All visualizations
---
## 9. Expected Results & Hypotheses
### 9.1 Primary Hypotheses
| Hypothesis | Prediction | Metric |
|------------|------------|--------|
| **H1** | Multi-Expert-4 > Single-Expert > Direct | Semantic diversity |
| **H2** | Multi-Expert-8 ≈ Multi-Expert-4 (diminishing returns) | Semantic diversity |
| **H3** | Multi-Expert > Direct | Patent novelty rate |
| **H4** | LLM experts > Curated > DBpedia | Unconventionality |
| **H5** | With attributes > Without attributes | Overall diversity |
### 9.2 Expected Effect Sizes
Based on related work:
- Diversity increase: d = 0.5-0.8 (medium to large)
- Patent novelty increase: 20-40% improvement
- Human creativity rating: d = 0.3-0.5 (small to medium)
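Effect sizes like those above can be computed with the pooled-standard-deviation form of Cohen's d; a minimal sketch:

```python
import numpy as np

def cohens_d(a, b) -> float:
    """Cohen's d between two samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return float((a.mean() - b.mean()) / pooled)
```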
### 9.3 Potential Confounds
| Confound | Mitigation |
|----------|-----------|
| Query difficulty | Crossed design (all queries × all conditions) |
| LLM variability | Multiple runs, fixed seed where possible |
| Evaluator bias | Randomized presentation, blinding |
| Order effects | Counterbalancing in human evaluation |
---
## 10. Timeline
| Week | Activity |
|------|----------|
| 1-2 | Implement idea generation scripts |
| 3 | Generate all ideas (5 conditions × 30 queries) |
| 4 | Compute automatic metrics |
| 5 | Design and pilot human evaluation |
| 6-7 | Run human evaluation (60 participants) |
| 8 | Analyze results |
| 9-10 | Write paper |
| 11 | Internal review |
| 12 | Submit |
---
## 11. Appendix: Direct Generation Prompt
For baseline condition C1 (Direct LLM generation):
```
You are a creative innovation consultant. Generate 20 unique and creative ideas
for improving or reimagining a [QUERY].
Requirements:
- Each idea should be distinct and novel
- Ideas should range from incremental improvements to radical innovations
- Consider different aspects: materials, functions, user experiences, contexts
- Provide a brief (15-30 word) description for each idea
Output format:
1. [Idea keyword]: [Description]
2. [Idea keyword]: [Description]
...
20. [Idea keyword]: [Description]
```
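Output in this numbered format can be parsed with a small regex. A sketch; the `parse_ideas` helper is illustrative, not part of the existing system:

```python
import re

def parse_ideas(text: str):
    """Extract (keyword, description) pairs from 'N. keyword: description' lines."""
    pattern = re.compile(r"^\s*\d+\.\s*(.+?):\s*(.+)$", re.MULTILINE)
    return [(kw.strip(), desc.strip()) for kw, desc in pattern.findall(text)]
```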
---
## 12. Appendix: Random Perspective Words
For condition C5 (Random-Perspective), sample from:
```json
[
"ocean", "mountain", "forest", "desert", "cave",
"microscope", "telescope", "kaleidoscope", "prism", "lens",
"butterfly", "elephant", "octopus", "eagle", "ant",
"sunrise", "thunderstorm", "rainbow", "fog", "aurora",
"clockwork", "origami", "mosaic", "symphony", "ballet",
"ancient", "futuristic", "organic", "crystalline", "liquid",
"whisper", "explosion", "rhythm", "silence", "echo"
]
```
This tests whether ANY perspective shift helps, or if EXPERT perspectives specifically matter.

# Literature Review: Expert-Augmented LLM Ideation
## 1. Core Directly-Related Work
### 1.1 Wisdom of Crowds via Role Assumption
**Bringing the Wisdom of the Crowd to an Individual by Having the Individual Assume Different Roles** (ACM C&C 2017)
Groups of people tend to generate more diverse ideas than individuals because each group member brings a different perspective. This study showed it's possible to help individuals think more like a group by asking them to approach a problem from different perspectives. In an experiment with 54 crowd workers, participants who assumed different expert roles came up with more creative ideas than those who did not.
**Gap for our work**: This was human-based role-playing. Our system automates this with LLM-powered expert perspectives.
### 1.2 PersonaFlow: LLM-Simulated Expert Perspectives
**PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation** (2024)
PersonaFlow provides multiple perspectives by using LLMs to simulate domain-specific experts. User studies showed it increased the perceived relevance and creativity of ideated research directions and promoted users' critical thinking activities without increasing perceived cognitive load.
**Gap for our work**: PersonaFlow focuses on research ideation. Our system applies to product/innovation ideation with structured attribute decomposition.
### 1.3 PopBlends: Conceptual Blending with LLMs
**PopBlends: Strategies for Conceptual Blending with Large Language Models** (CHI 2023)
PopBlends automatically suggests conceptual blends using both traditional knowledge extraction and LLMs. Studies showed people found twice as many blend suggestions with the system, with half the mental demand.
**Gap for our work**: We structure blending through expert domain knowledge rather than direct concept pairing.
### 1.4 BILLY: Persona Vector Merging
**BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation** (2025)
Proposes fusing persona vectors in activation space to steer LLM output towards multiple perspectives simultaneously, requiring only a single additive operation during inference.
**Gap for our work**: We use sequential multi-expert generation rather than vector fusion, allowing more explicit control and interpretability.
---
## 2. Theoretical Foundations
### 2.1 Semantic Distance Theory
**Core Insight** (Mednick, 1962): Creative thinking involves connecting weakly related, remote concepts in semantic memory. The farther one "moves away" from a conventional idea, the more creative the new idea will likely be.
**Key Research**:
- Semantic distance plays an important role in the creative process
- A more "flexible" semantic memory structure (higher connectivity, shorter distances) facilitates creative idea generation
- Quantitative measures using LSA and semantic networks can objectively examine creative output
- Divergent Semantic Integration (DSI) correlates strongly with human creativity ratings (72% variance explained)
**Application to Our Work**: Expert perspectives force semantic "jumps" to distant domains that LLMs wouldn't naturally traverse.
```
Without Expert: "Chair" → furniture, sitting, comfort (short semantic distance)
With Expert: "Chair" + Marine Biologist → pressure, buoyancy, coral (long semantic distance)
```
### 2.2 Conceptual Blending Theory
**Core Insight** (Fauconnier & Turner, 2002): Creative products emerge from blending elements of two input spaces into a novel integrated space.
**Key Research**:
- Blending process: (1) find connecting concept between inputs, (2) map elements that can be blended
- Generative AI demonstrates ability to blend and integrate concepts (bisociation)
- Trisociation (three-concept blending) is being used for AI-augmented idea generation
- Conceptual blending provides terminology for describing creative products
**Limitation**: Blending theory doesn't explain where inputs originate - the "inspiration problem."
**Application to Our Work**: Each expert provides a distinct "input space" enabling systematic multi-space blending. Our attribute decomposition provides structured inputs for blending.
### 2.3 Design Fixation
**Core Insight** (Jansson & Smith, 1991): Design fixation is "blind adherence to a set of ideas or concepts limiting the output of conceptual design."
**Key Research**:
- Fixation results from categorical knowledge organization around prototypes
- Accessing prototypes requires less cognitive effort than processing exemplars
- Diverse teams, model-making, and facilitation help prevent fixation
- Reflecting on prior fixation episodes is most effective prevention
**Neural Evidence**: fMRI studies show distinct patterns during fixated vs. creative ideation.
**Application to Our Work**: LLMs exhibit "semantic fixation" on high-probability outputs. Expert perspectives break this by forcing activation of non-prototype knowledge.
### 2.4 Constraint-Based Creativity
**Core Insight**: Paradoxically, constraints can enhance creativity by pushing beyond the path of least resistance.
**Key Research**:
- Constraints push people to search for more distant ideas in semantic memory
- Extreme constraints may require different types of creative problem-solving
- Not all constraints promote creativity for all individuals/tasks
- A "constraint-leveraging mindset" can be developed through experience
**Application to Our Work**: Expert role = productive constraint that expands rather than limits creative space. The expert perspective forces exploration of non-obvious solution spaces.
---
## 3. LLM Limitations in Creative Generation
### 3.1 Design Fixation from AI
**The Effects of Generative AI on Design Fixation and Divergent Thinking** (CHI 2024)
Key finding: AI exposure during ideation leads to HIGHER fixation. Compared to a baseline with no AI assistance, participants who used AI produced:
- Fewer ideas
- Less variety
- Lower originality
### 3.2 Dual Mechanisms: Inspiration vs. Fixation
**Inspiration Booster or Creative Fixation?** (Nature Humanities & Social Sciences, 2025)
- LLMs help in **simple** creative tasks (inspiration stimulation)
- LLMs **hurt** in **complex** creative tasks (creative fixation)
**Application to Our Work**: Our structured decomposition manages complexity, while multi-expert approach maintains inspiration benefits.
### 3.3 Statistical Pattern Perpetuation
**Bias and Fairness in Large Language Models: A Survey** (MIT Press, 2024)
LLMs learn, perpetuate, and amplify patterns from training data. This applies to creative outputs - LLMs generate what is statistically common/expected.
### 3.4 Generalization Bias
**Generalization Bias in LLM Summarization** (Royal Society, 2025)
LLMs' overgeneralization tendency produces outputs that lack sufficient empirical support. This suggests a bias toward "safe" middle-ground outputs rather than novel extremes.
---
## 4. Role-Playing and Perspective-Taking
### 4.1 Creativity Enhancement
Research on tabletop role-playing games (TTRPGs) demonstrates:
- Significant positive impact on creativity potential through divergent thinking
- TTRPG players exhibit significantly higher creativity than non-players
- Perspective-taking is closely linked to empathy and cognitive flexibility
### 4.2 Therapeutic and Educational Applications
- Role-playing develops perspective-taking, storytelling, creativity, and self-expression
- Physiological, emotional, and mental well-being from play enables creative ideation
- Play signals psychological safety, which is essential for creativity
### 4.3 Design Research Applications
- Role-playing stimulates creativity by exploring alternative solutions
- Offers safe environment to explore failure modes and challenge assumptions
- Well-suited for early-stage ideation and empathy-critical moments
---
## 5. Creativity Support Tools (CSTs)
### 5.1 Current State
- CSTs primarily support **divergent** thinking
- **Convergent** thinking often neglected
- Ideal CST should offer tailored support for both
### 5.2 AI as Creative Partner
- Collaborative ideation systems expose users to different ideas
- Competing theories on when/whether such exposure helps
- Tool-mediated expert activity view: computers as "mediating artifacts people act through"
### 5.3 Evaluation Methods
**Consensual Assessment Technique (CAT)**:
- Pool of experts independently evaluate artifacts
- Creative if high evaluations + high interrater reliability (Cronbach's alpha > 0.7)
**Semantic Distance Measures**:
- SemDis platform for automated creativity assessment
- Overcomes labor cost and subjectivity of human rating
- Uses NLP to quantify semantic relatedness
---
## 6. Our Theoretical Contribution
### The "Semantic Gravity" Problem
```
Direct LLM Generation:
P(idea | query)
→ Samples from high-probability region
→ Ideas cluster around training distribution modes
→ "Semantic gravity" pulls toward conventional associations
```
### Expert Transformation Solution
```
Conditioned Generation:
P(idea | query, expert)
→ Expert perspective activates distant semantic regions
→ Forces conceptual blending across domains
→ Breaks design fixation through productive constraints
```
### Multi-Expert Aggregation
```
Diverse Experts → Semantic Coverage
→ "Inner crowd" wisdom without actual crowd
→ Systematic exploration of idea space
→ Deduplication ensures non-redundant novelty
```
### Theoretical Model
1. **Attribute Decomposition**: Structures the problem space (categories, attributes)
2. **Expert Perspectives**: Forces semantic jumps to distant domains
3. **Multi-Expert Aggregation**: Achieves crowd-like diversity individually
4. **Deduplication**: Ensures generated ideas are truly distinct
5. **Patent Validation**: Grounds novelty in real-world uniqueness

research/paper_outline.md
# Paper Outline: Expert-Augmented LLM Ideation
## Suggested Titles
1. **"Breaking Semantic Gravity: Expert-Augmented LLM Ideation for Enhanced Creativity"**
2. "Beyond Interpolation: Multi-Expert Perspectives for Combinatorial Innovation"
3. "Escaping the Relevance Trap: Structured Expert Frameworks for Creative AI"
4. "From Crowd to Expert: Simulating Diverse Perspectives for LLM-Based Ideation"
---
## Abstract (Draft)
Large Language Models (LLMs) are increasingly used for creative ideation, yet they exhibit a phenomenon we term "semantic gravity" - the tendency to generate outputs clustered around high-probability regions of their training distribution. This limits the novelty and diversity of generated ideas. We propose a multi-expert transformation framework that systematically activates diverse semantic regions by conditioning LLM generation on simulated expert perspectives. Our system decomposes concepts into structured attributes, generates ideas through multiple domain-expert viewpoints, and employs semantic deduplication to ensure genuine diversity. Through experiments comparing multi-expert generation against direct LLM generation and single-expert baselines, we demonstrate that our approach produces ideas with [X]% higher semantic diversity and [Y]% lower patent overlap. We contribute a theoretical framework explaining LLM creativity limitations and an open-source system for innovation ideation.
---
## 1. Introduction
### 1.1 The Promise and Problem of LLM Creativity
- LLMs widely adopted for creative tasks
- Initial enthusiasm: infinite idea generation
- Emerging concern: quality and diversity issues
### 1.2 The Semantic Gravity Problem
- Define the phenomenon
- Why it occurs (statistical learning, mode collapse)
- Why it matters (innovation requires novelty)
### 1.3 Our Solution: Expert-Augmented Ideation
- Brief overview of the approach
- Key insight: expert perspectives as semantic "escape velocity"
- Contributions preview
### 1.4 Paper Organization
- Roadmap for the rest of the paper
---
## 2. Related Work
### 2.1 Theoretical Foundations
- Semantic distance and creativity (Mednick, 1962)
- Conceptual blending theory (Fauconnier & Turner)
- Design fixation (Jansson & Smith)
- Constraint-based creativity
### 2.2 LLM Limitations in Creative Generation
- Design fixation from AI (CHI 2024)
- Dual mechanisms: inspiration vs. fixation
- Bias and pattern perpetuation
### 2.3 Persona-Based Prompting
- PersonaFlow (2024)
- BILLY persona vectors (2025)
- Quantifying persona effects (ACL 2024)
### 2.4 Creativity Support Tools
- Wisdom of crowds approaches
- Human-AI collaboration in ideation
- Evaluation methods (CAT, semantic distance)
### 2.5 Positioning Our Work
- Gap: No end-to-end system combining structured decomposition + multi-expert transformation + deduplication
- Distinction from PersonaFlow: product innovation focus, attribute structure
---
## 3. System Design
### 3.1 Overview
- Pipeline diagram
- Design rationale
### 3.2 Attribute Decomposition
- Category analysis (dynamic vs. fixed)
- Attribute generation per category
- DAG relationship mapping
### 3.3 Expert Team Generation
- Expert sources: LLM-generated, curated, external databases
- Diversity optimization strategies
- Domain coverage considerations
### 3.4 Expert Transformation
- Conditioning mechanism
- Keyword generation
- Description generation
- Parallel processing for efficiency
### 3.5 Semantic Deduplication
- Embedding-based approach
- LLM-based approach
- Threshold selection
### 3.6 Novelty Validation
- Patent search integration
- Overlap scoring
---
## 4. Experiments
### 4.1 Research Questions
- RQ1: Does multi-expert generation increase semantic diversity?
- RQ2: Does multi-expert generation reduce patent overlap?
- RQ3: What is the optimal number of experts?
- RQ4: How do expert sources affect output quality?
### 4.2 Experimental Setup
#### 4.2.1 Dataset
- N concepts/queries for ideation
- Selection criteria (diverse domains, complexity levels)
#### 4.2.2 Conditions
| Condition | Description |
|-----------|-------------|
| Baseline | Direct LLM: "Generate 20 creative ideas for X" |
| Single-Expert | 1 expert × 20 ideas |
| Multi-Expert-4 | 4 experts × 5 ideas each |
| Multi-Expert-8 | 8 experts × 2-3 ideas each (matched to ~20 total) |
| Random-Perspective | 4 random words as "perspectives" |
#### 4.2.3 Controls
- Same LLM model (specify version)
- Same temperature settings
- Same total idea count per condition
### 4.3 Metrics
#### 4.3.1 Semantic Diversity
- Mean pairwise cosine distance between embeddings
- Cluster distribution analysis
- Silhouette score for idea clustering
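
The first metric can be sketched in a few lines, assuming the idea texts have already been embedded with some sentence-embedding model; the 2-D toy vectors below are stand-ins for real embeddings:

```python
from itertools import combinations
from math import sqrt

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def mean_pairwise_distance(embeddings):
    """Semantic diversity of an idea set: mean pairwise cosine distance."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)

# Toy "embeddings": a tight cluster vs. a spread-out set.
clustered = [[1.0, 0.0], [0.99, 0.05], [0.98, 0.1]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(mean_pairwise_distance(clustered) < mean_pairwise_distance(spread))  # True
```

In practice one would use a library implementation over the full embedding matrix; this sketch only fixes the definition of the metric.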
#### 4.3.2 Novelty
- Patent overlap rate
- Semantic distance from query centroid
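
The centroid-distance metric admits a direct sketch, again assuming precomputed embeddings; the vectors and labels below are illustrative stand-ins:

```python
from math import sqrt

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def novelty_scores(query_vec, idea_vecs):
    """Per-idea novelty: cosine distance from the query embedding."""
    return [cosine_distance(query_vec, v) for v in idea_vecs]

query = [1.0, 0.0]   # stand-in embedding of the query, e.g. "chair"
near = [0.95, 0.1]   # a conventional idea ("ergonomic office chair")
far = [0.2, 0.9]     # a semantically distant idea ("pressure-adaptive seating")
n_near, n_far = novelty_scores(query, [near, far])
print(n_near < n_far)  # True
```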
#### 4.3.3 Quality (Human Evaluation)
- Novelty rating (1-7 Likert)
- Usefulness rating (1-7 Likert)
- Creativity rating (1-7 Likert)
- Interrater reliability (e.g., intraclass correlation or Krippendorff's alpha)
### 4.4 Procedure
- Idea generation process
- Evaluation process
- Statistical analysis methods
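
One defensible choice of statistical test, sketched here, is a nonparametric permutation test on per-query diversity scores; the scores below are invented purely for illustration:

```python
import random

def permutation_test(sample_a, sample_b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    A nonparametric option when per-query scores are few and not
    necessarily normally distributed.
    """
    rng = random.Random(seed)
    observed = abs(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Invented per-query diversity scores for two conditions.
multi = [0.62, 0.58, 0.65, 0.60, 0.63]
base = [0.41, 0.44, 0.39, 0.43, 0.40]
print(permutation_test(multi, base) < 0.05)  # True
```

With more than two conditions, an omnibus test (e.g., Kruskal-Wallis) with corrected pairwise follow-ups would be the analogous choice.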
---
## 5. Results
### 5.1 Semantic Diversity (RQ1)
- Quantitative results
- Visualization (t-SNE/UMAP of idea embeddings)
- Statistical significance tests
### 5.2 Patent Novelty (RQ2)
- Overlap rates by condition
- Examples of high-novelty ideas
### 5.3 Expert Count Analysis (RQ3)
- Diversity vs. expert count curve
- Diminishing returns analysis
- Optimal expert count recommendation
### 5.4 Expert Source Comparison (RQ4)
- LLM-generated vs. curated vs. random
- Unconventionality metrics
### 5.5 Human Evaluation Results
- Rating distributions
- Condition comparisons
- Correlation with automatic metrics
---
## 6. Discussion
### 6.1 Interpreting the Results
- Why multi-expert works
- The role of structured decomposition
- Deduplication importance
### 6.2 Theoretical Implications
- Semantic gravity as framework for LLM creativity
- Expert perspectives as productive constraints
- Inner crowd wisdom
### 6.3 Practical Implications
- When to use multi-expert approach
- Expert selection strategies
- Integration with existing workflows
### 6.4 Limitations
- LLM-specific results may not generalize
- Patent overlap as proxy for true novelty
- Human evaluation subjectivity
- Single-language experiments
### 6.5 Future Work
- Cross-cultural creativity
- Domain-specific expert optimization
- Real-world deployment studies
- Integration with other creativity techniques
---
## 7. Conclusion
- Summary of contributions
- Key takeaways
- Broader impact
---
## Appendices
### A. Prompt Templates
- Expert generation prompts
- Keyword generation prompts
- Description generation prompts
### B. Full Experimental Results
- Complete data tables
- Additional visualizations
### C. Expert Source Details
- Curated occupation list
- DBpedia/Wikidata query details
### D. Human Evaluation Protocol
- Instructions for raters
- Example ratings
- Training materials
---
## Target Venues
### Tier 1 (Recommended)
1. **CHI** - ACM Conference on Human Factors in Computing Systems
- Strong fit: creativity support tools, human-AI collaboration
- Deadline: typically September
2. **CSCW** - ACM Conference on Computer-Supported Cooperative Work
- Good fit: collaborative ideation, crowd wisdom
- Deadline: typically April/January
3. **Creativity & Cognition** - ACM Conference
- Perfect fit: computational creativity focus
- Smaller but specialized venue
### Tier 2 (Alternative)
4. **DIS** - ACM Designing Interactive Systems
- Good fit: design ideation tools
5. **UIST** - ACM Symposium on User Interface Software and Technology
- If system/interaction focus emphasized
6. **ICCC** - International Conference on Computational Creativity
- Specialized computational creativity venue
### Journal Options
1. **International Journal of Human-Computer Studies (IJHCS)**
2. **ACM Transactions on Computer-Human Interaction (TOCHI)**
3. **Design Studies**
4. **Creativity Research Journal**
---
## Timeline Checklist
- [ ] Finalize experimental design
- [ ] Collect/select query dataset
- [ ] Run all experimental conditions
- [ ] Compute automatic metrics
- [ ] Design human evaluation study
- [ ] Recruit evaluators
- [ ] Conduct human evaluation
- [ ] Statistical analysis
- [ ] Write first draft
- [ ] Internal review
- [ ] Revision
- [ ] Submit
---

*File: `research/references.md`*

# References
## Core Related Work
1. **Siangliulue, P., Arnold, K. C., Gajos, K. Z., & Dow, S. P.** (2017). Bringing the Wisdom of the Crowd to an Individual by Having the Individual Assume Different Roles. *Proceedings of the 2017 ACM SIGCHI Conference on Creativity and Cognition (C&C '17)*, 131-141.
- https://dl.acm.org/doi/10.1145/3059454.3059467
2. **Liu, Y., Sharma, A., et al.** (2024). PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation. *arXiv preprint*.
- https://arxiv.org/html/2409.12538v1
- https://www.semanticscholar.org/paper/PersonaFlow:-Designing-LLM-Simulated-Expert-for-Liu-Sharma/eb0c224be9191e39452f20b2cbb886b5ecc4f57b
3. **Wang, S., Petridis, S., Kwon, T., Ma, X., & Chilton, L. B.** (2023). PopBlends: Strategies for Conceptual Blending with Large Language Models. *Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems*.
- https://dl.acm.org/doi/10.1145/3544548.3580948
4. **BILLY Authors** (2025). BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation. *arXiv preprint*.
- https://arxiv.org/html/2510.10157v1
---
## Semantic Distance & Creative Cognition
5. **Mednick, S. A.** (1962). The associative basis of the creative process. *Psychological Review, 69*(3), 220-232.
- (Classic foundational paper)
6. **Kenett, Y. N., & Faust, M.** (2019). Going the Extra Creative Mile: The Role of Semantic Distance in Creativity Theory, Research, and Measurement. *The Cambridge Handbook of the Neuroscience of Creativity*.
- https://www.cambridge.org/core/books/abs/cambridge-handbook-of-the-neuroscience-of-creativity/going-the-extra-creative-mile-the-role-of-semantic-distance-in-creativity-theory-research-and-measurement/3AD9143E69A463F85F2D8CC8940425CA
7. **Beaty, R. E., & Johnson, D. R.** (2021). Automating creativity assessment with SemDis: An open platform for computing semantic distance. *Behavior Research Methods, 53*, 757-780.
- https://link.springer.com/article/10.3758/s13428-020-01453-w
8. **What can quantitative measures of semantic distance tell us about creativity?** (2018). *Current Opinion in Behavioral Sciences*.
- https://www.sciencedirect.com/science/article/abs/pii/S2352154618301098
9. **Semantic Memory and Creativity: The Costs and Benefits of Semantic Memory Structure in Generating Original Ideas** (2023). *PMC*.
- https://pmc.ncbi.nlm.nih.gov/articles/PMC10128864/
10. **The Role of Semantic Associations as a Metacognitive Cue in Creative Idea Generation** (2023). *PMC*.
- https://pmc.ncbi.nlm.nih.gov/articles/PMC10141130/
---
## Conceptual Blending Theory
11. **Fauconnier, G., & Turner, M.** (2002). *The Way We Think: Conceptual Blending and the Mind's Hidden Complexities*. Basic Books.
12. **Conceptual Blending** - Wikipedia Overview
- https://en.wikipedia.org/wiki/Conceptual_blending
13. **Pereira, F. C.** (2007). *Creativity and Artificial Intelligence: A Conceptual Blending Approach*. Mouton de Gruyter.
- https://dl.acm.org/doi/10.5555/1557446
- https://www.researchgate.net/publication/332711522_Creativity_and_Artificial_Intelligence_A_Conceptual_Blending_Approach
14. **Confalonieri, R., et al.** (2018). A computational framework for conceptual blending. *Artificial Intelligence, 256*, 105-129.
- https://www.sciencedirect.com/science/article/pii/S000437021730142X
15. **Trisociation with AI for Creative Idea Generation** (2025). *California Management Review*.
- https://cmr.berkeley.edu/2025/01/trisociation-with-ai-for-creative-idea-generation/
---
## Design Fixation & Constraint-Based Creativity
16. **Jansson, D. G., & Smith, S. M.** (1991). Design fixation. *Design Studies, 12*(1), 3-11.
- (Classic foundational paper)
17. **Design Fixation: A Cognitive Model**. *Design Society*.
- https://www.designsociety.org/download-publication/25504/design_fixation_a_cognitive_model
18. **Crilly, N.** (2019). Research Design Fixation. *Cambridge Repository*.
- https://www.repository.cam.ac.uk/bitstreams/2c002015-8771-4694-ad48-0e4b52008bdf/download
19. **Using fMRI to deepen our understanding of design fixation** (2020). *Design Science, Cambridge Core*.
- https://www.cambridge.org/core/journals/design-science/article/using-fmri-to-deepen-our-understanding-of-design-fixation/2DD81FEE8ED682F6DFF415BF2948EFA6
20. **Acar, O. A., Tarakci, M., & van Knippenberg, D.** (2019). Creativity and Innovation Under Constraints: A Cross-Disciplinary Integrative Review. *Journal of Management, 45*(1), 96-121.
- https://journals.sagepub.com/doi/full/10.1177/0149206318805832
21. **Cromwell, J. R.** (2024). How combinations of constraint affect creativity: A new typology of creative problem solving in organizations. *Organizational Psychology Review*.
- https://journals.sagepub.com/doi/10.1177/20413866231202031
22. **Creativity from constraints: Theory and applications to education** (2022). *Thinking Skills and Creativity*.
- https://www.sciencedirect.com/science/article/abs/pii/S1871187122001870
---
## LLM Limitations in Creative Generation
23. **Wadinambiarachchi, S., et al.** (2024). The Effects of Generative AI on Design Fixation and Divergent Thinking. *Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems*.
- https://dl.acm.org/doi/full/10.1145/3613904.3642919
- https://arxiv.org/html/2403.11164v1
24. **Inspiration booster or creative fixation? The dual mechanisms of LLMs in shaping individual creativity in tasks of different complexity** (2025). *Humanities and Social Sciences Communications (Nature)*.
- https://www.nature.com/articles/s41599-025-05867-9
25. **Gallegos, I. O., et al.** (2024). Bias and Fairness in Large Language Models: A Survey. *Computational Linguistics, 50*(3), 1097-1179. MIT Press.
- https://direct.mit.edu/coli/article/50/3/1097/121961/Bias-and-Fairness-in-Large-Language-Models-A
26. **Generalization bias in large language model summarization of scientific research** (2025). *Royal Society Open Science, 12*(4).
- https://royalsocietypublishing.org/rsos/article/12/4/241776/235656/Generalization-bias-in-large-language-model
27. **LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models** (2025). *arXiv*.
- https://arxiv.org/html/2505.19240v1
---
## Persona Prompting & Multi-Agent Systems
28. **Quantifying the Persona Effect in LLM Simulations** (2024). *ACL 2024*.
- https://aclanthology.org/2024.acl-long.554.pdf
- https://www.emergentmind.com/topics/persona-effect-in-llm-simulations
29. **Two Tales of Persona in LLMs: A Survey of Role-Playing** (2024). *EMNLP Findings*.
- https://aclanthology.org/2024.findings-emnlp.969.pdf
30. **LLM Generated Persona is a Promise with a Catch** (2024). *Semantic Scholar*.
- https://www.semanticscholar.org/paper/LLM-Generated-Persona-is-a-Promise-with-a-Catch-Li-Chen/3ea29481ec11d1568fde727d236f71e44e4e2ad0
31. **Using AI for User Representation: An Analysis of 83 Persona Prompts** (2025). *arXiv*.
- https://arxiv.org/html/2508.13047v1
32. **Scaffolding Creativity: How Divergent and Convergent Personas Shape AI-Assisted Ideation** (2025). *arXiv*.
- https://arxiv.org/pdf/2510.26490
---
## Role-Playing & Perspective-Taking
33. **Chung, T. S.** (2013). Table-top role playing game and creativity. *Thinking Skills and Creativity, 8*, 56-71.
- https://www.researchgate.net/publication/257701334_Table-top_role_playing_game_and_creativity
34. **The effect of tabletop role-playing games on the creative potential and emotional creativity of Taiwanese college students** (2015). *Thinking Skills and Creativity*.
- https://www.researchgate.net/publication/284013184_The_effect_of_tabletop_role-playing_games_on_the_creative_potential_and_emotional_creativity_of_Taiwanese_college_students
35. **Psychology and Role-Playing Games** (2019). *ResearchGate*.
- https://www.researchgate.net/publication/331758159_Psychology_and_Role-Playing_Games
36. **Role Playing and Perspective Taking: An Educational Point of View** (2020). *ResearchGate*.
- https://www.researchgate.net/publication/346610467_Role_Playing_and_Perspective_Taking_An_Educational_Point_of_View
---
## Creativity Support Tools & Evaluation
37. **Lamb, C., Brown, D. G., & Clarke, C. L. A.** (2018). Evaluating Computational Creativity: An Interdisciplinary Tutorial. *ACM Computing Surveys, 51*(2), Article 28.
- https://dl.acm.org/doi/10.1145/3167476
38. **Evaluating Creativity in Computational Co-Creative Systems** (2018). *ResearchGate*.
- https://www.researchgate.net/publication/326646917_Evaluating_Creativity_in_Computational_Co-Creative_Systems
39. **The Intersection of Users, Roles, Interactions, and Technologies in Creativity Support Tools** (2021). *DIS '21*.
- https://dl.acm.org/doi/10.1145/3461778.3462050
40. **What Counts as 'Creative' Work? Articulating Four Epistemic Positions in Creativity-Oriented HCI Research** (2024). *CHI '24*.
- https://dl.acm.org/doi/10.1145/3613904.3642854
41. **Colton, S., & Wiggins, G. A.** (2012). Computational Creativity: The Final Frontier? *ECAI 2012*.
- https://link.springer.com/article/10.1007/s00354-020-00116-w
---
## AI-Augmented Design & Ideation
42. **The effect of AI-based inspiration on human design ideation** (2023). *CoDesign*.
- https://www.tandfonline.com/doi/full/10.1080/21650349.2023.2167124
43. **A Hybrid Prototype Method Combining Physical Models and Generative AI to Support Creativity in Conceptual Design** (2024). *ACM TOCHI*.
- https://dl.acm.org/doi/10.1145/3689433
44. **Artificial intelligence for design education: A conceptual approach to enhance students' divergent and convergent thinking** (2025). *IJTDE*.
- https://link.springer.com/article/10.1007/s10798-025-09964-3
45. **The Ideation Compass: Supporting interdisciplinary creative dialogues with real time visualization** (2022). *CoDesign*.
- https://www.tandfonline.com/doi/full/10.1080/21650349.2022.2142674
46. **Guiding data-driven design ideation by knowledge distance** (2021). *Knowledge-Based Systems*.
- https://www.sciencedirect.com/science/article/abs/pii/S0950705121001362
---
## CHI/CSCW Related Papers
47. **Chan, J., Dang, S., & Dow, S. P.** (2016). Improving Crowd Innovation with Expert Facilitation. *CSCW '16*.
48. **Koch, J., et al.** (2020). ImageSense: An Intelligent Collaborative Ideation Tool to Support Diverse Human-Computer Partnerships. *CSCW '20*.
49. **Yu, L., Kittur, A., & Kraut, R. E.** (2014). Distributed Analogical Idea Generation: Inventing with Crowds. *CHI '14*.
50. **Crowdboard** (2017). *C&C '17*.
- https://dl.acm.org/doi/10.1145/3059454.3059477
51. **Collaborative Creativity** (2011). *CHI '11*.
- https://dl.acm.org/doi/10.1145/1978942.1979214
52. **Beyond Automation: How UI/UX Designers Perceive AI as a Creative Partner in the Divergent Thinking Stages** (2025). *arXiv*.
- https://arxiv.org/html/2501.18778
---
## Additional Resources
53. **Automatic Scoring of Metaphor Creativity with Large Language Models** (2024). *Creativity Research Journal*.
- https://www.tandfonline.com/doi/full/10.1080/10400419.2024.2326343
54. **Surowiecki, J.** (2004). *The Wisdom of Crowds*. Doubleday.
- https://en.wikipedia.org/wiki/The_Wisdom_of_Crowds
55. **Research: When Used Correctly, LLMs Can Unlock More Creative Ideas** (2025). *Harvard Business Review*.
- https://hbr.org/2025/12/research-when-used-correctly-llms-can-unlock-more-creative-ideas

---
# Theoretical Framework: Expert-Augmented LLM Ideation
## The Core Problem: LLM "Semantic Gravity"
### What is Semantic Gravity?
When LLMs generate creative ideas directly, they exhibit a phenomenon we term "semantic gravity": the tendency of outputs to cluster around high-probability regions of the training distribution.
```
Direct LLM Generation:
Input: "Generate creative ideas for a chair"
LLM Process:
P(idea | "chair") → samples from training distribution
Result:
- "Ergonomic office chair" (high probability)
- "Foldable portable chair" (high probability)
- "Eco-friendly bamboo chair" (moderate probability)
Problem:
→ Ideas cluster in predictable semantic neighborhoods
→ Limited exploration of distant conceptual spaces
→ "Creative" outputs are interpolations, not extrapolations
```
### Why Does This Happen?
1. **Statistical Pattern Learning**: LLMs learn co-occurrence patterns from training data
2. **Mode Collapse**: When asked to be "creative," LLMs sample from the distribution of "creative ideas" they've seen
3. **Relevance Trap**: Strong associations dominate weak ones (chair→furniture >> chair→marine biology)
4. **Prototype Bias**: Outputs gravitate toward category prototypes, not edge cases
---
## The Solution: Expert Perspective Transformation
### Theoretical Basis
Our approach draws from three key theoretical foundations:
#### 1. Semantic Distance Theory (Mednick, 1962)
> "Creative thinking involves connecting weakly related, remote concepts in semantic memory."
**Key insight**: Creativity correlates with semantic distance. The farther the conceptual "jump," the more creative the result.
**Our application**: Expert perspectives force semantic jumps that LLMs wouldn't naturally make.
```
Without Expert:
"Chair" → furniture, sitting, comfort, design
Semantic distance: SHORT
With Marine Biologist Expert:
"Chair" → underwater pressure, coral structure, buoyancy, bioluminescence
Semantic distance: LONG
Result: Novel ideas like "pressure-adaptive seating" or "coral-inspired structural support"
```
#### 2. Conceptual Blending Theory (Fauconnier & Turner, 2002)
> "Creative products emerge from blending elements of two input spaces into a novel integrated space."
**The blending process**:
1. Input Space 1: The target concept (e.g., "chair")
2. Input Space 2: The expert's domain knowledge (e.g., marine biology)
3. Generic Space: Abstract structure shared by both
4. Blended Space: Novel integration of elements from both inputs
**Our application**: Each expert provides a distinct input space for systematic blending.
```
┌─────────────────┐       ┌─────────────────┐
│    Input 1      │       │    Input 2      │
│    "Chair"      │       │ Marine Biology  │
│  - support      │       │  - pressure     │
│  - sitting      │       │  - buoyancy     │
│  - comfort      │       │  - adaptation   │
└────────┬────────┘       └────────┬────────┘
         │                         │
         └────────────┬────────────┘
                      │
                      ▼
           ┌─────────────────────┐
           │    Blended Space    │
           │  Novel Chair Ideas  │
           │  - pressure-adapt   │
           │  - buoyant support  │
           │  - bio-adaptive     │
           └─────────────────────┘
```
#### 3. Design Fixation Breaking (Jansson & Smith, 1991)
> "Design fixation is blind adherence to initial ideas, limiting creative output."
**Fixation occurs because**:
- Knowledge is organized around category prototypes
- Prototypes require less cognitive effort to access
- Initial examples anchor subsequent ideation
**Our application**: Expert perspectives act as "defixation triggers" by activating non-prototype knowledge.
```
Without Intervention:
Prototype: "standard four-legged chair"
Fixation: Variations on four-legged design
With Expert Intervention:
Archaeologist: "Ancient people sat differently..."
Dance Therapist: "Seating affects movement expression..."
Fixation Broken: Entirely new seating paradigms explored
```
---
## The Multi-Expert Aggregation Model
### From "Wisdom of Crowds" to "Inner Crowd"
Research shows that groups generate more diverse ideas because each member brings different perspectives. Our system simulates this "crowd wisdom" through multiple expert personas:
```
Traditional Crowd:
Person 1 → Ideas from perspective 1
Person 2 → Ideas from perspective 2
Person 3 → Ideas from perspective 3
Aggregation → Diverse idea pool
Our "Inner Crowd":
LLM + Expert 1 Persona → Ideas from perspective 1
LLM + Expert 2 Persona → Ideas from perspective 2
LLM + Expert 3 Persona → Ideas from perspective 3
Aggregation → Diverse idea pool (simulated crowd)
```
### Why Multiple Experts Work
1. **Coverage**: Different experts activate different semantic regions
2. **Redundancy Reduction**: Deduplication removes overlapping ideas
3. **Diversity by Design**: Expert selection can be optimized for maximum diversity
4. **Diminishing Returns**: Beyond ~4-6 experts, marginal diversity gains decrease
---
## The Complete Pipeline
### Stage 1: Attribute Decomposition
**Purpose**: Structure the problem space before creative exploration
```
Input: "Innovative chair design"
Output:
Categories: [Material, Function, Usage, User Group]
Material: [wood, metal, fabric, composite]
Function: [support, comfort, mobility, storage]
Usage: [office, home, outdoor, medical]
User Group: [children, elderly, professionals, athletes]
```
**Theoretical basis**: Structured decomposition prevents premature fixation on holistic solutions.
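
A minimal sketch of what this stage might emit; the `Decomposition` class and its fields are illustrative, not the system's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Decomposition:
    query: str
    categories: dict[str, list[str]]  # category name -> attribute values
    # Directed attribute relationships (the DAG mapping), as edge pairs.
    edges: list[tuple[str, str]] = field(default_factory=list)

chair = Decomposition(
    query="Innovative chair design",
    categories={
        "Material": ["wood", "metal", "fabric", "composite"],
        "Function": ["support", "comfort", "mobility", "storage"],
        "Usage": ["office", "home", "outdoor", "medical"],
        "User Group": ["children", "elderly", "professionals", "athletes"],
    },
    # Hypothetical dependencies, e.g. material choice constrains comfort.
    edges=[("wood", "comfort"), ("mobility", "outdoor")],
)
print(len(chair.categories))  # 4
```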
### Stage 2: Expert Team Generation
**Purpose**: Assemble diverse perspectives for maximum semantic coverage
```
Strategies:
1. LLM-Generated: Query-specific, prioritizes unconventional experts
2. Curated: Pre-selected high-quality occupations
3. External Sources: DBpedia, Wikidata for broad coverage
Diversity Optimization:
- Domain spread (arts, science, trades, services)
- Expertise level variation
- Cultural/geographic diversity
```
### Stage 3: Expert Transformation
**Purpose**: Apply each expert's perspective to each attribute
```
For each (attribute, expert) pair:
Input: "Chair comfort" + "Marine Biologist"
LLM Prompt:
"As a marine biologist, how might you reimagine
chair comfort using principles from your field?"
Output: Keywords + Descriptions
- "Pressure-distributed seating inspired by deep-sea fish"
- "Buoyancy-assisted support reducing pressure points"
```
### Stage 4: Deduplication
**Purpose**: Ensure idea set is truly diverse, not just numerous
```
Methods:
1. Embedding-based: Fast cosine similarity clustering
2. LLM-based: Semantic pairwise comparison (more accurate)
Output:
- Unique ideas grouped by similarity
- Representative idea selected from each cluster
- Diversity metrics computed
```
### Stage 5: Novelty Validation
**Purpose**: Ground novelty in real-world uniqueness
```
Process:
- Search patent databases for similar concepts
- Compute overlap scores
- Flag ideas with high existing coverage
Output:
- Novelty score per idea
- Patent overlap rate for idea set
```
---
## Testable Hypotheses
### H1: Semantic Diversity
> Multi-expert generation produces higher semantic diversity than single-expert or direct generation.
**Measurement**: Mean pairwise cosine distance between idea embeddings
### H2: Novelty
> Ideas from multi-expert generation have lower patent overlap than direct generation.
**Measurement**: Percentage of ideas with existing patent matches
### H3: Expert Count Effect
> Semantic diversity increases with expert count, with diminishing returns beyond 4-6 experts.
**Measurement**: Diversity vs. expert count curve
### H4: Expert Source Effect
> LLM-generated experts produce more unconventional ideas than curated/database experts.
**Measurement**: Semantic distance from query centroid
### H5: Fixation Breaking
> Multi-expert approach produces more ideas outside the top-3 semantic clusters than direct generation.
**Measurement**: Cluster distribution analysis
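
The H5 measurement reduces to counting ideas that fall outside the largest semantic clusters. In the real analysis the labels would come from clustering idea embeddings (e.g., k-means); here they are given directly for illustration:

```python
from collections import Counter

def outside_top_k_fraction(cluster_labels, k=3):
    """Fraction of ideas falling outside the k largest clusters."""
    counts = Counter(cluster_labels)
    top = {label for label, _ in counts.most_common(k)}
    outside = sum(1 for lab in cluster_labels if lab not in top)
    return outside / len(cluster_labels)

# 10 ideas: clusters 0-2 dominate, clusters 3-4 are the long tail.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 3, 4]
print(outside_top_k_fraction(labels))  # 0.2
```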
---
## Expected Contributions
1. **Theoretical**: Formalization of "semantic gravity" as LLM creativity limitation
2. **Methodological**: Expert-augmented ideation pipeline with evaluation framework
3. **Empirical**: Quantitative evidence for multi-expert creativity enhancement
4. **Practical**: Open-source system for innovation ideation
---
## Positioning Against Related Work
| Approach | Limitation | Our Advantage |
|----------|------------|---------------|
| Direct LLM generation | Semantic gravity, fixation | Expert-forced semantic jumps |
| Human brainstorming | Cognitive fatigue, social dynamics | Tireless LLM generation |
| PersonaFlow (2024) | Research-focused, no attribute structure | Product innovation, structured decomposition |
| PopBlends (2023) | Two-concept blending only | Multi-expert, multi-attribute blending |
| BILLY (2025) | Vector fusion less interpretable | Sequential generation, explicit control |