chore: save local changes
research/theoretical_framework.md

# Theoretical Framework: Expert-Augmented LLM Ideation

## The Core Problem: LLM "Semantic Gravity"

### What is Semantic Gravity?

When LLMs generate creative ideas directly, they exhibit a phenomenon we term "semantic gravity": the tendency to generate outputs that cluster around high-probability regions of their training distribution.

```
Direct LLM Generation:
Input: "Generate creative ideas for a chair"

LLM Process:
P(idea | "chair") → samples from training distribution

Result:
- "Ergonomic office chair" (high probability)
- "Foldable portable chair" (high probability)
- "Eco-friendly bamboo chair" (moderate probability)

Problem:
→ Ideas cluster in predictable semantic neighborhoods
→ Limited exploration of distant conceptual spaces
→ "Creative" outputs are interpolations, not extrapolations
```

### Why Does This Happen?

1. **Statistical Pattern Learning**: LLMs learn co-occurrence patterns from training data
2. **Mode Collapse**: When asked to be "creative," LLMs sample from the distribution of "creative ideas" they've seen
3. **Relevance Trap**: Strong associations dominate weak ones (chair→furniture >> chair→marine biology)
4. **Prototype Bias**: Outputs gravitate toward category prototypes, not edge cases

---

## The Solution: Expert Perspective Transformation

### Theoretical Basis

Our approach draws from three key theoretical foundations:

#### 1. Semantic Distance Theory (Mednick, 1962)

> "Creative thinking involves connecting weakly related, remote concepts in semantic memory."

**Key insight**: Creativity correlates with semantic distance. The farther the conceptual "jump," the more creative the result.

**Our application**: Expert perspectives force semantic jumps that LLMs wouldn't naturally make.

```
Without Expert:
"Chair" → furniture, sitting, comfort, design
Semantic distance: SHORT

With Marine Biologist Expert:
"Chair" → underwater pressure, coral structure, buoyancy, bioluminescence
Semantic distance: LONG

Result: Novel ideas like "pressure-adaptive seating" or "coral-inspired structural support"
```

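The contrast above can be made measurable: semantic distance is commonly operationalized as cosine distance between embedding vectors. A minimal sketch, using tiny hand-made vectors as stand-ins for real sentence embeddings (the vectors and names are illustrative, not the system's actual data):

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; larger values mean semantically farther apart."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

# Toy 3-d vectors standing in for real sentence embeddings.
chair    = [1.0, 0.1, 0.0]
recliner = [0.9, 0.2, 0.1]  # near neighbor: a short semantic jump
biolum   = [0.1, 0.2, 1.0]  # remote marine-biology concept: a long jump

short_jump = cosine_distance(chair, recliner)
long_jump  = cosine_distance(chair, biolum)
assert long_jump > short_jump  # the remote concept scores as farther away
```

With real embeddings the same comparison quantifies how far an expert perspective has pulled an idea from the query's neighborhood.
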
#### 2. Conceptual Blending Theory (Fauconnier & Turner, 2002)

> "Creative products emerge from blending elements of two input spaces into a novel integrated space."

**The blending process**:
1. Input Space 1: The target concept (e.g., "chair")
2. Input Space 2: The expert's domain knowledge (e.g., marine biology)
3. Generic Space: Abstract structure shared by both
4. Blended Space: Novel integration of elements from both inputs

**Our application**: Each expert provides a distinct input space for systematic blending.

```
┌─────────────────┐     ┌─────────────────┐
│     Input 1     │     │     Input 2     │
│     "Chair"     │     │ Marine Biology  │
│ - support       │     │ - pressure      │
│ - sitting       │     │ - buoyancy      │
│ - comfort       │     │ - adaptation    │
└────────┬────────┘     └────────┬────────┘
         │                       │
         └───────────┬───────────┘
                     ▼
          ┌─────────────────────┐
          │    Blended Space    │
          │  Novel Chair Ideas  │
          │  - pressure-adapt   │
          │  - buoyant support  │
          │  - bio-adaptive     │
          └─────────────────────┘
```

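The systematic part of this blending can be prototyped as exhaustive pairing of the two input spaces. A toy sketch (the element lists are illustrative, taken from the diagram above, not generated output):

```python
from itertools import product

target_space = ["support", "sitting", "comfort"]        # Input Space 1: "chair"
expert_space = ["pressure", "buoyancy", "adaptation"]   # Input Space 2: marine biology

# Blended space: every cross-space pairing is a candidate seed for a novel idea,
# later elaborated by the LLM into a full concept.
blends = [f"{t} x {e}" for t, e in product(target_space, expert_space)]
assert len(blends) == 9  # 3 x 3 pairings
```

In practice the LLM performs the integration step itself; the cross-product only shows why each added expert multiplies the candidate blend space.
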
#### 3. Design Fixation Breaking (Jansson & Smith, 1991)

> "Design fixation is blind adherence to initial ideas, limiting creative output."

**Fixation occurs because**:
- Knowledge is organized around category prototypes
- Prototypes require less cognitive effort to access
- Initial examples anchor subsequent ideation

**Our application**: Expert perspectives act as "defixation triggers" by activating non-prototype knowledge.

```
Without Intervention:
Prototype: "standard four-legged chair"
Fixation: Variations on four-legged design

With Expert Intervention:
Archaeologist: "Ancient people sat differently..."
Dance Therapist: "Seating affects movement expression..."

Fixation Broken: Entirely new seating paradigms explored
```

---

## The Multi-Expert Aggregation Model

### From "Wisdom of Crowds" to "Inner Crowd"

Research shows that groups generate more diverse ideas because each member brings a different perspective. Our system simulates this "crowd wisdom" through multiple expert personas:

```
Traditional Crowd:
Person 1 → Ideas from perspective 1
Person 2 → Ideas from perspective 2
Person 3 → Ideas from perspective 3
Aggregation → Diverse idea pool

Our "Inner Crowd":
LLM + Expert 1 Persona → Ideas from perspective 1
LLM + Expert 2 Persona → Ideas from perspective 2
LLM + Expert 3 Persona → Ideas from perspective 3
Aggregation → Diverse idea pool (simulated crowd)
```

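The inner-crowd loop above can be sketched directly. Here `llm_generate` is a stub returning canned ideas (a real system would call an LLM with each persona prompt), so only the aggregation and a naive exact-match deduplication are illustrated:

```python
def llm_generate(persona, query):
    """Stand-in for a real LLM call; returns canned ideas per persona."""
    canned = {
        "Marine Biologist": ["pressure-adaptive seating", "buoyant support"],
        "Archaeologist": ["floor-sitting revival", "buoyant support"],
    }
    return canned[persona]

def inner_crowd(personas, query):
    """Aggregate ideas across personas, skipping exact duplicates."""
    pool, seen = [], set()
    for persona in personas:
        for idea in llm_generate(persona, query):
            if idea not in seen:
                seen.add(idea)
                pool.append(idea)
    return pool

ideas = inner_crowd(["Marine Biologist", "Archaeologist"], "chair")
# "buoyant support" is pooled once even though both personas propose it
```

Stage 4 of the pipeline replaces this exact-match check with semantic deduplication, since duplicated ideas rarely share identical wording.
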
### Why Multiple Experts Work

1. **Coverage**: Different experts activate different semantic regions
2. **Redundancy Reduction**: Deduplication removes overlapping ideas
3. **Diversity by Design**: Expert selection can be optimized for maximum diversity
4. **Diminishing Returns**: Beyond ~4-6 experts, marginal diversity gains decrease

---

## The Complete Pipeline

### Stage 1: Attribute Decomposition

**Purpose**: Structure the problem space before creative exploration

```
Input: "Innovative chair design"

Output:
Categories: [Material, Function, Usage, User Group]

Material: [wood, metal, fabric, composite]
Function: [support, comfort, mobility, storage]
Usage: [office, home, outdoor, medical]
User Group: [children, elderly, professionals, athletes]
```

**Theoretical basis**: Structured decomposition prevents premature fixation on holistic solutions.

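One natural representation of Stage 1's output (the exact schema is an assumption, not specified above) is a mapping from categories to values, flattened into individual (category, value) attributes for the downstream stages:

```python
# Categories and values from the Stage 1 example above.
decomposition = {
    "Material":   ["wood", "metal", "fabric", "composite"],
    "Function":   ["support", "comfort", "mobility", "storage"],
    "Usage":      ["office", "home", "outdoor", "medical"],
    "User Group": ["children", "elderly", "professionals", "athletes"],
}

# Stages 3-5 iterate over individual (category, value) attributes.
attributes = [(cat, val) for cat, vals in decomposition.items() for val in vals]
assert len(attributes) == 16  # 4 categories x 4 values
```
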
### Stage 2: Expert Team Generation

**Purpose**: Assemble diverse perspectives for maximum semantic coverage

```
Strategies:
1. LLM-Generated: Query-specific, prioritizes unconventional experts
2. Curated: Pre-selected high-quality occupations
3. External Sources: DBpedia, Wikidata for broad coverage

Diversity Optimization:
- Domain spread (arts, science, trades, services)
- Expertise level variation
- Cultural/geographic diversity
```

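Diversity optimization can be sketched as greedy max-min ("farthest-first") selection: each new expert maximizes its minimum distance to the experts already chosen. The domain-based distance below is a toy stand-in for a real expert-similarity measure:

```python
def greedy_diverse_team(candidates, k, dist):
    """Greedy max-min selection of k candidates for maximum spread."""
    team = [candidates[0]]
    while len(team) < k:
        best = max((c for c in candidates if c not in team),
                   key=lambda c: min(dist(c, t) for t in team))
        team.append(best)
    return team

# Toy distance: experts in different domains count as maximally far apart.
experts = [("Marine Biologist", "science"), ("Chemist", "science"),
           ("Sculptor", "arts"), ("Welder", "trades")]
domain_dist = lambda a, b: 0.0 if a[1] == b[1] else 1.0

team = greedy_diverse_team(experts, 3, domain_dist)
assert len({domain for _, domain in team}) == 3  # full domain spread
```

With an embedding-based distance, the same greedy pass trades a second same-domain expert for one from an untouched domain.
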
### Stage 3: Expert Transformation

**Purpose**: Apply each expert's perspective to each attribute

```
For each (attribute, expert) pair:

Input: "Chair comfort" + "Marine Biologist"

LLM Prompt:
"As a marine biologist, how might you reimagine
chair comfort using principles from your field?"

Output: Keywords + Descriptions
- "Pressure-distributed seating inspired by deep-sea fish"
- "Buoyancy-assisted support reducing pressure points"
```

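The per-pair prompting can be templated. The wording here extends the example above with an output-format instruction; the exact template and the structured-output ask are assumptions, not the system's verbatim prompt:

```python
# Hypothetical prompt template for one (attribute, expert) pair.
PROMPT = (
    "As a {expert}, how might you reimagine {attribute} "
    "using principles from your field? "
    "Return 3 ideas, each as 'keyword: one-sentence description'."
)

def build_prompt(attribute, expert):
    """Fill the template for a single transformation call."""
    return PROMPT.format(attribute=attribute, expert=expert)

p = build_prompt("chair comfort", "marine biologist")
assert "marine biologist" in p and "chair comfort" in p
```

Running every expert over every attribute yields the full grid of transformed ideas that Stage 4 then prunes.
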
### Stage 4: Deduplication

**Purpose**: Ensure the idea set is truly diverse, not just numerous

```
Methods:
1. Embedding-based: Fast cosine similarity clustering
2. LLM-based: Semantic pairwise comparison (more accurate)

Output:
- Unique ideas grouped by similarity
- Representative idea selected from each cluster
- Diversity metrics computed
```

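The embedding-based method can be sketched as a greedy threshold pass: keep an idea only if it stays below a similarity threshold against every idea kept so far. The 2-d vectors and the 0.9 threshold are illustrative stand-ins for real embedding output:

```python
import math

def cos_sim(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def greedy_dedupe(ideas, embeddings, threshold=0.9):
    """Drop any idea too similar to an already-kept idea."""
    kept, kept_vecs = [], []
    for idea, vec in zip(ideas, embeddings):
        if all(cos_sim(vec, kv) < threshold for kv in kept_vecs):
            kept.append(idea)
            kept_vecs.append(vec)
    return kept

ideas = ["buoyant seat", "floating chair", "coral lattice frame"]
vecs  = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0]]  # toy stand-ins for embeddings
assert greedy_dedupe(ideas, vecs) == ["buoyant seat", "coral lattice frame"]
```

The LLM-based alternative replaces `cos_sim` with a pairwise semantic-equivalence judgment, trading speed for accuracy.
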
### Stage 5: Novelty Validation

**Purpose**: Ground novelty in real-world uniqueness

```
Process:
- Search patent databases for similar concepts
- Compute overlap scores
- Flag ideas with high existing coverage

Output:
- Novelty score per idea
- Patent overlap rate for idea set
```

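A real implementation would query patent-search services; as a minimal sketch of the scoring step, token-level Jaccard overlap against a toy corpus illustrates how an overlap score and a flag could be derived (corpus, threshold, and metric are all assumptions):

```python
def jaccard(a, b):
    """Token-level Jaccard overlap between two phrases."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Toy stand-in corpus; a real system would query a patent database.
patent_titles = [
    "ergonomic office chair with lumbar support",
    "foldable portable camping chair",
]

def novelty_score(idea, corpus, overlap_threshold=0.3):
    """Return (novelty score, flagged) for one idea against the corpus."""
    max_overlap = max(jaccard(idea, title) for title in corpus)
    return 1.0 - max_overlap, max_overlap >= overlap_threshold

score, flagged = novelty_score("buoyancy-assisted seating surface", patent_titles)
assert score > 0.7 and not flagged
```

The set-level patent overlap rate is then just the fraction of ideas that come back flagged.
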
---

## Testable Hypotheses

### H1: Semantic Diversity
> Multi-expert generation produces higher semantic diversity than single-expert or direct generation.

**Measurement**: Mean pairwise cosine distance between idea embeddings

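H1's metric is directly computable. A sketch with toy 2-d vectors in place of real idea embeddings:

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_pairwise_distance(embeddings):
    """H1 metric: average cosine distance over all idea pairs."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)

clustered = [[1.0, 0.0], [0.99, 0.01], [0.98, 0.02]]  # low-diversity idea set
spread    = [[1.0, 0.0], [0.0, 1.0], [0.7, -0.7]]     # high-diversity idea set
assert mean_pairwise_distance(spread) > mean_pairwise_distance(clustered)
```

H1 predicts that multi-expert idea sets land on the `spread` side of this comparison.
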
### H2: Novelty
> Ideas from multi-expert generation have lower patent overlap than ideas from direct generation.

**Measurement**: Percentage of ideas with existing patent matches

### H3: Expert Count Effect
> Semantic diversity increases with expert count, with diminishing returns beyond 4-6 experts.

**Measurement**: Diversity vs. expert count curve

### H4: Expert Source Effect
> LLM-generated experts produce more unconventional ideas than curated/database experts.

**Measurement**: Semantic distance from the query centroid

### H5: Fixation Breaking
> The multi-expert approach produces more ideas outside the top-3 semantic clusters than direct generation.

**Measurement**: Cluster distribution analysis

---

## Expected Contributions

1. **Theoretical**: Formalization of "semantic gravity" as an LLM creativity limitation
2. **Methodological**: An expert-augmented ideation pipeline with an evaluation framework
3. **Empirical**: Quantitative evidence for multi-expert creativity enhancement
4. **Practical**: An open-source system for innovation ideation

---

## Positioning Against Related Work

| Approach | Limitation | Our Advantage |
|----------|------------|---------------|
| Direct LLM generation | Semantic gravity, fixation | Expert-forced semantic jumps |
| Human brainstorming | Cognitive fatigue, social dynamics | Tireless LLM generation |
| PersonaFlow (2024) | Research-focused, no attribute structure | Product innovation, structured decomposition |
| PopBlends (2023) | Two-concept blending only | Multi-expert, multi-attribute blending |
| BILLY (2025) | Vector fusion less interpretable | Sequential generation, explicit control |