# Theoretical Framework: Expert-Augmented LLM Ideation
## The Core Problem: LLM "Semantic Gravity"
### What is Semantic Gravity?
When LLMs generate creative ideas directly, they exhibit a phenomenon we term "semantic gravity": the tendency to generate outputs that cluster around high-probability regions of their training distribution.
```
Direct LLM Generation:

  Input: "Generate creative ideas for a chair"

  LLM Process:
    P(idea | "chair") → samples from training distribution

  Result:
    - "Ergonomic office chair"     (high probability)
    - "Foldable portable chair"    (high probability)
    - "Eco-friendly bamboo chair"  (moderate probability)

  Problem:
    → Ideas cluster in predictable semantic neighborhoods
    → Limited exploration of distant conceptual spaces
    → "Creative" outputs are interpolations, not extrapolations
```
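This unaugmented setup is also the Direct baseline condition in the 2×2 factorial design described later. A minimal sketch of it, assuming only a placeholder `llm(prompt)` completion function (any provider; the function name and prompt wording are illustrative, not the shipped implementation):

```python
# Direct-generation baseline: no attribute decomposition, no expert persona.
# `llm` is a placeholder for any chat/completion call.

def llm(prompt: str) -> str:
    raise NotImplementedError("wire up an LLM provider here")

def direct_ideas(concept: str, n: int = 10) -> list[str]:
    """Sample ideas straight from the model's default distribution."""
    prompt = f"Generate {n} creative ideas for a {concept}. One idea per line."
    return [line.lstrip("- ").strip()
            for line in llm(prompt).splitlines() if line.strip()]
```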
### Why Does This Happen?
1. **Statistical Pattern Learning**: LLMs learn co-occurrence patterns from training data
2. **Mode Collapse**: When asked to be "creative," LLMs sample from the distribution of "creative ideas" they've seen
3. **Relevance Trap**: Strong associations dominate weak ones (chair→furniture >> chair→marine biology)
4. **Prototype Bias**: Outputs gravitate toward category prototypes, not edge cases
---
## The Solution: Expert Perspective Transformation
### Theoretical Basis
Our approach draws from three key theoretical foundations:
#### 1. Semantic Distance Theory (Mednick, 1962)
> "Creative thinking involves connecting weakly related, remote concepts in semantic memory."
**Key insight**: Creativity correlates with semantic distance. The farther the conceptual "jump," the more creative the result.
**Our application**: Expert perspectives force semantic jumps that LLMs wouldn't naturally make.
```
Without Expert:
  "Chair" → furniture, sitting, comfort, design
  Semantic distance: SHORT

With Marine Biologist Expert:
  "Chair" → underwater pressure, coral structure, buoyancy, bioluminescence
  Semantic distance: LONG

Result: Novel ideas like "pressure-adaptive seating" or "coral-inspired structural support"
```
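Semantic distance can be approximated as cosine distance between sentence embeddings. A minimal sketch using `sentence-transformers` (the model name `all-MiniLM-L6-v2` and the example keywords are illustrative assumptions, not project requirements):

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_distance(seed: str, keyword: str) -> float:
    """Cosine distance between a seed concept and a candidate keyword."""
    a, b = model.encode([seed, keyword])
    return 1.0 - float(cosine_similarity([a], [b])[0, 0])

# Expert-induced keywords should land measurably farther from the seed:
semantic_distance("chair", "ergonomic office seat")        # short jump
semantic_distance("chair", "bioluminescent buoyancy aid")  # long jump
```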
#### The Semantic Distance Tradeoff
However, semantic distance is not always beneficial. There exists a tradeoff:
```
Semantic Distance Spectrum:

  Too Close                    Optimal Zone                   Too Far
  (Semantic Gravity)           (Creative)                     (Hallucination)
  ├────────────────────────────┼──────────────────────────────┼────────────────────────────┤
  "Ergonomic office chair"     "Pressure-adaptive seating"    "Quantum-entangled
                               "Coral-inspired support"        chair consciousness"
  High usefulness              High novelty + useful          High novelty, nonsense
  Low novelty                                                 Low usefulness
```
**Our Design Choice**: Context-free keyword generation (Stage 1 excludes original query) intentionally pushes toward the "far" end to maximize novelty. Stage 2 re-introduces query context to ground the ideas.
**Research Question**: What is the hallucination/nonsense rate of this approach, and is the tradeoff worthwhile?
#### 2. Conceptual Blending Theory (Fauconnier & Turner, 2002)
> "Creative products emerge from blending elements of two input spaces into a novel integrated space."
**The blending process**:
1. Input Space 1: The target concept (e.g., "chair")
2. Input Space 2: The expert's domain knowledge (e.g., marine biology)
3. Generic Space: Abstract structure shared by both
4. Blended Space: Novel integration of elements from both inputs
**Our application**: Each expert provides a distinct input space for systematic blending.
```
┌─────────────────┐          ┌─────────────────┐
│    Input 1      │          │    Input 2      │
│    "Chair"      │          │ Marine Biology  │
│  - support      │          │  - pressure     │
│  - sitting      │          │  - buoyancy     │
│  - comfort      │          │  - adaptation   │
└────────┬────────┘          └────────┬────────┘
         │                            │
         └─────────────┬──────────────┘
                       ▼
          ┌─────────────────────┐
          │    Blended Space    │
          │  Novel Chair Ideas  │
          │  - pressure-adapt   │
          │  - buoyant support  │
          │  - bio-adaptive     │
          └─────────────────────┘
```
#### 3. Design Fixation Breaking (Jansson & Smith, 1991)
> "Design fixation is blind adherence to initial ideas, limiting creative output."
**Fixation occurs because**:
- Knowledge is organized around category prototypes
- Prototypes require less cognitive effort to access
- Initial examples anchor subsequent ideation
**Our application**: Expert perspectives act as "defixation triggers" by activating non-prototype knowledge.
```
Without Intervention:
  Prototype: "standard four-legged chair"
  Fixation: Variations on four-legged design

With Expert Intervention:
  Archaeologist: "Ancient people sat differently..."
  Dance Therapist: "Seating affects movement expression..."
  Fixation Broken: Entirely new seating paradigms explored
```
---
## The Multi-Expert Aggregation Model
### From "Wisdom of Crowds" to "Inner Crowd"
Research shows that groups generate more diverse ideas because each member brings different perspectives. Our system simulates this "crowd wisdom" through multiple expert personas:
```
Traditional Crowd:
  Person 1 → Ideas from perspective 1
  Person 2 → Ideas from perspective 2
  Person 3 → Ideas from perspective 3
  Aggregation → Diverse idea pool

Our "Inner Crowd":
  LLM + Expert 1 Persona → Ideas from perspective 1
  LLM + Expert 2 Persona → Ideas from perspective 2
  LLM + Expert 3 Persona → Ideas from perspective 3
  Aggregation → Diverse idea pool (simulated crowd)
```
### Why This Approach Works: Two Complementary Mechanisms
**Factor 1: Attribute Decomposition**
- Structures the problem space before creative exploration
- Prevents premature fixation on holistic solutions
- Ensures coverage across different aspects of the target concept
**Factor 2: Expert Perspectives**
- Different experts activate different semantic regions
- Forces semantic jumps that LLMs wouldn't naturally make
- Each expert provides a distinct input space for conceptual blending
**Combined Effect (Interaction)**
- Experts are more effective when given structured attributes to transform
- Attributes without expert perspectives still generate predictable ideas
- The combination creates systematic exploration of remote conceptual spaces
---
## The Complete Pipeline
### Stage 1: Attribute Decomposition
**Purpose**: Structure the problem space before creative exploration
```
Input: "Innovative chair design"

Output:
  Categories: [Material, Function, Usage, User Group]
  Material:   [wood, metal, fabric, composite]
  Function:   [support, comfort, mobility, storage]
  Usage:      [office, home, outdoor, medical]
  User Group: [children, elderly, professionals, athletes]
```
**Theoretical basis**: Structured decomposition prevents premature fixation on holistic solutions.
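A minimal sketch of this stage, again assuming a placeholder `llm` function; the prompt wording and JSON schema are illustrative assumptions, not the shipped prompt:

```python
import json

def decompose(query: str, llm) -> dict[str, list[str]]:
    """Stage 1: elicit attribute categories and values as JSON."""
    prompt = (
        f'Decompose the design problem "{query}" into 4-6 attribute '
        "categories, each with 3-5 concrete values. Respond with JSON only, "
        'shaped like {"Category": ["value", ...], ...}'
    )
    return json.loads(llm(prompt))

# e.g. decompose("Innovative chair design", llm)
#  -> {"Material": ["wood", ...], "Function": ["support", ...], ...}
```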
### Stage 2: Expert Team Generation
**Purpose**: Assemble diverse perspectives for maximum semantic coverage
```
Strategies:
  1. LLM-Generated: Query-specific, prioritizes unconventional experts
  2. Curated: Pre-selected high-quality occupations
  3. External Sources: DBpedia, Wikidata for broad coverage

Diversity Optimization:
  - Domain spread (arts, science, trades, services)
  - Expertise level variation
  - Cultural/geographic diversity
```
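One way to make the diversity optimization concrete is greedy max-min selection over occupation embeddings, so each added expert sits as far as possible from those already chosen. A sketch under that assumption (not necessarily the strategy the system ships):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def pick_diverse_experts(candidates: list[str], k: int = 5) -> list[str]:
    """Greedy max-min: each pick maximizes distance to the nearest chosen expert."""
    emb = model.encode(candidates, normalize_embeddings=True)
    chosen = [0]
    while len(chosen) < k:
        sims = emb @ emb[chosen].T        # cosine similarity (unit vectors)
        dist = 1.0 - sims.max(axis=1)     # distance to nearest chosen expert
        dist[chosen] = -1.0               # never re-pick a chosen expert
        chosen.append(int(dist.argmax()))
    return [candidates[i] for i in chosen]
```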
### Stage 3: Expert Transformation
**Purpose**: Apply each expert's perspective to each attribute
```
For each (attribute, expert) pair:

  Input: "Chair comfort" + "Marine Biologist"

  LLM Prompt:
    "As a marine biologist, how might you reimagine
     chair comfort using principles from your field?"

  Output: Keywords + Descriptions
    - "Pressure-distributed seating inspired by deep-sea fish"
    - "Buoyancy-assisted support reducing pressure points"
```
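Stage 3 is a full product over the attribute × expert matrix, one call per pair; a sketch with the same placeholder `llm` and an illustrative prompt:

```python
from itertools import product

def transform(attributes: list[str], experts: list[str], llm) -> dict:
    """Stage 3: one transformation call per (attribute, expert) pair."""
    keywords = {}
    for attr, expert in product(attributes, experts):
        prompt = (
            f"As a {expert}, how might you reimagine {attr} using "
            "principles from your field? Give 3 keywords, each with "
            "a one-line description."
        )
        keywords[(attr, expert)] = llm(prompt)
    return keywords
```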
### Stage 4: Deduplication
**Purpose**: Ensure idea set is truly diverse, not just numerous
```
Methods:
  1. Embedding-based: Fast cosine similarity clustering
  2. LLM-based: Semantic pairwise comparison (more accurate)

Output:
  - Unique ideas grouped by similarity
  - Representative idea selected from each cluster
  - Diversity metrics computed
```
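A minimal sketch of the embedding-based method, merging near-duplicates with agglomerative clustering at a cosine-distance threshold (the 0.3 cutoff and the "first idea as representative" rule are illustrative assumptions):

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

model = SentenceTransformer("all-MiniLM-L6-v2")

def deduplicate(ideas: list[str], threshold: float = 0.3) -> list[str]:
    """Merge near-duplicate ideas; keep one representative per cluster."""
    if len(ideas) < 2:
        return ideas
    emb = model.encode(ideas, normalize_embeddings=True)
    labels = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=threshold,
        metric="cosine",
        linkage="average",
    ).fit_predict(emb)
    reps: dict[int, str] = {}
    for idea, label in zip(ideas, labels):
        reps.setdefault(label, idea)   # first idea seen in each cluster
    return list(reps.values())
```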
### Stage 5: Novelty Validation
**Purpose**: Ground novelty in real-world uniqueness
```
Process:
  - Search patent databases for similar concepts
  - Compute overlap scores
  - Flag ideas with high existing coverage

Output:
  - Novelty score per idea
  - Patent overlap rate for idea set
```
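Patent retrieval itself depends on the search backend and is elided here; given the retrieved abstracts, the overlap score can be the maximum embedding similarity. A sketch (the 0.75 flag threshold is an illustrative assumption):

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

def novelty_score(idea: str, patent_abstracts: list[str],
                  overlap_threshold: float = 0.75) -> tuple[float, bool]:
    """Return (novelty, flagged): novelty = 1 - max similarity to any patent."""
    if not patent_abstracts:
        return 1.0, False
    emb = model.encode([idea] + patent_abstracts)
    max_sim = float(cosine_similarity(emb[:1], emb[1:]).max())
    return 1.0 - max_sim, max_sim >= overlap_threshold
```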
---
## Testable Hypotheses (2×2 Factorial Design)
Our experimental design manipulates two independent factors:
1. **Attribute Decomposition**: With / Without
2. **Expert Perspectives**: With / Without
### H1: Main Effect of Attribute Decomposition
> Conditions with attribute decomposition produce higher semantic diversity than those without.
**Prediction**: (Attribute-Only + Full Pipeline) > (Direct + Expert-Only)
**Measurement**: Mean pairwise cosine distance between idea embeddings
### H2: Main Effect of Expert Perspectives
> Conditions with expert perspectives produce higher semantic diversity than those without.
**Prediction**: (Expert-Only + Full Pipeline) > (Direct + Attribute-Only)
**Measurement**: Mean pairwise cosine distance between idea embeddings
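H1 and H2 share one dependent measure. A minimal sketch of it (the embedding model choice is again an illustrative assumption):

```python
import numpy as np
from scipy.spatial.distance import pdist
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_diversity(ideas: list[str]) -> float:
    """Mean pairwise cosine distance between idea embeddings."""
    emb = model.encode(ideas)
    return float(np.mean(pdist(emb, metric="cosine")))
```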
### H3: Interaction Effect
> The combination of attributes and experts produces super-additive benefits.
**Prediction**: Full Pipeline > (Attribute-Only + Expert-Only - Direct)
**Rationale**: Experts are more effective when given structured problem decomposition to work with.
**Measurement**: Interaction term in 2×2 ANOVA
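The interaction test is a standard two-way ANOVA. A sketch using `statsmodels`, assuming a long-format table with one diversity score per run and binary factor columns (the column names are illustrative):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def interaction_test(df: pd.DataFrame) -> pd.DataFrame:
    """df columns: diversity (float), attributes (0/1), experts (0/1).

    The C(attributes):C(experts) row of the ANOVA table tests H3.
    """
    model = ols("diversity ~ C(attributes) * C(experts)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)
```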
### H4: Novelty
> The Full Pipeline produces ideas with the lowest patent overlap.
**Prediction**: Full Pipeline has the highest novelty rate of all conditions
**Measurement**: Percentage of ideas without existing patent matches
### H5: Expert vs Random Control
> Expert perspectives outperform random word perspectives.
**Prediction**: Expert-Only > Random-Perspective
**Rationale**: Validates that domain knowledge (not just any perspective shift) drives improvement
**Measurement**: Semantic diversity and human creativity ratings
---
## Expected Contributions
1. **Theoretical**: Formalization of "semantic gravity" as LLM creativity limitation
2. **Methodological**: Expert-augmented ideation pipeline with evaluation framework
3. **Empirical**: Quantitative evidence for multi-expert creativity enhancement
4. **Practical**: Open-source system for innovation ideation
---
## Positioning Against Related Work
### Key Differentiator: Attribute Decomposition
```
PersonaFlow (2024): Query → Experts → Ideas
Our Approach: Query → Attributes → (Attributes × Experts) → Ideas
```
**Why this matters**: Attribute decomposition provides **scaffolding** that makes expert perspectives more effective. An expert seeing "chair materials" generates more focused ideas than an expert seeing just "chair."
### Comparison Table
| Approach | Limitation | Our Advantage |
|----------|------------|---------------|
| Direct LLM generation | Semantic gravity, fixation | Two-factor enhancement (attributes + experts) |
| **PersonaFlow (2024)** | **No problem structure, experts see whole query** | **Attribute decomposition amplifies expert effect** |
| PopBlends (2023) | Two-concept blending only | Systematic attribute × expert exploration |
| BILLY (2025) | Cannot isolate what helps | 2×2 factorial design isolates contributions |
| Persona prompting alone | Random coverage | Systematic coverage via attribute × expert matrix |
### What We Can Answer That PersonaFlow Cannot
1. **Does problem structure alone help?** (Attribute-Only vs Direct)
2. **Do experts help beyond structure?** (Full Pipeline vs Attribute-Only)
3. **Is there an interaction effect?** (Full Pipeline > Attribute-Only + Expert-Only - Direct)
PersonaFlow showed that experts help, but it never tested whether **structuring the problem first** makes experts more effective.