gbanyan 43c025e060 feat: Add experiments framework and novelty-driven agent loop
- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation

- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring

- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 10:16:21 +08:00


# Novelty-Driven LLM Agent Loop
An autonomous LLM agent that generates tasks in a while loop, using **novelty assessment as the termination condition** to help the agent "jump out" of its training data distribution ("semantic gravity").
## Concept
Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.
This module implements a novel approach: use **novelty scores** to dynamically control when the agent should stop. Instead of fixed iteration counts, the agent continues until it finds something truly novel (a "breakthrough").
```
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
```
## Research Foundation
This work builds on established research:
- **Novelty Search** (Lehman & Stanley): Reward novelty, not objectives
- **Curiosity-driven Exploration** (Pathak et al.): Intrinsic motivation via prediction error
- **Quality-Diversity** (MAP-Elites): Maintain diverse high-quality solutions
- **Open-ended Learning**: Endless innovation through novelty pressure
The unique contribution is using **novelty as a termination condition** rather than just a reward signal.
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│             Novelty-Driven Task Generation Loop              │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────┐                                                │
│  │   Seed   │  "Design a better bicycle"                     │
│  │ Problem  │                                                │
│  └────┬─────┘                                                │
│       │                                                      │
│       ▼                                                      │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ WHILE novelty < threshold AND iterations < max:      │    │
│  │                                                      │    │
│  │ 1. Sample random expert (curated occupations)        │    │
│  │    e.g., "marine biologist", "choreographer"         │    │
│  │                                                      │    │
│  │ 2. Generate task from expert perspective             │    │
│  │    "What task would a {expert} assign to improve     │    │
│  │     {seed_problem}?"                                 │    │
│  │                                                      │    │
│  │ 3. Embed task, compute novelty vs. centroid          │    │
│  │                                                      │    │
│  │ 4. If novelty > threshold → STOP (breakthrough!)     │    │
│  └──────────────────────────────────────────────────────┘    │
│       │                                                      │
│       ▼                                                      │
│  ┌──────────┐                                                │
│  │ Output:  │  Novel task that "jumped out" of typical       │
│  │   Task   │  space + trajectory of exploration             │
│  └──────────┘                                                │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
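The loop in the diagram can be sketched in a few lines of plain Python. Here `generate_task` and `novelty_of` are hypothetical stand-ins for the LLM call and the embedding-based scorer that the real `agent.py` provides:

```python
import random


def novelty_loop(seed_problem, experts, generate_task, novelty_of,
                 threshold=0.4, max_iterations=20):
    """Sketch of the novelty-driven loop: perturb with a random expert,
    generate a task, score its novelty against the history, and stop
    at the first breakthrough."""
    history = []
    for _ in range(max_iterations):
        expert = random.choice(experts)             # 1. sample perturbation
        task = generate_task(seed_problem, expert)  # 2. expert-framed task
        score = novelty_of(task, history)           # 3. novelty vs. history
        history.append((expert, task, score))
        if score > threshold:                       # 4. breakthrough -> stop
            return task, score, history
    # No breakthrough within the budget: report the best score seen.
    return None, max(s for _, _, s in history), history
```

The stubs are placeholders only; the real agent awaits Ollama for both generation and embedding, so its loop is asynchronous.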
## Installation
The module uses the existing project infrastructure. Ensure you have:
1. **Ollama** running with the required models:
```bash
ollama pull qwen3:8b
ollama pull qwen3-embedding:4b
```
2. **Python dependencies** (from project root):
```bash
cd backend
source venv/bin/activate
pip install httpx numpy
```
## Quick Start
### Basic Usage
```bash
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
```
### Example Output
```
Iteration 1
Expert: Architect (Architecture & Design)
Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
Novelty: [████████░░░░░░░░░░░░] 0.1234

Iteration 2
Expert: Chef (Culinary)
Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
Novelty: [███████████░░░░░░░░░] 0.1823

Iteration 3
Expert: Marine Biologist (Science)
Task: Study fish schooling behavior to develop organic traffic flow algorithms
Novelty: [██████████████░░░░░░] 0.3521

Iteration 4
Expert: Choreographer (Performing Arts)
Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
Novelty: [████████████████████] 0.5234

★ BREAKTHROUGH! ★
```
## Termination Strategies
### 1. Seek Breakthrough (Default)
Stop when novelty exceeds threshold. Finds the first truly novel task.
```bash
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
```
### 2. Exhaust Frontier
Continue while novelty is high, stop when average novelty drops. Explores more thoroughly.
```bash
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
```
### 3. Coverage Target
Continue until N distinct conceptual clusters are covered. Ensures diversity.
```bash
python demo.py "Your problem" --strategy coverage --clusters 5
```
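Under the hood, the three strategies reduce to different stop predicates over the novelty trajectory. A hedged sketch (simplified: `stop_coverage` here just counts distinct cluster labels, whereas the real implementation derives clusters from the embeddings):

```python
def stop_breakthrough(scores, threshold=0.4):
    # Strategy 1: stop as soon as the latest task clears the threshold.
    return bool(scores) and scores[-1] > threshold


def stop_exhaust(scores, window=5, floor=0.15):
    # Strategy 2: stop when the recent average novelty drops below the
    # floor, i.e. the frontier of novel tasks looks exhausted.
    if len(scores) < window:
        return False
    return sum(scores[-window:]) / window < floor


def stop_coverage(cluster_ids, target=5):
    # Strategy 3: stop once the generated tasks span `target` distinct
    # conceptual clusters.
    return len(set(cluster_ids)) >= target
```

The window size, floor, and clustering method are illustrative defaults, not the module's exact values.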
## API Usage
```python
import asyncio

from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent


async def main():
    agent = NoveltyDrivenTaskAgent(
        novelty_threshold=0.4,
        max_iterations=20,
        language="en",
    )
    result = await agent.run("Design a better bicycle")
    print(f"Found breakthrough: {result.breakthrough_task.task}")
    print(f"Novelty score: {result.breakthrough_task.novelty_score}")
    print(f"From expert: {result.breakthrough_task.expert}")
    await agent.close()


asyncio.run(main())
```
## Novelty Metrics
The `novelty_metrics.py` module provides:
- **Centroid Distance**: the primary novelty metric; distance from the centroid (mean embedding) of all previous outputs
- **Min Distance**: distance to the nearest neighbor (detects near-duplicates)
- **Jump Detection**: flags significant semantic shifts between consecutive outputs
- **Trajectory Tracking**: cumulative novelty, jump ratio, and related statistics
```python
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics

metrics = NoveltyMetrics(similarity_threshold=0.7)

# Add embeddings one by one
for embedding in embeddings:
    novelty = metrics.compute_novelty(embedding)
    metrics.add_embedding(embedding, novelty)
    print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")

# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
```
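Centroid-distance novelty is cheap to compute by hand: average the previous embeddings and take one minus the cosine similarity to that mean. A dependency-free sketch with hypothetical helper names (the actual module operates on qwen3-embedding vectors):

```python
import math


def cosine(a, b):
    # Cosine similarity of two equal-length vectors; 0.0 for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid_novelty(embedding, history):
    """Novelty = 1 - cosine similarity to the centroid (mean) of the
    previous embeddings; the first output, with no history, scores 0."""
    if not history:
        return 0.0
    dim = len(embedding)
    centroid = [sum(vec[i] for vec in history) / len(history)
                for i in range(dim)]
    return 1.0 - cosine(embedding, centroid)


def is_jump(prev, curr, similarity_threshold=0.7):
    # A "jump" is a consecutive pair whose similarity falls below the
    # threshold, i.e. a large semantic shift between outputs.
    return cosine(prev, curr) < similarity_threshold
```

Because the centroid moves as embeddings accumulate, the reference point adapts over the run, which is exactly the "dynamic" property noted in the design decisions below.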
## CLI Options
```
positional arguments:
  seed_problem          The seed problem or challenge to explore

options:
  --strategy {breakthrough,exhaust,coverage}
                        Termination strategy (default: breakthrough)
  --threshold, -t       Novelty threshold for breakthrough (default: 0.4)
  --max-iter, -m        Maximum iterations (default: 20)
  --language, -l {en,zh}
                        Language for prompts and experts (default: en)
  --model               LLM model for task generation (default: qwen3:8b)
  --embedding-model     Embedding model (default: qwen3-embedding:4b)
  --temperature         LLM temperature (default: 0.7)
  --output, -o          Save results to JSON file
  --quiet, -q           Suppress iteration output
  --verbose, -v         Enable verbose logging
```
## File Structure
```
experiments/novelty_loop/
├── README.md # This file
├── agent.py # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py # Novelty computation utilities
└── demo.py # Interactive CLI demo
```
## Design Decisions
| Question | Decision | Rationale |
|----------|----------|-----------|
| Output Type | **Tasks** | Self-generated sub-goals for autonomous problem decomposition |
| Termination | **Seek Breakthrough** | Stop when novelty exceeds threshold - find truly novel task |
| Perturbation | **Expert Perspectives** | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | **Centroid** | Dynamic, adapts as exploration progresses |
## Connection to Main Project
This module integrates with the main novelty-seeking project:
- Uses the same **curated occupation data** (`backend/app/data/curated_occupations_*.json`)
- Uses the same **embedding model** (qwen3-embedding:4b)
- Builds on the **AUT flexibility analysis** metrics for novelty computation
- Can use **DDC domain data** for alternative perturbation strategies
## Future Work
1. **Hybrid Perturbation**: Combine expert + domain perspectives
2. **Contrastive Prompting**: Explicitly ask for outputs unlike recent ones
3. **Semantic Steering**: Guide generation away from centroid direction
4. **Multi-Agent Exploration**: Parallel agents with different strategies
5. **Quality-Diversity Archive**: Maintain diverse high-quality solutions
## References
- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone.
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction.
- Mouret, J. B., & Clune, J. (2015). Illuminating search spaces by mapping elites.
- arXiv:2405.00899 - Characterising Creative Process in Humans and LLMs