# Novelty-Driven LLM Agent Loop

An autonomous LLM agent that generates tasks in a while loop, using **novelty assessment as the termination condition** to help the agent "jump out" of its training data distribution (semantic gravity).
## Concept

Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.

This module implements a novel approach: use **novelty scores** to dynamically control when the agent should stop. Instead of a fixed iteration count, the agent continues until it finds something truly novel (a "breakthrough").

```
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
```
## Research Foundation

This work builds on established research:

- **Novelty Search** (Lehman & Stanley): Reward novelty, not objectives
- **Curiosity-driven Exploration** (Pathak et al.): Intrinsic motivation via prediction error
- **Quality-Diversity** (MAP-Elites): Maintain diverse high-quality solutions
- **Open-ended Learning**: Endless innovation through novelty pressure
The unique contribution is using **novelty as a termination condition** rather than just a reward signal.
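To make this concrete, the control flow can be sketched as below. This is a minimal illustration, not the module's actual code: `generate_task` and `novelty_of` are hypothetical stand-ins for the LLM call and the embedding-based novelty scorer.

```python
import random

# Illustrative expert pool; the real module samples from curated occupation data.
EXPERTS = ["marine biologist", "choreographer", "chef", "architect"]

def novelty_driven_loop(seed_problem, generate_task, novelty_of,
                        threshold=0.4, max_iterations=20):
    """Run until a task's novelty exceeds the threshold (a 'breakthrough')."""
    history = []
    for i in range(max_iterations):
        expert = random.choice(EXPERTS)              # 1. perturb via an expert perspective
        task = generate_task(seed_problem, expert)   # 2. LLM task generation
        novelty = novelty_of(task, history)          # 3. novelty vs. previous outputs
        history.append(task)
        if novelty > threshold:                      # 4. novelty as termination condition
            return {"task": task, "expert": expert,
                    "novelty": novelty, "iterations": i + 1}
    return {"task": None, "history": history}        # no breakthrough within budget
```

Note how the stopping rule is driven by the novelty signal itself rather than by a fixed iteration count; `max_iterations` is only a safety cap.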
## Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│ Novelty-Driven Task Generation Loop │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ │
│ │ Seed │ "Design a better bicycle" │
│ │ Problem │ │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ WHILE novelty < threshold AND iterations < max: │ │
│ │ │ │
│ │ 1. Sample random expert (curated occupations) │ │
│ │ e.g., "marine biologist", "choreographer" │ │
│ │ │ │
│ │ 2. Generate task from expert perspective │ │
│ │ "What task would a {expert} assign to improve │ │
│ │ {seed_problem}?" │ │
│ │ │ │
│ │ 3. Embed task, compute novelty vs. centroid │ │
│ │ │ │
│ │ 4. If novelty > threshold → STOP (breakthrough!) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Output: │ Novel task that "jumped out" of typical space │
│ │ Task │ + trajectory of exploration │
│ └──────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
```
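Step 3 of the loop computes novelty against the centroid of everything generated so far. The sketch below is an illustrative reconstruction under the assumption that novelty is the cosine distance from that centroid; it is not the module's exact implementation.

```python
import numpy as np

def centroid_novelty(new_embedding, previous_embeddings):
    """Cosine distance from the centroid of previous outputs (higher = more novel)."""
    if not previous_embeddings:
        return 1.0  # nothing to compare against yet: maximally novel by convention
    centroid = np.mean(previous_embeddings, axis=0)
    a = np.asarray(new_embedding, dtype=float)
    cosine_sim = float(a @ centroid / (np.linalg.norm(a) * np.linalg.norm(centroid)))
    return 1.0 - cosine_sim
```

Because the centroid shifts as new tasks are added, the novelty reference is dynamic: ideas that were novel early on pull the centroid toward them and stop counting as novel later.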
## Installation

The module uses the existing project infrastructure. Ensure you have:

1. **Ollama** running with the required models:

   ```bash
   ollama pull qwen3:8b
   ollama pull qwen3-embedding:4b
   ```

2. **Python dependencies** (from the project root):

   ```bash
   cd backend
   source venv/bin/activate
   pip install httpx numpy
   ```
## Quick Start

### Basic Usage

```bash
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
```
### Example Output

```
Iteration 1
Expert: Architect (Architecture & Design)
Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
Novelty: [████████░░░░░░░░░░░░] 0.1234

Iteration 2
Expert: Chef (Culinary)
Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
Novelty: [███████████░░░░░░░░░] 0.1823

Iteration 3
Expert: Marine Biologist (Science)
Task: Study fish schooling behavior to develop organic traffic flow algorithms
Novelty: [██████████████░░░░░░] 0.3521

Iteration 4
Expert: Choreographer (Performing Arts)
Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
Novelty: [████████████████████] 0.5234
★ BREAKTHROUGH! ★
```
## Termination Strategies

### 1. Seek Breakthrough (Default)

Stop when novelty exceeds the threshold. Finds the first truly novel task.

```bash
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
```
### 2. Exhaust Frontier

Continue while novelty stays high; stop when the average novelty drops. Explores more thoroughly.

```bash
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
```
### 3. Coverage Target

Continue until N distinct conceptual clusters are covered. Ensures diversity.

```bash
python demo.py "Your problem" --strategy coverage --clusters 5
```
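All three strategies share the same generation loop and differ only in when they stop. The predicates below are a hypothetical sketch of the three stop conditions over the novelty trajectory (the `window` size for the exhaust average is an assumed parameter, not taken from the module):

```python
def stop_breakthrough(novelties, threshold=0.4):
    """Stop as soon as the latest task exceeds the novelty threshold."""
    return bool(novelties) and novelties[-1] > threshold

def stop_exhaust(novelties, exhaust_threshold=0.15, window=3):
    """Stop once the recent average novelty falls below the exhaust threshold."""
    if len(novelties) < window:
        return False
    return sum(novelties[-window:]) / window < exhaust_threshold

def stop_coverage(cluster_ids, clusters=5):
    """Stop once tasks have landed in N distinct conceptual clusters."""
    return len(set(cluster_ids)) >= clusters
```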
## API Usage

```python
import asyncio
from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent

async def main():
    agent = NoveltyDrivenTaskAgent(
        novelty_threshold=0.4,
        max_iterations=20,
        language="en",
    )

    result = await agent.run("Design a better bicycle")

    print(f"Found breakthrough: {result.breakthrough_task.task}")
    print(f"Novelty score: {result.breakthrough_task.novelty_score}")
    print(f"From expert: {result.breakthrough_task.expert}")

    await agent.close()

asyncio.run(main())
```
## Novelty Metrics

The `novelty_metrics.py` module provides:

- **Centroid Distance**: Primary novelty metric; how far from the average of all previous outputs
- **Min Distance**: Distance to the nearest neighbor (detects duplicates)
- **Jump Detection**: Identifies significant semantic shifts between consecutive outputs
- **Trajectory Tracking**: Cumulative novelty, jump ratio, etc.

```python
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics

metrics = NoveltyMetrics(similarity_threshold=0.7)

# Add embeddings one by one
for embedding in embeddings:
    novelty = metrics.compute_novelty(embedding)
    metrics.add_embedding(embedding, novelty)
    print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")

# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
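Jump detection can be read as flagging a new output whose similarity to the immediately preceding one falls below `similarity_threshold`. The following is a minimal sketch under that assumption, using cosine similarity; it is not necessarily the module's exact rule:

```python
import numpy as np

def is_jump(prev_embedding, new_embedding, similarity_threshold=0.7):
    """Flag a semantic 'jump' when consecutive outputs are dissimilar enough."""
    a = np.asarray(prev_embedding, dtype=float)
    b = np.asarray(new_embedding, dtype=float)
    cosine_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cosine_sim < similarity_threshold
```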
## CLI Options

```
positional arguments:
  seed_problem          The seed problem or challenge to explore

options:
  --strategy {breakthrough,exhaust,coverage}
                        Termination strategy (default: breakthrough)
  --threshold, -t       Novelty threshold for breakthrough (default: 0.4)
  --max-iter, -m        Maximum iterations (default: 20)
  --language, -l {en,zh}
                        Language for prompts and experts (default: en)
  --model               LLM model for task generation (default: qwen3:8b)
  --embedding-model     Embedding model (default: qwen3-embedding:4b)
  --temperature         LLM temperature (default: 0.7)
  --output, -o          Save results to a JSON file
  --quiet, -q           Suppress iteration output
  --verbose, -v         Enable verbose logging
```
## File Structure

```
experiments/novelty_loop/
├── README.md            # This file
├── agent.py             # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py   # Novelty computation utilities
└── demo.py              # Interactive CLI demo
```
## Design Decisions

| Question | Decision | Rationale |
|----------|----------|-----------|
| Output Type | **Tasks** | Self-generated sub-goals for autonomous problem decomposition |
| Termination | **Seek Breakthrough** | Stop when novelty exceeds the threshold; find a truly novel task |
| Perturbation | **Expert Perspectives** | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | **Centroid** | Dynamic; adapts as exploration progresses |
## Connection to Main Project

This module integrates with the main novelty-seeking project:

- Uses the same **curated occupation data** (`backend/app/data/curated_occupations_*.json`)
- Uses the same **embedding model** (qwen3-embedding:4b)
- Builds on the **AUT flexibility analysis** metrics for novelty computation
- Can use the **DDC domain data** for alternative perturbation strategies
## Future Work

1. **Hybrid Perturbation**: Combine expert and domain perspectives
2. **Contrastive Prompting**: Explicitly ask for outputs unlike recent ones
3. **Semantic Steering**: Guide generation away from the centroid direction
4. **Multi-Agent Exploration**: Parallel agents with different strategies
5. **Quality-Diversity Archive**: Maintain diverse high-quality solutions
## References

- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone.
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction.
- Mouret, J.-B., & Clune, J. (2015). Illuminating search spaces by mapping elites.
- arXiv:2405.00899. Characterising Creative Process in Humans and LLMs.