feat: Add experiments framework and novelty-driven agent loop
- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation
- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring
- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

This commit is contained in:

experiments/novelty_loop/README.md (Normal file, 253 lines added)

@@ -0,0 +1,253 @@

# Novelty-Driven LLM Agent Loop

An autonomous LLM agent that generates tasks in a while loop, using **novelty assessment as the termination condition** to help the agent "jump out" of its training data distribution (semantic gravity).

## Concept

Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.

This module takes a different approach: **novelty scores** dynamically control when the agent should stop. Rather than running for a fixed number of iterations, the agent continues until it finds something truly novel (a "breakthrough").

```
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
```

## Research Foundation

This work builds on established research:

- **Novelty Search** (Lehman & Stanley): Reward novelty, not objectives
- **Curiosity-driven Exploration** (Pathak et al.): Intrinsic motivation via prediction error
- **Quality-Diversity** (MAP-Elites): Maintain diverse high-quality solutions
- **Open-ended Learning**: Endless innovation through novelty pressure

The unique contribution is using **novelty as a termination condition** rather than just a reward signal.

## Architecture

```
┌────────────────────────────────────────────────────────────────────┐
│                Novelty-Driven Task Generation Loop                 │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│   ┌──────────┐                                                     │
│   │   Seed   │  "Design a better bicycle"                          │
│   │ Problem  │                                                     │
│   └────┬─────┘                                                     │
│        │                                                           │
│        ▼                                                           │
│   ┌─────────────────────────────────────────────────────────┐     │
│   │  WHILE novelty < threshold AND iterations < max:        │     │
│   │                                                         │     │
│   │   1. Sample random expert (curated occupations)         │     │
│   │      e.g., "marine biologist", "choreographer"          │     │
│   │                                                         │     │
│   │   2. Generate task from expert perspective              │     │
│   │      "What task would a {expert} assign to improve      │     │
│   │       {seed_problem}?"                                  │     │
│   │                                                         │     │
│   │   3. Embed task, compute novelty vs. centroid           │     │
│   │                                                         │     │
│   │   4. If novelty > threshold → STOP (breakthrough!)      │     │
│   │                                                         │     │
│   └─────────────────────────────────────────────────────────┘     │
│        │                                                           │
│        ▼                                                           │
│   ┌──────────┐                                                     │
│   │ Output:  │  Novel task that "jumped out" of typical space      │
│   │   Task   │  + trajectory of exploration                        │
│   └──────────┘                                                     │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```

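In code, the loop reduces to a handful of steps. The sketch below is illustrative only: `generate_task` and `embed` stand in for the agent's LLM and embedding calls, and seeding the reference set with the seed problem's embedding is an assumption of this sketch, not necessarily what `agent.py` does.

```python
import random

import numpy as np


def novelty_vs_centroid(embedding, history):
    """1 - cosine similarity between a new embedding and the centroid of prior embeddings."""
    centroid = np.mean(history, axis=0)
    sim = np.dot(embedding, centroid) / (
        np.linalg.norm(embedding) * np.linalg.norm(centroid) + 1e-12
    )
    return 1.0 - float(sim)


def novelty_loop(seed_problem, experts, generate_task, embed,
                 threshold=0.4, max_iterations=20):
    """generate_task(expert, seed) and embed(text) are assumed callables backed by the LLM/embedder."""
    history = [embed(seed_problem)]   # reference set, seeded with the problem itself (sketch assumption)
    trajectory = []
    for i in range(1, max_iterations + 1):
        expert = random.choice(experts)                  # 1. sample a random expert perspective
        task = generate_task(expert, seed_problem)       # 2. generate a task from that perspective
        emb = embed(task)                                # 3. embed the task and score novelty
        novelty = novelty_vs_centroid(emb, history)      #    against the centroid of prior outputs
        history.append(emb)
        trajectory.append({"iteration": i, "expert": expert, "task": task, "novelty": novelty})
        if novelty > threshold:                          # 4. breakthrough -> stop
            return task, trajectory
    return None, trajectory                              # budget exhausted without a breakthrough
```
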
## Installation

The module uses the existing project infrastructure. Ensure you have:

1. **Ollama** running with the required models:
   ```bash
   ollama pull qwen3:8b
   ollama pull qwen3-embedding:4b
   ```

2. **Python dependencies** (from project root):
   ```bash
   cd backend
   source venv/bin/activate
   pip install httpx numpy
   ```

## Quick Start

### Basic Usage

```bash
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
```

### Example Output

```
Iteration 1
  Expert: Architect (Architecture & Design)
  Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
  Novelty: [████████░░░░░░░░░░░░] 0.1234

Iteration 2
  Expert: Chef (Culinary)
  Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
  Novelty: [███████████░░░░░░░░░] 0.1823

Iteration 3
  Expert: Marine Biologist (Science)
  Task: Study fish schooling behavior to develop organic traffic flow algorithms
  Novelty: [██████████████░░░░░░] 0.3521

Iteration 4
  Expert: Choreographer (Performing Arts)
  Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
  Novelty: [████████████████████] 0.5234
  ★ BREAKTHROUGH! ★
```

## Termination Strategies

### 1. Seek Breakthrough (Default)

Stop when novelty exceeds the threshold. Finds the first truly novel task.

```bash
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
```

### 2. Exhaust Frontier

Continue while novelty remains high; stop once the average novelty falls below the exhaust threshold. Explores more thoroughly.

```bash
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
```

### 3. Coverage Target

Continue until N distinct conceptual clusters have been covered. Ensures diversity.

```bash
python demo.py "Your problem" --strategy coverage --clusters 5
```

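The three stopping rules can be sketched as simple predicates over the novelty and embedding history. This is a rough illustration: the averaging window and the greedy clustering below are assumptions, and the actual logic lives in `agent.py`.

```python
import numpy as np


def should_stop_breakthrough(novelties, threshold=0.4):
    """Stop as soon as the latest novelty score exceeds the threshold."""
    return bool(novelties) and novelties[-1] > threshold


def should_stop_exhaust(novelties, exhaust_threshold=0.15, window=3):
    """Stop once the average novelty over the last few iterations falls below the exhaust threshold."""
    if len(novelties) < window:
        return False
    return float(np.mean(novelties[-window:])) < exhaust_threshold


def should_stop_coverage(embeddings, n_clusters=5, similarity_threshold=0.7):
    """Stop once the generated tasks span at least n_clusters distinct conceptual clusters."""
    centroids = []
    for emb in embeddings:
        emb = np.asarray(emb) / (np.linalg.norm(emb) + 1e-12)
        # Greedy clustering: a task unlike every existing cluster opens a new one.
        if all(float(np.dot(emb, c)) < similarity_threshold for c in centroids):
            centroids.append(emb)
    return len(centroids) >= n_clusters
```
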
## API Usage

```python
import asyncio
from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent

async def main():
    agent = NoveltyDrivenTaskAgent(
        novelty_threshold=0.4,
        max_iterations=20,
        language="en"
    )

    result = await agent.run("Design a better bicycle")

    print(f"Found breakthrough: {result.breakthrough_task.task}")
    print(f"Novelty score: {result.breakthrough_task.novelty_score}")
    print(f"From expert: {result.breakthrough_task.expert}")

    await agent.close()

asyncio.run(main())
```

## Novelty Metrics

The `novelty_metrics.py` module provides:

- **Centroid Distance**: the primary novelty metric; distance from the average of all previous outputs
- **Min Distance**: distance to the nearest neighbor (detects duplicates)
- **Jump Detection**: identifies significant semantic shifts between consecutive outputs
- **Trajectory Tracking**: cumulative novelty, jump ratio, etc.

```python
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics

metrics = NoveltyMetrics(similarity_threshold=0.7)

# Add embeddings one by one
for embedding in embeddings:
    novelty = metrics.compute_novelty(embedding)
    metrics.add_embedding(embedding, novelty)
    print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")

# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
```

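For intuition, the min-distance and jump checks can be sketched with plain numpy as below (illustrative only; the exact definitions and thresholds used by `NoveltyMetrics` may differ).

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def min_distance(embedding, previous):
    """Distance to the nearest previous embedding; values near zero indicate a duplicate."""
    return 1.0 - max(cosine_similarity(embedding, p) for p in previous)


def is_jump(embedding, last_embedding, similarity_threshold=0.7):
    """Treat the new output as a 'jump' when it is dissimilar enough to the preceding one."""
    return cosine_similarity(embedding, last_embedding) < similarity_threshold
```
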
## CLI Options

```
positional arguments:
  seed_problem          The seed problem or challenge to explore

options:
  --strategy {breakthrough,exhaust,coverage}
                        Termination strategy (default: breakthrough)
  --threshold, -t       Novelty threshold for breakthrough (default: 0.4)
  --max-iter, -m        Maximum iterations (default: 20)
  --language, -l {en,zh}
                        Language for prompts and experts (default: en)
  --model               LLM model for task generation (default: qwen3:8b)
  --embedding-model     Embedding model (default: qwen3-embedding:4b)
  --temperature         LLM temperature (default: 0.7)
  --output, -o          Save results to JSON file
  --quiet, -q           Suppress iteration output
  --verbose, -v         Enable verbose logging
```

## File Structure

```
experiments/novelty_loop/
├── README.md            # This file
├── agent.py             # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py   # Novelty computation utilities
└── demo.py              # Interactive CLI demo
```

## Design Decisions

| Question | Decision | Rationale |
|----------|----------|-----------|
| Output Type | **Tasks** | Self-generated sub-goals for autonomous problem decomposition |
| Termination | **Seek Breakthrough** | Stop when novelty exceeds the threshold to find a truly novel task |
| Perturbation | **Expert Perspectives** | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | **Centroid** | Dynamic; adapts as exploration progresses |

## Connection to Main Project

This module integrates with the main novelty-seeking project:

- Uses the same **curated occupation data** (`backend/app/data/curated_occupations_*.json`)
- Uses the same **embedding model** (qwen3-embedding:4b)
- Builds on the **AUT flexibility analysis** metrics for novelty computation
- Can use **DDC domain data** for alternative perturbation strategies

## Future Work

1. **Hybrid Perturbation**: Combine expert + domain perspectives
2. **Contrastive Prompting**: Explicitly ask for outputs unlike recent ones
3. **Semantic Steering**: Guide generation away from the centroid direction
4. **Multi-Agent Exploration**: Parallel agents with different strategies
5. **Quality-Diversity Archive**: Maintain diverse high-quality solutions

## References

- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone. *Evolutionary Computation*, 19(2).
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction. *ICML*.
- Mouret, J.-B., & Clune, J. (2015). Illuminating search spaces by mapping elites. arXiv:1504.04909.
- Characterising Creative Process in Humans and LLMs. arXiv:2405.00899.