Novelty-Driven LLM Agent Loop
An autonomous LLM agent that generates tasks in a while loop, using novelty assessment as the termination condition to help the agent "jump out" of its training data distribution (semantic gravity).
Concept
Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.
This module implements a novel approach: use novelty scores to dynamically control when the agent should stop. Instead of running for a fixed number of iterations, the agent continues until it finds something truly novel (a "breakthrough").
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
Research Foundation
This work builds on established research:
- Novelty Search (Lehman & Stanley): Reward novelty, not objectives
- Curiosity-driven Exploration (Pathak et al.): Intrinsic motivation via prediction error
- Quality-Diversity (MAP-Elites): Maintain diverse high-quality solutions
- Open-ended Learning: Endless innovation through novelty pressure
The unique contribution is using novelty as a termination condition rather than just a reward signal.
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ Novelty-Driven Task Generation Loop │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ │
│ │ Seed │ "Design a better bicycle" │
│ │ Problem │ │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ WHILE novelty < threshold AND iterations < max: │ │
│ │ │ │
│ │ 1. Sample random expert (curated occupations) │ │
│ │ e.g., "marine biologist", "choreographer" │ │
│ │ │ │
│ │ 2. Generate task from expert perspective │ │
│ │ "What task would a {expert} assign to improve │ │
│ │ {seed_problem}?" │ │
│ │ │ │
│ │ 3. Embed task, compute novelty vs. centroid │ │
│ │ │ │
│ │ 4. If novelty > threshold → STOP (breakthrough!) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Output: │ Novel task that "jumped out" of typical space │
│ │ Task │ + trajectory of exploration │
│ └──────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
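In code, the loop reduces to a short iteration over expert perspectives. The sketch below is a simplified illustration, not the actual agent: `sample_expert`, `generate_task`, and `embed` stand in for the occupation sampler, the LLM call, and the embedding call, and the novelty score is the centroid-distance metric described under Novelty Metrics.

```python
import numpy as np

def novelty_loop(seed_problem, sample_expert, generate_task, embed,
                 threshold=0.4, max_iterations=20):
    """Minimal sketch of the novelty-driven loop (illustrative, not the real agent)."""
    history = []      # embeddings of all previously generated tasks
    trajectory = []   # (expert, task, novelty) per iteration
    for _ in range(max_iterations):
        expert = sample_expert()                    # 1. random expert perspective
        task = generate_task(seed_problem, expert)  # 2. task from that perspective
        vec = np.asarray(embed(task))               # 3. embed the task text

        if history:
            centroid = np.mean(history, axis=0)
            # cosine distance from the centroid of everything generated so far
            novelty = 1.0 - float(np.dot(vec, centroid) /
                                  (np.linalg.norm(vec) * np.linalg.norm(centroid)))
        else:
            novelty = 0.0  # first output has no reference set yet

        history.append(vec)
        trajectory.append((expert, task, novelty))

        if novelty > threshold:                     # 4. breakthrough -> stop
            return task, trajectory
    return None, trajectory  # budget exhausted without a breakthrough
```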
Installation
The module uses the existing project infrastructure. Ensure you have:
- Ollama running with the required models:
  ollama pull qwen3:8b
  ollama pull qwen3-embedding:4b
- Python dependencies (from project root):
  cd backend
  source venv/bin/activate
  pip install httpx numpy
Quick Start
Basic Usage
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
Example Output
Iteration 1
Expert: Architect (Architecture & Design)
Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
Novelty: [████████░░░░░░░░░░░░] 0.1234
Iteration 2
Expert: Chef (Culinary)
Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
Novelty: [███████████░░░░░░░░░] 0.1823
Iteration 3
Expert: Marine Biologist (Science)
Task: Study fish schooling behavior to develop organic traffic flow algorithms
Novelty: [██████████████░░░░░░] 0.3521
Iteration 4
Expert: Choreographer (Performing Arts)
Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
Novelty: [████████████████████] 0.5234
★ BREAKTHROUGH! ★
Termination Strategies
1. Seek Breakthrough (Default)
Stop when novelty exceeds threshold. Finds the first truly novel task.
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
2. Exhaust Frontier
Continue while novelty is high, stop when average novelty drops. Explores more thoroughly.
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
3. Coverage Target
Continue until N distinct conceptual clusters are covered. Ensures diversity.
python demo.py "Your problem" --strategy coverage --clusters 5
API Usage
import asyncio
from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent

async def main():
    agent = NoveltyDrivenTaskAgent(
        novelty_threshold=0.4,
        max_iterations=20,
        language="en"
    )
    result = await agent.run("Design a better bicycle")
    print(f"Found breakthrough: {result.breakthrough_task.task}")
    print(f"Novelty score: {result.breakthrough_task.novelty_score}")
    print(f"From expert: {result.breakthrough_task.expert}")
    await agent.close()

asyncio.run(main())
Novelty Metrics
The novelty_metrics.py module provides:
- Centroid Distance: Primary novelty metric - how far from the average of all previous outputs
- Min Distance: Distance to nearest neighbor (detect duplicates)
- Jump Detection: Identifies significant semantic shifts between consecutive outputs
- Trajectory Tracking: Cumulative novelty, jump ratio, etc.
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics

metrics = NoveltyMetrics(similarity_threshold=0.7)

# Add embeddings one by one
for embedding in embeddings:
    novelty = metrics.compute_novelty(embedding)
    metrics.add_embedding(embedding, novelty)
    print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")

# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
CLI Options
positional arguments:
seed_problem The seed problem or challenge to explore
options:
--strategy {breakthrough,exhaust,coverage}
Termination strategy (default: breakthrough)
--threshold, -t Novelty threshold for breakthrough (default: 0.4)
--max-iter, -m Maximum iterations (default: 20)
--language, -l {en,zh}
Language for prompts and experts (default: en)
--model LLM model for task generation (default: qwen3:8b)
--embedding-model Embedding model (default: qwen3-embedding:4b)
--temperature LLM temperature (default: 0.7)
--output, -o Save results to JSON file
--quiet, -q Suppress iteration output
--verbose, -v Enable verbose logging
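Options can be combined freely. For example, a longer coverage-driven run saved to disk (the output filename is just an example):
python demo.py "Improve urban transportation" --strategy coverage --clusters 5 --max-iter 30 --output coverage_run.json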
File Structure
experiments/novelty_loop/
├── README.md # This file
├── agent.py # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py # Novelty computation utilities
└── demo.py # Interactive CLI demo
Design Decisions
| Question | Decision | Rationale |
|---|---|---|
| Output Type | Tasks | Self-generated sub-goals for autonomous problem decomposition |
| Termination | Seek Breakthrough | Stop when novelty exceeds threshold - find truly novel task |
| Perturbation | Expert Perspectives | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | Centroid | Dynamic, adapts as exploration progresses |
Connection to Main Project
This module integrates with the main novelty-seeking project:
- Uses the same curated occupation data (backend/app/data/curated_occupations_*.json)
- Uses the same embedding model (qwen3-embedding:4b)
- Builds on the AUT flexibility analysis metrics for novelty computation
- Can use DDC domain data for alternative perturbation strategies
Future Work
- Hybrid Perturbation: Combine expert + domain perspectives
- Contrastive Prompting: Explicitly ask for outputs unlike recent ones
- Semantic Steering: Guide generation away from centroid direction
- Multi-Agent Exploration: Parallel agents with different strategies
- Quality-Diversity Archive: Maintain diverse high-quality solutions
References
- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone.
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction.
- Mouret, J. B., & Clune, J. (2015). Illuminating search spaces by mapping elites.
- Characterising the Creative Process in Humans and Large Language Models (arXiv:2405.00899).