feat: Add experiments framework and novelty-driven agent loop

- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation

- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring

- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 10:16:21 +08:00
parent 26a56a2a07
commit 43c025e060
81 changed files with 18766 additions and 2 deletions


@@ -0,0 +1,253 @@
# Novelty-Driven LLM Agent Loop
An autonomous LLM agent that generates tasks in a loop and uses **novelty assessment as the termination condition**, helping it "jump out" of the high-probability regions of its training distribution (its "semantic gravity").
## Concept
Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.
This module implements a novel approach: use **novelty scores** to decide when the agent should stop. Instead of running a fixed number of iterations, the agent continues until a task's novelty score exceeds a threshold (a "breakthrough") or the iteration cap is reached.
```
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
```
## Research Foundation
This work builds on established research:
- **Novelty Search** (Lehman & Stanley): Reward novelty, not objectives
- **Curiosity-driven Exploration** (Pathak et al.): Intrinsic motivation via prediction error
- **Quality-Diversity** (MAP-Elites): Maintain diverse high-quality solutions
- **Open-ended Learning**: Endless innovation through novelty pressure
The unique contribution is using **novelty as a termination condition** rather than just a reward signal.
## Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│               Novelty-Driven Task Generation Loop                │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌──────────┐                                                   │
│   │   Seed   │  "Design a better bicycle"                        │
│   │ Problem  │                                                   │
│   └────┬─────┘                                                   │
│        │                                                         │
│        ▼                                                         │
│   ┌─────────────────────────────────────────────────────────┐    │
│   │  WHILE novelty < threshold AND iterations < max:        │    │
│   │                                                         │    │
│   │  1. Sample random expert (curated occupations)          │    │
│   │     e.g., "marine biologist", "choreographer"           │    │
│   │                                                         │    │
│   │  2. Generate task from expert perspective               │    │
│   │     "What task would a {expert} assign to improve       │    │
│   │      {seed_problem}?"                                   │    │
│   │                                                         │    │
│   │  3. Embed task, compute novelty vs. centroid            │    │
│   │                                                         │    │
│   │  4. If novelty > threshold → STOP (breakthrough!)       │    │
│   │                                                         │    │
│   └─────────────────────────────────────────────────────────┘    │
│        │                                                         │
│        ▼                                                         │
│   ┌──────────┐                                                   │
│   │ Output:  │  Novel task that "jumped out" of typical space    │
│   │   Task   │  + trajectory of exploration                      │
│   └──────────┘                                                   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```
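The loop in the diagram can be sketched in a few self-contained lines. This is only an illustration under stated assumptions, not the code in `agent.py`: `fake_embed` is a hypothetical stand-in for the real embedding call, the candidate tasks are plain strings instead of LLM output, novelty is assumed to be cosine distance from the centroid of earlier embeddings, and the first task is assumed to score 0.0 because it has nothing to be compared against.

```python
import hashlib

import numpy as np


def fake_embed(text: str, dim: int = 16) -> np.ndarray:
    """Hypothetical embedder: a deterministic pseudo-random unit vector per text."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")
    vec = np.random.default_rng(seed).normal(size=dim)
    return vec / np.linalg.norm(vec)


def centroid_novelty(embedding: np.ndarray, history: list) -> float:
    """Assumed novelty: cosine distance to the mean of all previous embeddings."""
    if not history:
        return 0.0  # assumption: no reference point yet, so no novelty
    centroid = np.mean(history, axis=0)
    denom = np.linalg.norm(embedding) * np.linalg.norm(centroid)
    if denom == 0:
        return 0.0
    return 1.0 - float(np.dot(embedding, centroid) / denom)


def novelty_loop(candidate_tasks, threshold: float, max_iterations: int):
    """Stop at the first task whose novelty exceeds the threshold."""
    history = []
    trajectory = []
    for iteration, task in enumerate(candidate_tasks[:max_iterations], start=1):
        embedding = fake_embed(task)
        score = centroid_novelty(embedding, history)
        history.append(embedding)
        trajectory.append((iteration, task, score))
        if score > threshold:
            return "breakthrough", trajectory
    return "max_iterations", trajectory
```

Novelty here decides when to stop rather than what to reward: the loop has no objective other than leaving the region it has already covered.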
## Installation
The module uses the existing project infrastructure. Ensure you have:
1. **Ollama** running with the required models:
   ```bash
   ollama pull qwen3:8b
   ollama pull qwen3-embedding:4b
   ```
2. **Python dependencies** (from project root):
   ```bash
   cd backend
   source venv/bin/activate
   pip install httpx numpy
   ```
## Quick Start
### Basic Usage
```bash
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
```
### Example Output
```
Iteration 1
  Expert: Architect (Architecture & Design)
  Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
  Novelty: [██░░░░░░░░░░░░░░░░░░] 0.1234

Iteration 2
  Expert: Chef (Culinary)
  Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
  Novelty: [███░░░░░░░░░░░░░░░░░] 0.1823

Iteration 3
  Expert: Marine Biologist (Science)
  Task: Study fish schooling behavior to develop organic traffic flow algorithms
  Novelty: [███████░░░░░░░░░░░░░] 0.3521

Iteration 4
  Expert: Choreographer (Performing Arts)
  Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
  Novelty: [██████████░░░░░░░░░░] 0.5234
  ★ BREAKTHROUGH! ★
```
## Termination Strategies
### 1. Seek Breakthrough (Default)
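Stop as soon as a task's novelty exceeds the threshold, returning the first truly novel task. If the iteration cap is reached first, the most novel task seen so far is returned as a fallback.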
```bash
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
```
### 2. Exhaust Frontier
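Keep generating while novelty stays high, and stop once the average novelty of the most recent iterations drops below the exhaustion threshold. Explores more thoroughly than the breakthrough strategy.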
```bash
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
```
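The exhaustion check can be sketched as a rolling-window average over the novelty history. The function name `is_exhausted` is hypothetical; the defaults (threshold 0.15, window of 3, at least 5 iterations before checking) mirror the `ExhaustFrontierAgent` constructor in `agent.py`, and the minimum-iteration guard follows that class's 0-indexed comparison.

```python
from statistics import fmean


def is_exhausted(
    novelty_history,
    exhaustion_threshold: float = 0.15,
    window_size: int = 3,
    min_iterations: int = 5,
) -> bool:
    """Return True when the recent average novelty has dropped below the threshold."""
    # The check only starts once more than `min_iterations` scores exist,
    # matching `iteration >= min_iterations` with a 0-indexed iteration counter.
    if len(novelty_history) <= min_iterations:
        return False
    recent = novelty_history[-window_size:]
    return fmean(recent) < exhaustion_threshold
```

Averaging over a window rather than testing a single score keeps one dull task from ending a run that is otherwise still finding new ground.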
### 3. Coverage Target
Continue until N distinct conceptual clusters are covered. Ensures diversity.
```bash
python demo.py "Your problem" --strategy coverage --clusters 5
```
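Counting "distinct conceptual clusters" can be sketched with a greedy pass over the embeddings, in the spirit of `CoverageTargetAgent._count_clusters` in `agent.py`. The helper names below are illustrative, and the 0.7 similarity threshold is that class's default; each new embedding either joins the first cluster whose representative it resembles or becomes the representative of a new cluster.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0


def count_clusters(embeddings, cluster_threshold: float = 0.7) -> int:
    """Greedy clustering: the first member of each cluster acts as its representative."""
    representatives = []
    for emb in embeddings:
        matches_existing = any(
            cosine_similarity(emb, rep) >= cluster_threshold
            for rep in representatives
        )
        if not matches_existing:
            representatives.append(emb)
    return len(representatives)
```

The result depends on the order in which embeddings arrive, which is acceptable for a stopping rule but would be a poor fit for a final clustering analysis.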
## API Usage
```python
import asyncio

from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent


async def main():
    agent = NoveltyDrivenTaskAgent(
        novelty_threshold=0.4,
        max_iterations=20,
        language="en"
    )
    result = await agent.run("Design a better bicycle")
    print(f"Found breakthrough: {result.breakthrough_task.task}")
    print(f"Novelty score: {result.breakthrough_task.novelty_score}")
    print(f"From expert: {result.breakthrough_task.expert}")
    await agent.close()


asyncio.run(main())
```
## Novelty Metrics
The `novelty_metrics.py` module provides:
- **Centroid Distance**: Primary novelty metric - how far from the average of all previous outputs
- **Min Distance**: Distance to nearest neighbor (detect duplicates)
- **Jump Detection**: Identifies significant semantic shifts between consecutive outputs
- **Trajectory Tracking**: Cumulative novelty, jump ratio, etc.
```python
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics

metrics = NoveltyMetrics(similarity_threshold=0.7)

# Add embeddings one by one (`embeddings` is any iterable of embedding vectors)
for embedding in embeddings:
    novelty = metrics.compute_novelty(embedding)
    metrics.add_embedding(embedding, novelty)
    print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")

# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
```
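The three per-embedding metrics listed above can be sketched with NumPy. The internals of `novelty_metrics.py` are not reproduced here, so everything below is an assumption: distances are taken as `1 - cosine similarity`, a "jump" is assumed to mean that similarity to the previous embedding falls below `similarity_threshold`, and `score_embedding` is a hypothetical name.

```python
import numpy as np


def _cos(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0


def score_embedding(embedding, history, similarity_threshold: float = 0.7):
    """Return (centroid_distance, min_distance, is_jump) for one new embedding."""
    if not history:
        # Assumption: the first embedding has no reference, so all metrics are neutral.
        return 0.0, 0.0, False
    centroid = np.mean(history, axis=0)
    centroid_distance = 1.0 - _cos(embedding, centroid)            # distance from the average
    min_distance = min(1.0 - _cos(embedding, h) for h in history)  # nearest neighbor
    is_jump = _cos(embedding, history[-1]) < similarity_threshold   # shift vs. previous output
    return centroid_distance, min_distance, is_jump
```

A near-zero `min_distance` flags a duplicate even when the centroid distance looks healthy, which is why the two metrics are reported separately.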
## CLI Options
```
positional arguments:
  seed_problem          The seed problem or challenge to explore

options:
  --strategy {breakthrough,exhaust,coverage}
                        Termination strategy (default: breakthrough)
  --threshold, -t       Novelty threshold for breakthrough (default: 0.4)
  --max-iter, -m        Maximum iterations (default: 20)
  --language, -l {en,zh}
                        Language for prompts and experts (default: en)
  --model               LLM model for task generation (default: qwen3:8b)
  --embedding-model     Embedding model (default: qwen3-embedding:4b)
  --temperature         LLM temperature (default: 0.7)
  --output, -o          Save results to JSON file
  --quiet, -q           Suppress iteration output
  --verbose, -v         Enable verbose logging
```
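For readers who want to script against the same interface, the options above map naturally onto `argparse`. The sketch below is an assumption about how such a parser could look, not a copy of `demo.py`: `build_parser` is a hypothetical name, and `--exhaust-threshold` and `--clusters` are included only because the strategy examples use them, with defaults borrowed from the agent classes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI options."""
    parser = argparse.ArgumentParser(description="Novelty-driven task generation demo")
    parser.add_argument("seed_problem", help="The seed problem or challenge to explore")
    parser.add_argument("--strategy", choices=["breakthrough", "exhaust", "coverage"],
                        default="breakthrough", help="Termination strategy")
    parser.add_argument("--threshold", "-t", type=float, default=0.4,
                        help="Novelty threshold for breakthrough")
    parser.add_argument("--max-iter", "-m", type=int, default=20,
                        help="Maximum iterations")
    parser.add_argument("--language", "-l", choices=["en", "zh"], default="en",
                        help="Language for prompts and experts")
    parser.add_argument("--model", default="qwen3:8b",
                        help="LLM model for task generation")
    parser.add_argument("--embedding-model", default="qwen3-embedding:4b",
                        help="Embedding model")
    parser.add_argument("--temperature", type=float, default=0.7,
                        help="LLM temperature")
    # Assumed options, taken from the strategy examples rather than the list above.
    parser.add_argument("--exhaust-threshold", type=float, default=0.15,
                        help="Average novelty below which the exhaust strategy stops")
    parser.add_argument("--clusters", type=int, default=5,
                        help="Target cluster count for the coverage strategy")
    parser.add_argument("--output", "-o", help="Save results to JSON file")
    parser.add_argument("--quiet", "-q", action="store_true",
                        help="Suppress iteration output")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable verbose logging")
    return parser
```

Calling `build_parser().parse_args(["Improve urban transportation", "-t", "0.35"])` yields a namespace whose `max_iter` and `embedding_model` attributes use underscores, following the usual `argparse` conversion of dashed option names.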
## File Structure
```
experiments/novelty_loop/
├── README.md # This file
├── agent.py # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py # Novelty computation utilities
└── demo.py # Interactive CLI demo
```
## Design Decisions
| Question | Decision | Rationale |
|----------|----------|-----------|
| Output Type | **Tasks** | Self-generated sub-goals for autonomous problem decomposition |
| Termination | **Seek Breakthrough** | Stop when novelty exceeds threshold - find truly novel task |
| Perturbation | **Expert Perspectives** | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | **Centroid** | Dynamic, adapts as exploration progresses |
## Connection to Main Project
This module integrates with the main novelty-seeking project:
- Uses the same **curated occupation data** (`backend/app/data/curated_occupations_*.json`)
- Uses the same **embedding model** (qwen3-embedding:4b)
- Builds on the **AUT flexibility analysis** metrics for novelty computation
- Can use **DDC domain data** for alternative perturbation strategies
## Future Work
1. **Hybrid Perturbation**: Combine expert + domain perspectives
2. **Contrastive Prompting**: Explicitly ask for outputs unlike recent ones
3. **Semantic Steering**: Guide generation away from centroid direction
4. **Multi-Agent Exploration**: Parallel agents with different strategies
5. **Quality-Diversity Archive**: Maintain diverse high-quality solutions
## References
- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone.
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction.
- Mouret, J. B., & Clune, J. (2015). Illuminating search spaces by mapping elites.
- arXiv:2405.00899 - Characterising Creative Process in Humans and LLMs


@@ -0,0 +1,42 @@
"""
Novelty-Driven LLM Agent Loop
An autonomous agent that generates tasks using novelty as the termination condition.
"""
from .agent import (
NoveltyDrivenTaskAgent,
ExhaustFrontierAgent,
CoverageTargetAgent,
GeneratedTask,
TaskGenerationResult,
ExpertProvider,
DomainProvider,
)
from .novelty_metrics import (
NoveltyMetrics,
NoveltyScore,
NoveltyTrajectory,
compute_batch_novelty,
find_most_novel,
)
__all__ = [
# Agents
"NoveltyDrivenTaskAgent",
"ExhaustFrontierAgent",
"CoverageTargetAgent",
# Data classes
"GeneratedTask",
"TaskGenerationResult",
"NoveltyScore",
"NoveltyTrajectory",
# Providers
"ExpertProvider",
"DomainProvider",
# Metrics
"NoveltyMetrics",
"compute_batch_novelty",
"find_most_novel",
]


@@ -0,0 +1,725 @@
"""
Novelty-Driven Task Agent - An autonomous agent that generates tasks using novelty as termination condition.
This agent operates in a while loop, generating tasks from diverse expert perspectives,
and terminates when it finds a task that exceeds the novelty threshold (a "breakthrough").
The core innovation is using novelty assessment to help the agent "jump out" of its
trained data distribution (semantic gravity), finding truly novel ideas.
Architecture:
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
Termination Strategy: "Seek Breakthrough"
- Continue until novelty > threshold
- Find the first truly novel task and stop
Research Foundation:
- Novelty Search (Lehman & Stanley): Reward novelty, not objectives
- Curiosity-driven Exploration (Pathak et al.): Intrinsic motivation via prediction error
- Quality-Diversity (MAP-Elites): Maintain diverse high-quality solutions
"""
import asyncio
import json
import logging
import random
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, List, Optional
import httpx
import numpy as np
from .novelty_metrics import NoveltyMetrics, NoveltyScore, NoveltyTrajectory
logger = logging.getLogger(__name__)
# ============================================================================
# Data Classes
# ============================================================================
@dataclass
class GeneratedTask:
"""A single generated task with metadata."""
task: str
expert: str
expert_domain: str
novelty_score: float
iteration: int
is_breakthrough: bool = False
embedding: Optional[np.ndarray] = None
@dataclass
class TaskGenerationResult:
"""Result of a complete novelty-driven task generation session."""
seed_problem: str
breakthrough_task: Optional[GeneratedTask] = None
trajectory: List[GeneratedTask] = field(default_factory=list)
total_iterations: int = 0
terminated_by: str = "unknown" # "breakthrough", "max_iterations", "error"
novelty_trajectory: Optional[NoveltyTrajectory] = None
start_time: Optional[str] = None
end_time: Optional[str] = None
config: dict = field(default_factory=dict)
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"seed_problem": self.seed_problem,
"breakthrough_task": {
"task": self.breakthrough_task.task,
"expert": self.breakthrough_task.expert,
"expert_domain": self.breakthrough_task.expert_domain,
"novelty_score": self.breakthrough_task.novelty_score,
"iteration": self.breakthrough_task.iteration
} if self.breakthrough_task else None,
"trajectory": [
{
"task": t.task,
"expert": t.expert,
"expert_domain": t.expert_domain,
"novelty_score": t.novelty_score,
"iteration": t.iteration,
"is_breakthrough": t.is_breakthrough
}
for t in self.trajectory
],
"total_iterations": self.total_iterations,
"terminated_by": self.terminated_by,
"novelty_stats": {
"mean_novelty": self.novelty_trajectory.mean_novelty if self.novelty_trajectory else 0,
"max_novelty": self.novelty_trajectory.max_novelty if self.novelty_trajectory else 0,
"jump_ratio": self.novelty_trajectory.jump_ratio if self.novelty_trajectory else 0,
"cumulative_novelty": self.novelty_trajectory.final_cumulative_novelty if self.novelty_trajectory else 0
},
"start_time": self.start_time,
"end_time": self.end_time,
"config": self.config
}
# ============================================================================
# Expert/Domain Providers
# ============================================================================
class ExpertProvider:
"""Provides random experts from curated occupation lists."""
def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
"""
Args:
data_dir: Path to data directory containing occupation JSON files
language: Language code ("en" or "zh")
"""
if data_dir is None:
# Default to backend data directory
data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"
self.data_dir = data_dir
self.language = language
self._occupations: List[dict] = []
self._load_occupations()
def _load_occupations(self):
"""Load occupations from JSON file."""
file_path = self.data_dir / f"curated_occupations_{self.language}.json"
if not file_path.exists():
logger.warning(f"Occupation file not found: {file_path}")
# Fallback to some default experts
self._occupations = [
{"name": "Marine Biologist", "domain": "Science"},
{"name": "Choreographer", "domain": "Arts"},
{"name": "Urban Planner", "domain": "Architecture"},
{"name": "Chef", "domain": "Culinary"},
{"name": "Astronomer", "domain": "Science"},
]
return
try:
with open(file_path, "r", encoding="utf-8") as f:
data = json.load(f)
self._occupations = data.get("occupations", [])
logger.info(f"Loaded {len(self._occupations)} occupations from {file_path.name}")
except Exception as e:
logger.error(f"Error loading occupations: {e}")
self._occupations = []
def get_random_expert(self) -> dict:
"""Get a random expert with name and domain."""
if not self._occupations:
return {"name": "Expert", "domain": "General"}
return random.choice(self._occupations)
def get_random_experts(self, count: int) -> List[dict]:
"""Get multiple random experts without replacement."""
if len(self._occupations) <= count:
return self._occupations.copy()
return random.sample(self._occupations, count)
class DomainProvider:
"""Provides random knowledge domains from DDC classification."""
def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
if data_dir is None:
data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"
self.data_dir = data_dir
self.language = language
self._domains: List[dict] = []
self._load_domains()
def _load_domains(self):
"""Load domains from JSON file."""
file_path = self.data_dir / f"ddc_domains_{self.language}.json"
if not file_path.exists():
logger.warning(f"Domain file not found: {file_path}")
self._domains = []
return
try:
with open(file_path, "r", encoding="utf-8") as f:
data = json.load(f)
self._domains = data.get("domains", [])
logger.info(f"Loaded {len(self._domains)} domains from {file_path.name}")
except Exception as e:
logger.error(f"Error loading domains: {e}")
def get_random_domain(self, level: Optional[str] = None) -> dict:
"""Get a random domain, optionally filtered by level."""
domains = self._domains
if level:
domains = [d for d in domains if d.get("level") == level]
if not domains:
return {"name": "General Knowledge", "code": "000"}
return random.choice(domains)
# ============================================================================
# Novelty-Driven Task Agent
# ============================================================================
class NoveltyDrivenTaskAgent:
"""
An autonomous agent that generates tasks using novelty as the termination condition.
The agent operates in a loop:
1. Sample a random expert perspective
2. Generate a task from that expert's viewpoint
3. Compute the task's novelty (distance from centroid of previous tasks)
4. If novelty > threshold → STOP (found breakthrough!)
5. Otherwise → Continue with next expert
Example:
agent = NoveltyDrivenTaskAgent(novelty_threshold=0.4)
result = await agent.run("Improve urban transportation")
# result.breakthrough_task contains the novel task found
# result.trajectory shows the exploration path
"""
def __init__(
self,
novelty_threshold: float = 0.4,
max_iterations: int = 20,
ollama_base_url: str = "http://localhost:11435",
llm_model: str = "qwen3:8b",
embedding_model: str = "qwen3-embedding:4b",
language: str = "en",
data_dir: Optional[Path] = None,
on_iteration: Optional[Callable[[GeneratedTask], None]] = None,
temperature: float = 0.7
):
"""
Args:
novelty_threshold: Novelty score threshold for breakthrough (0.0-1.0)
max_iterations: Maximum iterations before stopping
ollama_base_url: Ollama API endpoint
llm_model: Model for task generation
embedding_model: Model for embeddings
language: Language for prompts and experts ("en" or "zh")
data_dir: Path to data directory for expert/domain files
on_iteration: Callback function called after each iteration
temperature: LLM temperature for generation
"""
self.novelty_threshold = novelty_threshold
self.max_iterations = max_iterations
self.ollama_base_url = ollama_base_url
self.llm_model = llm_model
self.embedding_model = embedding_model
self.language = language
self.temperature = temperature
self.on_iteration = on_iteration
# Initialize providers
self.expert_provider = ExpertProvider(data_dir, language)
self.domain_provider = DomainProvider(data_dir, language)
# Initialize novelty metrics
self.novelty_metrics = NoveltyMetrics(
similarity_threshold=0.7,
jump_detection_enabled=True
)
# HTTP client
self._client: Optional[httpx.AsyncClient] = None
async def _get_client(self) -> httpx.AsyncClient:
"""Get or create HTTP client."""
if self._client is None:
self._client = httpx.AsyncClient(timeout=120.0)
return self._client
async def close(self):
"""Close HTTP client."""
if self._client is not None:
await self._client.aclose()
self._client = None
async def _generate_text(self, prompt: str) -> str:
"""Generate text using Ollama LLM."""
client = await self._get_client()
url = f"{self.ollama_base_url}/api/generate"
# Add /no_think prefix for qwen models to disable thinking
if self.llm_model.lower().startswith("qwen"):
prompt = f"/no_think\n{prompt}"
try:
response = await client.post(url, json={
"model": self.llm_model,
"prompt": prompt,
"stream": False,
"options": {
"temperature": self.temperature
}
})
response.raise_for_status()
result = response.json()
return result.get("response", "").strip()
except Exception as e:
logger.error(f"LLM generation error: {e}")
raise
async def _get_embedding(self, text: str) -> np.ndarray:
"""Get embedding vector for text."""
client = await self._get_client()
url = f"{self.ollama_base_url}/api/embed"
try:
response = await client.post(url, json={
"model": self.embedding_model,
"input": text
})
response.raise_for_status()
result = response.json()
return np.array(result["embeddings"][0])
except Exception as e:
logger.error(f"Embedding error: {e}")
raise
def _build_task_prompt(
self,
seed_problem: str,
expert: dict,
previous_tasks: List[str]
) -> str:
"""Build the prompt for task generation."""
expert_name = expert.get("name", "Expert")
expert_domain = expert.get("domain", "General")
# Build context from previous tasks (if any)
context = ""
if previous_tasks:
recent = previous_tasks[-3:] # Last 3 tasks
context = "\n\nPrevious suggestions (generate something DIFFERENT):\n"
for t in recent:
context += f"- {t}\n"
if self.language == "zh":
prompt = f"""你是一位 {expert_name}({expert_domain})。
给定问题:{seed_problem}
请从你的专业角度出发,提出一个独特的改进任务或探索方向。
这个任务应该结合你的专业知识,提供一个非传统但有价值的视角。
{context}
请直接给出任务描述,不要添加解释。任务应该具体、可行、且与众不同。
任务:"""
else:
prompt = f"""You are a {expert_name} ({expert_domain}).
Given problem: {seed_problem}
From your professional perspective, propose a unique task or exploration direction to improve or innovate on this problem.
The task should leverage your domain expertise to provide an unconventional but valuable angle.
{context}
Provide just the task description without explanation. The task should be specific, actionable, and distinctive.
Task:"""
return prompt
async def _generate_task(
self,
seed_problem: str,
expert: dict,
previous_tasks: List[str]
) -> str:
"""Generate a task from an expert's perspective."""
prompt = self._build_task_prompt(seed_problem, expert, previous_tasks)
task = await self._generate_text(prompt)
# Clean up the response
task = task.strip()
# Remove common prefixes
for prefix in ["Task:", "任务:", "Here's", "I suggest", "Based on"]:
if task.lower().startswith(prefix.lower()):
task = task[len(prefix):].strip()
return task
async def run(
self,
seed_problem: str,
used_experts: Optional[List[dict]] = None
) -> TaskGenerationResult:
"""
Run the novelty-driven task generation loop.
Args:
seed_problem: The initial problem/challenge to explore
used_experts: Optional list of experts to avoid (for multi-run scenarios)
Returns:
TaskGenerationResult with breakthrough task (if found) and full trajectory
"""
# Reset state
self.novelty_metrics.reset()
result = TaskGenerationResult(
seed_problem=seed_problem,
start_time=datetime.now(timezone.utc).isoformat(),
config={
"novelty_threshold": self.novelty_threshold,
"max_iterations": self.max_iterations,
"llm_model": self.llm_model,
"embedding_model": self.embedding_model,
"language": self.language
}
)
used_expert_names = set()
if used_experts:
used_expert_names = {e["name"] for e in used_experts}
previous_tasks: List[str] = []
logger.info(f"Starting novelty loop: '{seed_problem}' (threshold={self.novelty_threshold})")
try:
for iteration in range(self.max_iterations):
# 1. Sample a random expert (avoid duplicates)
attempts = 0
expert = self.expert_provider.get_random_expert()
while expert["name"] in used_expert_names and attempts < 10:
expert = self.expert_provider.get_random_expert()
attempts += 1
used_expert_names.add(expert["name"])
logger.info(f"Iteration {iteration + 1}: Expert = {expert['name']} ({expert['domain']})")
# 2. Generate task
task = await self._generate_task(seed_problem, expert, previous_tasks)
previous_tasks.append(task)
# 3. Get embedding
embedding = await self._get_embedding(task)
# 4. Compute novelty
novelty = self.novelty_metrics.compute_novelty(embedding)
self.novelty_metrics.add_embedding(embedding, novelty)
# 5. Create task record
generated_task = GeneratedTask(
task=task,
expert=expert["name"],
expert_domain=expert["domain"],
novelty_score=novelty.score,
iteration=iteration + 1,
is_breakthrough=novelty.score > self.novelty_threshold,
embedding=embedding
)
result.trajectory.append(generated_task)
logger.info(f" Task: {task[:80]}...")
logger.info(f" Novelty: {novelty.score:.4f} (threshold: {self.novelty_threshold})")
# Callback
if self.on_iteration:
self.on_iteration(generated_task)
# 6. Check for breakthrough
if novelty.score > self.novelty_threshold:
result.breakthrough_task = generated_task
result.terminated_by = "breakthrough"
result.total_iterations = iteration + 1
logger.info(f" BREAKTHROUGH! Stopping after {iteration + 1} iterations")
break
else:
# Max iterations reached without breakthrough
result.terminated_by = "max_iterations"
result.total_iterations = self.max_iterations
logger.info(f"Max iterations ({self.max_iterations}) reached without breakthrough")
# Find the most novel task as a fallback
if result.trajectory:
best_task = max(result.trajectory, key=lambda t: t.novelty_score)
best_task.is_breakthrough = True # Mark as best found
result.breakthrough_task = best_task
except Exception as e:
logger.error(f"Error during generation: {e}")
result.terminated_by = f"error: {str(e)}"
result.total_iterations = len(result.trajectory)
# Finalize
result.end_time = datetime.now(timezone.utc).isoformat()
result.novelty_trajectory = self.novelty_metrics.trajectory
return result
# ============================================================================
# Alternative Termination Strategies
# ============================================================================
class ExhaustFrontierAgent(NoveltyDrivenTaskAgent):
"""
Alternative strategy: Continue while novelty is high, stop when it drops.
This explores the "novelty frontier" more thoroughly, finding multiple novel
ideas before stopping when exploration becomes repetitive.
"""
def __init__(
self,
exhaustion_threshold: float = 0.15,
window_size: int = 3,
min_iterations: int = 5,
**kwargs
):
"""
Args:
exhaustion_threshold: Stop when recent average novelty drops below this
window_size: Number of recent iterations to average
min_iterations: Minimum iterations before checking exhaustion
**kwargs: Passed to parent class
"""
super().__init__(**kwargs)
self.exhaustion_threshold = exhaustion_threshold
self.window_size = window_size
self.min_iterations = min_iterations
async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
"""Override to use exhaustion-based termination."""
# Reset state
self.novelty_metrics.reset()
result = TaskGenerationResult(
seed_problem=seed_problem,
start_time=datetime.now(timezone.utc).isoformat(),
config={
"strategy": "exhaust_frontier",
"exhaustion_threshold": self.exhaustion_threshold,
"window_size": self.window_size,
"min_iterations": self.min_iterations,
"max_iterations": self.max_iterations,
"llm_model": self.llm_model
}
)
used_expert_names = set()
previous_tasks: List[str] = []
novelty_history: List[float] = []
try:
for iteration in range(self.max_iterations):
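# Sample expert; bound the retries so a small expert pool (e.g. the
# 5-entry fallback list) cannot spin forever once every name is used
expert = self.expert_provider.get_random_expert()
attempts = 0
while expert["name"] in used_expert_names and attempts < 10:
    expert = self.expert_provider.get_random_expert()
    attempts += 1
used_expert_names.add(expert["name"])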
# Generate and evaluate
task = await self._generate_task(seed_problem, expert, previous_tasks)
previous_tasks.append(task)
embedding = await self._get_embedding(task)
novelty = self.novelty_metrics.compute_novelty(embedding)
self.novelty_metrics.add_embedding(embedding, novelty)
novelty_history.append(novelty.score)
generated_task = GeneratedTask(
task=task,
expert=expert["name"],
expert_domain=expert["domain"],
novelty_score=novelty.score,
iteration=iteration + 1
)
result.trajectory.append(generated_task)
if self.on_iteration:
self.on_iteration(generated_task)
# Check exhaustion condition
if iteration >= self.min_iterations:
recent_avg = np.mean(novelty_history[-self.window_size:])
if recent_avg < self.exhaustion_threshold:
result.terminated_by = f"exhaustion (avg={recent_avg:.3f})"
result.total_iterations = iteration + 1
break
else:
result.terminated_by = "max_iterations"
result.total_iterations = self.max_iterations
# Find all "novel" tasks
novel_tasks = [t for t in result.trajectory if t.novelty_score > self.exhaustion_threshold]
if novel_tasks:
result.breakthrough_task = max(novel_tasks, key=lambda t: t.novelty_score)
result.breakthrough_task.is_breakthrough = True
except Exception as e:
result.terminated_by = f"error: {str(e)}"
result.total_iterations = len(result.trajectory)
result.end_time = datetime.now(timezone.utc).isoformat()
result.novelty_trajectory = self.novelty_metrics.trajectory
return result
class CoverageTargetAgent(NoveltyDrivenTaskAgent):
"""
Alternative strategy: Continue until N distinct clusters are covered.
This ensures a diverse portfolio of ideas across different conceptual areas.
"""
def __init__(
self,
target_clusters: int = 5,
cluster_threshold: float = 0.7,
**kwargs
):
"""
Args:
target_clusters: Target number of distinct clusters to find
cluster_threshold: Similarity threshold for cluster membership
**kwargs: Passed to parent class
"""
super().__init__(**kwargs)
self.target_clusters = target_clusters
self.cluster_threshold = cluster_threshold
def _count_clusters(self, embeddings: List[np.ndarray]) -> int:
"""Count distinct clusters using greedy clustering."""
if not embeddings:
return 0
clusters = []
for emb in embeddings:
found_cluster = False
for cluster_centroid in clusters:
similarity = NoveltyMetrics.cosine_similarity(emb, cluster_centroid)
if similarity >= self.cluster_threshold:
found_cluster = True
break
if not found_cluster:
clusters.append(emb)
return len(clusters)
async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
"""Override to use coverage-based termination."""
self.novelty_metrics.reset()
result = TaskGenerationResult(
seed_problem=seed_problem,
start_time=datetime.now(timezone.utc).isoformat(),
config={
"strategy": "coverage_target",
"target_clusters": self.target_clusters,
"cluster_threshold": self.cluster_threshold,
"max_iterations": self.max_iterations
}
)
used_expert_names = set()
previous_tasks: List[str] = []
all_embeddings: List[np.ndarray] = []
try:
for iteration in range(self.max_iterations):
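# Sample expert; bound the retries so a small expert pool cannot
# spin forever once every name has been used
expert = self.expert_provider.get_random_expert()
attempts = 0
while expert["name"] in used_expert_names and attempts < 10:
    expert = self.expert_provider.get_random_expert()
    attempts += 1
used_expert_names.add(expert["name"])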
task = await self._generate_task(seed_problem, expert, previous_tasks)
previous_tasks.append(task)
embedding = await self._get_embedding(task)
all_embeddings.append(embedding)
novelty = self.novelty_metrics.compute_novelty(embedding)
self.novelty_metrics.add_embedding(embedding, novelty)
generated_task = GeneratedTask(
task=task,
expert=expert["name"],
expert_domain=expert["domain"],
novelty_score=novelty.score,
iteration=iteration + 1
)
result.trajectory.append(generated_task)
if self.on_iteration:
self.on_iteration(generated_task)
# Check coverage
cluster_count = self._count_clusters(all_embeddings)
if cluster_count >= self.target_clusters:
result.terminated_by = f"coverage ({cluster_count} clusters)"
result.total_iterations = iteration + 1
break
else:
final_clusters = self._count_clusters(all_embeddings)
result.terminated_by = f"max_iterations ({final_clusters} clusters)"
result.total_iterations = self.max_iterations
# Find most novel task
if result.trajectory:
best_task = max(result.trajectory, key=lambda t: t.novelty_score)
best_task.is_breakthrough = True
result.breakthrough_task = best_task
except Exception as e:
result.terminated_by = f"error: {str(e)}"
result.total_iterations = len(result.trajectory)
result.end_time = datetime.now(timezone.utc).isoformat()
result.novelty_trajectory = self.novelty_metrics.trajectory
return result

experiments/novelty_loop/demo.py Executable file

@@ -0,0 +1,313 @@
#!/usr/bin/env python3
"""
Novelty-Driven Task Generation Demo

Interactive CLI for exploring the novelty-driven task generation agent.

Examples:
    # Basic usage with default settings
    python demo.py "Improve urban transportation"

    # Custom threshold and iterations
    python demo.py "Design a better bicycle" --threshold 0.35 --max-iter 15

    # Use Chinese language
    python demo.py "改进城市交通" --language zh

    # Use exhaustion strategy (explore until stuck)
    python demo.py "Sustainable energy solutions" --strategy exhaust

    # Use coverage strategy (find N distinct clusters)
    python demo.py "Future of education" --strategy coverage --clusters 5

    # Save results to file
    python demo.py "Smart home innovations" --output results.json

    # Verbose mode with detailed logging
    python demo.py "Healthcare improvements" --verbose
"""

import argparse
import asyncio
import json
import logging
import sys
from datetime import datetime
from pathlib import Path

# Add parent directory to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from experiments.novelty_loop.agent import (
    NoveltyDrivenTaskAgent,
    ExhaustFrontierAgent,
    CoverageTargetAgent,
    GeneratedTask,
    TaskGenerationResult
)


# ANSI color codes for terminal output
class Colors:
    HEADER = '\033[95m'
    BLUE = '\033[94m'
    CYAN = '\033[96m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'


def print_header(text: str):
    """Print a styled header."""
    print(f"\n{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}")
    print(f"{Colors.BOLD}{Colors.HEADER}{text.center(60)}{Colors.END}")
    print(f"{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}\n")


def print_iteration(task: GeneratedTask):
    """Print iteration result with colors."""
    status_color = Colors.GREEN if task.is_breakthrough else Colors.CYAN

    print(f"\n{Colors.BOLD}Iteration {task.iteration}{Colors.END}")
    print(f"  {Colors.YELLOW}Expert:{Colors.END} {task.expert} ({task.expert_domain})")
    print(f"  {Colors.YELLOW}Task:{Colors.END} {task.task}")

    # 20-cell bar: filled cells for the score, shaded cells for the remainder
    novelty_bar = "█" * int(task.novelty_score * 20) + "░" * (20 - int(task.novelty_score * 20))
    print(f"  {Colors.YELLOW}Novelty:{Colors.END} [{novelty_bar}] {task.novelty_score:.4f}")

    if task.is_breakthrough:
        print(f"  {Colors.GREEN}{Colors.BOLD}★ BREAKTHROUGH! ★{Colors.END}")


def print_result(result: TaskGenerationResult):
    """Print final result summary."""
    print_header("RESULTS")

    print(f"{Colors.BOLD}Seed Problem:{Colors.END} {result.seed_problem}")
    print(f"{Colors.BOLD}Total Iterations:{Colors.END} {result.total_iterations}")
    print(f"{Colors.BOLD}Terminated By:{Colors.END} {result.terminated_by}")

    if result.novelty_trajectory:
        print(f"\n{Colors.BOLD}Novelty Statistics:{Colors.END}")
        print(f"  Mean Novelty: {result.novelty_trajectory.mean_novelty:.4f}")
        print(f"  Max Novelty: {result.novelty_trajectory.max_novelty:.4f}")
        print(f"  Jump Ratio: {result.novelty_trajectory.jump_ratio:.2%}")

    if result.breakthrough_task:
        print(f"\n{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
        print(f"{Colors.GREEN}{Colors.BOLD}BREAKTHROUGH TASK{Colors.END}")
        print(f"{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
        print(f"\n{Colors.BOLD}Expert:{Colors.END} {result.breakthrough_task.expert}")
        print(f"{Colors.BOLD}Domain:{Colors.END} {result.breakthrough_task.expert_domain}")
        print(f"{Colors.BOLD}Task:{Colors.END}")
        print(f"  {Colors.CYAN}{result.breakthrough_task.task}{Colors.END}")
        print(f"\n{Colors.BOLD}Novelty Score:{Colors.END} {result.breakthrough_task.novelty_score:.4f}")
        print(f"{Colors.BOLD}Found at Iteration:{Colors.END} {result.breakthrough_task.iteration}")

    # Show trajectory summary
    print(f"\n{Colors.BOLD}Exploration Trajectory:{Colors.END}")
    for task in result.trajectory:
        marker = "★" if task.is_breakthrough else "○"
        novelty_indicator = "█" * int(task.novelty_score * 10)
        print(f"  {marker} [{task.iteration:2d}] {task.expert:20s} | {novelty_indicator:10s} {task.novelty_score:.3f}")


def save_result(result: TaskGenerationResult, output_path: str):
    """Save result to JSON file."""
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(result.to_dict(), f, ensure_ascii=False, indent=2)
    print(f"\n{Colors.GREEN}Results saved to: {output_path}{Colors.END}")


async def run_demo(args):
    """Run the novelty-driven task generation demo."""
    print_header("NOVELTY-DRIVEN TASK GENERATION")

    print(f"{Colors.BOLD}Configuration:{Colors.END}")
    print(f"  Seed Problem: {args.seed_problem}")
    print(f"  Strategy: {args.strategy}")
    print(f"  Novelty Threshold: {args.threshold}")
    print(f"  Max Iterations: {args.max_iter}")
    print(f"  Language: {args.language}")
    print(f"  LLM Model: {args.model}")

    # Create appropriate agent based on strategy
    common_kwargs = {
        "max_iterations": args.max_iter,
        "llm_model": args.model,
        "embedding_model": args.embedding_model,
        "language": args.language,
        "temperature": args.temperature,
        "on_iteration": print_iteration if not args.quiet else None
    }

    if args.strategy == "breakthrough":
        agent = NoveltyDrivenTaskAgent(
            novelty_threshold=args.threshold,
            **common_kwargs
        )
    elif args.strategy == "exhaust":
        agent = ExhaustFrontierAgent(
            exhaustion_threshold=args.exhaust_threshold,
            window_size=args.window_size,
            min_iterations=args.min_iter,
            **common_kwargs
        )
    elif args.strategy == "coverage":
        agent = CoverageTargetAgent(
            target_clusters=args.clusters,
            cluster_threshold=args.cluster_threshold,
            **common_kwargs
        )
    else:
        print(f"{Colors.RED}Unknown strategy: {args.strategy}{Colors.END}")
        return

    print(f"\n{Colors.BOLD}Starting generation loop...{Colors.END}")
    print("-" * 60)

    try:
        result = await agent.run(args.seed_problem)
        print_result(result)

        if args.output:
            save_result(result, args.output)

    except Exception as e:
        print(f"\n{Colors.RED}Error: {e}{Colors.END}")
        if args.verbose:
            import traceback
            traceback.print_exc()
    finally:
        await agent.close()


def main():
    parser = argparse.ArgumentParser(
        description="Novelty-Driven Task Generation Demo",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )

    # Required argument
    parser.add_argument(
        "seed_problem",
        help="The seed problem or challenge to explore"
    )

    # Strategy selection
    parser.add_argument(
        "--strategy", "-s",
        choices=["breakthrough", "exhaust", "coverage"],
        default="breakthrough",
        help="Termination strategy (default: breakthrough)"
    )

    # Common options
    parser.add_argument(
        "--threshold", "-t",
        type=float,
        default=0.4,
        help="Novelty threshold for breakthrough (default: 0.4)"
    )
    parser.add_argument(
        "--max-iter", "-m",
        type=int,
        default=20,
        help="Maximum iterations (default: 20)"
    )
    parser.add_argument(
        "--language", "-l",
        choices=["en", "zh"],
        default="en",
        help="Language for prompts and experts (default: en)"
    )

    # Model options
    parser.add_argument(
        "--model",
        default="qwen3:8b",
        help="LLM model for task generation (default: qwen3:8b)"
    )
    parser.add_argument(
        "--embedding-model",
        default="qwen3-embedding:4b",
        help="Embedding model (default: qwen3-embedding:4b)"
    )
    parser.add_argument(
        "--temperature",
        type=float,
        default=0.7,
        help="LLM temperature (default: 0.7)"
    )

    # Exhaust strategy options
    parser.add_argument(
        "--exhaust-threshold",
        type=float,
        default=0.15,
        help="Exhaustion threshold for 'exhaust' strategy (default: 0.15)"
    )
    parser.add_argument(
        "--window-size",
        type=int,
        default=3,
        help="Window size for exhaustion check (default: 3)"
    )
    parser.add_argument(
        "--min-iter",
        type=int,
        default=5,
        help="Minimum iterations before exhaustion check (default: 5)"
    )

    # Coverage strategy options
    parser.add_argument(
        "--clusters",
        type=int,
        default=5,
        help="Target clusters for 'coverage' strategy (default: 5)"
    )
    parser.add_argument(
        "--cluster-threshold",
        type=float,
        default=0.7,
        help="Cluster similarity threshold (default: 0.7)"
    )

    # Output options
    parser.add_argument(
        "--output", "-o",
        help="Save results to JSON file"
    )
    parser.add_argument(
        "--quiet", "-q",
        action="store_true",
        help="Suppress iteration output"
    )
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose logging"
    )

    args = parser.parse_args()

    # Configure logging
    if args.verbose:
        logging.basicConfig(
            level=logging.DEBUG,
            format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        )
    else:
        logging.basicConfig(level=logging.WARNING)

    # Run the demo
    asyncio.run(run_demo(args))


if __name__ == "__main__":
    main()
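
`print_iteration` sizes its bar with `int(task.novelty_score * 20)`. The score is a cosine distance, so it can in principle exceed 1.0, which would push the bar past 20 cells. A minimal sketch of a clamped renderer follows; the helper name `render_bar` and the ASCII glyphs `#` and `-` are assumptions for illustration and are not part of the demo.

```python
def render_bar(score: float, width: int = 20, fill: str = "#", empty: str = "-") -> str:
    """Hypothetical helper: render a score as a fixed-width text bar."""
    clamped = min(1.0, max(0.0, score))  # keep out-of-range scores inside the bar
    filled = int(clamped * width)
    return fill * filled + empty * (width - filled)


for value in (0.0, 0.25, 1.0, 1.5):
    print(f"[{render_bar(value)}] {value:.2f}")
```

Clamping changes only the display; the unclamped score would still be printed next to the bar.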


@@ -0,0 +1,269 @@
"""
Novelty Metrics Module - Compute novelty scores for generated outputs.

This module provides embedding-based novelty metrics adapted from the AUT flexibility
analysis framework for use in novelty-driven agent loops.

Key Metrics:
- Centroid Distance: Measures how far a new output is from the centroid of previous outputs
- Cumulative Novelty: Tracks novelty over the generation sequence
- Jump Detection: Identifies significant semantic shifts between consecutive outputs
"""

from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np


@dataclass
class NoveltyScore:
    """Result of novelty computation for a single output."""
    # Cosine distance from the centroid: 0.0 = same direction, 1.0 = orthogonal, 2.0 = opposite
    score: float
    distance_from_centroid: float
    min_distance_to_existing: float  # Nearest neighbor distance
    is_jump: bool  # Whether this represents a significant semantic jump
    jump_magnitude: Optional[float] = None  # Cosine similarity to previous output (None for the first)


@dataclass
class NoveltyTrajectory:
    """Tracks novelty scores over a generation sequence."""
    scores: List[float] = field(default_factory=list)
    cumulative_novelty: List[float] = field(default_factory=list)
    jump_positions: List[int] = field(default_factory=list)
    centroid_history: List[np.ndarray] = field(default_factory=list)

    @property
    def mean_novelty(self) -> float:
        """Average novelty across all outputs."""
        return float(np.mean(self.scores)) if self.scores else 0.0

    @property
    def max_novelty(self) -> float:
        """Maximum novelty achieved."""
        return float(max(self.scores)) if self.scores else 0.0

    @property
    def jump_ratio(self) -> float:
        """Proportion of transitions that were jumps."""
        if len(self.scores) < 2:
            return 0.0
        return len(self.jump_positions) / (len(self.scores) - 1)

    @property
    def final_cumulative_novelty(self) -> float:
        """Total accumulated novelty."""
        return self.cumulative_novelty[-1] if self.cumulative_novelty else 0.0


class NoveltyMetrics:
    """
    Computes novelty metrics for embeddings in a streaming fashion.

    Designed for use in an agent loop where outputs are generated one at a time
    and we need to assess novelty incrementally.
    """

    def __init__(
        self,
        similarity_threshold: float = 0.7,
        jump_detection_enabled: bool = True
    ):
        """
        Args:
            similarity_threshold: Threshold for semantic similarity (below = jump)
            jump_detection_enabled: Whether to track semantic jumps
        """
        self.similarity_threshold = similarity_threshold
        self.jump_detection_enabled = jump_detection_enabled

        # State
        self.embeddings: List[np.ndarray] = []
        self.trajectory = NoveltyTrajectory()
        self._centroid: Optional[np.ndarray] = None

    def reset(self):
        """Reset all state for a new generation session."""
        self.embeddings = []
        self.trajectory = NoveltyTrajectory()
        self._centroid = None

    @staticmethod
    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Compute cosine similarity between two vectors."""
        norm_a = np.linalg.norm(a)
        norm_b = np.linalg.norm(b)
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return float(np.dot(a, b) / (norm_a * norm_b))

    @staticmethod
    def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
        """Compute cosine distance (1 - similarity) between two vectors."""
        return 1.0 - NoveltyMetrics.cosine_similarity(a, b)

    def compute_centroid(self) -> Optional[np.ndarray]:
        """Compute centroid of all current embeddings."""
        if not self.embeddings:
            return None
        return np.mean(self.embeddings, axis=0)

    def compute_novelty(self, embedding: np.ndarray) -> NoveltyScore:
        """
        Compute novelty score for a new embedding.

        This does NOT add the embedding to the history - call add_embedding() for that.

        Args:
            embedding: The embedding vector to evaluate

        Returns:
            NoveltyScore with computed metrics
        """
        embedding = np.array(embedding)

        # First output has nothing to compare to; treat it as fully novel (score 1.0)
        if not self.embeddings:
            return NoveltyScore(
                score=1.0,
                distance_from_centroid=1.0,
                min_distance_to_existing=1.0,
                is_jump=False,
                jump_magnitude=None
            )

        # Distance from centroid (primary novelty metric)
        centroid = self.compute_centroid()
        distance_from_centroid = self.cosine_distance(embedding, centroid)

        # Minimum distance to any existing embedding (nearest neighbor)
        min_distance = min(
            self.cosine_distance(embedding, existing)
            for existing in self.embeddings
        )

        # Jump detection (similarity to previous output)
        is_jump = False
        jump_magnitude = None
        if self.jump_detection_enabled and self.embeddings:
            similarity_to_prev = self.cosine_similarity(embedding, self.embeddings[-1])
            jump_magnitude = similarity_to_prev
            is_jump = similarity_to_prev < self.similarity_threshold

        # Primary novelty score is the cosine distance from the centroid.
        # It is not rescaled: values lie in [0, 2], and higher = more novel.
        novelty_score = distance_from_centroid

        return NoveltyScore(
            score=novelty_score,
            distance_from_centroid=distance_from_centroid,
            min_distance_to_existing=min_distance,
            is_jump=is_jump,
            jump_magnitude=jump_magnitude
        )

    def add_embedding(self, embedding: np.ndarray, novelty: Optional[NoveltyScore] = None):
        """
        Add an embedding to the history and update trajectory.

        Args:
            embedding: The embedding to add
            novelty: Pre-computed novelty score (computed if not provided)
        """
        embedding = np.array(embedding)

        if novelty is None:
            novelty = self.compute_novelty(embedding)

        # Update state
        self.embeddings.append(embedding)
        self._centroid = self.compute_centroid()

        # Update trajectory
        self.trajectory.scores.append(novelty.score)

        # Cumulative novelty
        prev_cumulative = self.trajectory.cumulative_novelty[-1] if self.trajectory.cumulative_novelty else 0.0
        self.trajectory.cumulative_novelty.append(prev_cumulative + novelty.score)

        # Track jumps
        if novelty.is_jump:
            self.trajectory.jump_positions.append(len(self.embeddings) - 1)

        # Store centroid history
        if self._centroid is not None:
            self.trajectory.centroid_history.append(self._centroid.copy())

    def get_current_state(self) -> dict:
        """Get current state as a dictionary for logging/debugging."""
        return {
            "num_embeddings": len(self.embeddings),
            "mean_novelty": self.trajectory.mean_novelty,
            "max_novelty": self.trajectory.max_novelty,
            "jump_ratio": self.trajectory.jump_ratio,
            "cumulative_novelty": self.trajectory.final_cumulative_novelty,
            "recent_scores": self.trajectory.scores[-5:] if self.trajectory.scores else []
        }


def compute_batch_novelty(
    embeddings: List[np.ndarray],
    reference_embeddings: Optional[List[np.ndarray]] = None
) -> List[float]:
    """
    Compute novelty scores for a batch of embeddings.

    Useful for post-hoc analysis of generated outputs.

    Args:
        embeddings: List of embeddings to evaluate
        reference_embeddings: Optional reference set whose centroid is used
            (defaults to the centroid of `embeddings` itself)

    Returns:
        List of novelty scores (cosine distance from centroid)
    """
    if not embeddings:
        return []

    embeddings_arr = np.array(embeddings)

    if reference_embeddings is not None:
        centroid = np.mean(reference_embeddings, axis=0)
    else:
        centroid = np.mean(embeddings_arr, axis=0)

    scores = []
    for emb in embeddings_arr:
        distance = NoveltyMetrics.cosine_distance(emb, centroid)
        scores.append(distance)

    return scores


def find_most_novel(
    embeddings: List[np.ndarray],
    texts: List[str],
    top_k: int = 5
) -> List[tuple]:
    """
    Find the most novel outputs from a batch.

    Args:
        embeddings: List of embeddings
        texts: Corresponding text outputs (same length and order as embeddings)
        top_k: Number of top results to return

    Returns:
        List of (text, novelty_score, index) tuples, sorted by novelty descending
    """
    scores = compute_batch_novelty(embeddings)

    indexed_results = [
        (texts[i], scores[i], i)
        for i in range(len(texts))
    ]

    # Sort by novelty score descending
    indexed_results.sort(key=lambda x: x[1], reverse=True)

    return indexed_results[:top_k]
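
The scoring rule in `compute_novelty` reduces to two cosine computations: distance from the running centroid (the score) and similarity to the previous output (the jump test). A minimal worked sketch on 2-D toy vectors, using only the standard library. The vectors and the standalone helpers `cosine_similarity` and `centroid` are illustrative stand-ins, not the module's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


history = [[1.0, 0.0], [1.0, 0.0]]  # two earlier outputs pointing the same way
candidate = [0.0, 1.0]              # orthogonal to everything seen so far

score = 1.0 - cosine_similarity(candidate, centroid(history))
similarity_to_prev = cosine_similarity(candidate, history[-1])
is_jump = similarity_to_prev < 0.7  # 0.7 mirrors the default similarity_threshold

# A vector pointing the opposite way reaches the top of the [0, 2] range
opposite_score = 1.0 - cosine_similarity([-1.0, 0.0], centroid(history))

print(f"novelty={score:.2f} jump={is_jump} opposite={opposite_score:.2f}")
# prints: novelty=1.00 jump=True opposite=2.00
```

Because the score is an unscaled cosine distance, a threshold such as the demo's default `--threshold 0.4` is compared against a value that can range from 0 to 2.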