feat: Add experiments framework and novelty-driven agent loop
- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation
- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring
- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
253
experiments/novelty_loop/README.md
Normal file
@@ -0,0 +1,253 @@
# Novelty-Driven LLM Agent Loop

An autonomous LLM agent that generates tasks in a loop and uses **novelty assessment as the termination condition**, helping the agent "jump out" of the high-probability regions of its training distribution (semantic gravity).

## Concept

Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.

This module uses **novelty scores** to decide dynamically when the agent should stop. Instead of running a fixed number of iterations, the agent continues until a generated task scores above the novelty threshold (a "breakthrough") or the iteration cap is reached.

```
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
```

## Research Foundation

This work builds on established research:

- **Novelty Search** (Lehman & Stanley): Reward novelty, not objectives
- **Curiosity-driven Exploration** (Pathak et al.): Intrinsic motivation via prediction error
- **Quality-Diversity** (MAP-Elites): Maintain diverse high-quality solutions
- **Open-ended Learning**: Endless innovation through novelty pressure

The distinguishing idea here is using **novelty as a termination condition** rather than only as a reward signal.

## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│               Novelty-Driven Task Generation Loop                │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────┐                                                    │
│  │  Seed    │  "Design a better bicycle"                         │
│  │  Problem │                                                    │
│  └────┬─────┘                                                    │
│       │                                                          │
│       ▼                                                          │
│  ┌─────────────────────────────────────────────────────────┐     │
│  │  WHILE novelty < threshold AND iterations < max:        │     │
│  │                                                         │     │
│  │    1. Sample random expert (curated occupations)        │     │
│  │       e.g., "marine biologist", "choreographer"         │     │
│  │                                                         │     │
│  │    2. Generate task from expert perspective             │     │
│  │       "What task would a {expert} assign to improve     │     │
│  │        {seed_problem}?"                                 │     │
│  │                                                         │     │
│  │    3. Embed task, compute novelty vs. centroid          │     │
│  │                                                         │     │
│  │    4. If novelty > threshold → STOP (breakthrough!)     │     │
│  │                                                         │     │
│  └─────────────────────────────────────────────────────────┘     │
│       │                                                          │
│       ▼                                                          │
│  ┌──────────┐                                                    │
│  │  Output: │  Novel task that "jumped out" of typical space     │
│  │  Task    │  + trajectory of exploration                       │
│  └──────────┘                                                    │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```

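The control flow in the diagram can be sketched in a few lines. This is a minimal, synchronous illustration and not the module's implementation: `novelty_loop`, `cosine_distance`, and the `generate` / `embed` callables are hypothetical stand-ins for the LLM and embedding calls, and treating the first task as having novelty 0.0 is an assumption.

```python
import random
from typing import Callable, List, Optional, Tuple

import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Return 1 - cosine similarity; 0.0 if either vector has zero norm."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    if denom == 0.0:
        return 0.0
    return 1.0 - float(np.dot(a, b)) / denom


def novelty_loop(
    seed_problem: str,
    experts: List[str],
    generate: Callable[[str, str], str],   # hypothetical: (expert, seed) -> task text
    embed: Callable[[str], np.ndarray],    # hypothetical: task text -> vector
    threshold: float = 0.4,
    max_iterations: int = 20,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Return (breakthrough task or None, trajectory of (task, novelty))."""
    rng = rng or random.Random()
    embeddings: List[np.ndarray] = []
    trajectory: List[Tuple[str, float]] = []

    for _ in range(max_iterations):
        expert = rng.choice(experts)              # 1. sample an expert
        task = generate(expert, seed_problem)     # 2. generate a task
        vec = embed(task)                         # 3. embed it
        if embeddings:
            centroid = np.mean(embeddings, axis=0)
            novelty = cosine_distance(vec, centroid)
        else:
            novelty = 0.0  # assumption: no history yet, so no novelty signal
        embeddings.append(vec)
        trajectory.append((task, novelty))
        if novelty > threshold:                   # 4. stop on breakthrough
            return task, trajectory

    return None, trajectory
```

Passing the generator and embedder as callables keeps the stopping logic testable without a running model server.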
## Installation

The module uses the existing project infrastructure. Ensure you have:

1. **Ollama** running with the required models:

   ```bash
   ollama pull qwen3:8b
   ollama pull qwen3-embedding:4b
   ```

2. **Python dependencies** (from project root):

   ```bash
   cd backend
   source venv/bin/activate
   pip install httpx numpy
   ```

## Quick Start

### Basic Usage

```bash
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
```

### Example Output

```
Iteration 1
  Expert: Architect (Architecture & Design)
  Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
  Novelty: [████████░░░░░░░░░░░░] 0.1234

Iteration 2
  Expert: Chef (Culinary)
  Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
  Novelty: [███████████░░░░░░░░░] 0.1823

Iteration 3
  Expert: Marine Biologist (Science)
  Task: Study fish schooling behavior to develop organic traffic flow algorithms
  Novelty: [██████████████░░░░░░] 0.3521

Iteration 4
  Expert: Choreographer (Performing Arts)
  Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
  Novelty: [████████████████████] 0.5234
  ★ BREAKTHROUGH! ★
```

## Termination Strategies

### 1. Seek Breakthrough (Default)

Stop as soon as a task's novelty exceeds the threshold, returning the first task that clears it.

```bash
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
```

### 2. Exhaust Frontier

Continue while novelty stays high, and stop once the average novelty over recent iterations drops below the exhaustion threshold. This explores more thoroughly.

```bash
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
```

### 3. Coverage Target

Continue until N distinct conceptual clusters are covered. This ensures diversity.

```bash
python demo.py "Your problem" --strategy coverage --clusters 5
```

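The three strategies differ only in their stopping predicate. The sketch below states each one as a standalone function; the function names are hypothetical, the exhaustion check counts completed iterations (an assumption about how the minimum is applied), and how tasks are assigned to clusters is not specified here, so `cluster_labels` is assumed to come from some external clustering step.

```python
from typing import Sequence


def should_stop_breakthrough(novelty: float, threshold: float = 0.4) -> bool:
    """Seek Breakthrough: stop once a single task clears the threshold."""
    return novelty > threshold


def should_stop_exhaust(
    history: Sequence[float],
    exhaust_threshold: float = 0.15,
    window_size: int = 3,
    min_iterations: int = 5,
) -> bool:
    """Exhaust Frontier: stop when the recent average novelty drops too low."""
    if len(history) < min_iterations:
        return False  # too early to judge exhaustion
    recent = history[-window_size:]
    return sum(recent) / len(recent) < exhaust_threshold


def should_stop_coverage(cluster_labels: Sequence[int], target_clusters: int = 5) -> bool:
    """Coverage Target: stop once enough distinct clusters have been seen."""
    return len(set(cluster_labels)) >= target_clusters
```

Keeping the predicates separate from generation makes it straightforward to compare strategies on the same recorded trajectory.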
## API Usage

```python
import asyncio
from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent

async def main():
    agent = NoveltyDrivenTaskAgent(
        novelty_threshold=0.4,
        max_iterations=20,
        language="en"
    )

    try:
        result = await agent.run("Design a better bicycle")

        # breakthrough_task stays None if the run ended with an error
        best = result.breakthrough_task
        if best is not None:
            print(f"Found breakthrough: {best.task}")
            print(f"Novelty score: {best.novelty_score}")
            print(f"From expert: {best.expert}")
        print(f"Terminated by: {result.terminated_by}")
    finally:
        # Always release the HTTP client
        await agent.close()

asyncio.run(main())
```

## Novelty Metrics

The `novelty_metrics.py` module provides:

- **Centroid Distance**: Primary novelty metric, measuring how far an output is from the average of all previous outputs
- **Min Distance**: Distance to the nearest neighbor (detects duplicates)
- **Jump Detection**: Identifies significant semantic shifts between consecutive outputs
- **Trajectory Tracking**: Cumulative novelty, jump ratio, etc.

```python
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics

metrics = NoveltyMetrics(similarity_threshold=0.7)

# `embeddings` is an iterable of embedding vectors (np.ndarray), one per output.
# Add embeddings one by one
for embedding in embeddings:
    novelty = metrics.compute_novelty(embedding)
    metrics.add_embedding(embedding, novelty)
    print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")

# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
```

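To make the metrics above concrete, here is a small sketch of how centroid distance, minimum distance, and jump detection could be computed. It is not the code in `novelty_metrics.py`: `SketchNoveltyTracker` is a hypothetical stand-in, cosine distance is an assumed choice of metric, and defining a jump as "similarity to the previous output falls below `similarity_threshold`" is likewise an assumption.

```python
from typing import Dict, List

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; defined as 0.0 if either vector has zero norm."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.dot(a, b)) / denom if denom else 0.0


class SketchNoveltyTracker:
    """Hypothetical stand-in for NoveltyMetrics, for illustration only."""

    def __init__(self, similarity_threshold: float = 0.7) -> None:
        self.similarity_threshold = similarity_threshold
        self.embeddings: List[np.ndarray] = []

    def score_and_add(self, vec) -> Dict[str, object]:
        """Score a vector against the history, then add it to the history."""
        vec = np.asarray(vec, dtype=float)
        if not self.embeddings:
            # Assumption: the first output has nothing to compare against.
            result = {"centroid_distance": 0.0, "min_distance": 0.0, "is_jump": False}
        else:
            centroid = np.mean(self.embeddings, axis=0)
            sims = [cosine_similarity(vec, e) for e in self.embeddings]
            result = {
                # Distance from the average of all previous outputs
                "centroid_distance": 1.0 - cosine_similarity(vec, centroid),
                # Distance to the nearest previous output (near 0 for duplicates)
                "min_distance": 1.0 - max(sims),
                # Jump: low similarity to the immediately preceding output
                "is_jump": sims[-1] < self.similarity_threshold,
            }
        self.embeddings.append(vec)
        return result
```

In this sketch a repeated output yields a `min_distance` near zero even when its centroid distance is moderate, which is why the two measures are reported separately.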
## CLI Options

```
positional arguments:
  seed_problem          The seed problem or challenge to explore

options:
  --strategy {breakthrough,exhaust,coverage}
                        Termination strategy (default: breakthrough)
  --threshold, -t       Novelty threshold for breakthrough (default: 0.4)
  --max-iter, -m        Maximum iterations (default: 20)
  --language, -l {en,zh}
                        Language for prompts and experts (default: en)
  --model               LLM model for task generation (default: qwen3:8b)
  --embedding-model     Embedding model (default: qwen3-embedding:4b)
  --temperature         LLM temperature (default: 0.7)
  --output, -o          Save results to JSON file
  --quiet, -q           Suppress iteration output
  --verbose, -v         Enable verbose logging
```

## File Structure

```
experiments/novelty_loop/
├── README.md           # This file
├── agent.py            # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py  # Novelty computation utilities
└── demo.py             # Interactive CLI demo
```

## Design Decisions

| Question | Decision | Rationale |
|----------|----------|-----------|
| Output Type | **Tasks** | Self-generated sub-goals for autonomous problem decomposition |
| Termination | **Seek Breakthrough** | Stop when novelty exceeds threshold - find truly novel task |
| Perturbation | **Expert Perspectives** | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | **Centroid** | Dynamic, adapts as exploration progresses |

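The "dynamic" property of the centroid reference can be illustrated with an incremental mean: each new embedding shifts the reference point, so the bar for novelty moves as exploration progresses. This is a sketch only; `RunningCentroid` is a hypothetical helper, and whether the module recomputes the mean from scratch or updates it incrementally is not stated here.

```python
import numpy as np


class RunningCentroid:
    """Incrementally maintained mean of all vectors seen so far."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = None  # set on the first update

    def update(self, vec) -> np.ndarray:
        """Fold one vector into the mean and return the updated centroid."""
        vec = np.asarray(vec, dtype=float)
        self.count += 1
        if self.mean is None:
            self.mean = vec.copy()
        else:
            # c_n = c_{n-1} + (x_n - c_{n-1}) / n
            self.mean = self.mean + (vec - self.mean) / self.count
        return self.mean
```

The update costs O(d) per embedding and gives the same result as averaging the full history, up to floating-point error.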
## Connection to Main Project

This module integrates with the main novelty-seeking project:

- Uses the same **curated occupation data** (`backend/app/data/curated_occupations_*.json`)
- Uses the same **embedding model** (qwen3-embedding:4b)
- Builds on the **AUT flexibility analysis** metrics for novelty computation
- Can use **DDC domain data** for alternative perturbation strategies

## Future Work

1. **Hybrid Perturbation**: Combine expert + domain perspectives
2. **Contrastive Prompting**: Explicitly ask for outputs unlike recent ones
3. **Semantic Steering**: Guide generation away from centroid direction
4. **Multi-Agent Exploration**: Parallel agents with different strategies
5. **Quality-Diversity Archive**: Maintain diverse high-quality solutions

## References

- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone.
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction.
- Mouret, J. B., & Clune, J. (2015). Illuminating search spaces by mapping elites.
- arXiv:2405.00899 - Characterising Creative Process in Humans and LLMs
42
experiments/novelty_loop/__init__.py
Normal file
@@ -0,0 +1,42 @@
"""
Novelty-Driven LLM Agent Loop

An autonomous agent that generates tasks using novelty as the termination condition.
"""

from .agent import (
    NoveltyDrivenTaskAgent,
    ExhaustFrontierAgent,
    CoverageTargetAgent,
    GeneratedTask,
    TaskGenerationResult,
    ExpertProvider,
    DomainProvider,
)

from .novelty_metrics import (
    NoveltyMetrics,
    NoveltyScore,
    NoveltyTrajectory,
    compute_batch_novelty,
    find_most_novel,
)

__all__ = [
    # Agents
    "NoveltyDrivenTaskAgent",
    "ExhaustFrontierAgent",
    "CoverageTargetAgent",
    # Data classes
    "GeneratedTask",
    "TaskGenerationResult",
    "NoveltyScore",
    "NoveltyTrajectory",
    # Providers
    "ExpertProvider",
    "DomainProvider",
    # Metrics
    "NoveltyMetrics",
    "compute_batch_novelty",
    "find_most_novel",
]
725
experiments/novelty_loop/agent.py
Normal file
@@ -0,0 +1,725 @@
"""
Novelty-Driven Task Agent - An autonomous agent that generates tasks using novelty as the termination condition.

This agent operates in a loop, generating tasks from diverse expert perspectives,
and terminates when it finds a task that exceeds the novelty threshold (a "breakthrough").

The core innovation is using novelty assessment to help the agent "jump out" of its
training data distribution (semantic gravity), finding truly novel ideas.

Architecture:
    Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop

Termination Strategy: "Seek Breakthrough"
    - Continue until novelty > threshold
    - Find the first truly novel task and stop

Research Foundation:
    - Novelty Search (Lehman & Stanley): Reward novelty, not objectives
    - Curiosity-driven Exploration (Pathak et al.): Intrinsic motivation via prediction error
    - Quality-Diversity (MAP-Elites): Maintain diverse high-quality solutions
"""

import asyncio
import json
import logging
import random
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, List, Optional

import httpx
import numpy as np

from .novelty_metrics import NoveltyMetrics, NoveltyScore, NoveltyTrajectory

logger = logging.getLogger(__name__)


# ============================================================================
# Data Classes
# ============================================================================

@dataclass
class GeneratedTask:
    """A single generated task with metadata."""
    task: str
    expert: str
    expert_domain: str
    novelty_score: float
    iteration: int
    is_breakthrough: bool = False
    embedding: Optional[np.ndarray] = None


@dataclass
class TaskGenerationResult:
    """Result of a complete novelty-driven task generation session."""
    seed_problem: str
    breakthrough_task: Optional[GeneratedTask] = None
    trajectory: List[GeneratedTask] = field(default_factory=list)
    total_iterations: int = 0
    terminated_by: str = "unknown"  # "breakthrough", "max_iterations", "error"
    novelty_trajectory: Optional[NoveltyTrajectory] = None
    start_time: Optional[str] = None
    end_time: Optional[str] = None
    config: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON serialization."""
        return {
            "seed_problem": self.seed_problem,
            "breakthrough_task": {
                "task": self.breakthrough_task.task,
                "expert": self.breakthrough_task.expert,
                "expert_domain": self.breakthrough_task.expert_domain,
                "novelty_score": self.breakthrough_task.novelty_score,
                "iteration": self.breakthrough_task.iteration
            } if self.breakthrough_task else None,
            "trajectory": [
                {
                    "task": t.task,
                    "expert": t.expert,
                    "expert_domain": t.expert_domain,
                    "novelty_score": t.novelty_score,
                    "iteration": t.iteration,
                    "is_breakthrough": t.is_breakthrough
                }
                for t in self.trajectory
            ],
            "total_iterations": self.total_iterations,
            "terminated_by": self.terminated_by,
            "novelty_stats": {
                "mean_novelty": self.novelty_trajectory.mean_novelty if self.novelty_trajectory else 0,
                "max_novelty": self.novelty_trajectory.max_novelty if self.novelty_trajectory else 0,
                "jump_ratio": self.novelty_trajectory.jump_ratio if self.novelty_trajectory else 0,
                "cumulative_novelty": self.novelty_trajectory.final_cumulative_novelty if self.novelty_trajectory else 0
            },
            "start_time": self.start_time,
            "end_time": self.end_time,
            "config": self.config
        }


# ============================================================================
# Expert/Domain Providers
# ============================================================================

class ExpertProvider:
    """Provides random experts from curated occupation lists."""

    def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
        """
        Args:
            data_dir: Path to data directory containing occupation JSON files
            language: Language code ("en" or "zh")
        """
        if data_dir is None:
            # Default to backend data directory
            data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"

        self.data_dir = data_dir
        self.language = language
        self._occupations: List[dict] = []
        self._load_occupations()

    def _load_occupations(self):
        """Load occupations from JSON file."""
        file_path = self.data_dir / f"curated_occupations_{self.language}.json"

        if not file_path.exists():
            logger.warning(f"Occupation file not found: {file_path}")
            # Fallback to some default experts
            self._occupations = [
                {"name": "Marine Biologist", "domain": "Science"},
                {"name": "Choreographer", "domain": "Arts"},
                {"name": "Urban Planner", "domain": "Architecture"},
                {"name": "Chef", "domain": "Culinary"},
                {"name": "Astronomer", "domain": "Science"},
            ]
            return

        try:
            with open(file_path, "r", encoding="utf-8") as f:
                data = json.load(f)
            self._occupations = data.get("occupations", [])
            logger.info(f"Loaded {len(self._occupations)} occupations from {file_path.name}")
        except Exception as e:
            logger.error(f"Error loading occupations: {e}")
            self._occupations = []

    def get_random_expert(self) -> dict:
        """Get a random expert with name and domain."""
        if not self._occupations:
            return {"name": "Expert", "domain": "General"}
        return random.choice(self._occupations)

    def get_random_experts(self, count: int) -> List[dict]:
        """Get multiple random experts without replacement."""
        if len(self._occupations) <= count:
            return self._occupations.copy()
        return random.sample(self._occupations, count)


class DomainProvider:
    """Provides random knowledge domains from DDC classification."""

    def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
        if data_dir is None:
            data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"

        self.data_dir = data_dir
        self.language = language
        self._domains: List[dict] = []
        self._load_domains()

    def _load_domains(self):
        """Load domains from JSON file."""
        file_path = self.data_dir / f"ddc_domains_{self.language}.json"

        if not file_path.exists():
            logger.warning(f"Domain file not found: {file_path}")
            self._domains = []
            return

        try:
            with open(file_path, "r", encoding="utf-8") as f:
                data = json.load(f)
            self._domains = data.get("domains", [])
            logger.info(f"Loaded {len(self._domains)} domains from {file_path.name}")
        except Exception as e:
            logger.error(f"Error loading domains: {e}")

    def get_random_domain(self, level: Optional[str] = None) -> dict:
        """Get a random domain, optionally filtered by level."""
        domains = self._domains
        if level:
            domains = [d for d in domains if d.get("level") == level]

        if not domains:
            return {"name": "General Knowledge", "code": "000"}
        return random.choice(domains)


# ============================================================================
# Novelty-Driven Task Agent
# ============================================================================

class NoveltyDrivenTaskAgent:
    """
    An autonomous agent that generates tasks using novelty as the termination condition.

    The agent operates in a loop:
    1. Sample a random expert perspective
    2. Generate a task from that expert's viewpoint
    3. Compute the task's novelty (distance from centroid of previous tasks)
    4. If novelty > threshold → STOP (found breakthrough!)
    5. Otherwise → Continue with next expert

    Example:
        agent = NoveltyDrivenTaskAgent(novelty_threshold=0.4)
        result = await agent.run("Improve urban transportation")

        # result.breakthrough_task contains the novel task found
        # result.trajectory shows the exploration path
    """

    def __init__(
        self,
        novelty_threshold: float = 0.4,
        max_iterations: int = 20,
        ollama_base_url: str = "http://localhost:11435",
        llm_model: str = "qwen3:8b",
        embedding_model: str = "qwen3-embedding:4b",
        language: str = "en",
        data_dir: Optional[Path] = None,
        on_iteration: Optional[Callable[[GeneratedTask], None]] = None,
        temperature: float = 0.7
    ):
        """
        Args:
            novelty_threshold: Novelty score threshold for breakthrough (0.0-1.0)
            max_iterations: Maximum iterations before stopping
            ollama_base_url: Ollama API endpoint
            llm_model: Model for task generation
            embedding_model: Model for embeddings
            language: Language for prompts and experts ("en" or "zh")
            data_dir: Path to data directory for expert/domain files
            on_iteration: Callback function called after each iteration
            temperature: LLM temperature for generation
        """
        self.novelty_threshold = novelty_threshold
        self.max_iterations = max_iterations
        self.ollama_base_url = ollama_base_url
        self.llm_model = llm_model
        self.embedding_model = embedding_model
        self.language = language
        self.temperature = temperature
        self.on_iteration = on_iteration

        # Initialize providers
        self.expert_provider = ExpertProvider(data_dir, language)
        self.domain_provider = DomainProvider(data_dir, language)

        # Initialize novelty metrics
        self.novelty_metrics = NoveltyMetrics(
            similarity_threshold=0.7,
            jump_detection_enabled=True
        )

        # HTTP client
        self._client: Optional[httpx.AsyncClient] = None

    async def _get_client(self) -> httpx.AsyncClient:
        """Get or create HTTP client."""
        if self._client is None:
            self._client = httpx.AsyncClient(timeout=120.0)
        return self._client

    async def close(self):
        """Close HTTP client."""
        if self._client is not None:
            await self._client.aclose()
            self._client = None

    async def _generate_text(self, prompt: str) -> str:
        """Generate text using Ollama LLM."""
        client = await self._get_client()
        url = f"{self.ollama_base_url}/api/generate"

        # Add /no_think prefix for qwen models to disable thinking
        if self.llm_model.lower().startswith("qwen"):
            prompt = f"/no_think\n{prompt}"

        try:
            response = await client.post(url, json={
                "model": self.llm_model,
                "prompt": prompt,
                "stream": False,
                "options": {
                    "temperature": self.temperature
                }
            })
            response.raise_for_status()
            result = response.json()
            return result.get("response", "").strip()
        except Exception as e:
            logger.error(f"LLM generation error: {e}")
            raise

    async def _get_embedding(self, text: str) -> np.ndarray:
        """Get embedding vector for text."""
        client = await self._get_client()
        url = f"{self.ollama_base_url}/api/embed"

        try:
            response = await client.post(url, json={
                "model": self.embedding_model,
                "input": text
            })
            response.raise_for_status()
            result = response.json()
            return np.array(result["embeddings"][0])
        except Exception as e:
            logger.error(f"Embedding error: {e}")
            raise

    def _build_task_prompt(
        self,
        seed_problem: str,
        expert: dict,
        previous_tasks: List[str]
    ) -> str:
        """Build the prompt for task generation."""
        expert_name = expert.get("name", "Expert")
        expert_domain = expert.get("domain", "General")

        # Build context from previous tasks (if any)
        context = ""
        if previous_tasks:
            recent = previous_tasks[-3:]  # Last 3 tasks
            context = "\n\nPrevious suggestions (generate something DIFFERENT):\n"
            for t in recent:
                context += f"- {t}\n"

        if self.language == "zh":
            # Chinese prompt; mirrors the English prompt in the else branch
            prompt = f"""你是一位 {expert_name}({expert_domain})。

给定问题:{seed_problem}

请从你的专业角度出发,提出一个独特的改进任务或探索方向。
这个任务应该结合你的专业知识,提供一个非传统但有价值的视角。
{context}
请直接给出任务描述,不要添加解释。任务应该具体、可行、且与众不同。

任务:"""
        else:
            prompt = f"""You are a {expert_name} ({expert_domain}).

Given problem: {seed_problem}

From your professional perspective, propose a unique task or exploration direction to improve or innovate on this problem.
The task should leverage your domain expertise to provide an unconventional but valuable angle.
{context}
Provide just the task description without explanation. The task should be specific, actionable, and distinctive.

Task:"""

        return prompt

    async def _generate_task(
        self,
        seed_problem: str,
        expert: dict,
        previous_tasks: List[str]
    ) -> str:
        """Generate a task from an expert's perspective."""
        prompt = self._build_task_prompt(seed_problem, expert, previous_tasks)
        task = await self._generate_text(prompt)

        # Clean up the response
        task = task.strip()
        # Remove common prefixes
        for prefix in ["Task:", "任务:", "Here's", "I suggest", "Based on"]:
            if task.lower().startswith(prefix.lower()):
                task = task[len(prefix):].strip()

        return task

    async def run(
        self,
        seed_problem: str,
        used_experts: Optional[List[dict]] = None
    ) -> TaskGenerationResult:
        """
        Run the novelty-driven task generation loop.

        Args:
            seed_problem: The initial problem/challenge to explore
            used_experts: Optional list of experts to avoid (for multi-run scenarios)

        Returns:
            TaskGenerationResult with breakthrough task (if found) and full trajectory
        """
        # Reset state
        self.novelty_metrics.reset()

        result = TaskGenerationResult(
            seed_problem=seed_problem,
            start_time=datetime.now(timezone.utc).isoformat(),
            config={
                "novelty_threshold": self.novelty_threshold,
                "max_iterations": self.max_iterations,
                "llm_model": self.llm_model,
                "embedding_model": self.embedding_model,
                "language": self.language
            }
        )

        used_expert_names = set()
        if used_experts:
            used_expert_names = {e["name"] for e in used_experts}

        previous_tasks: List[str] = []

        logger.info(f"Starting novelty loop: '{seed_problem}' (threshold={self.novelty_threshold})")

        try:
            for iteration in range(self.max_iterations):
                # 1. Sample a random expert (avoid duplicates)
                attempts = 0
                expert = self.expert_provider.get_random_expert()
                while expert["name"] in used_expert_names and attempts < 10:
                    expert = self.expert_provider.get_random_expert()
                    attempts += 1
                used_expert_names.add(expert["name"])

                logger.info(f"Iteration {iteration + 1}: Expert = {expert['name']} ({expert['domain']})")

                # 2. Generate task
                task = await self._generate_task(seed_problem, expert, previous_tasks)
                previous_tasks.append(task)

                # 3. Get embedding
                embedding = await self._get_embedding(task)

                # 4. Compute novelty
                novelty = self.novelty_metrics.compute_novelty(embedding)
                self.novelty_metrics.add_embedding(embedding, novelty)

                # 5. Create task record
                generated_task = GeneratedTask(
                    task=task,
                    expert=expert["name"],
                    expert_domain=expert["domain"],
                    novelty_score=novelty.score,
                    iteration=iteration + 1,
                    is_breakthrough=novelty.score > self.novelty_threshold,
                    embedding=embedding
                )
                result.trajectory.append(generated_task)

                logger.info(f"  Task: {task[:80]}...")
                logger.info(f"  Novelty: {novelty.score:.4f} (threshold: {self.novelty_threshold})")

                # Callback
                if self.on_iteration:
                    self.on_iteration(generated_task)

                # 6. Check for breakthrough
                if novelty.score > self.novelty_threshold:
                    result.breakthrough_task = generated_task
                    result.terminated_by = "breakthrough"
                    result.total_iterations = iteration + 1
                    logger.info(f"  BREAKTHROUGH! Stopping after {iteration + 1} iterations")
                    break

            else:
                # Max iterations reached without breakthrough
                result.terminated_by = "max_iterations"
                result.total_iterations = self.max_iterations
                logger.info(f"Max iterations ({self.max_iterations}) reached without breakthrough")

                # Find the most novel task as a fallback
                if result.trajectory:
                    best_task = max(result.trajectory, key=lambda t: t.novelty_score)
                    best_task.is_breakthrough = True  # Mark as best found
                    result.breakthrough_task = best_task

        except Exception as e:
            logger.error(f"Error during generation: {e}")
            result.terminated_by = f"error: {str(e)}"
            result.total_iterations = len(result.trajectory)

        # Finalize
        result.end_time = datetime.now(timezone.utc).isoformat()
        result.novelty_trajectory = self.novelty_metrics.trajectory

        return result


# ============================================================================
|
||||
# Alternative Termination Strategies
|
||||
# ============================================================================
|
||||
|
||||
class ExhaustFrontierAgent(NoveltyDrivenTaskAgent):
|
||||
"""
|
||||
Alternative strategy: Continue while novelty is high, stop when it drops.
|
||||
|
||||
This explores the "novelty frontier" more thoroughly, finding multiple novel
|
||||
ideas before stopping when exploration becomes repetitive.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
exhaustion_threshold: float = 0.15,
|
||||
window_size: int = 3,
|
||||
min_iterations: int = 5,
|
||||
**kwargs
|
||||
):
|
||||
"""
|
||||
Args:
|
||||
exhaustion_threshold: Stop when recent average novelty drops below this
|
||||
window_size: Number of recent iterations to average
|
||||
min_iterations: Minimum iterations before checking exhaustion
|
||||
**kwargs: Passed to parent class
|
||||
"""
|
||||
super().__init__(**kwargs)
|
||||
self.exhaustion_threshold = exhaustion_threshold
|
||||
self.window_size = window_size
|
||||
self.min_iterations = min_iterations
|
||||
|
||||
    async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
        """Override to use exhaustion-based termination."""
        # Reset state
        self.novelty_metrics.reset()

        result = TaskGenerationResult(
            seed_problem=seed_problem,
            start_time=datetime.now(timezone.utc).isoformat(),
            config={
                "strategy": "exhaust_frontier",
                "exhaustion_threshold": self.exhaustion_threshold,
                "window_size": self.window_size,
                "min_iterations": self.min_iterations,
                "max_iterations": self.max_iterations,
                "llm_model": self.llm_model
            }
        )

        used_expert_names = set()
        previous_tasks: List[str] = []
        novelty_history: List[float] = []

        try:
            for iteration in range(self.max_iterations):
                # Sample expert
                expert = self.expert_provider.get_random_expert()
                while expert["name"] in used_expert_names and len(used_expert_names) < 200:
                    expert = self.expert_provider.get_random_expert()
                used_expert_names.add(expert["name"])

                # Generate and evaluate
                task = await self._generate_task(seed_problem, expert, previous_tasks)
                previous_tasks.append(task)
                embedding = await self._get_embedding(task)
                novelty = self.novelty_metrics.compute_novelty(embedding)
                self.novelty_metrics.add_embedding(embedding, novelty)

                novelty_history.append(novelty.score)

                generated_task = GeneratedTask(
                    task=task,
                    expert=expert["name"],
                    expert_domain=expert["domain"],
                    novelty_score=novelty.score,
                    iteration=iteration + 1
                )
                result.trajectory.append(generated_task)

                if self.on_iteration:
                    self.on_iteration(generated_task)

                # Check exhaustion condition.
                # `iteration` is 0-based, so the first check runs once
                # min_iterations + 1 tasks have been generated.
                if iteration >= self.min_iterations:
                    recent_avg = np.mean(novelty_history[-self.window_size:])
                    if recent_avg < self.exhaustion_threshold:
                        result.terminated_by = f"exhaustion (avg={recent_avg:.3f})"
                        result.total_iterations = iteration + 1
                        break

            else:
                # for/else: reached only when the loop ends without `break`
                result.terminated_by = "max_iterations"
                result.total_iterations = self.max_iterations

            # Find all "novel" tasks
            novel_tasks = [t for t in result.trajectory if t.novelty_score > self.exhaustion_threshold]
            if novel_tasks:
                result.breakthrough_task = max(novel_tasks, key=lambda t: t.novelty_score)
                result.breakthrough_task.is_breakthrough = True

        except Exception as e:
            # Log the failure, as the parent class does, instead of swallowing it silently
            logger.error(f"Error during generation: {e}")
            result.terminated_by = f"error: {str(e)}"
            result.total_iterations = len(result.trajectory)

        result.end_time = datetime.now(timezone.utc).isoformat()
        result.novelty_trajectory = self.novelty_metrics.trajectory

        return result


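The exhaustion rule used by `ExhaustFrontierAgent.run` can be exercised in isolation. The following is a minimal sketch rather than the agent's own code: `should_stop` is a hypothetical helper name, and the novelty values are made-up inputs chosen only to show when the windowed mean drops below the threshold.

```python
from statistics import mean
from typing import List


def should_stop(
    novelty_history: List[float],
    iteration: int,
    exhaustion_threshold: float = 0.15,
    window_size: int = 3,
    min_iterations: int = 5,
) -> bool:
    """Return True when the mean of the last `window_size` scores is below the threshold.

    `iteration` is the 0-based loop index, mirroring the check in the agent.
    """
    if iteration < min_iterations:
        return False
    recent = novelty_history[-window_size:]
    return mean(recent) < exhaustion_threshold


# Made-up scores: the first output always scores 1.0, later ones decay.
history = [1.0, 0.42, 0.31, 0.2, 0.16, 0.12, 0.1]
stops = [should_stop(history[: i + 1], i) for i in range(len(history))]
print(stops)
```

With these inputs the check is skipped for loop indices 0 to 4, and the loop would stop at the first index where the mean of the three most recent scores falls under 0.15.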
class CoverageTargetAgent(NoveltyDrivenTaskAgent):
    """
    Alternative strategy: Continue until N distinct clusters are covered.

    This ensures a diverse portfolio of ideas across different conceptual areas.
    """

    def __init__(
        self,
        target_clusters: int = 5,
        cluster_threshold: float = 0.7,
        **kwargs
    ):
        """
        Args:
            target_clusters: Target number of distinct clusters to find
            cluster_threshold: Similarity threshold for cluster membership
            **kwargs: Passed to parent class
        """
        super().__init__(**kwargs)
        self.target_clusters = target_clusters
        self.cluster_threshold = cluster_threshold

    def _count_clusters(self, embeddings: List[np.ndarray]) -> int:
        """Count distinct clusters using greedy clustering."""
        if not embeddings:
            return 0

        # Each cluster is represented by its first member; the representative
        # is not updated as further embeddings join, so the result depends on
        # the order of `embeddings`.
        clusters = []
        for emb in embeddings:
            found_cluster = False
            for cluster_centroid in clusters:
                similarity = NoveltyMetrics.cosine_similarity(emb, cluster_centroid)
                if similarity >= self.cluster_threshold:
                    found_cluster = True
                    break

            if not found_cluster:
                clusters.append(emb)

        return len(clusters)

    async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
        """Override to use coverage-based termination."""
        self.novelty_metrics.reset()

        result = TaskGenerationResult(
            seed_problem=seed_problem,
            start_time=datetime.now(timezone.utc).isoformat(),
            config={
                "strategy": "coverage_target",
                "target_clusters": self.target_clusters,
                "cluster_threshold": self.cluster_threshold,
                "max_iterations": self.max_iterations
            }
        )

        used_expert_names = set()
        previous_tasks: List[str] = []
        all_embeddings: List[np.ndarray] = []

        try:
            for iteration in range(self.max_iterations):
                expert = self.expert_provider.get_random_expert()
                while expert["name"] in used_expert_names and len(used_expert_names) < 200:
                    expert = self.expert_provider.get_random_expert()
                used_expert_names.add(expert["name"])

                task = await self._generate_task(seed_problem, expert, previous_tasks)
                previous_tasks.append(task)
                embedding = await self._get_embedding(task)
                all_embeddings.append(embedding)

                novelty = self.novelty_metrics.compute_novelty(embedding)
                self.novelty_metrics.add_embedding(embedding, novelty)

                generated_task = GeneratedTask(
                    task=task,
                    expert=expert["name"],
                    expert_domain=expert["domain"],
                    novelty_score=novelty.score,
                    iteration=iteration + 1
                )
                result.trajectory.append(generated_task)

                if self.on_iteration:
                    self.on_iteration(generated_task)

                # Check coverage (clusters are recounted from scratch each iteration)
                cluster_count = self._count_clusters(all_embeddings)
                if cluster_count >= self.target_clusters:
                    result.terminated_by = f"coverage ({cluster_count} clusters)"
                    result.total_iterations = iteration + 1
                    break

            else:
                # for/else: reached only when the loop ends without `break`
                final_clusters = self._count_clusters(all_embeddings)
                result.terminated_by = f"max_iterations ({final_clusters} clusters)"
                result.total_iterations = self.max_iterations

            # Find most novel task
            if result.trajectory:
                best_task = max(result.trajectory, key=lambda t: t.novelty_score)
                best_task.is_breakthrough = True
                result.breakthrough_task = best_task

        except Exception as e:
            # Log the failure, as the parent class does, instead of swallowing it silently
            logger.error(f"Error during generation: {e}")
            result.terminated_by = f"error: {str(e)}"
            result.total_iterations = len(result.trajectory)

        result.end_time = datetime.now(timezone.utc).isoformat()
        result.novelty_trajectory = self.novelty_metrics.trajectory

        return result
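The greedy cluster count behind the coverage strategy can be tried on toy vectors. This is a standalone sketch under stated assumptions: `count_clusters` and the module-level `cosine_similarity` are illustrative re-implementations, not imports from the commit, and the 2-D vectors are stand-ins for real embeddings.

```python
from typing import List

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return float(np.dot(a, b) / (norm_a * norm_b))


def count_clusters(embeddings: List[np.ndarray], threshold: float = 0.7) -> int:
    """Greedy pass: an embedding opens a new cluster unless it is
    similar enough to an existing cluster representative."""
    representatives: List[np.ndarray] = []
    for emb in embeddings:
        if not any(cosine_similarity(emb, rep) >= threshold for rep in representatives):
            representatives.append(emb)
    return len(representatives)


# Toy 2-D vectors (illustrative, not real embeddings).
a = np.array([1.0, 0.0])
b = np.array([0.9, 0.1])   # nearly parallel to a
c = np.array([0.0, 1.0])   # orthogonal to a
d = np.array([-1.0, 0.0])  # opposite to a
m = np.array([1.0, 1.0])   # halfway between a and c

n_clusters = count_clusters([a, b, c, d])
order_1 = count_clusters([m, a, c])  # m absorbs both a and c
order_2 = count_clusters([a, c, m])  # a and c are separate; m joins a
print(n_clusters, order_1, order_2)
```

Because a representative is fixed when its cluster is opened, the count depends on arrival order: the same three vectors give different counts in `order_1` and `order_2`. That may matter when comparing coverage results across runs.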
313
experiments/novelty_loop/demo.py
Executable file
@@ -0,0 +1,313 @@
#!/usr/bin/env python3
"""
Novelty-Driven Task Generation Demo

Interactive CLI for exploring the novelty-driven task generation agent.

Examples:
    # Basic usage with default settings
    python demo.py "Improve urban transportation"

    # Custom threshold and iterations
    python demo.py "Design a better bicycle" --threshold 0.35 --max-iter 15

    # Use Chinese language
    python demo.py "改进城市交通" --language zh

    # Use exhaustion strategy (explore until stuck)
    python demo.py "Sustainable energy solutions" --strategy exhaust

    # Use coverage strategy (find N distinct clusters)
    python demo.py "Future of education" --strategy coverage --clusters 5

    # Save results to file
    python demo.py "Smart home innovations" --output results.json

    # Verbose mode with detailed logging
    python demo.py "Healthcare improvements" --verbose
"""

import argparse
import asyncio
import json
import logging
import sys
from datetime import datetime
from pathlib import Path

# Add parent directory to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from experiments.novelty_loop.agent import (
    NoveltyDrivenTaskAgent,
    ExhaustFrontierAgent,
    CoverageTargetAgent,
    GeneratedTask,
    TaskGenerationResult
)

# ANSI color codes for terminal output
class Colors:
    HEADER = '\033[95m'
    BLUE = '\033[94m'
    CYAN = '\033[96m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'


def print_header(text: str):
    """Print a styled header."""
    print(f"\n{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}")
    print(f"{Colors.BOLD}{Colors.HEADER}{text.center(60)}{Colors.END}")
    print(f"{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}\n")


def print_iteration(task: GeneratedTask):
    """Print iteration result with colors."""
    status_color = Colors.GREEN if task.is_breakthrough else Colors.CYAN

    print(f"\n{Colors.BOLD}Iteration {task.iteration}{Colors.END}")
    print(f"  {Colors.YELLOW}Expert:{Colors.END} {task.expert} ({task.expert_domain})")
    print(f"  {Colors.YELLOW}Task:{Colors.END} {task.task}")

    # Clamp the fill so scores above 1.0 (cosine distance can reach 2.0)
    # do not overflow the 20-character bar
    filled = max(0, min(20, int(task.novelty_score * 20)))
    novelty_bar = "█" * filled + "░" * (20 - filled)
    print(f"  {Colors.YELLOW}Novelty:{Colors.END} [{novelty_bar}] {task.novelty_score:.4f}")

    if task.is_breakthrough:
        print(f"  {Colors.GREEN}{Colors.BOLD}★ BREAKTHROUGH! ★{Colors.END}")


def print_result(result: TaskGenerationResult):
    """Print final result summary."""
    print_header("RESULTS")

    print(f"{Colors.BOLD}Seed Problem:{Colors.END} {result.seed_problem}")
    print(f"{Colors.BOLD}Total Iterations:{Colors.END} {result.total_iterations}")
    print(f"{Colors.BOLD}Terminated By:{Colors.END} {result.terminated_by}")

    if result.novelty_trajectory:
        print(f"\n{Colors.BOLD}Novelty Statistics:{Colors.END}")
        print(f"  Mean Novelty: {result.novelty_trajectory.mean_novelty:.4f}")
        print(f"  Max Novelty:  {result.novelty_trajectory.max_novelty:.4f}")
        print(f"  Jump Ratio:   {result.novelty_trajectory.jump_ratio:.2%}")

    if result.breakthrough_task:
        print(f"\n{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
        print(f"{Colors.GREEN}{Colors.BOLD}BREAKTHROUGH TASK{Colors.END}")
        print(f"{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
        print(f"\n{Colors.BOLD}Expert:{Colors.END} {result.breakthrough_task.expert}")
        print(f"{Colors.BOLD}Domain:{Colors.END} {result.breakthrough_task.expert_domain}")
        print(f"{Colors.BOLD}Task:{Colors.END}")
        print(f"  {Colors.CYAN}{result.breakthrough_task.task}{Colors.END}")
        print(f"\n{Colors.BOLD}Novelty Score:{Colors.END} {result.breakthrough_task.novelty_score:.4f}")
        print(f"{Colors.BOLD}Found at Iteration:{Colors.END} {result.breakthrough_task.iteration}")

    # Show trajectory summary
    print(f"\n{Colors.BOLD}Exploration Trajectory:{Colors.END}")
    for task in result.trajectory:
        marker = "★" if task.is_breakthrough else "○"
        # Cap at 10 blocks so scores above 1.0 keep the column aligned
        novelty_indicator = "█" * min(10, int(task.novelty_score * 10))
        print(f"  {marker} [{task.iteration:2d}] {task.expert:20s} | {novelty_indicator:10s} {task.novelty_score:.3f}")


def save_result(result: TaskGenerationResult, output_path: str):
    """Save result to JSON file."""
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(result.to_dict(), f, ensure_ascii=False, indent=2)
    print(f"\n{Colors.GREEN}Results saved to: {output_path}{Colors.END}")


async def run_demo(args):
    """Run the novelty-driven task generation demo."""

    print_header("NOVELTY-DRIVEN TASK GENERATION")

    print(f"{Colors.BOLD}Configuration:{Colors.END}")
    print(f"  Seed Problem: {args.seed_problem}")
    print(f"  Strategy: {args.strategy}")
    print(f"  Novelty Threshold: {args.threshold}")
    print(f"  Max Iterations: {args.max_iter}")
    print(f"  Language: {args.language}")
    print(f"  LLM Model: {args.model}")

    # Create appropriate agent based on strategy
    common_kwargs = {
        "max_iterations": args.max_iter,
        "llm_model": args.model,
        "embedding_model": args.embedding_model,
        "language": args.language,
        "temperature": args.temperature,
        "on_iteration": print_iteration if not args.quiet else None
    }

    if args.strategy == "breakthrough":
        agent = NoveltyDrivenTaskAgent(
            novelty_threshold=args.threshold,
            **common_kwargs
        )
    elif args.strategy == "exhaust":
        agent = ExhaustFrontierAgent(
            exhaustion_threshold=args.exhaust_threshold,
            window_size=args.window_size,
            min_iterations=args.min_iter,
            **common_kwargs
        )
    elif args.strategy == "coverage":
        agent = CoverageTargetAgent(
            target_clusters=args.clusters,
            cluster_threshold=args.cluster_threshold,
            **common_kwargs
        )
    else:
        print(f"{Colors.RED}Unknown strategy: {args.strategy}{Colors.END}")
        return

    print(f"\n{Colors.BOLD}Starting generation loop...{Colors.END}")
    print("-" * 60)

    try:
        result = await agent.run(args.seed_problem)
        print_result(result)

        if args.output:
            save_result(result, args.output)

    except Exception as e:
        print(f"\n{Colors.RED}Error: {e}{Colors.END}")
        if args.verbose:
            import traceback
            traceback.print_exc()
    finally:
        await agent.close()


def main():
    parser = argparse.ArgumentParser(
        description="Novelty-Driven Task Generation Demo",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )

    # Required argument
    parser.add_argument(
        "seed_problem",
        help="The seed problem or challenge to explore"
    )

    # Strategy selection
    parser.add_argument(
        "--strategy", "-s",
        choices=["breakthrough", "exhaust", "coverage"],
        default="breakthrough",
        help="Termination strategy (default: breakthrough)"
    )

    # Common options
    parser.add_argument(
        "--threshold", "-t",
        type=float,
        default=0.4,
        help="Novelty threshold for breakthrough (default: 0.4)"
    )
    parser.add_argument(
        "--max-iter", "-m",
        type=int,
        default=20,
        help="Maximum iterations (default: 20)"
    )
    parser.add_argument(
        "--language", "-l",
        choices=["en", "zh"],
        default="en",
        help="Language for prompts and experts (default: en)"
    )

    # Model options
    parser.add_argument(
        "--model",
        default="qwen3:8b",
        help="LLM model for task generation (default: qwen3:8b)"
    )
    parser.add_argument(
        "--embedding-model",
        default="qwen3-embedding:4b",
        help="Embedding model (default: qwen3-embedding:4b)"
    )
    parser.add_argument(
        "--temperature",
        type=float,
        default=0.7,
        help="LLM temperature (default: 0.7)"
    )

    # Exhaust strategy options
    parser.add_argument(
        "--exhaust-threshold",
        type=float,
        default=0.15,
        help="Exhaustion threshold for 'exhaust' strategy (default: 0.15)"
    )
    parser.add_argument(
        "--window-size",
        type=int,
        default=3,
        help="Window size for exhaustion check (default: 3)"
    )
    parser.add_argument(
        "--min-iter",
        type=int,
        default=5,
        help="Minimum iterations before exhaustion check (default: 5)"
    )

    # Coverage strategy options
    parser.add_argument(
        "--clusters",
        type=int,
        default=5,
        help="Target clusters for 'coverage' strategy (default: 5)"
    )
    parser.add_argument(
        "--cluster-threshold",
        type=float,
        default=0.7,
        help="Cluster similarity threshold (default: 0.7)"
    )

    # Output options
    parser.add_argument(
        "--output", "-o",
        help="Save results to JSON file"
    )
    parser.add_argument(
        "--quiet", "-q",
        action="store_true",
        help="Suppress iteration output"
    )
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose logging"
    )

    args = parser.parse_args()

    # Configure logging
    if args.verbose:
        logging.basicConfig(
            level=logging.DEBUG,
            format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        )
    else:
        logging.basicConfig(level=logging.WARNING)

    # Run the demo
    asyncio.run(run_demo(args))


if __name__ == "__main__":
    main()
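The novelty bar in `print_iteration` maps a score onto a 20-character bar. Since the score is a cosine distance, it can in principle exceed 1.0, so a width-safe rendering needs a clamp. A minimal sketch, with `novelty_bar` as a hypothetical helper name that is not part of the commit:

```python
def novelty_bar(score: float, width: int = 20) -> str:
    """Render a fixed-width bar, clamping the score to [0, 1] first."""
    clamped = max(0.0, min(1.0, score))
    filled = int(clamped * width)
    return "█" * filled + "░" * (width - filled)


# Scores above 1.0 render as a full bar instead of overflowing the width.
for score in (0.0, 0.25, 1.0, 1.4):
    print(f"[{novelty_bar(score)}] {score:.4f}")
```

Every bar produced this way has exactly `width` characters, which keeps the per-iteration output aligned.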
269
experiments/novelty_loop/novelty_metrics.py
Normal file
@@ -0,0 +1,269 @@
"""
Novelty Metrics Module - Compute novelty scores for generated outputs.

This module provides embedding-based novelty metrics adapted from the AUT flexibility
analysis framework for use in novelty-driven agent loops.

Key Metrics:
- Centroid Distance: Measures how far a new output is from the centroid of previous outputs
- Cumulative Novelty: Tracks novelty over the generation sequence
- Jump Detection: Identifies significant semantic shifts between consecutive outputs
"""

from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np


@dataclass
class NoveltyScore:
    """Result of novelty computation for a single output."""
    score: float  # Main novelty score: cosine distance from the centroid (0.0 = same direction; higher = more novel; range [0, 2])
    distance_from_centroid: float
    min_distance_to_existing: float  # Nearest neighbor distance
    is_jump: bool  # Whether this represents a significant semantic jump
    jump_magnitude: Optional[float] = None  # Cosine similarity to previous output (lower = larger jump), if applicable


@dataclass
class NoveltyTrajectory:
    """Tracks novelty scores over a generation sequence."""
    scores: List[float] = field(default_factory=list)
    cumulative_novelty: List[float] = field(default_factory=list)
    jump_positions: List[int] = field(default_factory=list)
    centroid_history: List[np.ndarray] = field(default_factory=list)

    @property
    def mean_novelty(self) -> float:
        """Average novelty across all outputs."""
        return float(np.mean(self.scores)) if self.scores else 0.0

    @property
    def max_novelty(self) -> float:
        """Maximum novelty achieved."""
        return float(max(self.scores)) if self.scores else 0.0

    @property
    def jump_ratio(self) -> float:
        """Proportion of transitions that were jumps."""
        if len(self.scores) < 2:
            return 0.0
        return len(self.jump_positions) / (len(self.scores) - 1)

    @property
    def final_cumulative_novelty(self) -> float:
        """Total accumulated novelty."""
        return self.cumulative_novelty[-1] if self.cumulative_novelty else 0.0


class NoveltyMetrics:
    """
    Computes novelty metrics for embeddings in a streaming fashion.

    Designed for use in an agent loop where outputs are generated one at a time
    and we need to assess novelty incrementally.
    """

    def __init__(
        self,
        similarity_threshold: float = 0.7,
        jump_detection_enabled: bool = True
    ):
        """
        Args:
            similarity_threshold: Threshold for semantic similarity (below = jump)
            jump_detection_enabled: Whether to track semantic jumps
        """
        self.similarity_threshold = similarity_threshold
        self.jump_detection_enabled = jump_detection_enabled

        # State
        self.embeddings: List[np.ndarray] = []
        self.trajectory = NoveltyTrajectory()
        self._centroid: Optional[np.ndarray] = None

    def reset(self):
        """Reset all state for a new generation session."""
        self.embeddings = []
        self.trajectory = NoveltyTrajectory()
        self._centroid = None

    @staticmethod
    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Compute cosine similarity between two vectors."""
        norm_a = np.linalg.norm(a)
        norm_b = np.linalg.norm(b)
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return float(np.dot(a, b) / (norm_a * norm_b))

    @staticmethod
    def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
        """Compute cosine distance (1 - similarity) between two vectors."""
        return 1.0 - NoveltyMetrics.cosine_similarity(a, b)

    def compute_centroid(self) -> Optional[np.ndarray]:
        """Compute centroid of all current embeddings."""
        if not self.embeddings:
            return None
        return np.mean(self.embeddings, axis=0)

    def compute_novelty(self, embedding: np.ndarray) -> NoveltyScore:
        """
        Compute novelty score for a new embedding.

        This does NOT add the embedding to the history - call add_embedding() for that.

        Args:
            embedding: The embedding vector to evaluate

        Returns:
            NoveltyScore with computed metrics
        """
        embedding = np.array(embedding)

        # First output is maximally novel (nothing to compare to)
        if not self.embeddings:
            return NoveltyScore(
                score=1.0,
                distance_from_centroid=1.0,
                min_distance_to_existing=1.0,
                is_jump=False,
                jump_magnitude=None
            )

        # Distance from centroid (primary novelty metric)
        centroid = self.compute_centroid()
        distance_from_centroid = self.cosine_distance(embedding, centroid)

        # Minimum distance to any existing embedding (nearest neighbor)
        min_distance = min(
            self.cosine_distance(embedding, existing)
            for existing in self.embeddings
        )

        # Jump detection (similarity to previous output)
        is_jump = False
        jump_magnitude = None
        if self.jump_detection_enabled and self.embeddings:
            similarity_to_prev = self.cosine_similarity(embedding, self.embeddings[-1])
            jump_magnitude = similarity_to_prev
            is_jump = similarity_to_prev < self.similarity_threshold

        # Primary novelty score is the cosine distance from the centroid.
        # No normalization is applied: cosine distance lies in [0, 2],
        # and higher means more novel.
        novelty_score = distance_from_centroid

        return NoveltyScore(
            score=novelty_score,
            distance_from_centroid=distance_from_centroid,
            min_distance_to_existing=min_distance,
            is_jump=is_jump,
            jump_magnitude=jump_magnitude
        )

    def add_embedding(self, embedding: np.ndarray, novelty: Optional[NoveltyScore] = None):
        """
        Add an embedding to the history and update trajectory.

        Args:
            embedding: The embedding to add
            novelty: Pre-computed novelty score (computed if not provided)
        """
        embedding = np.array(embedding)

        if novelty is None:
            novelty = self.compute_novelty(embedding)

        # Update state
        self.embeddings.append(embedding)
        self._centroid = self.compute_centroid()

        # Update trajectory
        self.trajectory.scores.append(novelty.score)

        # Cumulative novelty
        prev_cumulative = self.trajectory.cumulative_novelty[-1] if self.trajectory.cumulative_novelty else 0.0
        self.trajectory.cumulative_novelty.append(prev_cumulative + novelty.score)

        # Track jumps
        if novelty.is_jump:
            self.trajectory.jump_positions.append(len(self.embeddings) - 1)

        # Store centroid history
        if self._centroid is not None:
            self.trajectory.centroid_history.append(self._centroid.copy())

    def get_current_state(self) -> dict:
        """Get current state as a dictionary for logging/debugging."""
        return {
            "num_embeddings": len(self.embeddings),
            "mean_novelty": self.trajectory.mean_novelty,
            "max_novelty": self.trajectory.max_novelty,
            "jump_ratio": self.trajectory.jump_ratio,
            "cumulative_novelty": self.trajectory.final_cumulative_novelty,
            "recent_scores": self.trajectory.scores[-5:] if self.trajectory.scores else []
        }


def compute_batch_novelty(
    embeddings: List[np.ndarray],
    reference_embeddings: Optional[List[np.ndarray]] = None
) -> List[float]:
    """
    Compute novelty scores for a batch of embeddings.

    Useful for post-hoc analysis of generated outputs.

    Args:
        embeddings: List of embeddings to evaluate
        reference_embeddings: Optional reference set whose centroid is used
            (defaults to the centroid of `embeddings` itself)

    Returns:
        List of novelty scores (cosine distance from the centroid)
    """
    if not embeddings:
        return []

    embeddings_arr = np.array(embeddings)

    if reference_embeddings is not None:
        centroid = np.mean(reference_embeddings, axis=0)
    else:
        centroid = np.mean(embeddings_arr, axis=0)

    scores = []
    for emb in embeddings_arr:
        distance = NoveltyMetrics.cosine_distance(emb, centroid)
        scores.append(distance)

    return scores


def find_most_novel(
    embeddings: List[np.ndarray],
    texts: List[str],
    top_k: int = 5
) -> List[tuple]:
    """
    Find the most novel outputs from a batch.

    Args:
        embeddings: List of embeddings
        texts: Corresponding text outputs
        top_k: Number of top results to return

    Returns:
        List of (text, novelty_score, index) tuples, sorted by novelty descending
    """
    scores = compute_batch_novelty(embeddings)

    indexed_results = [
        (texts[i], scores[i], i)
        for i in range(len(texts))
    ]

    # Sort by novelty score descending
    indexed_results.sort(key=lambda x: x[1], reverse=True)

    return indexed_results[:top_k]
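The streaming centroid-distance score can be traced on toy vectors to see how it behaves. This is a standalone sketch under stated assumptions: `streaming_novelty` and the module-level `cosine_distance` are illustrative re-implementations rather than the module's API, and the 2-D vectors stand in for real embeddings.

```python
from typing import List

import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity; zero-norm inputs count as similarity 0.0."""
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - float(np.dot(a, b) / (norm_a * norm_b))


def streaming_novelty(embeddings: List[np.ndarray]) -> List[float]:
    """Score each embedding against the centroid of the embeddings seen before it."""
    seen: List[np.ndarray] = []
    scores: List[float] = []
    for emb in embeddings:
        if not seen:
            scores.append(1.0)  # first output: nothing to compare against
        else:
            centroid = np.mean(seen, axis=0)
            scores.append(cosine_distance(emb, centroid))
        seen.append(emb)
    return scores


# Toy 2-D vectors (illustrative, not real embeddings).
stream = [
    np.array([1.0, 0.0]),
    np.array([1.0, 0.0]),    # same direction as the centroid so far
    np.array([0.0, 1.0]),    # orthogonal to the centroid so far
    np.array([-1.0, -0.5]),  # opposite to the centroid of the first three
]
scores = streaming_novelty(stream)
print([round(s, 3) for s in scores])
```

The last vector points directly away from the running centroid, so its score comes out close to 2.0. The score is therefore not confined to [0, 1], which is worth keeping in mind when choosing thresholds or rendering the score as a bar.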