feat: Add experiments framework and novelty-driven agent loop

- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation
- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring
- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 10:16:21 +08:00
parent 26a56a2a07
commit 43c025e060
81 changed files with 18766 additions and 2 deletions
--- a/experiments/novelty_loop/README.md
+++ b/experiments/novelty_loop/README.md
@@ -0,0 +1,253 @@
+# Novelty-Driven LLM Agent Loop
+
+An autonomous LLM agent that generates tasks in a loop and uses **novelty assessment as the termination condition**, helping the agent "jump out" of the high-probability regions of its training distribution ("semantic gravity").
+
+## Concept
+
+Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.
+
+This module implements a novel approach: use **novelty scores** to dynamically control when the agent should stop. Instead of fixed iteration counts, the agent continues until it finds something truly novel (a "breakthrough").
+
+```
+Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
+```
+
+## Research Foundation
+
+This work builds on established research:
+
+- **Novelty Search** (Lehman & Stanley): Reward novelty, not objectives
+- **Curiosity-driven Exploration** (Pathak et al.): Intrinsic motivation via prediction error
+- **Quality-Diversity** (MAP-Elites): Maintain diverse high-quality solutions
+- **Open-ended Learning**: Endless innovation through novelty pressure
+
+The unique contribution is using **novelty as a termination condition** rather than just a reward signal.
+
+## Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│              Novelty-Driven Task Generation Loop                 │
+├──────────────────────────────────────────────────────────────────┤
+│                                                                  │
+│   ┌──────────┐                                                   │
+│   │ Seed     │  "Design a better bicycle"                        │
+│   │ Problem  │                                                   │
+│   └────┬─────┘                                                   │
+│        │                                                         │
+│        ▼                                                         │
+│   ┌─────────────────────────────────────────────────────────┐    │
+│   │  WHILE novelty < threshold AND iterations < max:        │    │
+│   │                                                         │    │
+│   │    1. Sample random expert (curated occupations)        │    │
+│   │       e.g., "marine biologist", "choreographer"         │    │
+│   │                                                         │    │
+│   │    2. Generate task from expert perspective             │    │
+│   │       "What task would a {expert} assign to improve     │    │
+│   │        {seed_problem}?"                                 │    │
+│   │                                                         │    │
+│   │    3. Embed task, compute novelty vs. centroid          │    │
+│   │                                                         │    │
+│   │    4. If novelty > threshold → STOP (breakthrough!)     │    │
+│   │                                                         │    │
+│   └─────────────────────────────────────────────────────────┘    │
+│        │                                                         │
+│        ▼                                                         │
+│   ┌──────────┐                                                   │
+│   │ Output:  │  Novel task that "jumped out" of typical space    │
+│   │ Task     │  + trajectory of exploration                      │
+│   └──────────┘                                                   │
+│                                                                  │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+## Installation
+
+The module uses the existing project infrastructure. Ensure you have:
+
+1. **Ollama** running with the required models:
+   ```bash
+   ollama pull qwen3:8b
+   ollama pull qwen3-embedding:4b
+   ```
+
+2. **Python dependencies** (from project root):
+   ```bash
+   cd backend
+   source venv/bin/activate
+   pip install httpx numpy
+   ```
+
+## Quick Start
+
+### Basic Usage
+
+```bash
+cd experiments/novelty_loop
+python demo.py "Improve urban transportation"
+```
+
+### Example Output
+
+```
+Iteration 1
+  Expert: Architect (Architecture & Design)
+  Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
+  Novelty: [████████░░░░░░░░░░░░] 0.1234
+
+Iteration 2
+  Expert: Chef (Culinary)
+  Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
+  Novelty: [███████████░░░░░░░░░] 0.1823
+
+Iteration 3
+  Expert: Marine Biologist (Science)
+  Task: Study fish schooling behavior to develop organic traffic flow algorithms
+  Novelty: [██████████████░░░░░░] 0.3521
+
+Iteration 4
+  Expert: Choreographer (Performing Arts)
+  Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
+  Novelty: [████████████████████] 0.5234
+  ★ BREAKTHROUGH! ★
+```
+
+## Termination Strategies
+
+### 1. Seek Breakthrough (Default)
+
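+Stop as soon as a task's novelty exceeds the threshold, returning the first truly novel task. If the maximum iteration count is reached first, the most novel task seen so far is returned as a fallback.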
+
+```bash
+python demo.py "Your problem" --strategy breakthrough --threshold 0.4
+```
+
+### 2. Exhaust Frontier
+
+Continue while novelty is high, stop when average novelty drops. Explores more thoroughly.
+
+```bash
+python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
+```
+
+### 3. Coverage Target
+
+Continue until N distinct conceptual clusters are covered. Ensures diversity.
+
+```bash
+python demo.py "Your problem" --strategy coverage --clusters 5
+```
+
+## API Usage
+
+```python
+import asyncio
+from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent
+
+async def main():
+    agent = NoveltyDrivenTaskAgent(
+        novelty_threshold=0.4,
+        max_iterations=20,
+        language="en"
+    )
+
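+    try:
+        result = await agent.run("Design a better bicycle")
+    finally:
+        # Always release the HTTP client, even if the run fails
+        await agent.close()
+
+    # breakthrough_task is None when the run stopped before producing any task
+    best = result.breakthrough_task
+    if best is not None:
+        print(f"Terminated by: {result.terminated_by}")
+        print(f"Best task: {best.task}")
+        print(f"Novelty score: {best.novelty_score}")
+        print(f"From expert: {best.expert}")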
+
+asyncio.run(main())
+```
+
+## Novelty Metrics
+
+The `novelty_metrics.py` module provides:
+
+- **Centroid Distance**: Primary novelty metric - how far from the average of all previous outputs
+- **Min Distance**: Distance to nearest neighbor (detect duplicates)
+- **Jump Detection**: Identifies significant semantic shifts between consecutive outputs
+- **Trajectory Tracking**: Cumulative novelty, jump ratio, etc.
+
+```python
+from experiments.novelty_loop.novelty_metrics import NoveltyMetrics
+
+metrics = NoveltyMetrics(similarity_threshold=0.7)
+
+# Add embeddings one by one
+for embedding in embeddings:
+    novelty = metrics.compute_novelty(embedding)
+    metrics.add_embedding(embedding, novelty)
+    print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")
+
+# Get trajectory stats
+print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
+print(f"Max novelty: {metrics.trajectory.max_novelty}")
+print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
+```
+
+## CLI Options
+
+```
+positional arguments:
+  seed_problem          The seed problem or challenge to explore
+
+options:
+  --strategy {breakthrough,exhaust,coverage}
+                        Termination strategy (default: breakthrough)
+  --threshold, -t       Novelty threshold for breakthrough (default: 0.4)
+  --max-iter, -m        Maximum iterations (default: 20)
+  --language, -l {en,zh}
+                        Language for prompts and experts (default: en)
+  --model               LLM model for task generation (default: qwen3:8b)
+  --embedding-model     Embedding model (default: qwen3-embedding:4b)
+  --temperature         LLM temperature (default: 0.7)
+  --output, -o          Save results to JSON file
+  --quiet, -q           Suppress iteration output
+  --verbose, -v         Enable verbose logging
+```
+
+## File Structure
+
+```
+experiments/novelty_loop/
+├── README.md           # This file
+├── agent.py            # Core NoveltyDrivenTaskAgent and variants
+├── novelty_metrics.py  # Novelty computation utilities
+└── demo.py             # Interactive CLI demo
+```
+
+## Design Decisions
+
+| Question | Decision | Rationale |
+|----------|----------|-----------|
+| Output Type | **Tasks** | Self-generated sub-goals for autonomous problem decomposition |
+| Termination | **Seek Breakthrough** | Stop when novelty exceeds threshold - find truly novel task |
+| Perturbation | **Expert Perspectives** | Experts have task-oriented knowledge; more natural than abstract domains |
+| Novelty Reference | **Centroid** | Dynamic, adapts as exploration progresses |
+
+## Connection to Main Project
+
+This module integrates with the main novelty-seeking project:
+
+- Uses the same **curated occupation data** (`backend/app/data/curated_occupations_*.json`)
+- Uses the same **embedding model** (qwen3-embedding:4b)
+- Builds on the **AUT flexibility analysis** metrics for novelty computation
+- Can use **DDC domain data** for alternative perturbation strategies
+
+## Future Work
+
+1. **Hybrid Perturbation**: Combine expert + domain perspectives
+2. **Contrastive Prompting**: Explicitly ask for outputs unlike recent ones
+3. **Semantic Steering**: Guide generation away from centroid direction
+4. **Multi-Agent Exploration**: Parallel agents with different strategies
+5. **Quality-Diversity Archive**: Maintain diverse high-quality solutions
+
+## References
+
+- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone.
+- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction.
+- Mouret, J. B., & Clune, J. (2015). Illuminating search spaces by mapping elites.
+- arXiv:2405.00899 - Characterising Creative Process in Humans and LLMs
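
The README above names centroid distance as the primary novelty metric but this part of the commit does not show `novelty_metrics.py`. The following is a minimal standard-library sketch of how such a score could be computed, not the module's actual implementation. The function names `cosine_similarity` and `centroid_novelty` are hypothetical, and two choices are assumptions: novelty is taken as one minus cosine similarity to the centroid, and the first item (which has no reference point) scores 0.0.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid_novelty(embedding: Sequence[float], history: List[Sequence[float]]) -> float:
    """Hypothetical novelty score: 1 - cosine similarity to the mean of past embeddings.

    Assumption: with no history there is nothing to compare against, so return 0.0.
    """
    if not history:
        return 0.0
    dim = len(embedding)
    # Component-wise mean of every previously seen embedding
    centroid = [sum(vec[i] for vec in history) / len(history) for i in range(dim)]
    return 1.0 - cosine_similarity(embedding, centroid)


# Toy 2-D "embeddings": two similar vectors, then an orthogonal one
history: List[Sequence[float]] = []
threshold = 0.4
for vec in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]):
    score = centroid_novelty(vec, history)
    history.append(vec)
    print(f"novelty={score:.4f} breakthrough={score > threshold}")
```

Because the centroid is recomputed from the full history on every call, the reference point drifts as exploration progresses, which matches the "Dynamic, adapts as exploration progresses" rationale in the Design Decisions table.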
--- a/experiments/novelty_loop/__init__.py
+++ b/experiments/novelty_loop/__init__.py
@@ -0,0 +1,42 @@
+"""
+Novelty-Driven LLM Agent Loop
+
+An autonomous agent that generates tasks using novelty as the termination condition.
+"""
+
+from .agent import (
+    NoveltyDrivenTaskAgent,
+    ExhaustFrontierAgent,
+    CoverageTargetAgent,
+    GeneratedTask,
+    TaskGenerationResult,
+    ExpertProvider,
+    DomainProvider,
+)
+
+from .novelty_metrics import (
+    NoveltyMetrics,
+    NoveltyScore,
+    NoveltyTrajectory,
+    compute_batch_novelty,
+    find_most_novel,
+)
+
+__all__ = [
+    # Agents
+    "NoveltyDrivenTaskAgent",
+    "ExhaustFrontierAgent",
+    "CoverageTargetAgent",
+    # Data classes
+    "GeneratedTask",
+    "TaskGenerationResult",
+    "NoveltyScore",
+    "NoveltyTrajectory",
+    # Providers
+    "ExpertProvider",
+    "DomainProvider",
+    # Metrics
+    "NoveltyMetrics",
+    "compute_batch_novelty",
+    "find_most_novel",
+]
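
The package exports `ExhaustFrontierAgent`, which the README describes as stopping "when average novelty drops". Its constructor in `agent.py` takes `exhaustion_threshold`, `window_size`, and `min_iterations`, but the check itself falls outside the portion of the diff shown here. The sketch below is one plausible reading of that rule, not the committed code. The helper name `frontier_exhausted` is hypothetical, and it assumes the check is skipped until `min_iterations` scores exist.

```python
from typing import List


def frontier_exhausted(
    novelty_history: List[float],
    exhaustion_threshold: float = 0.15,
    window_size: int = 3,
    min_iterations: int = 5,
) -> bool:
    """Return True when the mean of the last `window_size` scores is below the threshold.

    Assumption: no exhaustion check is made before `min_iterations` scores are collected.
    """
    if len(novelty_history) < max(min_iterations, window_size):
        return False
    recent = novelty_history[-window_size:]
    return sum(recent) / len(recent) < exhaustion_threshold


# Illustrative scores only: novelty decays as the agent starts repeating itself
scores = [0.42, 0.35, 0.30, 0.18, 0.12, 0.10]
for i in range(1, len(scores) + 1):
    if frontier_exhausted(scores[:i]):
        print(f"stop after iteration {i}")
        break
```

Averaging over a window rather than reacting to a single low score keeps one unlucky expert sample from ending the exploration early.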
--- a/experiments/novelty_loop/agent.py
+++ b/experiments/novelty_loop/agent.py
@@ -0,0 +1,725 @@
+"""
+Novelty-Driven Task Agent - An autonomous agent that generates tasks using novelty as termination condition.
+
+This agent operates in a while loop, generating tasks from diverse expert perspectives,
+and terminates when it finds a task that exceeds the novelty threshold (a "breakthrough").
+
+The core innovation is using novelty assessment to help the agent "jump out" of the
+high-probability regions of its training distribution (semantic gravity) and find truly novel ideas.
+
+Architecture:
+    Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
+
+Termination Strategy: "Seek Breakthrough"
+    - Continue until novelty > threshold
+    - Find the first truly novel task and stop
+
+Research Foundation:
+    - Novelty Search (Lehman & Stanley): Reward novelty, not objectives
+    - Curiosity-driven Exploration (Pathak et al.): Intrinsic motivation via prediction error
+    - Quality-Diversity (MAP-Elites): Maintain diverse high-quality solutions
+"""
+
+import asyncio
+import json
+import logging
+import random
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Callable, List, Optional
+
+import httpx
+import numpy as np
+
+from .novelty_metrics import NoveltyMetrics, NoveltyScore, NoveltyTrajectory
+
+logger = logging.getLogger(__name__)
+
+
+# ============================================================================
+# Data Classes
+# ============================================================================
+
+@dataclass
+class GeneratedTask:
+    """A single generated task with metadata."""
+    task: str
+    expert: str
+    expert_domain: str
+    novelty_score: float
+    iteration: int
+    is_breakthrough: bool = False
+    embedding: Optional[np.ndarray] = None
+
+
+@dataclass
+class TaskGenerationResult:
+    """Result of a complete novelty-driven task generation session."""
+    seed_problem: str
+    breakthrough_task: Optional[GeneratedTask] = None
+    trajectory: List[GeneratedTask] = field(default_factory=list)
+    total_iterations: int = 0
+    terminated_by: str = "unknown"  # "breakthrough", "max_iterations", or "error: <message>"
+    novelty_trajectory: Optional[NoveltyTrajectory] = None
+    start_time: Optional[str] = None
+    end_time: Optional[str] = None
+    config: dict = field(default_factory=dict)
+
+    def to_dict(self) -> dict:
+        """Convert to dictionary for JSON serialization."""
+        return {
+            "seed_problem": self.seed_problem,
+            "breakthrough_task": {
+                "task": self.breakthrough_task.task,
+                "expert": self.breakthrough_task.expert,
+                "expert_domain": self.breakthrough_task.expert_domain,
+                "novelty_score": self.breakthrough_task.novelty_score,
+                "iteration": self.breakthrough_task.iteration
+            } if self.breakthrough_task else None,
+            "trajectory": [
+                {
+                    "task": t.task,
+                    "expert": t.expert,
+                    "expert_domain": t.expert_domain,
+                    "novelty_score": t.novelty_score,
+                    "iteration": t.iteration,
+                    "is_breakthrough": t.is_breakthrough
+                }
+                for t in self.trajectory
+            ],
+            "total_iterations": self.total_iterations,
+            "terminated_by": self.terminated_by,
+            "novelty_stats": {
+                "mean_novelty": self.novelty_trajectory.mean_novelty if self.novelty_trajectory else 0,
+                "max_novelty": self.novelty_trajectory.max_novelty if self.novelty_trajectory else 0,
+                "jump_ratio": self.novelty_trajectory.jump_ratio if self.novelty_trajectory else 0,
+                "cumulative_novelty": self.novelty_trajectory.final_cumulative_novelty if self.novelty_trajectory else 0
+            },
+            "start_time": self.start_time,
+            "end_time": self.end_time,
+            "config": self.config
+        }
+
+
+# ============================================================================
+# Expert/Domain Providers
+# ============================================================================
+
+class ExpertProvider:
+    """Provides random experts from curated occupation lists."""
+
+    def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
+        """
+        Args:
+            data_dir: Path to data directory containing occupation JSON files
+            language: Language code ("en" or "zh")
+        """
+        if data_dir is None:
+            # Default to backend data directory
+            data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"
+
+        self.data_dir = data_dir
+        self.language = language
+        self._occupations: List[dict] = []
+        self._load_occupations()
+
+    def _load_occupations(self):
+        """Load occupations from JSON file."""
+        file_path = self.data_dir / f"curated_occupations_{self.language}.json"
+
+        if not file_path.exists():
+            logger.warning(f"Occupation file not found: {file_path}")
+            # Fallback to some default experts
+            self._occupations = [
+                {"name": "Marine Biologist", "domain": "Science"},
+                {"name": "Choreographer", "domain": "Arts"},
+                {"name": "Urban Planner", "domain": "Architecture"},
+                {"name": "Chef", "domain": "Culinary"},
+                {"name": "Astronomer", "domain": "Science"},
+            ]
+            return
+
+        try:
+            with open(file_path, "r", encoding="utf-8") as f:
+                data = json.load(f)
+            self._occupations = data.get("occupations", [])
+            logger.info(f"Loaded {len(self._occupations)} occupations from {file_path.name}")
+        except Exception as e:
+            logger.error(f"Error loading occupations: {e}")
+            self._occupations = []
+
+    def get_random_expert(self) -> dict:
+        """Get a random expert with name and domain."""
+        if not self._occupations:
+            return {"name": "Expert", "domain": "General"}
+        return random.choice(self._occupations)
+
+    def get_random_experts(self, count: int) -> List[dict]:
+        """Get multiple random experts without replacement."""
+        if len(self._occupations) <= count:
+            return self._occupations.copy()
+        return random.sample(self._occupations, count)
+
+
+class DomainProvider:
+    """Provides random knowledge domains from DDC classification."""
+
+    def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
+        if data_dir is None:
+            data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"
+
+        self.data_dir = data_dir
+        self.language = language
+        self._domains: List[dict] = []
+        self._load_domains()
+
+    def _load_domains(self):
+        """Load domains from JSON file."""
+        file_path = self.data_dir / f"ddc_domains_{self.language}.json"
+
+        if not file_path.exists():
+            logger.warning(f"Domain file not found: {file_path}")
+            self._domains = []
+            return
+
+        try:
+            with open(file_path, "r", encoding="utf-8") as f:
+                data = json.load(f)
+            self._domains = data.get("domains", [])
+            logger.info(f"Loaded {len(self._domains)} domains from {file_path.name}")
+        except Exception as e:
+            logger.error(f"Error loading domains: {e}")
+
+    def get_random_domain(self, level: Optional[str] = None) -> dict:
+        """Get a random domain, optionally filtered by level."""
+        domains = self._domains
+        if level:
+            domains = [d for d in domains if d.get("level") == level]
+
+        if not domains:
+            return {"name": "General Knowledge", "code": "000"}
+        return random.choice(domains)
+
+
+# ============================================================================
+# Novelty-Driven Task Agent
+# ============================================================================
+
+class NoveltyDrivenTaskAgent:
+    """
+    An autonomous agent that generates tasks using novelty as the termination condition.
+
+    The agent operates in a loop:
+    1. Sample a random expert perspective
+    2. Generate a task from that expert's viewpoint
+    3. Compute the task's novelty (distance from centroid of previous tasks)
+    4. If novelty > threshold → STOP (found breakthrough!)
+    5. Otherwise → Continue with next expert
+
+    Example:
+        agent = NoveltyDrivenTaskAgent(novelty_threshold=0.4)
+        result = await agent.run("Improve urban transportation")
+
+        # result.breakthrough_task contains the novel task found
+        # result.trajectory shows the exploration path
+    """
+
+    def __init__(
+        self,
+        novelty_threshold: float = 0.4,
+        max_iterations: int = 20,
+        ollama_base_url: str = "http://localhost:11435",
+        llm_model: str = "qwen3:8b",
+        embedding_model: str = "qwen3-embedding:4b",
+        language: str = "en",
+        data_dir: Optional[Path] = None,
+        on_iteration: Optional[Callable[[GeneratedTask], None]] = None,
+        temperature: float = 0.7
+    ):
+        """
+        Args:
+            novelty_threshold: Novelty score threshold for breakthrough (0.0-1.0)
+            max_iterations: Maximum iterations before stopping
+            ollama_base_url: Ollama API endpoint
+            llm_model: Model for task generation
+            embedding_model: Model for embeddings
+            language: Language for prompts and experts ("en" or "zh")
+            data_dir: Path to data directory for expert/domain files
+            on_iteration: Callback function called after each iteration
+            temperature: LLM temperature for generation
+        """
+        self.novelty_threshold = novelty_threshold
+        self.max_iterations = max_iterations
+        self.ollama_base_url = ollama_base_url
+        self.llm_model = llm_model
+        self.embedding_model = embedding_model
+        self.language = language
+        self.temperature = temperature
+        self.on_iteration = on_iteration
+
+        # Initialize providers
+        self.expert_provider = ExpertProvider(data_dir, language)
+        self.domain_provider = DomainProvider(data_dir, language)
+
+        # Initialize novelty metrics
+        self.novelty_metrics = NoveltyMetrics(
+            similarity_threshold=0.7,
+            jump_detection_enabled=True
+        )
+
+        # HTTP client
+        self._client: Optional[httpx.AsyncClient] = None
+
+    async def _get_client(self) -> httpx.AsyncClient:
+        """Get or create HTTP client."""
+        if self._client is None:
+            self._client = httpx.AsyncClient(timeout=120.0)
+        return self._client
+
+    async def close(self):
+        """Close HTTP client."""
+        if self._client is not None:
+            await self._client.aclose()
+            self._client = None
+
+    async def _generate_text(self, prompt: str) -> str:
+        """Generate text using Ollama LLM."""
+        client = await self._get_client()
+        url = f"{self.ollama_base_url}/api/generate"
+
+        # Add /no_think prefix for qwen models to disable thinking
+        if self.llm_model.lower().startswith("qwen"):
+            prompt = f"/no_think\n{prompt}"
+
+        try:
+            response = await client.post(url, json={
+                "model": self.llm_model,
+                "prompt": prompt,
+                "stream": False,
+                "options": {
+                    "temperature": self.temperature
+                }
+            })
+            response.raise_for_status()
+            result = response.json()
+            return result.get("response", "").strip()
+        except Exception as e:
+            logger.error(f"LLM generation error: {e}")
+            raise
+
+    async def _get_embedding(self, text: str) -> np.ndarray:
+        """Get embedding vector for text."""
+        client = await self._get_client()
+        url = f"{self.ollama_base_url}/api/embed"
+
+        try:
+            response = await client.post(url, json={
+                "model": self.embedding_model,
+                "input": text
+            })
+            response.raise_for_status()
+            result = response.json()
+            return np.array(result["embeddings"][0])
+        except Exception as e:
+            logger.error(f"Embedding error: {e}")
+            raise
+
+    def _build_task_prompt(
+        self,
+        seed_problem: str,
+        expert: dict,
+        previous_tasks: List[str]
+    ) -> str:
+        """Build the prompt for task generation."""
+        expert_name = expert.get("name", "Expert")
+        expert_domain = expert.get("domain", "General")
+
+        # Build context from previous tasks (if any)
+        context = ""
+        if previous_tasks:
+            recent = previous_tasks[-3:]  # Last 3 tasks
+            context = "\n\nPrevious suggestions (generate something DIFFERENT):\n"
+            for t in recent:
+                context += f"- {t}\n"
+
+        if self.language == "zh":
+            prompt = f"""你是一位 {expert_name}（{expert_domain}）。
+
+给定问题：{seed_problem}
+
+请从你的专业角度出发，提出一个独特的改进任务或探索方向。
+这个任务应该结合你的专业知识，提供一个非传统但有价值的视角。
+{context}
+请直接给出任务描述，不要添加解释。任务应该具体、可行、且与众不同。
+
+任务："""
+        else:
+            prompt = f"""You are a {expert_name} ({expert_domain}).
+
+Given problem: {seed_problem}
+
+From your professional perspective, propose a unique task or exploration direction to improve or innovate on this problem.
+The task should leverage your domain expertise to provide an unconventional but valuable angle.
+{context}
+Provide just the task description without explanation. The task should be specific, actionable, and distinctive.
+
+Task:"""
+
+        return prompt
+
+    async def _generate_task(
+        self,
+        seed_problem: str,
+        expert: dict,
+        previous_tasks: List[str]
+    ) -> str:
+        """Generate a task from an expert's perspective."""
+        prompt = self._build_task_prompt(seed_problem, expert, previous_tasks)
+        task = await self._generate_text(prompt)
+
+        # Clean up the response
+        task = task.strip()
+        # Remove common prefixes
+        for prefix in ["Task:", "任务:", "Here's", "I suggest", "Based on"]:
+            if task.lower().startswith(prefix.lower()):
+                task = task[len(prefix):].strip()
+
+        return task
+
+    async def run(
+        self,
+        seed_problem: str,
+        used_experts: Optional[List[dict]] = None
+    ) -> TaskGenerationResult:
+        """
+        Run the novelty-driven task generation loop.
+
+        Args:
+            seed_problem: The initial problem/challenge to explore
+            used_experts: Optional list of experts to avoid (for multi-run scenarios)
+
+        Returns:
+            TaskGenerationResult with breakthrough task (if found) and full trajectory
+        """
+        # Reset state
+        self.novelty_metrics.reset()
+
+        result = TaskGenerationResult(
+            seed_problem=seed_problem,
+            start_time=datetime.now(timezone.utc).isoformat(),
+            config={
+                "novelty_threshold": self.novelty_threshold,
+                "max_iterations": self.max_iterations,
+                "llm_model": self.llm_model,
+                "embedding_model": self.embedding_model,
+                "language": self.language
+            }
+        )
+
+        used_expert_names = set()
+        if used_experts:
+            used_expert_names = {e["name"] for e in used_experts}
+
+        previous_tasks: List[str] = []
+
+        logger.info(f"Starting novelty loop: '{seed_problem}' (threshold={self.novelty_threshold})")
+
+        try:
+            for iteration in range(self.max_iterations):
+                # 1. Sample a random expert (avoid duplicates)
+                attempts = 0
+                expert = self.expert_provider.get_random_expert()
+                while expert["name"] in used_expert_names and attempts < 10:
+                    expert = self.expert_provider.get_random_expert()
+                    attempts += 1
+                used_expert_names.add(expert["name"])
+
+                logger.info(f"Iteration {iteration + 1}: Expert = {expert['name']} ({expert['domain']})")
+
+                # 2. Generate task
+                task = await self._generate_task(seed_problem, expert, previous_tasks)
+                previous_tasks.append(task)
+
+                # 3. Get embedding
+                embedding = await self._get_embedding(task)
+
+                # 4. Compute novelty
+                novelty = self.novelty_metrics.compute_novelty(embedding)
+                self.novelty_metrics.add_embedding(embedding, novelty)
+
+                # 5. Create task record
+                generated_task = GeneratedTask(
+                    task=task,
+                    expert=expert["name"],
+                    expert_domain=expert["domain"],
+                    novelty_score=novelty.score,
+                    iteration=iteration + 1,
+                    is_breakthrough=novelty.score > self.novelty_threshold,
+                    embedding=embedding
+                )
+                result.trajectory.append(generated_task)
+
+                logger.info(f"  Task: {task[:80]}...")
+                logger.info(f"  Novelty: {novelty.score:.4f} (threshold: {self.novelty_threshold})")
+
+                # Callback
+                if self.on_iteration:
+                    self.on_iteration(generated_task)
+
+                # 6. Check for breakthrough
+                if novelty.score > self.novelty_threshold:
+                    result.breakthrough_task = generated_task
+                    result.terminated_by = "breakthrough"
+                    result.total_iterations = iteration + 1
+                    logger.info(f"  BREAKTHROUGH! Stopping after {iteration + 1} iterations")
+                    break
+
+            else:
+                # Max iterations reached without breakthrough
+                result.terminated_by = "max_iterations"
+                result.total_iterations = self.max_iterations
+                logger.info(f"Max iterations ({self.max_iterations}) reached without breakthrough")
+
+                # Find the most novel task as a fallback
+                if result.trajectory:
+                    best_task = max(result.trajectory, key=lambda t: t.novelty_score)
+                    best_task.is_breakthrough = True  # Mark as best found
+                    result.breakthrough_task = best_task
+
+        except Exception as e:
+            logger.error(f"Error during generation: {e}")
+            result.terminated_by = f"error: {str(e)}"
+            result.total_iterations = len(result.trajectory)
+
+        # Finalize
+        result.end_time = datetime.now(timezone.utc).isoformat()
+        result.novelty_trajectory = self.novelty_metrics.trajectory
+
+        return result
+
+
+# ============================================================================
+# Alternative Termination Strategies
+# ============================================================================
+
+class ExhaustFrontierAgent(NoveltyDrivenTaskAgent):
+    """
+    Alternative strategy: Continue while novelty is high, stop when it drops.
+
+    This explores the "novelty frontier" more thoroughly, finding multiple novel
+    ideas before stopping when exploration becomes repetitive.
+    """
+
+    def __init__(
+        self,
+        exhaustion_threshold: float = 0.15,
+        window_size: int = 3,
+        min_iterations: int = 5,
+        **kwargs
+    ):
+        """
+        Args:
+            exhaustion_threshold: Stop when recent average novelty drops below this
+            window_size: Number of recent iterations to average
+            min_iterations: Minimum iterations before checking exhaustion
+            **kwargs: Passed to parent class
+        """
+        super().__init__(**kwargs)
+        self.exhaustion_threshold = exhaustion_threshold
+        self.window_size = window_size
+        self.min_iterations = min_iterations
+
+    async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
+        """Override to use exhaustion-based termination."""
+        # Reset state
+        self.novelty_metrics.reset()
+
+        result = TaskGenerationResult(
+            seed_problem=seed_problem,
+            start_time=datetime.now(timezone.utc).isoformat(),
+            config={
+                "strategy": "exhaust_frontier",
+                "exhaustion_threshold": self.exhaustion_threshold,
+                "window_size": self.window_size,
+                "min_iterations": self.min_iterations,
+                "max_iterations": self.max_iterations,
+                "llm_model": self.llm_model
+            }
+        )
+
+        used_expert_names = set()
+        previous_tasks: List[str] = []
+        novelty_history: List[float] = []
+
+        try:
+            for iteration in range(self.max_iterations):
+                # Sample an expert not used yet (retries stop once 200 names have been used)
+                expert = self.expert_provider.get_random_expert()
+                while expert["name"] in used_expert_names and len(used_expert_names) < 200:
+                    expert = self.expert_provider.get_random_expert()
+                used_expert_names.add(expert["name"])
+
+                # Generate and evaluate
+                task = await self._generate_task(seed_problem, expert, previous_tasks)
+                previous_tasks.append(task)
+                embedding = await self._get_embedding(task)
+                novelty = self.novelty_metrics.compute_novelty(embedding)
+                self.novelty_metrics.add_embedding(embedding, novelty)
+
+                novelty_history.append(novelty.score)
+
+                generated_task = GeneratedTask(
+                    task=task,
+                    expert=expert["name"],
+                    expert_domain=expert["domain"],
+                    novelty_score=novelty.score,
+                    iteration=iteration + 1
+                )
+                result.trajectory.append(generated_task)
+
+                if self.on_iteration:
+                    self.on_iteration(generated_task)
+
+                # Check exhaustion; iteration is 0-based, so the first check runs on pass min_iterations + 1
+                if iteration >= self.min_iterations:
+                    recent_avg = np.mean(novelty_history[-self.window_size:])
+                    if recent_avg < self.exhaustion_threshold:
+                        result.terminated_by = f"exhaustion (avg={recent_avg:.3f})"
+                        result.total_iterations = iteration + 1
+                        break
+
+            else:
+                result.terminated_by = "max_iterations"
+                result.total_iterations = self.max_iterations
+
+            # Report the most novel task among those above the exhaustion threshold
+            novel_tasks = [t for t in result.trajectory if t.novelty_score > self.exhaustion_threshold]
+            if novel_tasks:
+                result.breakthrough_task = max(novel_tasks, key=lambda t: t.novelty_score)
+                result.breakthrough_task.is_breakthrough = True
+
+        except Exception as e:
+            result.terminated_by = f"error: {str(e)}"
+            result.total_iterations = len(result.trajectory)
+
+        result.end_time = datetime.now(timezone.utc).isoformat()
+        result.novelty_trajectory = self.novelty_metrics.trajectory
+
+        return result
+
+
+class CoverageTargetAgent(NoveltyDrivenTaskAgent):
+    """
+    Alternative strategy: Continue until N distinct clusters are covered.
+
+    This ensures a diverse portfolio of ideas across different conceptual areas.
+    """
+
+    def __init__(
+        self,
+        target_clusters: int = 5,
+        cluster_threshold: float = 0.7,
+        **kwargs
+    ):
+        """
+        Args:
+            target_clusters: Target number of distinct clusters to find
+            cluster_threshold: Similarity threshold for cluster membership
+            **kwargs: Passed to parent class
+        """
+        super().__init__(**kwargs)
+        self.target_clusters = target_clusters
+        self.cluster_threshold = cluster_threshold
+
+    def _count_clusters(self, embeddings: List[np.ndarray]) -> int:
+        """Count clusters greedily: an embedding joins the first cluster whose seed it is similar to."""
+        if not embeddings:
+            return 0
+
+        clusters = []
+        for emb in embeddings:
+            found_cluster = False
+            for cluster_centroid in clusters:
+                similarity = NoveltyMetrics.cosine_similarity(emb, cluster_centroid)
+                if similarity >= self.cluster_threshold:
+                    found_cluster = True
+                    break
+
+            if not found_cluster:
+                clusters.append(emb)
+
+        return len(clusters)
+
+    async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
+        """Override to use coverage-based termination."""
+        self.novelty_metrics.reset()
+
+        result = TaskGenerationResult(
+            seed_problem=seed_problem,
+            start_time=datetime.now(timezone.utc).isoformat(),
+            config={
+                "strategy": "coverage_target",
+                "target_clusters": self.target_clusters,
+                "cluster_threshold": self.cluster_threshold,
+                "max_iterations": self.max_iterations
+            }
+        )
+
+        used_expert_names = set()
+        previous_tasks: List[str] = []
+        all_embeddings: List[np.ndarray] = []
+
+        try:
+            for iteration in range(self.max_iterations):
+                expert = self.expert_provider.get_random_expert()
+                while expert["name"] in used_expert_names and len(used_expert_names) < 200:
+                    expert = self.expert_provider.get_random_expert()
+                used_expert_names.add(expert["name"])
+
+                task = await self._generate_task(seed_problem, expert, previous_tasks)
+                previous_tasks.append(task)
+                embedding = await self._get_embedding(task)
+                all_embeddings.append(embedding)
+
+                novelty = self.novelty_metrics.compute_novelty(embedding)
+                self.novelty_metrics.add_embedding(embedding, novelty)
+
+                generated_task = GeneratedTask(
+                    task=task,
+                    expert=expert["name"],
+                    expert_domain=expert["domain"],
+                    novelty_score=novelty.score,
+                    iteration=iteration + 1
+                )
+                result.trajectory.append(generated_task)
+
+                if self.on_iteration:
+                    self.on_iteration(generated_task)
+
+                # Check coverage
+                cluster_count = self._count_clusters(all_embeddings)
+                if cluster_count >= self.target_clusters:
+                    result.terminated_by = f"coverage ({cluster_count} clusters)"
+                    result.total_iterations = iteration + 1
+                    break
+
+            else:
+                final_clusters = self._count_clusters(all_embeddings)
+                result.terminated_by = f"max_iterations ({final_clusters} clusters)"
+                result.total_iterations = self.max_iterations
+
+            # Find most novel task
+            if result.trajectory:
+                best_task = max(result.trajectory, key=lambda t: t.novelty_score)
+                best_task.is_breakthrough = True
+                result.breakthrough_task = best_task
+
+        except Exception as e:
+            result.terminated_by = f"error: {str(e)}"
+            result.total_iterations = len(result.trajectory)
+
+        result.end_time = datetime.now(timezone.utc).isoformat()
+        result.novelty_trajectory = self.novelty_metrics.trajectory
+
+        return result
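Review note: the two alternative stopping rules above can be exercised in isolation from the LLM and embedding calls. The block below is a minimal sketch in pure Python, not the committed implementation; `count_greedy_clusters` and `is_exhausted` are hypothetical helper names, and plain lists stand in for the `np.ndarray` embeddings used in the agent.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def count_greedy_clusters(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.7
) -> int:
    """Greedy pass: an embedding that matches no existing seed starts a new cluster."""
    seeds: List[Sequence[float]] = []
    for emb in embeddings:
        if not any(cosine_similarity(emb, seed) >= threshold for seed in seeds):
            seeds.append(emb)  # the seed is the first member, not a running mean
    return len(seeds)


def is_exhausted(
    novelty_history: Sequence[float], window_size: int = 3, threshold: float = 0.15
) -> bool:
    """True when the mean of the last `window_size` scores falls below `threshold`."""
    recent = list(novelty_history[-window_size:])
    return bool(recent) and sum(recent) / len(recent) < threshold


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(count_greedy_clusters(vectors))       # two near-parallel vectors share a cluster
    print(is_exhausted([0.5, 0.1, 0.1, 0.1]))   # recent window averages about 0.1
```

Because each cluster is represented by its first member, the count depends on the order in which embeddings arrive; a different ordering of the same tasks can yield a different number of clusters.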
--- a/experiments/novelty_loop/demo.py
+++ b/experiments/novelty_loop/demo.py
@@ -0,0 +1,313 @@
+#!/usr/bin/env python3
+"""
+Novelty-Driven Task Generation Demo
+
+Interactive CLI for exploring the novelty-driven task generation agent.
+
+Examples:
+    # Basic usage with default settings
+    python demo.py "Improve urban transportation"
+
+    # Custom threshold and iterations
+    python demo.py "Design a better bicycle" --threshold 0.35 --max-iter 15
+
+    # Use Chinese language
+    python demo.py "改进城市交通" --language zh
+
+    # Use exhaustion strategy (explore until stuck)
+    python demo.py "Sustainable energy solutions" --strategy exhaust
+
+    # Use coverage strategy (find N distinct clusters)
+    python demo.py "Future of education" --strategy coverage --clusters 5
+
+    # Save results to file
+    python demo.py "Smart home innovations" --output results.json
+
+    # Verbose mode with detailed logging
+    python demo.py "Healthcare improvements" --verbose
+"""
+
+import argparse
+import asyncio
+import json
+import logging
+import sys
+from datetime import datetime
+from pathlib import Path
+
+# Add parent directory to path for imports
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+
+from experiments.novelty_loop.agent import (
+    NoveltyDrivenTaskAgent,
+    ExhaustFrontierAgent,
+    CoverageTargetAgent,
+    GeneratedTask,
+    TaskGenerationResult
+)
+
+# ANSI color codes for terminal output
+class Colors:
+    HEADER = '\033[95m'
+    BLUE = '\033[94m'
+    CYAN = '\033[96m'
+    GREEN = '\033[92m'
+    YELLOW = '\033[93m'
+    RED = '\033[91m'
+    BOLD = '\033[1m'
+    UNDERLINE = '\033[4m'
+    END = '\033[0m'
+
+
+def print_header(text: str):
+    """Print a styled header."""
+    print(f"\n{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}")
+    print(f"{Colors.BOLD}{Colors.HEADER}{text.center(60)}{Colors.END}")
+    print(f"{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}\n")
+
+
+def print_iteration(task: GeneratedTask):
+    """Print iteration result with colors."""
+    print(f"\n{Colors.BOLD}Iteration {task.iteration}{Colors.END}")
+    print(f"  {Colors.YELLOW}Expert:{Colors.END} {task.expert} ({task.expert_domain})")
+    print(f"  {Colors.YELLOW}Task:{Colors.END} {task.task}")
+
+    # Clamp the bar to 20 cells: cosine distance can exceed 1.0
+    filled = max(0, min(20, int(task.novelty_score * 20)))
+    novelty_bar = "█" * filled + "░" * (20 - filled)
+    print(f"  {Colors.YELLOW}Novelty:{Colors.END} [{novelty_bar}] {task.novelty_score:.4f}")
+
+    if task.is_breakthrough:
+        print(f"  {Colors.GREEN}{Colors.BOLD}★ BREAKTHROUGH! ★{Colors.END}")
+
+
+def print_result(result: TaskGenerationResult):
+    """Print final result summary."""
+    print_header("RESULTS")
+
+    print(f"{Colors.BOLD}Seed Problem:{Colors.END} {result.seed_problem}")
+    print(f"{Colors.BOLD}Total Iterations:{Colors.END} {result.total_iterations}")
+    print(f"{Colors.BOLD}Terminated By:{Colors.END} {result.terminated_by}")
+
+    if result.novelty_trajectory:
+        print(f"\n{Colors.BOLD}Novelty Statistics:{Colors.END}")
+        print(f"  Mean Novelty: {result.novelty_trajectory.mean_novelty:.4f}")
+        print(f"  Max Novelty:  {result.novelty_trajectory.max_novelty:.4f}")
+        print(f"  Jump Ratio:   {result.novelty_trajectory.jump_ratio:.2%}")
+
+    if result.breakthrough_task:
+        print(f"\n{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
+        print(f"{Colors.GREEN}{Colors.BOLD}BREAKTHROUGH TASK{Colors.END}")
+        print(f"{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
+        print(f"\n{Colors.BOLD}Expert:{Colors.END} {result.breakthrough_task.expert}")
+        print(f"{Colors.BOLD}Domain:{Colors.END} {result.breakthrough_task.expert_domain}")
+        print(f"{Colors.BOLD}Task:{Colors.END}")
+        print(f"  {Colors.CYAN}{result.breakthrough_task.task}{Colors.END}")
+        print(f"\n{Colors.BOLD}Novelty Score:{Colors.END} {result.breakthrough_task.novelty_score:.4f}")
+        print(f"{Colors.BOLD}Found at Iteration:{Colors.END} {result.breakthrough_task.iteration}")
+
+    # Show trajectory summary
+    print(f"\n{Colors.BOLD}Exploration Trajectory:{Colors.END}")
+    for task in result.trajectory:
+        marker = "★" if task.is_breakthrough else "○"
+        novelty_indicator = "█" * max(0, min(10, int(task.novelty_score * 10)))
+        print(f"  {marker} [{task.iteration:2d}] {task.expert:20s} | {novelty_indicator:10s} {task.novelty_score:.3f}")
+
+
+def save_result(result: TaskGenerationResult, output_path: str):
+    """Save result to JSON file."""
+    with open(output_path, "w", encoding="utf-8") as f:
+        json.dump(result.to_dict(), f, ensure_ascii=False, indent=2)
+    print(f"\n{Colors.GREEN}Results saved to: {output_path}{Colors.END}")
+
+
+async def run_demo(args):
+    """Run the novelty-driven task generation demo."""
+
+    print_header("NOVELTY-DRIVEN TASK GENERATION")
+
+    print(f"{Colors.BOLD}Configuration:{Colors.END}")
+    print(f"  Seed Problem: {args.seed_problem}")
+    print(f"  Strategy: {args.strategy}")
+    print(f"  Novelty Threshold: {args.threshold}")
+    print(f"  Max Iterations: {args.max_iter}")
+    print(f"  Language: {args.language}")
+    print(f"  LLM Model: {args.model}")
+
+    # Create appropriate agent based on strategy
+    common_kwargs = {
+        "max_iterations": args.max_iter,
+        "llm_model": args.model,
+        "embedding_model": args.embedding_model,
+        "language": args.language,
+        "temperature": args.temperature,
+        "on_iteration": print_iteration if not args.quiet else None
+    }
+
+    if args.strategy == "breakthrough":
+        agent = NoveltyDrivenTaskAgent(
+            novelty_threshold=args.threshold,
+            **common_kwargs
+        )
+    elif args.strategy == "exhaust":
+        agent = ExhaustFrontierAgent(
+            exhaustion_threshold=args.exhaust_threshold,
+            window_size=args.window_size,
+            min_iterations=args.min_iter,
+            **common_kwargs
+        )
+    elif args.strategy == "coverage":
+        agent = CoverageTargetAgent(
+            target_clusters=args.clusters,
+            cluster_threshold=args.cluster_threshold,
+            **common_kwargs
+        )
+    else:
+        print(f"{Colors.RED}Unknown strategy: {args.strategy}{Colors.END}")
+        return
+
+    print(f"\n{Colors.BOLD}Starting generation loop...{Colors.END}")
+    print("-" * 60)
+
+    try:
+        result = await agent.run(args.seed_problem)
+        print_result(result)
+
+        if args.output:
+            save_result(result, args.output)
+
+    except Exception as e:
+        print(f"\n{Colors.RED}Error: {e}{Colors.END}")
+        if args.verbose:
+            import traceback
+            traceback.print_exc()
+    finally:
+        await agent.close()
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Novelty-Driven Task Generation Demo",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog=__doc__
+    )
+
+    # Required argument
+    parser.add_argument(
+        "seed_problem",
+        help="The seed problem or challenge to explore"
+    )
+
+    # Strategy selection
+    parser.add_argument(
+        "--strategy", "-s",
+        choices=["breakthrough", "exhaust", "coverage"],
+        default="breakthrough",
+        help="Termination strategy (default: breakthrough)"
+    )
+
+    # Common options
+    parser.add_argument(
+        "--threshold", "-t",
+        type=float,
+        default=0.4,
+        help="Novelty threshold for breakthrough (default: 0.4)"
+    )
+    parser.add_argument(
+        "--max-iter", "-m",
+        type=int,
+        default=20,
+        help="Maximum iterations (default: 20)"
+    )
+    parser.add_argument(
+        "--language", "-l",
+        choices=["en", "zh"],
+        default="en",
+        help="Language for prompts and experts (default: en)"
+    )
+
+    # Model options
+    parser.add_argument(
+        "--model",
+        default="qwen3:8b",
+        help="LLM model for task generation (default: qwen3:8b)"
+    )
+    parser.add_argument(
+        "--embedding-model",
+        default="qwen3-embedding:4b",
+        help="Embedding model (default: qwen3-embedding:4b)"
+    )
+    parser.add_argument(
+        "--temperature",
+        type=float,
+        default=0.7,
+        help="LLM temperature (default: 0.7)"
+    )
+
+    # Exhaust strategy options
+    parser.add_argument(
+        "--exhaust-threshold",
+        type=float,
+        default=0.15,
+        help="Exhaustion threshold for 'exhaust' strategy (default: 0.15)"
+    )
+    parser.add_argument(
+        "--window-size",
+        type=int,
+        default=3,
+        help="Window size for exhaustion check (default: 3)"
+    )
+    parser.add_argument(
+        "--min-iter",
+        type=int,
+        default=5,
+        help="Minimum iterations before exhaustion check (default: 5)"
+    )
+
+    # Coverage strategy options
+    parser.add_argument(
+        "--clusters",
+        type=int,
+        default=5,
+        help="Target clusters for 'coverage' strategy (default: 5)"
+    )
+    parser.add_argument(
+        "--cluster-threshold",
+        type=float,
+        default=0.7,
+        help="Cluster similarity threshold (default: 0.7)"
+    )
+
+    # Output options
+    parser.add_argument(
+        "--output", "-o",
+        help="Save results to JSON file"
+    )
+    parser.add_argument(
+        "--quiet", "-q",
+        action="store_true",
+        help="Suppress iteration output"
+    )
+    parser.add_argument(
+        "--verbose", "-v",
+        action="store_true",
+        help="Enable verbose logging"
+    )
+
+    args = parser.parse_args()
+
+    # Configure logging
+    if args.verbose:
+        logging.basicConfig(
+            level=logging.DEBUG,
+            format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+        )
+    else:
+        logging.basicConfig(level=logging.WARNING)
+
+    # Run the demo
+    asyncio.run(run_demo(args))
+
+
+if __name__ == "__main__":
+    main()
--- a/experiments/novelty_loop/novelty_metrics.py
+++ b/experiments/novelty_loop/novelty_metrics.py
@@ -0,0 +1,269 @@
+"""
+Novelty Metrics Module - Compute novelty scores for generated outputs.
+
+This module provides embedding-based novelty metrics adapted from the AUT flexibility
+analysis framework for use in novelty-driven agent loops.
+
+Key Metrics:
+- Centroid Distance: Measures how far a new output is from the centroid of previous outputs
+- Cumulative Novelty: Tracks novelty over the generation sequence
+- Jump Detection: Identifies significant semantic shifts between consecutive outputs
+"""
+
+from dataclasses import dataclass, field
+from typing import List, Optional
+import numpy as np
+
+
+@dataclass
+class NoveltyScore:
+    """Result of novelty computation for a single output."""
+    score: float  # Cosine distance from centroid, in [0, 2] (0.0 = same direction, higher = more novel)
+    distance_from_centroid: float
+    min_distance_to_existing: float  # Nearest neighbor distance
+    is_jump: bool  # Whether this represents a significant semantic jump
+    jump_magnitude: Optional[float] = None  # Cosine similarity to previous output (lower = larger jump)
+
+
+@dataclass
+class NoveltyTrajectory:
+    """Tracks novelty scores over a generation sequence."""
+    scores: List[float] = field(default_factory=list)
+    cumulative_novelty: List[float] = field(default_factory=list)
+    jump_positions: List[int] = field(default_factory=list)
+    centroid_history: List[np.ndarray] = field(default_factory=list)
+
+    @property
+    def mean_novelty(self) -> float:
+        """Average novelty across all outputs."""
+        return float(np.mean(self.scores)) if self.scores else 0.0
+
+    @property
+    def max_novelty(self) -> float:
+        """Maximum novelty achieved."""
+        return float(max(self.scores)) if self.scores else 0.0
+
+    @property
+    def jump_ratio(self) -> float:
+        """Proportion of transitions that were jumps."""
+        if len(self.scores) < 2:
+            return 0.0
+        return len(self.jump_positions) / (len(self.scores) - 1)
+
+    @property
+    def final_cumulative_novelty(self) -> float:
+        """Total accumulated novelty."""
+        return self.cumulative_novelty[-1] if self.cumulative_novelty else 0.0
+
+
+class NoveltyMetrics:
+    """
+    Computes novelty metrics for embeddings in a streaming fashion.
+
+    Designed for use in an agent loop where outputs are generated one at a time
+    and we need to assess novelty incrementally.
+    """
+
+    def __init__(
+        self,
+        similarity_threshold: float = 0.7,
+        jump_detection_enabled: bool = True
+    ):
+        """
+        Args:
+            similarity_threshold: Cosine similarity to the previous output below which a jump is flagged
+            jump_detection_enabled: Whether to track semantic jumps
+        """
+        self.similarity_threshold = similarity_threshold
+        self.jump_detection_enabled = jump_detection_enabled
+
+        # State
+        self.embeddings: List[np.ndarray] = []
+        self.trajectory = NoveltyTrajectory()
+        self._centroid: Optional[np.ndarray] = None
+
+    def reset(self):
+        """Reset all state for a new generation session."""
+        self.embeddings = []
+        self.trajectory = NoveltyTrajectory()
+        self._centroid = None
+
+    @staticmethod
+    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
+        """Compute cosine similarity between two vectors."""
+        norm_a = np.linalg.norm(a)
+        norm_b = np.linalg.norm(b)
+        if norm_a == 0 or norm_b == 0:
+            return 0.0
+        return float(np.dot(a, b) / (norm_a * norm_b))
+
+    @staticmethod
+    def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
+        """Compute cosine distance (1 - similarity) between two vectors."""
+        return 1.0 - NoveltyMetrics.cosine_similarity(a, b)
+
+    def compute_centroid(self) -> Optional[np.ndarray]:
+        """Compute centroid of all current embeddings."""
+        if not self.embeddings:
+            return None
+        return np.mean(self.embeddings, axis=0)
+
+    def compute_novelty(self, embedding: np.ndarray) -> NoveltyScore:
+        """
+        Compute novelty score for a new embedding.
+
+        This does NOT add the embedding to the history - call add_embedding() for that.
+
+        Args:
+            embedding: The embedding vector to evaluate
+
+        Returns:
+            NoveltyScore with computed metrics
+        """
+        embedding = np.array(embedding)
+
+        # First output has nothing to compare against, so it gets a fixed score of 1.0
+        if not self.embeddings:
+            return NoveltyScore(
+                score=1.0,
+                distance_from_centroid=1.0,
+                min_distance_to_existing=1.0,
+                is_jump=False,
+                jump_magnitude=None
+            )
+
+        # Distance from centroid (primary novelty metric)
+        centroid = self.compute_centroid()
+        distance_from_centroid = self.cosine_distance(embedding, centroid)
+
+        # Minimum distance to any existing embedding (nearest neighbor)
+        min_distance = min(
+            self.cosine_distance(embedding, existing)
+            for existing in self.embeddings
+        )
+
+        # Jump detection (similarity to previous output)
+        is_jump = False
+        jump_magnitude = None
+        if self.jump_detection_enabled and self.embeddings:
+            similarity_to_prev = self.cosine_similarity(embedding, self.embeddings[-1])
+            jump_magnitude = similarity_to_prev
+            is_jump = similarity_to_prev < self.similarity_threshold
+
+        # Primary novelty score is the cosine distance from the centroid.
+        # It is not renormalized: values lie in [0, 2], where higher = more novel
+        novelty_score = distance_from_centroid
+
+        return NoveltyScore(
+            score=novelty_score,
+            distance_from_centroid=distance_from_centroid,
+            min_distance_to_existing=min_distance,
+            is_jump=is_jump,
+            jump_magnitude=jump_magnitude
+        )
+
+    def add_embedding(self, embedding: np.ndarray, novelty: Optional[NoveltyScore] = None):
+        """
+        Add an embedding to the history and update trajectory.
+
+        Args:
+            embedding: The embedding to add
+            novelty: Pre-computed novelty score (computed if not provided)
+        """
+        embedding = np.array(embedding)
+
+        if novelty is None:
+            novelty = self.compute_novelty(embedding)
+
+        # Update state
+        self.embeddings.append(embedding)
+        self._centroid = self.compute_centroid()
+
+        # Update trajectory
+        self.trajectory.scores.append(novelty.score)
+
+        # Cumulative novelty
+        prev_cumulative = self.trajectory.cumulative_novelty[-1] if self.trajectory.cumulative_novelty else 0.0
+        self.trajectory.cumulative_novelty.append(prev_cumulative + novelty.score)
+
+        # Track jumps
+        if novelty.is_jump:
+            self.trajectory.jump_positions.append(len(self.embeddings) - 1)
+
+        # Store centroid history
+        if self._centroid is not None:
+            self.trajectory.centroid_history.append(self._centroid.copy())
+
+    def get_current_state(self) -> dict:
+        """Get current state as a dictionary for logging/debugging."""
+        return {
+            "num_embeddings": len(self.embeddings),
+            "mean_novelty": self.trajectory.mean_novelty,
+            "max_novelty": self.trajectory.max_novelty,
+            "jump_ratio": self.trajectory.jump_ratio,
+            "cumulative_novelty": self.trajectory.final_cumulative_novelty,
+            "recent_scores": self.trajectory.scores[-5:] if self.trajectory.scores else []
+        }
+
+
+def compute_batch_novelty(
+    embeddings: List[np.ndarray],
+    reference_embeddings: Optional[List[np.ndarray]] = None
+) -> List[float]:
+    """
+    Compute novelty scores for a batch of embeddings.
+
+    Useful for post-hoc analysis of generated outputs.
+
+    Args:
+        embeddings: List of embeddings to evaluate
+        reference_embeddings: Optional reference set for the centroid (defaults to `embeddings` itself)
+
+    Returns:
+        List of novelty scores (distance from centroid)
+    """
+    if not embeddings:
+        return []
+
+    embeddings_arr = np.array(embeddings)
+
+    if reference_embeddings is not None:
+        centroid = np.mean(reference_embeddings, axis=0)
+    else:
+        centroid = np.mean(embeddings_arr, axis=0)
+
+    scores = []
+    for emb in embeddings_arr:
+        distance = NoveltyMetrics.cosine_distance(emb, centroid)
+        scores.append(distance)
+
+    return scores
+
+
+def find_most_novel(
+    embeddings: List[np.ndarray],
+    texts: List[str],
+    top_k: int = 5
+) -> List[tuple]:
+    """
+    Find the most novel outputs from a batch.
+
+    Args:
+        embeddings: List of embeddings
+        texts: Corresponding text outputs
+        top_k: Number of top results to return
+
+    Returns:
+        List of (text, novelty_score, index) tuples, sorted by novelty descending
+    """
+    scores = compute_batch_novelty(embeddings)
+
+    indexed_results = [
+        (texts[i], scores[i], i)
+        for i in range(len(texts))
+    ]
+
+    # Sort by novelty score descending
+    indexed_results.sort(key=lambda x: x[1], reverse=True)
+
+    return indexed_results[:top_k]
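Review note: the streaming centroid-distance score in `NoveltyMetrics.compute_novelty` can be reproduced without NumPy to check its range by hand. This is a minimal sketch under that assumption, not the module's code; `streaming_novelty` is a hypothetical name, and it mirrors only the `score` field (no jump detection or nearest-neighbor distance).

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero-norm vector is treated as similarity 0.0."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def streaming_novelty(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Score each embedding against the centroid of everything seen before it."""
    history: List[List[float]] = []
    scores: List[float] = []
    for emb in embeddings:
        if not history:
            scores.append(1.0)  # first output: fixed score, nothing to compare to
        else:
            centroid = [
                sum(vec[i] for vec in history) / len(history)
                for i in range(len(emb))
            ]
            scores.append(cosine_distance(emb, centroid))
        history.append(list(emb))  # add only after scoring, as the module does
    return scores


if __name__ == "__main__":
    print(streaming_novelty([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
    print(streaming_novelty([[1.0, 0.0], [-1.0, 0.0]]))
```

The second call shows that an output pointing opposite to the centroid scores 2.0, so thresholds and progress bars that assume a [0, 1] range need to clamp or rescale the value.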