Novelty-Driven LLM Agent Loop
An autonomous LLM agent that generates tasks in a while loop, using novelty assessment as the termination condition to help the agent "jump out" of its training data distribution (semantic gravity).
Concept
Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.
This module implements a novel approach: use novelty scores to dynamically control when the agent should stop. Instead of running for a fixed number of iterations, the agent continues until it finds something truly novel (a "breakthrough").
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
Research Foundation
This work builds on established research:
- Novelty Search (Lehman & Stanley): Reward novelty, not objectives
- Curiosity-driven Exploration (Pathak et al.): Intrinsic motivation via prediction error
- Quality-Diversity (MAP-Elites): Maintain diverse high-quality solutions
- Open-ended Learning: Endless innovation through novelty pressure
The unique contribution is using novelty as a termination condition rather than just a reward signal.
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ Novelty-Driven Task Generation Loop │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ │
│ │ Seed │ "Design a better bicycle" │
│ │ Problem │ │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ WHILE novelty < threshold AND iterations < max: │ │
│ │ │ │
│ │ 1. Sample random expert (curated occupations) │ │
│ │ e.g., "marine biologist", "choreographer" │ │
│ │ │ │
│ │ 2. Generate task from expert perspective │ │
│ │ "What task would a {expert} assign to improve │ │
│ │ {seed_problem}?" │ │
│ │ │ │
│ │ 3. Embed task, compute novelty vs. centroid │ │
│ │ │ │
│ │ 4. If novelty > threshold → STOP (breakthrough!) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Output: │ Novel task that "jumped out" of typical space │
│ │ Task │ + trajectory of exploration │
│ └──────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
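In code, the loop reduces to a short iteration over expert perspectives. The sketch below is a simplified illustration, not the actual agent: `sample_expert`, `generate_task`, and `embed` stand in for the occupation sampler, the LLM call, and the embedding call, and the novelty score is the centroid-distance metric described under Novelty Metrics.

```python
import numpy as np

def novelty_loop(seed_problem, sample_expert, generate_task, embed,
                 threshold=0.4, max_iterations=20):
    """Minimal sketch of the novelty-driven loop (illustrative, not the real agent)."""
    history = []      # embeddings of all previously generated tasks
    trajectory = []   # (expert, task, novelty) per iteration
    for _ in range(max_iterations):
        expert = sample_expert()                    # 1. random expert perspective
        task = generate_task(seed_problem, expert)  # 2. task from that perspective
        vec = np.asarray(embed(task))               # 3. embed the task text

        if history:
            centroid = np.mean(history, axis=0)
            # cosine distance from the centroid of everything generated so far
            novelty = 1.0 - float(np.dot(vec, centroid) /
                                  (np.linalg.norm(vec) * np.linalg.norm(centroid)))
        else:
            novelty = 0.0  # first output has no reference set yet

        history.append(vec)
        trajectory.append((expert, task, novelty))

        if novelty > threshold:                     # 4. breakthrough -> stop
            return task, trajectory
    return None, trajectory  # budget exhausted without a breakthrough
```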
Installation
The module uses the existing project infrastructure. Ensure you have:
- Ollama running with the required models:
  ollama pull qwen3:8b
  ollama pull qwen3-embedding:4b
- Python dependencies (from project root):
  cd backend
  source venv/bin/activate
  pip install httpx numpy
Quick Start
Basic Usage
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
Example Output
Iteration 1
Expert: Architect (Architecture & Design)
Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
Novelty: [████████░░░░░░░░░░░░] 0.1234
Iteration 2
Expert: Chef (Culinary)
Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
Novelty: [███████████░░░░░░░░░] 0.1823
Iteration 3
Expert: Marine Biologist (Science)
Task: Study fish schooling behavior to develop organic traffic flow algorithms
Novelty: [██████████████░░░░░░] 0.3521
Iteration 4
Expert: Choreographer (Performing Arts)
Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
Novelty: [████████████████████] 0.5234
★ BREAKTHROUGH! ★
Termination Strategies
1. Seek Breakthrough (Default)
Stop when novelty exceeds threshold. Finds the first truly novel task.
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
2. Exhaust Frontier
Continue while novelty is high, stop when average novelty drops. Explores more thoroughly.
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
3. Coverage Target
Continue until N distinct conceptual clusters are covered. Ensures diversity.
python demo.py "Your problem" --strategy coverage --clusters 5
API Usage
import asyncio
from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent

async def main():
    agent = NoveltyDrivenTaskAgent(
        novelty_threshold=0.4,
        max_iterations=20,
        language="en"
    )
    result = await agent.run("Design a better bicycle")
    print(f"Found breakthrough: {result.breakthrough_task.task}")
    print(f"Novelty score: {result.breakthrough_task.novelty_score}")
    print(f"From expert: {result.breakthrough_task.expert}")
    await agent.close()

asyncio.run(main())
Novelty Metrics
The novelty_metrics.py module provides:
- Centroid Distance: Primary novelty metric - how far from the average of all previous outputs
- Min Distance: Distance to nearest neighbor (detect duplicates)
- Jump Detection: Identifies significant semantic shifts between consecutive outputs
- Trajectory Tracking: Cumulative novelty, jump ratio, etc.
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics

metrics = NoveltyMetrics(similarity_threshold=0.7)

# Add embeddings one by one
for embedding in embeddings:
    novelty = metrics.compute_novelty(embedding)
    metrics.add_embedding(embedding, novelty)
    print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")

# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
CLI Options
positional arguments:
seed_problem The seed problem or challenge to explore
options:
--strategy {breakthrough,exhaust,coverage}
Termination strategy (default: breakthrough)
--threshold, -t Novelty threshold for breakthrough (default: 0.4)
--max-iter, -m Maximum iterations (default: 20)
--language, -l {en,zh}
Language for prompts and experts (default: en)
--model LLM model for task generation (default: qwen3:8b)
--embedding-model Embedding model (default: qwen3-embedding:4b)
--temperature LLM temperature (default: 0.7)
--output, -o Save results to JSON file
--quiet, -q Suppress iteration output
--verbose, -v Enable verbose logging
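Options can be combined freely. For example, a longer coverage-driven run saved to disk (the output filename is just an example):
python demo.py "Improve urban transportation" --strategy coverage --clusters 5 --max-iter 30 --output coverage_run.json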
File Structure
experiments/novelty_loop/
├── README.md # This file
├── agent.py # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py # Novelty computation utilities
└── demo.py # Interactive CLI demo
Design Decisions
| Question | Decision | Rationale |
|---|---|---|
| Output Type | Tasks | Self-generated sub-goals for autonomous problem decomposition |
| Termination | Seek Breakthrough | Stop when novelty exceeds threshold - find truly novel task |
| Perturbation | Expert Perspectives | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | Centroid | Dynamic, adapts as exploration progresses |
Connection to Main Project
This module integrates with the main novelty-seeking project:
- Uses the same curated occupation data (backend/app/data/curated_occupations_*.json)
- Uses the same embedding model (qwen3-embedding:4b)
- Builds on the AUT flexibility analysis metrics for novelty computation
- Can use DDC domain data for alternative perturbation strategies
Future Work
- Hybrid Perturbation: Combine expert + domain perspectives
- Contrastive Prompting: Explicitly ask for outputs unlike recent ones
- Semantic Steering: Guide generation away from centroid direction
- Multi-Agent Exploration: Parallel agents with different strategies
- Quality-Diversity Archive: Maintain diverse high-quality solutions
References
- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone.
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction.
- Mouret, J. B., & Clune, J. (2015). Illuminating search spaces by mapping elites.
- Characterising the Creative Process in Humans and Large Language Models (arXiv:2405.00899).