feat: Add experiments framework and novelty-driven agent loop
- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation
- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring
- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

This commit is contained in:

experiments/novelty_loop/README.md (Normal file, 253 lines added)

@@ -0,0 +1,253 @@

# Novelty-Driven LLM Agent Loop

An autonomous LLM agent that generates tasks in a while loop, using **novelty assessment as the termination condition** to help the agent "jump out" of its training data distribution (semantic gravity).

## Concept

Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.

This module takes a different approach: **novelty scores** dynamically control when the agent should stop. Rather than running for a fixed number of iterations, the agent continues until it finds something truly novel (a "breakthrough").

```
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
```

## Research Foundation

This work builds on established research:

- **Novelty Search** (Lehman & Stanley): Reward novelty, not objectives
- **Curiosity-driven Exploration** (Pathak et al.): Intrinsic motivation via prediction error
- **Quality-Diversity** (MAP-Elites): Maintain diverse high-quality solutions
- **Open-ended Learning**: Endless innovation through novelty pressure

The unique contribution is using **novelty as a termination condition** rather than just a reward signal.

## Architecture

```
┌────────────────────────────────────────────────────────────────────┐
│                Novelty-Driven Task Generation Loop                 │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│   ┌──────────┐                                                     │
│   │   Seed   │  "Design a better bicycle"                          │
│   │ Problem  │                                                     │
│   └────┬─────┘                                                     │
│        │                                                           │
│        ▼                                                           │
│   ┌─────────────────────────────────────────────────────────┐     │
│   │  WHILE novelty < threshold AND iterations < max:        │     │
│   │                                                         │     │
│   │   1. Sample random expert (curated occupations)         │     │
│   │      e.g., "marine biologist", "choreographer"          │     │
│   │                                                         │     │
│   │   2. Generate task from expert perspective              │     │
│   │      "What task would a {expert} assign to improve      │     │
│   │       {seed_problem}?"                                  │     │
│   │                                                         │     │
│   │   3. Embed task, compute novelty vs. centroid           │     │
│   │                                                         │     │
│   │   4. If novelty > threshold → STOP (breakthrough!)      │     │
│   │                                                         │     │
│   └─────────────────────────────────────────────────────────┘     │
│        │                                                           │
│        ▼                                                           │
│   ┌──────────┐                                                     │
│   │ Output:  │  Novel task that "jumped out" of typical space      │
│   │   Task   │  + trajectory of exploration                        │
│   └──────────┘                                                     │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```

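In code, the loop reduces to a handful of steps. The sketch below is illustrative only: `generate_task` and `embed` stand in for the agent's LLM and embedding calls, and seeding the reference set with the seed problem's embedding is an assumption of this sketch, not necessarily what `agent.py` does.

```python
import random

import numpy as np


def novelty_vs_centroid(embedding, history):
    """1 - cosine similarity between a new embedding and the centroid of prior embeddings."""
    centroid = np.mean(history, axis=0)
    sim = np.dot(embedding, centroid) / (
        np.linalg.norm(embedding) * np.linalg.norm(centroid) + 1e-12
    )
    return 1.0 - float(sim)


def novelty_loop(seed_problem, experts, generate_task, embed,
                 threshold=0.4, max_iterations=20):
    """generate_task(expert, seed) and embed(text) are assumed callables backed by the LLM/embedder."""
    history = [embed(seed_problem)]   # reference set, seeded with the problem itself (sketch assumption)
    trajectory = []
    for i in range(1, max_iterations + 1):
        expert = random.choice(experts)                  # 1. sample a random expert perspective
        task = generate_task(expert, seed_problem)       # 2. generate a task from that perspective
        emb = embed(task)                                # 3. embed the task and score novelty
        novelty = novelty_vs_centroid(emb, history)      #    against the centroid of prior outputs
        history.append(emb)
        trajectory.append({"iteration": i, "expert": expert, "task": task, "novelty": novelty})
        if novelty > threshold:                          # 4. breakthrough -> stop
            return task, trajectory
    return None, trajectory                              # budget exhausted without a breakthrough
```
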
## Installation

The module uses the existing project infrastructure. Ensure you have:

1. **Ollama** running with the required models:
   ```bash
   ollama pull qwen3:8b
   ollama pull qwen3-embedding:4b
   ```

2. **Python dependencies** (from project root):
   ```bash
   cd backend
   source venv/bin/activate
   pip install httpx numpy
   ```

## Quick Start

### Basic Usage

```bash
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
```

### Example Output

```
Iteration 1
  Expert: Architect (Architecture & Design)
  Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
  Novelty: [████████░░░░░░░░░░░░] 0.1234

Iteration 2
  Expert: Chef (Culinary)
  Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
  Novelty: [███████████░░░░░░░░░] 0.1823

Iteration 3
  Expert: Marine Biologist (Science)
  Task: Study fish schooling behavior to develop organic traffic flow algorithms
  Novelty: [██████████████░░░░░░] 0.3521

Iteration 4
  Expert: Choreographer (Performing Arts)
  Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
  Novelty: [████████████████████] 0.5234
  ★ BREAKTHROUGH! ★
```

## Termination Strategies

### 1. Seek Breakthrough (Default)

Stop when novelty exceeds the threshold. Finds the first truly novel task.

```bash
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
```

### 2. Exhaust Frontier

Continue while novelty remains high; stop once the average novelty falls below the exhaust threshold. Explores more thoroughly.

```bash
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
```

### 3. Coverage Target

Continue until N distinct conceptual clusters have been covered. Ensures diversity.

```bash
python demo.py "Your problem" --strategy coverage --clusters 5
```

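The three stopping rules can be sketched as simple predicates over the novelty and embedding history. This is a rough illustration: the averaging window and the greedy clustering below are assumptions, and the actual logic lives in `agent.py`.

```python
import numpy as np


def should_stop_breakthrough(novelties, threshold=0.4):
    """Stop as soon as the latest novelty score exceeds the threshold."""
    return bool(novelties) and novelties[-1] > threshold


def should_stop_exhaust(novelties, exhaust_threshold=0.15, window=3):
    """Stop once the average novelty over the last few iterations falls below the exhaust threshold."""
    if len(novelties) < window:
        return False
    return float(np.mean(novelties[-window:])) < exhaust_threshold


def should_stop_coverage(embeddings, n_clusters=5, similarity_threshold=0.7):
    """Stop once the generated tasks span at least n_clusters distinct conceptual clusters."""
    centroids = []
    for emb in embeddings:
        emb = np.asarray(emb) / (np.linalg.norm(emb) + 1e-12)
        # Greedy clustering: a task unlike every existing cluster opens a new one.
        if all(float(np.dot(emb, c)) < similarity_threshold for c in centroids):
            centroids.append(emb)
    return len(centroids) >= n_clusters
```
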
## API Usage

```python
import asyncio
from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent

async def main():
    agent = NoveltyDrivenTaskAgent(
        novelty_threshold=0.4,
        max_iterations=20,
        language="en"
    )

    result = await agent.run("Design a better bicycle")

    print(f"Found breakthrough: {result.breakthrough_task.task}")
    print(f"Novelty score: {result.breakthrough_task.novelty_score}")
    print(f"From expert: {result.breakthrough_task.expert}")

    await agent.close()

asyncio.run(main())
```

## Novelty Metrics

The `novelty_metrics.py` module provides:

- **Centroid Distance**: the primary novelty metric; distance from the average of all previous outputs
- **Min Distance**: distance to the nearest neighbor (detects duplicates)
- **Jump Detection**: identifies significant semantic shifts between consecutive outputs
- **Trajectory Tracking**: cumulative novelty, jump ratio, etc.

```python
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics

metrics = NoveltyMetrics(similarity_threshold=0.7)

# Add embeddings one by one
for embedding in embeddings:
    novelty = metrics.compute_novelty(embedding)
    metrics.add_embedding(embedding, novelty)
    print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")

# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
```

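For intuition, the min-distance and jump checks can be sketched with plain numpy as below (illustrative only; the exact definitions and thresholds used by `NoveltyMetrics` may differ).

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def min_distance(embedding, previous):
    """Distance to the nearest previous embedding; values near zero indicate a duplicate."""
    return 1.0 - max(cosine_similarity(embedding, p) for p in previous)


def is_jump(embedding, last_embedding, similarity_threshold=0.7):
    """Treat the new output as a 'jump' when it is dissimilar enough to the preceding one."""
    return cosine_similarity(embedding, last_embedding) < similarity_threshold
```
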
## CLI Options

```
positional arguments:
  seed_problem          The seed problem or challenge to explore

options:
  --strategy {breakthrough,exhaust,coverage}
                        Termination strategy (default: breakthrough)
  --threshold, -t       Novelty threshold for breakthrough (default: 0.4)
  --max-iter, -m        Maximum iterations (default: 20)
  --language, -l {en,zh}
                        Language for prompts and experts (default: en)
  --model               LLM model for task generation (default: qwen3:8b)
  --embedding-model     Embedding model (default: qwen3-embedding:4b)
  --temperature         LLM temperature (default: 0.7)
  --output, -o          Save results to JSON file
  --quiet, -q           Suppress iteration output
  --verbose, -v         Enable verbose logging
```

## File Structure

```
experiments/novelty_loop/
├── README.md            # This file
├── agent.py             # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py   # Novelty computation utilities
└── demo.py              # Interactive CLI demo
```

## Design Decisions

| Question | Decision | Rationale |
|----------|----------|-----------|
| Output Type | **Tasks** | Self-generated sub-goals for autonomous problem decomposition |
| Termination | **Seek Breakthrough** | Stop when novelty exceeds the threshold to find a truly novel task |
| Perturbation | **Expert Perspectives** | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | **Centroid** | Dynamic; adapts as exploration progresses |

## Connection to Main Project

This module integrates with the main novelty-seeking project:

- Uses the same **curated occupation data** (`backend/app/data/curated_occupations_*.json`)
- Uses the same **embedding model** (qwen3-embedding:4b)
- Builds on the **AUT flexibility analysis** metrics for novelty computation
- Can use **DDC domain data** for alternative perturbation strategies

## Future Work

1. **Hybrid Perturbation**: Combine expert + domain perspectives
2. **Contrastive Prompting**: Explicitly ask for outputs unlike recent ones
3. **Semantic Steering**: Guide generation away from the centroid direction
4. **Multi-Agent Exploration**: Parallel agents with different strategies
5. **Quality-Diversity Archive**: Maintain diverse high-quality solutions

## References

- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone. *Evolutionary Computation*, 19(2).
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction. *ICML*.
- Mouret, J.-B., & Clune, J. (2015). Illuminating search spaces by mapping elites. arXiv:1504.04909.
- Characterising Creative Process in Humans and LLMs. arXiv:2405.00899.