Compare commits


3 Commits

Author SHA1 Message Date
43c025e060 feat: Add experiments framework and novelty-driven agent loop
- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation

- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring

- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 10:16:21 +08:00
26a56a2a07 feat: Enhance patent search and update research documentation
- Improve patent search service with expanded functionality
- Update PatentSearchPanel UI component
- Add new research_report.md
- Update experimental protocol, literature review, paper outline, and theoretical framework

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 15:52:33 +08:00
ec48709755 chore: save local changes 2026-01-05 22:32:08 +08:00
126 changed files with 25270 additions and 275 deletions

CLAUDE.md (new file, +101 lines)

@@ -0,0 +1,101 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a creative ideation system that uses LLMs to break "semantic gravity" (the tendency of LLMs to generate ideas clustered around high-probability training distributions). The system analyzes objects through multiple attribute dimensions and transforms them using expert perspectives to generate novel ideas.
## Development Commands
### Starting the Application
```bash
./start.sh # Starts both backend (port 8001) and frontend (port 5173)
./stop.sh # Stops all services
```
### Backend (FastAPI + Python)
```bash
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload
```
### Frontend (React + Vite + TypeScript)
```bash
cd frontend
npm install
npm run dev # Development server
npm run build # TypeScript check + production build
npm run lint # ESLint
```
## Architecture
### Multi-Agent Pipeline
The system uses three interconnected agents that process queries through Server-Sent Events (SSE) for real-time streaming:
```
Query → Attribute Agent → Expert Transformation Agent → Deduplication Agent
                                  ↘ Patent Search (optional)
```
**1. Attribute Agent** (`/api/analyze`)
- Analyzes a query (e.g., "bicycle") through configurable category dimensions
- Step 0: Category analysis (5 modes: FIXED_ONLY, FIXED_PLUS_CUSTOM, FIXED_PLUS_DYNAMIC, CUSTOM_ONLY, DYNAMIC_AUTO)
- Step 1: Generate attributes per category
- Step 2: Build DAG relationships between attributes across categories
- Output: `AttributeDAG` with nodes and edges
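
A minimal client sketch for this endpoint (not part of the repo; it assumes the backend from `start.sh` is running on port 8001, and the payload handling is illustrative):

```python
# Hypothetical SSE consumer for /api/analyze; event names follow the
# step0/step1 pattern used by the backend, payload printing is illustrative.
import asyncio
import json
import httpx

async def analyze(query: str, lang: str = "en") -> None:
    payload = {"query": query, "lang": lang, "category_mode": "dynamic_auto"}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:8001/api/analyze", json=payload
        ) as resp:
            async for line in resp.aiter_lines():
                if line.startswith("event: "):
                    print(line.removeprefix("event: "))   # e.g. step1_start
                elif line.startswith("data: "):
                    data = json.loads(line.removeprefix("data: "))
                    print("  ", str(data)[:100])          # truncated payload

asyncio.run(analyze("bicycle"))
```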
**2. Expert Transformation Agent** (`/api/expert-transformation/category`)
- Takes attributes and transforms them through diverse expert perspectives
- Step 0: Generate expert team (sources: `llm`, `curated`, `dbpedia`, `wikidata`)
- Step 1: Each expert generates keywords for each attribute
- Step 2: Generate descriptions for each keyword
- Formula: `total_keywords = attributes × expert_count × keywords_per_expert`
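
For example, 5 attributes fanned out through 3 experts at 2 keywords each yield 30 keywords before description generation:

```python
# Keyword fan-out for one category, per the formula above.
attributes, expert_count, keywords_per_expert = 5, 3, 2
total_keywords = attributes * expert_count * keywords_per_expert
assert total_keywords == 30  # 5 attributes × 3 experts × 2 keywords each
```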
**3. Deduplication Agent** (`/api/deduplication/deduplicate`)
- Consolidates similar ideas using embedding similarity or LLM judgment
- Groups duplicates while preserving representative descriptions
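
A sketch of the embedding path's grouping idea (names and structure assumed, not the project's actual service code; the 0.85 default matches `similarity_threshold` in `DeduplicationRequest` later in this diff):

```python
# Greedy cosine-similarity grouping sketch for deduplication.
import numpy as np

def group_duplicates(embeddings: np.ndarray, threshold: float = 0.85) -> list[list[int]]:
    """Group row indices whose cosine similarity to a group's representative >= threshold."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    groups: list[list[int]] = []
    for i in range(len(normed)):
        for group in groups:
            if float(normed[i] @ normed[group[0]]) >= threshold:
                group.append(i)  # near-duplicate: attach to existing group
                break
        else:
            groups.append([i])   # novel description: becomes a new representative
    return groups
```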
### Backend Structure (`backend/app/`)
- `routers/` - FastAPI endpoints with SSE streaming
- `services/` - LLM service (Ollama/OpenAI), embedding service, expert source service
- `prompts/` - Bilingual prompt templates (zh/en) for each agent step
- `data/` - Curated occupation lists for expert sourcing (210 professions)
### Frontend Structure (`frontend/src/`)
- `hooks/` - React hooks matching backend agents (`useAttribute`, `useExpertTransformation`, `useDeduplication`)
- `components/` - UI panels for each stage + DAG visualization (D3.js, @xyflow/react)
- `services/api.ts` - SSE stream parsing and API calls
- `types/index.ts` - TypeScript interfaces mirroring backend schemas
### Key Patterns
**SSE Event Flow**: All agent operations stream progress via SSE events:
```typescript
// Frontend callback pattern
// Frontend callback pattern (fired in order as SSE events arrive)
onStep0Start → onStep0Complete → onStep1Start → onStep1Complete → onDone
```
**Bilingual Support**: All prompts and UI support `PromptLanguage = 'zh' | 'en'`. Language flows through the entire pipeline from request to response messages.
**Expert Source Fallback**: If external sources (DBpedia, Wikidata) fail, the system automatically falls back to LLM-based expert generation.
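
The fallback shape is roughly the following (a sketch; the function names are hypothetical stand-ins, not the project's actual API):

```python
# Hypothetical sketch of the expert-source fallback.
async def fetch_external_experts(source: str, count: int) -> list[str]:
    raise ConnectionError(f"{source} unreachable")  # stand-in for a real lookup

async def generate_experts_with_llm(count: int) -> list[str]:
    return [f"expert-{i}" for i in range(count)]    # stand-in for an LLM call

async def resolve_experts(source: str, count: int) -> list[str]:
    if source in ("dbpedia", "wikidata"):
        try:
            return await fetch_external_experts(source, count)
        except Exception:
            pass  # external source failed: fall through to the LLM
    return await generate_experts_with_llm(count)
```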
### Configuration
Backend requires `.env` file:
```
OLLAMA_BASE_URL=http://localhost:11435 # Default Ollama endpoint
DEFAULT_MODEL=qwen3:8b # Default LLM model
OPENAI_API_KEY= # Optional: for OpenAI-compatible APIs
LENS_API_TOKEN= # Optional: for patent search
```
### Dual-Path Mode
The system supports analyzing two queries in parallel (`PathA` and `PathB`) with attribute crossover functionality for comparing and combining ideas across different objects.


@@ -3,10 +3,11 @@ from typing import Optional
 class Settings(BaseSettings):
-    ollama_base_url: str = "http://192.168.30.36:11434"
+    ollama_base_url: str = "http://localhost:11435"
     default_model: str = "qwen3:8b"
     openai_api_key: Optional[str] = None
     openai_base_url: Optional[str] = None
+    lens_api_token: Optional[str] = None

     class Config:
         env_file = ".env"
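
Usage follows the standard pydantic `BaseSettings` pattern; a sketch (the import path is assumed from the repo layout described in CLAUDE.md):

```python
# Sketch: values resolve from .env, falling back to the defaults above.
from app.config import Settings  # module path assumed

settings = Settings()
print(settings.ollama_base_url)  # "http://localhost:11435" unless .env overrides it
print(settings.lens_api_token)   # None unless LENS_API_TOKEN is set
```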


@@ -0,0 +1,120 @@
{
"metadata": {
"source": "ddc",
"language": "en",
"created_at": "2026-01-20",
"total_count": 100,
"description": "Dewey Decimal Classification knowledge domains (10 main classes + 90 divisions)"
},
"domains": [
{"code": "000", "name": "Computer Science, Information & General Works", "level": "class", "parent": null},
{"code": "010", "name": "Bibliographies", "level": "division", "parent": "000"},
{"code": "020", "name": "Library & Information Sciences", "level": "division", "parent": "000"},
{"code": "030", "name": "Encyclopedias & Books of Facts", "level": "division", "parent": "000"},
{"code": "040", "name": "Unassigned", "level": "division", "parent": "000"},
{"code": "050", "name": "Magazines, Journals & Serials", "level": "division", "parent": "000"},
{"code": "060", "name": "Associations, Organizations & Museums", "level": "division", "parent": "000"},
{"code": "070", "name": "News Media, Journalism & Publishing", "level": "division", "parent": "000"},
{"code": "080", "name": "Quotations", "level": "division", "parent": "000"},
{"code": "090", "name": "Manuscripts & Rare Books", "level": "division", "parent": "000"},
{"code": "100", "name": "Philosophy & Psychology", "level": "class", "parent": null},
{"code": "110", "name": "Metaphysics", "level": "division", "parent": "100"},
{"code": "120", "name": "Epistemology", "level": "division", "parent": "100"},
{"code": "130", "name": "Parapsychology & Occultism", "level": "division", "parent": "100"},
{"code": "140", "name": "Philosophical Schools of Thought", "level": "division", "parent": "100"},
{"code": "150", "name": "Psychology", "level": "division", "parent": "100"},
{"code": "160", "name": "Logic", "level": "division", "parent": "100"},
{"code": "170", "name": "Ethics", "level": "division", "parent": "100"},
{"code": "180", "name": "Ancient, Medieval & Eastern Philosophy", "level": "division", "parent": "100"},
{"code": "190", "name": "Modern Western Philosophy", "level": "division", "parent": "100"},
{"code": "200", "name": "Religion", "level": "class", "parent": null},
{"code": "210", "name": "Philosophy & Theory of Religion", "level": "division", "parent": "200"},
{"code": "220", "name": "Bible", "level": "division", "parent": "200"},
{"code": "230", "name": "Christianity & Christian Theology", "level": "division", "parent": "200"},
{"code": "240", "name": "Christian Practice & Observance", "level": "division", "parent": "200"},
{"code": "250", "name": "Christian Orders & Local Churches", "level": "division", "parent": "200"},
{"code": "260", "name": "Christian Social & Ecclesiastical Theology", "level": "division", "parent": "200"},
{"code": "270", "name": "History of Christianity", "level": "division", "parent": "200"},
{"code": "280", "name": "Christian Denominations", "level": "division", "parent": "200"},
{"code": "290", "name": "Other Religions", "level": "division", "parent": "200"},
{"code": "300", "name": "Social Sciences", "level": "class", "parent": null},
{"code": "310", "name": "Statistics", "level": "division", "parent": "300"},
{"code": "320", "name": "Political Science", "level": "division", "parent": "300"},
{"code": "330", "name": "Economics", "level": "division", "parent": "300"},
{"code": "340", "name": "Law", "level": "division", "parent": "300"},
{"code": "350", "name": "Public Administration & Military Science", "level": "division", "parent": "300"},
{"code": "360", "name": "Social Problems & Services", "level": "division", "parent": "300"},
{"code": "370", "name": "Education", "level": "division", "parent": "300"},
{"code": "380", "name": "Commerce, Communications & Transportation", "level": "division", "parent": "300"},
{"code": "390", "name": "Customs, Etiquette & Folklore", "level": "division", "parent": "300"},
{"code": "400", "name": "Language", "level": "class", "parent": null},
{"code": "410", "name": "Linguistics", "level": "division", "parent": "400"},
{"code": "420", "name": "English & Old English Languages", "level": "division", "parent": "400"},
{"code": "430", "name": "German & Related Languages", "level": "division", "parent": "400"},
{"code": "440", "name": "French & Related Languages", "level": "division", "parent": "400"},
{"code": "450", "name": "Italian, Romanian & Related Languages", "level": "division", "parent": "400"},
{"code": "460", "name": "Spanish, Portuguese & Galician", "level": "division", "parent": "400"},
{"code": "470", "name": "Latin & Italic Languages", "level": "division", "parent": "400"},
{"code": "480", "name": "Classical & Modern Greek Languages", "level": "division", "parent": "400"},
{"code": "490", "name": "Other Languages", "level": "division", "parent": "400"},
{"code": "500", "name": "Science", "level": "class", "parent": null},
{"code": "510", "name": "Mathematics", "level": "division", "parent": "500"},
{"code": "520", "name": "Astronomy", "level": "division", "parent": "500"},
{"code": "530", "name": "Physics", "level": "division", "parent": "500"},
{"code": "540", "name": "Chemistry", "level": "division", "parent": "500"},
{"code": "550", "name": "Earth Sciences & Geology", "level": "division", "parent": "500"},
{"code": "560", "name": "Paleontology", "level": "division", "parent": "500"},
{"code": "570", "name": "Biology & Life Sciences", "level": "division", "parent": "500"},
{"code": "580", "name": "Botany", "level": "division", "parent": "500"},
{"code": "590", "name": "Zoology", "level": "division", "parent": "500"},
{"code": "600", "name": "Technology", "level": "class", "parent": null},
{"code": "610", "name": "Medicine & Health", "level": "division", "parent": "600"},
{"code": "620", "name": "Engineering", "level": "division", "parent": "600"},
{"code": "630", "name": "Agriculture", "level": "division", "parent": "600"},
{"code": "640", "name": "Home & Family Management", "level": "division", "parent": "600"},
{"code": "650", "name": "Management & Public Relations", "level": "division", "parent": "600"},
{"code": "660", "name": "Chemical Engineering", "level": "division", "parent": "600"},
{"code": "670", "name": "Manufacturing", "level": "division", "parent": "600"},
{"code": "680", "name": "Manufacture for Specific Uses", "level": "division", "parent": "600"},
{"code": "690", "name": "Construction & Building", "level": "division", "parent": "600"},
{"code": "700", "name": "Arts & Recreation", "level": "class", "parent": null},
{"code": "710", "name": "Landscape & Area Planning", "level": "division", "parent": "700"},
{"code": "720", "name": "Architecture", "level": "division", "parent": "700"},
{"code": "730", "name": "Sculpture, Ceramics & Metalwork", "level": "division", "parent": "700"},
{"code": "740", "name": "Drawing & Decorative Arts", "level": "division", "parent": "700"},
{"code": "750", "name": "Painting", "level": "division", "parent": "700"},
{"code": "760", "name": "Graphic Arts & Printmaking", "level": "division", "parent": "700"},
{"code": "770", "name": "Photography & Computer Art", "level": "division", "parent": "700"},
{"code": "780", "name": "Music", "level": "division", "parent": "700"},
{"code": "790", "name": "Sports, Games & Entertainment", "level": "division", "parent": "700"},
{"code": "800", "name": "Literature", "level": "class", "parent": null},
{"code": "810", "name": "American Literature in English", "level": "division", "parent": "800"},
{"code": "820", "name": "English & Old English Literature", "level": "division", "parent": "800"},
{"code": "830", "name": "German & Related Literature", "level": "division", "parent": "800"},
{"code": "840", "name": "French & Related Literature", "level": "division", "parent": "800"},
{"code": "850", "name": "Italian, Romanian & Related Literature", "level": "division", "parent": "800"},
{"code": "860", "name": "Spanish, Portuguese & Galician Literature", "level": "division", "parent": "800"},
{"code": "870", "name": "Latin & Italic Literature", "level": "division", "parent": "800"},
{"code": "880", "name": "Classical & Modern Greek Literature", "level": "division", "parent": "800"},
{"code": "890", "name": "Other Literatures", "level": "division", "parent": "800"},
{"code": "900", "name": "History & Geography", "level": "class", "parent": null},
{"code": "910", "name": "Geography & Travel", "level": "division", "parent": "900"},
{"code": "920", "name": "Biography & Genealogy", "level": "division", "parent": "900"},
{"code": "930", "name": "History of Ancient World", "level": "division", "parent": "900"},
{"code": "940", "name": "History of Europe", "level": "division", "parent": "900"},
{"code": "950", "name": "History of Asia", "level": "division", "parent": "900"},
{"code": "960", "name": "History of Africa", "level": "division", "parent": "900"},
{"code": "970", "name": "History of North America", "level": "division", "parent": "900"},
{"code": "980", "name": "History of South America", "level": "division", "parent": "900"},
{"code": "990", "name": "History of Other Areas", "level": "division", "parent": "900"}
]
}


@@ -0,0 +1,120 @@
{
"metadata": {
"source": "ddc",
"language": "zh",
"created_at": "2026-01-20",
"total_count": 100,
"description": "杜威十進位圖書分類法知識領域10個大類 + 90個細類"
},
"domains": [
{"code": "000", "name": "電腦科學、資訊與總類", "level": "class", "parent": null},
{"code": "010", "name": "書目學", "level": "division", "parent": "000"},
{"code": "020", "name": "圖書資訊學", "level": "division", "parent": "000"},
{"code": "030", "name": "百科全書與常識書", "level": "division", "parent": "000"},
{"code": "040", "name": "未分配", "level": "division", "parent": "000"},
{"code": "050", "name": "雜誌、期刊與連續出版品", "level": "division", "parent": "000"},
{"code": "060", "name": "協會、組織與博物館", "level": "division", "parent": "000"},
{"code": "070", "name": "新聞媒體、新聞學與出版", "level": "division", "parent": "000"},
{"code": "080", "name": "引用語錄", "level": "division", "parent": "000"},
{"code": "090", "name": "手稿與珍本", "level": "division", "parent": "000"},
{"code": "100", "name": "哲學與心理學", "level": "class", "parent": null},
{"code": "110", "name": "形上學", "level": "division", "parent": "100"},
{"code": "120", "name": "知識論", "level": "division", "parent": "100"},
{"code": "130", "name": "超心理學與神秘學", "level": "division", "parent": "100"},
{"code": "140", "name": "哲學流派", "level": "division", "parent": "100"},
{"code": "150", "name": "心理學", "level": "division", "parent": "100"},
{"code": "160", "name": "邏輯學", "level": "division", "parent": "100"},
{"code": "170", "name": "倫理學", "level": "division", "parent": "100"},
{"code": "180", "name": "古代、中世紀與東方哲學", "level": "division", "parent": "100"},
{"code": "190", "name": "近代西方哲學", "level": "division", "parent": "100"},
{"code": "200", "name": "宗教", "level": "class", "parent": null},
{"code": "210", "name": "宗教哲學與理論", "level": "division", "parent": "200"},
{"code": "220", "name": "聖經", "level": "division", "parent": "200"},
{"code": "230", "name": "基督教與基督神學", "level": "division", "parent": "200"},
{"code": "240", "name": "基督教實踐與禮儀", "level": "division", "parent": "200"},
{"code": "250", "name": "基督教修會與地方教會", "level": "division", "parent": "200"},
{"code": "260", "name": "基督教社會與教會神學", "level": "division", "parent": "200"},
{"code": "270", "name": "基督教歷史", "level": "division", "parent": "200"},
{"code": "280", "name": "基督教教派", "level": "division", "parent": "200"},
{"code": "290", "name": "其他宗教", "level": "division", "parent": "200"},
{"code": "300", "name": "社會科學", "level": "class", "parent": null},
{"code": "310", "name": "統計學", "level": "division", "parent": "300"},
{"code": "320", "name": "政治學", "level": "division", "parent": "300"},
{"code": "330", "name": "經濟學", "level": "division", "parent": "300"},
{"code": "340", "name": "法律", "level": "division", "parent": "300"},
{"code": "350", "name": "公共行政與軍事學", "level": "division", "parent": "300"},
{"code": "360", "name": "社會問題與服務", "level": "division", "parent": "300"},
{"code": "370", "name": "教育", "level": "division", "parent": "300"},
{"code": "380", "name": "商業、通訊與運輸", "level": "division", "parent": "300"},
{"code": "390", "name": "風俗、禮儀與民俗", "level": "division", "parent": "300"},
{"code": "400", "name": "語言", "level": "class", "parent": null},
{"code": "410", "name": "語言學", "level": "division", "parent": "400"},
{"code": "420", "name": "英語與古英語", "level": "division", "parent": "400"},
{"code": "430", "name": "德語及相關語言", "level": "division", "parent": "400"},
{"code": "440", "name": "法語及相關語言", "level": "division", "parent": "400"},
{"code": "450", "name": "義大利語、羅馬尼亞語及相關語言", "level": "division", "parent": "400"},
{"code": "460", "name": "西班牙語、葡萄牙語與加利西亞語", "level": "division", "parent": "400"},
{"code": "470", "name": "拉丁語及義大利語族", "level": "division", "parent": "400"},
{"code": "480", "name": "古典與現代希臘語", "level": "division", "parent": "400"},
{"code": "490", "name": "其他語言", "level": "division", "parent": "400"},
{"code": "500", "name": "自然科學", "level": "class", "parent": null},
{"code": "510", "name": "數學", "level": "division", "parent": "500"},
{"code": "520", "name": "天文學", "level": "division", "parent": "500"},
{"code": "530", "name": "物理學", "level": "division", "parent": "500"},
{"code": "540", "name": "化學", "level": "division", "parent": "500"},
{"code": "550", "name": "地球科學與地質學", "level": "division", "parent": "500"},
{"code": "560", "name": "古生物學", "level": "division", "parent": "500"},
{"code": "570", "name": "生物學與生命科學", "level": "division", "parent": "500"},
{"code": "580", "name": "植物學", "level": "division", "parent": "500"},
{"code": "590", "name": "動物學", "level": "division", "parent": "500"},
{"code": "600", "name": "應用科學與技術", "level": "class", "parent": null},
{"code": "610", "name": "醫學與健康", "level": "division", "parent": "600"},
{"code": "620", "name": "工程學", "level": "division", "parent": "600"},
{"code": "630", "name": "農業", "level": "division", "parent": "600"},
{"code": "640", "name": "家政與家庭管理", "level": "division", "parent": "600"},
{"code": "650", "name": "管理與公共關係", "level": "division", "parent": "600"},
{"code": "660", "name": "化學工程", "level": "division", "parent": "600"},
{"code": "670", "name": "製造業", "level": "division", "parent": "600"},
{"code": "680", "name": "特定用途製造", "level": "division", "parent": "600"},
{"code": "690", "name": "建築與營造", "level": "division", "parent": "600"},
{"code": "700", "name": "藝術與休閒", "level": "class", "parent": null},
{"code": "710", "name": "景觀與區域規劃", "level": "division", "parent": "700"},
{"code": "720", "name": "建築學", "level": "division", "parent": "700"},
{"code": "730", "name": "雕塑、陶瓷與金工", "level": "division", "parent": "700"},
{"code": "740", "name": "繪畫與裝飾藝術", "level": "division", "parent": "700"},
{"code": "750", "name": "繪畫藝術", "level": "division", "parent": "700"},
{"code": "760", "name": "版畫與印刷藝術", "level": "division", "parent": "700"},
{"code": "770", "name": "攝影與電腦藝術", "level": "division", "parent": "700"},
{"code": "780", "name": "音樂", "level": "division", "parent": "700"},
{"code": "790", "name": "運動、遊戲與娛樂", "level": "division", "parent": "700"},
{"code": "800", "name": "文學", "level": "class", "parent": null},
{"code": "810", "name": "美國英語文學", "level": "division", "parent": "800"},
{"code": "820", "name": "英語與古英語文學", "level": "division", "parent": "800"},
{"code": "830", "name": "德語及相關文學", "level": "division", "parent": "800"},
{"code": "840", "name": "法語及相關文學", "level": "division", "parent": "800"},
{"code": "850", "name": "義大利語、羅馬尼亞語及相關文學", "level": "division", "parent": "800"},
{"code": "860", "name": "西班牙語、葡萄牙語與加利西亞語文學", "level": "division", "parent": "800"},
{"code": "870", "name": "拉丁語及義大利語族文學", "level": "division", "parent": "800"},
{"code": "880", "name": "古典與現代希臘文學", "level": "division", "parent": "800"},
{"code": "890", "name": "其他文學", "level": "division", "parent": "800"},
{"code": "900", "name": "歷史與地理", "level": "class", "parent": null},
{"code": "910", "name": "地理與旅遊", "level": "division", "parent": "900"},
{"code": "920", "name": "傳記與家譜", "level": "division", "parent": "900"},
{"code": "930", "name": "古代世界史", "level": "division", "parent": "900"},
{"code": "940", "name": "歐洲史", "level": "division", "parent": "900"},
{"code": "950", "name": "亞洲史", "level": "division", "parent": "900"},
{"code": "960", "name": "非洲史", "level": "division", "parent": "900"},
{"code": "970", "name": "北美洲史", "level": "division", "parent": "900"},
{"code": "980", "name": "南美洲史", "level": "division", "parent": "900"},
{"code": "990", "name": "其他地區史", "level": "division", "parent": "900"}
]
}


@@ -3,10 +3,11 @@ from contextlib import asynccontextmanager
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
-from .routers import attributes, transformation, expert_transformation, deduplication
+from .routers import attributes, transformation, expert_transformation, deduplication, patent_search
 from .services.llm_service import ollama_provider
 from .services.embedding_service import embedding_service
 from .services.llm_deduplication_service import llm_deduplication_service
+from .services.patent_search_service import patent_search_service

 @asynccontextmanager
@@ -15,6 +16,7 @@ async def lifespan(app: FastAPI):
     await ollama_provider.close()
     await embedding_service.close()
     await llm_deduplication_service.close()
+    await patent_search_service.close()

 app = FastAPI(
@@ -36,6 +38,7 @@ app.include_router(attributes.router)
 app.include_router(transformation.router)
 app.include_router(expert_transformation.router)
 app.include_router(deduplication.router)
+app.include_router(patent_search.router)

 @app.get("/")


@@ -1,7 +1,10 @@
 from pydantic import BaseModel
-from typing import Optional, List, Dict
+from typing import Optional, List, Dict, Literal
 from enum import Enum

+# Language type for prompts
+LanguageType = Literal["zh", "en"]

 class AttributeNode(BaseModel):
     name: str
@@ -47,16 +50,19 @@ class CausalChain(BaseModel):
 class StreamAnalyzeRequest(BaseModel):
-    """多步驟分析請求(更新為支持動態類別)"""
+    """Multi-step analysis request (updated to support dynamic categories)"""
     query: str
     model: Optional[str] = None
     temperature: Optional[float] = 0.7
-    chain_count: int = 5  # 用戶可設定要生成多少條因果鏈
+    chain_count: int = 5  # User can set how many causal chains to generate

-    # 新增:動態類別支持
-    category_mode: Optional[str] = "dynamic_auto"  # CategoryMode enum
+    # Dynamic category support
+    category_mode: Optional[str] = "dynamic_auto"  # CategoryMode enum value
     custom_categories: Optional[List[str]] = None
-    suggested_category_count: int = 3  # 建議 LLM 生成的類別數量
+    suggested_category_count: int = 3  # Suggest LLM to generate this many categories
+
+    # Language setting
+    lang: LanguageType = "zh"

 class StreamAnalyzeResponse(BaseModel):
@@ -136,13 +142,14 @@ class DAGRelationship(BaseModel):
 # ===== Transformation Agent schemas =====

 class TransformationRequest(BaseModel):
-    """Transformation Agent 請求"""
-    query: str  # 原始查詢 (e.g., "腳踏車")
-    category: str  # 類別名稱 (e.g., "功能")
-    attributes: List[str]  # 該類別的屬性列表
+    """Transformation Agent request"""
+    query: str  # Original query (e.g., "bicycle")
+    category: str  # Category name (e.g., "Functions")
+    attributes: List[str]  # Attribute list for this category
     model: Optional[str] = None
     temperature: Optional[float] = 0.7
-    keyword_count: int = 3  # 要生成的新關鍵字數量
+    keyword_count: int = 3  # Number of new keywords to generate
+    lang: LanguageType = "zh"  # Language for prompts

 class TransformationDescription(BaseModel):
@@ -215,24 +222,27 @@ class ExpertSource(str, Enum):
 class ExpertTransformationRequest(BaseModel):
-    """Expert Transformation Agent 請求"""
+    """Expert Transformation Agent request"""
     query: str
     category: str
     attributes: List[str]

     # Expert parameters
-    expert_count: int = 3  # 專家數量 (2-8)
-    keywords_per_expert: int = 1  # 每個專家為每個屬性生成幾個關鍵字 (1-3)
-    custom_experts: Optional[List[str]] = None  # 用戶指定專家 ["藥師", "工程師"]
+    expert_count: int = 3  # Number of experts (2-8)
+    keywords_per_expert: int = 1  # Keywords per expert per attribute (1-3)
+    custom_experts: Optional[List[str]] = None  # User-specified experts

     # Expert source parameters
-    expert_source: ExpertSource = ExpertSource.LLM  # 專家來源
-    expert_language: str = "en"  # 外部來源的語言 (目前只有英文資料)
+    expert_source: ExpertSource = ExpertSource.LLM  # Expert source
+    expert_language: str = "en"  # Language for external sources

     # LLM parameters
     model: Optional[str] = None
     temperature: Optional[float] = 0.7
+
+    # Prompt language
+    lang: LanguageType = "zh"

 # ===== Deduplication Agent schemas =====
@@ -243,11 +253,12 @@ class DeduplicationMethod(str, Enum):
 class DeduplicationRequest(BaseModel):
-    """去重請求"""
+    """Deduplication request"""
     descriptions: List[ExpertTransformationDescription]
-    method: DeduplicationMethod = DeduplicationMethod.EMBEDDING  # 去重方法
-    similarity_threshold: float = 0.85  # 餘弦相似度閾值 (0.0-1.0),僅 Embedding 使用
-    model: Optional[str] = None  # Embedding/LLM 模型
+    method: DeduplicationMethod = DeduplicationMethod.EMBEDDING  # Deduplication method
+    similarity_threshold: float = 0.85  # Cosine similarity threshold (0.0-1.0), only for Embedding
+    model: Optional[str] = None  # Embedding/LLM model
+    lang: LanguageType = "zh"  # Prompt language (for LLM method)

 class DescriptionGroup(BaseModel):
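
With this change every request schema carries a `lang` field. A hypothetical request body exercising it (values chosen for illustration):

```python
# Hypothetical StreamAnalyzeRequest payload using the new lang field.
payload = {
    "query": "bicycle",
    "chain_count": 5,
    "category_mode": "dynamic_auto",
    "suggested_category_count": 3,
    "lang": "en",  # new field: selects English prompts; defaults to "zh"
}
```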


@@ -1,21 +1,37 @@
 from typing import List, Optional, Dict
 import json
-
-DEFAULT_CATEGORIES = ["材料", "功能", "用途", "使用族群", "特性"]
-
-CATEGORY_DESCRIPTIONS = {
-    "材料": "物件由什麼材料組成",
-    "功能": "物件能做什麼",
-    "用途": "物件在什麼場景使用",
-    "使用族群": "誰會使用這個物件",
-    "特性": "物件有什麼特徵",
-}
+from .language_config import (
+    LanguageType,
+    DEFAULT_CATEGORIES,
+    CATEGORY_DESCRIPTIONS,
+)

-def get_attribute_prompt(query: str, categories: Optional[List[str]] = None) -> str:
+def get_default_categories(lang: LanguageType = "zh") -> List[str]:
+    return DEFAULT_CATEGORIES.get(lang, DEFAULT_CATEGORIES["zh"])
+
+def get_category_descriptions(lang: LanguageType = "zh") -> Dict[str, str]:
+    return CATEGORY_DESCRIPTIONS.get(lang, CATEGORY_DESCRIPTIONS["zh"])
+
+def get_attribute_prompt(
+    query: str,
+    categories: Optional[List[str]] = None,
+    lang: LanguageType = "zh"
+) -> str:
     """Generate prompt with causal chain structure."""
+    if lang == "en":
+        prompt = f"""Analyze the attributes of "{query}" in a causal chain format: Materials→Functions→Usages→User Groups.
+List 3-5 types of materials, each extending into a complete causal chain.
+JSON format:
+{{"name": "{query}", "children": [{{"name": "Material Name", "category": "Materials", "children": [{{"name": "Function Name", "category": "Functions", "children": [{{"name": "Usage Name", "category": "Usages", "children": [{{"name": "User Group Name", "category": "User Groups"}}]}}]}}]}}]}}
+Return JSON only."""
+    else:
+        prompt = f"""分析「{query}」的屬性,以因果鏈方式呈現:材料→功能→用途→使用族群。

 請列出 3-5 種材料,每種材料延伸出完整因果鏈。
@@ -27,9 +43,18 @@ JSON 格式:
     return prompt

-def get_step1_attributes_prompt(query: str) -> str:
-    """Step 1: 生成各類別的屬性列表(平行結構)"""
-    return f"""/no_think
+def get_step1_attributes_prompt(query: str, lang: LanguageType = "zh") -> str:
+    """Step 1: Generate attribute list for each category (parallel structure)"""
+    if lang == "en":
+        return f"""/no_think
+Analyze "{query}" and list attributes for the following four categories. List 3-5 common attributes for each category.
+Return JSON only, in the following format:
+{{"materials": ["material1", "material2", "material3"], "functions": ["function1", "function2", "function3"], "usages": ["usage1", "usage2", "usage3"], "users": ["user group1", "user group2", "user group3"]}}
+Object: {query}"""
+    else:
+        return f"""/no_think
 分析「{query}」,列出以下四個類別的屬性。每個類別列出 3-5 個常見屬性。

 只回傳 JSON,格式如下:
@@ -45,21 +70,48 @@ def get_step2_causal_chain_prompt(
     usages: List[str],
     users: List[str],
     existing_chains: List[dict],
-    chain_index: int
+    chain_index: int,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 2: 生成單條因果鏈"""
+    """Step 2: Generate a single causal chain"""
     existing_chains_text = ""
-    if existing_chains:
-        chains_list = [
-            f"- {c['material']}→{c['function']}→{c['usage']}→{c['user']}"
-            for c in existing_chains
-        ]
-        existing_chains_text = f"""
+
+    if lang == "en":
+        if existing_chains:
+            chains_list = [
+                f"- {c['material']}→{c['function']}→{c['usage']}→{c['user']}"
+                for c in existing_chains
+            ]
+            existing_chains_text = f"""
+[Already generated causal chains, do not repeat]
+{chr(10).join(chains_list)}
+"""
+        return f"""/no_think
+Generate causal chain #{chain_index} for "{query}".
+[Available Materials] {', '.join(materials)}
+[Available Functions] {', '.join(functions)}
+[Available Usages] {', '.join(usages)}
+[Available User Groups] {', '.join(users)}
+{existing_chains_text}
+[Rules]
+1. Select one attribute from each category to form a logical causal chain
+2. The causal relationship must be logical (materials determine functions, functions determine usages, usages determine user groups)
+3. Do not repeat existing causal chains
+Return JSON only:
+{{"material": "selected material", "function": "selected function", "usage": "selected usage", "user": "selected user group"}}"""
+    else:
+        if existing_chains:
+            chains_list = [
+                f"- {c['material']}→{c['function']}→{c['usage']}→{c['user']}"
+                for c in existing_chains
+            ]
+            existing_chains_text = f"""
 【已生成的因果鏈,請勿重複】
 {chr(10).join(chains_list)}
 """
-    return f"""/no_think
+        return f"""/no_think
 為「{query}」生成第 {chain_index} 條因果鏈。

 【可選材料】{', '.join(materials)}
@@ -76,19 +128,52 @@ def get_step2_causal_chain_prompt(
 {{"material": "選擇的材料", "function": "選擇的功能", "usage": "選擇的用途", "user": "選擇的族群"}}"""

-def get_flat_attribute_prompt(query: str, categories: Optional[List[str]] = None) -> str:
+def get_flat_attribute_prompt(
+    query: str,
+    categories: Optional[List[str]] = None,
+    lang: LanguageType = "zh"
+) -> str:
     """Generate prompt with flat/parallel categories (original design)."""
-    cats = categories if categories else DEFAULT_CATEGORIES
+    cats = categories if categories else get_default_categories(lang)
+    cat_descs = get_category_descriptions(lang)

     # Build category list
     category_lines = []
     for cat in cats:
-        desc = CATEGORY_DESCRIPTIONS.get(cat, f"{cat}的相關屬性")
-        category_lines.append(f"- {cat}:{desc}")
+        desc = cat_descs.get(cat, f"Related attributes of {cat}" if lang == "en" else f"{cat}的相關屬性")
+        category_lines.append(f"- {cat}: {desc}")
     categories_text = "\n".join(category_lines)

-    prompt = f"""/no_think
+    if lang == "en":
+        prompt = f"""/no_think
+You are an object attribute analysis expert. Please break down the user's input object into the following attribute categories.
+
+[Required Categories]
+{categories_text}
+
+[Important] The return format must be valid JSON, and each node must have a "name" field:
+```json
+{{
+  "name": "Object Name",
+  "children": [
+    {{
+      "name": "Category Name",
+      "children": [
+        {{"name": "Attribute 1"}},
+        {{"name": "Attribute 2"}}
+      ]
+    }}
+  ]
+}}
+```
+
+Return JSON only, no other text.
+
+User input: {query}"""
+    else:
+        prompt = f"""/no_think
 你是一個物件屬性分析專家。請將用戶輸入的物件拆解成以下屬性類別。

 【必須包含的類別】
@@ -123,14 +208,42 @@ def get_flat_attribute_prompt(query: str, categories: Optional[List[str]] = None
 def get_step0_category_analysis_prompt(
     query: str,
     suggested_count: int = 3,
-    exclude_categories: List[str] | None = None
+    exclude_categories: List[str] | None = None,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 0: LLM 分析建議類別"""
-    exclude_text = ""
-    if exclude_categories:
-        exclude_text = f"\n【禁止使用的類別】{', '.join(exclude_categories)}(這些已經是固定類別,不要重複建議)\n"
-    return f"""/no_think
+    """Step 0: LLM analyzes and suggests categories"""
+    if lang == "en":
+        exclude_text = ""
+        if exclude_categories:
+            exclude_text = f"\n[Forbidden Categories] {', '.join(exclude_categories)} (These are already fixed categories, do not suggest duplicates)\n"
+        return f"""/no_think
+Analyze "{query}" and suggest {suggested_count} most suitable attribute categories to describe it.
+
+[Common Category References] Characteristics, Shape, Color, Size, Brand, Price Range, Weight, Style, Occasion, Season, Technical Specifications
+{exclude_text}
+[Important]
+1. Choose categories that best describe the essence of this object
+2. Categories should have logical relationships
+3. Do not choose overly abstract or duplicate categories
+4. Must suggest creative categories different from the reference list
+
+Return JSON only:
+{{
+  "categories": [
+    {{"name": "Category1", "description": "Description1", "order": 0}},
+    {{"name": "Category2", "description": "Description2", "order": 1}}
+  ]
+}}
+
+Object: {query}"""
+    else:
+        exclude_text = ""
+        if exclude_categories:
+            exclude_text = f"\n【禁止使用的類別】{', '.join(exclude_categories)}(這些已經是固定類別,不要重複建議)\n"
+        return f"""/no_think
 分析「{query}」,建議 {suggested_count} 個最適合的屬性類別來描述它。

 【常見類別參考】特性、形狀、顏色、尺寸、品牌、價格區間、重量、風格、場合、季節、技術規格
@@ -154,21 +267,35 @@ def get_step0_category_analysis_prompt(
 def get_step1_dynamic_attributes_prompt(
     query: str,
-    categories: List  # List[CategoryDefinition]
+    categories: List,  # List[CategoryDefinition]
+    lang: LanguageType = "zh"
 ) -> str:
-    """動態 Step 1 - 根據類別列表生成屬性"""
-    # 按 order 排序並構建描述
+    """Dynamic Step 1 - Generate attributes based on category list"""
+    # Sort by order and build description
     sorted_cats = sorted(categories, key=lambda x: x.order if hasattr(x, 'order') else x.get('order', 0))
     category_desc = "\n".join([
-        f"- {cat.name if hasattr(cat, 'name') else cat['name']}: {cat.description if hasattr(cat, 'description') else cat.get('description', '相關屬性')}"
+        f"- {cat.name if hasattr(cat, 'name') else cat['name']}: {cat.description if hasattr(cat, 'description') else cat.get('description', 'Related attributes' if lang == 'en' else '相關屬性')}"
         for cat in sorted_cats
     ])
     category_keys = [cat.name if hasattr(cat, 'name') else cat['name'] for cat in sorted_cats]
-    json_template = {cat: ["屬性1", "屬性2", "屬性3"] for cat in category_keys}
-    return f"""/no_think
+
+    if lang == "en":
+        json_template = {cat: ["attribute1", "attribute2", "attribute3"] for cat in category_keys}
+        return f"""/no_think
+Analyze "{query}" and list attributes for the following categories. List 3-5 common attributes for each category.
+
+[Category List]
+{category_desc}
+
+Return JSON only:
+{json.dumps(json_template, ensure_ascii=False, indent=2)}
+
+Object: {query}"""
+    else:
+        json_template = {cat: ["屬性1", "屬性2", "屬性3"] for cat in category_keys}
+        return f"""/no_think
 分析「{query}」,列出以下類別的屬性。每個類別列出 3-5 個常見屬性。

 【類別列表】
@@ -185,30 +312,59 @@ def get_step2_dynamic_causal_chain_prompt(
     categories: List,  # List[CategoryDefinition]
     attributes_by_category: Dict[str, List[str]],
     existing_chains: List[Dict[str, str]],
-    chain_index: int
+    chain_index: int,
+    lang: LanguageType = "zh"
 ) -> str:
-    """動態 Step 2 - 生成動態類別的因果鏈"""
+    """Dynamic Step 2 - Generate causal chains for dynamic categories"""
     sorted_cats = sorted(categories, key=lambda x: x.order if hasattr(x, 'order') else x.get('order', 0))

-    # 構建可選屬性
+    # Build available attributes
     available_attrs = "\n".join([
-        f"【{cat.name if hasattr(cat, 'name') else cat['name']}】{', '.join(attributes_by_category.get(cat.name if hasattr(cat, 'name') else cat['name'], []))}"
+        f"[{cat.name if hasattr(cat, 'name') else cat['name']}] {', '.join(attributes_by_category.get(cat.name if hasattr(cat, 'name') else cat['name'], []))}"
         for cat in sorted_cats
     ])

-    # 已生成的因果鏈
-    existing_text = ""
-    if existing_chains:
-        chains_list = [
-            "→".join([chain.get(cat.name if hasattr(cat, 'name') else cat['name'], '?') for cat in sorted_cats])
-            for chain in existing_chains
-        ]
-        existing_text = f"\n【已生成,請勿重複】\n" + "\n".join([f"- {c}" for c in chains_list])
+    if lang == "en":
+        # Already generated causal chains
+        existing_text = ""
+        if existing_chains:
+            chains_list = [
+                "→".join([chain.get(cat.name if hasattr(cat, 'name') else cat['name'], '?') for cat in sorted_cats])
+                for chain in existing_chains
+            ]
+            existing_text = "\n[Already generated, do not repeat]\n" + "\n".join([f"- {c}" for c in chains_list])

-    # JSON 模板
-    json_template = {cat.name if hasattr(cat, 'name') else cat['name']: f"選擇的{cat.name if hasattr(cat, 'name') else cat['name']}" for cat in sorted_cats}
+        # JSON template
+        json_template = {cat.name if hasattr(cat, 'name') else cat['name']: f"selected {cat.name if hasattr(cat, 'name') else cat['name']}" for cat in sorted_cats}

-    return f"""/no_think
+        return f"""/no_think
+Generate causal chain #{chain_index} for "{query}".
+
+[Available Attributes]
+{available_attrs}
+{existing_text}
+
+[Rules]
+1. Select one attribute from each category
+2. Causal relationships must be logical
+3. Do not repeat
+
+Return JSON only:
+{json.dumps(json_template, ensure_ascii=False, indent=2)}"""
+    else:
+        # 已生成的因果鏈
+        existing_text = ""
+        if existing_chains:
+            chains_list = [
+                "→".join([chain.get(cat.name if hasattr(cat, 'name') else cat['name'], '?') for cat in sorted_cats])
+                for chain in existing_chains
+            ]
+            existing_text = "\n【已生成,請勿重複】\n" + "\n".join([f"- {c}" for c in chains_list])
+
+        # JSON 模板
+        json_template = {cat.name if hasattr(cat, 'name') else cat['name']: f"選擇的{cat.name if hasattr(cat, 'name') else cat['name']}" for cat in sorted_cats}
+
+        return f"""/no_think
 為「{query}」生成第 {chain_index} 條因果鏈。

 【可選屬性】
@@ -230,20 +386,46 @@ def get_step2_dag_relationships_prompt(
     query: str,
     categories: List,  # List[CategoryDefinition]
     attributes_by_category: Dict[str, List[str]],
+    lang: LanguageType = "zh"
 ) -> str:
-    """生成相鄰類別之間的自然關係"""
+    """Generate natural relationships between adjacent categories"""
     sorted_cats = sorted(categories, key=lambda x: x.order if hasattr(x, 'order') else x.get('order', 0))

     # Build attribute listing
     attr_listing = "\n".join([
-        f"【{cat.name if hasattr(cat, 'name') else cat['name']}】{', '.join(attributes_by_category.get(cat.name if hasattr(cat, 'name') else cat['name'], []))}"
+        f"[{cat.name if hasattr(cat, 'name') else cat['name']}] {', '.join(attributes_by_category.get(cat.name if hasattr(cat, 'name') else cat['name'], []))}"
        for cat in sorted_cats
     ])

     # Build direction hints
     direction_hints = "→".join([cat.name if hasattr(cat, 'name') else cat['name'] for cat in sorted_cats])

-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+Analyze the attribute relationships of "{query}".
+
+{attr_listing}
+
+[Relationship Direction] {direction_hints}
+
+[Rules]
+1. Only establish relationships between adjacent categories (e.g., Materials→Functions, Functions→Usages)
+2. Only output pairs that have true causal or associative relationships
+3. An attribute can connect to multiple downstream attributes, or none at all
+4. Not every attribute needs to have connections
+5. Relationships should be reasonable and meaningful
+
+Return JSON:
+{{
+  "relationships": [
+    {{"source_category": "CategoryA", "source": "attribute name", "target_category": "CategoryB", "target": "attribute name"}},
+    ...
+  ]
+}}
+
+Return JSON only."""
+    else:
+        return f"""/no_think
 分析「{query}」的屬性關係。

 {attr_listing}


@@ -1,34 +1,68 @@
-"""Expert Transformation Agent 提示詞模組"""
+"""Expert Transformation Agent prompts module - Bilingual support"""
 from typing import List, Optional
+from .language_config import LanguageType

 def get_expert_generation_prompt(
     query: str,
     categories: List[str],
     expert_count: int,
-    custom_experts: Optional[List[str]] = None
+    custom_experts: Optional[List[str]] = None,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 0: 生成專家團隊(不依賴主題,純隨機多元)"""
+    """Step 0: Generate expert team (not dependent on topic, purely random and diverse)"""
     import time
     import random

-    custom_text = ""
-    if custom_experts and len(custom_experts) > 0:
-        custom_text = f"(已指定:{', '.join(custom_experts[:expert_count])})"
-
-    # 加入時間戳和隨機數來增加多樣性
+    # Add timestamp and random number for diversity
     seed = int(time.time() * 1000) % 10000
-    diversity_hints = [
-        "冷門、非主流、跨領域",
-        "罕見職業、新興領域、邊緣學科",
-        "非傳統、創新、小眾專業",
-        "未來趨向、實驗性、非常規",
-        "跨文化、混合領域、獨特視角"
-    ]
-    hint = random.choice(diversity_hints)

-    return f"""/no_think
+    if lang == "en":
+        custom_text = ""
+        if custom_experts and len(custom_experts) > 0:
+            custom_text = f" (Specified: {', '.join(custom_experts[:expert_count])})"
+        diversity_hints = [
+            "obscure, non-mainstream, cross-disciplinary",
+            "rare occupations, emerging fields, fringe disciplines",
+            "unconventional, innovative, niche specialties",
+            "future-oriented, experimental, non-traditional",
+            "cross-cultural, hybrid fields, unique perspectives"
+        ]
+        hint = random.choice(diversity_hints)
+        return f"""/no_think
+Randomly assemble a team of {expert_count} experts from completely different fields{custom_text}.
+
+[Innovation Requirements] (Random seed: {seed})
+- Prioritize {hint} experts
+- Avoid common professions (such as doctors, engineers, teachers, lawyers, etc.)
+- Each expert must be from a completely unrelated field
+- The rarer and more innovative, the better
+
+Return JSON:
+{{"experts": [{{"id": "expert-0", "name": "profession", "domain": "field", "perspective": "viewpoint"}}, ...]}}
+
+Rules:
+- id should be expert-0 to expert-{expert_count - 1}
+- name is the profession name (not a person's name), 2-5 words
+- domain should be specific and unique, no duplicate types"""
+    else:
+        custom_text = ""
+        if custom_experts and len(custom_experts) > 0:
+            custom_text = f"(已指定:{', '.join(custom_experts[:expert_count])})"
+        diversity_hints = [
+            "冷門、非主流、跨領域",
+            "罕見職業、新興領域、邊緣學科",
+            "非傳統、創新、小眾專業",
+            "未來趨向、實驗性、非常規",
+            "跨文化、混合領域、獨特視角"
+        ]
+        hint = random.choice(diversity_hints)
+        return f"""/no_think
 隨機組建 {expert_count} 個來自完全不同領域的專家團隊{custom_text}。

 【創新要求】(隨機種子:{seed})
@@ -50,13 +84,39 @@ def get_expert_keyword_generation_prompt(
     category: str,
     attribute: str,
     experts: List[dict],  # List[ExpertProfile]
-    keywords_per_expert: int = 1
+    keywords_per_expert: int = 1,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 1: 專家視角關鍵字生成"""
-    # 建立專家列表,格式更清晰
+    """Step 1: Expert perspective keyword generation"""
+    # Build expert list in clearer format
     experts_list = "\n".join([f"- {exp['id']}: {exp['name']}" for exp in experts])

-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+You need to play the role of the following experts to generate innovative keywords for an attribute:
+
+[Expert List]
+{experts_list}
+
+[Task]
+Attribute: "{attribute}" (Category: {category})
+
+For each expert, please:
+1. First understand the professional background, knowledge domain, and work content of that profession
+2. Think about "{attribute}" from that profession's unique perspective
+3. Generate {keywords_per_expert} innovative keyword(s) related to that specialty (2-6 words)
+
+Keywords must reflect that expert's professional thinking style, for example:
+- Accountant viewing "movement" → "cash flow", "cost-benefit"
+- Architect viewing "movement" → "circulation design", "spatial flow"
+- Psychologist viewing "movement" → "behavioral motivation", "emotional transition"
+
+Return JSON:
+{{"keywords": [{{"keyword": "term", "expert_id": "expert-X", "expert_name": "name"}}, ...]}}
+
+Total of {len(experts) * keywords_per_expert} keywords needed, each keyword must be clearly related to the corresponding expert's professional field."""
+    else:
+        return f"""/no_think
 你需要扮演以下專家,為屬性生成創新關鍵字:

 【專家名單】
@@ -86,13 +146,29 @@ def get_single_description_prompt(
     keyword: str,
     expert_id: str,
     expert_name: str,
-    expert_domain: str
+    expert_domain: str,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 2: 為單一關鍵字生成描述"""
-    # 如果 domain 是通用的,就只用職業名稱
-    domain_text = f"({expert_domain}領域)" if expert_domain and expert_domain != "Professional Field" else ""
+    """Step 2: Generate description for a single keyword"""
+    if lang == "en":
+        # If domain is generic, just use profession name
+        domain_text = f" ({expert_domain} field)" if expert_domain and expert_domain != "Professional Field" else ""
+        return f"""/no_think
+You are a {expert_name}{domain_text}.
+
+Task: Generate an innovative application description for "{query}".
+
+Keyword: {keyword}
+
+From your professional perspective, explain how to apply the concept of "{keyword}" to "{query}". The description should be specific, creative, 15-30 words.
+
+Return JSON only, no other text:
+{{"description": "your innovative application description"}}"""
+    else:
+        # 如果 domain 是通用的,就只用職業名稱
+        domain_text = f"({expert_domain}領域)" if expert_domain and expert_domain != "Professional Field" else ""
+        return f"""/no_think
 你是一位{expert_name}{domain_text}。

 任務:為「{query}」生成一段創新應用描述。


@@ -0,0 +1,51 @@
"""Language configuration for prompts"""
from enum import Enum
from typing import Literal
class Language(str, Enum):
CHINESE = "zh"
ENGLISH = "en"
LanguageType = Literal["zh", "en"]
# Default categories for each language
DEFAULT_CATEGORIES = {
"zh": ["材料", "功能", "用途", "使用族群", "特性"],
"en": ["Materials", "Functions", "Usages", "User Groups", "Characteristics"],
}
CATEGORY_DESCRIPTIONS = {
"zh": {
"材料": "物件由什麼材料組成",
"功能": "物件能做什麼",
"用途": "物件在什麼場景使用",
"使用族群": "誰會使用這個物件",
"特性": "物件有什麼特徵",
},
"en": {
"Materials": "What materials the object is made of",
"Functions": "What the object can do",
"Usages": "In what scenarios the object is used",
"User Groups": "Who uses this object",
"Characteristics": "What features the object has",
},
}
# Category name mappings between languages
CATEGORY_MAPPING = {
"zh_to_en": {
"材料": "Materials",
"功能": "Functions",
"用途": "Usages",
"使用族群": "User Groups",
"特性": "Characteristics",
},
"en_to_zh": {
"Materials": "材料",
"Functions": "功能",
"Usages": "用途",
"User Groups": "使用族群",
"Characteristics": "特性",
},
}
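
A quick round-trip through the tables above (the import path is assumed from the `backend/app/prompts/` layout described in CLAUDE.md):

```python
# Sanity check of the bilingual mapping tables; import path assumed.
from app.prompts.language_config import CATEGORY_MAPPING, DEFAULT_CATEGORIES

assert CATEGORY_MAPPING["zh_to_en"]["材料"] == "Materials"
assert CATEGORY_MAPPING["en_to_zh"]["Materials"] == "材料"
assert len(DEFAULT_CATEGORIES["en"]) == len(DEFAULT_CATEGORIES["zh"])
```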


@@ -1,22 +1,43 @@
-"""Transformation Agent 提示詞模組"""
+"""Transformation Agent prompts module - Bilingual support"""
 from typing import List
+from .language_config import LanguageType

 def get_keyword_generation_prompt(
     category: str,
     attributes: List[str],
-    keyword_count: int = 3
+    keyword_count: int = 3,
+    lang: LanguageType = "zh"
 ) -> str:
     """
-    Step 1: 生成新關鍵字
-    給定類別和現有屬性,生成全新的、有創意的關鍵字。
-    不考慮原始查詢,只專注於類別本身可能的延伸。
+    Step 1: Generate new keywords
+    Given a category and existing attributes, generate new, creative keywords.
+    Don't consider the original query, focus only on possible extensions of the category itself.
     """
-    attrs_text = "、".join(attributes)
+    attrs_text = ", ".join(attributes) if lang == "en" else "、".join(attributes)

-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+You are a creative brainstorming expert. Given a category and its existing attributes, please generate new, creative keywords or descriptive phrases.
+
+[Category] {category}
+
+[Existing Attributes] {attrs_text}
+
+[Important Rules]
+1. Generate {keyword_count} completely new keywords
+2. Keywords must fit within the scope of "{category}" category
+3. Keywords should be creative and not duplicate or be too similar to existing attributes
+4. Don't consider any specific object, focus only on possible extensions of this category
+5. Each keyword should be 2-6 words
+
+Return JSON only:
+{{
+  "keywords": ["keyword1", "keyword2", "keyword3"]
+}}"""
+    else:
+        return f"""/no_think
 你是一個創意發想專家。給定一個類別和該類別下的現有屬性,請生成全新的、有創意的關鍵字或描述片段。

 【類別】{category}
@@ -38,14 +59,36 @@ def get_keyword_generation_prompt(
 def get_description_generation_prompt(
     query: str,
     category: str,
-    keyword: str
+    keyword: str,
+    lang: LanguageType = "zh"
 ) -> str:
     """
-    Step 2: 結合原始查詢生成描述
-    用新關鍵字創造一個與原始查詢相關的創新應用描述。
+    Step 2: Combine with original query to generate description
+    Use new keyword to create an innovative application description related to the original query.
     """
-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+You are an innovation application expert. Please apply a new keyword concept to a specific object to create an innovative application description.
+
+[Object] {query}
+
+[Category] {category}
+
+[New Keyword] {keyword}
+
+[Task]
+Using the concept of "{keyword}", create an innovative application description for "{query}".
+The description should be a complete sentence or phrase explaining how to apply this new concept to the object.
+
+[Example Format]
+- If the object is "bicycle" and keyword is "monitor", you could generate "bicycle monitors the rider's health status"
+- If the object is "umbrella" and keyword is "generate power", you could generate "umbrella generates electricity using raindrop impacts"
+
+Return JSON only:
+{{
+  "description": "innovative application description"
+}}"""
+    else:
+        return f"""/no_think
 你是一個創新應用專家。請將一個新的關鍵字概念應用到特定物件上,創造出創新的應用描述。

 【物件】{query}
@@ -69,15 +112,35 @@ def get_description_generation_prompt(
 def get_batch_description_prompt(
     query: str,
     category: str,
-    keywords: List[str]
+    keywords: List[str],
+    lang: LanguageType = "zh"
 ) -> str:
     """
-    批次生成描述(可選的優化版本,一次處理多個關鍵字)
+    Batch description generation (optional optimized version, process multiple keywords at once)
     """
-    keywords_text = "、".join(keywords)
+    keywords_text = ", ".join(keywords) if lang == "en" else "、".join(keywords)
     keywords_json = ", ".join([f'"{k}"' for k in keywords])

-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+You are an innovation application expert. Please apply multiple new keyword concepts to a specific object, creating an innovative application description for each keyword.
+
+[Object] {query}
+
+[Category] {category}
+
+[New Keywords] {keywords_text}
+
+[Task]
+Create an innovative application description related to "{query}" for each keyword.
+Each description should be a complete sentence or phrase.
+
+Return JSON only:
+{{
+  "descriptions": [
+    {{"keyword": "keyword1", "description": "description1"}},
+    {{"keyword": "keyword2", "description": "description2"}}
+  ]
+}}"""
+    else:
+        return f"""/no_think
 你是一個創新應用專家。請將多個新的關鍵字概念應用到特定物件上,為每個關鍵字創造創新的應用描述。

 【物件】{query}


@@ -58,7 +58,8 @@ async def execute_step0(
     prompt = get_step0_category_analysis_prompt(
         request.query,
         request.suggested_category_count,
-        exclude_categories=exclude_categories
+        exclude_categories=exclude_categories,
+        lang=request.lang
     )
     temperature = request.temperature if request.temperature is not None else 0.7
     response = await ollama_provider.generate(
@@ -310,7 +311,7 @@ async def generate_sse_events(request: StreamAnalyzeRequest) -> AsyncGenerator[str, None]:
     # ========== Step 1: Generate Attributes (Dynamic) ==========
     yield f"event: step1_start\ndata: {json.dumps({'message': '生成屬性...'}, ensure_ascii=False)}\n\n"

-    step1_prompt = get_step1_dynamic_attributes_prompt(request.query, final_categories)
+    step1_prompt = get_step1_dynamic_attributes_prompt(request.query, final_categories, lang=request.lang)
     logger.info(f"Step 1 prompt: {step1_prompt[:200]}")

     step1_response = await ollama_provider.generate(
@@ -330,6 +331,7 @@ async def generate_sse_events(request: StreamAnalyzeRequest) -> AsyncGenerator[str, None]:
         query=request.query,
         categories=final_categories,
         attributes_by_category=step1_result.attributes,
+        lang=request.lang
     )
     logger.info(f"Step 2 (relationships) prompt: {step2_prompt[:300]}")


@@ -63,7 +63,8 @@ async def deduplicate_descriptions(request: DeduplicationRequest) -> Deduplicati
         # 使用 LLM 成對比較去重
         result = await llm_deduplication_service.deduplicate(
             descriptions=request.descriptions,
-            model=request.model
+            model=request.model,
+            lang=request.lang
         )
         return result
     except ValueError as e:

View File

@@ -68,7 +68,8 @@ async def generate_expert_transformation_events(
         query=request.query,
         categories=all_categories,
         expert_count=request.expert_count,
-        custom_experts=actual_custom_experts if actual_custom_experts else None
+        custom_experts=actual_custom_experts if actual_custom_experts else None,
+        lang=request.lang
     )
     logger.info(f"Expert prompt: {expert_prompt[:200]}")
@@ -119,7 +120,8 @@ async def generate_expert_transformation_events(
             query=request.query,
             categories=all_categories,
             expert_count=request.expert_count,
-            custom_experts=actual_custom_experts if actual_custom_experts else None
+            custom_experts=actual_custom_experts if actual_custom_experts else None,
+            lang=request.lang
         )
         expert_response = await ollama_provider.generate(
@@ -160,7 +162,8 @@ async def generate_expert_transformation_events(
             category=request.category,
             attribute=attribute,
             experts=[e.model_dump() for e in experts],
-            keywords_per_expert=request.keywords_per_expert
+            keywords_per_expert=request.keywords_per_expert,
+            lang=request.lang
         )
         logger.info(f"Keyword prompt for '{attribute}': {kw_prompt[:300]}")
@@ -214,7 +217,8 @@ async def generate_expert_transformation_events(
             keyword=kw.keyword,
             expert_id=kw.expert_id,
             expert_name=kw.expert_name,
-            expert_domain=expert_domain
+            expert_domain=expert_domain,
+            lang=request.lang
         )
         desc_response = await ollama_provider.generate(

View File

@@ -0,0 +1,137 @@
"""Patent Search Router - Search for similar patents using Lens.org API"""
import logging
from typing import Optional, List
from fastapi import APIRouter
from pydantic import BaseModel
from ..services.patent_search_service import patent_search_service
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/patent", tags=["patent"])
# ===== Request/Response Models =====
class PatentSearchRequest(BaseModel):
"""Patent search request"""
query: str # Search query (description or keywords)
max_results: int = 10 # Maximum results to return (1-20)
class PatentResult(BaseModel):
"""Single patent result from Lens.org"""
lens_id: str
doc_number: str
jurisdiction: str
kind: str
title: str
abstract: Optional[str] = None
date_published: Optional[str] = None
applicants: List[str] = []
inventors: List[str] = []
legal_status: Optional[str] = None
classifications_cpc: List[str] = []
families_simple: List[str] = []
url: str
class PatentSearchResponse(BaseModel):
"""Patent search response"""
query: str
total_results: int
patents: List[PatentResult]
error: Optional[str] = None
class BatchPatentSearchRequest(BaseModel):
"""Batch patent search request - search multiple descriptions"""
queries: List[str] # List of descriptions to search
max_results_per_query: int = 5 # Max results per query
class BatchPatentSearchResult(BaseModel):
"""Results for a single query in batch search"""
query: str
total_results: int
patents: List[PatentResult]
error: Optional[str] = None
class BatchPatentSearchResponse(BaseModel):
"""Batch patent search response"""
results: List[BatchPatentSearchResult]
total_queries: int
# ===== Endpoints =====
@router.post("/search", response_model=PatentSearchResponse)
async def search_patents(request: PatentSearchRequest):
"""
Search for patents similar to the given description/query.
Uses Lens.org API to find related patents based on title, abstract, and claims.
"""
logger.info(f"Patent search request: {request.query[:100]}...")
# Limit max_results to reasonable range
max_results = min(max(1, request.max_results), 20)
result = await patent_search_service.search(
query=request.query,
max_results=max_results,
)
return PatentSearchResponse(
query=request.query,
total_results=result.get("total_results", 0),
patents=[PatentResult(**p) for p in result.get("patents", [])],
error=result.get("error"),
)
@router.post("/search/batch", response_model=BatchPatentSearchResponse)
async def batch_search_patents(request: BatchPatentSearchRequest):
"""
Search for patents for multiple descriptions at once.
Useful for checking multiple creative descriptions against patents.
"""
logger.info(f"Batch patent search: {len(request.queries)} queries")
# Limit results per query
max_per_query = min(max(1, request.max_results_per_query), 10)
results: List[BatchPatentSearchResult] = []
for query in request.queries:
result = await patent_search_service.search(
query=query,
max_results=max_per_query,
)
results.append(BatchPatentSearchResult(
query=query,
total_results=result.get("total_results", 0),
patents=[PatentResult(**p) for p in result.get("patents", [])],
error=result.get("error"),
))
return BatchPatentSearchResponse(
results=results,
total_queries=len(request.queries),
)
@router.get("/health")
async def patent_search_health():
"""Check if patent search service is working"""
# Do a simple test search
result = await patent_search_service.search("test", max_results=1)
if result.get("error"):
return {"status": "unhealthy", "error": result["error"]}
return {"status": "healthy"}

View File

@@ -36,7 +36,8 @@ async def generate_transformation_events(
     keyword_prompt = get_keyword_generation_prompt(
         category=request.category,
         attributes=request.attributes,
-        keyword_count=request.keyword_count
+        keyword_count=request.keyword_count,
+        lang=request.lang
     )
     logger.info(f"Keyword prompt: {keyword_prompt[:200]}")
@@ -61,7 +62,8 @@ async def generate_transformation_events(
     desc_prompt = get_batch_description_prompt(
         query=request.query,
         category=request.category,
-        keywords=new_keywords
+        keywords=new_keywords,
+        lang=request.lang
     )
     logger.info(f"Description prompt: {desc_prompt[:300]}")

View File

@@ -26,7 +26,7 @@ class EmbeddingService:
     def __init__(self):
         self.base_url = settings.ollama_base_url
-        self.default_model = "nomic-embed-text"  # Ollama 預設的 embedding 模型
+        self.default_model = "qwen3-embedding:4b"  # Qwen3 embedding model for better semantic understanding
         self.client = httpx.AsyncClient(timeout=120.0)

     async def get_embedding(self, text: str, model: Optional[str] = None) -> List[float]:

View File

@@ -1,12 +1,12 @@
""" """
LLM Deduplication Service - 使用 LLM 成對比較進行去重 LLM Deduplication Service - Using LLM pairwise comparison for deduplication
LLM 判斷兩個描述是否語意重複,透過並行處理加速。 Let LLM determine whether two descriptions are semantically duplicate, accelerated by parallel processing.
""" """
import asyncio import asyncio
import logging import logging
from typing import List, Tuple, Optional from typing import List, Tuple, Optional, Literal
import httpx import httpx
import numpy as np import numpy as np
@@ -18,6 +18,7 @@ from ..models.schemas import (
DeduplicationMethod, DeduplicationMethod,
DescriptionGroup, DescriptionGroup,
) )
from ..prompts.language_config import LanguageType
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -31,27 +32,20 @@ class LLMDeduplicationService:
         self.client = httpx.AsyncClient(timeout=60.0)
         self.max_concurrent = 5  # 最大並行數,避免 Ollama 過載

-    async def compare_pair(
-        self,
-        desc1: str,
-        desc2: str,
-        model: str,
-        semaphore: asyncio.Semaphore
-    ) -> bool:
-        """
-        讓 LLM 判斷兩個描述是否語意重複
-
-        Args:
-            desc1: 第一個描述
-            desc2: 第二個描述
-            model: LLM 模型名稱
-            semaphore: 並行控制信號量
-
-        Returns:
-            bool: 是否為重複描述
-        """
-        async with semaphore:  # 控制並行數
-            prompt = f"""判斷以下兩個創新描述是否表達相同或非常相似的概念:
+    def _get_comparison_prompt(self, desc1: str, desc2: str, lang: LanguageType = "zh") -> str:
+        """Get comparison prompt in the specified language"""
+        if lang == "en":
+            return f"""Determine whether the following two innovative descriptions express the same or very similar concepts:
+
+Description 1: {desc1}
+
+Description 2: {desc2}
+
+If both descriptions essentially express the same or very similar innovative concept, answer "YES"
+If the two descriptions express different innovative concepts, answer "NO"
+Only answer YES or NO, no other text"""
+        else:
+            return f"""判斷以下兩個創新描述是否表達相同或非常相似的概念:

 描述1: {desc1}
@@ -61,6 +55,30 @@ class LLMDeduplicationService:
 如果兩者描述不同的創新概念,回答 "NO"
 只回答 YES 或 NO,不要其他文字"""

+    async def compare_pair(
+        self,
+        desc1: str,
+        desc2: str,
+        model: str,
+        semaphore: asyncio.Semaphore,
+        lang: LanguageType = "zh"
+    ) -> bool:
+        """
+        Let LLM determine whether two descriptions are semantically duplicate
+
+        Args:
+            desc1: First description
+            desc2: Second description
+            model: LLM model name
+            semaphore: Concurrency control semaphore
+            lang: Language for the prompt
+
+        Returns:
+            bool: Whether the descriptions are duplicates
+        """
+        async with semaphore:  # Control concurrency
+            prompt = self._get_comparison_prompt(desc1, desc2, lang)
             try:
                 response = await self.client.post(
                     f"{self.base_url}/api/generate",
@@ -86,26 +104,28 @@ class LLMDeduplicationService:
     async def compare_batch(
         self,
         pairs: List[Tuple[int, int, str, str]],
-        model: str
+        model: str,
+        lang: LanguageType = "zh"
     ) -> List[Tuple[int, int, bool]]:
         """
-        並行批次比較多個描述對
+        Parallel batch comparison of multiple description pairs

         Args:
-            pairs: 待比較的配對列表 [(i, j, desc1, desc2), ...]
-            model: LLM 模型名稱
+            pairs: List of pairs to compare [(i, j, desc1, desc2), ...]
+            model: LLM model name
+            lang: Language for the prompt

         Returns:
-            比較結果列表 [(i, j, is_similar), ...]
+            List of comparison results [(i, j, is_similar), ...]
         """
         semaphore = asyncio.Semaphore(self.max_concurrent)

         async def compare_one(pair: Tuple[int, int, str, str]) -> Tuple[int, int, bool]:
             i, j, desc1, desc2 = pair
-            is_similar = await self.compare_pair(desc1, desc2, model, semaphore)
+            is_similar = await self.compare_pair(desc1, desc2, model, semaphore, lang)
             return (i, j, is_similar)

-        # 使用 asyncio.gather 並行執行所有比較
+        # Use asyncio.gather to execute all comparisons in parallel
         results = await asyncio.gather(*[compare_one(p) for p in pairs])
         return results
@@ -144,17 +164,19 @@ class LLMDeduplicationService:
     async def deduplicate(
         self,
         descriptions: List[ExpertTransformationDescription],
-        model: Optional[str] = None
+        model: Optional[str] = None,
+        lang: LanguageType = "zh"
     ) -> DeduplicationResult:
         """
-        使用 LLM 成對比較進行去重
+        Use LLM pairwise comparison for deduplication

         Args:
-            descriptions: 要去重的描述列表
-            model: LLM 模型名稱
+            descriptions: List of descriptions to deduplicate
+            model: LLM model name
+            lang: Language for the prompt

         Returns:
-            DeduplicationResult: 去重結果
+            DeduplicationResult: Deduplication result
         """
         model = model or self.default_model
@@ -188,10 +210,10 @@ class LLMDeduplicationService:
                 ))

         total_pairs = len(pairs)
-        logger.info(f"LLM deduplication: {total_pairs} pairs to compare (parallel={self.max_concurrent}, model={model})")
+        logger.info(f"LLM deduplication: {total_pairs} pairs to compare (parallel={self.max_concurrent}, model={model}, lang={lang})")

-        # 並行批次比較
-        results = await self.compare_batch(pairs, model)
+        # Parallel batch comparison
+        results = await self.compare_batch(pairs, model, lang)

         # 填入相似度矩陣
         for i, j, is_similar in results:

View File

@@ -0,0 +1,264 @@
"""Patent Search Service using Lens.org API"""
import httpx
import logging
from typing import List, Optional, Dict, Any
from dataclasses import dataclass, asdict
from app.config import settings
logger = logging.getLogger(__name__)
@dataclass
class PatentSearchResult:
"""Single patent search result from Lens.org"""
lens_id: str
doc_number: str
jurisdiction: str
kind: str
title: str
abstract: Optional[str]
date_published: Optional[str]
applicants: List[str]
inventors: List[str]
legal_status: Optional[str]
classifications_cpc: List[str]
families_simple: List[str]
url: str
def to_dict(self) -> Dict[str, Any]:
return asdict(self)
class PatentSearchService:
"""Service for searching patents using Lens.org API"""
LENS_API_URL = "https://api.lens.org/patent/search"
def __init__(self):
self._client: Optional[httpx.AsyncClient] = None
async def _get_client(self) -> httpx.AsyncClient:
if self._client is None or self._client.is_closed:
self._client = httpx.AsyncClient(
timeout=30.0,
follow_redirects=True,
)
return self._client
async def close(self):
if self._client and not self._client.is_closed:
await self._client.aclose()
def _get_headers(self) -> Dict[str, str]:
"""Get headers with authorization token"""
token = settings.lens_api_token
if not token:
raise ValueError("LENS_API_TOKEN environment variable is not set")
return {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
"Accept": "application/json",
}
async def search(
self,
query: str,
max_results: int = 10,
) -> dict:
"""
Search Lens.org for relevant patents
Args:
query: Search query (searches title, abstract, and claims)
max_results: Maximum number of results to return
Returns:
Dict with total_results count and list of patent results
"""
try:
client = await self._get_client()
# Build Lens.org query using query string format for full-text search
request_body = {
"query": query,
"size": max_results,
"sort": [{"_score": "desc"}]
}
logger.info(f"Searching Lens.org patents with query: {query[:100]}...")
response = await client.post(
self.LENS_API_URL,
json=request_body,
headers=self._get_headers(),
)
if response.status_code == 401:
logger.error("Lens.org API authentication failed - check LENS_API_TOKEN")
return {
"total_results": 0,
"patents": [],
"error": "Authentication failed - invalid API token"
}
if response.status_code == 429:
logger.warning("Lens.org API rate limit exceeded")
return {
"total_results": 0,
"patents": [],
"error": "Rate limit exceeded - please try again later"
}
if response.status_code != 200:
logger.error(f"Lens.org API returned status {response.status_code}: {response.text}")
return {
"total_results": 0,
"patents": [],
"error": f"API returned status {response.status_code}"
}
data = response.json()
total_results = data.get("total", 0)
results = data.get("data", [])
patents: List[PatentSearchResult] = []
for item in results:
patent = self._parse_patent(item)
patents.append(patent)
logger.info(f"Found {total_results} total patents, returning {len(patents)}")
return {
"total_results": total_results,
"patents": [p.to_dict() for p in patents],
}
except ValueError as e:
logger.error(f"Configuration error: {e}")
return {
"total_results": 0,
"patents": [],
"error": str(e)
}
except httpx.HTTPError as e:
logger.error(f"HTTP error searching patents: {e}")
return {
"total_results": 0,
"patents": [],
"error": str(e)
}
except Exception as e:
logger.error(f"Error searching patents: {e}")
return {
"total_results": 0,
"patents": [],
"error": str(e)
}
def _parse_patent(self, item: Dict[str, Any]) -> PatentSearchResult:
"""Parse a single patent result from Lens.org response"""
lens_id = item.get("lens_id", "")
jurisdiction = item.get("jurisdiction", "")
doc_number = item.get("doc_number", "")
kind = item.get("kind", "")
# Get biblio section (contains title, parties, classifications)
biblio = item.get("biblio", {})
# Extract title from biblio.invention_title (list with lang info)
title_data = biblio.get("invention_title", [])
title = self._extract_text_with_lang(title_data)
# Extract abstract (top-level, list with lang info)
abstract_data = item.get("abstract", [])
abstract = self._extract_text_with_lang(abstract_data)
# Extract applicants from biblio.parties.applicants
parties = biblio.get("parties", {})
applicants = []
applicant_data = parties.get("applicants", [])
if isinstance(applicant_data, list):
for app in applicant_data:
if isinstance(app, dict):
name = app.get("extracted_name", {}).get("value", "")
if name:
applicants.append(name)
# Extract inventors from biblio.parties.inventors
inventors = []
inventor_data = parties.get("inventors", [])
if isinstance(inventor_data, list):
for inv in inventor_data:
if isinstance(inv, dict):
name = inv.get("extracted_name", {}).get("value", "")
if name:
inventors.append(name)
# Extract legal status
legal_status_data = item.get("legal_status", {})
legal_status = None
if isinstance(legal_status_data, dict):
legal_status = legal_status_data.get("patent_status")
# Extract CPC classifications from biblio.classifications_cpc
classifications_cpc = []
cpc_data = biblio.get("classifications_cpc", [])
if isinstance(cpc_data, list):
for cpc in cpc_data:
if isinstance(cpc, dict):
symbol = cpc.get("symbol", "")
if symbol:
classifications_cpc.append(symbol)
# Extract simple family members
families_simple = []
families_data = item.get("families", {})
if isinstance(families_data, dict):
simple_family = families_data.get("simple", {})
if isinstance(simple_family, dict):
members = simple_family.get("members", [])
if isinstance(members, list):
families_simple = [m.get("lens_id", "") for m in members if isinstance(m, dict) and m.get("lens_id")]
# Build URL to Lens.org patent page
url = f"https://www.lens.org/lens/patent/{lens_id}" if lens_id else ""
return PatentSearchResult(
lens_id=lens_id,
doc_number=doc_number,
jurisdiction=jurisdiction,
kind=kind,
title=title,
abstract=abstract,
date_published=item.get("date_published"),
applicants=applicants,
inventors=inventors,
legal_status=legal_status,
classifications_cpc=classifications_cpc,
families_simple=families_simple,
url=url,
)
def _extract_text_with_lang(self, data: Any, prefer_lang: str = "en") -> str:
"""Extract text from Lens.org language-tagged list, preferring specified language"""
if not data:
return ""
if isinstance(data, str):
return data
if isinstance(data, list) and data:
# Prefer specified language
for item in data:
if isinstance(item, dict) and item.get("lang") == prefer_lang:
return item.get("text", "")
# Fall back to first item
first = data[0]
if isinstance(first, dict):
return first.get("text", "")
return str(first)
return ""
# Singleton instance
patent_search_service = PatentSearchService()

7
experiments/__init__.py Normal file
View File

@@ -0,0 +1,7 @@
"""
Experiment module for 5-condition idea generation study.
This module implements a 2×2 factorial design + control to test
the contributions of attribute decomposition and expert perspectives
to creative ideation quality.
"""

View File

@@ -0,0 +1,546 @@
"""
Statistical analysis for experiment results.
Performs:
- 2×2 ANOVA for main effects (attributes, experts) and interaction
- Post-hoc tests (Tukey HSD)
- Effect sizes (Cohen's d)
- Control comparison (C2 vs C5)
Usage:
python -m experiments.analyze_results --input results/experiment_xxx_metrics.json
"""
import sys
import json
import argparse
from pathlib import Path
from typing import List, Dict, Any, Tuple
from dataclasses import dataclass
import numpy as np
class NumpyEncoder(json.JSONEncoder):
"""JSON encoder that handles numpy types."""
def default(self, obj):
if isinstance(obj, (np.integer, np.int64, np.int32)):
return int(obj)
if isinstance(obj, (np.floating, np.float64, np.float32)):
return float(obj)
if isinstance(obj, (np.bool_, bool)):
return bool(obj)
if isinstance(obj, np.ndarray):
return obj.tolist()
return super().default(obj)
# Add experiments to path
sys.path.insert(0, str(Path(__file__).parent.parent))
from experiments.config import RESULTS_DIR
# Try to import statistical libraries
try:
from scipy import stats
SCIPY_AVAILABLE = True
except ImportError:
SCIPY_AVAILABLE = False
print("Warning: scipy not installed. Some statistical tests will be unavailable.")
try:
import pandas as pd
PANDAS_AVAILABLE = True
except ImportError:
PANDAS_AVAILABLE = False
@dataclass
class EffectSize:
"""Cohen's d effect size with interpretation."""
d: float
interpretation: str # small, medium, large
@staticmethod
def from_groups(group1: List[float], group2: List[float]) -> 'EffectSize':
"""Calculate Cohen's d from two groups."""
n1, n2 = len(group1), len(group2)
if n1 < 2 or n2 < 2:
return EffectSize(d=0, interpretation="insufficient data")
mean1, mean2 = np.mean(group1), np.mean(group2)
var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
# Pooled standard deviation
pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
if pooled_std == 0:
return EffectSize(d=0, interpretation="no variance")
d = (mean1 - mean2) / pooled_std
# Interpretation (Cohen's conventions)
abs_d = abs(d)
if abs_d < 0.2:
interpretation = "negligible"
elif abs_d < 0.5:
interpretation = "small"
elif abs_d < 0.8:
interpretation = "medium"
else:
interpretation = "large"
return EffectSize(d=round(d, 4), interpretation=interpretation)
@dataclass
class TTestResult:
"""Independent samples t-test result."""
t_statistic: float
p_value: float
effect_size: EffectSize
significant: bool # p < 0.05
group1_mean: float
group2_mean: float
group1_std: float
group2_std: float
group1_n: int
group2_n: int
@dataclass
class ANOVAResult:
"""2×2 ANOVA result."""
main_effect_attributes: Dict[str, float] # F, p
main_effect_experts: Dict[str, float] # F, p
interaction: Dict[str, float] # F, p
significant_effects: List[str]
def extract_metric_values(
metrics: Dict[str, Any],
metric_path: str
) -> Dict[str, List[float]]:
"""
Extract values for a specific metric across all queries.
Args:
metrics: Full metrics dict from compute_metrics.py
metric_path: Dot-separated path like "post_dedup_diversity.mean_pairwise_distance"
Returns:
Dict mapping condition name to list of values
"""
by_condition = {}
for query_metrics in metrics.get("metrics_by_query", []):
for condition, cond_metrics in query_metrics.get("conditions", {}).items():
if condition not in by_condition:
by_condition[condition] = []
# Navigate the metric path
value = cond_metrics
for key in metric_path.split("."):
if value is None:
break
if isinstance(value, dict):
value = value.get(key)
else:
value = None
if value is not None and isinstance(value, (int, float)):
by_condition[condition].append(float(value))
return by_condition
def perform_ttest(
group1: List[float],
group2: List[float],
group1_name: str = "Group 1",
group2_name: str = "Group 2"
) -> TTestResult:
"""Perform independent samples t-test."""
if not SCIPY_AVAILABLE:
return None
if len(group1) < 2 or len(group2) < 2:
return None
t_stat, p_value = stats.ttest_ind(group1, group2)
effect = EffectSize.from_groups(group1, group2)
return TTestResult(
t_statistic=round(t_stat, 4),
p_value=round(p_value, 4),
effect_size=effect,
significant=p_value < 0.05,
group1_mean=round(np.mean(group1), 4),
group2_mean=round(np.mean(group2), 4),
group1_std=round(np.std(group1, ddof=1), 4),
group2_std=round(np.std(group2, ddof=1), 4),
group1_n=len(group1),
group2_n=len(group2)
)
def perform_2x2_anova(
c1_direct: List[float], # No attributes, No experts
c2_expert: List[float], # No attributes, With experts
c3_attribute: List[float], # With attributes, No experts
c4_full: List[float] # With attributes, With experts
) -> ANOVAResult:
"""
Perform 2×2 factorial ANOVA.
Factors:
- Attributes: Without (C1, C2) vs With (C3, C4)
- Experts: Without (C1, C3) vs With (C2, C4)
"""
if not SCIPY_AVAILABLE:
return None
# Check minimum data
min_n = min(len(c1_direct), len(c2_expert), len(c3_attribute), len(c4_full))
if min_n < 2:
return None
# For a proper 2×2 ANOVA, we'd use statsmodels or similar
# Here we'll compute main effects and interaction manually
# Main effect of Attributes: (C3 + C4) vs (C1 + C2)
no_attr = c1_direct + c2_expert
with_attr = c3_attribute + c4_full
f_attr, p_attr = stats.f_oneway(no_attr, with_attr)
# Main effect of Experts: (C2 + C4) vs (C1 + C3)
no_expert = c1_direct + c3_attribute
with_expert = c2_expert + c4_full
f_expert, p_expert = stats.f_oneway(no_expert, with_expert)
# Interaction: Compare the difference of differences
# (C4 - C3) - (C2 - C1) = interaction term
# Simplified approach: compare all 4 groups
f_all, p_all = stats.f_oneway(c1_direct, c2_expert, c3_attribute, c4_full)
# Estimate interaction by checking if combination is super-additive
mean1, mean2, mean3, mean4 = np.mean(c1_direct), np.mean(c2_expert), np.mean(c3_attribute), np.mean(c4_full)
expected_additive = mean1 + (mean2 - mean1) + (mean3 - mean1) # Additive prediction
actual_combination = mean4
interaction_strength = actual_combination - expected_additive
significant_effects = []
if p_attr < 0.05:
significant_effects.append("Attributes")
if p_expert < 0.05:
significant_effects.append("Experts")
if p_all < 0.05 and abs(interaction_strength) > 0.01:
significant_effects.append("Interaction")
return ANOVAResult(
main_effect_attributes={"F": round(f_attr, 4), "p": round(p_attr, 4)},
main_effect_experts={"F": round(f_expert, 4), "p": round(p_expert, 4)},
interaction={
"F_all_groups": round(f_all, 4),
"p_all_groups": round(p_all, 4),
"interaction_strength": round(interaction_strength, 4),
"super_additive": interaction_strength > 0
},
significant_effects=significant_effects
)
def analyze_experiment(metrics: Dict[str, Any]) -> Dict[str, Any]:
"""
Perform full statistical analysis on experiment metrics.
Returns analysis results for multiple metrics.
"""
results = {
"analysis_metrics": [],
"research_questions": {}
}
# Define metrics to analyze
metrics_to_analyze = [
("Survival Rate", "survival_rate"),
("Post-Dedup Diversity", "post_dedup_diversity.mean_pairwise_distance"),
("Normalized Diversity", "normalized_diversity.mean_pairwise_distance"),
("Query Distance", "post_dedup_query_distance.mean_distance"),
("Cluster Count", "post_dedup_clusters.optimal_clusters"),
]
for metric_name, metric_path in metrics_to_analyze:
print(f"\n{'='*60}")
print(f"Analyzing: {metric_name}")
print(f"{'='*60}")
# Extract values by condition
by_condition = extract_metric_values(metrics, metric_path)
if not by_condition:
print(f" No data available for {metric_name}")
continue
metric_results = {
"metric_name": metric_name,
"metric_path": metric_path,
"descriptive": {},
"comparisons": {},
"anova": None
}
# Descriptive statistics
print(f"\nDescriptive Statistics:")
print(f"{'Condition':<25} {'Mean':<10} {'Std':<10} {'N':<5}")
print("-" * 50)
for cond, values in sorted(by_condition.items()):
if values:
mean = np.mean(values)
std = np.std(values, ddof=1) if len(values) > 1 else 0
metric_results["descriptive"][cond] = {
"mean": round(mean, 4),
"std": round(std, 4),
"n": len(values)
}
print(f"{cond:<25} {mean:<10.4f} {std:<10.4f} {len(values):<5}")
# Key comparisons
comparisons = []
# 1. C1 (Direct) vs C4 (Full Pipeline) - Main comparison
if "c1_direct" in by_condition and "c4_full_pipeline" in by_condition:
result = perform_ttest(
by_condition["c4_full_pipeline"],
by_condition["c1_direct"],
"Full Pipeline", "Direct"
)
if result:
comparisons.append(("C4 vs C1 (Full vs Direct)", result))
metric_results["comparisons"]["c4_vs_c1"] = {
"t": result.t_statistic,
"p": result.p_value,
"d": result.effect_size.d,
"interpretation": result.effect_size.interpretation,
"significant": result.significant
}
# 2. C2 (Expert) vs C5 (Random) - Control comparison
if "c2_expert_only" in by_condition and "c5_random_perspective" in by_condition:
result = perform_ttest(
by_condition["c2_expert_only"],
by_condition["c5_random_perspective"],
"Expert", "Random"
)
if result:
comparisons.append(("C2 vs C5 (Expert vs Random)", result))
metric_results["comparisons"]["c2_vs_c5"] = {
"t": result.t_statistic,
"p": result.p_value,
"d": result.effect_size.d,
"interpretation": result.effect_size.interpretation,
"significant": result.significant
}
# 3. C2 (Expert-Only) vs C1 (Direct) - Effect of experts alone
if "c2_expert_only" in by_condition and "c1_direct" in by_condition:
result = perform_ttest(
by_condition["c2_expert_only"],
by_condition["c1_direct"],
"Expert-Only", "Direct"
)
if result:
comparisons.append(("C2 vs C1 (Expert effect)", result))
metric_results["comparisons"]["c2_vs_c1"] = {
"t": result.t_statistic,
"p": result.p_value,
"d": result.effect_size.d,
"interpretation": result.effect_size.interpretation,
"significant": result.significant
}
# 4. C3 (Attribute-Only) vs C1 (Direct) - Effect of attributes alone
if "c3_attribute_only" in by_condition and "c1_direct" in by_condition:
result = perform_ttest(
by_condition["c3_attribute_only"],
by_condition["c1_direct"],
"Attribute-Only", "Direct"
)
if result:
comparisons.append(("C3 vs C1 (Attribute effect)", result))
metric_results["comparisons"]["c3_vs_c1"] = {
"t": result.t_statistic,
"p": result.p_value,
"d": result.effect_size.d,
"interpretation": result.effect_size.interpretation,
"significant": result.significant
}
# Print comparisons
if comparisons:
print(f"\nPairwise Comparisons:")
print(f"{'Comparison':<30} {'t':<10} {'p':<10} {'d':<10} {'Sig?':<8}")
print("-" * 68)
for name, result in comparisons:
sig = "Yes*" if result.significant else "No"
print(f"{name:<30} {result.t_statistic:<10.3f} {result.p_value:<10.4f} "
f"{result.effect_size.d:<10.3f} {sig:<8}")
# 2×2 ANOVA (if all conditions available)
if all(c in by_condition for c in ["c1_direct", "c2_expert_only", "c3_attribute_only", "c4_full_pipeline"]):
anova = perform_2x2_anova(
by_condition["c1_direct"],
by_condition["c2_expert_only"],
by_condition["c3_attribute_only"],
by_condition["c4_full_pipeline"]
)
if anova:
metric_results["anova"] = {
"main_effect_attributes": anova.main_effect_attributes,
"main_effect_experts": anova.main_effect_experts,
"interaction": anova.interaction,
"significant_effects": anova.significant_effects
}
print(f"\n2×2 ANOVA Results:")
print(f" Main Effect (Attributes): F={anova.main_effect_attributes['F']:.3f}, "
f"p={anova.main_effect_attributes['p']:.4f}")
print(f" Main Effect (Experts): F={anova.main_effect_experts['F']:.3f}, "
f"p={anova.main_effect_experts['p']:.4f}")
print(f" Interaction Strength: {anova.interaction['interaction_strength']:.4f} "
f"({'super-additive' if anova.interaction['super_additive'] else 'sub-additive'})")
print(f" Significant Effects: {', '.join(anova.significant_effects) or 'None'}")
results["analysis_metrics"].append(metric_results)
# Summarize research questions
results["research_questions"] = summarize_research_questions(results["analysis_metrics"])
return results
def summarize_research_questions(analysis_metrics: List[Dict]) -> Dict[str, str]:
"""Summarize findings for each research question."""
rq = {}
# Find the diversity metric results
diversity_results = None
for m in analysis_metrics:
if "Diversity" in m["metric_name"] and "Normalized" in m["metric_name"]:
diversity_results = m
break
if diversity_results is None:
for m in analysis_metrics:
if "Diversity" in m["metric_name"]:
diversity_results = m
break
if diversity_results:
anova = diversity_results.get("anova", {})
comparisons = diversity_results.get("comparisons", {})
# RQ1: Does attribute decomposition improve diversity?
if anova and "main_effect_attributes" in anova:
p = anova["main_effect_attributes"]["p"]
rq["RQ1_attributes"] = f"Main effect p={p:.4f}. " + \
("Significant effect of attributes." if p < 0.05 else "No significant effect.")
# RQ2: Do expert perspectives improve diversity?
if anova and "main_effect_experts" in anova:
p = anova["main_effect_experts"]["p"]
rq["RQ2_experts"] = f"Main effect p={p:.4f}. " + \
("Significant effect of experts." if p < 0.05 else "No significant effect.")
# RQ3: Interaction effect?
if anova and "interaction" in anova:
strength = anova["interaction"]["interaction_strength"]
super_add = anova["interaction"]["super_additive"]
rq["RQ3_interaction"] = f"Interaction strength={strength:.4f}. " + \
("Super-additive (combination better than sum)." if super_add else "Sub-additive or additive.")
# RQ5: Expert vs Random (C2 vs C5)
if "c2_vs_c5" in comparisons:
comp = comparisons["c2_vs_c5"]
rq["RQ5_expert_vs_random"] = f"d={comp['d']:.3f} ({comp['interpretation']}), p={comp['p']:.4f}. " + \
("Expert knowledge matters." if comp["significant"] and comp["d"] > 0 else "No significant difference from random perspectives.")
return rq
def print_research_summary(results: Dict[str, Any]):
"""Print summary of research question findings."""
print("\n" + "=" * 70)
print("RESEARCH QUESTIONS SUMMARY")
print("=" * 70)
rq = results.get("research_questions", {})
print("\nRQ1: Does attribute decomposition improve semantic diversity?")
print(f"{rq.get('RQ1_attributes', 'Insufficient data')}")
print("\nRQ2: Do expert perspectives improve semantic diversity?")
print(f"{rq.get('RQ2_experts', 'Insufficient data')}")
print("\nRQ3: Is there an interaction effect (Full Pipeline > sum of parts)?")
print(f"{rq.get('RQ3_interaction', 'Insufficient data')}")
print("\nRQ5: Do experts beat random perspectives? (C2 vs C5)")
print(f"{rq.get('RQ5_expert_vs_random', 'Insufficient data')}")
print("\n" + "=" * 70)
print("Note: With pilot data (n=1 query), statistical power is limited.")
print("Full experiment (n=10+ queries) needed for reliable conclusions.")
print("=" * 70)
def main():
parser = argparse.ArgumentParser(
description="Statistical analysis for experiment results"
)
parser.add_argument(
"--input",
type=str,
required=True,
help="Input metrics JSON file"
)
parser.add_argument(
"--output",
type=str,
help="Output file path (default: input_analysis.json)"
)
args = parser.parse_args()
input_path = Path(args.input)
if not input_path.exists():
input_path = RESULTS_DIR / args.input
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
sys.exit(1)
# Load metrics
with open(input_path, "r", encoding="utf-8") as f:
metrics = json.load(f)
# Run analysis
results = analyze_experiment(metrics)
# Print research summary
print_research_summary(results)
# Save results
if args.output:
output_path = Path(args.output)
else:
stem = input_path.stem.replace("_metrics", "")
output_path = input_path.parent / f"{stem}_analysis.json"
with open(output_path, "w", encoding="utf-8") as f:
json.dump(results, f, indent=2, ensure_ascii=False, cls=NumpyEncoder)
print(f"\nAnalysis saved to: {output_path}")
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,314 @@
# Human Assessment Web Interface
A standalone web application for human assessment of generated ideas using Torrance-inspired creativity metrics.
## Overview
This tool enables blind evaluation of creative ideas generated by the novelty-seeking experiment. Raters assess ideas on four dimensions without knowing which experimental condition produced each idea, ensuring unbiased evaluation.
## Quick Start
```bash
cd experiments/assessment
# 1. Prepare assessment data (if not already done)
python3 prepare_data.py
# 2. Start the system
./start.sh
# 3. Open browser
open http://localhost:5174
```
## Directory Structure
```
assessment/
├── backend/
│ ├── app.py # FastAPI backend API
│ ├── database.py # SQLite database operations
│ ├── models.py # Pydantic models & dimension definitions
│ └── requirements.txt # Python dependencies
├── frontend/
│ ├── src/
│ │ ├── components/ # React UI components
│ │ ├── hooks/ # React state management
│ │ ├── services/ # API client
│ │ └── types/ # TypeScript definitions
│ └── package.json
├── data/
│ └── assessment_items.json # Prepared ideas for rating
├── results/
│ └── ratings.db # SQLite database with ratings
├── prepare_data.py # Data preparation script
├── analyze_ratings.py # Inter-rater reliability analysis
├── start.sh # Start both servers
├── stop.sh # Stop all services
└── README.md # This file
```
## Data Preparation
### List Available Experiment Files
```bash
python3 prepare_data.py --list
```
Output:
```
Available experiment files (most recent first):
experiment_20260119_165650_deduped.json (1571.3 KB)
experiment_20260119_163040_deduped.json (156.4 KB)
```
### Prepare Assessment Data
```bash
# Use all ideas (not recommended for human assessment)
python3 prepare_data.py
# RECOMMENDED: Stratified sampling - 4 ideas per condition per query
# Results in ~200 ideas (5 conditions × 4 ideas × 10 queries)
python3 prepare_data.py --per-condition 4
# Alternative: Sample 150 ideas total (proportionally across queries)
python3 prepare_data.py --sample 150
# Limit per query (20 ideas max per query)
python3 prepare_data.py --per-query 20
# Combined: 4 per condition, max 15 per query
python3 prepare_data.py --per-condition 4 --per-query 15
# Specify a different experiment file
python3 prepare_data.py experiment_20260119_163040_deduped.json --per-condition 4
```
### Sampling Options
| Option | Description | Example |
|--------|-------------|---------|
| `--per-condition N` | Max N ideas per condition per query (stratified) | `--per-condition 4` → ~200 ideas |
| `--per-query N` | Max N ideas per query | `--per-query 20` |
| `--sample N` | Total N ideas (proportionally distributed) | `--sample 150` |
| `--seed N` | Random seed for reproducibility | `--seed 42` (default) |
**Recommendation**: Use `--per-condition 4` for balanced assessment across conditions.
The script (see the sketch after this list):
1. Loads the deduped experiment results
2. Extracts all unique ideas with hidden metadata (condition, expert, keyword)
3. Assigns stable IDs to each idea
4. Shuffles ideas within each query (reproducible with seed=42)
5. Outputs `data/assessment_items.json`
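
Combining the sampling options above with the seeded shuffle in step 4, the core logic can be sketched like this (a minimal sketch assuming each idea dict carries the hidden `condition` field used elsewhere in this tool; names are illustrative, not the actual `prepare_data.py` internals):

```python
import random
from collections import defaultdict

def stratified_sample(ideas: list[dict], per_condition: int, seed: int = 42) -> list[dict]:
    """Pick at most `per_condition` ideas from each condition, then shuffle."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_condition: dict[str, list[dict]] = defaultdict(list)
    for idea in ideas:
        by_condition[idea["_hidden"]["condition"]].append(idea)

    sampled: list[dict] = []
    for condition, group in sorted(by_condition.items()):
        rng.shuffle(group)
        sampled.extend(group[:per_condition])

    rng.shuffle(sampled)  # hide condition blocks from raters
    return sampled
```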
## Assessment Dimensions
Raters evaluate each idea on four dimensions using a 1-5 Likert scale:
### Originality
*How unexpected or surprising is this idea?*
| Score | Description |
|-------|-------------|
| 1 | Very common/obvious idea anyone would suggest |
| 2 | Somewhat common, slight variation on expected ideas |
| 3 | Moderately original, some unexpected elements |
| 4 | Quite original, notably different approach |
| 5 | Highly unexpected, truly novel concept |
### Elaboration
*How detailed and well-developed is this idea?*
| Score | Description |
|-------|-------------|
| 1 | Vague, minimal detail, just a concept |
| 2 | Basic idea with little specificity |
| 3 | Moderately detailed, some specifics provided |
| 4 | Well-developed with clear implementation hints |
| 5 | Highly specific, thoroughly developed concept |
### Coherence
*Does this idea make logical sense and relate to the query object?*
| Score | Description |
|-------|-------------|
| 1 | Nonsensical, irrelevant, or incomprehensible |
| 2 | Mostly unclear, weak connection to query |
| 3 | Partially coherent, some logical gaps |
| 4 | Mostly coherent with minor issues |
| 5 | Fully coherent, clearly relates to query |
### Usefulness
*Could this idea have practical value or inspire real innovation?*
| Score | Description |
|-------|-------------|
| 1 | No practical value whatsoever |
| 2 | Minimal usefulness, highly impractical |
| 3 | Some potential value with major limitations |
| 4 | Useful idea with realistic applications |
| 5 | Highly useful, clear practical value |
## Running the System
### Start
```bash
./start.sh
```
This will:
1. Check for `data/assessment_items.json` (runs `prepare_data.py` if missing)
2. Install frontend dependencies if needed
3. Start backend API on port 8002
4. Start frontend dev server on port 5174
### Stop
```bash
./stop.sh
```
Or press `Ctrl+C` in the terminal running `start.sh`.
### Manual Start (Development)
```bash
# Terminal 1: Backend
cd backend
../../../backend/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8002 --reload
# Terminal 2: Frontend
cd frontend
npm run dev
```
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check |
| `/api/info` | GET | Experiment info (total ideas, queries, conditions) |
| `/api/dimensions` | GET | Dimension definitions for UI |
| `/api/raters` | GET | List all raters |
| `/api/raters` | POST | Register/login rater |
| `/api/queries` | GET | List all queries |
| `/api/queries/{id}` | GET | Get query with all ideas |
| `/api/queries/{id}/unrated?rater_id=X` | GET | Get unrated ideas for rater |
| `/api/ratings` | POST | Submit a rating |
| `/api/progress/{rater_id}` | GET | Get rater's progress |
| `/api/statistics` | GET | Overall statistics |
| `/api/export` | GET | Export all ratings with metadata |
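
A minimal rating round-trip against the backend (assuming the default port 8002; the IDs are illustrative):

```python
import httpx

BASE = "http://localhost:8002"

# Register (or re-login) a rater
httpx.post(f"{BASE}/api/raters", json={"rater_id": "rater_01", "name": "Alice"})

# Submit a rating; all four dimensions are required unless skipping
httpx.post(f"{BASE}/api/ratings", json={
    "rater_id": "rater_01",
    "idea_id": "q1_idea_003",   # illustrative ID
    "query_id": "q1",           # illustrative ID
    "originality": 4,
    "elaboration": 3,
    "coherence": 5,
    "usefulness": 4,
    "skipped": False,
})

# Check overall progress for the rater
print(httpx.get(f"{BASE}/api/progress/rater_01").json())
```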
## Analysis
After collecting ratings from multiple raters:
```bash
python3 analyze_ratings.py
```
This calculates:
- **Krippendorff's alpha**: Inter-rater reliability for ordinal data (formula below)
- **ICC(2,1)**: Intraclass Correlation Coefficient with 95% CI
- **Mean ratings per condition**: Compare experimental conditions
- **Kruskal-Wallis test**: Statistical significance between conditions
Output is saved to `results/analysis_results.json`.
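
As implemented in `calculate_krippendorff_alpha` (shown later in this diff), alpha compares observed within-item disagreement to the disagreement expected by chance, using squared differences as the ordinal distance:

$$
\alpha = 1 - \frac{D_o}{D_e}, \qquad
D_o = \operatorname{mean}_{\text{same item}}\bigl[(x_i - x_j)^2\bigr], \qquad
D_e = \operatorname{mean}_{\text{all pairs}}\bigl[(x_u - x_v)^2\bigr]
$$

Values near 1 indicate strong agreement; values at or below 0 indicate agreement no better than chance.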
## Database Schema
SQLite database (`results/ratings.db`):
```sql
-- Raters
CREATE TABLE raters (
rater_id TEXT PRIMARY KEY,
name TEXT,
created_at TIMESTAMP
);
-- Ratings
CREATE TABLE ratings (
id INTEGER PRIMARY KEY,
rater_id TEXT,
idea_id TEXT,
query_id TEXT,
originality INTEGER CHECK(originality BETWEEN 1 AND 5),
elaboration INTEGER CHECK(elaboration BETWEEN 1 AND 5),
coherence INTEGER CHECK(coherence BETWEEN 1 AND 5),
usefulness INTEGER CHECK(usefulness BETWEEN 1 AND 5),
skipped INTEGER DEFAULT 0,
timestamp TIMESTAMP,
UNIQUE(rater_id, idea_id)
);
-- Progress tracking
CREATE TABLE progress (
rater_id TEXT,
query_id TEXT,
completed_count INTEGER,
total_count INTEGER,
PRIMARY KEY (rater_id, query_id)
);
```
## Blind Assessment Design
To ensure unbiased evaluation:
1. **Randomization**: Ideas are shuffled within each query using a fixed seed (42) for reproducibility
2. **Hidden metadata**: Condition, expert name, and keywords are stored but not shown to raters (see the sketch after this list)
3. **Consistent ordering**: All raters see the same randomized order
4. **Context provided**: Only the query text is shown (e.g., "Chair", "Bicycle")
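
The hidden-metadata split can be sketched as follows (illustrative values; the visible shape mirrors `IdeaForRating` in `backend/models.py`):

```python
# What is stored in assessment_items.json (per idea)
stored = {
    "idea_id": "q1_idea_003",  # stable ID (illustrative value)
    "text": "Umbrella that charges phones from raindrop impacts",
    "_hidden": {               # kept server-side, never sent to the rating UI
        "condition": "c4_full_pipeline",
        "expert_name": "Piezoelectric engineer",
        "keyword": "energy harvesting",
    },
}

# What the rating UI receives
visible = {k: stored[k] for k in ("idea_id", "text")}
```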
## Workflow for Raters
1. **Login**: Enter a unique rater ID
2. **Instructions**: Read dimension definitions (shown before first rating)
3. **Rate ideas**: For each idea:
- Read the idea text
- Rate all 4 dimensions (1-5)
- Click "Submit & Next" or "Skip"
4. **Progress**: Track completion per query and overall
5. **Completion**: Summary shown when all ideas are rated
## Troubleshooting
### Backend won't start
```bash
# Check if port 8002 is in use
lsof -i :8002
# Check backend logs
cat /tmp/assessment_backend.log
```
### Frontend won't start
```bash
# Reinstall dependencies
cd frontend
rm -rf node_modules
npm install
```
### Reset database
```bash
rm results/ratings.db
# Database is auto-created on next backend start
```
### Regenerate assessment data
```bash
rm data/assessment_items.json
python3 prepare_data.py
```
## Tech Stack
- **Backend**: Python 3.11+, FastAPI, SQLite, Pydantic
- **Frontend**: React 19, TypeScript, Vite, Ant Design 6.0
- **Analysis**: NumPy, SciPy (for statistical tests)

View File

@@ -0,0 +1,356 @@
#!/usr/bin/env python3
"""
Analyze assessment ratings for inter-rater reliability and condition comparisons.
This script:
1. Loads ratings from the SQLite database
2. Joins with hidden metadata (condition, expert)
3. Calculates inter-rater reliability metrics
4. Computes mean ratings per dimension per condition
5. Performs statistical comparisons between conditions
"""
import json
import sqlite3
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Any
import numpy as np
from scipy import stats
# Paths
RESULTS_DIR = Path(__file__).parent / 'results'
DATA_DIR = Path(__file__).parent / 'data'
DB_PATH = RESULTS_DIR / 'ratings.db'
ASSESSMENT_DATA_PATH = DATA_DIR / 'assessment_items.json'
def load_assessment_data() -> dict[str, Any]:
"""Load the assessment items data with hidden metadata."""
with open(ASSESSMENT_DATA_PATH, 'r', encoding='utf-8') as f:
return json.load(f)
def load_ratings_from_db() -> list[dict[str, Any]]:
"""Load all ratings from the SQLite database."""
if not DB_PATH.exists():
print(f"Database not found at {DB_PATH}")
return []
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
cursor.execute('''
SELECT r.*, rat.name as rater_name
FROM ratings r
LEFT JOIN raters rat ON r.rater_id = rat.rater_id
WHERE r.skipped = 0
''')
ratings = [dict(row) for row in cursor.fetchall()]
conn.close()
return ratings
def build_idea_lookup(assessment_data: dict[str, Any]) -> dict[str, dict[str, Any]]:
"""Build a lookup table from idea_id to metadata."""
lookup = {}
for query in assessment_data['queries']:
for idea in query['ideas']:
lookup[idea['idea_id']] = {
'text': idea['text'],
'query_id': query['query_id'],
'query_text': query['query_text'],
**idea['_hidden']
}
return lookup
def calculate_krippendorff_alpha(ratings_matrix: np.ndarray) -> float:
"""
Calculate Krippendorff's alpha for ordinal data.
Args:
ratings_matrix: 2D array where rows are items and columns are raters.
NaN values indicate missing ratings.
Returns:
Krippendorff's alpha coefficient
"""
# Remove items with fewer than 2 raters
valid_items = ~np.all(np.isnan(ratings_matrix), axis=1)
ratings_matrix = ratings_matrix[valid_items]
if ratings_matrix.shape[0] < 2:
return np.nan
n_items, n_raters = ratings_matrix.shape
# Observed disagreement
observed_disagreement = 0
n_pairs = 0
for i in range(n_items):
values = ratings_matrix[i, ~np.isnan(ratings_matrix[i])]
if len(values) < 2:
continue
# Ordinal distance: squared difference
for j in range(len(values)):
for k in range(j + 1, len(values)):
observed_disagreement += (values[j] - values[k]) ** 2
n_pairs += 1
if n_pairs == 0:
return np.nan
observed_disagreement /= n_pairs
# Expected disagreement (based on marginal distribution)
all_values = ratings_matrix[~np.isnan(ratings_matrix)]
if len(all_values) < 2:
return np.nan
expected_disagreement = 0
n_total_pairs = 0
for i in range(len(all_values)):
for j in range(i + 1, len(all_values)):
expected_disagreement += (all_values[i] - all_values[j]) ** 2
n_total_pairs += 1
if n_total_pairs == 0:
return np.nan
expected_disagreement /= n_total_pairs
if expected_disagreement == 0:
return 1.0
alpha = 1 - (observed_disagreement / expected_disagreement)
return alpha
def calculate_icc(ratings_matrix: np.ndarray) -> tuple[float, float, float]:
"""
Calculate Intraclass Correlation Coefficient (ICC(2,1)).
Args:
ratings_matrix: 2D array where rows are items and columns are raters.
Returns:
Tuple of (ICC, lower_bound, upper_bound)
"""
# Remove rows with any NaN
valid_rows = ~np.any(np.isnan(ratings_matrix), axis=1)
ratings_matrix = ratings_matrix[valid_rows]
if ratings_matrix.shape[0] < 2 or ratings_matrix.shape[1] < 2:
return np.nan, np.nan, np.nan
n, k = ratings_matrix.shape
# Grand mean
grand_mean = np.mean(ratings_matrix)
# Row means (item means)
row_means = np.mean(ratings_matrix, axis=1)
# Column means (rater means)
col_means = np.mean(ratings_matrix, axis=0)
# Sum of squares
ss_total = np.sum((ratings_matrix - grand_mean) ** 2)
ss_rows = k * np.sum((row_means - grand_mean) ** 2)
ss_cols = n * np.sum((col_means - grand_mean) ** 2)
ss_error = ss_total - ss_rows - ss_cols
# Mean squares
ms_rows = ss_rows / (n - 1) if n > 1 else 0
ms_cols = ss_cols / (k - 1) if k > 1 else 0
ms_error = ss_error / ((n - 1) * (k - 1)) if (n > 1 and k > 1) else 0
# ICC(2,1) - two-way random, absolute agreement, single rater
if ms_error + (ms_cols - ms_error) / n == 0:
return np.nan, np.nan, np.nan
icc = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
# Confidence interval (approximate)
# Using F distribution
df1 = n - 1
df2 = (n - 1) * (k - 1)
if ms_error == 0:
return icc, np.nan, np.nan
f_value = ms_rows / ms_error
f_lower = f_value / stats.f.ppf(0.975, df1, df2)
f_upper = f_value / stats.f.ppf(0.025, df1, df2)
icc_lower = (f_lower - 1) / (f_lower + k - 1)
icc_upper = (f_upper - 1) / (f_upper + k - 1)
return icc, icc_lower, icc_upper
def analyze_ratings():
"""Main analysis function."""
print("=" * 60)
print("CREATIVE IDEA ASSESSMENT ANALYSIS")
print("=" * 60)
print()
# Load data
assessment_data = load_assessment_data()
ratings = load_ratings_from_db()
idea_lookup = build_idea_lookup(assessment_data)
if not ratings:
print("No ratings found in database.")
return
print(f"Loaded {len(ratings)} ratings from database")
print(f"Experiment ID: {assessment_data['experiment_id']}")
print()
# Get unique raters
raters = list(set(r['rater_id'] for r in ratings))
print(f"Raters: {raters}")
print()
# Join ratings with metadata
enriched_ratings = []
for r in ratings:
idea_meta = idea_lookup.get(r['idea_id'], {})
enriched_ratings.append({
**r,
'condition': idea_meta.get('condition', 'unknown'),
'expert_name': idea_meta.get('expert_name', ''),
'keyword': idea_meta.get('keyword', ''),
'query_text': idea_meta.get('query_text', ''),
'idea_text': idea_meta.get('text', '')
})
# Dimensions
dimensions = ['originality', 'elaboration', 'coherence', 'usefulness']
# ================================
# Inter-rater reliability
# ================================
print("-" * 60)
print("INTER-RATER RELIABILITY")
print("-" * 60)
print()
if len(raters) >= 2:
# Build ratings matrix per dimension
idea_ids = list(set(r['idea_id'] for r in enriched_ratings))
for dim in dimensions:
# Create matrix: rows = ideas, cols = raters
matrix = np.full((len(idea_ids), len(raters)), np.nan)
idea_to_idx = {idea: idx for idx, idea in enumerate(idea_ids)}
rater_to_idx = {rater: idx for idx, rater in enumerate(raters)}
for r in enriched_ratings:
if r[dim] is not None:
i = idea_to_idx[r['idea_id']]
j = rater_to_idx[r['rater_id']]
matrix[i, j] = r[dim]
# Calculate metrics
alpha = calculate_krippendorff_alpha(matrix)
icc, icc_low, icc_high = calculate_icc(matrix)
print(f"{dim.upper()}:")
print(f" Krippendorff's alpha: {alpha:.3f}")
print(f" ICC(2,1): {icc:.3f} (95% CI: {icc_low:.3f} - {icc_high:.3f})")
print()
else:
print("Need at least 2 raters for inter-rater reliability analysis.")
print()
# ================================
# Condition comparisons
# ================================
print("-" * 60)
print("MEAN RATINGS BY CONDITION")
print("-" * 60)
print()
# Group ratings by condition
condition_ratings: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
for r in enriched_ratings:
condition = r['condition']
for dim in dimensions:
if r[dim] is not None:
condition_ratings[condition][dim].append(r[dim])
# Calculate means and print
condition_stats = {}
for condition in sorted(condition_ratings.keys()):
print(f"\n{condition}:")
condition_stats[condition] = {}
for dim in dimensions:
values = condition_ratings[condition][dim]
if values:
mean = np.mean(values)
std = np.std(values)
n = len(values)
condition_stats[condition][dim] = {'mean': mean, 'std': std, 'n': n}
print(f" {dim}: {mean:.2f} (SD={std:.2f}, n={n})")
else:
print(f" {dim}: no data")
# ================================
# Statistical comparisons
# ================================
print()
print("-" * 60)
print("STATISTICAL COMPARISONS (Kruskal-Wallis)")
print("-" * 60)
print()
conditions = sorted(condition_ratings.keys())
if len(conditions) >= 2:
for dim in dimensions:
groups = [condition_ratings[c][dim] for c in conditions if condition_ratings[c][dim]]
if len(groups) >= 2:
h_stat, p_value = stats.kruskal(*groups)
sig = "*" if p_value < 0.05 else ""
print(f"{dim}: H={h_stat:.2f}, p={p_value:.4f} {sig}")
else:
print(f"{dim}: insufficient data for comparison")
else:
print("Need at least 2 conditions with data for statistical comparison.")
# ================================
# Export results
# ================================
output = {
'analysis_timestamp': datetime.utcnow().isoformat(),
'experiment_id': assessment_data['experiment_id'],
'total_ratings': len(ratings),
'raters': raters,
'rater_count': len(raters),
'condition_stats': condition_stats,
'enriched_ratings': enriched_ratings
}
output_path = RESULTS_DIR / 'analysis_results.json'
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(output, f, ensure_ascii=False, indent=2, default=str)
print()
print("-" * 60)
print(f"Results exported to: {output_path}")
print("=" * 60)
if __name__ == '__main__':
analyze_ratings()

View File

@@ -0,0 +1 @@
"""Assessment backend package."""

View File

@@ -0,0 +1,374 @@
"""
FastAPI backend for human assessment of creative ideas.
"""
import json
from datetime import datetime
from pathlib import Path
from typing import Any
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
try:
from . import database as db
from .models import (
DIMENSION_DEFINITIONS,
ExportData,
ExportRating,
IdeaForRating,
Progress,
QueryInfo,
QueryWithIdeas,
Rater,
RaterCreate,
RaterProgress,
Rating,
RatingSubmit,
Statistics,
)
except ImportError:
import database as db
from models import (
DIMENSION_DEFINITIONS,
ExportData,
ExportRating,
IdeaForRating,
Progress,
QueryInfo,
QueryWithIdeas,
Rater,
RaterCreate,
RaterProgress,
Rating,
RatingSubmit,
Statistics,
)
# Load assessment data
DATA_PATH = Path(__file__).parent.parent / 'data' / 'assessment_items.json'
def load_assessment_data() -> dict[str, Any]:
"""Load the assessment items data."""
if not DATA_PATH.exists():
raise RuntimeError(f"Assessment data not found at {DATA_PATH}. Run prepare_data.py first.")
with open(DATA_PATH, 'r', encoding='utf-8') as f:
return json.load(f)
# Initialize FastAPI app
app = FastAPI(
title="Creative Idea Assessment API",
description="API for human assessment of creative ideas using Torrance-inspired metrics",
version="1.0.0"
)
# CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Cache for assessment data
_assessment_data: dict[str, Any] | None = None
def get_assessment_data() -> dict[str, Any]:
"""Get cached assessment data."""
global _assessment_data
if _assessment_data is None:
_assessment_data = load_assessment_data()
return _assessment_data
# Rater endpoints
@app.get("/api/raters", response_model=list[Rater])
def list_raters() -> list[dict[str, Any]]:
"""List all registered raters."""
return db.list_raters()
@app.post("/api/raters", response_model=Rater)
def create_or_get_rater(rater_data: RaterCreate) -> dict[str, Any]:
"""Register a new rater or get existing one."""
return db.create_rater(rater_data.rater_id, rater_data.name)
@app.get("/api/raters/{rater_id}", response_model=Rater)
def get_rater(rater_id: str) -> dict[str, Any]:
"""Get a specific rater."""
rater = db.get_rater(rater_id)
if not rater:
raise HTTPException(status_code=404, detail="Rater not found")
return rater
# Query endpoints
@app.get("/api/queries", response_model=list[QueryInfo])
def list_queries() -> list[dict[str, Any]]:
"""List all queries available for assessment."""
data = get_assessment_data()
return [
{
'query_id': q['query_id'],
'query_text': q['query_text'],
'category': q.get('category', ''),
'idea_count': q['idea_count']
}
for q in data['queries']
]
@app.get("/api/queries/{query_id}", response_model=QueryWithIdeas)
def get_query_with_ideas(query_id: str) -> dict[str, Any]:
"""Get a query with all its ideas for rating (without hidden metadata)."""
data = get_assessment_data()
for query in data['queries']:
if query['query_id'] == query_id:
ideas = [
IdeaForRating(
idea_id=idea['idea_id'],
text=idea['text'],
index=idx
)
for idx, idea in enumerate(query['ideas'])
]
return QueryWithIdeas(
query_id=query['query_id'],
query_text=query['query_text'],
category=query.get('category', ''),
ideas=ideas,
total_count=len(ideas)
)
raise HTTPException(status_code=404, detail="Query not found")
@app.get("/api/queries/{query_id}/unrated", response_model=QueryWithIdeas)
def get_unrated_ideas(query_id: str, rater_id: str) -> dict[str, Any]:
"""Get unrated ideas for a query by a specific rater."""
data = get_assessment_data()
for query in data['queries']:
if query['query_id'] == query_id:
# Get already rated idea IDs
rated_ids = db.get_rated_idea_ids(rater_id, query_id)
# Filter to unrated ideas
unrated_ideas = [
IdeaForRating(
idea_id=idea['idea_id'],
text=idea['text'],
index=idx
)
for idx, idea in enumerate(query['ideas'])
if idea['idea_id'] not in rated_ids
]
return QueryWithIdeas(
query_id=query['query_id'],
query_text=query['query_text'],
category=query.get('category', ''),
ideas=unrated_ideas,
total_count=query['idea_count']
)
raise HTTPException(status_code=404, detail="Query not found")
# Rating endpoints
@app.post("/api/ratings", response_model=dict[str, Any])
def submit_rating(rating: RatingSubmit) -> dict[str, Any]:
"""Submit a rating for an idea."""
# Validate that rater exists
rater = db.get_rater(rating.rater_id)
if not rater:
raise HTTPException(status_code=404, detail="Rater not found. Please register first.")
# Validate idea exists
data = get_assessment_data()
idea_found = False
for query in data['queries']:
for idea in query['ideas']:
if idea['idea_id'] == rating.idea_id:
idea_found = True
break
if idea_found:
break
if not idea_found:
raise HTTPException(status_code=404, detail="Idea not found")
# If not skipped, require all ratings
if not rating.skipped:
if rating.originality is None or rating.elaboration is None or rating.coherence is None or rating.usefulness is None:
raise HTTPException(
status_code=400,
detail="All dimensions must be rated unless skipping"
)
# Save rating
return db.save_rating(
rater_id=rating.rater_id,
idea_id=rating.idea_id,
query_id=rating.query_id,
originality=rating.originality,
elaboration=rating.elaboration,
coherence=rating.coherence,
usefulness=rating.usefulness,
skipped=rating.skipped
)
@app.get("/api/ratings/{rater_id}/{idea_id}", response_model=Rating | None)
def get_rating(rater_id: str, idea_id: str) -> dict[str, Any] | None:
"""Get a specific rating."""
return db.get_rating(rater_id, idea_id)
@app.get("/api/ratings/rater/{rater_id}", response_model=list[Rating])
def get_ratings_by_rater(rater_id: str) -> list[dict[str, Any]]:
"""Get all ratings by a rater."""
return db.get_ratings_by_rater(rater_id)
# Progress endpoints
@app.get("/api/progress/{rater_id}", response_model=RaterProgress)
def get_rater_progress(rater_id: str) -> RaterProgress:
"""Get complete progress for a rater."""
rater = db.get_rater(rater_id)
if not rater:
raise HTTPException(status_code=404, detail="Rater not found")
data = get_assessment_data()
# Get rated idea counts per query
ratings = db.get_ratings_by_rater(rater_id)
ratings_per_query: dict[str, int] = {}
for r in ratings:
qid = r['query_id']
ratings_per_query[qid] = ratings_per_query.get(qid, 0) + 1
# Build progress list
query_progress = []
total_completed = 0
total_ideas = 0
for query in data['queries']:
qid = query['query_id']
completed = ratings_per_query.get(qid, 0)
total = query['idea_count']
query_progress.append(Progress(
rater_id=rater_id,
query_id=qid,
completed_count=completed,
total_count=total
))
total_completed += completed
total_ideas += total
percentage = (total_completed / total_ideas * 100) if total_ideas > 0 else 0
return RaterProgress(
rater_id=rater_id,
queries=query_progress,
total_completed=total_completed,
total_ideas=total_ideas,
percentage=round(percentage, 1)
)
# Statistics endpoint
@app.get("/api/statistics", response_model=Statistics)
def get_statistics() -> Statistics:
"""Get overall assessment statistics."""
stats = db.get_statistics()
return Statistics(**stats)
# Dimension definitions endpoint
@app.get("/api/dimensions")
def get_dimensions() -> dict[str, Any]:
"""Get dimension definitions for the UI."""
return DIMENSION_DEFINITIONS
# Export endpoint
@app.get("/api/export", response_model=ExportData)
def export_ratings() -> ExportData:
"""Export all ratings with hidden metadata for analysis."""
data = get_assessment_data()
all_ratings = db.get_all_ratings()
# Build idea lookup with hidden metadata
idea_lookup: dict[str, dict[str, Any]] = {}
query_lookup: dict[str, str] = {}
for query in data['queries']:
query_lookup[query['query_id']] = query['query_text']
for idea in query['ideas']:
idea_lookup[idea['idea_id']] = {
'text': idea['text'],
'condition': idea['_hidden']['condition'],
'expert_name': idea['_hidden']['expert_name'],
'keyword': idea['_hidden']['keyword']
}
# Build export ratings
export_ratings = []
for r in all_ratings:
idea_data = idea_lookup.get(r['idea_id'], {})
export_ratings.append(ExportRating(
rater_id=r['rater_id'],
idea_id=r['idea_id'],
query_id=r['query_id'],
query_text=query_lookup.get(r['query_id'], ''),
idea_text=idea_data.get('text', ''),
originality=r['originality'],
elaboration=r['elaboration'],
coherence=r['coherence'],
usefulness=r['usefulness'],
skipped=bool(r['skipped']),
condition=idea_data.get('condition', ''),
expert_name=idea_data.get('expert_name', ''),
keyword=idea_data.get('keyword', ''),
timestamp=r['timestamp']
))
return ExportData(
experiment_id=data['experiment_id'],
export_timestamp=datetime.utcnow(),
rater_count=len(db.list_raters()),
rating_count=len(export_ratings),
ratings=export_ratings
)
# Health check
@app.get("/api/health")
def health_check() -> dict[str, str]:
"""Health check endpoint."""
return {"status": "healthy"}
# Info endpoint
@app.get("/api/info")
def get_info() -> dict[str, Any]:
"""Get assessment session info."""
data = get_assessment_data()
return {
'experiment_id': data['experiment_id'],
'total_ideas': data['total_ideas'],
'query_count': data['query_count'],
'conditions': data['conditions'],
'randomization_seed': data['randomization_seed']
}
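
For reference, a minimal sketch of a client driving these endpoints end to end (hypothetical rater `r01`; assumes the backend is reachable on port 8002, the target of the Vite dev proxy later in this diff, and that the `requests` package is installed, which is not in the backend requirements):

```python
# Hypothetical smoke test for the assessment API; not part of the diff.
import requests

BASE = "http://localhost:8002/api"

# Register (or resume as) a rater.
rater = requests.post(f"{BASE}/raters",
                      json={"rater_id": "r01", "name": "Pilot Rater"}).json()

# Fetch the first query's unrated ideas for this rater.
queries = requests.get(f"{BASE}/queries").json()
unrated = requests.get(
    f"{BASE}/queries/{queries[0]['query_id']}/unrated",
    params={"rater_id": rater["rater_id"]},
).json()

# Submit a full rating for the first unrated idea; all four dimensions
# are required unless skipped=True.
if unrated["ideas"]:
    idea = unrated["ideas"][0]
    saved = requests.post(f"{BASE}/ratings", json={
        "rater_id": rater["rater_id"],
        "idea_id": idea["idea_id"],
        "query_id": unrated["query_id"],
        "originality": 4, "elaboration": 3, "coherence": 5, "usefulness": 3,
        "skipped": False,
    }).json()
    print(saved)  # {'rater_id': 'r01', 'idea_id': '...', 'saved': True}
```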


@@ -0,0 +1,309 @@
"""
SQLite database setup and operations for assessment ratings storage.
"""
import sqlite3
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path
from typing import Any, Generator
# Database path
DB_PATH = Path(__file__).parent.parent / 'results' / 'ratings.db'
def get_db_path() -> Path:
"""Get the database path, ensuring directory exists."""
DB_PATH.parent.mkdir(parents=True, exist_ok=True)
return DB_PATH
@contextmanager
def get_connection() -> Generator[sqlite3.Connection, None, None]:
"""Get a database connection as a context manager."""
conn = sqlite3.connect(get_db_path())
conn.row_factory = sqlite3.Row
try:
yield conn
finally:
conn.close()
def init_db() -> None:
"""Initialize the database with required tables."""
with get_connection() as conn:
cursor = conn.cursor()
# Raters table
cursor.execute('''
CREATE TABLE IF NOT EXISTS raters (
rater_id TEXT PRIMARY KEY,
name TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Ratings table
cursor.execute('''
CREATE TABLE IF NOT EXISTS ratings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
rater_id TEXT NOT NULL,
idea_id TEXT NOT NULL,
query_id TEXT NOT NULL,
originality INTEGER CHECK(originality BETWEEN 1 AND 5),
elaboration INTEGER CHECK(elaboration BETWEEN 1 AND 5),
coherence INTEGER CHECK(coherence BETWEEN 1 AND 5),
usefulness INTEGER CHECK(usefulness BETWEEN 1 AND 5),
skipped INTEGER DEFAULT 0,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (rater_id) REFERENCES raters(rater_id),
UNIQUE(rater_id, idea_id)
)
''')
# Progress table
cursor.execute('''
CREATE TABLE IF NOT EXISTS progress (
rater_id TEXT NOT NULL,
query_id TEXT NOT NULL,
completed_count INTEGER DEFAULT 0,
total_count INTEGER DEFAULT 0,
started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (rater_id, query_id),
FOREIGN KEY (rater_id) REFERENCES raters(rater_id)
)
''')
# Create indexes for common queries
cursor.execute('''
CREATE INDEX IF NOT EXISTS idx_ratings_rater
ON ratings(rater_id)
''')
cursor.execute('''
CREATE INDEX IF NOT EXISTS idx_ratings_idea
ON ratings(idea_id)
''')
conn.commit()
# Rater operations
def create_rater(rater_id: str, name: str | None = None) -> dict[str, Any]:
"""Create a new rater."""
with get_connection() as conn:
cursor = conn.cursor()
try:
cursor.execute(
'INSERT INTO raters (rater_id, name) VALUES (?, ?)',
(rater_id, name or rater_id)
)
conn.commit()
return {'rater_id': rater_id, 'name': name or rater_id, 'created': True}
except sqlite3.IntegrityError:
# Rater already exists
return get_rater(rater_id)
def get_rater(rater_id: str) -> dict[str, Any] | None:
"""Get a rater by ID."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('SELECT * FROM raters WHERE rater_id = ?', (rater_id,))
row = cursor.fetchone()
if row:
return dict(row)
return None
def list_raters() -> list[dict[str, Any]]:
"""List all raters."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('SELECT * FROM raters ORDER BY created_at')
return [dict(row) for row in cursor.fetchall()]
# Rating operations
def save_rating(
rater_id: str,
idea_id: str,
query_id: str,
originality: int | None,
elaboration: int | None,
coherence: int | None,
usefulness: int | None,
skipped: bool = False
) -> dict[str, Any]:
"""Save or update a rating."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('''
INSERT INTO ratings (rater_id, idea_id, query_id, originality, elaboration, coherence, usefulness, skipped, timestamp)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(rater_id, idea_id) DO UPDATE SET
originality = excluded.originality,
elaboration = excluded.elaboration,
coherence = excluded.coherence,
usefulness = excluded.usefulness,
skipped = excluded.skipped,
timestamp = excluded.timestamp
''', (rater_id, idea_id, query_id, originality, elaboration, coherence, usefulness, int(skipped), datetime.utcnow()))
conn.commit()
# Update progress
update_progress(rater_id, query_id)
return {
'rater_id': rater_id,
'idea_id': idea_id,
'saved': True
}
def get_rating(rater_id: str, idea_id: str) -> dict[str, Any] | None:
"""Get a specific rating."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM ratings WHERE rater_id = ? AND idea_id = ?',
(rater_id, idea_id)
)
row = cursor.fetchone()
if row:
return dict(row)
return None
def get_ratings_by_rater(rater_id: str) -> list[dict[str, Any]]:
"""Get all ratings by a rater."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM ratings WHERE rater_id = ? ORDER BY timestamp',
(rater_id,)
)
return [dict(row) for row in cursor.fetchall()]
def get_ratings_by_idea(idea_id: str) -> list[dict[str, Any]]:
"""Get all ratings for an idea."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM ratings WHERE idea_id = ? ORDER BY rater_id',
(idea_id,)
)
return [dict(row) for row in cursor.fetchall()]
def get_all_ratings() -> list[dict[str, Any]]:
"""Get all ratings."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('SELECT * FROM ratings ORDER BY timestamp')
return [dict(row) for row in cursor.fetchall()]
# Progress operations
def update_progress(rater_id: str, query_id: str) -> None:
"""Update progress for a rater on a query."""
with get_connection() as conn:
cursor = conn.cursor()
# Count completed ratings for this query
cursor.execute('''
SELECT COUNT(*) as count FROM ratings
WHERE rater_id = ? AND query_id = ?
''', (rater_id, query_id))
completed = cursor.fetchone()['count']
# Update or insert progress
cursor.execute('''
INSERT INTO progress (rater_id, query_id, completed_count, updated_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(rater_id, query_id) DO UPDATE SET
completed_count = excluded.completed_count,
updated_at = excluded.updated_at
''', (rater_id, query_id, completed, datetime.utcnow()))
conn.commit()
def set_progress_total(rater_id: str, query_id: str, total: int) -> None:
"""Set the total count for a query's progress."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('''
INSERT INTO progress (rater_id, query_id, total_count, completed_count)
VALUES (?, ?, ?, 0)
ON CONFLICT(rater_id, query_id) DO UPDATE SET
total_count = excluded.total_count
''', (rater_id, query_id, total))
conn.commit()
def get_progress(rater_id: str) -> list[dict[str, Any]]:
"""Get progress for all queries for a rater."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM progress WHERE rater_id = ? ORDER BY query_id',
(rater_id,)
)
return [dict(row) for row in cursor.fetchall()]
def get_progress_for_query(rater_id: str, query_id: str) -> dict[str, Any] | None:
"""Get progress for a specific query."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM progress WHERE rater_id = ? AND query_id = ?',
(rater_id, query_id)
)
row = cursor.fetchone()
if row:
return dict(row)
return None
def get_rated_idea_ids(rater_id: str, query_id: str) -> set[str]:
"""Get the set of idea IDs already rated by a rater for a query."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT idea_id FROM ratings WHERE rater_id = ? AND query_id = ?',
(rater_id, query_id)
)
return {row['idea_id'] for row in cursor.fetchall()}
# Statistics
def get_statistics() -> dict[str, Any]:
"""Get overall statistics."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('SELECT COUNT(*) as count FROM raters')
rater_count = cursor.fetchone()['count']
cursor.execute('SELECT COUNT(*) as count FROM ratings WHERE skipped = 0')
rating_count = cursor.fetchone()['count']
cursor.execute('SELECT COUNT(*) as count FROM ratings WHERE skipped = 1')
skip_count = cursor.fetchone()['count']
cursor.execute('SELECT COUNT(DISTINCT idea_id) as count FROM ratings')
rated_ideas = cursor.fetchone()['count']
return {
'rater_count': rater_count,
'rating_count': rating_count,
'skip_count': skip_count,
'rated_ideas': rated_ideas
}
# Initialize on import
init_db()
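
A minimal sketch of the upsert semantics above. Note that importing this module creates `results/ratings.db` as a side effect of the `init_db()` call, so run this against a scratch checkout; the rater and idea IDs are made up:

```python
# Hypothetical demo of save_rating's ON CONFLICT upsert; not part of the diff.
import database as db  # the non-package import path, per the fallback in main.py

db.create_rater("demo_rater")
db.save_rating("demo_rater", "Q01_I000", "Q01",
               originality=2, elaboration=2, coherence=3, usefulness=2)

# A second save for the same (rater_id, idea_id) pair hits the UNIQUE
# constraint and updates the existing row instead of inserting a new one.
db.save_rating("demo_rater", "Q01_I000", "Q01",
               originality=5, elaboration=4, coherence=5, usefulness=4)

print(db.get_rating("demo_rater", "Q01_I000")["originality"])             # 5
print(db.get_progress_for_query("demo_rater", "Q01")["completed_count"])  # 1
```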


@@ -0,0 +1,183 @@
"""
Pydantic models for the assessment API.
"""
from datetime import datetime
from pydantic import BaseModel, Field
# Request models
class RaterCreate(BaseModel):
    """Request to create or log in as a rater."""
rater_id: str = Field(..., min_length=1, max_length=50, description="Unique rater identifier")
name: str | None = Field(None, max_length=100, description="Optional display name")
class RatingSubmit(BaseModel):
"""Request to submit a rating."""
rater_id: str = Field(..., description="Rater identifier")
idea_id: str = Field(..., description="Idea identifier")
query_id: str = Field(..., description="Query identifier")
originality: int | None = Field(None, ge=1, le=5, description="Originality score 1-5")
elaboration: int | None = Field(None, ge=1, le=5, description="Elaboration score 1-5")
coherence: int | None = Field(None, ge=1, le=5, description="Coherence score 1-5")
usefulness: int | None = Field(None, ge=1, le=5, description="Usefulness score 1-5")
skipped: bool = Field(False, description="Whether the idea was skipped")
# Response models
class Rater(BaseModel):
"""Rater information."""
rater_id: str
name: str | None
created_at: datetime | None = None
class Rating(BaseModel):
"""A single rating."""
id: int
rater_id: str
idea_id: str
query_id: str
originality: int | None
elaboration: int | None
coherence: int | None
usefulness: int | None
skipped: int
timestamp: datetime | None
class Progress(BaseModel):
"""Progress for a rater on a query."""
rater_id: str
query_id: str
completed_count: int
total_count: int
started_at: datetime | None = None
updated_at: datetime | None = None
class QueryInfo(BaseModel):
"""Information about a query."""
query_id: str
query_text: str
category: str
idea_count: int
class IdeaForRating(BaseModel):
"""An idea presented for rating (without hidden metadata)."""
idea_id: str
text: str
index: int # Position in the randomized list for this query
class QueryWithIdeas(BaseModel):
"""A query with its ideas for rating."""
query_id: str
query_text: str
category: str
ideas: list[IdeaForRating]
total_count: int
class Statistics(BaseModel):
"""Overall statistics."""
rater_count: int
rating_count: int
skip_count: int
rated_ideas: int
class RaterProgress(BaseModel):
"""Complete progress summary for a rater."""
rater_id: str
queries: list[Progress]
total_completed: int
total_ideas: int
percentage: float
# Export response models
class ExportRating(BaseModel):
"""Rating with hidden metadata for export."""
rater_id: str
idea_id: str
query_id: str
query_text: str
idea_text: str
originality: int | None
elaboration: int | None
coherence: int | None
usefulness: int | None
skipped: bool
condition: str
expert_name: str
keyword: str
timestamp: datetime | None
class ExportData(BaseModel):
"""Full export data structure."""
experiment_id: str
export_timestamp: datetime
rater_count: int
rating_count: int
ratings: list[ExportRating]
# Dimension definitions (for frontend)
DIMENSION_DEFINITIONS = {
"originality": {
"name": "Originality",
"question": "How unexpected or surprising is this idea? Would most people NOT think of this?",
"scale": {
1: "Very common/obvious idea anyone would suggest",
2: "Somewhat common, slight variation on expected ideas",
3: "Moderately original, some unexpected elements",
4: "Quite original, notably different approach",
5: "Highly unexpected, truly novel concept"
},
"low_label": "Common",
"high_label": "Unexpected"
},
"elaboration": {
"name": "Elaboration",
"question": "How detailed and well-developed is this idea?",
"scale": {
1: "Vague, minimal detail, just a concept",
2: "Basic idea with little specificity",
3: "Moderately detailed, some specifics provided",
4: "Well-developed with clear implementation hints",
5: "Highly specific, thoroughly developed concept"
},
"low_label": "Vague",
"high_label": "Detailed"
},
"coherence": {
"name": "Coherence",
"question": "Does this idea make logical sense and relate to the query object?",
"scale": {
1: "Nonsensical, irrelevant, or incomprehensible",
2: "Mostly unclear, weak connection to query",
3: "Partially coherent, some logical gaps",
4: "Mostly coherent with minor issues",
5: "Fully coherent, clearly relates to query"
},
"low_label": "Nonsense",
"high_label": "Coherent"
},
"usefulness": {
"name": "Usefulness",
"question": "Could this idea have practical value or inspire real innovation?",
"scale": {
1: "No practical value whatsoever",
2: "Minimal usefulness, highly impractical",
3: "Some potential value with major limitations",
4: "Useful idea with realistic applications",
5: "Highly useful, clear practical value"
},
"low_label": "Useless",
"high_label": "Useful"
}
}
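
A quick sketch of the validation the API inherits from these models (assuming the module is importable as `models`; in pydantic v2 an out-of-range score raises a `ValidationError` whose first error has type `less_than_equal`):

```python
# Hypothetical check of the 1-5 bounds on RatingSubmit; not part of the diff.
from pydantic import ValidationError
from models import RatingSubmit

RatingSubmit(rater_id="r01", idea_id="Q01_I000", query_id="Q01",
             originality=5, elaboration=3, coherence=4, usefulness=2)  # ok

try:
    RatingSubmit(rater_id="r01", idea_id="Q01_I000", query_id="Q01",
                 originality=6, elaboration=3, coherence=4, usefulness=2)
except ValidationError as e:
    print(e.errors()[0]["type"])  # less_than_equal
```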


@@ -0,0 +1,3 @@
fastapi>=0.109.0
uvicorn>=0.27.0
pydantic>=2.5.0

File diff suppressed because it is too large


@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/vite.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Creative Idea Assessment</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

File diff suppressed because it is too large


@@ -0,0 +1,32 @@
{
"name": "assessment-frontend",
"private": true,
"version": "1.0.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc -b && vite build",
"lint": "eslint .",
"preview": "vite preview"
},
"dependencies": {
"@ant-design/icons": "^6.1.0",
"antd": "^6.0.0",
"react": "^19.2.0",
"react-dom": "^19.2.0"
},
"devDependencies": {
"@eslint/js": "^9.39.1",
"@types/node": "^24.10.1",
"@types/react": "^19.2.5",
"@types/react-dom": "^19.2.3",
"@vitejs/plugin-react": "^5.1.1",
"eslint": "^9.39.1",
"eslint-plugin-react-hooks": "^7.0.1",
"eslint-plugin-react-refresh": "^0.4.24",
"globals": "^16.5.0",
"typescript": "~5.9.3",
"typescript-eslint": "^8.46.4",
"vite": "^7.2.4"
}
}


@@ -0,0 +1,109 @@
/**
* Main application component for the assessment interface.
*/
import { ConfigProvider, theme, Spin } from 'antd';
import { useAssessment } from './hooks/useAssessment';
import { RaterLogin } from './components/RaterLogin';
import { InstructionsPage } from './components/InstructionsPage';
import { AssessmentPage } from './components/AssessmentPage';
import { CompletionPage } from './components/CompletionPage';
function App() {
const assessment = useAssessment();
const renderContent = () => {
// Show loading spinner for initial load
if (assessment.loading && !assessment.rater) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
minHeight: '100vh'
}}>
<Spin size="large" />
</div>
);
}
switch (assessment.view) {
case 'login':
return (
<RaterLogin
onLogin={assessment.login}
loading={assessment.loading}
error={assessment.error}
/>
);
case 'instructions':
return (
<InstructionsPage
dimensions={assessment.dimensions}
onStart={assessment.startAssessment}
loading={assessment.loading}
/>
);
case 'assessment':
if (!assessment.rater || !assessment.currentQuery || !assessment.currentIdea || !assessment.dimensions) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
minHeight: '100vh'
}}>
<Spin size="large" tip="Loading..." />
</div>
);
}
return (
<AssessmentPage
raterId={assessment.rater.rater_id}
queryId={assessment.currentQuery.query_id}
queryText={assessment.currentQuery.query_text}
idea={assessment.currentIdea}
ideaIndex={assessment.currentIdeaIndex}
totalIdeas={assessment.currentQuery.total_count}
dimensions={assessment.dimensions}
progress={assessment.progress}
onNext={assessment.nextIdea}
onPrev={assessment.prevIdea}
onShowDefinitions={assessment.showInstructions}
onLogout={assessment.logout}
canGoPrev={assessment.currentIdeaIndex > 0}
/>
);
case 'completion':
return (
<CompletionPage
raterId={assessment.rater?.rater_id ?? ''}
progress={assessment.progress}
onLogout={assessment.logout}
/>
);
default:
return null;
}
};
return (
<ConfigProvider
theme={{
algorithm: theme.defaultAlgorithm,
token: {
colorPrimary: '#1677ff',
borderRadius: 6,
},
}}
>
{renderContent()}
</ConfigProvider>
);
}
export default App;


@@ -0,0 +1,199 @@
/**
* Main assessment page for rating ideas.
*/
import { Card, Button, Space, Alert, Typography } from 'antd';
import {
ArrowLeftOutlined,
ArrowRightOutlined,
ForwardOutlined,
BookOutlined,
LogoutOutlined
} from '@ant-design/icons';
import type { IdeaForRating, DimensionDefinitions, RaterProgress } from '../types';
import { useRatings } from '../hooks/useRatings';
import { IdeaCard } from './IdeaCard';
import { RatingSlider } from './RatingSlider';
import { ProgressBar } from './ProgressBar';
const { Text } = Typography;
interface AssessmentPageProps {
raterId: string;
queryId: string;
queryText: string;
idea: IdeaForRating;
ideaIndex: number;
totalIdeas: number;
dimensions: DimensionDefinitions;
progress: RaterProgress | null;
onNext: () => void;
onPrev: () => void;
onShowDefinitions: () => void;
onLogout: () => void;
canGoPrev: boolean;
}
export function AssessmentPage({
raterId,
queryId,
queryText,
idea,
ideaIndex,
totalIdeas,
dimensions,
progress,
onNext,
onPrev,
onShowDefinitions,
onLogout,
canGoPrev
}: AssessmentPageProps) {
const {
ratings,
setRating,
isComplete,
submit,
skip,
submitting,
error
} = useRatings({
raterId,
queryId,
ideaId: idea.idea_id,
onSuccess: onNext
});
const handleSubmit = async () => {
await submit();
};
const handleSkip = async () => {
await skip();
};
// Calculate query progress
const queryProgress = progress?.queries.find(q => q.query_id === queryId);
const queryCompleted = queryProgress?.completed_count ?? ideaIndex;
const queryTotal = totalIdeas;
return (
<div style={{ maxWidth: 800, margin: '0 auto', padding: 24 }}>
{/* Header with query info and overall progress */}
<Card size="small" style={{ marginBottom: 16 }}>
<div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center', marginBottom: 8 }}>
<Text strong style={{ fontSize: 16 }}>Query: "{queryText}"</Text>
<Space>
<Button
icon={<BookOutlined />}
onClick={onShowDefinitions}
size="small"
>
Definitions
</Button>
<Button
icon={<LogoutOutlined />}
onClick={onLogout}
size="small"
danger
>
Exit
</Button>
</Space>
</div>
<ProgressBar
completed={queryCompleted}
total={queryTotal}
label="Query Progress"
/>
{progress && (
<div style={{ marginTop: 8 }}>
<ProgressBar
completed={progress.total_completed}
total={progress.total_ideas}
label="Overall Progress"
/>
</div>
)}
</Card>
{/* Error display */}
{error && (
<Alert
message={error}
type="error"
showIcon
closable
style={{ marginBottom: 16 }}
/>
)}
{/* Idea card */}
<IdeaCard
ideaNumber={ideaIndex + 1}
text={idea.text}
queryText={queryText}
/>
{/* Rating inputs */}
<Card style={{ marginBottom: 16 }}>
<RatingSlider
dimension={dimensions.originality}
value={ratings.originality}
onChange={(v) => setRating('originality', v)}
disabled={submitting}
/>
<RatingSlider
dimension={dimensions.elaboration}
value={ratings.elaboration}
onChange={(v) => setRating('elaboration', v)}
disabled={submitting}
/>
<RatingSlider
dimension={dimensions.coherence}
value={ratings.coherence}
onChange={(v) => setRating('coherence', v)}
disabled={submitting}
/>
<RatingSlider
dimension={dimensions.usefulness}
value={ratings.usefulness}
onChange={(v) => setRating('usefulness', v)}
disabled={submitting}
/>
</Card>
{/* Navigation buttons */}
<Card>
<div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<Button
icon={<ArrowLeftOutlined />}
onClick={onPrev}
disabled={!canGoPrev || submitting}
>
Back
</Button>
<Space>
<Button
icon={<ForwardOutlined />}
onClick={handleSkip}
loading={submitting}
>
Skip
</Button>
<Button
type="primary"
icon={<ArrowRightOutlined />}
onClick={handleSubmit}
loading={submitting}
disabled={!isComplete()}
>
Submit & Next
</Button>
</Space>
</div>
</Card>
</div>
);
}


@@ -0,0 +1,105 @@
/**
* Completion page shown when all ideas have been rated.
*/
import { Card, Button, Typography, Space, Result, Statistic, Row, Col } from 'antd';
import { CheckCircleOutlined, BarChartOutlined, LogoutOutlined } from '@ant-design/icons';
import type { RaterProgress } from '../types';
const { Title, Text } = Typography;
interface CompletionPageProps {
raterId: string;
progress: RaterProgress | null;
onLogout: () => void;
}
export function CompletionPage({ raterId, progress, onLogout }: CompletionPageProps) {
const completed = progress?.total_completed ?? 0;
const total = progress?.total_ideas ?? 0;
const percentage = progress?.percentage ?? 0;
const isFullyComplete = completed >= total;
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
minHeight: '100vh',
padding: 24
}}>
<Card style={{ maxWidth: 600, width: '100%' }}>
<Result
status={isFullyComplete ? 'success' : 'info'}
icon={isFullyComplete ? <CheckCircleOutlined /> : <BarChartOutlined />}
title={isFullyComplete ? 'Assessment Complete!' : 'Session Summary'}
subTitle={
isFullyComplete
? 'Thank you for completing the assessment.'
: 'You have made progress on the assessment.'
}
extra={[
<Button
type="primary"
key="logout"
icon={<LogoutOutlined />}
onClick={onLogout}
>
Exit
</Button>
]}
>
<Row gutter={16} style={{ marginTop: 24 }}>
<Col span={8}>
<Statistic
title="Ideas Rated"
value={completed}
suffix={`/ ${total}`}
/>
</Col>
<Col span={8}>
<Statistic
title="Progress"
value={percentage}
suffix="%"
precision={1}
/>
</Col>
<Col span={8}>
<Statistic
title="Rater ID"
value={raterId}
valueStyle={{ fontSize: 16 }}
/>
</Col>
</Row>
{progress && progress.queries.length > 0 && (
<div style={{ marginTop: 24 }}>
<Title level={5}>Progress by Query</Title>
<Space direction="vertical" style={{ width: '100%' }}>
{progress.queries.map((q) => (
<div
key={q.query_id}
style={{
display: 'flex',
justifyContent: 'space-between',
padding: '4px 0'
}}
>
<Text>{q.query_id}</Text>
<Text type={q.completed_count >= q.total_count ? 'success' : 'secondary'}>
{q.completed_count} / {q.total_count}
{q.completed_count >= q.total_count && ' ✓'}
</Text>
</div>
))}
</Space>
</div>
)}
</Result>
</Card>
</div>
);
}


@@ -0,0 +1,36 @@
/**
* Card displaying a single idea for rating.
*/
import { Card, Typography, Tag } from 'antd';
const { Text, Paragraph } = Typography;
interface IdeaCardProps {
ideaNumber: number;
text: string;
queryText: string;
}
export function IdeaCard({ ideaNumber, text, queryText }: IdeaCardProps) {
return (
<Card
title={
<div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<Text strong>IDEA #{ideaNumber}</Text>
<Tag color="blue">Query: {queryText}</Tag>
</div>
}
style={{ marginBottom: 24 }}
>
<Paragraph style={{
fontSize: 16,
lineHeight: 1.8,
margin: 0,
padding: '8px 0'
}}>
"{text}"
</Paragraph>
</Card>
);
}


@@ -0,0 +1,134 @@
/**
* Instructions page showing dimension definitions.
*/
import { Fragment, useState } from 'react';
import { Card, Button, Typography, Space, Checkbox, Divider, Tag } from 'antd';
import { PlayCircleOutlined } from '@ant-design/icons';
import type { DimensionDefinitions } from '../types';
const { Title, Text, Paragraph } = Typography;
interface InstructionsPageProps {
dimensions: DimensionDefinitions | null;
onStart: () => void;
onBack?: () => void;
loading: boolean;
isReturning?: boolean;
}
export function InstructionsPage({
dimensions,
onStart,
onBack,
loading,
isReturning = false
}: InstructionsPageProps) {
const [acknowledged, setAcknowledged] = useState(isReturning);
if (!dimensions) {
return (
<div style={{ padding: 24, textAlign: 'center' }}>
<Text>Loading instructions...</Text>
</div>
);
}
const dimensionOrder = ['originality', 'elaboration', 'coherence', 'usefulness'] as const;
return (
<div style={{
maxWidth: 800,
margin: '0 auto',
padding: 24
}}>
<Card>
<Space direction="vertical" size="large" style={{ width: '100%' }}>
<div style={{ textAlign: 'center' }}>
<Title level={2}>Assessment Instructions</Title>
<Paragraph type="secondary">
You will rate creative ideas on 4 dimensions using a 1-5 scale.
Please read each definition carefully before beginning.
</Paragraph>
</div>
<Divider />
{dimensionOrder.map((key) => {
const dim = dimensions[key];
return (
<Card
key={key}
size="small"
title={
<Space>
<Tag color="blue">{dim.name}</Tag>
<Text type="secondary">{dim.question}</Text>
</Space>
}
style={{ marginBottom: 16 }}
>
<div style={{
display: 'grid',
gridTemplateColumns: 'auto 1fr',
gap: '8px 16px',
fontSize: 14
}}>
{([1, 2, 3, 4, 5] as const).map((score) => (
  // The list key must sit on the outermost element returned from the
  // map; a shorthand fragment (<>) cannot carry one, so use Fragment.
  <Fragment key={score}>
    <Tag color={score <= 2 ? 'red' : score === 3 ? 'orange' : 'green'}>
      {score}
    </Tag>
    <Text>{dim.scale[score]}</Text>
  </Fragment>
))}
</div>
<Divider style={{ margin: '12px 0' }} />
<div style={{ display: 'flex', justifyContent: 'space-between' }}>
<Text type="secondary">{dim.low_label}</Text>
<Text type="secondary">{dim.high_label}</Text>
</div>
</Card>
);
})}
<Divider />
<Space direction="vertical" style={{ width: '100%' }}>
{!isReturning && (
<Checkbox
checked={acknowledged}
onChange={(e) => setAcknowledged(e.target.checked)}
>
I have read and understood the instructions
</Checkbox>
)}
<Space style={{ width: '100%', justifyContent: 'center' }}>
{onBack && (
<Button onClick={onBack}>
Back to Assessment
</Button>
)}
<Button
type="primary"
size="large"
icon={<PlayCircleOutlined />}
onClick={onStart}
loading={loading}
disabled={!acknowledged}
>
{isReturning ? 'Continue Rating' : 'Begin Rating'}
</Button>
</Space>
</Space>
</Space>
</Card>
</div>
);
}


@@ -0,0 +1,39 @@
/**
* Progress bar component showing assessment progress.
*/
import { Progress, Typography, Space } from 'antd';
const { Text } = Typography;
interface ProgressBarProps {
completed: number;
total: number;
label?: string;
}
export function ProgressBar({ completed, total, label }: ProgressBarProps) {
const percentage = total > 0 ? Math.round((completed / total) * 100) : 0;
return (
<div style={{ width: '100%' }}>
{label && (
<Space style={{ marginBottom: 4, justifyContent: 'space-between', width: '100%' }}>
<Text type="secondary">{label}</Text>
<Text type="secondary">
{completed}/{total} ({percentage}%)
</Text>
</Space>
)}
<Progress
percent={percentage}
showInfo={!label}
status="active"
strokeColor={{
'0%': '#108ee9',
'100%': '#87d068',
}}
/>
</div>
);
}


@@ -0,0 +1,116 @@
/**
* Rater login component.
*/
import { useState, useEffect } from 'react';
import { Card, Input, Button, Typography, Space, List, Alert } from 'antd';
import { UserOutlined, LoginOutlined } from '@ant-design/icons';
import * as api from '../services/api';
import type { Rater } from '../types';
const { Title, Text } = Typography;
interface RaterLoginProps {
onLogin: (raterId: string, name?: string) => void;
loading: boolean;
error: string | null;
}
export function RaterLogin({ onLogin, loading, error }: RaterLoginProps) {
const [raterId, setRaterId] = useState('');
const [existingRaters, setExistingRaters] = useState<Rater[]>([]);
useEffect(() => {
api.listRaters()
.then(setExistingRaters)
.catch(console.error);
}, []);
const handleLogin = () => {
if (raterId.trim()) {
onLogin(raterId.trim());
}
};
const handleQuickLogin = (rater: Rater) => {
onLogin(rater.rater_id);
};
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
minHeight: '100vh',
padding: 24
}}>
<Card
style={{ width: 400, maxWidth: '100%' }}
styles={{ body: { padding: 32 } }}
>
<Space direction="vertical" size="large" style={{ width: '100%' }}>
<div style={{ textAlign: 'center' }}>
<Title level={3} style={{ marginBottom: 8 }}>
Creative Idea Assessment
</Title>
<Text type="secondary">
Enter your rater ID to begin
</Text>
</div>
{error && (
<Alert message={error} type="error" showIcon />
)}
<Input
size="large"
placeholder="Enter your rater ID"
prefix={<UserOutlined />}
value={raterId}
onChange={(e) => setRaterId(e.target.value)}
onPressEnter={handleLogin}
disabled={loading}
/>
<Button
type="primary"
size="large"
icon={<LoginOutlined />}
onClick={handleLogin}
loading={loading}
disabled={!raterId.trim()}
block
>
Start Assessment
</Button>
{existingRaters.length > 0 && (
<div>
<Text type="secondary" style={{ display: 'block', marginBottom: 8 }}>
Existing raters:
</Text>
<List
size="small"
bordered
dataSource={existingRaters}
renderItem={(rater) => (
<List.Item
style={{ cursor: 'pointer' }}
onClick={() => handleQuickLogin(rater)}
>
<Text code>{rater.rater_id}</Text>
{rater.name && rater.name !== rater.rater_id && (
<Text type="secondary" style={{ marginLeft: 8 }}>
({rater.name})
</Text>
)}
</List.Item>
)}
/>
</div>
)}
</Space>
</Card>
</div>
);
}


@@ -0,0 +1,74 @@
/**
* Rating input component with radio buttons for 1-5 scale.
*/
import { Radio, Typography, Space, Tooltip, Button } from 'antd';
import { QuestionCircleOutlined } from '@ant-design/icons';
import type { DimensionDefinition } from '../types';
const { Text } = Typography;
interface RatingSliderProps {
dimension: DimensionDefinition;
value: number | null;
onChange: (value: number | null) => void;
disabled?: boolean;
}
export function RatingSlider({ dimension, value, onChange, disabled }: RatingSliderProps) {
return (
<div style={{ marginBottom: 24 }}>
<div style={{ display: 'flex', alignItems: 'center', marginBottom: 8 }}>
<Text strong style={{ marginRight: 8 }}>
{dimension.name.toUpperCase()}
</Text>
<Tooltip
title={
<div>
<p style={{ marginBottom: 8 }}>{dimension.question}</p>
{([1, 2, 3, 4, 5] as const).map((score) => (
<div key={score} style={{ marginBottom: 4 }}>
<strong>{score}:</strong> {dimension.scale[score]}
</div>
))}
</div>
}
placement="right"
overlayStyle={{ maxWidth: 400 }}
>
<Button
type="text"
size="small"
icon={<QuestionCircleOutlined />}
style={{ padding: 0, height: 'auto' }}
/>
</Tooltip>
</div>
<div style={{ display: 'flex', alignItems: 'center', gap: 16 }}>
<Text type="secondary" style={{ minWidth: 80, textAlign: 'right' }}>
{dimension.low_label}
</Text>
<Radio.Group
value={value}
onChange={(e) => onChange(e.target.value)}
disabled={disabled}
style={{ flex: 1 }}
>
<Space size="large">
{[1, 2, 3, 4, 5].map((score) => (
<Radio key={score} value={score}>
{score}
</Radio>
))}
</Space>
</Radio.Group>
<Text type="secondary" style={{ minWidth: 80 }}>
{dimension.high_label}
</Text>
</div>
</div>
);
}


@@ -0,0 +1,272 @@
/**
* Hook for managing the assessment session state.
*/
import { useState, useCallback, useEffect } from 'react';
import type {
AppView,
DimensionDefinitions,
QueryInfo,
QueryWithIdeas,
Rater,
RaterProgress,
} from '../types';
import * as api from '../services/api';
interface AssessmentState {
view: AppView;
rater: Rater | null;
queries: QueryInfo[];
currentQueryIndex: number;
currentQuery: QueryWithIdeas | null;
currentIdeaIndex: number;
progress: RaterProgress | null;
dimensions: DimensionDefinitions | null;
loading: boolean;
error: string | null;
}
const initialState: AssessmentState = {
view: 'login',
rater: null,
queries: [],
currentQueryIndex: 0,
currentQuery: null,
currentIdeaIndex: 0,
progress: null,
dimensions: null,
loading: false,
error: null,
};
export function useAssessment() {
const [state, setState] = useState<AssessmentState>(initialState);
// Load dimension definitions on mount
useEffect(() => {
api.getDimensionDefinitions()
.then((dimensions) => setState((s) => ({ ...s, dimensions })))
.catch((err) => console.error('Failed to load dimensions:', err));
}, []);
// Login as a rater
const login = useCallback(async (raterId: string, name?: string) => {
setState((s) => ({ ...s, loading: true, error: null }));
try {
const rater = await api.createOrGetRater({ rater_id: raterId, name });
const queries = await api.listQueries();
const progress = await api.getRaterProgress(raterId);
setState((s) => ({
...s,
rater,
queries,
progress,
view: 'instructions',
loading: false,
}));
} catch (err) {
setState((s) => ({
...s,
error: err instanceof Error ? err.message : 'Login failed',
loading: false,
}));
}
}, []);
// Start assessment (move from instructions to assessment)
const startAssessment = useCallback(async () => {
if (!state.rater || state.queries.length === 0) return;
setState((s) => ({ ...s, loading: true }));
try {
// Find first query with unrated ideas
let queryIndex = 0;
let queryData: QueryWithIdeas | null = null;
for (let i = 0; i < state.queries.length; i++) {
const unrated = await api.getUnratedIdeas(state.queries[i].query_id, state.rater.rater_id);
if (unrated.ideas.length > 0) {
queryIndex = i;
queryData = unrated;
break;
}
}
if (!queryData) {
// All done
setState((s) => ({
...s,
view: 'completion',
loading: false,
}));
return;
}
setState((s) => ({
...s,
view: 'assessment',
currentQueryIndex: queryIndex,
currentQuery: queryData,
currentIdeaIndex: 0,
loading: false,
}));
} catch (err) {
setState((s) => ({
...s,
error: err instanceof Error ? err.message : 'Failed to start assessment',
loading: false,
}));
}
}, [state.rater, state.queries]);
// Move to next idea
const nextIdea = useCallback(async () => {
if (!state.currentQuery || !state.rater) return;
const nextIndex = state.currentIdeaIndex + 1;
if (nextIndex < state.currentQuery.ideas.length) {
// More ideas in current query
setState((s) => ({ ...s, currentIdeaIndex: nextIndex }));
} else {
// Query complete, try to move to next query
const nextQueryIndex = state.currentQueryIndex + 1;
if (nextQueryIndex < state.queries.length) {
setState((s) => ({ ...s, loading: true }));
try {
const unrated = await api.getUnratedIdeas(
state.queries[nextQueryIndex].query_id,
state.rater.rater_id
);
if (unrated.ideas.length > 0) {
setState((s) => ({
...s,
currentQueryIndex: nextQueryIndex,
currentQuery: unrated,
currentIdeaIndex: 0,
loading: false,
}));
} else {
// Try to find next query with unrated ideas
for (let i = nextQueryIndex + 1; i < state.queries.length; i++) {
const nextUnrated = await api.getUnratedIdeas(
state.queries[i].query_id,
state.rater.rater_id
);
if (nextUnrated.ideas.length > 0) {
setState((s) => ({
...s,
currentQueryIndex: i,
currentQuery: nextUnrated,
currentIdeaIndex: 0,
loading: false,
}));
return;
}
}
// All queries complete
setState((s) => ({
...s,
view: 'completion',
loading: false,
}));
}
} catch (err) {
setState((s) => ({
...s,
error: err instanceof Error ? err.message : 'Failed to load next query',
loading: false,
}));
}
} else {
// All queries complete
setState((s) => ({ ...s, view: 'completion' }));
}
}
// Refresh progress
try {
const progress = await api.getRaterProgress(state.rater.rater_id);
setState((s) => ({ ...s, progress }));
} catch (err) {
console.error('Failed to refresh progress:', err);
}
}, [state.currentQuery, state.currentIdeaIndex, state.currentQueryIndex, state.queries, state.rater]);
// Move to previous idea
const prevIdea = useCallback(() => {
if (state.currentIdeaIndex > 0) {
setState((s) => ({ ...s, currentIdeaIndex: s.currentIdeaIndex - 1 }));
}
}, [state.currentIdeaIndex]);
// Jump to a specific query
const jumpToQuery = useCallback(async (queryIndex: number) => {
if (!state.rater || queryIndex < 0 || queryIndex >= state.queries.length) return;
setState((s) => ({ ...s, loading: true }));
try {
const queryData = await api.getQueryWithIdeas(state.queries[queryIndex].query_id);
setState((s) => ({
...s,
currentQueryIndex: queryIndex,
currentQuery: queryData,
currentIdeaIndex: 0,
view: 'assessment',
loading: false,
}));
} catch (err) {
setState((s) => ({
...s,
error: err instanceof Error ? err.message : 'Failed to load query',
loading: false,
}));
}
}, [state.rater, state.queries]);
// Refresh progress
const refreshProgress = useCallback(async () => {
if (!state.rater) return;
try {
const progress = await api.getRaterProgress(state.rater.rater_id);
setState((s) => ({ ...s, progress }));
} catch (err) {
console.error('Failed to refresh progress:', err);
}
}, [state.rater]);
// Show definitions
const showInstructions = useCallback(() => {
setState((s) => ({ ...s, view: 'instructions' }));
}, []);
// Return to assessment
const returnToAssessment = useCallback(() => {
setState((s) => ({ ...s, view: 'assessment' }));
}, []);
// Logout
const logout = useCallback(() => {
setState(initialState);
}, []);
// Get current idea
const currentIdea = state.currentQuery?.ideas[state.currentIdeaIndex] ?? null;
return {
...state,
currentIdea,
login,
startAssessment,
nextIdea,
prevIdea,
jumpToQuery,
refreshProgress,
showInstructions,
returnToAssessment,
logout,
};
}


@@ -0,0 +1,133 @@
/**
* Hook for managing rating submission.
*/
import { useState, useCallback } from 'react';
import type { RatingState, DimensionKey } from '../types';
import * as api from '../services/api';
interface UseRatingsOptions {
raterId: string | null;
queryId: string | null;
ideaId: string | null;
onSuccess?: () => void;
}
export function useRatings({ raterId, queryId, ideaId, onSuccess }: UseRatingsOptions) {
const [ratings, setRatings] = useState<RatingState>({
originality: null,
elaboration: null,
coherence: null,
usefulness: null,
});
const [submitting, setSubmitting] = useState(false);
const [error, setError] = useState<string | null>(null);
// Set a single rating
const setRating = useCallback((dimension: DimensionKey, value: number | null) => {
setRatings((prev) => ({ ...prev, [dimension]: value }));
}, []);
// Reset all ratings
const resetRatings = useCallback(() => {
setRatings({
originality: null,
elaboration: null,
coherence: null,
usefulness: null,
});
setError(null);
}, []);
// Check if all ratings are set
const isComplete = useCallback(() => {
return (
ratings.originality !== null &&
ratings.elaboration !== null &&
ratings.coherence !== null &&
ratings.usefulness !== null
);
}, [ratings]);
// Submit rating
const submit = useCallback(async () => {
if (!raterId || !queryId || !ideaId) {
setError('Missing required information');
return false;
}
if (!isComplete()) {
setError('Please rate all dimensions');
return false;
}
setSubmitting(true);
setError(null);
try {
await api.submitRating({
rater_id: raterId,
idea_id: ideaId,
query_id: queryId,
originality: ratings.originality,
elaboration: ratings.elaboration,
coherence: ratings.coherence,
usefulness: ratings.usefulness,
skipped: false,
});
resetRatings();
onSuccess?.();
return true;
} catch (err) {
setError(err instanceof Error ? err.message : 'Failed to submit rating');
return false;
} finally {
setSubmitting(false);
}
}, [raterId, queryId, ideaId, ratings, isComplete, resetRatings, onSuccess]);
// Skip idea
const skip = useCallback(async () => {
if (!raterId || !queryId || !ideaId) {
setError('Missing required information');
return false;
}
setSubmitting(true);
setError(null);
try {
await api.submitRating({
rater_id: raterId,
idea_id: ideaId,
query_id: queryId,
originality: null,
elaboration: null,
coherence: null,
usefulness: null,
skipped: true,
});
resetRatings();
onSuccess?.();
return true;
} catch (err) {
setError(err instanceof Error ? err.message : 'Failed to skip idea');
return false;
} finally {
setSubmitting(false);
}
}, [raterId, queryId, ideaId, resetRatings, onSuccess]);
return {
ratings,
setRating,
resetRatings,
isComplete,
submit,
skip,
submitting,
error,
};
}


@@ -0,0 +1,43 @@
:root {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
line-height: 1.5;
font-weight: 400;
color-scheme: light;
color: rgba(0, 0, 0, 0.88);
background-color: #f5f5f5;
font-synthesis: none;
text-rendering: optimizeLegibility;
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
}
body {
margin: 0;
min-height: 100vh;
}
#root {
min-height: 100vh;
}
/* Custom scrollbar */
::-webkit-scrollbar {
width: 8px;
height: 8px;
}
::-webkit-scrollbar-track {
background: #f1f1f1;
border-radius: 4px;
}
::-webkit-scrollbar-thumb {
background: #c1c1c1;
border-radius: 4px;
}
::-webkit-scrollbar-thumb:hover {
background: #a8a8a8;
}


@@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App'
createRoot(document.getElementById('root')!).render(
<StrictMode>
<App />
</StrictMode>,
)


@@ -0,0 +1,116 @@
/**
* API client for the assessment backend.
*/
import type {
DimensionDefinitions,
QueryInfo,
QueryWithIdeas,
Rater,
RaterCreate,
RaterProgress,
Rating,
RatingSubmit,
SessionInfo,
Statistics,
} from '../types';
const API_BASE = '/api';
async function fetchJson<T>(url: string, options?: RequestInit): Promise<T> {
const response = await fetch(`${API_BASE}${url}`, {
  ...options,
  // Merge headers after spreading options so a caller-supplied headers
  // object cannot silently drop the Content-Type default.
  headers: {
    'Content-Type': 'application/json',
    ...options?.headers,
  },
});
if (!response.ok) {
const error = await response.json().catch(() => ({ detail: response.statusText }));
throw new Error(error.detail || 'API request failed');
}
return response.json();
}
// Rater API
export async function listRaters(): Promise<Rater[]> {
return fetchJson<Rater[]>('/raters');
}
export async function createOrGetRater(data: RaterCreate): Promise<Rater> {
return fetchJson<Rater>('/raters', {
method: 'POST',
body: JSON.stringify(data),
});
}
export async function getRater(raterId: string): Promise<Rater> {
return fetchJson<Rater>(`/raters/${encodeURIComponent(raterId)}`);
}
// Query API
export async function listQueries(): Promise<QueryInfo[]> {
return fetchJson<QueryInfo[]>('/queries');
}
export async function getQueryWithIdeas(queryId: string): Promise<QueryWithIdeas> {
return fetchJson<QueryWithIdeas>(`/queries/${encodeURIComponent(queryId)}`);
}
export async function getUnratedIdeas(queryId: string, raterId: string): Promise<QueryWithIdeas> {
return fetchJson<QueryWithIdeas>(
`/queries/${encodeURIComponent(queryId)}/unrated?rater_id=${encodeURIComponent(raterId)}`
);
}
// Rating API
export async function submitRating(rating: RatingSubmit): Promise<{ saved: boolean }> {
return fetchJson<{ saved: boolean }>('/ratings', {
method: 'POST',
body: JSON.stringify(rating),
});
}
export async function getRating(raterId: string, ideaId: string): Promise<Rating | null> {
try {
return await fetchJson<Rating>(`/ratings/${encodeURIComponent(raterId)}/${encodeURIComponent(ideaId)}`);
} catch {
return null;
}
}
export async function getRatingsByRater(raterId: string): Promise<Rating[]> {
return fetchJson<Rating[]>(`/ratings/rater/${encodeURIComponent(raterId)}`);
}
// Progress API
export async function getRaterProgress(raterId: string): Promise<RaterProgress> {
return fetchJson<RaterProgress>(`/progress/${encodeURIComponent(raterId)}`);
}
// Statistics API
export async function getStatistics(): Promise<Statistics> {
return fetchJson<Statistics>('/statistics');
}
// Dimension definitions API
export async function getDimensionDefinitions(): Promise<DimensionDefinitions> {
return fetchJson<DimensionDefinitions>('/dimensions');
}
// Session info API
export async function getSessionInfo(): Promise<SessionInfo> {
return fetchJson<SessionInfo>('/info');
}
// Health check
export async function healthCheck(): Promise<boolean> {
try {
await fetchJson<{ status: string }>('/health');
return true;
} catch {
return false;
}
}


@@ -0,0 +1,142 @@
/**
* TypeScript types for the assessment frontend.
*/
// Rater types
export interface Rater {
rater_id: string;
name: string | null;
created_at?: string;
}
export interface RaterCreate {
rater_id: string;
name?: string;
}
// Query types
export interface QueryInfo {
query_id: string;
query_text: string;
category: string;
idea_count: number;
}
export interface IdeaForRating {
idea_id: string;
text: string;
index: number;
}
export interface QueryWithIdeas {
query_id: string;
query_text: string;
category: string;
ideas: IdeaForRating[];
total_count: number;
}
// Rating types
export interface RatingSubmit {
rater_id: string;
idea_id: string;
query_id: string;
originality: number | null;
elaboration: number | null;
coherence: number | null;
usefulness: number | null;
skipped: boolean;
}
export interface Rating {
id: number;
rater_id: string;
idea_id: string;
query_id: string;
originality: number | null;
elaboration: number | null;
coherence: number | null;
usefulness: number | null;
skipped: number;
timestamp: string | null;
}
// Progress types
export interface QueryProgress {
rater_id: string;
query_id: string;
completed_count: number;
total_count: number;
started_at?: string;
updated_at?: string;
}
export interface RaterProgress {
rater_id: string;
queries: QueryProgress[];
total_completed: number;
total_ideas: number;
percentage: number;
}
// Statistics types
export interface Statistics {
rater_count: number;
rating_count: number;
skip_count: number;
rated_ideas: number;
}
// Dimension definition types
export interface DimensionScale {
1: string;
2: string;
3: string;
4: string;
5: string;
}
export interface DimensionDefinition {
name: string;
question: string;
scale: DimensionScale;
low_label: string;
high_label: string;
}
export interface DimensionDefinitions {
originality: DimensionDefinition;
elaboration: DimensionDefinition;
coherence: DimensionDefinition;
usefulness: DimensionDefinition;
}
// Session info
export interface SessionInfo {
experiment_id: string;
total_ideas: number;
query_count: number;
conditions: string[];
randomization_seed: number;
}
// UI State types
export type AppView = 'login' | 'instructions' | 'assessment' | 'completion';
export interface RatingState {
originality: number | null;
elaboration: number | null;
coherence: number | null;
usefulness: number | null;
}
export const EMPTY_RATING_STATE: RatingState = {
originality: null,
elaboration: null,
coherence: null,
usefulness: null,
};
export type DimensionKey = keyof RatingState;
export const DIMENSION_KEYS: DimensionKey[] = ['originality', 'elaboration', 'coherence', 'usefulness'];


@@ -0,0 +1,20 @@
{
"compilerOptions": {
"target": "ES2020",
"useDefineForClassFields": true,
"lib": ["ES2020", "DOM", "DOM.Iterable"],
"module": "ESNext",
"skipLibCheck": true,
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"isolatedModules": true,
"moduleDetection": "force",
"noEmit": true,
"jsx": "react-jsx",
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"noFallthroughCasesInSwitch": true
},
"include": ["src"]
}


@@ -0,0 +1,16 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
export default defineConfig({
plugins: [react()],
server: {
host: '0.0.0.0',
port: 5174,
proxy: {
'/api': {
target: 'http://localhost:8002',
changeOrigin: true
}
}
},
})


@@ -0,0 +1,375 @@
#!/usr/bin/env python3
"""
Prepare assessment data from experiment results.
Extracts unique ideas from deduped experiment results, assigns stable IDs,
and randomizes the order within each query for unbiased human assessment.
Usage:
python prepare_data.py # Use latest, all ideas
python prepare_data.py --sample 100 # Sample 100 ideas total
python prepare_data.py --per-query 10 # 10 ideas per query
python prepare_data.py --per-condition 5 # 5 ideas per condition per query
python prepare_data.py --list # List available files
"""
import argparse
import json
import random
from pathlib import Path
from typing import Any
def load_experiment_data(filepath: Path) -> dict[str, Any]:
"""Load experiment data from JSON file."""
with open(filepath, 'r', encoding='utf-8') as f:
return json.load(f)
def sample_ideas_stratified(
ideas: list[dict[str, Any]],
per_condition: int | None = None,
total_limit: int | None = None,
rng: random.Random | None = None
) -> list[dict[str, Any]]:
"""
Sample ideas with stratification by condition.
Args:
ideas: List of ideas with _hidden.condition metadata
per_condition: Max ideas per condition (stratified sampling)
total_limit: Max total ideas (after stratified sampling)
rng: Random number generator for reproducibility
Returns:
Sampled list of ideas
"""
if rng is None:
rng = random.Random()
if per_condition is None and total_limit is None:
return ideas
# Group by condition
by_condition: dict[str, list[dict[str, Any]]] = {}
for idea in ideas:
condition = idea['_hidden']['condition']
if condition not in by_condition:
by_condition[condition] = []
by_condition[condition].append(idea)
# Sample per condition
sampled = []
for condition, cond_ideas in by_condition.items():
rng.shuffle(cond_ideas)
if per_condition is not None:
cond_ideas = cond_ideas[:per_condition]
sampled.extend(cond_ideas)
# Apply total limit if specified
if total_limit is not None and len(sampled) > total_limit:
rng.shuffle(sampled)
sampled = sampled[:total_limit]
return sampled
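# Worked example (hypothetical numbers): with the study's five conditions and
# per_condition=4, a query that produced 60 ideas is cut to at most 20,
# balanced across conditions, e.g.:
#   rng = random.Random(42)
#   sampled = sample_ideas_stratified(ideas, per_condition=4, rng=rng)
#   assert len(sampled) <= 4 * len({i['_hidden']['condition'] for i in ideas})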
def extract_ideas_from_condition(
query_id: str,
condition_name: str,
condition_data: dict[str, Any],
idea_counter: dict[str, int]
) -> list[dict[str, Any]]:
"""Extract ideas from a single condition with hidden metadata."""
ideas = []
dedup_data = condition_data.get('dedup', {})
unique_ideas_with_source = dedup_data.get('unique_ideas_with_source', [])
for item in unique_ideas_with_source:
idea_text = item.get('idea', '')
if not idea_text:
continue
# Generate stable idea ID
current_count = idea_counter.get(query_id, 0)
idea_id = f"{query_id}_I{current_count:03d}"
idea_counter[query_id] = current_count + 1
ideas.append({
'idea_id': idea_id,
'text': idea_text,
'_hidden': {
'condition': condition_name,
'expert_name': item.get('expert_name', ''),
'keyword': item.get('keyword', '')
}
})
return ideas
def prepare_assessment_data(
experiment_filepath: Path,
output_filepath: Path,
seed: int = 42,
sample_total: int | None = None,
per_query: int | None = None,
per_condition: int | None = None
) -> dict[str, Any]:
"""
Prepare assessment data from experiment results.
Args:
experiment_filepath: Path to deduped experiment JSON
output_filepath: Path to write assessment items JSON
seed: Random seed for reproducible shuffling
sample_total: Total number of ideas to sample (across all queries)
per_query: Maximum ideas per query
per_condition: Maximum ideas per condition per query (stratified)
Returns:
Assessment data structure
"""
rng = random.Random(seed)
# Load experiment data
data = load_experiment_data(experiment_filepath)
experiment_id = data.get('experiment_id', 'unknown')
conditions = data.get('conditions', [])
results = data.get('results', [])
print(f"Loading experiment: {experiment_id}")
print(f"Conditions: {conditions}")
print(f"Number of queries: {len(results)}")
# Show sampling config
if sample_total or per_query or per_condition:
print(f"Sampling config: total={sample_total}, per_query={per_query}, per_condition={per_condition}")
assessment_queries = []
total_ideas = 0
idea_counter: dict[str, int] = {}
for result in results:
query_id = result.get('query_id', '')
query_text = result.get('query', '')
category = result.get('category', '')
query_ideas = []
# Extract ideas from all conditions
conditions_data = result.get('conditions', {})
for condition_name, condition_data in conditions_data.items():
ideas = extract_ideas_from_condition(
query_id, condition_name, condition_data, idea_counter
)
query_ideas.extend(ideas)
# Apply stratified sampling if per_condition is specified
if per_condition is not None:
query_ideas = sample_ideas_stratified(
query_ideas,
per_condition=per_condition,
rng=rng
)
# Apply per-query limit
if per_query is not None and len(query_ideas) > per_query:
rng.shuffle(query_ideas)
query_ideas = query_ideas[:per_query]
# Shuffle ideas within this query
rng.shuffle(query_ideas)
assessment_queries.append({
'query_id': query_id,
'query_text': query_text,
'category': category,
'ideas': query_ideas,
'idea_count': len(query_ideas)
})
total_ideas += len(query_ideas)
print(f" Query '{query_text}' ({query_id}): {len(query_ideas)} ideas")
# Apply total sample limit across all queries (proportionally)
if sample_total is not None and total_ideas > sample_total:
print(f"\nApplying total sample limit: {sample_total} (from {total_ideas})")
# Calculate proportion to keep
keep_ratio = sample_total / total_ideas
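        # max(1, ...) below keeps at least one idea per query, so the final
        # total may drift slightly above or below sample_total.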
new_total = 0
for query in assessment_queries:
n_keep = max(1, int(len(query['ideas']) * keep_ratio))
rng.shuffle(query['ideas'])
query['ideas'] = query['ideas'][:n_keep]
query['idea_count'] = len(query['ideas'])
new_total += len(query['ideas'])
total_ideas = new_total
# Build output structure
assessment_data = {
'experiment_id': experiment_id,
'queries': assessment_queries,
'total_ideas': total_ideas,
'query_count': len(assessment_queries),
'conditions': conditions,
'randomization_seed': seed,
'sampling': {
'sample_total': sample_total,
'per_query': per_query,
'per_condition': per_condition
},
'metadata': {
'source_file': str(experiment_filepath.name),
'prepared_for': 'human_assessment'
}
}
# Write output
output_filepath.parent.mkdir(parents=True, exist_ok=True)
with open(output_filepath, 'w', encoding='utf-8') as f:
json.dump(assessment_data, f, ensure_ascii=False, indent=2)
print(f"\nTotal ideas for assessment: {total_ideas}")
print(f"Output written to: {output_filepath}")
return assessment_data
def list_experiment_files(results_dir: Path) -> list[Path]:
"""List available deduped experiment files."""
return sorted(results_dir.glob('*_deduped.json'), key=lambda p: p.stat().st_mtime, reverse=True)
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(
description='Prepare assessment data from experiment results.',
formatter_class=argparse.RawDescriptionHelpFormatter,
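        # RawDescriptionHelpFormatter preserves the epilog's line breaks as written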
epilog="""
Examples:
python prepare_data.py # Use latest, all ideas
python prepare_data.py --sample 100 # Sample 100 ideas total
python prepare_data.py --per-query 20 # Max 20 ideas per query
python prepare_data.py --per-condition 4 # 4 ideas per condition per query
python prepare_data.py --per-condition 4 --per-query 15 # Combined limits
python prepare_data.py --list # List available files
Recommended for human assessment:
# 5 conditions × 4 ideas × 10 queries = 200 ideas (balanced)
python prepare_data.py --per-condition 4
# Or limit total to ~150 ideas
python prepare_data.py --sample 150
"""
)
parser.add_argument(
'experiment_file',
nargs='?',
default=None,
help='Experiment file name (e.g., experiment_20260119_165650_deduped.json)'
)
parser.add_argument(
'--list', '-l',
action='store_true',
help='List available experiment files'
)
parser.add_argument(
'--sample',
type=int,
default=None,
metavar='N',
help='Total number of ideas to sample (proportionally across queries)'
)
parser.add_argument(
'--per-query',
type=int,
default=None,
metavar='N',
help='Maximum ideas per query'
)
parser.add_argument(
'--per-condition',
type=int,
default=None,
metavar='N',
help='Maximum ideas per condition per query (stratified sampling)'
)
parser.add_argument(
'--seed', '-s',
type=int,
default=42,
help='Random seed for shuffling (default: 42)'
)
args = parser.parse_args()
# Paths
base_dir = Path(__file__).parent.parent
results_dir = base_dir / 'results'
output_file = Path(__file__).parent / 'data' / 'assessment_items.json'
# List available files
available_files = list_experiment_files(results_dir)
if args.list:
print("Available experiment files (most recent first):")
for f in available_files:
size_kb = f.stat().st_size / 1024
print(f" {f.name} ({size_kb:.1f} KB)")
return
# Determine which file to use
if args.experiment_file:
experiment_file = results_dir / args.experiment_file
if not experiment_file.exists():
            # Try appending a .json extension
experiment_file = results_dir / f"{args.experiment_file}.json"
else:
# Use the latest deduped file
if not available_files:
print("Error: No deduped experiment files found in results directory.")
return
experiment_file = available_files[0]
print(f"Using latest experiment file: {experiment_file.name}")
if not experiment_file.exists():
print(f"Error: Experiment file not found: {experiment_file}")
print("\nAvailable files:")
for f in available_files:
print(f" {f.name}")
return
prepare_assessment_data(
experiment_file,
output_file,
seed=args.seed,
sample_total=args.sample,
per_query=args.per_query,
per_condition=args.per_condition
)
# Verify output
    with open(output_file, 'r', encoding='utf-8') as f:
data = json.load(f)
print("\n--- Verification ---")
print(f"Queries: {data['query_count']}")
print(f"Total ideas: {data['total_ideas']}")
# Show distribution by condition (from hidden metadata)
condition_counts: dict[str, int] = {}
for query in data['queries']:
for idea in query['ideas']:
condition = idea['_hidden']['condition']
condition_counts[condition] = condition_counts.get(condition, 0) + 1
print("\nIdeas per condition:")
for condition, count in sorted(condition_counts.items()):
print(f" {condition}: {count}")
if __name__ == '__main__':
main()

Binary file not shown.

101
experiments/assessment/start.sh Executable file

@@ -0,0 +1,101 @@
#!/bin/bash
# Human Assessment Web Interface Start Script
# This script starts both the backend API and frontend dev server
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}Creative Idea Assessment System${NC}"
echo -e "${GREEN}================================${NC}"
echo
# Find Python with FastAPI (use project venv or system)
VENV_PYTHON="$SCRIPT_DIR/../../backend/venv/bin/python"
if [ -x "$VENV_PYTHON" ]; then
PYTHON_CMD="$VENV_PYTHON"
UVICORN_CMD="$SCRIPT_DIR/../../backend/venv/bin/uvicorn"
else
PYTHON_CMD="python3"
UVICORN_CMD="uvicorn"
fi
# Check if assessment data exists
if [ ! -f "data/assessment_items.json" ]; then
echo -e "${YELLOW}Assessment data not found. Running prepare_data.py...${NC}"
$PYTHON_CMD prepare_data.py
echo
fi
# Check if node_modules exist in frontend
if [ ! -d "frontend/node_modules" ]; then
echo -e "${YELLOW}Installing frontend dependencies...${NC}"
cd frontend
npm install
cd ..
echo
fi
# Function to cleanup background processes on exit
cleanup() {
echo
echo -e "${YELLOW}Shutting down...${NC}"
kill $BACKEND_PID 2>/dev/null || true
kill $FRONTEND_PID 2>/dev/null || true
exit 0
}
trap cleanup SIGINT SIGTERM
# Start backend
echo -e "${GREEN}Starting backend API on port 8002...${NC}"
cd backend
$UVICORN_CMD app:app --host 0.0.0.0 --port 8002 --reload &
BACKEND_PID=$!
cd ..
# Wait for backend to start
echo "Waiting for backend to initialize..."
sleep 2
# Check if backend is running
if ! curl -s http://localhost:8002/api/health > /dev/null 2>&1; then
echo -e "${RED}Backend failed to start. Check for errors above.${NC}"
kill $BACKEND_PID 2>/dev/null || true
exit 1
fi
echo -e "${GREEN}Backend is running.${NC}"
echo
# Start frontend
echo -e "${GREEN}Starting frontend on port 5174...${NC}"
cd frontend
npm run dev &
FRONTEND_PID=$!
cd ..
# Wait for frontend to start
sleep 3
echo
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}Assessment system is running!${NC}"
echo -e "${GREEN}================================${NC}"
echo
echo -e "Backend API: ${YELLOW}http://localhost:8002${NC}"
echo -e "Frontend UI: ${YELLOW}http://localhost:5174${NC}"
echo
echo -e "Press Ctrl+C to stop all services"
echo
# Wait for any process to exit
wait

13
experiments/assessment/stop.sh Executable file

@@ -0,0 +1,13 @@
#!/bin/bash
# Stop the assessment system
echo "Stopping assessment system..."
# Kill backend (uvicorn on port 8002)
pkill -f "uvicorn app:app.*8002" 2>/dev/null && echo "Backend stopped" || echo "Backend not running"
# Kill frontend (vite on port 5174)
pkill -f "vite.*5174" 2>/dev/null && echo "Frontend stopped" || echo "Frontend not running"
echo "Done"

File diff suppressed because it is too large.


@@ -0,0 +1,666 @@
"""
Compute metrics for experiment results.
Computes metrics BOTH before and after deduplication:
- Pre-dedup: Measures raw generation capability
- Post-dedup: Measures quality of unique ideas
Also normalizes idea counts for fair cross-condition comparison.
Usage:
python -m experiments.compute_metrics --input results/experiment_xxx_deduped.json
"""
import sys
import json
import argparse
import asyncio
import logging
import random
from pathlib import Path
from typing import List, Dict, Any, Optional, Tuple
from dataclasses import dataclass, asdict
import numpy as np
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent / "backend"))
from app.services.embedding_service import embedding_service
from app.services.llm_service import ollama_provider, extract_json_from_response
from experiments.config import RESULTS_DIR, MODEL, RANDOM_SEED
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
@dataclass
class DiversityMetrics:
"""Semantic diversity metrics for a set of ideas."""
mean_pairwise_distance: float
std_pairwise_distance: float
min_pairwise_distance: float
max_pairwise_distance: float
idea_count: int
@dataclass
class ClusterMetrics:
"""Cluster analysis metrics."""
optimal_clusters: int
silhouette_score: float
cluster_sizes: List[int]
@dataclass
class QueryDistanceMetrics:
"""Distance from original query metrics."""
mean_distance: float
std_distance: float
min_distance: float
max_distance: float
distances: List[float]
@dataclass
class RelevanceMetrics:
"""LLM-as-judge relevance metrics (for hallucination detection)."""
relevance_rate: float # Score >= 2
nonsense_rate: float # Score == 1
mean_score: float
score_distribution: Dict[int, int] # {1: count, 2: count, 3: count}
@dataclass
class ConditionMetrics:
"""All metrics for a single condition."""
condition: str
query: str
# Idea counts
raw_count: int
unique_count: int
survival_rate: float
# Pre-dedup metrics (on raw ideas)
pre_dedup_diversity: Optional[DiversityMetrics]
# Post-dedup metrics (on unique ideas)
post_dedup_diversity: Optional[DiversityMetrics]
post_dedup_clusters: Optional[ClusterMetrics]
post_dedup_query_distance: Optional[QueryDistanceMetrics]
# Normalized metrics (on equal-sized samples)
normalized_diversity: Optional[DiversityMetrics]
normalized_sample_size: int
# Relevance/hallucination (post-dedup only)
relevance: Optional[RelevanceMetrics]
# ============================================================
# Embedding-based metrics
# ============================================================
async def get_embeddings(texts: List[str]) -> List[List[float]]:
"""Get embeddings for a list of texts."""
if not texts:
return []
return await embedding_service.get_embeddings_batch(texts)
def compute_pairwise_distances(embeddings: List[List[float]]) -> List[float]:
"""Compute all pairwise cosine distances."""
n = len(embeddings)
if n < 2:
return []
distances = []
for i in range(n):
for j in range(i + 1, n):
sim = embedding_service.cosine_similarity(embeddings[i], embeddings[j])
dist = 1 - sim # Convert similarity to distance
distances.append(dist)
return distances
async def compute_diversity_metrics(ideas: List[str]) -> Optional[DiversityMetrics]:
"""Compute semantic diversity metrics for a set of ideas."""
if len(ideas) < 2:
return None
embeddings = await get_embeddings(ideas)
distances = compute_pairwise_distances(embeddings)
if not distances:
return None
return DiversityMetrics(
mean_pairwise_distance=float(np.mean(distances)),
std_pairwise_distance=float(np.std(distances)),
min_pairwise_distance=float(np.min(distances)),
max_pairwise_distance=float(np.max(distances)),
idea_count=len(ideas)
)
async def compute_query_distance_metrics(
query: str,
ideas: List[str]
) -> Optional[QueryDistanceMetrics]:
"""Compute distance of ideas from the original query."""
if not ideas:
return None
# Get query embedding
query_emb = await embedding_service.get_embedding(query)
idea_embs = await get_embeddings(ideas)
distances = []
for emb in idea_embs:
sim = embedding_service.cosine_similarity(query_emb, emb)
dist = 1 - sim
distances.append(dist)
return QueryDistanceMetrics(
mean_distance=float(np.mean(distances)),
std_distance=float(np.std(distances)),
min_distance=float(np.min(distances)),
max_distance=float(np.max(distances)),
distances=distances
)
async def compute_cluster_metrics(ideas: List[str]) -> Optional[ClusterMetrics]:
"""Compute cluster analysis metrics."""
if len(ideas) < 3:
return None
try:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
except ImportError:
logger.warning("sklearn not installed, skipping cluster metrics")
return None
embeddings = await get_embeddings(ideas)
embeddings_np = np.array(embeddings)
# Find optimal k using silhouette score
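    # Silhouette scores lie in [-1, 1]; higher means tighter, better-separated
    # clusters, so we keep the k that maximizes the score.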
max_k = min(len(ideas) - 1, 10)
if max_k < 2:
return None
best_k = 2
best_score = -1
for k in range(2, max_k + 1):
try:
kmeans = KMeans(n_clusters=k, random_state=RANDOM_SEED, n_init=10)
labels = kmeans.fit_predict(embeddings_np)
score = silhouette_score(embeddings_np, labels)
if score > best_score:
best_score = score
best_k = k
except Exception as e:
logger.warning(f"Clustering failed for k={k}: {e}")
continue
# Get cluster sizes for optimal k
kmeans = KMeans(n_clusters=best_k, random_state=RANDOM_SEED, n_init=10)
labels = kmeans.fit_predict(embeddings_np)
cluster_sizes = [int(np.sum(labels == i)) for i in range(best_k)]
return ClusterMetrics(
optimal_clusters=best_k,
silhouette_score=float(best_score),
cluster_sizes=sorted(cluster_sizes, reverse=True)
)
# ============================================================
# LLM-as-Judge relevance metrics
# ============================================================
async def judge_relevance(query: str, idea: str, model: str = None) -> Dict[str, Any]:
"""Use LLM to judge if an idea is relevant to the query."""
model = model or MODEL
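    # The leading /no_think tag switches off Qwen3's thinking mode so the model
    # returns a compact JSON answer quickly.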
prompt = f"""/no_think
You are evaluating whether a generated idea is relevant and applicable to an original query.
Original query: {query}
Generated idea: {idea}
Rate the relevance on a scale of 1-3:
1 = Nonsense/completely irrelevant (no logical connection to the query)
2 = Weak but valid connection (requires stretch but has some relevance)
3 = Clearly relevant and applicable (directly relates to the query)
Return JSON only:
{{"score": N, "reason": "brief explanation (10-20 words)"}}
"""
try:
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=0.3 # Lower temperature for more consistent judgments
)
result = extract_json_from_response(response)
return {
"score": result.get("score", 2),
"reason": result.get("reason", "")
}
except Exception as e:
logger.warning(f"Relevance judgment failed: {e}")
return {"score": 2, "reason": "judgment failed"}
async def compute_relevance_metrics(
query: str,
ideas: List[str],
model: str = None,
sample_size: int = None
) -> Optional[RelevanceMetrics]:
"""Compute LLM-as-judge relevance metrics for ideas."""
if not ideas:
return None
# Optionally sample to reduce API calls
if sample_size and len(ideas) > sample_size:
rng = random.Random(RANDOM_SEED)
ideas_to_judge = rng.sample(ideas, sample_size)
else:
ideas_to_judge = ideas
scores = []
for idea in ideas_to_judge:
result = await judge_relevance(query, idea, model)
scores.append(result["score"])
# Compute distribution
distribution = {1: 0, 2: 0, 3: 0}
for s in scores:
if s in distribution:
distribution[s] += 1
nonsense_count = distribution[1]
relevant_count = distribution[2] + distribution[3]
return RelevanceMetrics(
relevance_rate=relevant_count / len(scores) if scores else 0,
nonsense_rate=nonsense_count / len(scores) if scores else 0,
mean_score=float(np.mean(scores)) if scores else 0,
score_distribution=distribution
)
# ============================================================
# Main metrics computation
# ============================================================
async def compute_condition_metrics(
query: str,
condition: str,
raw_ideas: List[str],
unique_ideas: List[str],
normalized_sample_size: int,
compute_relevance: bool = False
) -> ConditionMetrics:
"""Compute all metrics for a single condition."""
raw_count = len(raw_ideas)
unique_count = len(unique_ideas)
survival_rate = unique_count / raw_count if raw_count > 0 else 1.0
logger.info(f" Computing metrics for {condition}...")
logger.info(f" Raw: {raw_count}, Unique: {unique_count}, Survival: {survival_rate:.1%}")
# Pre-dedup diversity (on raw ideas)
logger.info(f" Computing pre-dedup diversity...")
pre_dedup_diversity = await compute_diversity_metrics(raw_ideas)
# Post-dedup diversity (on unique ideas)
logger.info(f" Computing post-dedup diversity...")
post_dedup_diversity = await compute_diversity_metrics(unique_ideas)
# Cluster analysis (post-dedup)
logger.info(f" Computing cluster metrics...")
post_dedup_clusters = await compute_cluster_metrics(unique_ideas)
# Query distance (post-dedup)
logger.info(f" Computing query distance...")
post_dedup_query_distance = await compute_query_distance_metrics(query, unique_ideas)
# Normalized diversity (equal-sized sample for fair comparison)
normalized_diversity = None
if len(unique_ideas) >= normalized_sample_size and normalized_sample_size > 1:
logger.info(f" Computing normalized diversity (n={normalized_sample_size})...")
rng = random.Random(RANDOM_SEED)
sampled_ideas = rng.sample(unique_ideas, normalized_sample_size)
normalized_diversity = await compute_diversity_metrics(sampled_ideas)
# Relevance metrics (optional, expensive)
relevance = None
if compute_relevance and unique_ideas:
logger.info(f" Computing relevance metrics (LLM-as-judge)...")
# Sample up to 10 ideas to reduce cost
relevance = await compute_relevance_metrics(
query, unique_ideas, sample_size=min(10, len(unique_ideas))
)
return ConditionMetrics(
condition=condition,
query=query,
raw_count=raw_count,
unique_count=unique_count,
survival_rate=survival_rate,
pre_dedup_diversity=pre_dedup_diversity,
post_dedup_diversity=post_dedup_diversity,
post_dedup_clusters=post_dedup_clusters,
post_dedup_query_distance=post_dedup_query_distance,
normalized_diversity=normalized_diversity,
normalized_sample_size=normalized_sample_size,
relevance=relevance
)
async def process_experiment_results(
input_file: Path,
output_file: Optional[Path] = None,
compute_relevance: bool = False
) -> Dict[str, Any]:
"""
Process experiment results and compute all metrics.
Args:
input_file: Path to deduped experiment results JSON
output_file: Path for output (default: input with _metrics suffix)
compute_relevance: Whether to compute LLM-as-judge relevance
Returns:
Results with computed metrics
"""
# Load experiment results
with open(input_file, "r", encoding="utf-8") as f:
experiment = json.load(f)
logger.info(f"Processing experiment: {experiment.get('experiment_id', 'unknown')}")
# Determine normalized sample size (minimum unique count across all conditions)
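    # Pairwise-diversity metrics are sensitive to set size, so each condition is
    # also scored on an equal-sized random subsample (capped at 10 for cost).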
min_unique_count = float('inf')
for query_result in experiment["results"]:
for condition, cond_result in query_result["conditions"].items():
if cond_result.get("success", False):
dedup = cond_result.get("dedup", {})
unique_count = len(dedup.get("unique_ideas", cond_result.get("ideas", [])))
if unique_count > 0:
min_unique_count = min(min_unique_count, unique_count)
normalized_sample_size = min(int(min_unique_count), 10) if min_unique_count != float('inf') else 5
logger.info(f"Normalized sample size: {normalized_sample_size}")
# Process each query
all_metrics = []
for query_result in experiment["results"]:
query = query_result["query"]
query_id = query_result["query_id"]
logger.info(f"\nProcessing query: {query} ({query_id})")
query_metrics = {
"query_id": query_id,
"query": query,
"conditions": {}
}
for condition, cond_result in query_result["conditions"].items():
if not cond_result.get("success", False):
logger.warning(f" Skipping failed condition: {condition}")
continue
# Get raw and unique ideas
raw_ideas = cond_result.get("ideas", [])
dedup = cond_result.get("dedup", {})
unique_ideas = dedup.get("unique_ideas", raw_ideas)
# Compute metrics
metrics = await compute_condition_metrics(
query=query,
condition=condition,
raw_ideas=raw_ideas,
unique_ideas=unique_ideas,
normalized_sample_size=normalized_sample_size,
compute_relevance=compute_relevance
)
# Convert to dict for JSON serialization
query_metrics["conditions"][condition] = asdict(metrics)
all_metrics.append(query_metrics)
# Calculate aggregate statistics
aggregate = calculate_aggregate_metrics(all_metrics)
# Build output
output = {
"experiment_id": experiment.get("experiment_id"),
"config": experiment.get("config"),
"normalized_sample_size": normalized_sample_size,
"metrics_by_query": all_metrics,
"aggregate": aggregate
}
# Save results
if output_file is None:
stem = input_file.stem.replace("_deduped", "").replace("_complete", "")
output_file = input_file.parent / f"{stem}_metrics.json"
with open(output_file, "w", encoding="utf-8") as f:
json.dump(output, f, indent=2, ensure_ascii=False)
logger.info(f"\nMetrics saved to: {output_file}")
return output
def calculate_aggregate_metrics(all_metrics: List[Dict]) -> Dict[str, Any]:
"""Calculate aggregate statistics across all queries."""
aggregate = {}
# Collect metrics by condition
by_condition = {}
for query_metrics in all_metrics:
for condition, metrics in query_metrics["conditions"].items():
if condition not in by_condition:
by_condition[condition] = {
"raw_counts": [],
"unique_counts": [],
"survival_rates": [],
"pre_dedup_diversity": [],
"post_dedup_diversity": [],
"normalized_diversity": [],
"query_distances": [],
"cluster_counts": [],
"silhouette_scores": [],
"relevance_rates": [],
"nonsense_rates": []
}
bc = by_condition[condition]
bc["raw_counts"].append(metrics["raw_count"])
bc["unique_counts"].append(metrics["unique_count"])
bc["survival_rates"].append(metrics["survival_rate"])
if metrics.get("pre_dedup_diversity"):
bc["pre_dedup_diversity"].append(
metrics["pre_dedup_diversity"]["mean_pairwise_distance"]
)
if metrics.get("post_dedup_diversity"):
bc["post_dedup_diversity"].append(
metrics["post_dedup_diversity"]["mean_pairwise_distance"]
)
if metrics.get("normalized_diversity"):
bc["normalized_diversity"].append(
metrics["normalized_diversity"]["mean_pairwise_distance"]
)
if metrics.get("post_dedup_query_distance"):
bc["query_distances"].append(
metrics["post_dedup_query_distance"]["mean_distance"]
)
if metrics.get("post_dedup_clusters"):
bc["cluster_counts"].append(
metrics["post_dedup_clusters"]["optimal_clusters"]
)
bc["silhouette_scores"].append(
metrics["post_dedup_clusters"]["silhouette_score"]
)
if metrics.get("relevance"):
bc["relevance_rates"].append(metrics["relevance"]["relevance_rate"])
bc["nonsense_rates"].append(metrics["relevance"]["nonsense_rate"])
# Calculate means and stds
for condition, data in by_condition.items():
aggregate[condition] = {}
for metric_name, values in data.items():
if values:
aggregate[condition][metric_name] = {
"mean": float(np.mean(values)),
"std": float(np.std(values)),
"min": float(np.min(values)),
"max": float(np.max(values)),
"n": len(values)
}
return aggregate
def print_metrics_summary(metrics: Dict[str, Any]):
"""Print a formatted summary of computed metrics."""
print("\n" + "=" * 80)
print("METRICS SUMMARY")
print("=" * 80)
print(f"\nNormalized sample size: {metrics.get('normalized_sample_size', 'N/A')}")
aggregate = metrics.get("aggregate", {})
# Idea counts
print("\n--- Idea Counts ---")
print(f"{'Condition':<25} {'Raw':<10} {'Unique':<10} {'Survival':<10}")
print("-" * 55)
for cond, data in aggregate.items():
raw = data.get("raw_counts", {}).get("mean", 0)
unique = data.get("unique_counts", {}).get("mean", 0)
survival = data.get("survival_rates", {}).get("mean", 0)
print(f"{cond:<25} {raw:<10.1f} {unique:<10.1f} {survival:<10.1%}")
# Diversity metrics
print("\n--- Semantic Diversity (Mean Pairwise Distance) ---")
print(f"{'Condition':<25} {'Pre-Dedup':<12} {'Post-Dedup':<12} {'Normalized':<12}")
print("-" * 61)
for cond, data in aggregate.items():
pre = data.get("pre_dedup_diversity", {}).get("mean", 0)
post = data.get("post_dedup_diversity", {}).get("mean", 0)
norm = data.get("normalized_diversity", {}).get("mean", 0)
print(f"{cond:<25} {pre:<12.4f} {post:<12.4f} {norm:<12.4f}")
# Query distance
print("\n--- Query Distance (Novelty) ---")
print(f"{'Condition':<25} {'Mean Distance':<15} {'Std':<10}")
print("-" * 50)
for cond, data in aggregate.items():
dist = data.get("query_distances", {})
mean = dist.get("mean", 0)
std = dist.get("std", 0)
print(f"{cond:<25} {mean:<15.4f} {std:<10.4f}")
# Cluster metrics
print("\n--- Cluster Analysis ---")
print(f"{'Condition':<25} {'Clusters':<12} {'Silhouette':<12}")
print("-" * 49)
for cond, data in aggregate.items():
clusters = data.get("cluster_counts", {}).get("mean", 0)
silhouette = data.get("silhouette_scores", {}).get("mean", 0)
print(f"{cond:<25} {clusters:<12.1f} {silhouette:<12.4f}")
# Relevance (if computed)
has_relevance = any(
"relevance_rates" in data and data["relevance_rates"].get("n", 0) > 0
for data in aggregate.values()
)
if has_relevance:
print("\n--- Relevance (LLM-as-Judge) ---")
print(f"{'Condition':<25} {'Relevance':<12} {'Nonsense':<12}")
print("-" * 49)
for cond, data in aggregate.items():
rel = data.get("relevance_rates", {}).get("mean", 0)
non = data.get("nonsense_rates", {}).get("mean", 0)
print(f"{cond:<25} {rel:<12.1%} {non:<12.1%}")
print("\n" + "=" * 80)
print("Interpretation:")
print("- Higher pairwise distance = more diverse ideas")
print("- Higher query distance = more novel (farther from original)")
print("- More clusters = more distinct themes")
print("- Higher silhouette = cleaner cluster separation")
print("=" * 80)
async def main():
parser = argparse.ArgumentParser(
description="Compute metrics for experiment results"
)
parser.add_argument(
"--input",
type=str,
required=True,
help="Input deduped experiment results JSON file"
)
parser.add_argument(
"--output",
type=str,
help="Output file path (default: input_metrics.json)"
)
parser.add_argument(
"--relevance",
action="store_true",
help="Compute LLM-as-judge relevance metrics (expensive)"
)
args = parser.parse_args()
input_path = Path(args.input)
if not input_path.exists():
input_path = RESULTS_DIR / args.input
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
sys.exit(1)
output_path = Path(args.output) if args.output else None
metrics = await process_experiment_results(
input_file=input_path,
output_file=output_path,
compute_relevance=args.relevance
)
print_metrics_summary(metrics)
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,23 @@
"""
Condition implementations for the 5-condition experiment.
C1: Direct generation (baseline)
C2: Expert-only (no attributes)
C3: Attribute-only (no experts)
C4: Full pipeline (attributes + experts)
C5: Random-perspective (random words instead of experts)
"""
from .c1_direct import generate_ideas as c1_generate
from .c2_expert_only import generate_ideas as c2_generate
from .c3_attribute_only import generate_ideas as c3_generate
from .c4_full_pipeline import generate_ideas as c4_generate
from .c5_random_perspective import generate_ideas as c5_generate
__all__ = [
"c1_generate",
"c2_generate",
"c3_generate",
"c4_generate",
"c5_generate",
]


@@ -0,0 +1,111 @@
"""
Condition 1: Direct Generation (Baseline)
Single LLM call asking for creative ideas directly.
No attribute decomposition, no expert perspectives.
"""
import sys
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from experiments.config import MODEL, TEMPERATURE, IDEAS_DIRECT, PROMPT_LANGUAGE
def get_direct_generation_prompt(query: str, idea_count: int, lang: str = "en") -> str:
"""Generate prompt for direct idea generation."""
if lang == "en":
return f"""/no_think
Generate {idea_count} creative and innovative ideas for "{query}".
Requirements:
1. Each idea should be specific and actionable
2. Ideas should be diverse, covering different aspects and applications
3. Include both practical improvements and creative innovations
4. Ideas should be 15-30 words each
Return JSON only:
{{"ideas": ["idea 1", "idea 2", "idea 3", ...]}}
Generate exactly {idea_count} ideas."""
else:
return f"""/no_think
為「{query}」生成 {idea_count} 個創意點子。
要求:
1. 每個點子要具體可行
2. 點子要多元,涵蓋不同面向和應用
3. 包含實用改進和創意創新
4. 每個點子 15-30 字
只回傳 JSON
{{"ideas": ["點子1", "點子2", "點子3", ...]}}
生成正好 {idea_count} 個點子。"""
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
idea_count: int = None,
lang: str = None
) -> Dict[str, Any]:
"""
Generate ideas using direct LLM generation (C1 baseline).
Args:
query: The object/concept to generate ideas for
model: LLM model to use (default from config)
temperature: Generation temperature (default from config)
idea_count: Number of ideas to generate (default from config)
lang: Language for prompts (default from config)
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
idea_count = idea_count or IDEAS_DIRECT
lang = lang or PROMPT_LANGUAGE
prompt = get_direct_generation_prompt(query, idea_count, lang)
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=temperature
)
result = extract_json_from_response(response)
ideas = result.get("ideas", [])
return {
"condition": "c1_direct",
"query": query,
"ideas": ideas,
"idea_count": len(ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"mechanism": "direct_llm_generation"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas:")
for i, idea in enumerate(result['ideas'], 1):
print(f" {i}. {idea}")
asyncio.run(test())


@@ -0,0 +1,176 @@
"""
Condition 2: Expert-Only (No Attributes)
Uses expert perspectives to generate ideas, but without
attribute decomposition. Each expert generates ideas directly
for the query from their professional perspective.
"""
import sys
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from app.services.expert_source_service import expert_source_service
from experiments.config import (
MODEL, TEMPERATURE, EXPERT_COUNT, EXPERT_SOURCE,
IDEAS_PER_EXPERT, PROMPT_LANGUAGE
)
def get_expert_idea_generation_prompt(
query: str,
expert_name: str,
expert_domain: str,
idea_count: int,
lang: str = "en"
) -> str:
"""Generate prompt for expert-based idea generation."""
if lang == "en":
domain_text = f" ({expert_domain} field)" if expert_domain else ""
return f"""/no_think
You are a {expert_name}{domain_text}.
Task: Generate {idea_count} creative and innovative ideas for "{query}" from your professional perspective.
Requirements:
1. Each idea should reflect your professional expertise and unique viewpoint
2. Think about how concepts from your field could improve or reimagine "{query}"
3. Ideas should be specific and actionable (15-30 words each)
4. Combine your professional knowledge with creative thinking
Return JSON only:
{{"ideas": ["idea 1", "idea 2", "idea 3", ...]}}
Generate exactly {idea_count} ideas from your perspective as a {expert_name}."""
else:
domain_text = f"{expert_domain}領域)" if expert_domain else ""
return f"""/no_think
你是一位{expert_name}{domain_text}
任務:從你的專業角度,為「{query}」生成 {idea_count} 個創意點子。
要求:
1. 每個點子要反映你的專業知識和獨特觀點
2. 思考你領域的概念如何改進或重新想像「{query}
3. 點子要具體可行(每個 15-30 字)
4. 結合專業知識和創意思維
只回傳 JSON
{{"ideas": ["點子1", "點子2", "點子3", ...]}}
從你作為{expert_name}的角度生成正好 {idea_count} 個點子。"""
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
expert_count: int = None,
expert_source: str = None,
ideas_per_expert: int = None,
lang: str = None
) -> Dict[str, Any]:
"""
Generate ideas using expert perspectives only (C2).
Args:
query: The object/concept to generate ideas for
model: LLM model to use
temperature: Generation temperature
expert_count: Number of experts to use
expert_source: Source of experts (curated, dbpedia, etc.)
ideas_per_expert: Ideas each expert generates
lang: Language for prompts
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
expert_count = expert_count or EXPERT_COUNT
expert_source = expert_source or EXPERT_SOURCE
ideas_per_expert = ideas_per_expert or IDEAS_PER_EXPERT
lang = lang or PROMPT_LANGUAGE
    # Get experts from the configured source (curated by default)
experts, actual_source = expert_source_service.get_experts(
source=expert_source,
count=expert_count,
language=lang
)
all_ideas = []
expert_details = []
for expert in experts:
expert_name = expert.get("name", "Expert")
expert_domain = expert.get("domain", "")
prompt = get_expert_idea_generation_prompt(
query=query,
expert_name=expert_name,
expert_domain=expert_domain,
idea_count=ideas_per_expert,
lang=lang
)
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=temperature
)
result = extract_json_from_response(response)
ideas = result.get("ideas", [])
# Tag ideas with expert source
for idea in ideas:
all_ideas.append({
"idea": idea,
"expert_name": expert_name,
"expert_domain": expert_domain
})
expert_details.append({
"name": expert_name,
"domain": expert_domain,
"ideas_generated": len(ideas)
})
return {
"condition": "c2_expert_only",
"query": query,
"ideas": [item["idea"] for item in all_ideas],
"ideas_with_source": all_ideas,
"idea_count": len(all_ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"expert_count": expert_count,
"expert_source": actual_source,
"ideas_per_expert": ideas_per_expert,
"experts": expert_details,
"mechanism": "expert_perspectives_only"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas from {len(result['metadata']['experts'])} experts:")
for exp in result['metadata']['experts']:
print(f" - {exp['name']}: {exp['ideas_generated']} ideas")
print("\nSample ideas:")
for i, item in enumerate(result['ideas_with_source'][:5], 1):
print(f" {i}. [{item['expert_name']}] {item['idea']}")
asyncio.run(test())


@@ -0,0 +1,181 @@
"""
Condition 3: Attribute-Only (No Experts)
Uses attribute decomposition to break down the query into
structured categories, then generates ideas from each attribute.
No expert perspectives involved.
"""
import sys
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from app.prompts.attribute_prompt import get_step1_dynamic_attributes_prompt
from experiments.config import (
MODEL, TEMPERATURE, FIXED_CATEGORIES, PROMPT_LANGUAGE
)
def get_attribute_idea_generation_prompt(
query: str,
category: str,
attribute: str,
idea_count: int,
lang: str = "en"
) -> str:
"""Generate prompt for attribute-based idea generation."""
if lang == "en":
return f"""/no_think
Generate {idea_count} creative ideas for "{query}" focusing on the attribute "{attribute}" (Category: {category}).
Requirements:
1. Each idea should be directly inspired by the attribute "{attribute}"
2. Think about how this attribute could be improved, reimagined, or applied in new ways
3. Ideas should be specific and actionable (15-30 words each)
4. Be creative while maintaining relevance to the attribute
Return JSON only:
{{"ideas": ["idea 1", "idea 2", ...]}}
Generate exactly {idea_count} ideas based on the attribute "{attribute}"."""
else:
return f"""/no_think
為「{query}」生成 {idea_count} 個創意點子,聚焦於屬性「{attribute}」(類別:{category})。
要求:
1. 每個點子要直接受屬性「{attribute}」啟發
2. 思考如何改進、重新想像或以新方式應用這個屬性
3. 點子要具體可行(每個 15-30 字)
4. 保持創意同時與屬性相關
只回傳 JSON
{{"ideas": ["點子1", "點子2", ...]}}
基於屬性「{attribute}」生成正好 {idea_count} 個點子。"""
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
categories: List[str] = None,
ideas_per_attribute: int = 1,
lang: str = None
) -> Dict[str, Any]:
"""
Generate ideas using attribute decomposition only (C3).
Args:
query: The object/concept to generate ideas for
model: LLM model to use
temperature: Generation temperature
categories: Categories to use for decomposition
ideas_per_attribute: Ideas to generate per attribute
lang: Language for prompts
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
categories = categories or FIXED_CATEGORIES
lang = lang or PROMPT_LANGUAGE
# Step 1: Generate attributes using existing prompt
# Build category definitions for the prompt
category_defs = [
{"name": cat, "description": f"Related {cat.lower()} of the object", "order": i}
for i, cat in enumerate(categories)
]
attr_prompt = get_step1_dynamic_attributes_prompt(
query=query,
categories=category_defs,
lang=lang
)
attr_response = await ollama_provider.generate(
prompt=attr_prompt,
model=model,
temperature=temperature
)
attributes_by_category = extract_json_from_response(attr_response)
# Step 2: Generate ideas for each attribute
all_ideas = []
attribute_details = []
for category in categories:
attrs = attributes_by_category.get(category, [])
for attr in attrs:
prompt = get_attribute_idea_generation_prompt(
query=query,
category=category,
attribute=attr,
idea_count=ideas_per_attribute,
lang=lang
)
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=temperature
)
result = extract_json_from_response(response)
ideas = result.get("ideas", [])
# Tag ideas with attribute source
for idea in ideas:
all_ideas.append({
"idea": idea,
"category": category,
"attribute": attr
})
attribute_details.append({
"category": category,
"attribute": attr,
"ideas_generated": len(ideas)
})
return {
"condition": "c3_attribute_only",
"query": query,
"ideas": [item["idea"] for item in all_ideas],
"ideas_with_source": all_ideas,
"idea_count": len(all_ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"categories": categories,
"attributes_by_category": attributes_by_category,
"attribute_count": sum(len(v) for v in attributes_by_category.values()),
"ideas_per_attribute": ideas_per_attribute,
"attributes": attribute_details,
"mechanism": "attribute_decomposition_only"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas from {result['metadata']['attribute_count']} attributes:")
for cat, attrs in result['metadata']['attributes_by_category'].items():
print(f" {cat}: {', '.join(attrs)}")
print("\nSample ideas:")
for i, item in enumerate(result['ideas_with_source'][:5], 1):
print(f" {i}. [{item['category']}/{item['attribute']}] {item['idea']}")
asyncio.run(test())


@@ -0,0 +1,214 @@
"""
Condition 4: Full Pipeline (Attributes + Experts)
The complete novelty-seeking system:
1. Attribute decomposition into categories
2. Expert team generation
3. Expert keyword generation for each attribute
4. Description generation for each keyword
"""
import sys
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from app.services.expert_source_service import expert_source_service
from app.prompts.attribute_prompt import get_step1_dynamic_attributes_prompt
from app.prompts.expert_transformation_prompt import (
get_expert_keyword_generation_prompt,
get_single_description_prompt
)
from experiments.config import (
MODEL, TEMPERATURE, FIXED_CATEGORIES, EXPERT_COUNT,
EXPERT_SOURCE, KEYWORDS_PER_EXPERT, PROMPT_LANGUAGE
)
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
categories: List[str] = None,
expert_count: int = None,
expert_source: str = None,
keywords_per_expert: int = None,
lang: str = None
) -> Dict[str, Any]:
"""
Generate ideas using the full pipeline (C4).
Args:
query: The object/concept to generate ideas for
model: LLM model to use
temperature: Generation temperature
categories: Categories for attribute decomposition
expert_count: Number of experts
expert_source: Source of experts
keywords_per_expert: Keywords each expert generates per attribute
lang: Language for prompts
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
categories = categories or FIXED_CATEGORIES
expert_count = expert_count or EXPERT_COUNT
expert_source = expert_source or EXPERT_SOURCE
keywords_per_expert = keywords_per_expert or KEYWORDS_PER_EXPERT
lang = lang or PROMPT_LANGUAGE
    # Step 0: Get experts from the configured source (curated by default)
experts_data, actual_source = expert_source_service.get_experts(
source=expert_source,
count=expert_count,
language=lang
)
# Convert to expected format
experts = [
{
"id": f"expert-{i}",
"name": exp.get("name", "Expert"),
"domain": exp.get("domain", ""),
"perspective": exp.get("perspective", "")
}
for i, exp in enumerate(experts_data)
]
# Step 1: Generate attributes
category_defs = [
{"name": cat, "description": f"Related {cat.lower()} of the object", "order": i}
for i, cat in enumerate(categories)
]
attr_prompt = get_step1_dynamic_attributes_prompt(
query=query,
categories=category_defs,
lang=lang
)
attr_response = await ollama_provider.generate(
prompt=attr_prompt,
model=model,
temperature=temperature
)
attributes_by_category = extract_json_from_response(attr_response)
# Step 2: Expert keyword generation for each category/attribute
all_keywords = []
for category in categories:
attrs = attributes_by_category.get(category, [])
for attr in attrs:
# Generate keywords from all experts for this attribute
keyword_prompt = get_expert_keyword_generation_prompt(
category=category,
attribute=attr,
experts=experts,
keywords_per_expert=keywords_per_expert,
lang=lang
)
keyword_response = await ollama_provider.generate(
prompt=keyword_prompt,
model=model,
temperature=temperature
)
keyword_result = extract_json_from_response(keyword_response)
keywords = keyword_result.get("keywords", [])
for kw in keywords:
all_keywords.append({
"category": category,
"attribute": attr,
"keyword": kw.get("keyword", ""),
"expert_id": kw.get("expert_id", ""),
"expert_name": kw.get("expert_name", "")
})
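    # Each keyword yields exactly one description below, so the final idea count
    # equals attribute_count x expert_count x keywords_per_expert.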
# Step 3: Generate descriptions for each keyword
all_ideas = []
for kw_info in all_keywords:
# Find expert details
expert = next(
(e for e in experts if e["id"] == kw_info["expert_id"]),
{"name": kw_info["expert_name"], "domain": "", "id": kw_info["expert_id"]}
)
desc_prompt = get_single_description_prompt(
query=query,
keyword=kw_info["keyword"],
expert_id=expert["id"],
expert_name=expert["name"],
expert_domain=expert.get("domain", ""),
lang=lang
)
desc_response = await ollama_provider.generate(
prompt=desc_prompt,
model=model,
temperature=temperature
)
desc_result = extract_json_from_response(desc_response)
description = desc_result.get("description", "")
all_ideas.append({
"idea": description,
"keyword": kw_info["keyword"],
"category": kw_info["category"],
"attribute": kw_info["attribute"],
"expert_name": expert["name"],
"expert_domain": expert.get("domain", "")
})
return {
"condition": "c4_full_pipeline",
"query": query,
"ideas": [item["idea"] for item in all_ideas],
"ideas_with_source": all_ideas,
"idea_count": len(all_ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"categories": categories,
"attributes_by_category": attributes_by_category,
"attribute_count": sum(len(v) for v in attributes_by_category.values()),
"expert_count": expert_count,
"expert_source": actual_source,
"keywords_per_expert": keywords_per_expert,
"total_keywords": len(all_keywords),
"experts": [{"name": e["name"], "domain": e["domain"]} for e in experts],
"mechanism": "full_pipeline_attributes_plus_experts"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas using full pipeline:")
print(f" Attributes: {result['metadata']['attribute_count']}")
print(f" Experts: {result['metadata']['expert_count']}")
print(f" Keywords: {result['metadata']['total_keywords']}")
print("\nExperts used:")
for exp in result['metadata']['experts']:
print(f" - {exp['name']} ({exp['domain']})")
print("\nSample ideas:")
for i, item in enumerate(result['ideas_with_source'][:5], 1):
print(f" {i}. [{item['expert_name']}] {item['keyword']}: {item['idea']}")
asyncio.run(test())


@@ -0,0 +1,178 @@
"""
Condition 5: Random-Perspective Control
Uses random words as "perspectives" instead of domain experts.
Tests whether the benefit from expert perspectives comes from
domain knowledge or simply from any perspective shift.
"""
import sys
import json
import random
import hashlib
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from experiments.config import (
MODEL, TEMPERATURE, EXPERT_COUNT, IDEAS_PER_EXPERT,
PROMPT_LANGUAGE, RANDOM_SEED, DATA_DIR
)
def load_random_words() -> List[str]:
"""Load the random word pool from data file."""
words_file = DATA_DIR / "random_words.json"
with open(words_file, "r", encoding="utf-8") as f:
data = json.load(f)
return data.get("words", [])
def get_random_perspective_prompt(
query: str,
perspective_word: str,
idea_count: int,
lang: str = "en"
) -> str:
"""Generate prompt for random-perspective idea generation."""
if lang == "en":
return f"""/no_think
Generate {idea_count} creative and innovative ideas for "{query}" inspired by the concept of "{perspective_word}".
Requirements:
1. Each idea should draw inspiration from "{perspective_word}" - its qualities, characteristics, or associations
2. Think about how concepts related to "{perspective_word}" could improve or reimagine "{query}"
3. Ideas should be specific and actionable (15-30 words each)
4. Be creative in connecting "{perspective_word}" to "{query}"
Return JSON only:
{{"ideas": ["idea 1", "idea 2", "idea 3", ...]}}
Generate exactly {idea_count} ideas inspired by "{perspective_word}"."""
else:
return f"""/no_think
為「{query}」生成 {idea_count} 個創意點子,靈感來自「{perspective_word}」這個概念。
要求:
1. 每個點子要從「{perspective_word}」獲得靈感——它的特質、特徵或聯想
2. 思考與「{perspective_word}」相關的概念如何改進或重新想像「{query}
3. 點子要具體可行(每個 15-30 字)
4. 創意地連接「{perspective_word}」和「{query}
只回傳 JSON
{{"ideas": ["點子1", "點子2", "點子3", ...]}}
生成正好 {idea_count} 個受「{perspective_word}」啟發的點子。"""
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
word_count: int = None,
ideas_per_word: int = None,
lang: str = None,
seed: int = None
) -> Dict[str, Any]:
"""
Generate ideas using random word perspectives (C5 control).
Args:
query: The object/concept to generate ideas for
model: LLM model to use
temperature: Generation temperature
word_count: Number of random words to use (matches expert count)
ideas_per_word: Ideas to generate per word
lang: Language for prompts
seed: Random seed for reproducibility
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
word_count = word_count or EXPERT_COUNT
ideas_per_word = ideas_per_word or IDEAS_PER_EXPERT
lang = lang or PROMPT_LANGUAGE
    seed = RANDOM_SEED if seed is None else seed  # keep an explicit seed of 0
# Load word pool and sample random words
word_pool = load_random_words()
    # Use seeded random for reproducibility.
    # Derive a per-query seed so different queries draw different words while the
    # same query draws the same words across runs. A stable digest is used because
    # Python's built-in hash() is salted per process and is not reproducible.
    query_seed = seed + int(hashlib.md5(query.encode("utf-8")).hexdigest(), 16) % 10000
    rng = random.Random(query_seed)
selected_words = rng.sample(word_pool, min(word_count, len(word_pool)))
all_ideas = []
word_details = []
for word in selected_words:
prompt = get_random_perspective_prompt(
query=query,
perspective_word=word,
idea_count=ideas_per_word,
lang=lang
)
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=temperature
)
result = extract_json_from_response(response)
ideas = result.get("ideas", [])
# Tag ideas with perspective word source
for idea in ideas:
all_ideas.append({
"idea": idea,
"perspective_word": word
})
word_details.append({
"word": word,
"ideas_generated": len(ideas)
})
return {
"condition": "c5_random_perspective",
"query": query,
"ideas": [item["idea"] for item in all_ideas],
"ideas_with_source": all_ideas,
"idea_count": len(all_ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"word_count": word_count,
"ideas_per_word": ideas_per_word,
"random_seed": seed,
"query_seed": query_seed,
"selected_words": selected_words,
"word_details": word_details,
"word_pool_size": len(word_pool),
"mechanism": "random_perspective_control"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas from {len(result['metadata']['selected_words'])} random words:")
print(f" Words used: {', '.join(result['metadata']['selected_words'])}")
print(f" Seed: {result['metadata']['random_seed']}, Query seed: {result['metadata']['query_seed']}")
print("\nSample ideas:")
for i, item in enumerate(result['ideas_with_source'][:5], 1):
print(f" {i}. [{item['perspective_word']}] {item['idea']}")
asyncio.run(test())

72
experiments/config.py Normal file

@@ -0,0 +1,72 @@
"""
Experiment configuration for 5-condition idea generation study.
"""
from typing import Literal
from pathlib import Path
# Paths
EXPERIMENTS_DIR = Path(__file__).parent
DATA_DIR = EXPERIMENTS_DIR / "data"
RESULTS_DIR = EXPERIMENTS_DIR / "results"
DOCS_DIR = EXPERIMENTS_DIR / "docs"
# LLM Settings
MODEL = "qwen3:8b"
TEMPERATURE = 0.9
# Expert Settings
EXPERT_COUNT = 4
EXPERT_SOURCE: Literal["curated", "llm", "dbpedia", "wikidata"] = "curated"
KEYWORDS_PER_EXPERT = 1
# Language Settings
PROMPT_LANGUAGE: Literal["en", "zh"] = "en"
# Attribute Settings
FIXED_CATEGORIES = ["Functions", "Usages", "User Groups", "Characteristics"]
# Deduplication Settings
DEDUP_THRESHOLD = 0.85
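# Cosine-similarity cutoff: idea pairs scoring >= 0.85 are grouped as duplicates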
DEDUP_METHOD: Literal["embedding", "llm"] = "embedding"
# Reproducibility
RANDOM_SEED = 42
# Idea Generation Settings
IDEAS_PER_EXPERT = 5 # For C2 and C5
IDEAS_DIRECT = 20 # For C1
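# With EXPERT_COUNT = 4 and IDEAS_PER_EXPERT = 5, C2/C5 also target 20 raw ideas,
# keeping the raw generation budget comparable to C1's IDEAS_DIRECT.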
# Condition Names
CONDITIONS = [
"c1_direct",
"c2_expert_only",
"c3_attribute_only",
"c4_full_pipeline",
"c5_random_perspective",
]
# Condition Display Names
CONDITION_NAMES = {
"c1_direct": "C1: Direct Generation",
"c2_expert_only": "C2: Expert-Only",
"c3_attribute_only": "C3: Attribute-Only",
"c4_full_pipeline": "C4: Full Pipeline",
"c5_random_perspective": "C5: Random-Perspective",
}
# Summary Config Dict (for logging/reporting)
EXPERIMENT_CONFIG = {
"model": MODEL,
"temperature": TEMPERATURE,
"expert_count": EXPERT_COUNT,
"expert_source": EXPERT_SOURCE,
"keywords_per_expert": KEYWORDS_PER_EXPERT,
"prompt_language": PROMPT_LANGUAGE,
"random_seed": RANDOM_SEED,
"categories": FIXED_CATEGORIES,
"dedup_threshold": DEDUP_THRESHOLD,
"dedup_method": DEDUP_METHOD,
"ideas_per_expert": IDEAS_PER_EXPERT,
"ideas_direct": IDEAS_DIRECT,
}


@@ -0,0 +1,66 @@
{
"description": "10 pilot queries for the 5-condition experiment, balanced across categories",
"version": "1.0",
"queries": [
{
"id": "A1",
"query": "Chair",
"category": "everyday",
"description": "Common household furniture"
},
{
"id": "A5",
"query": "Bicycle",
"category": "everyday",
"description": "Personal transportation device"
},
{
"id": "A7",
"query": "Smartphone",
"category": "everyday",
"description": "Mobile communication device"
},
{
"id": "B1",
"query": "Solar panel",
"category": "technology",
"description": "Renewable energy technology"
},
{
"id": "B3",
"query": "3D printer",
"category": "technology",
"description": "Additive manufacturing device"
},
{
"id": "B4",
"query": "Drone",
"category": "technology",
"description": "Unmanned aerial vehicle"
},
{
"id": "C1",
"query": "Food delivery service",
"category": "services",
"description": "Restaurant meal delivery platform"
},
{
"id": "C2",
"query": "Online education platform",
"category": "services",
"description": "Digital learning service"
},
{
"id": "C4",
"query": "Public transportation",
"category": "services",
"description": "Mass transit system"
},
{
"id": "C9",
"query": "Elderly care service",
"category": "services",
"description": "Senior citizen support service"
}
]
}


@@ -0,0 +1,28 @@
{
"description": "Word pool for C5 random-perspective condition",
"version": "1.0",
"selection_criteria": [
"Concrete and evocative (easy to generate associations)",
"Diverse domains (no overlap with typical expert knowledge)",
"No obvious connection to test queries",
"Equal representation across conceptual categories"
],
"categories": {
"nature": ["ocean", "mountain", "forest", "desert", "cave"],
"optics": ["microscope", "telescope", "kaleidoscope", "prism", "lens"],
"animals": ["butterfly", "elephant", "octopus", "eagle", "ant"],
"weather": ["sunrise", "thunderstorm", "rainbow", "fog", "aurora"],
"art": ["clockwork", "origami", "mosaic", "symphony", "ballet"],
"temporal": ["ancient", "futuristic", "organic", "crystalline", "liquid"],
"sensory": ["whisper", "explosion", "rhythm", "silence", "echo"]
},
"words": [
"ocean", "mountain", "forest", "desert", "cave",
"microscope", "telescope", "kaleidoscope", "prism", "lens",
"butterfly", "elephant", "octopus", "eagle", "ant",
"sunrise", "thunderstorm", "rainbow", "fog", "aurora",
"clockwork", "origami", "mosaic", "symphony", "ballet",
"ancient", "futuristic", "organic", "crystalline", "liquid",
"whisper", "explosion", "rhythm", "silence", "echo"
]
}


@@ -0,0 +1,328 @@
"""
Post-generation deduplication for experiment results.
Applies embedding-based deduplication uniformly to all conditions
to normalize idea counts and measure "dedup survival rate".
Usage:
python -m experiments.deduplication --input results/experiment_xxx.json
"""
import sys
import json
import argparse
import asyncio
import logging
from pathlib import Path
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent / "backend"))
from app.services.embedding_service import embedding_service
from app.models.schemas import ExpertTransformationDescription
from experiments.config import DEDUP_THRESHOLD, RESULTS_DIR
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
@dataclass
class DedupStats:
"""Deduplication statistics for a single condition."""
condition: str
pre_dedup_count: int
post_dedup_count: int
duplicates_removed: int
survival_rate: float
groups: List[Dict[str, Any]]
def ideas_to_descriptions(
ideas: List[str],
ideas_with_source: Optional[List[Dict[str, Any]]] = None
) -> List[ExpertTransformationDescription]:
"""
Convert experiment ideas to ExpertTransformationDescription format
for compatibility with the embedding service.
"""
descriptions = []
if ideas_with_source:
# Use source information if available
for i, item in enumerate(ideas_with_source):
desc = ExpertTransformationDescription(
keyword=item.get("keyword", item.get("attribute", item.get("perspective_word", ""))),
expert_id=f"source-{i}",
expert_name=item.get("expert_name", item.get("perspective_word", "direct")),
description=item.get("idea", "")
)
descriptions.append(desc)
else:
# Simple conversion for ideas without source
for i, idea in enumerate(ideas):
desc = ExpertTransformationDescription(
keyword="",
expert_id=f"idea-{i}",
expert_name="direct",
description=idea
)
descriptions.append(desc)
return descriptions
async def deduplicate_condition(
ideas: List[str],
ideas_with_source: Optional[List[Dict[str, Any]]] = None,
threshold: float = DEDUP_THRESHOLD
) -> Dict[str, Any]:
"""
Apply deduplication to ideas from a single condition.
Returns:
Dict with deduplicated ideas and statistics
"""
if not ideas:
return {
"unique_ideas": [],
"unique_ideas_with_source": [],
"groups": [],
"stats": {
"pre_dedup_count": 0,
"post_dedup_count": 0,
"duplicates_removed": 0,
"survival_rate": 1.0
}
}
# Convert to description format
descriptions = ideas_to_descriptions(ideas, ideas_with_source)
# Run deduplication
result = await embedding_service.deduplicate(
descriptions=descriptions,
threshold=threshold
)
# Extract unique ideas (representatives from each group)
unique_ideas = []
unique_ideas_with_source = []
groups_info = []
for group in result.groups:
rep = group.representative
unique_ideas.append(rep.description)
# Reconstruct source info
source_info = {
"idea": rep.description,
"keyword": rep.keyword,
"expert_name": rep.expert_name
}
unique_ideas_with_source.append(source_info)
# Group info for analysis
group_info = {
"representative": rep.description,
"duplicates": [d.description for d in group.duplicates],
"duplicate_count": len(group.duplicates),
"similarity_scores": group.similarity_scores
}
groups_info.append(group_info)
pre_count = len(ideas)
post_count = len(unique_ideas)
survival_rate = post_count / pre_count if pre_count > 0 else 1.0
return {
"unique_ideas": unique_ideas,
"unique_ideas_with_source": unique_ideas_with_source,
"groups": groups_info,
"stats": {
"pre_dedup_count": pre_count,
"post_dedup_count": post_count,
"duplicates_removed": pre_count - post_count,
"survival_rate": survival_rate
}
}
async def process_experiment_results(
input_file: Path,
output_file: Optional[Path] = None,
threshold: float = DEDUP_THRESHOLD
) -> Dict[str, Any]:
"""
Process an experiment results file and apply deduplication.
Args:
input_file: Path to experiment results JSON
output_file: Path for output (default: input_file with _deduped suffix)
threshold: Similarity threshold for deduplication
Returns:
Processed results with deduplication applied
"""
# Load experiment results
with open(input_file, "r", encoding="utf-8") as f:
experiment = json.load(f)
logger.info(f"Processing experiment: {experiment.get('experiment_id', 'unknown')}")
logger.info(f"Deduplication threshold: {threshold}")
# Process each query's conditions
dedup_summary = {
"threshold": threshold,
"conditions": {}
}
for query_result in experiment["results"]:
query = query_result["query"]
query_id = query_result["query_id"]
logger.info(f"\nProcessing query: {query} ({query_id})")
for condition, cond_result in query_result["conditions"].items():
if not cond_result.get("success", False):
logger.warning(f" Skipping failed condition: {condition}")
continue
logger.info(f" Deduplicating {condition}...")
ideas = cond_result.get("ideas", [])
ideas_with_source = cond_result.get("ideas_with_source", [])
dedup_result = await deduplicate_condition(
ideas=ideas,
ideas_with_source=ideas_with_source,
threshold=threshold
)
# Add dedup results to condition
cond_result["dedup"] = dedup_result
# Update summary stats
if condition not in dedup_summary["conditions"]:
dedup_summary["conditions"][condition] = {
"total_pre_dedup": 0,
"total_post_dedup": 0,
"total_removed": 0,
"query_stats": []
}
stats = dedup_result["stats"]
cond_summary = dedup_summary["conditions"][condition]
cond_summary["total_pre_dedup"] += stats["pre_dedup_count"]
cond_summary["total_post_dedup"] += stats["post_dedup_count"]
cond_summary["total_removed"] += stats["duplicates_removed"]
cond_summary["query_stats"].append({
"query_id": query_id,
"query": query,
**stats
})
logger.info(f" {stats['pre_dedup_count']} -> {stats['post_dedup_count']} "
f"(survival: {stats['survival_rate']:.1%})")
# Calculate overall survival rates
for condition, cond_stats in dedup_summary["conditions"].items():
if cond_stats["total_pre_dedup"] > 0:
cond_stats["overall_survival_rate"] = (
cond_stats["total_post_dedup"] / cond_stats["total_pre_dedup"]
)
else:
cond_stats["overall_survival_rate"] = 1.0
# Add dedup summary to experiment
experiment["dedup_summary"] = dedup_summary
# Save results
if output_file is None:
stem = input_file.stem.replace("_complete", "").replace("_intermediate", "")
output_file = input_file.parent / f"{stem}_deduped.json"
with open(output_file, "w", encoding="utf-8") as f:
json.dump(experiment, f, indent=2, ensure_ascii=False)
logger.info(f"\nResults saved to: {output_file}")
return experiment
def print_dedup_summary(experiment: Dict[str, Any]):
"""Print formatted deduplication summary."""
dedup = experiment.get("dedup_summary", {})
print("\n" + "=" * 70)
print("DEDUPLICATION SUMMARY")
print("=" * 70)
print(f"Threshold: {dedup.get('threshold', 'N/A')}")
print("\nResults by condition:")
print("-" * 70)
print(f"{'Condition':<30} {'Pre-Dedup':<12} {'Post-Dedup':<12} {'Survival':<10}")
print("-" * 70)
for condition, stats in dedup.get("conditions", {}).items():
pre = stats.get("total_pre_dedup", 0)
post = stats.get("total_post_dedup", 0)
survival = stats.get("overall_survival_rate", 1.0)
print(f"{condition:<30} {pre:<12} {post:<12} {survival:<10.1%}")
print("-" * 70)
print("\nInterpretation:")
print("- Higher survival rate = more diverse/unique ideas")
print("- Lower survival rate = more redundant ideas removed")
async def main():
parser = argparse.ArgumentParser(
description="Apply deduplication to experiment results"
)
parser.add_argument(
"--input",
type=str,
required=True,
help="Input experiment results JSON file"
)
parser.add_argument(
"--output",
type=str,
help="Output file path (default: input_deduped.json)"
)
parser.add_argument(
"--threshold",
type=float,
default=DEDUP_THRESHOLD,
help=f"Similarity threshold (default: {DEDUP_THRESHOLD})"
)
args = parser.parse_args()
input_path = Path(args.input)
if not input_path.exists():
# Try relative to results dir
input_path = RESULTS_DIR / args.input
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
sys.exit(1)
output_path = Path(args.output) if args.output else None
experiment = await process_experiment_results(
input_file=input_path,
output_file=output_path,
threshold=args.threshold
)
print_dedup_summary(experiment)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,301 @@
# AUT Flexibility Assessment Methods
## What Is the AUT (Alternative Uses Task)?
The AUT (Alternative Uses Task) is a classic **divergent thinking test** introduced by Guilford in 1967.
**Test format:**
```
Prompt: "List all possible uses for a brick"
Typical answers:
1. Build a house
2. Use as a doorstop
3. Weigh down papers
4. Use as a weapon
5. Prop things up
...
```
---
## The Four Torrance Creativity Dimensions
| Dimension | Definition | Measurement |
|------|------|----------|
| **Fluency** | How many ideas are produced | Count the ideas |
| **Flexibility** | How many distinct categories the ideas span | Count the categories |
| **Originality** | How rare the ideas are | Statistical rarity |
| **Elaboration** | How detailed the ideas are | Assess the detail |
---
## The Three Flexibility Assessment Methods We Implemented
### Method 1: LLM Two-Stage Classification (Hadas & Hershkovitz 2024)
**Principle:** have a large language model identify the semantic categories of the ideas, then count the categories
```
Stage 1: ask the LLM to identify semantic categories across all ideas
Input: 195 creative ideas for "chair"
Output: ["transportation", "art & decoration", "healthcare", "education", "storage", ...]
Stage 2: assign each idea to a category
Idea 1 "solar-powered charging chair" → technology
Idea 2 "chair converted into a stretcher" → healthcare
Idea 3 "chair legs as drumsticks" → art
Flexibility score = number of distinct categories used
```
**Pros:** category names carry semantic meaning; highly interpretable
**Cons:** depends on LLM consistency; parsing errors are possible
---
### Method 2: Embedding-Based Hierarchical Clustering (arXiv:2405.00899)
**Principle:** convert ideas into vectors and cluster them automatically
```
Step 1: convert each idea into an embedding vector
"Solar-powered charging chair" → [0.12, -0.34, 0.56, ...] (1024 dims)
Step 2: hierarchical clustering with Ward linkage
Compute cosine distances between all ideas
Merge the most similar groups bottom-up
Step 3: cut the dendrogram at a similarity threshold of ≥ 0.7
Ensures ideas within a cluster are sufficiently similar
Flexibility score = number of resulting clusters
```
**Pros:** objective, reproducible, independent of LLM judgment
**Cons:** clusters carry no semantic labels; human interpretation is needed
---
### Method 3: Combined Jump Signal Analysis (arXiv:2405.00899)
**Principle:** use a stricter definition of a "true jump" to reduce false positives
```
Combined jump = category jump ∧ semantic jump
Category jump (jumpcat): consecutive ideas fall in different embedding clusters
Semantic jump (jumpSS): semantic similarity of consecutive ideas < 0.7
True jump = both conditions must hold
```
**Why combine the two signals?**
```
Problem: category jumps alone can produce false positives
Example: "ergonomic chair" and "adjustable chair"
- May land in different clusters (category jump = True)
- But are semantically similar (semantic jump = False)
- Should not count as a true "creative jump"
Solution: the combined jump requires both conditions, which is more accurate
```
| Jump ratio | Exploration mode | Meaning |
|----------|----------|------|
| High (>45%) | Flexible | Broad category switching, leaping thought |
| Medium (30-45%) | Mixed | Moderate switching |
| Low (<30%) | Persistent | Deep, focused development within one domain |
**Application:** distinguishing the creative patterns of LLMs vs. humans
---
## Findings
### Finding 1: Novelty and Flexibility Are Independent Dimensions
| Condition | Novelty score | Flexibility (clusters) | Mean similarity | Pattern |
|------|:----------:|:--------------:|:----------:|------|
| C4 Full Pipeline | **0.395** (highest) | 10 | 0.583 | High novelty, moderate flexibility |
| C5 Random-Perspective | 0.365 | **15** (highest) | 0.521 | High novelty, high flexibility |
| C2 Expert-Only | 0.315 | 13 | 0.517 | Moderate novelty, high flexibility |
| C3 Attribute-Only | 0.337 | 12 | - | Moderate novelty, moderate flexibility |
| C1 Direct | 0.273 (lowest) | **1** (lowest) | 0.647 | Low novelty, low flexibility |
**Visual interpretation:**
```
C1 Direct ideas:
┌─────────────────────────────────────┐
│ ○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○ │ ← all ideas sit in one "ordinary region"
│ (similar to each other, all typical) │ (low novelty + low flexibility)
└─────────────────────────────────────┘
C5 Random-Perspective ideas:
┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐
│ ★ │ │ ★ │ │ ★ │ │ ★ │ │ ★ │ ← scattered across several "novel regions"
└───┘ └───┘ └───┘ └───┘ └───┘ (high novelty + high flexibility)
  ↑     ↑     ↑     ↑     ↑
transport health art  educ  tech
C4 Full Pipeline ideas:
      ┌─────────────────┐
   ┌──┤ ★★★★★★★★★★★★ ├──┐ ← concentrated in one "novel region" with several sub-categories
   │  └─────────────────┘  │ (highest novelty + moderate flexibility)
   │          ↓            │
   └── 10 semantic clusters ┘
```
### Finding 2: Combined Jump Signal Results
| Condition | Category jumps | Semantic jumps | **Combined jumps** | Flexibility profile |
|------|:--------:|:--------:|:------------:|:--------:|
| C2 Expert-Only | 54 | 125 | **48** | Persistent |
| C3 Attribute-Only | 34 | 107 | **33** | Persistent |
| C5 Random-Perspective | 22 | 116 | **20** | Persistent |
| C4 Full Pipeline | 13 | 348 | **13** | Persistent |
| C1 Direct | 0 | 104 | **0** | Persistent |
**Combined jump ratios:**
| Condition | Combined jump ratio | Profile | Reading |
|------|:------------:|:--------:|------|
| C3 Attribute-Only | **26.6%** | Persistent | Moderate category switching |
| C2 Expert-Only | **24.4%** | Persistent | Moderate category switching |
| C5 Random-Perspective | 10.1% | Persistent | Lower category switching |
| C4 Full Pipeline | **3.2%** | Persistent | Very focused exploration |
| C1 Direct | 0.0% | Persistent | Single cluster (no jumps) |
**Key insight:** combined jumps ≤ category jumps (as expected). All conditions show the "persistent exploration" pattern.
---
### Finding 3: 🔑 Originality-Flexibility Correlation (Key Finding)
**Findings in the paper (arXiv:2405.00899):**
- **Humans:** originality and flexibility are **uncorrelated** (r ≈ 0)
- **Typical LLMs:** **positively correlated** — more flexible LLMs are more original
**Our results:**
| Metric | Value | Reading |
|------|:----:|------|
| **Pearson r** | **0.071** | Near-zero correlation |
| Pattern | **Human-like** | Breaks the typical LLM pattern |
**Per-condition data:**
| Condition | Novelty score | Flexibility (combined jumps) |
|------|:----------:|:------------------:|
| C4 Full Pipeline | **0.395** (highest) | **13** (lowest) |
| C5 Random-Perspective | 0.365 | 20 |
| C3 Attribute-Only | 0.337 | 33 |
| C2 Expert-Only | 0.315 | 48 (highest) |
| C1 Direct | 0.273 (lowest) | 0 |
**Major finding:** the attribute+expert pipeline (C4) achieves the **highest novelty with the lowest flexibility**,
showing that structured context-free generation produces **focused novelty** rather than scattered exploration.
**What does this mean?**
```
Typical LLM pattern:
High flexibility → high novelty (positive correlation)
The more scattered the ideas, the better the odds of hitting novel concepts
Our pipeline (C4):
Low flexibility + high novelty (breaks the pattern)
Focused exploration of one novel region rather than jumping around
This is a "human-like" creative pattern!
Human experts usually explore one domain deeply rather than ranging widely but shallowly
```
---
## What This Means for Creativity Research
1. **Creativity is multidimensional**
   - Novelty and flexibility are **independent dimensions**
   - High novelty does not imply high flexibility, and vice versa
   - Fluency, flexibility, originality, and elaboration all need to be considered together
2. **Pipeline design trade-offs**
| Strategy | Novelty | Flexibility | Character |
|------|:------:|:----:|------|
| Direct (C1) | Low | Low | Fast but ordinary |
| Expert-Only (C2) | Medium | High | Diverse viewpoints |
| Random-Perspective (C5) | High | **Highest** | Forced jumps |
| Full Pipeline (C4) | **Highest** | Medium | Structured novelty |
3. **Why do expert/random perspectives produce more categories?**
```
C1 Direct:
No external stimulus → the LLM stays in the single "furniture improvement" domain
Mean similarity 0.647 (highest) → ideas closely resemble each other
C2 Expert-Only:
4 experts from different fields → different thinking frames
Mean similarity 0.517 (lower) → ideas more spread out
C5 Random-Perspective:
Random words force jumps → unexpected connections
Mean similarity 0.521 → the most semantic categories (15)
```
4. **Practical recommendations**
   - For **high novelty**: use the Full Pipeline (C4)
   - For **high flexibility/diversity**: use Random-Perspective (C5) or Expert-Only (C2)
   - For **both**: a hybrid strategy is probably needed
---
## Methodology Correction Note
### Problem with the Original Algorithm
The initial clustering algorithm had a logic error:
```
Original logic (wrong):
Goal: find clusters with within-cluster similarity >= 0.7
Problem: when ideas are spread out (low similarity),
no tight cluster can satisfy the threshold
→ the algorithm gives up and returns 1 cluster
Result: the scattered C2/C5 ideas were mislabeled as "1 cluster"
```
### Corrected Algorithm
```
Corrected logic:
Method: hierarchical clustering with average linkage
Threshold: cut the dendrogram at distance 0.5
(i.e., split where similarity < 0.5)
Result: scattered ideas are correctly split into multiple clusters
```
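A minimal Python sketch of the corrected step, assuming `embeddings` is an (n_ideas × 1024) NumPy array from qwen3-embedding:4b; the function name is illustrative, not the project's API:
```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def count_clusters(embeddings: np.ndarray, distance_cutoff: float = 0.5) -> int:
    """Flexibility = number of clusters from average-linkage clustering
    on cosine distances, cutting the dendrogram at `distance_cutoff`."""
    dists = pdist(embeddings, metric="cosine")   # condensed pairwise distance matrix
    tree = linkage(dists, method="average")      # bottom-up hierarchical merging
    labels = fcluster(tree, t=distance_cutoff, criterion="distance")
    return int(labels.max())                     # cluster ids run 1..k
```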
### Before/After Comparison
| Condition | Clusters before fix | Clusters after fix | Mean similarity |
|------|:------------:|:------------:|:----------:|
| C1 Direct | 29 | **1** | 0.647 (high) |
| C2 Expert-Only | 1 | **13** | 0.517 (low) |
| C5 Random-Perspective | 1 | **15** | 0.521 (low) |
**Key insight:** low similarity = high diversity = high flexibility score
---
## References
1. Hadas & Hershkovitz (2024). "Using Large Language Models to Evaluate Alternative Uses Task Flexibility Score." *Thinking Skills and Creativity*, Vol. 52.
2. arXiv:2405.00899 - "Characterising the Creative Process in Humans and Large Language Models" - jump signal methodology
3. Guilford, J.P. (1967). *The Nature of Human Intelligence*. McGraw-Hill.
4. Torrance, E.P. (1974). *Torrance Tests of Creative Thinking*. Scholastic Testing Service.

View File

@@ -0,0 +1,477 @@
# Creative-Process Characterization Metrics Explained
## Methodology Based on arXiv:2405.00899
**Paper:** "Characterising the Creative Process in Humans and Large Language Models"
**Source:** [arXiv:2405.00899](https://arxiv.org/html/2405.00899v2)
This document explains in detail the creative-process metrics we adopted from that paper, and the key findings those metrics revealed in our experiments.
---
## 1. Combined Jump Signal
### 1.1 What Is a "Jump"?
In creative divergent thinking, a "jump" is a **switch of semantic category** between consecutively generated ideas.
```
Example idea sequence:
1. Solar-powered charging chair → technology
2. Smart temperature-controlled seat → technology (no jump)
3. Chair converted into a stretcher → healthcare (jump!)
4. Wheelchair stand-assist function → healthcare (no jump)
5. Chair legs as drumsticks → art (jump!)
```
### 1.2 Why a "Combined" Jump?
**Problem with the raw signal:**
Using category jumps (jumpcat) alone can produce **false positives**:
```
Problem scenario:
Idea A "foldable camping chair" → cluster 1
Idea B "portable picnic chair" → cluster 2
Category jump = True (different clusters)
But the two ideas are semantically very similar!
This should not count as a true "creative jump"
```
**The paper's solution: the combined jump signal**
```
Combined jump = category jump ∧ semantic jump
where:
Category jump (jumpcat): consecutive ideas fall in different embedding clusters
Semantic jump (jumpSS): cosine similarity of consecutive ideas < 0.7
True jump = both conditions must hold
```
### 1.3 Mathematical Definition
For consecutive ideas $i$ and $i-1$:
$$
\text{jump}_i = \text{jump}_{cat,i} \land \text{jump}_{SS,i}
$$
where:
- $\text{jump}_{cat,i} = \mathbb{1}[c_i \neq c_{i-1}]$ (whether the category changed)
- $\text{jump}_{SS,i} = \mathbb{1}[\text{sim}(e_i, e_{i-1}) < 0.7]$ (whether similarity falls below the threshold)
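A minimal sketch of this definition, assuming `labels` holds each idea's cluster id in generation order and `embeddings` the matching unit-normalized vectors; the names are illustrative, not the project's API:
```python
import numpy as np

def combined_jumps(labels: list[int], embeddings: np.ndarray,
                   sim_threshold: float = 0.7) -> list[bool]:
    """Boolean jump signal: True where jump_cat AND jump_ss both hold."""
    jumps = []
    for i in range(1, len(labels)):
        jump_cat = labels[i] != labels[i - 1]           # cluster changed
        sim = float(embeddings[i] @ embeddings[i - 1])  # cosine sim (unit vectors)
        jump_ss = sim < sim_threshold                   # semantically distant
        jumps.append(jump_cat and jump_ss)              # true jump = both
    return jumps
```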
### 1.4 Our Experimental Results
| Condition | Category jumps | Semantic jumps | **Combined jumps** | Combined ratio |
|------|:--------:|:--------:|:------------:|:--------:|
| C2 Expert-Only | 54 | 125 | **48** | 24.4% |
| C3 Attribute-Only | 34 | 107 | **33** | 26.6% |
| C5 Random-Perspective | 22 | 116 | **20** | 10.1% |
| C4 Full Pipeline | 13 | 348 | **13** | 3.2% |
| C1 Direct | 0 | 104 | **0** | 0.0% |
**Key observations:**
- Combined jumps ≤ category jumps (validates the method)
- C4 has many semantic jumps (348) but few category jumps (13) → its ideas are semantically spread out yet stay within similar categories
- C1 has no category jumps → all of its ideas sit inside a single semantic cluster
---
## 2. Flexibility Profile Classification
### 2.1 Three Modes of Creative Exploration
The paper distinguishes three modes of creative exploration:
| Profile | Jump ratio | Characteristics |
|------|:--------:|------|
| **Persistent** | < 30% | Deep exploration of a single domain, focused idea development |
| **Mixed** | 30-45% | Moderate switching, balancing depth and breadth |
| **Flexible** | > 45% | Frequent jumps, ranging broadly across domains |
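The classification itself is a simple threshold rule over the combined jump ratio; a sketch using the thresholds from the table above (the function name is illustrative):
```python
def classify_profile(jump_ratio: float) -> str:
    """jump_ratio = combined jumps / (number of ideas - 1)."""
    if jump_ratio < 0.30:
        return "Persistent"
    if jump_ratio <= 0.45:
        return "Mixed"
    return "Flexible"
```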
### 2.2 Visual Intuition
```
Persistent:
┌─────────────────────────────────────┐
│ ●→●→●→●→●→●→●→●→●→●                │ deep exploration of one domain
│ technology                          │ occasional switches (<30%)
│ ↓                                   │
│ ●→●→●→●                            │
│ healthcare                          │
└─────────────────────────────────────┘
Flexible:
┌─────────────────────────────────────┐
│ ●→ ●→ ●→ ●→ ●→ ●→ ●→ ●            │ frequent jumps between domains
│ tech med art edu tech soc env tech  │ short stays in each (>45% jumps)
└─────────────────────────────────────┘
Mixed:
┌─────────────────────────────────────┐
│ ●→●→●→●→ ●→●→●→ ●→●→●→●           │ a moderate balance
│ technology healthcare art           │ (30-45% jumps)
└─────────────────────────────────────┘
```
### 2.3 Our Experimental Results
| Condition | Combined jump ratio | Profile | Reading |
|------|:------------:|:--------:|------|
| C3 Attribute-Only | 26.6% | Persistent | Near the Mixed boundary |
| C2 Expert-Only | 24.4% | Persistent | Moderate category switching |
| C5 Random-Perspective | 10.1% | Persistent | Fewer switches |
| **C4 Full Pipeline** | **3.2%** | **Persistent** | Very focused exploration |
| C1 Direct | 0.0% | Persistent | Single cluster |
**Important finding:** every condition shows the "persistent exploration" pattern, but to very different degrees.
---
## 3. Originality-Flexibility Correlation Analysis
### 3.1 The Paper's Core Finding
arXiv:2405.00899 reports a key difference:
| Subject | Originality-flexibility relation | Reading |
|------|:------------------:|------|
| **Humans** | r ≈ 0 (no correlation) | Originality and flexibility are independent abilities |
| **Typical LLMs** | r > 0 (positive) | More flexible LLMs are more original |
**Why the difference?**
```
Human creative pattern:
- Some people excel at deep exploration (low flexibility, high originality)
- Others excel at broad association (high flexibility, high originality)
- The two abilities are independent dimensions
Typical LLM pattern:
- LLMs produce diversity through "randomness"
- Higher temperature → more jumps → more accidental discoveries
- Flexibility and originality get tied together by randomness
```
### 3.2 Our Experimental Results
**Pearson correlation coefficient: r = 0.071**
| Metric | Value | Reading |
|------|:----:|------|
| **Pearson r** | **0.071** | Near zero |
| Statistical meaning | No significant correlation | The two dimensions are independent |
| **Pattern** | **Human-like** | Breaks the typical LLM pattern |
**Per-condition detail:**
| Condition | Novelty (centroid distance) | Flexibility (combined jumps) | Combination |
|------|:------------------:|:------------------:|------|
| C4 Full Pipeline | **0.395** (highest) | **13** (lowest) | High novelty + low flexibility |
| C5 Random-Perspective | 0.365 | 20 | High novelty + low flexibility |
| C3 Attribute-Only | 0.337 | 33 | Medium novelty + medium flexibility |
| C2 Expert-Only | 0.315 | **48** (highest) | Medium novelty + high flexibility |
| C1 Direct | 0.273 (lowest) | 0 | Low novelty + low flexibility |
### 3.3 Why This Finding Matters
```
┌─────────────────────────────────────────────────────────────┐
│ Originality-flexibility space                               │
│                                                             │
│ High originality │ C4●                                      │
│                  │        C5●                               │
│                  │              C3●                         │
│                  │                    C2●                   │
│                  │                                          │
│ Low originality  │ C1●                                      │
│                  └─────────────────────────────────────     │
│               Low flexibility          High flexibility     │
│                                                             │
│ r = 0.071 → nearly flat against the diagonal → no           │
│ correlation → human-like!                                   │
└─────────────────────────────────────────────────────────────┘
Compare a typical LLM (r > 0.3):
┌─────────────────────────────────────────────────────────────┐
│ High originality │                              ●           │
│                  │                       ●                  │
│                  │                ●                         │
│                  │         ●                                │
│ Low originality  │  ●                                       │
│                  └─────────────────────────────────────     │
│               Low flexibility          High flexibility     │
│                                                             │
│ r > 0.3 → spread along the diagonal → positive              │
│ correlation → typical LLM pattern                           │
└─────────────────────────────────────────────────────────────┘
```
---
## 4. Cumulative Jump Profile
### 4.1 What Is a Cumulative Jump Profile?
It tracks how the number of jumps accumulates over the course of idea generation.
```
Idea position:  1 2 3 4 5 6 7 8 9 10
Jump occurred:  - - ✓ - ✓ - ✓ ✓ - ✓
Cumulative:     0 0 1 1 2 2 3 4 4 5
Profile:
5 │                                    ●
4 │                        ●────●
3 │                  ●────●
2 │            ●────●
1 │      ●────●
0 │●────●
  └────────────────────────────────────────
   1    2    3    4    5    6    7    8    9    10
                 idea position
```
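A sketch of the computation, assuming `jumps` is the boolean combined-jump signal from Section 1:
```python
import numpy as np

profile = np.cumsum(np.asarray(jumps, dtype=int))  # value at i = jumps seen up to idea i
```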
### 4.2 Reading the Profile
| Profile feature | Meaning | Creative pattern |
|----------|------|----------|
| **Steep slope** | Jumps accumulate quickly | Frequent category switching |
| **Flat region** | Jumps pause | Deep exploration of the current category |
| **Staircase** | Sudden bursts of jumps | Moving on after exhausting a category |
| **Nearly horizontal** | Almost no jumps | Staying within a single domain |
### 4.3 Our Experimental Visualization
![Cumulative jump profiles](../results/cumulative_jump_profiles.png)
**Per-condition profile readings:**
| Condition | Profile shape | Creative strategy |
|------|----------|----------|
| C2 Expert-Only | Steady climb | Continuous category switching |
| C3 Attribute-Only | Steady climb | Continuous category switching |
| C5 Random-Perspective | Slow climb | Fewer switches |
| C4 Full Pipeline | Nearly flat | Very focused single-domain exploration |
| C1 Direct | Completely flat | No category switching at all |
---
## 5. What the Findings Mean Together
### 5.1 Core Findings Summary
| Finding | Content | Significance |
|------|------|------|
| **Finding 1** | Originality-flexibility correlation r = 0.071 | The pipeline yields a "human-like" creative pattern |
| **Finding 2** | C4: highest novelty + lowest flexibility | Structured methods produce focused novelty |
| **Finding 3** | All conditions are Persistent | LLMs favor depth over breadth |
| **Finding 4** | Combined jumps < category jumps | Validates the methodology |
### 5.2 Why Can C4 Break the LLM Pattern?
```
The typical LLM problem:
┌─────────────────────────────────────────────────────────────┐
│ Direct generation: "Give me innovative uses for a chair"    │
│                                                             │
│ The LLM relies on temperature for diversity                 │
│ → higher temperature = more randomness                      │
│ → more randomness = more jumps (high flexibility)           │
│ → more jumps = better odds of novel ideas (high originality)│
│                                                             │
│ Result: flexibility and originality are tied together       │
│ (positive correlation)                                      │
└─────────────────────────────────────────────────────────────┘
The C4 pipeline's breakthrough:
┌─────────────────────────────────────────────────────────────┐
│ Structured generation:                                      │
│                                                             │
│ Step 1: attribute decomposition                             │
│   "chair" → [portable, stackable, ergonomic, ...]           │
│                                                             │
│ Step 2: context-free expert keywords                        │
│   Accountant + "portable" → "mobile assets"                 │
│   (never sees the chair!)                                   │
│                                                             │
│ Step 3: recombination                                       │
│   "chair" + "mobile assets" + the accountant's perspective  │
│   → "corporate chairs with RFID asset tracking"             │
│                                                             │
│ Key mechanisms:                                             │
│ - Structure forces a leap out of the typical semantic       │
│   space (high novelty)                                      │
│ - But every idea stays anchored to the same attribute set   │
│   (low flexibility)                                         │
│ - Novelty comes from "forced bisociation", not "random      │
│   exploration"                                              │
│                                                             │
│ Result: high novelty + low flexibility → breaks the         │
│ correlation → human-like                                    │
└─────────────────────────────────────────────────────────────┘
```
### 5.3 Implications for Creative AI Research
**Theoretical contributions:**
1. **LLMs can produce "human-like" creative patterns**
   - Not by imitating human data
   - But through structured creative pipeline design
2. **Originality and flexibility can be controlled independently**
   - The usual assumption is that high originality requires high randomness
   - We show that structured constraints can also reach high originality
3. **"Focused novelty" vs. "scattered exploration"**
   - C4: dig deep into one novel domain (specialist strategy)
   - C5: touch many domains broadly (generalist strategy)
   - Both are valuable, but the mechanisms differ
**Practical applications:**
| Goal | Recommended strategy | Reason |
|------|----------|------|
| Maximize novelty | C4 Full Pipeline | Highest centroid-distance score |
| Maximize category diversity | C2 Expert-Only | Most combined jumps |
| Balance novelty and diversity | C3 Attribute-Only | Middle ground on both |
| Fast generation | C1 Direct | Fewest API calls |
---
## 6. Methodological Validation
### 6.1 Combined Jumps ≤ Category Jumps
A necessary-condition check on the method:
```
Derivation:
Combined jump = category jump ∧ semantic jump
When category jump = False:
combined jump = False ∧ ? = False
When category jump = True:
combined jump = True ∧ semantic jump = semantic jump (True or False)
Therefore: combined jumps ≤ category jumps (always holds)
```
**Empirical check:**
| Condition | Category jumps | Combined jumps | Check |
|------|:--------:|:--------:|:----:|
| C2 | 54 | 48 | ✓ |
| C3 | 34 | 33 | ✓ |
| C5 | 22 | 20 | ✓ |
| C4 | 13 | 13 | ✓ |
| C1 | 0 | 0 | ✓ |
### 6.2 Choice of Flexibility Profile Thresholds
The paper's thresholds (30%, 45%) come from the distribution of its human data. In our LLM experiments every condition falls in the Persistent band, which is itself a finding:
```
Human distribution (paper data):
Persistent: ~33%
Mixed: ~34%
Flexible: ~33%
Our LLM distribution:
Persistent: 100% (all conditions)
Mixed: 0%
Flexible: 0%
Reading:
LLMs (even with expert/attribute guidance) still lean toward persistent exploration
This may be an inherent property of the LLM architecture
```
---
## 7. Integration with Other Metrics
### 7.1 The Full Metric System
| Dimension | Metric | Source | C4 result |
|------|------|------|:-------:|
| **Fluency** | Idea count | Torrance | 402 (most) |
| **Flexibility** | Combined jumps | arXiv:2405.00899 | 13 (lowest) |
| **Originality** | Centroid distance | This study | 0.395 (highest) |
| **Elaboration** | Mean word count | Torrance | 26.2 |
### 7.2 C4's Unique Position
```
Position in creativity space:
        High originality
   C4 ●│
       │        C5●
       │              C3●
       │                    C2●
  C1 ● │
       └──────────────────── High flexibility
        Low originality
C4 occupies the unusual "high originality + low flexibility" corner
Common among human creators (the specialist type), rare in LLMs
```
---
## 8. Future Research Directions
Suggested follow-ups based on these findings:
1. **Cross-model validation**
   - Repeat the experiments on GPT-4, Claude, and Llama-3
   - Confirm whether the findings are a general phenomenon
2. **Temperature sensitivity tests**
   - The paper found LLMs insensitive to temperature
   - Test whether our pipeline shares this property
3. **Human baseline comparison**
   - Collect human data on the same tasks
   - Directly compare flexibility profile distributions
4. **Pipeline variants**
   - Vary the number of attributes and experts
   - Find the best balance point
---
## References
1. **arXiv:2405.00899** - "Characterising the Creative Process in Humans and Large Language Models"
   - Source of the combined jump signal and the flexibility profile classification
2. **Hadas & Hershkovitz (2024)** - "Using LLMs to Evaluate AUT Flexibility Score"
   - Source of the LLM two-stage classification method
3. **Torrance (1974)** - *Torrance Tests of Creative Thinking*
   - The four-dimension creativity framework
4. **Koestler (1964)** - *The Act of Creation*
   - Theoretical basis of bisociation
---
## Appendix: Code Reference
The analysis code lives in:
- `experiments/aut_flexibility_analysis.py`
  - `compute_jump_signal()` - combined jump computation
  - `classify_flexibility_profile()` - flexibility profile classification
  - `analyze_originality_flexibility_correlation()` - correlation analysis
  - `compute_cumulative_jump_profile()` - cumulative jump profile
  - `plot_cumulative_jump_profiles()` - visualization
Run the analysis:
```bash
cd experiments
source ../backend/venv/bin/activate
python aut_flexibility_analysis.py experiment_20260119_165650_deduped.json
```

View File

@@ -0,0 +1,259 @@
# Experiment Design: 5-Condition Idea Generation Study
**Date:** January 19, 2026
**Version:** 1.0
**Status:** Pilot Implementation
## Overview
This experiment tests whether the novelty-seeking system's two key mechanisms—**attribute decomposition** and **expert transformation**—independently and jointly improve creative ideation quality compared to direct LLM generation.
## Research Questions
1. Does decomposing a query into structured attributes improve idea diversity?
2. Do expert perspectives improve idea novelty?
3. Do these mechanisms have synergistic effects when combined?
4. Is the benefit from experts due to domain knowledge, or simply perspective-shifting?
## Experimental Design
### 2×2 Factorial Design + Control
| | No Attributes | With Attributes |
|--------------------|---------------|-----------------|
| **No Experts** | C1: Direct | C3: Attr-Only |
| **With Experts** | C2: Expert-Only | C4: Full Pipeline |
**Plus:** C5: Random-Perspective (tests perspective-shifting without domain knowledge)
### Condition Descriptions
#### C1: Direct Generation (Baseline)
- Single LLM call: "Generate 20 creative ideas for [query]"
- No attribute decomposition
- No expert perspectives
- Purpose: Baseline for standard LLM ideation
#### C2: Expert-Only
- 4 experts from curated occupations
- Each expert generates 5 ideas directly for the query
- No attribute decomposition
- Purpose: Isolate expert contribution
#### C3: Attribute-Only
- Decompose query into 4 fixed categories
- Generate attributes per category
- Direct idea generation per attribute (no expert framing)
- Purpose: Isolate attribute decomposition contribution
#### C4: Full Pipeline
- Full attribute decomposition (4 categories)
- Expert transformation (4 experts × 1 keyword per attribute)
- Purpose: Test combined mechanism (main system)
#### C5: Random-Perspective
- 4 random words per query (from curated pool)
- Each word used as a "perspective" to generate 5 ideas
- Purpose: Control for perspective-shifting vs. expert knowledge
---
## Key Design Decisions & Rationale
### 1. Why 5 Conditions?
C1-C4 form a 2×2 factorial design that isolates the independent contributions of:
- **Attribute decomposition** (C1 vs C3, C2 vs C4)
- **Expert perspectives** (C1 vs C2, C3 vs C4)
C5 addresses a critical confound: if experts improve ideation, is it because of their **domain knowledge** or simply because any **perspective shift** helps? By using random words instead of domain experts, C5 tests whether the perspective-taking mechanism alone provides benefits.
### 2. Why Random Words in C5 (Not Fixed)?
**Decision:** Use randomly sampled words (with seed) rather than a fixed set.
**Rationale:**
- Stronger generalization: results hold across many word combinations
- Avoids cherry-picking accusation ("you just picked easy words")
- Reproducible via random seed (seed=42)
- Each query gets different random words, increasing robustness (see the sketch below)
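A minimal sketch of this seeded sampling, assuming the pool lives in `experiments/data/random_words.json` with a top-level `words` list (as in this repository); the query list and variable names are illustrative:
```python
import json
import random

rng = random.Random(42)  # seed from the experiment config

with open("experiments/data/random_words.json", encoding="utf-8") as f:
    pool = json.load(f)["words"]

# The generator state advances between queries, so each query draws a
# different (but reproducible) set of 4 perspective words.
words_by_query = {q: rng.sample(pool, 4) for q in ["chair", "bicycle", "smartphone"]}
```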
### 3. Why Apply Deduplication Uniformly?
**Decision:** Apply embedding-based deduplication (threshold=0.85) to ALL conditions after generation.
**Rationale:**
- Fair comparison: all conditions normalized to unique ideas
- Creates "dedup survival rate" as an additional metric
- Hypothesis: Full Pipeline ideas are diverse (low redundancy), not just numerous
- Direct generation may produce many similar ideas that collapse after dedup
### 4. Why FIXED_ONLY Categories?
**Decision:** Use 4 fixed categories: Functions, Usages, User Groups, Characteristics
**Rationale:**
- Best for proof power: isolates "attribute decomposition" effect
- No confound from dynamic category selection variability
- Universal applicability: these 4 categories apply to objects, technology, and services
- Dropped "Materials" category as it doesn't apply well to services
### 5. Why Curated Expert Source?
**Decision:** Use curated occupations (210 professions) rather than LLM-generated experts.
**Rationale:**
- Reproducibility: same occupation pool across runs
- Consistency: no variance from LLM expert generation
- Control: we know exactly which experts are available
- Validation: occupations were manually curated for diversity
### 6. Why Temperature 0.9?
**Decision:** Use temperature=0.9 for all conditions.
**Rationale:**
- Higher temperature encourages more diverse/creative outputs
- Matches typical creative task settings
- Consistent across conditions for fair comparison
- Lower temperatures (0.7) showed more repetitive outputs in testing
### 7. Why 10 Pilot Queries?
**Decision:** Start with 10 queries before scaling to full 30.
**Rationale:**
- Validate pipeline works before full investment
- Catch implementation bugs early
- Balanced across categories (3 everyday, 3 technology, 4 services)
- Sufficient for initial pattern detection
---
## Configuration Summary
| Setting | Value | Rationale |
|---------|-------|-----------|
| **LLM Model** | qwen3:8b | Local, fast, consistent |
| **Temperature** | 0.9 | Encourages creativity |
| **Expert Count** | 4 | Balance diversity vs. cost |
| **Expert Source** | Curated | Reproducibility |
| **Keywords/Expert** | 1 | Simplifies analysis |
| **Language** | English | Consistency |
| **Categories** | Functions, Usages, User Groups, Characteristics | Universal applicability |
| **Dedup Threshold** | 0.85 | Standard similarity cutoff |
| **Random Seed** | 42 | Reproducibility |
| **Pilot Queries** | 10 | Validation before scaling |
---
## Query Selection
### Pilot Queries (10)
| ID | Query | Category |
|----|-------|----------|
| A1 | Chair | Everyday |
| A5 | Bicycle | Everyday |
| A7 | Smartphone | Everyday |
| B1 | Solar panel | Technology |
| B3 | 3D printer | Technology |
| B4 | Drone | Technology |
| C1 | Food delivery service | Services |
| C2 | Online education platform | Services |
| C4 | Public transportation | Services |
| C9 | Elderly care service | Services |
### Selection Criteria
- Balanced across 3 domains (everyday objects, technology, services)
- Varying complexity levels
- Different user familiarity levels
- Subset from full 30-query experimental protocol
---
## Random Word Pool (C5)
35 words selected across 7 conceptual categories:
| Category | Words |
|----------|-------|
| Nature | ocean, mountain, forest, desert, cave |
| Optics | microscope, telescope, kaleidoscope, prism, lens |
| Animals | butterfly, elephant, octopus, eagle, ant |
| Weather | sunrise, thunderstorm, rainbow, fog, aurora |
| Art | clockwork, origami, mosaic, symphony, ballet |
| Temporal | ancient, futuristic, organic, crystalline, liquid |
| Sensory | whisper, explosion, rhythm, silence, echo |
**Selection Criteria:**
- Concrete and evocative (easy to generate associations)
- Diverse domains (no overlap with typical expert knowledge)
- No obvious connection to test queries
- Equal representation across categories
---
## Expected Outputs
### Per Condition Per Query
| Condition | Expected Ideas (pre-dedup) | Mechanism |
|-----------|---------------------------|-----------|
| C1 | 20 | Direct request |
| C2 | 20 | 4 experts × 5 ideas |
| C3 | ~20 | Varies by attribute count |
| C4 | ~20 | 4 experts × ~5 keywords × 1 description |
| C5 | 20 | 4 words × 5 ideas |
### Metrics to Collect
1. **Pre-deduplication count**: Raw ideas generated
2. **Post-deduplication count**: Unique ideas after similarity filtering
3. **Dedup survival rate**: post/pre ratio
4. **Generation metadata**: Experts/words used, attributes generated
---
## File Structure
```
experiments/
├── __init__.py
├── config.py # Experiment configuration
├── docs/
│ └── experiment_design_2026-01-19.md # This file
├── conditions/
│ ├── __init__.py
│ ├── c1_direct.py
│ ├── c2_expert_only.py
│ ├── c3_attribute_only.py
│ ├── c4_full_pipeline.py
│ └── c5_random_perspective.py
├── data/
│ ├── queries.json # 10 pilot queries
│ └── random_words.json # Word pool for C5
├── generate_ideas.py # Main runner
├── deduplication.py # Post-processing
└── results/ # Output (gitignored)
```
---
## Verification Checklist
- [ ] Each condition produces expected number of ideas
- [ ] Deduplication reduces count meaningfully
- [ ] Results JSON contains all required metadata
- [ ] Random seed produces reproducible C5 results
- [ ] No runtime errors on all 10 pilot queries
---
## Next Steps After Pilot
1. Analyze pilot results for obvious issues
2. Adjust parameters if needed (idea count normalization, etc.)
3. Scale to full 30 queries
4. Human evaluation of idea quality (novelty, usefulness, feasibility)
5. Statistical analysis of condition differences

View File

@@ -0,0 +1,813 @@
---
marp: true
theme: default
paginate: true
backgroundColor: #fff
style: |
section {
font-size: 24px;
}
h1 {
color: #2c3e50;
}
h2 {
color: #34495e;
}
table {
font-size: 18px;
}
.columns {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 1rem;
}
---
# Breaking Semantic Gravity in LLM-Based Creative Ideation
## A Pilot Study on Attribute Decomposition and Expert Perspectives
**Date:** January 19, 2026
**Model:** Qwen3:8b (Temperature: 0.9)
**Queries:** 10 pilot queries
---
# Research Problem
## The "Semantic Gravity" Challenge
LLMs tend to generate ideas clustered around **high-probability training distributions**
```
Query: "Chair"
Typical LLM output:
- Ergonomic office chair
- Comfortable reading chair
- Foldable portable chair
← All within "furniture comfort" semantic cluster
```
**Goal:** Break this gravitational pull toward obvious solutions
---
# Theoretical Framework
## Bisociation Theory (Koestler, 1964)
Creative thinking occurs when two unrelated "matrices of thought" collide
**Our Approach:**
1. **Attribute Decomposition** → Break object into structural components
2. **Expert Perspectives** → Introduce distant domain knowledge
3. **Context-Free Keywords** → Force unexpected conceptual leaps
---
# Experimental Design
## 2×2 Factorial + Control
| Condition | Attributes | Experts | Description |
|-----------|:----------:|:-------:|-------------|
| **C1** Direct | - | - | Baseline: Direct LLM generation |
| **C2** Expert-Only | - | ✓ | Expert perspectives without structure |
| **C3** Attribute-Only | ✓ | - | Structure without expert knowledge |
| **C4** Full Pipeline | ✓ | ✓ | Combined approach |
| **C5** Random-Perspective | - | Random | Control: Random words as "experts" |
---
# Research Questions
1. **RQ1:** Does attribute decomposition increase idea diversity?
2. **RQ2:** Do expert perspectives increase idea diversity?
3. **RQ3:** Is there a synergistic (super-additive) interaction effect?
4. **RQ4:** Do domain-relevant experts outperform random perspectives?
---
# Pipeline Architecture
## C4: Full Pipeline Process
```
Query: "Chair"
Step 1: Attribute Decomposition
→ "portable", "stackable", "ergonomic", ...
Step 2: Context-Free Keyword Generation (Expert sees ONLY attribute)
→ Accountant + "portable" → "mobile assets"
→ Architect + "portable" → "modular units"
Step 3: Idea Synthesis (Reunite with query)
→ "Chair" + "mobile assets" + Accountant perspective
→ "Asset-tracking chairs for corporate inventory management"
```
---
# Key Design Decision
## Context-Free Keyword Generation
The expert **never sees the original query** when generating keywords
```python
# Step 2: Expert sees only attribute
prompt = f"As a {expert}, what keyword comes to mind for '{attribute}'?"
# Input: "portable" (NOT "portable chair")
# Step 3: Reunite with query
prompt = f"Apply '{keyword}' to '{query}' from {expert}'s perspective"
# Input: "mobile assets" + "Chair" + "Accountant"
```
**Purpose:** Force bisociation by preventing obvious associations
---
# Pilot Study Parameters
## Model & Generation Settings
| Parameter | Value |
|-----------|-------|
| LLM Model | Qwen3:8b (Ollama) |
| Temperature | 0.9 |
| Ollama Endpoint | localhost:11435 |
| Language | English |
| Random Seed | 42 |
---
# Pilot Study Parameters (cont.)
## Pipeline Configuration
| Parameter | Value |
|-----------|-------|
| Queries | 10 (Chair, Bicycle, Smartphone, Solar panel, 3D printer, Drone, Food delivery, Online education, Public transport, Elderly care) |
| Attribute Categories | 4 (Functions, Usages, User Groups, Characteristics) |
| Attributes per Category | 5 |
| Expert Source | Curated (210 occupations) |
| Experts per Query | 4 |
| Keywords per Expert | 1 |
---
# Pilot Study Parameters (cont.)
## Output & Evaluation
| Parameter | Value |
|-----------|-------|
| Total Ideas Generated | 1,119 (after deduplication) |
| Ideas by Condition | C1: 195, C2: 198, C3: 125, C4: 402, C5: 199 |
| Deduplication Threshold | 0.90 (cosine similarity) |
| Embedding Model | qwen3-embedding:4b (1024D) |
---
# Background: Embedding Models Evolution
## From Static to Contextual Representations
| Generation | Model | Characteristics | Limitation |
|------------|-------|-----------------|------------|
| **1st Gen** | Word2Vec, GloVe | Static vectors, one vector per word | "bank" = same vector (river vs finance) |
| **2nd Gen** | BERT, Sentence-BERT | Contextual, transformer-based | Limited context window, older training |
| **3rd Gen** | Qwen3-embedding | LLM-based, instruction-tuned | Requires more compute |
---
# Background: Transformer vs LLM-based Embedding
## Architecture Differences
| Aspect | Transformer (BERT) | LLM-based (Qwen3) |
|--------|-------------------|-------------------|
| **Architecture** | Encoder-only | Decoder-only (GPT-style) |
| **Training objective** | MLM (masked language modeling) | Next-token prediction |
| **Training data** | ~16GB (Wikipedia + Books) | Several TB (web, code, books) |
| **Parameters** | 110M - 340M | 4B+ |
| **Context** | 512 tokens | 8K - 128K tokens |
---
# Background
## Key Comparison
```
1. Trained on more knowledge
BERT: only knows pre-2019 knowledge
Qwen3: knows modern concepts like "drone delivery", "AI-powered", "IoT"
2. Broader semantic understanding
BERT: "chair for elderly" ≈ "elderly chair" (bag-of-words similarity)
Qwen3: understands the difference between "mobility assistance" and "comfort seating"
3. Instruction tuning
Traditional models: cannot adapt to task intent
Qwen3: can follow "find the semantic differences between creative ideas"
```
---
# Background: Why Qwen3-Embedding?
## Comparison with Traditional Methods
```
Traditional Sentence-BERT (all-MiniLM-L6-v2):
- 384-dim vectors
- Trained on pre-2021 data
- Good on short sentences, limited long-text understanding
- Encoder-only, MLM training
Qwen3-Embedding (qwen3-embedding:4b):
- 1024-dim vectors (richer semantic representation)
- Built on the Qwen3 LLM (2024+ training data)
- Long-context support (8K tokens)
- Instruction-tuned → adapts to the task intent
- Inherits part of the LLM's capabilities
```
**Why we chose it:** creative ideas tend to be long and semantically complex, and need stronger contextual understanding
---
# Background: How Embedding Works
## Semantic Similarity via Vector Space
```
Step 1: Convert text into a vector
"Solar-powered charging chair" → [0.12, -0.34, 0.56, ..., 0.78] (1024D)
Step 2: Compute cosine similarity
similarity = cos(θ) = (A · B) / (|A| × |B|)
Step 3: Interpret the similarity
1.0 = identical
0.9 = very similar (likely a duplicate idea)
0.5 = moderately related
0.0 = unrelated
```
**Uses:** deduplication (similarity > 0.9), flexibility analysis (clustering), novelty (centroid distance)
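A sketch of step 2, assuming `a` and `b` are 1024-dim NumPy vectors:
```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```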
---
# Results: Semantic Diversity
## Mean Pairwise Distance (Higher = More Diverse)
> **Method:** We convert each idea into a vector embedding (qwen3-embedding:4b), then calculate the average cosine distance between all pairs of ideas within each condition. Higher values indicate ideas are more spread out in semantic space.
| Condition | Mean | SD | vs C1 (Cohen's d) |
|-----------|:----:|:--:|:-----------------:|
| C1 Direct | 0.294 | 0.039 | - |
| C2 Expert-Only | 0.400 | 0.028 | **3.15*** |
| C3 Attribute-Only | 0.377 | 0.036 | **2.20*** |
| C4 Full Pipeline | 0.395 | 0.019 | **3.21*** |
| C5 Random | 0.405 | 0.062 | **2.72*** |
*p < 0.001, Large effect sizes (d > 0.8)
> **Cohen's d:** Measures effect size (how big the difference is). d > 0.8 = large effect, d > 0.5 = medium, d > 0.2 = small.
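A sketch of the metric, assuming `emb` is the (n × 1024) embedding matrix for one condition:
```python
from scipy.spatial.distance import pdist

diversity = pdist(emb, metric="cosine").mean()  # mean pairwise cosine distance
```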
---
# Results: ANOVA Summary
## Normalized Diversity Metric
> **Method:** Two-way ANOVA tests whether Attributes and Experts each have independent effects on diversity, and whether combining them produces extra benefit (interaction). F-statistic measures variance between groups vs within groups.
| Effect | F | p | Significant |
|--------|:-:|:-:|:-----------:|
| **Attributes (RQ1)** | 5.31 | 0.027 | Yes |
| **Experts (RQ2)** | 26.07 | <0.001 | Yes |
| **Interaction (RQ3)** | - | - | Sub-additive |
**Key Finding:** Both factors work, but combination is **not synergistic**
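A sketch of the test, assuming a per-query DataFrame `df` with columns `diversity`, `attributes`, `experts`; the factor coding is illustrative, not the project's exact analysis script:
```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

model = smf.ols("diversity ~ C(attributes) * C(experts)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects + interaction term
```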
---
# Results: Expert vs Random (RQ4)
## C2 (Expert-Only) vs C5 (Random-Perspective)
| Metric | C2 Expert | C5 Random | p-value | Effect |
|--------|:---------:|:---------:|:-------:|:------:|
| Diversity | 0.399 | 0.414 | 0.463 | n.s. |
| Query Distance | 0.448 | 0.437 | 0.654 | n.s. |
**Finding:** Random words perform as well as domain experts
Implication: The value may be in **perspective shift itself**, not expert knowledge
---
# Results: Efficiency Analysis
## Diversity per Idea Generated
| Condition | Mean Ideas | Diversity | Efficiency |
|-----------|:----------:|:---------:|:----------:|
| C1 Direct | 20.0 | 0.293 | 1.46 |
| C2 Expert-Only | 20.0 | 0.399 | **1.99** |
| C3 Attribute-Only | 12.8 | 0.376 | **3.01** |
| C4 Full Pipeline | 51.9 | 0.393 | 0.78 |
| C5 Random | 20.0 | 0.405 | 2.02 |
**C4 produces 2.6× more ideas but achieves same diversity**
---
# Visualization: Diversity by Condition
![height:450px](../results/figures/20260119_165650_diversity_boxplot.png)
---
# Visualization: Query Distance
![height:450px](../results/figures/20260119_165650_query_distance_boxplot.png)
---
# Advanced Analysis: Lexical Diversity
## Type-Token Ratio & Vocabulary Richness
> **Method:** Type-Token Ratio (TTR) = unique words ÷ total words. High TTR means more varied vocabulary; low TTR means more word repetition. Vocabulary size counts total unique words across all ideas in a condition.
| Condition | TTR | Vocabulary | Avg Words/Idea |
|-----------|:---:|:----------:|:--------------:|
| C1 Direct | **0.382** | 853 | 11.5 |
| C2 Expert-Only | 0.330 | 1,358 | 20.8 |
| C3 Attribute-Only | 0.330 | 1,098 | 26.6 |
| C4 Full Pipeline | 0.189 | **1,992** | 26.2 |
| C5 Random | 0.320 | 1,331 | 20.9 |
**Finding:** C4 has largest vocabulary (1,992) but lowest TTR (0.189)
→ More words but more repetition across ideas
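A sketch of both counts, assuming `ideas` is the list of idea strings for one condition:
```python
tokens = " ".join(ideas).lower().split()
ttr = len(set(tokens)) / len(tokens)   # Type-Token Ratio
vocabulary = len(set(tokens))          # vocabulary size
```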
---
# Advanced Analysis: Concept Extraction
## Top Keywords by Condition
> **Method:** Extract meaningful keywords from idea texts using NLP (removing stopwords, lemmatization). Top keywords show most frequent concepts; unique keywords count distinct terms. Domain coverage checks if ideas span different knowledge areas.
| Condition | Top Keywords | Unique Keywords |
|-----------|--------------|:---------------:|
| C1 Direct | solar, powered, smart, delivery, drone | 805 |
| C2 Expert | real, create, design, time, develop | 1,306 |
| C3 Attribute | real, time, create, develop, powered | 1,046 |
| C4 Pipeline | time, real, data, ensuring, enhancing | **1,937** |
| C5 Random | like, solar, inspired, energy, uses | 1,286 |
**Finding:** C5 Random shows "inspired" → suggests analogical thinking
All conditions cover 6 domain categories
---
# Advanced Analysis: Novelty Scores
## Distance from Global Centroid (Higher = More Novel)
> **Method:** Compute the centroid (average vector) of ALL ideas across all conditions. Then measure each idea's distance from this "typical idea" center. Ideas far from the centroid are semantically unusual compared to the overall pool.
| Condition | Mean | Std | Interpretation |
|-----------|:----:|:---:|----------------|
| C1 Direct | 0.273 | 0.037 | Closest to "typical" ideas |
| C2 Expert-Only | 0.315 | 0.062 | Moderate novelty |
| C3 Attribute-Only | 0.337 | 0.066 | Moderate novelty |
| C5 Random | 0.365 | 0.069 | High novelty |
| **C4 Full Pipeline** | **0.395** | 0.083 | **Highest novelty** |
**Finding:** C4 produces ideas furthest from the "average" idea space
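A sketch of the score, assuming `emb` pools the (n × 1024) embeddings of all conditions:
```python
import numpy as np
from scipy.spatial.distance import cosine

centroid = emb.mean(axis=0)                             # the "typical idea" vector
novelty = np.array([cosine(e, centroid) for e in emb])  # higher = farther from typical
```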
---
# Advanced Analysis: Cross-Condition Cohesion
## % Nearest Neighbors from Same Condition
> **Method:** For each idea, find its K nearest neighbors in embedding space. Cohesion = percentage of neighbors from the same condition. High cohesion means ideas from that condition cluster together; low cohesion means they're scattered among other conditions.
| Condition | Cohesion | Interpretation |
|-----------|:--------:|----------------|
| **C4 Full Pipeline** | **88.6%** | Highly distinct idea cluster |
| C2 Expert-Only | 72.7% | Moderate clustering |
| C5 Random | 71.4% | Moderate clustering |
| C1 Direct | 70.8% | Moderate clustering |
| C3 Attribute-Only | 51.2% | Ideas scattered, overlap with others |
**Finding:** C4 ideas form a distinct cluster in semantic space
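A sketch of the measure, assuming `emb` is the pooled (n × 1024) embedding matrix and `cond` a length-n NumPy array of condition labels; K is illustrative:
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

K = 10
nn = NearestNeighbors(n_neighbors=K + 1, metric="cosine").fit(emb)
_, idx = nn.kneighbors(emb)               # idx[:, 0] is each point itself
same = cond[idx[:, 1:]] == cond[:, None]  # neighbors sharing the condition
cohesion = same.mean(axis=1)              # per-idea cohesion; average per condition
```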
---
# Advanced Analysis: AUT Flexibility
## Semantic Category Diversity (Hadas & Hershkovitz 2024)
> **Method:** Uses the Alternative Uses Task (AUT) flexibility framework. Embedding-based: Hierarchical clustering with average linkage, cut at distance threshold 0.5. Higher cluster count = more semantic categories covered = higher flexibility.
| Condition | Embedding Clusters | Mean Pairwise Similarity |
|-----------|:------------------:|:------------------------:|
| **C5 Random** | **15** | 0.521 (most diverse) |
| **C2 Expert-Only** | **13** | 0.517 |
| C3 Attribute-Only | 12 | - |
| C4 Full Pipeline | 10 | 0.583 |
| C1 Direct | **1** | 0.647 (most similar) |
**Finding:** Expert perspectives (C2, C5) produce more diverse categories than direct generation (C1)
---
# Advanced Analysis: Combined Jump Signal
## Enhanced Method from arXiv:2405.00899
> **Method:** Combined jump signal uses logical AND of two conditions:
> - **jumpcat:** Category changes between consecutive ideas (from embedding clustering)
> - **jumpSS:** Semantic similarity < 0.7 (ideas are semantically dissimilar)
>
> **True jump = jumpcat ∧ jumpSS** — reduces false positives where similar ideas happen to be in different clusters.
| Condition | Cat-Only | Sem-Only | **Combined** | Profile |
|-----------|:--------:|:--------:|:------------:|---------|
| C2 Expert-Only | 54 | 125 | **48** | Persistent |
| C3 Attribute-Only | 34 | 107 | **33** | Persistent |
| C5 Random | 22 | 116 | **20** | Persistent |
| C4 Full Pipeline | 13 | 348 | **13** | Persistent |
| C1 Direct | 0 | 104 | **0** | Persistent |
**Finding:** Combined jumps ≤ category jumps (as expected). All conditions show "Persistent" exploration pattern.
---
# Advanced Analysis: Flexibility Profiles
## Classification Based on Combined Jump Ratio
> **Method:** Classify creativity style based on normalized jump ratio (jumps / transitions):
> - **Persistent:** ratio < 0.30 (deep exploration within categories)
> - **Flexible:** ratio > 0.45 (broad exploration across categories)
> - **Mixed:** 0.30 ≤ ratio ≤ 0.45
| Condition | Combined Jump Ratio | Profile | Interpretation |
|-----------|:-------------------:|:-------:|----------------|
| C3 Attribute-Only | **26.6%** | Persistent | Moderate category switching |
| C2 Expert-Only | **24.4%** | Persistent | Moderate category switching |
| C5 Random | 10.1% | Persistent | Low category switching |
| **C4 Full Pipeline** | **3.2%** | Persistent | Very deep within-category exploration |
| C1 Direct | 0.0% | Persistent | Single semantic cluster |
**Key Insight:** C4's low jump ratio indicates focused, persistent exploration within novel semantic territory
---
# Key Finding: Originality-Flexibility Correlation
## Does Our Pipeline Break the Typical LLM Pattern?
> **Paper Finding (arXiv:2405.00899):**
> - **Humans:** No correlation between flexibility and originality (r ≈ 0)
> - **LLMs:** Positive correlation — flexible LLMs score higher on originality
**Our Results:**
| Metric | Value | Interpretation |
|--------|:-----:|----------------|
| **Pearson r** | **0.071** | Near zero correlation |
| Interpretation | **Human-like pattern** | Breaks typical LLM pattern |
**Per-Condition Breakdown:**
| Condition | Novelty | Flexibility (combined jumps) |
|-----------|:-------:|:----------------------------:|
| C4 Full Pipeline | **0.395** (highest) | **13** (lowest) |
| C5 Random | 0.365 | 20 |
| C3 Attribute-Only | 0.337 | 33 |
| C2 Expert-Only | 0.315 | 48 (highest) |
| C1 Direct | 0.273 (lowest) | 0 |
**Critical Finding:** The attribute+expert pipeline (C4) achieves **highest novelty with lowest flexibility**, demonstrating that structured context-free generation produces **focused novelty** rather than scattered exploration.
---
# Cumulative Jump Profile Visualization
## Exploration Patterns Over Generation Sequence
> **Method:** Track cumulative jump count at each response position. Steep slopes indicate rapid category switching; flat regions indicate persistent exploration within categories.
![height:400px](../results/cumulative_jump_profiles.png)
**Visual Pattern:**
- C2/C3 show steady accumulation of jumps → regular category switching
- C4/C5 show flatter profiles → persistent within-category exploration
- C1 is flat (0 jumps) → all ideas in single cluster
---
# Flexibility vs Novelty: Key Insight
## Novelty and Flexibility are Orthogonal Dimensions
| Condition | Novelty (centroid dist) | Flexibility (combined jumps) | Pattern |
|-----------|:-----------------------:|:----------------------------:|---------|
| C4 Pipeline | **0.395** (highest) | **13** (lowest) | High novel, low flex |
| C5 Random | 0.365 | 20 | High novel, low flex |
| C2 Expert | 0.315 | **48** (highest) | Moderate novel, high flex |
| C3 Attribute | 0.337 | 33 | Moderate both |
| C1 Direct | 0.273 (lowest) | 0 | Typical, single category |
**Interpretation:**
- **C1 Direct** produces similar ideas within one typical category (low novelty, no jumps)
- **C4 Full Pipeline** produces the most novel ideas with focused exploration (low jump ratio)
- **C2 Expert-Only** produces the most category switching but moderate novelty
- **r = 0.071** confirms these are orthogonal dimensions (human-like pattern)
---
# Embedding Visualization: PCA
> **Method:** Principal Component Analysis reduces high-dimensional embeddings (1024D) to 2D for visualization by finding directions of maximum variance. Points close together = semantically similar ideas. Colors represent conditions.
![height:450px](../results/embedding_pca.png)
---
# Embedding Visualization: t-SNE
> **Method:** t-SNE (t-distributed Stochastic Neighbor Embedding) preserves local neighborhood structure when reducing to 2D. Better at revealing clusters than PCA, but distances between clusters are less meaningful. Good for seeing if conditions form distinct groups.
![height:450px](../results/embedding_tsne.png)
---
# Integrated Findings
## What the Advanced Analysis Reveals
| Analysis | C4 Full Pipeline Characteristic |
|----------|--------------------------------|
| Lexical | Largest vocabulary (1,992 words) |
| Novelty | Highest distance from centroid (0.395) |
| Cohesion | Tightest cluster (88.6% same-condition NN) |
| Diversity | High pairwise distance (0.395) |
| **Flexibility** | **Lowest combined jumps (13) = focused exploration** |
**Interpretation:** C4 creates a **distinct semantic territory** -
novel ideas that are internally coherent but far from other conditions.
Low flexibility (3.2% jump ratio) indicates deep, focused exploration within a novel space.
## Understanding Novelty vs Flexibility
| Condition | Novelty | Flexibility (jumps) | Strategy |
|-----------|:-------:|:-------------------:|----------|
| C1 Direct | Low | Lowest (0) | Typical, single category |
| C2 Expert | Medium | **Highest (48)** | Experts = diverse exploration |
| C3 Attribute | Medium | Medium (33) | Structured exploration |
| C5 Random | High | Low (20) | Random but focused |
| **C4 Pipeline** | **Highest** | **Low (13)** | **Focused novelty** |
---
# Critical Limitation
## Embedding Distance ≠ True Novelty
Current metrics measure **semantic spread**, not **creative value**
| What We Measure | What We Miss |
|-----------------|--------------|
| Vector distance | Practical usefulness |
| Cluster spread | Conceptual surprise |
| Query distance | Non-obviousness |
| | Feasibility |
```
"Quantum entanglement chair" → High distance, Low novelty
"Chair legs as drumsticks" → Low distance, High novelty
```
---
# Torrance Creativity Framework
## What True Novelty Assessment Requires
| Dimension | Definition | Our Coverage |
|-----------|------------|:------------:|
| **Fluency** | Number of ideas | ✓ Measured |
| **Flexibility** | Category diversity | ✓ Measured (LLM + embedding) |
| **Originality** | Statistical rarity | Not measured |
| **Elaboration** | Detail & development | Not measured |
**Originality requires human judgment or LLM-as-Judge**
---
# Discussion: The Attribute Anchoring Effect
## Why C4 Has Highest Novelty but Lowest Flexibility
```
C2 (Expert-Only): HIGHEST FLEXIBILITY (48 combined jumps)
Architect → "load-bearing furniture"
Chef → "dining experience design"
← Each expert explores freely, frequent category switching
C4 (Full Pipeline): LOWEST FLEXIBILITY (13 combined jumps, 3.2% ratio)
All experts respond to same attribute set
Architect + "portable" → "modular portable"
Chef + "portable" → "portable serving"
← Attribute anchoring constrains category switching
← BUT forced bisociation produces HIGHEST NOVELTY
```
**Key Mechanism:** Attributes anchor experts to similar conceptual space (low flexibility),
but context-free keyword generation forces novel associations (high novelty).
**Result:** "Focused novelty" — deep exploration in a distant semantic territory
---
# Key Findings Summary
| RQ | Question | Answer |
|----|----------|--------|
| RQ1 | Attributes increase diversity? | **Yes** (p=0.027) |
| RQ2 | Experts increase diversity? | **Yes** (p<0.001) |
| RQ3 | Synergistic interaction? | **No** (sub-additive) |
| RQ4 | Experts > Random? | **No** (p=0.463) |
**Additional Findings (arXiv:2405.00899 Metrics):**
- Full Pipeline (C4) has **highest novelty** but **lowest flexibility**
- **Originality-Flexibility correlation r=0.071** (human-like, breaks typical LLM pattern)
- Novelty and Flexibility are **orthogonal dimensions**
- All conditions show **Persistent** exploration profile (combined jump ratio < 30%)
- Direct generation (C1) produces ideas in a **single semantic cluster**
---
# Limitations
1. **Sample Size:** 10 queries (pilot study)
2. **Novelty Measurement:** Embedding-based metrics only measure semantic distance, not true creative value
3. **Single Model:** Results may vary with different LLMs
4. **No Human Evaluation:** No validation of idea quality or usefulness
5. **Fixed Categories:** 4 attribute categories may limit exploration
---
# Future Work
## Immediate Next Steps
1. **Human Assessment Interface** (Built)
- Web-based rating tool with Torrance dimensions
- Stratified sampling: 200 ideas (4 per condition × 10 queries)
- 4 dimensions: Originality, Elaboration, Coherence, Usefulness
2. **Multi-Model Validation** (Priority)
- Replicate on GPT-4, Claude, Llama-3
- Verify findings generalize across LLMs
3. **LLM-as-Judge evaluation** for full-scale scoring
4. **Scale to 30 queries** for statistical power
5. **Alternative pipeline designs** to address attribute anchoring
**Documentation:**
- `experiments/docs/future_research_plan_zh.md` - Detailed research plan
- `experiments/docs/creative_process_metrics_zh.md` - arXiv:2405.00899 metrics explanation
---
# Conclusion
## Key Takeaways
1. **Both attribute decomposition and expert perspectives significantly increase semantic diversity** compared to direct generation
2. **The combination is sub-additive**, suggesting attribute structure may constrain expert creativity
3. **Random perspectives work as well as domain experts**, implying the value is in perspective shift, not expert knowledge
4. **Novelty and Flexibility are orthogonal creativity dimensions** - high novelty ≠ high flexibility
- C4 Full Pipeline: Highest novelty, lowest flexibility
- C5 Random: Higher flexibility, moderate novelty
5. **🔑 Key Finding:** The pipeline produces **human-like originality-flexibility patterns** (r=0.071)
- Typical LLMs show positive correlation (flexible → more original)
- Our method breaks this pattern: high novelty with focused exploration
6. **True novelty assessment requires judgment-based evaluation** beyond embedding metrics
---
# Appendix: Statistical Details
## T-test Results (vs C1 Baseline)
| Comparison | t | p | Cohen's d |
|------------|:-:|:-:|:---------:|
| C4 vs C1 | 8.55 | <0.001 | 4.05 |
| C2 vs C1 | 7.67 | <0.001 | 3.43 |
| C3 vs C1 | 4.23 | <0.001 | 1.89 |
All experimental conditions significantly outperform baseline
---
# Appendix: Experiment Configuration
```python
EXPERIMENT_CONFIG = {
"model": "qwen3:8b",
"temperature": 0.9,
"expert_count": 4,
"expert_source": "curated", # 210 occupations
"keywords_per_expert": 1,
"categories": ["Functions", "Usages",
"User Groups", "Characteristics"],
"dedup_threshold": 0.90,
"random_seed": 42
}
```
---
# Thank You
## Questions?
**Repository:** novelty-seeking
**Experiment Date:** January 19, 2026
**Contact:** [Your Email]
---
# Backup Slides
---
# Backup: Deduplication Threshold Analysis
Original threshold (0.85) was too aggressive:
- 40.5% of removed pairs were borderline (0.85-0.87)
- Many genuinely different concepts were grouped
Raised to 0.90:
- RQ1 (Attributes) became significant (p: 0.052 → 0.027)
- Preserved ~103 additional unique ideas
---
# Backup: Sample Ideas by Condition
## Query: "Chair"
**C1 Direct:**
- Ergonomic office chair with lumbar support
- Foldable camping chair
**C2 Expert-Only (Architect):**
- Load-bearing furniture integrated into building structure
**C4 Full Pipeline:**
- Asset-tracking chairs with RFID for corporate inventory
- (Accountant + "portable" → "mobile assets")
---
# Backup: Efficiency Calculation
$$\text{Efficiency} = \frac{\text{Mean Pairwise Distance}}{\text{Idea Count}} \times 100$$
| Condition | Calculation | Result |
|-----------|-------------|:------:|
| C3 Attribute | 0.376 / 12.8 × 100 | 3.01 |
| C4 Pipeline | 0.393 / 51.9 × 100 | 0.78 |
C3 achieves 96% of C4's diversity with 25% of the ideas

View File

@@ -0,0 +1,342 @@
# Publication Plan and Future Work
**Created:** 2026-01-19
**Project:** Breaking Semantic Gravity in LLM-Based Creative Ideation
---
## 1. Publication Feasibility Assessment
### Coverage of Existing Work
| Topic | Representative paper | Our difference |
|------|----------|------------|
| LLM creativity evaluation | Organisciak et al. (2023) | They evaluate LLM creativity; we **enhance** it |
| AUT flexibility scoring | Hadas & Hershkovitz (2024) | Theirs is an evaluation method; ours is a **generation method** |
| Prompt engineering | Zhou et al. (2023) | They optimize prompts; we build a **structured pipeline** |
| LLM-as-Judge | Zheng et al. (2023) | An evaluation tool, not the core contribution |
### This Study's Unique Contributions
| Uniqueness | Description | Academic value |
|--------|------|----------|
| Context-Free Keyword Generation | Experts never see the original query, forcing bisociation | Methodological innovation |
| Sub-additive interaction | Attributes × experts = sub-additive | Empirical finding |
| Random perspectives ≈ domain experts | The perspective shift matters more than domain expertise | Theoretical contribution |
| Novelty-flexibility orthogonality | First verified in LLM creative generation | Theoretical validation |
---
## 2. Current Research Status
### Completed ✓
| Item | Status | Details |
|------|:----:|------|
| Theoretical framework | ✓ | Bisociation Theory + Torrance Creativity Framework |
| Experimental design | ✓ | 2×2 factorial + control (5 conditions) |
| Pipeline implementation | ✓ | Attribute decomposition → expert transformation → deduplication |
| Automatic metrics | ✓ | Novelty, flexibility, diversity, cohesion, jump signal |
| Human assessment interface | ✓ | Web-based Torrance rating tool |
| Statistical analysis | ✓ | ANOVA, effect sizes, correlation analysis |
| Pilot experiment | ✓ | 10 queries, Qwen3:8b, 1,119 ideas |
### Still Needed ✗
| Gap | Importance | Notes |
|------|:------:|------|
| Multi-model validation | **High** | Only Qwen3:8b so far |
| Human assessment data | **High** | Interface built, but no data collected yet |
| Larger sample | **Medium** | 10 → 30-50 queries |
| Baseline comparison | **Medium** | Compare against other creativity-enhancement methods |
| LLM-as-Judge | Medium | Validate its correlation with human ratings |
---
## 3. Publication Strategy Options
### Option A: Full Paper (Top Conference / Journal)
**Target venues:**
- ACL / EMNLP (top NLP conferences)
- CHI (top HCI conference)
- Creativity Research Journal
- Thinking Skills and Creativity
**Suggested title:**
> "Breaking Semantic Gravity: Context-Free Expert Perspectives for LLM Creative Ideation"
**Work still required:**
| Task | Estimated time | Priority |
|----------|:--------:|:------:|
| GPT-4 experiments | 1 week | P0 |
| Claude experiments | 1 week | P0 |
| Llama-3 experiments | 1 week | P1 |
| Human assessment collection | 2-3 weeks | P0 |
| Sample expansion (30 queries) | 1 week | P1 |
| Baseline comparison experiments | 1-2 weeks | P1 |
| Paper writing | 2-3 weeks | - |
**Total estimate:** 2-3 months
---
### Option B: Short Paper / Workshop Paper
**Targets:**
- ACL/EMNLP Workshop on Creativity and AI
- NeurIPS Workshop on Creativity and Design
- ICCC (International Conference on Computational Creativity)
**Work still required:**
| Task | Estimated time | Priority |
|----------|:--------:|:------:|
| GPT-4 experiments | 1 week | P0 |
| Small-scale human assessment (50-100 ideas) | 1 week | P0 |
| Paper writing | 1 week | - |
**Total estimate:** 2-4 weeks
---
## 4. Supplementary Experiment Plan
### Phase 1: Multi-Model Validation (Priority P0)
```
Goal: verify that the method generalizes
Model list:
□ GPT-4 / GPT-4o (OpenAI)
□ Claude 3.5 Sonnet (Anthropic)
□ Llama-3-70B (Meta)
□ Gemini Pro (Google) [optional]
Design:
- Same 10 queries
- Same 5 conditions
- Same evaluation metrics
Expected outputs:
- Cross-model consistency analysis
- Identification of model-specific effects
```
### Phase 2: Human Assessment (Priority P0)
```
Goal: validate how well the automatic metrics track human judgment
Rating dimensions (Torrance framework):
1. Originality - 1-5 Likert
2. Elaboration - 1-5 Likert
3. Feasibility - 1-5 Likert
4. Nonsense - binary
Sampling strategy:
- Stratified: 4 ideas per condition per query
- Total: 5 × 10 × 4 = 200 ideas
- Raters: 3-5 (compute ICC)
Interface:
- Built: experiments/assessment/
- Needed: recruit raters, collect data
```
### Phase 3: Sample Expansion (Priority P1)
```
Goal: increase statistical power
Expansion plan:
- Current: 10 queries
- Target: 30-50 queries
Query sources:
- Objects: furniture, tools, appliances, vehicles
- Concepts: services, systems, processes
- Hybrids: combined physical and digital elements
Power analysis:
- Current effect sizes d ≈ 2-3 (large)
- 30 queries should suffice for power > 0.95
```
### Phase 4: Baseline Comparison (Priority P1)
```
Goal: compare against existing methods
Baselines:
1. Vanilla Prompting
"Generate creative uses for [object]"
2. Chain-of-Thought (CoT)
"Think step by step about creative uses..."
3. Few-shot Examples
Provide 3-5 creative examples
4. Role-Playing (Standard)
"As a [expert], suggest uses for [object]"
(the expert sees the full query)
Comparison metrics:
- Novelty, flexibility, diversity
- Idea count, generation time
- Human assessment scores
```
---
## 5. Draft Paper Outline
### Title
"Breaking Semantic Gravity: Context-Free Expert Perspectives for Enhanced LLM Creative Ideation"
### Abstract
- Problem: LLMs generate ideas clustered around training distributions
- Method: Attribute decomposition + context-free expert transformation
- Results: Sub-additive interaction, random ≈ expert, novelty ⊥ flexibility
- Contribution: Novel pipeline + empirical findings
### 1. Introduction
- Semantic gravity problem in LLM creativity
- Bisociation theory and creative thinking
- Research questions (RQ1-4)
### 2. Related Work
- LLM creativity evaluation
- Prompt engineering for creativity
- Computational creativity methods
### 3. Method
- Pipeline architecture
- Context-free keyword generation
- Experimental design (2×2 + control)
### 4. Evaluation Framework
- Automatic metrics (novelty, flexibility, diversity)
- Human evaluation (Torrance dimensions)
- LLM-as-Judge validation
### 5. Results
- RQ1: Attribute effect
- RQ2: Expert effect
- RQ3: Interaction effect
- RQ4: Expert vs Random
- Cross-model validation
### 6. Discussion
- Attribute anchoring effect
- Value of perspective shift
- Novelty vs flexibility orthogonality
### 7. Conclusion
- Contributions
- Limitations
- Future work
---
## VI. Timeline
### Fast-Track Route (Workshop Paper)
```
Week 1-2: Multi-model experiments (GPT-4, Claude)
Week 2-3: Small-scale human evaluation
Week 3-4: Paper writing and submission
Target: 2026 Q1 workshop deadline
```
### Full-Paper Route
```
Month 1:
- Week 1-2: Multi-model experiments
- Week 3-4: Sample expansion
Month 2:
- Week 1-2: Human evaluation collection
- Week 3-4: Baseline comparison experiments
Month 3:
- Week 1-2: Data analysis and statistics
- Week 3-4: Paper writing
Target: ACL 2026 / EMNLP 2026
```
---
## VII. Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|:------:|:----:|----------|
| Inconsistent cross-model results | Medium | High | Report as "model-specific findings" |
| Low human-rating ICC | Medium | Medium | Add raters, revise the rating guidelines |
| Effect vanishes at larger sample | Low | High | Current effect sizes are large, so risk is low |
| Scooped by a competing paper | Low | High | Submit to a workshop first to establish priority |
---
## VIII. Resource Requirements
### Compute
| Resource | Purpose | Estimated Cost |
|------|------|:--------:|
| OpenAI API | GPT-4 experiments | ~$50-100 |
| Anthropic API | Claude experiments | ~$50-100 |
| Local GPU | Llama experiments | On hand |
| Ollama | Embeddings | On hand |
### People
| Role | Need | Notes |
|------|------|------|
| Human raters | 3-5 | Recruit classmates or crowdsource |
| Statistics consultant | Optional | For complex statistical analyses |
---
## IX. Success Criteria
### Short term (within 1 month)
- [ ] Complete GPT-4 experiments
- [ ] Complete Claude experiments
- [ ] Collect at least 100 human rating samples
### Medium term (within 3 months)
- [ ] Complete all model experiments
- [ ] Complete human evaluation (200+ samples, ICC > 0.7)
- [ ] Complete baseline comparisons
- [ ] Submit the first paper
### Long term (within 6 months)
- [ ] Paper accepted
- [ ] Open-source the code and dataset
- [ ] Extend to other creative tasks
---
## X. References
1. Hadas, S., & Hershkovitz, A. (2024). Using Large Language Models to Evaluate Alternative Uses Task Flexibility Score. *Thinking Skills and Creativity*, 52, 101549.
2. Organisciak, P., et al. (2023). Beyond Semantic Distance: Automated Scoring of Divergent Thinking Greatly Improves with Large Language Models. *Thinking Skills and Creativity*, 49, 101356.
3. Koestler, A. (1964). *The Act of Creation*. Hutchinson.
4. Torrance, E.P. (1974). *Torrance Tests of Creative Thinking*. Scholastic Testing Service.
5. Stevenson, C., et al. (2024). Characterizing Creative Processes in Humans and Large Language Models. *arXiv:2405.00899*.
6. Zheng, L., et al. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. *NeurIPS 2023*.


@@ -0,0 +1,178 @@
# Presentation Notes
---
## Opening (1-2 minutes)
**Problem:** LLMs show "semantic gravity" when generating ideas
- Ask for "innovative uses for a chair" → you get "ergonomic chair", "folding chair"
- Ideas cluster in the high-frequency regions of the training data
**Our fix:** Bisociation
- Attribute decomposition + expert perspectives + context-free keywords
- Forces unexpected connections
---
## 實驗設計1 分鐘)
**五個條件2×2 + 控制組:**
| 條件 | 記法 | 重點 |
|------|------|------|
| C1 | 直接生成 | Baseline |
| C2 | 只有專家 | 專家自由發揮 |
| C3 | 只有屬性 | 結構但無專家 |
| C4 | 完整管線 | 屬性 + 專家 |
| C5 | 隨機詞彙 | 控制組:隨機 vs 專家 |
**關鍵設計:** 專家生成關鍵字時**看不到原始查詢**
- 會計師 + 「便攜」→ 「流動資產」(不知道是椅子)
- 再把「流動資產」+ 「椅子」結合
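A schematic of the two-stage prompting (function names and wording are illustrative, not the pipeline's actual prompts):

```python
# Stage 1: the expert sees ONLY the attribute, never the query.
def keyword_prompt(expert: str, attribute: str) -> str:
    return (f"You are a {expert}. For the concept '{attribute}', "
            f"give one keyword from your professional vocabulary.")

# Stage 2: only now is the original query revealed and combined.
def idea_prompt(query: str, keyword: str) -> str:
    return (f"Combine '{query}' with the concept '{keyword}' "
            f"to propose one creative use.")
```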
---
## Answers to the Four Research Questions
| RQ | Question | Answer | One-liner |
|----|------|:----:|--------|
| RQ1 | Do attributes help? | ✓ Yes | p=0.027 |
| RQ2 | Do experts help? | ✓ Yes | p<0.001 |
| RQ3 | Any synergy? | ✗ No | Sub-additive |
| RQ4 | Experts > random? | ✗ No | p=0.463 |
**Surprise finding:** random words do as well as experts → the value lies in the perspective shift itself
---
## Core Numbers (memorize these)
### Novelty (distance from centroid; higher = more novel)
```
C4: 0.395 ← highest!
C5: 0.365
C3: 0.337
C2: 0.315
C1: 0.273 ← lowest (most typical)
```
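"Distance from centroid" here is the cosine distance implemented in `novelty_metrics.py`:

```
novelty(x) = 1 - cos(e_x, centroid(previous idea embeddings))
```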
### Flexibility (combined jump count; higher = more scattered)
```
C2: 48 ← highest! (experts explore freely)
C3: 33
C5: 20
C4: 13 ← lowest! (focused exploration)
C1: 0  ← single cluster
```
---
## 🔑 Key Findings (the heart of it)
### Finding 1: Originality-flexibility correlation
**The literature:**
- Humans: r ≈ 0 (no correlation)
- Typical LLMs: r > 0 (positive correlation)
**Our result: r = 0.071 (near zero)**
**The pipeline yields a "human-like" creativity pattern!**
### Finding 2: C4's unique position
```
C4 = highest novelty + lowest flexibility
That is "focused novelty":
- Not hopping everywhere (high flexibility)
- But digging into one novel region (low flexibility, high novelty)
- Like a human expert's creative pattern
```
### Finding 3: Why it happens
```
Attribute anchoring effect:
All experts respond to the same attribute set
→ ideas anchored in a similar concept space (low flexibility)
→ but context-free keywords force novel associations (high novelty)
Result: focused novelty
```
---
## Methodological Highlights
### Combined jump signal
- Old method: category switches only
- New method: category switch **and** semantic dissimilarity
- Fewer false positives, more accurate (sketch below)
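A sketch of the signal (names are illustrative; the 0.7 default mirrors the `similarity_threshold` used in `novelty_metrics.py`):

```python
import numpy as np

# A transition is a combined jump only if the AUT category changes
# AND the consecutive idea embeddings are semantically dissimilar.
def is_combined_jump(cat_prev, cat_next, emb_prev, emb_next,
                     sim_threshold: float = 0.7) -> bool:
    cos = float(np.dot(emb_prev, emb_next)
                / (np.linalg.norm(emb_prev) * np.linalg.norm(emb_next)))
    return cat_prev != cat_next and cos < sim_threshold
```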
### Flexibility profiles
| Profile | Jump Ratio | Our Result |
|------|:--------:|:----------:|
| Persistent | <30% | All conditions |
| Mixed | 30-45% | None |
| Flexible | >45% | None |
→ LLMs lean toward "persistent exploration" rather than "flexible jumping" (thresholds encoded below)
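The profile assignment is just a threshold on the jump ratio, encoded directly:

```python
# Direct encoding of the profile thresholds from the table above.
def flexibility_profile(jump_ratio: float) -> str:
    if jump_ratio < 0.30:
        return "Persistent"
    if jump_ratio <= 0.45:
        return "Mixed"
    return "Flexible"
```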
---
## Limitations (being honest)
1. **Small sample:** 10 queries (pilot study)
2. **No human evaluation:** embedding metrics only
3. **Single model:** Qwen3:8b only
4. **Semantic distance ≠ true novelty:** a "quantum-entanglement chair" is distant but not genuinely novel
---
## Next Steps (if asked)
1. **Human assessment interface** (already built)
2. **Multi-model validation** (GPT-4, Claude)
3. **LLM-as-Judge** scoring at scale
4. **30 queries** to raise statistical power
---
## One-Sentence Summary
> **Our attribute + expert pipeline gives the LLM a "human-expert-like" creative pattern:
> high novelty with focused exploration, breaking the typical LLM "flexibility = novelty" correlation.**
---
## Rapid-Fire Q&A
**Q: Why do random words do as well as experts?**
A: The value lies in the perspective shift itself, not in domain knowledge
**Q: Why is C4 lowest in flexibility but highest in novelty?**
A: The attributes anchor every expert in the same concept space, while context-free keywords force novel connections
**Q: What does r=0.071 mean?**
A: Novelty and flexibility are uncorrelated, as in humans, breaking the typical LLM positive correlation
**Q: Is a persistent profile good or bad?**
A: Neither; it is an exploration strategy. C4 shows you can stay persistent and still be novel
**Q: What is the practical takeaway?**
A: Want maximum novelty → use C4; want diverse categories → use C2
---
## Numbers Cheat Sheet
| Metric | C1 | C2 | C3 | C4 | C5 |
|------|:--:|:--:|:--:|:--:|:--:|
| Idea count | 195 | 198 | 125 | **402** | 199 |
| Novelty | 0.273 | 0.315 | 0.337 | **0.395** | 0.365 |
| Flexibility (jumps) | 0 | **48** | 33 | 13 | 20 |
| Jump ratio | 0% | 24% | 27% | **3%** | 10% |
| Cohesion | 71% | 73% | 51% | **89%** | 71% |
**Mnemonic:** C4 = most novel, most cohesive, least flexible = "focused novelty"


@@ -0,0 +1,290 @@
"""
Main experiment runner for the 5-condition idea generation study.
Usage:
# Run single query through all conditions
python -m experiments.generate_ideas --pilot --query "Chair"
# Run all pilot queries
python -m experiments.generate_ideas --pilot
# Run specific conditions
python -m experiments.generate_ideas --query "Bicycle" --conditions c1_direct c4_full_pipeline
"""
import sys
import json
import argparse
import asyncio
import logging
from datetime import datetime
from pathlib import Path
from typing import List, Dict, Any, Optional
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent / "backend"))
from experiments.config import (
CONDITIONS, CONDITION_NAMES, DATA_DIR, RESULTS_DIR, EXPERIMENT_CONFIG
)
from experiments.conditions import (
c1_generate, c2_generate, c3_generate, c4_generate, c5_generate
)
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# Condition function mapping
CONDITION_FUNCTIONS = {
"c1_direct": c1_generate,
"c2_expert_only": c2_generate,
"c3_attribute_only": c3_generate,
"c4_full_pipeline": c4_generate,
"c5_random_perspective": c5_generate,
}
def load_queries() -> List[Dict[str, Any]]:
"""Load pilot queries from data file."""
queries_file = DATA_DIR / "queries.json"
with open(queries_file, "r", encoding="utf-8") as f:
data = json.load(f)
return data.get("queries", [])
def save_results(results: List[Dict[str, Any]], filename: str) -> Path:
"""Save results to JSON file."""
RESULTS_DIR.mkdir(parents=True, exist_ok=True)
output_path = RESULTS_DIR / filename
with open(output_path, "w", encoding="utf-8") as f:
json.dump(results, f, indent=2, ensure_ascii=False)
return output_path
async def run_condition(
query: str,
condition: str
) -> Dict[str, Any]:
"""Run a single condition for a query."""
if condition not in CONDITION_FUNCTIONS:
raise ValueError(f"Unknown condition: {condition}")
generate_fn = CONDITION_FUNCTIONS[condition]
result = await generate_fn(query)
return result
async def run_experiment(
queries: Optional[List[str]] = None,
conditions: Optional[List[str]] = None,
save_intermediate: bool = True
) -> Dict[str, Any]:
"""
Run the full experiment.
Args:
queries: List of queries to run (None = all pilot queries)
conditions: List of conditions to run (None = all conditions)
save_intermediate: Whether to save results after each query
Returns:
Complete experiment results
"""
# Load queries if not provided
if queries is None:
query_data = load_queries()
queries_to_run = [(q["id"], q["query"], q["category"]) for q in query_data]
else:
queries_to_run = [(f"Q{i}", q, "custom") for i, q in enumerate(queries)]
# Default to all conditions
conditions = conditions or CONDITIONS
logger.info(f"Starting experiment with {len(queries_to_run)} queries and {len(conditions)} conditions")
logger.info(f"Conditions: {', '.join(conditions)}")
experiment_results = {
"experiment_id": datetime.now().strftime("%Y%m%d_%H%M%S"),
"config": EXPERIMENT_CONFIG,
"conditions": conditions,
"query_count": len(queries_to_run),
"results": [],
"summary": {}
}
for query_id, query, category in queries_to_run:
logger.info(f"\n{'='*60}")
logger.info(f"Processing query: {query} (ID: {query_id}, Category: {category})")
logger.info(f"{'='*60}")
query_results = {
"query_id": query_id,
"query": query,
"category": category,
"conditions": {}
}
for condition in conditions:
logger.info(f"\n Running {CONDITION_NAMES.get(condition, condition)}...")
try:
result = await run_condition(query, condition)
query_results["conditions"][condition] = {
"success": True,
"idea_count": result["idea_count"],
"ideas": result["ideas"],
"ideas_with_source": result.get("ideas_with_source", []),
"metadata": result["metadata"]
}
logger.info(f" Generated {result['idea_count']} ideas")
except Exception as e:
logger.error(f" Error in {condition}: {e}")
query_results["conditions"][condition] = {
"success": False,
"error": str(e),
"idea_count": 0,
"ideas": []
}
experiment_results["results"].append(query_results)
# Save intermediate results
if save_intermediate:
save_results(
experiment_results,
f"experiment_{experiment_results['experiment_id']}_intermediate.json"
)
# Calculate summary statistics
experiment_results["summary"] = calculate_summary(experiment_results)
# Save final results
output_path = save_results(
experiment_results,
f"experiment_{experiment_results['experiment_id']}_complete.json"
)
logger.info(f"\n{'='*60}")
logger.info("Experiment complete!")
logger.info(f"Results saved to: {output_path}")
logger.info(f"{'='*60}")
return experiment_results
def calculate_summary(results: Dict[str, Any]) -> Dict[str, Any]:
"""Calculate summary statistics for the experiment."""
summary = {
"total_queries": len(results["results"]),
"conditions": {}
}
for condition in results["conditions"]:
condition_stats = {
"total_ideas": 0,
"successful_queries": 0,
"failed_queries": 0,
"avg_ideas_per_query": 0
}
for query_result in results["results"]:
cond_result = query_result["conditions"].get(condition, {})
if cond_result.get("success", False):
condition_stats["successful_queries"] += 1
condition_stats["total_ideas"] += cond_result.get("idea_count", 0)
else:
condition_stats["failed_queries"] += 1
if condition_stats["successful_queries"] > 0:
condition_stats["avg_ideas_per_query"] = (
condition_stats["total_ideas"] / condition_stats["successful_queries"]
)
summary["conditions"][condition] = condition_stats
return summary
def print_summary(results: Dict[str, Any]):
"""Print a formatted summary of the experiment."""
print("\n" + "=" * 70)
print("EXPERIMENT SUMMARY")
print("=" * 70)
summary = results.get("summary", {})
print(f"\nTotal queries processed: {summary.get('total_queries', 0)}")
print("\nResults by condition:")
print("-" * 70)
print(f"{'Condition':<30} {'Success':<10} {'Total Ideas':<15} {'Avg/Query':<10}")
print("-" * 70)
for condition, stats in summary.get("conditions", {}).items():
name = CONDITION_NAMES.get(condition, condition)
success = stats.get("successful_queries", 0)
total = stats.get("total_ideas", 0)
avg = stats.get("avg_ideas_per_query", 0)
print(f"{name:<30} {success:<10} {total:<15} {avg:<10.1f}")
print("-" * 70)
async def main():
parser = argparse.ArgumentParser(
description="Run the 5-condition idea generation experiment"
)
parser.add_argument(
"--pilot",
action="store_true",
help="Run pilot experiment with all 10 queries"
)
parser.add_argument(
"--query",
type=str,
help="Run single query (e.g., 'Chair')"
)
parser.add_argument(
"--conditions",
nargs="+",
choices=CONDITIONS,
help="Specific conditions to run"
)
parser.add_argument(
"--no-save-intermediate",
action="store_true",
help="Don't save intermediate results"
)
args = parser.parse_args()
# Determine queries to run
if args.query:
queries = [args.query]
elif args.pilot:
queries = None # Will load all pilot queries
else:
parser.print_help()
print("\nError: Must specify --pilot or --query")
sys.exit(1)
# Run experiment
results = await run_experiment(
queries=queries,
conditions=args.conditions,
save_intermediate=not args.no_save_intermediate
)
# Print summary
print_summary(results)
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,253 @@
# Novelty-Driven LLM Agent Loop
An autonomous LLM agent that generates tasks in a while loop, using **novelty assessment as the termination condition** to help the agent "jump out" of its training data distribution (semantic gravity).
## Concept
Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.
This module implements a novel approach: use **novelty scores** to dynamically control when the agent should stop. Instead of fixed iteration counts, the agent continues until it finds something truly novel (a "breakthrough").
```
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
```
## Research Foundation
This work builds on established research:
- **Novelty Search** (Lehman & Stanley): Reward novelty, not objectives
- **Curiosity-driven Exploration** (Pathak et al.): Intrinsic motivation via prediction error
- **Quality-Diversity** (MAP-Elites): Maintain diverse high-quality solutions
- **Open-ended Learning**: Endless innovation through novelty pressure
The unique contribution is using **novelty as a termination condition** rather than just a reward signal.
## Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│ Novelty-Driven Task Generation Loop │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ │
│ │ Seed │ "Design a better bicycle" │
│ │ Problem │ │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ WHILE novelty < threshold AND iterations < max: │ │
│ │ │ │
│ │ 1. Sample random expert (curated occupations) │ │
│ │ e.g., "marine biologist", "choreographer" │ │
│ │ │ │
│ │ 2. Generate task from expert perspective │ │
│ │ "What task would a {expert} assign to improve │ │
│ │ {seed_problem}?" │ │
│ │ │ │
│ │ 3. Embed task, compute novelty vs. centroid │ │
│ │ │ │
│ │ 4. If novelty > threshold → STOP (breakthrough!) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Output: │ Novel task that "jumped out" of typical space │
│ │ Task │ + trajectory of exploration │
│ └──────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
```
## Installation
The module uses the existing project infrastructure. Ensure you have:
1. **Ollama** running with the required models:
```bash
ollama pull qwen3:8b
ollama pull qwen3-embedding:4b
```
2. **Python dependencies** (from project root):
```bash
cd backend
source venv/bin/activate
pip install httpx numpy
```
## Quick Start
### Basic Usage
```bash
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
```
### Example Output
```
Iteration 1
Expert: Architect (Architecture & Design)
Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
Novelty: [████████░░░░░░░░░░░░] 0.1234
Iteration 2
Expert: Chef (Culinary)
Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
Novelty: [███████████░░░░░░░░░] 0.1823
Iteration 3
Expert: Marine Biologist (Science)
Task: Study fish schooling behavior to develop organic traffic flow algorithms
Novelty: [██████████████░░░░░░] 0.3521
Iteration 4
Expert: Choreographer (Performing Arts)
Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
Novelty: [████████████████████] 0.5234
★ BREAKTHROUGH! ★
```
## Termination Strategies
### 1. Seek Breakthrough (Default)
Stop when novelty exceeds threshold. Finds the first truly novel task.
```bash
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
```
### 2. Exhaust Frontier
Continue while novelty is high, stop when average novelty drops. Explores more thoroughly.
```bash
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
```
### 3. Coverage Target
Continue until N distinct conceptual clusters are covered. Ensures diversity.
```bash
python demo.py "Your problem" --strategy coverage --clusters 5
```
## API Usage
```python
import asyncio
from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent
async def main():
agent = NoveltyDrivenTaskAgent(
novelty_threshold=0.4,
max_iterations=20,
language="en"
)
result = await agent.run("Design a better bicycle")
print(f"Found breakthrough: {result.breakthrough_task.task}")
print(f"Novelty score: {result.breakthrough_task.novelty_score}")
print(f"From expert: {result.breakthrough_task.expert}")
await agent.close()
asyncio.run(main())
```
## Novelty Metrics
The `novelty_metrics.py` module provides:
- **Centroid Distance**: Primary novelty metric - how far from the average of all previous outputs
- **Min Distance**: Distance to nearest neighbor (detect duplicates)
- **Jump Detection**: Identifies significant semantic shifts between consecutive outputs
- **Trajectory Tracking**: Cumulative novelty, jump ratio, etc.
```python
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics
metrics = NoveltyMetrics(similarity_threshold=0.7)
# Add embeddings one by one
for embedding in embeddings:
novelty = metrics.compute_novelty(embedding)
metrics.add_embedding(embedding, novelty)
print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")
# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
```
## CLI Options
```
positional arguments:
seed_problem The seed problem or challenge to explore
options:
--strategy {breakthrough,exhaust,coverage}
Termination strategy (default: breakthrough)
--threshold, -t Novelty threshold for breakthrough (default: 0.4)
--max-iter, -m Maximum iterations (default: 20)
--language, -l {en,zh}
Language for prompts and experts (default: en)
--model LLM model for task generation (default: qwen3:8b)
--embedding-model Embedding model (default: qwen3-embedding:4b)
--temperature LLM temperature (default: 0.7)
--output, -o Save results to JSON file
--quiet, -q Suppress iteration output
--verbose, -v Enable verbose logging
```
## File Structure
```
experiments/novelty_loop/
├── README.md # This file
├── agent.py # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py # Novelty computation utilities
└── demo.py # Interactive CLI demo
```
## Design Decisions
| Question | Decision | Rationale |
|----------|----------|-----------|
| Output Type | **Tasks** | Self-generated sub-goals for autonomous problem decomposition |
| Termination | **Seek Breakthrough** | Stop when novelty exceeds threshold - find truly novel task |
| Perturbation | **Expert Perspectives** | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | **Centroid** | Dynamic, adapts as exploration progresses |
## Connection to Main Project
This module integrates with the main novelty-seeking project:
- Uses the same **curated occupation data** (`backend/app/data/curated_occupations_*.json`)
- Uses the same **embedding model** (qwen3-embedding:4b)
- Builds on the **AUT flexibility analysis** metrics for novelty computation
- Can use **DDC domain data** for alternative perturbation strategies
## Future Work
1. **Hybrid Perturbation**: Combine expert + domain perspectives
2. **Contrastive Prompting**: Explicitly ask for outputs unlike recent ones
3. **Semantic Steering**: Guide generation away from centroid direction
4. **Multi-Agent Exploration**: Parallel agents with different strategies
5. **Quality-Diversity Archive**: Maintain diverse high-quality solutions
## References
- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone.
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction.
- Mouret, J. B., & Clune, J. (2015). Illuminating search spaces by mapping elites.
- Stevenson, C., et al. (2024). Characterizing Creative Processes in Humans and Large Language Models. arXiv:2405.00899.


@@ -0,0 +1,42 @@
"""
Novelty-Driven LLM Agent Loop
An autonomous agent that generates tasks using novelty as the termination condition.
"""
from .agent import (
NoveltyDrivenTaskAgent,
ExhaustFrontierAgent,
CoverageTargetAgent,
GeneratedTask,
TaskGenerationResult,
ExpertProvider,
DomainProvider,
)
from .novelty_metrics import (
NoveltyMetrics,
NoveltyScore,
NoveltyTrajectory,
compute_batch_novelty,
find_most_novel,
)
__all__ = [
# Agents
"NoveltyDrivenTaskAgent",
"ExhaustFrontierAgent",
"CoverageTargetAgent",
# Data classes
"GeneratedTask",
"TaskGenerationResult",
"NoveltyScore",
"NoveltyTrajectory",
# Providers
"ExpertProvider",
"DomainProvider",
# Metrics
"NoveltyMetrics",
"compute_batch_novelty",
"find_most_novel",
]


@@ -0,0 +1,725 @@
"""
Novelty-Driven Task Agent - An autonomous agent that generates tasks using novelty as termination condition.
This agent operates in a while loop, generating tasks from diverse expert perspectives,
and terminates when it finds a task that exceeds the novelty threshold (a "breakthrough").
The core innovation is using novelty assessment to help the agent "jump out" of its
training data distribution (semantic gravity), finding truly novel ideas.
Architecture:
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
Termination Strategy: "Seek Breakthrough"
- Continue until novelty > threshold
- Find the first truly novel task and stop
Research Foundation:
- Novelty Search (Lehman & Stanley): Reward novelty, not objectives
- Curiosity-driven Exploration (Pathak et al.): Intrinsic motivation via prediction error
- Quality-Diversity (MAP-Elites): Maintain diverse high-quality solutions
"""
import asyncio
import json
import logging
import random
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, List, Optional
import httpx
import numpy as np
from .novelty_metrics import NoveltyMetrics, NoveltyScore, NoveltyTrajectory
logger = logging.getLogger(__name__)
# ============================================================================
# Data Classes
# ============================================================================
@dataclass
class GeneratedTask:
"""A single generated task with metadata."""
task: str
expert: str
expert_domain: str
novelty_score: float
iteration: int
is_breakthrough: bool = False
embedding: Optional[np.ndarray] = None
@dataclass
class TaskGenerationResult:
"""Result of a complete novelty-driven task generation session."""
seed_problem: str
breakthrough_task: Optional[GeneratedTask] = None
trajectory: List[GeneratedTask] = field(default_factory=list)
total_iterations: int = 0
terminated_by: str = "unknown" # "breakthrough", "max_iterations", "error"
novelty_trajectory: Optional[NoveltyTrajectory] = None
start_time: Optional[str] = None
end_time: Optional[str] = None
config: dict = field(default_factory=dict)
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"seed_problem": self.seed_problem,
"breakthrough_task": {
"task": self.breakthrough_task.task,
"expert": self.breakthrough_task.expert,
"expert_domain": self.breakthrough_task.expert_domain,
"novelty_score": self.breakthrough_task.novelty_score,
"iteration": self.breakthrough_task.iteration
} if self.breakthrough_task else None,
"trajectory": [
{
"task": t.task,
"expert": t.expert,
"expert_domain": t.expert_domain,
"novelty_score": t.novelty_score,
"iteration": t.iteration,
"is_breakthrough": t.is_breakthrough
}
for t in self.trajectory
],
"total_iterations": self.total_iterations,
"terminated_by": self.terminated_by,
"novelty_stats": {
"mean_novelty": self.novelty_trajectory.mean_novelty if self.novelty_trajectory else 0,
"max_novelty": self.novelty_trajectory.max_novelty if self.novelty_trajectory else 0,
"jump_ratio": self.novelty_trajectory.jump_ratio if self.novelty_trajectory else 0,
"cumulative_novelty": self.novelty_trajectory.final_cumulative_novelty if self.novelty_trajectory else 0
},
"start_time": self.start_time,
"end_time": self.end_time,
"config": self.config
}
# ============================================================================
# Expert/Domain Providers
# ============================================================================
class ExpertProvider:
"""Provides random experts from curated occupation lists."""
def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
"""
Args:
data_dir: Path to data directory containing occupation JSON files
language: Language code ("en" or "zh")
"""
if data_dir is None:
# Default to backend data directory
data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"
self.data_dir = data_dir
self.language = language
self._occupations: List[dict] = []
self._load_occupations()
def _load_occupations(self):
"""Load occupations from JSON file."""
file_path = self.data_dir / f"curated_occupations_{self.language}.json"
if not file_path.exists():
logger.warning(f"Occupation file not found: {file_path}")
# Fallback to some default experts
self._occupations = [
{"name": "Marine Biologist", "domain": "Science"},
{"name": "Choreographer", "domain": "Arts"},
{"name": "Urban Planner", "domain": "Architecture"},
{"name": "Chef", "domain": "Culinary"},
{"name": "Astronomer", "domain": "Science"},
]
return
try:
with open(file_path, "r", encoding="utf-8") as f:
data = json.load(f)
self._occupations = data.get("occupations", [])
logger.info(f"Loaded {len(self._occupations)} occupations from {file_path.name}")
except Exception as e:
logger.error(f"Error loading occupations: {e}")
self._occupations = []
def get_random_expert(self) -> dict:
"""Get a random expert with name and domain."""
if not self._occupations:
return {"name": "Expert", "domain": "General"}
return random.choice(self._occupations)
def get_random_experts(self, count: int) -> List[dict]:
"""Get multiple random experts without replacement."""
if len(self._occupations) <= count:
return self._occupations.copy()
return random.sample(self._occupations, count)
class DomainProvider:
"""Provides random knowledge domains from DDC classification."""
def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
if data_dir is None:
data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"
self.data_dir = data_dir
self.language = language
self._domains: List[dict] = []
self._load_domains()
def _load_domains(self):
"""Load domains from JSON file."""
file_path = self.data_dir / f"ddc_domains_{self.language}.json"
if not file_path.exists():
logger.warning(f"Domain file not found: {file_path}")
self._domains = []
return
try:
with open(file_path, "r", encoding="utf-8") as f:
data = json.load(f)
self._domains = data.get("domains", [])
logger.info(f"Loaded {len(self._domains)} domains from {file_path.name}")
except Exception as e:
logger.error(f"Error loading domains: {e}")
def get_random_domain(self, level: Optional[str] = None) -> dict:
"""Get a random domain, optionally filtered by level."""
domains = self._domains
if level:
domains = [d for d in domains if d.get("level") == level]
if not domains:
return {"name": "General Knowledge", "code": "000"}
return random.choice(domains)
# ============================================================================
# Novelty-Driven Task Agent
# ============================================================================
class NoveltyDrivenTaskAgent:
"""
An autonomous agent that generates tasks using novelty as the termination condition.
The agent operates in a loop:
1. Sample a random expert perspective
2. Generate a task from that expert's viewpoint
3. Compute the task's novelty (distance from centroid of previous tasks)
4. If novelty > threshold → STOP (found breakthrough!)
5. Otherwise → Continue with next expert
Example:
agent = NoveltyDrivenTaskAgent(novelty_threshold=0.4)
result = await agent.run("Improve urban transportation")
# result.breakthrough_task contains the novel task found
# result.trajectory shows the exploration path
"""
def __init__(
self,
novelty_threshold: float = 0.4,
max_iterations: int = 20,
ollama_base_url: str = "http://localhost:11435",
llm_model: str = "qwen3:8b",
embedding_model: str = "qwen3-embedding:4b",
language: str = "en",
data_dir: Optional[Path] = None,
on_iteration: Optional[Callable[[GeneratedTask], None]] = None,
temperature: float = 0.7
):
"""
Args:
novelty_threshold: Novelty score threshold for breakthrough (0.0-1.0)
max_iterations: Maximum iterations before stopping
ollama_base_url: Ollama API endpoint
llm_model: Model for task generation
embedding_model: Model for embeddings
language: Language for prompts and experts ("en" or "zh")
data_dir: Path to data directory for expert/domain files
on_iteration: Callback function called after each iteration
temperature: LLM temperature for generation
"""
self.novelty_threshold = novelty_threshold
self.max_iterations = max_iterations
self.ollama_base_url = ollama_base_url
self.llm_model = llm_model
self.embedding_model = embedding_model
self.language = language
self.temperature = temperature
self.on_iteration = on_iteration
# Initialize providers
self.expert_provider = ExpertProvider(data_dir, language)
self.domain_provider = DomainProvider(data_dir, language)
# Initialize novelty metrics
self.novelty_metrics = NoveltyMetrics(
similarity_threshold=0.7,
jump_detection_enabled=True
)
# HTTP client
self._client: Optional[httpx.AsyncClient] = None
async def _get_client(self) -> httpx.AsyncClient:
"""Get or create HTTP client."""
if self._client is None:
self._client = httpx.AsyncClient(timeout=120.0)
return self._client
async def close(self):
"""Close HTTP client."""
if self._client is not None:
await self._client.aclose()
self._client = None
async def _generate_text(self, prompt: str) -> str:
"""Generate text using Ollama LLM."""
client = await self._get_client()
url = f"{self.ollama_base_url}/api/generate"
# Add /no_think prefix for qwen models to disable thinking
if self.llm_model.lower().startswith("qwen"):
prompt = f"/no_think\n{prompt}"
try:
response = await client.post(url, json={
"model": self.llm_model,
"prompt": prompt,
"stream": False,
"options": {
"temperature": self.temperature
}
})
response.raise_for_status()
result = response.json()
return result.get("response", "").strip()
except Exception as e:
logger.error(f"LLM generation error: {e}")
raise
async def _get_embedding(self, text: str) -> np.ndarray:
"""Get embedding vector for text."""
client = await self._get_client()
url = f"{self.ollama_base_url}/api/embed"
try:
response = await client.post(url, json={
"model": self.embedding_model,
"input": text
})
response.raise_for_status()
result = response.json()
return np.array(result["embeddings"][0])
except Exception as e:
logger.error(f"Embedding error: {e}")
raise
def _build_task_prompt(
self,
seed_problem: str,
expert: dict,
previous_tasks: List[str]
) -> str:
"""Build the prompt for task generation."""
expert_name = expert.get("name", "Expert")
expert_domain = expert.get("domain", "General")
# Build context from previous tasks (if any)
context = ""
if previous_tasks:
recent = previous_tasks[-3:] # Last 3 tasks
context = "\n\nPrevious suggestions (generate something DIFFERENT):\n"
for t in recent:
context += f"- {t}\n"
if self.language == "zh":
prompt = f"""你是一位 {expert_name}{expert_domain})。
给定问题:{seed_problem}
请从你的专业角度出发,提出一个独特的改进任务或探索方向。
这个任务应该结合你的专业知识,提供一个非传统但有价值的视角。
{context}
请直接给出任务描述,不要添加解释。任务应该具体、可行、且与众不同。
任务:"""
else:
prompt = f"""You are a {expert_name} ({expert_domain}).
Given problem: {seed_problem}
From your professional perspective, propose a unique task or exploration direction to improve or innovate on this problem.
The task should leverage your domain expertise to provide an unconventional but valuable angle.
{context}
Provide just the task description without explanation. The task should be specific, actionable, and distinctive.
Task:"""
return prompt
async def _generate_task(
self,
seed_problem: str,
expert: dict,
previous_tasks: List[str]
) -> str:
"""Generate a task from an expert's perspective."""
prompt = self._build_task_prompt(seed_problem, expert, previous_tasks)
task = await self._generate_text(prompt)
# Clean up the response
task = task.strip()
# Remove common prefixes
for prefix in ["Task:", "任务:", "Here's", "I suggest", "Based on"]:
if task.lower().startswith(prefix.lower()):
task = task[len(prefix):].strip()
return task
async def run(
self,
seed_problem: str,
used_experts: Optional[List[dict]] = None
) -> TaskGenerationResult:
"""
Run the novelty-driven task generation loop.
Args:
seed_problem: The initial problem/challenge to explore
used_experts: Optional list of experts to avoid (for multi-run scenarios)
Returns:
TaskGenerationResult with breakthrough task (if found) and full trajectory
"""
# Reset state
self.novelty_metrics.reset()
result = TaskGenerationResult(
seed_problem=seed_problem,
start_time=datetime.now(timezone.utc).isoformat(),
config={
"novelty_threshold": self.novelty_threshold,
"max_iterations": self.max_iterations,
"llm_model": self.llm_model,
"embedding_model": self.embedding_model,
"language": self.language
}
)
used_expert_names = set()
if used_experts:
used_expert_names = {e["name"] for e in used_experts}
previous_tasks: List[str] = []
logger.info(f"Starting novelty loop: '{seed_problem}' (threshold={self.novelty_threshold})")
try:
for iteration in range(self.max_iterations):
# 1. Sample a random expert (avoid duplicates)
attempts = 0
expert = self.expert_provider.get_random_expert()
while expert["name"] in used_expert_names and attempts < 10:
expert = self.expert_provider.get_random_expert()
attempts += 1
used_expert_names.add(expert["name"])
logger.info(f"Iteration {iteration + 1}: Expert = {expert['name']} ({expert['domain']})")
# 2. Generate task
task = await self._generate_task(seed_problem, expert, previous_tasks)
previous_tasks.append(task)
# 3. Get embedding
embedding = await self._get_embedding(task)
# 4. Compute novelty
novelty = self.novelty_metrics.compute_novelty(embedding)
self.novelty_metrics.add_embedding(embedding, novelty)
# 5. Create task record
generated_task = GeneratedTask(
task=task,
expert=expert["name"],
expert_domain=expert["domain"],
novelty_score=novelty.score,
iteration=iteration + 1,
is_breakthrough=novelty.score > self.novelty_threshold,
embedding=embedding
)
result.trajectory.append(generated_task)
logger.info(f" Task: {task[:80]}...")
logger.info(f" Novelty: {novelty.score:.4f} (threshold: {self.novelty_threshold})")
# Callback
if self.on_iteration:
self.on_iteration(generated_task)
# 6. Check for breakthrough
if novelty.score > self.novelty_threshold:
result.breakthrough_task = generated_task
result.terminated_by = "breakthrough"
result.total_iterations = iteration + 1
logger.info(f" BREAKTHROUGH! Stopping after {iteration + 1} iterations")
break
else:
# Max iterations reached without breakthrough
result.terminated_by = "max_iterations"
result.total_iterations = self.max_iterations
logger.info(f"Max iterations ({self.max_iterations}) reached without breakthrough")
# Find the most novel task as a fallback
if result.trajectory:
best_task = max(result.trajectory, key=lambda t: t.novelty_score)
best_task.is_breakthrough = True # Mark as best found
result.breakthrough_task = best_task
except Exception as e:
logger.error(f"Error during generation: {e}")
result.terminated_by = f"error: {str(e)}"
result.total_iterations = len(result.trajectory)
# Finalize
result.end_time = datetime.now(timezone.utc).isoformat()
result.novelty_trajectory = self.novelty_metrics.trajectory
return result
# ============================================================================
# Alternative Termination Strategies
# ============================================================================
class ExhaustFrontierAgent(NoveltyDrivenTaskAgent):
"""
Alternative strategy: Continue while novelty is high, stop when it drops.
This explores the "novelty frontier" more thoroughly, finding multiple novel
ideas before stopping when exploration becomes repetitive.
"""
def __init__(
self,
exhaustion_threshold: float = 0.15,
window_size: int = 3,
min_iterations: int = 5,
**kwargs
):
"""
Args:
exhaustion_threshold: Stop when recent average novelty drops below this
window_size: Number of recent iterations to average
min_iterations: Minimum iterations before checking exhaustion
**kwargs: Passed to parent class
"""
super().__init__(**kwargs)
self.exhaustion_threshold = exhaustion_threshold
self.window_size = window_size
self.min_iterations = min_iterations
async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
"""Override to use exhaustion-based termination."""
# Reset state
self.novelty_metrics.reset()
result = TaskGenerationResult(
seed_problem=seed_problem,
start_time=datetime.now(timezone.utc).isoformat(),
config={
"strategy": "exhaust_frontier",
"exhaustion_threshold": self.exhaustion_threshold,
"window_size": self.window_size,
"min_iterations": self.min_iterations,
"max_iterations": self.max_iterations,
"llm_model": self.llm_model
}
)
used_expert_names = set()
previous_tasks: List[str] = []
novelty_history: List[float] = []
try:
for iteration in range(self.max_iterations):
# Sample expert
expert = self.expert_provider.get_random_expert()
while expert["name"] in used_expert_names and len(used_expert_names) < 200:
expert = self.expert_provider.get_random_expert()
used_expert_names.add(expert["name"])
# Generate and evaluate
task = await self._generate_task(seed_problem, expert, previous_tasks)
previous_tasks.append(task)
embedding = await self._get_embedding(task)
novelty = self.novelty_metrics.compute_novelty(embedding)
self.novelty_metrics.add_embedding(embedding, novelty)
novelty_history.append(novelty.score)
generated_task = GeneratedTask(
task=task,
expert=expert["name"],
expert_domain=expert["domain"],
novelty_score=novelty.score,
iteration=iteration + 1
)
result.trajectory.append(generated_task)
if self.on_iteration:
self.on_iteration(generated_task)
# Check exhaustion condition
if iteration >= self.min_iterations:
recent_avg = np.mean(novelty_history[-self.window_size:])
if recent_avg < self.exhaustion_threshold:
result.terminated_by = f"exhaustion (avg={recent_avg:.3f})"
result.total_iterations = iteration + 1
break
else:
result.terminated_by = "max_iterations"
result.total_iterations = self.max_iterations
# Find all "novel" tasks
novel_tasks = [t for t in result.trajectory if t.novelty_score > self.exhaustion_threshold]
if novel_tasks:
result.breakthrough_task = max(novel_tasks, key=lambda t: t.novelty_score)
result.breakthrough_task.is_breakthrough = True
except Exception as e:
result.terminated_by = f"error: {str(e)}"
result.total_iterations = len(result.trajectory)
result.end_time = datetime.now(timezone.utc).isoformat()
result.novelty_trajectory = self.novelty_metrics.trajectory
return result
class CoverageTargetAgent(NoveltyDrivenTaskAgent):
"""
Alternative strategy: Continue until N distinct clusters are covered.
This ensures a diverse portfolio of ideas across different conceptual areas.
"""
def __init__(
self,
target_clusters: int = 5,
cluster_threshold: float = 0.7,
**kwargs
):
"""
Args:
target_clusters: Target number of distinct clusters to find
cluster_threshold: Similarity threshold for cluster membership
**kwargs: Passed to parent class
"""
super().__init__(**kwargs)
self.target_clusters = target_clusters
self.cluster_threshold = cluster_threshold
def _count_clusters(self, embeddings: List[np.ndarray]) -> int:
"""Count distinct clusters using greedy clustering."""
if not embeddings:
return 0
clusters = []
for emb in embeddings:
found_cluster = False
for cluster_centroid in clusters:
similarity = NoveltyMetrics.cosine_similarity(emb, cluster_centroid)
if similarity >= self.cluster_threshold:
found_cluster = True
break
if not found_cluster:
clusters.append(emb)
return len(clusters)
async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
"""Override to use coverage-based termination."""
self.novelty_metrics.reset()
result = TaskGenerationResult(
seed_problem=seed_problem,
start_time=datetime.now(timezone.utc).isoformat(),
config={
"strategy": "coverage_target",
"target_clusters": self.target_clusters,
"cluster_threshold": self.cluster_threshold,
"max_iterations": self.max_iterations
}
)
used_expert_names = set()
previous_tasks: List[str] = []
all_embeddings: List[np.ndarray] = []
try:
for iteration in range(self.max_iterations):
expert = self.expert_provider.get_random_expert()
while expert["name"] in used_expert_names and len(used_expert_names) < 200:
expert = self.expert_provider.get_random_expert()
used_expert_names.add(expert["name"])
task = await self._generate_task(seed_problem, expert, previous_tasks)
previous_tasks.append(task)
embedding = await self._get_embedding(task)
all_embeddings.append(embedding)
novelty = self.novelty_metrics.compute_novelty(embedding)
self.novelty_metrics.add_embedding(embedding, novelty)
generated_task = GeneratedTask(
task=task,
expert=expert["name"],
expert_domain=expert["domain"],
novelty_score=novelty.score,
iteration=iteration + 1
)
result.trajectory.append(generated_task)
if self.on_iteration:
self.on_iteration(generated_task)
# Check coverage
cluster_count = self._count_clusters(all_embeddings)
if cluster_count >= self.target_clusters:
result.terminated_by = f"coverage ({cluster_count} clusters)"
result.total_iterations = iteration + 1
break
else:
final_clusters = self._count_clusters(all_embeddings)
result.terminated_by = f"max_iterations ({final_clusters} clusters)"
result.total_iterations = self.max_iterations
# Find most novel task
if result.trajectory:
best_task = max(result.trajectory, key=lambda t: t.novelty_score)
best_task.is_breakthrough = True
result.breakthrough_task = best_task
except Exception as e:
result.terminated_by = f"error: {str(e)}"
result.total_iterations = len(result.trajectory)
result.end_time = datetime.now(timezone.utc).isoformat()
result.novelty_trajectory = self.novelty_metrics.trajectory
return result

experiments/novelty_loop/demo.py Executable file
@@ -0,0 +1,313 @@
#!/usr/bin/env python3
"""
Novelty-Driven Task Generation Demo
Interactive CLI for exploring the novelty-driven task generation agent.
Examples:
# Basic usage with default settings
python demo.py "Improve urban transportation"
# Custom threshold and iterations
python demo.py "Design a better bicycle" --threshold 0.35 --max-iter 15
# Use Chinese language
python demo.py "改进城市交通" --language zh
# Use exhaustion strategy (explore until stuck)
python demo.py "Sustainable energy solutions" --strategy exhaust
# Use coverage strategy (find N distinct clusters)
python demo.py "Future of education" --strategy coverage --clusters 5
# Save results to file
python demo.py "Smart home innovations" --output results.json
# Verbose mode with detailed logging
python demo.py "Healthcare improvements" --verbose
"""
import argparse
import asyncio
import json
import logging
import sys
from datetime import datetime
from pathlib import Path
# Add parent directory to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
from experiments.novelty_loop.agent import (
NoveltyDrivenTaskAgent,
ExhaustFrontierAgent,
CoverageTargetAgent,
GeneratedTask,
TaskGenerationResult
)
# ANSI color codes for terminal output
class Colors:
HEADER = '\033[95m'
BLUE = '\033[94m'
CYAN = '\033[96m'
GREEN = '\033[92m'
YELLOW = '\033[93m'
RED = '\033[91m'
BOLD = '\033[1m'
UNDERLINE = '\033[4m'
END = '\033[0m'
def print_header(text: str):
"""Print a styled header."""
print(f"\n{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}")
print(f"{Colors.BOLD}{Colors.HEADER}{text.center(60)}{Colors.END}")
print(f"{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}\n")
def print_iteration(task: GeneratedTask):
"""Print iteration result with colors."""
status_color = Colors.GREEN if task.is_breakthrough else Colors.CYAN
print(f"\n{Colors.BOLD}Iteration {task.iteration}{Colors.END}")
print(f" {Colors.YELLOW}Expert:{Colors.END} {task.expert} ({task.expert_domain})")
print(f" {Colors.YELLOW}Task:{Colors.END} {task.task}")
novelty_bar = "█" * int(task.novelty_score * 20) + "░" * (20 - int(task.novelty_score * 20))
print(f" {Colors.YELLOW}Novelty:{Colors.END} [{novelty_bar}] {task.novelty_score:.4f}")
if task.is_breakthrough:
print(f" {Colors.GREEN}{Colors.BOLD}★ BREAKTHROUGH! ★{Colors.END}")
def print_result(result: TaskGenerationResult):
"""Print final result summary."""
print_header("RESULTS")
print(f"{Colors.BOLD}Seed Problem:{Colors.END} {result.seed_problem}")
print(f"{Colors.BOLD}Total Iterations:{Colors.END} {result.total_iterations}")
print(f"{Colors.BOLD}Terminated By:{Colors.END} {result.terminated_by}")
if result.novelty_trajectory:
print(f"\n{Colors.BOLD}Novelty Statistics:{Colors.END}")
print(f" Mean Novelty: {result.novelty_trajectory.mean_novelty:.4f}")
print(f" Max Novelty: {result.novelty_trajectory.max_novelty:.4f}")
print(f" Jump Ratio: {result.novelty_trajectory.jump_ratio:.2%}")
if result.breakthrough_task:
print(f"\n{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
print(f"{Colors.GREEN}{Colors.BOLD}BREAKTHROUGH TASK{Colors.END}")
print(f"{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
print(f"\n{Colors.BOLD}Expert:{Colors.END} {result.breakthrough_task.expert}")
print(f"{Colors.BOLD}Domain:{Colors.END} {result.breakthrough_task.expert_domain}")
print(f"{Colors.BOLD}Task:{Colors.END}")
print(f" {Colors.CYAN}{result.breakthrough_task.task}{Colors.END}")
print(f"\n{Colors.BOLD}Novelty Score:{Colors.END} {result.breakthrough_task.novelty_score:.4f}")
print(f"{Colors.BOLD}Found at Iteration:{Colors.END} {result.breakthrough_task.iteration}")
# Show trajectory summary
print(f"\n{Colors.BOLD}Exploration Trajectory:{Colors.END}")
for task in result.trajectory:
marker = "★" if task.is_breakthrough else " "
novelty_indicator = "█" * int(task.novelty_score * 10)
print(f" {marker} [{task.iteration:2d}] {task.expert:20s} | {novelty_indicator:10s} {task.novelty_score:.3f}")
def save_result(result: TaskGenerationResult, output_path: str):
"""Save result to JSON file."""
with open(output_path, "w", encoding="utf-8") as f:
json.dump(result.to_dict(), f, ensure_ascii=False, indent=2)
print(f"\n{Colors.GREEN}Results saved to: {output_path}{Colors.END}")
async def run_demo(args):
"""Run the novelty-driven task generation demo."""
print_header("NOVELTY-DRIVEN TASK GENERATION")
print(f"{Colors.BOLD}Configuration:{Colors.END}")
print(f" Seed Problem: {args.seed_problem}")
print(f" Strategy: {args.strategy}")
print(f" Novelty Threshold: {args.threshold}")
print(f" Max Iterations: {args.max_iter}")
print(f" Language: {args.language}")
print(f" LLM Model: {args.model}")
# Create appropriate agent based on strategy
common_kwargs = {
"max_iterations": args.max_iter,
"llm_model": args.model,
"embedding_model": args.embedding_model,
"language": args.language,
"temperature": args.temperature,
"on_iteration": print_iteration if not args.quiet else None
}
if args.strategy == "breakthrough":
agent = NoveltyDrivenTaskAgent(
novelty_threshold=args.threshold,
**common_kwargs
)
elif args.strategy == "exhaust":
agent = ExhaustFrontierAgent(
exhaustion_threshold=args.exhaust_threshold,
window_size=args.window_size,
min_iterations=args.min_iter,
**common_kwargs
)
elif args.strategy == "coverage":
agent = CoverageTargetAgent(
target_clusters=args.clusters,
cluster_threshold=args.cluster_threshold,
**common_kwargs
)
else:
print(f"{Colors.RED}Unknown strategy: {args.strategy}{Colors.END}")
return
print(f"\n{Colors.BOLD}Starting generation loop...{Colors.END}")
print("-" * 60)
try:
result = await agent.run(args.seed_problem)
print_result(result)
if args.output:
save_result(result, args.output)
except Exception as e:
print(f"\n{Colors.RED}Error: {e}{Colors.END}")
if args.verbose:
import traceback
traceback.print_exc()
finally:
await agent.close()
def main():
parser = argparse.ArgumentParser(
description="Novelty-Driven Task Generation Demo",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__
)
# Required argument
parser.add_argument(
"seed_problem",
help="The seed problem or challenge to explore"
)
# Strategy selection
parser.add_argument(
"--strategy", "-s",
choices=["breakthrough", "exhaust", "coverage"],
default="breakthrough",
help="Termination strategy (default: breakthrough)"
)
# Common options
parser.add_argument(
"--threshold", "-t",
type=float,
default=0.4,
help="Novelty threshold for breakthrough (default: 0.4)"
)
parser.add_argument(
"--max-iter", "-m",
type=int,
default=20,
help="Maximum iterations (default: 20)"
)
parser.add_argument(
"--language", "-l",
choices=["en", "zh"],
default="en",
help="Language for prompts and experts (default: en)"
)
# Model options
parser.add_argument(
"--model",
default="qwen3:8b",
help="LLM model for task generation (default: qwen3:8b)"
)
parser.add_argument(
"--embedding-model",
default="qwen3-embedding:4b",
help="Embedding model (default: qwen3-embedding:4b)"
)
parser.add_argument(
"--temperature",
type=float,
default=0.7,
help="LLM temperature (default: 0.7)"
)
# Exhaust strategy options
parser.add_argument(
"--exhaust-threshold",
type=float,
default=0.15,
help="Exhaustion threshold for 'exhaust' strategy (default: 0.15)"
)
parser.add_argument(
"--window-size",
type=int,
default=3,
help="Window size for exhaustion check (default: 3)"
)
parser.add_argument(
"--min-iter",
type=int,
default=5,
help="Minimum iterations before exhaustion check (default: 5)"
)
# Coverage strategy options
parser.add_argument(
"--clusters",
type=int,
default=5,
help="Target clusters for 'coverage' strategy (default: 5)"
)
parser.add_argument(
"--cluster-threshold",
type=float,
default=0.7,
help="Cluster similarity threshold (default: 0.7)"
)
# Output options
parser.add_argument(
"--output", "-o",
help="Save results to JSON file"
)
parser.add_argument(
"--quiet", "-q",
action="store_true",
help="Suppress iteration output"
)
parser.add_argument(
"--verbose", "-v",
action="store_true",
help="Enable verbose logging"
)
args = parser.parse_args()
# Configure logging
if args.verbose:
logging.basicConfig(
level=logging.DEBUG,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
else:
logging.basicConfig(level=logging.WARNING)
# Run the demo
asyncio.run(run_demo(args))
if __name__ == "__main__":
main()


@@ -0,0 +1,269 @@
"""
Novelty Metrics Module - Compute novelty scores for generated outputs.
This module provides embedding-based novelty metrics adapted from the AUT flexibility
analysis framework for use in novelty-driven agent loops.
Key Metrics:
- Centroid Distance: Measures how far a new output is from the centroid of previous outputs
- Cumulative Novelty: Tracks novelty over the generation sequence
- Jump Detection: Identifies significant semantic shifts between consecutive outputs
"""
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np
@dataclass
class NoveltyScore:
"""Result of novelty computation for a single output."""
score: float # Main novelty score (0.0 = identical to centroid, 1.0 = maximally distant)
distance_from_centroid: float
min_distance_to_existing: float # Nearest neighbor distance
is_jump: bool # Whether this represents a significant semantic jump
jump_magnitude: Optional[float] = None # Similarity to previous output (if applicable)
@dataclass
class NoveltyTrajectory:
"""Tracks novelty scores over a generation sequence."""
scores: List[float] = field(default_factory=list)
cumulative_novelty: List[float] = field(default_factory=list)
jump_positions: List[int] = field(default_factory=list)
centroid_history: List[np.ndarray] = field(default_factory=list)
@property
def mean_novelty(self) -> float:
"""Average novelty across all outputs."""
return float(np.mean(self.scores)) if self.scores else 0.0
@property
def max_novelty(self) -> float:
"""Maximum novelty achieved."""
return float(max(self.scores)) if self.scores else 0.0
@property
def jump_ratio(self) -> float:
"""Proportion of transitions that were jumps."""
if len(self.scores) < 2:
return 0.0
return len(self.jump_positions) / (len(self.scores) - 1)
@property
def final_cumulative_novelty(self) -> float:
"""Total accumulated novelty."""
return self.cumulative_novelty[-1] if self.cumulative_novelty else 0.0
class NoveltyMetrics:
"""
Computes novelty metrics for embeddings in a streaming fashion.
Designed for use in an agent loop where outputs are generated one at a time
and we need to assess novelty incrementally.
"""
def __init__(
self,
similarity_threshold: float = 0.7,
jump_detection_enabled: bool = True
):
"""
Args:
similarity_threshold: Threshold for semantic similarity (below = jump)
jump_detection_enabled: Whether to track semantic jumps
"""
self.similarity_threshold = similarity_threshold
self.jump_detection_enabled = jump_detection_enabled
# State
self.embeddings: List[np.ndarray] = []
self.trajectory = NoveltyTrajectory()
self._centroid: Optional[np.ndarray] = None
def reset(self):
"""Reset all state for a new generation session."""
self.embeddings = []
self.trajectory = NoveltyTrajectory()
self._centroid = None
@staticmethod
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
"""Compute cosine similarity between two vectors."""
norm_a = np.linalg.norm(a)
norm_b = np.linalg.norm(b)
if norm_a == 0 or norm_b == 0:
return 0.0
return float(np.dot(a, b) / (norm_a * norm_b))
@staticmethod
def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
"""Compute cosine distance (1 - similarity) between two vectors."""
return 1.0 - NoveltyMetrics.cosine_similarity(a, b)
def compute_centroid(self) -> Optional[np.ndarray]:
"""Compute centroid of all current embeddings."""
if not self.embeddings:
return None
return np.mean(self.embeddings, axis=0)
def compute_novelty(self, embedding: np.ndarray) -> NoveltyScore:
"""
Compute novelty score for a new embedding.
This does NOT add the embedding to the history - call add_embedding() for that.
Args:
embedding: The embedding vector to evaluate
Returns:
NoveltyScore with computed metrics
"""
embedding = np.array(embedding)
# The first output has nothing to compare against: treat it as the
# baseline (score 0.0) rather than maximally novel, so the
# breakthrough check cannot fire on iteration 1.
if not self.embeddings:
return NoveltyScore(
score=0.0,
distance_from_centroid=0.0,
min_distance_to_existing=1.0,
is_jump=False,
jump_magnitude=None
)
# Distance from centroid (primary novelty metric)
centroid = self.compute_centroid()
distance_from_centroid = self.cosine_distance(embedding, centroid)
# Minimum distance to any existing embedding (nearest neighbor)
min_distance = min(
self.cosine_distance(embedding, existing)
for existing in self.embeddings
)
# Jump detection (similarity to previous output)
is_jump = False
jump_magnitude = None
if self.jump_detection_enabled and self.embeddings:
similarity_to_prev = self.cosine_similarity(embedding, self.embeddings[-1])
jump_magnitude = similarity_to_prev
is_jump = similarity_to_prev < self.similarity_threshold
# Primary novelty score is distance from centroid
# Normalized to [0, 1] range where higher = more novel
novelty_score = distance_from_centroid
return NoveltyScore(
score=novelty_score,
distance_from_centroid=distance_from_centroid,
min_distance_to_existing=min_distance,
is_jump=is_jump,
jump_magnitude=jump_magnitude
)
def add_embedding(self, embedding: np.ndarray, novelty: Optional[NoveltyScore] = None):
"""
Add an embedding to the history and update trajectory.
Args:
embedding: The embedding to add
novelty: Pre-computed novelty score (computed if not provided)
"""
embedding = np.array(embedding)
if novelty is None:
novelty = self.compute_novelty(embedding)
# Update state
self.embeddings.append(embedding)
self._centroid = self.compute_centroid()
# Update trajectory
self.trajectory.scores.append(novelty.score)
# Cumulative novelty
prev_cumulative = self.trajectory.cumulative_novelty[-1] if self.trajectory.cumulative_novelty else 0.0
self.trajectory.cumulative_novelty.append(prev_cumulative + novelty.score)
# Track jumps
if novelty.is_jump:
self.trajectory.jump_positions.append(len(self.embeddings) - 1)
# Store centroid history
if self._centroid is not None:
self.trajectory.centroid_history.append(self._centroid.copy())
def get_current_state(self) -> dict:
"""Get current state as a dictionary for logging/debugging."""
return {
"num_embeddings": len(self.embeddings),
"mean_novelty": self.trajectory.mean_novelty,
"max_novelty": self.trajectory.max_novelty,
"jump_ratio": self.trajectory.jump_ratio,
"cumulative_novelty": self.trajectory.final_cumulative_novelty,
"recent_scores": self.trajectory.scores[-5:] if self.trajectory.scores else []
}
def compute_batch_novelty(
embeddings: List[np.ndarray],
reference_embeddings: Optional[List[np.ndarray]] = None
) -> List[float]:
"""
Compute novelty scores for a batch of embeddings.
Useful for post-hoc analysis of generated outputs.
Args:
embeddings: List of embeddings to evaluate
        reference_embeddings: Optional reference set (defaults to the batch itself if not provided)
Returns:
List of novelty scores (distance from centroid)
"""
if not embeddings:
return []
embeddings_arr = np.array(embeddings)
if reference_embeddings is not None:
centroid = np.mean(reference_embeddings, axis=0)
else:
centroid = np.mean(embeddings_arr, axis=0)
scores = []
for emb in embeddings_arr:
distance = NoveltyMetrics.cosine_distance(emb, centroid)
scores.append(distance)
return scores
def find_most_novel(
embeddings: List[np.ndarray],
texts: List[str],
top_k: int = 5
) -> List[tuple]:
"""
Find the most novel outputs from a batch.
Args:
embeddings: List of embeddings
texts: Corresponding text outputs
top_k: Number of top results to return
Returns:
List of (text, novelty_score, index) tuples, sorted by novelty descending
"""
scores = compute_batch_novelty(embeddings)
indexed_results = [
(texts[i], scores[i], i)
for i in range(len(texts))
]
# Sort by novelty score descending
indexed_results.sort(key=lambda x: x[1], reverse=True)
return indexed_results[:top_k]
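
A minimal usage sketch for the novelty loop above — NoveltyMetrics, NoveltyScore, and get_current_state() are as defined in this file, while the import path, constructor defaults, and the embed() helper are illustrative assumptions (any sentence-embedding model would stand in):

import numpy as np

# from experiments.novelty_loop.novelty_metrics import NoveltyMetrics  # import path assumed

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; replace with a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.normal(size=384)
    return vec / np.linalg.norm(vec)

metrics = NoveltyMetrics()  # constructor arguments assumed to have sensible defaults
for idea in ["umbrella hat", "solar-charging umbrella", "umbrella that plants seeds"]:
    vec = embed(idea)
    score = metrics.compute_novelty(vec)  # scoring does not mutate history
    metrics.add_embedding(vec, score)     # committing updates embeddings and trajectory
    print(f"{idea!r}: novelty={score.score:.3f}, jump={score.is_jump}")

print(metrics.get_current_state())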

5
experiments/results/.gitignore vendored Normal file
View File

@@ -0,0 +1,5 @@
# Ignore all experiment result files
*.json
# But keep this .gitignore
!.gitignore

17 binary image files not shown (newly added experiment figures, 54–285 KiB each).

521
experiments/visualize.py Normal file
View File

@@ -0,0 +1,521 @@
"""
Visualization for experiment results.
Generates:
- Box plots of diversity by condition
- 2×2 interaction plots
- Bar charts of survival rates
- t-SNE/UMAP of idea embeddings (optional)
Usage:
python -m experiments.visualize --input results/experiment_xxx_metrics.json
"""
import sys
import json
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional
import numpy as np
# Add experiments to path
sys.path.insert(0, str(Path(__file__).parent.parent))
from experiments.config import RESULTS_DIR
# Try to import visualization libraries
try:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
MATPLOTLIB_AVAILABLE = True
except ImportError:
MATPLOTLIB_AVAILABLE = False
print("Warning: matplotlib not installed. Visualization unavailable.")
print("Install with: pip install matplotlib")
# Condition display names and colors
CONDITION_LABELS = {
"c1_direct": "C1: Direct",
"c2_expert_only": "C2: Expert-Only",
"c3_attribute_only": "C3: Attr-Only",
"c4_full_pipeline": "C4: Full Pipeline",
"c5_random_perspective": "C5: Random"
}
CONDITION_COLORS = {
"c1_direct": "#808080", # Gray (baseline)
"c2_expert_only": "#2196F3", # Blue
"c3_attribute_only": "#FF9800", # Orange
"c4_full_pipeline": "#4CAF50", # Green (main)
"c5_random_perspective": "#9C27B0" # Purple (control)
}
# 2×2 factorial structure
FACTORIAL_2X2 = {
"no_attr_no_expert": "c1_direct",
"no_attr_with_expert": "c2_expert_only",
"with_attr_no_expert": "c3_attribute_only",
"with_attr_with_expert": "c4_full_pipeline"
}
def extract_metric_values(
metrics: Dict[str, Any],
metric_path: str
) -> Dict[str, List[float]]:
"""Extract values for a specific metric across all queries."""
by_condition = {}
for query_metrics in metrics.get("metrics_by_query", []):
for condition, cond_metrics in query_metrics.get("conditions", {}).items():
if condition not in by_condition:
by_condition[condition] = []
value = cond_metrics
for key in metric_path.split("."):
if value is None:
break
if isinstance(value, dict):
value = value.get(key)
else:
value = None
if value is not None and isinstance(value, (int, float)):
by_condition[condition].append(float(value))
return by_condition
def plot_box_comparison(
metrics: Dict[str, Any],
metric_path: str,
title: str,
ylabel: str,
output_path: Path,
figsize: tuple = (10, 6)
):
"""Create box plot comparing conditions."""
if not MATPLOTLIB_AVAILABLE:
return
by_condition = extract_metric_values(metrics, metric_path)
# Order conditions
ordered_conditions = [
"c1_direct", "c2_expert_only", "c3_attribute_only",
"c4_full_pipeline", "c5_random_perspective"
]
conditions = [c for c in ordered_conditions if c in by_condition]
if not conditions:
print(f"No data for {metric_path}")
return
fig, ax = plt.subplots(figsize=figsize)
# Prepare data
data = [by_condition[c] for c in conditions]
labels = [CONDITION_LABELS.get(c, c) for c in conditions]
colors = [CONDITION_COLORS.get(c, "#888888") for c in conditions]
# Create box plot
bp = ax.boxplot(data, labels=labels, patch_artist=True)
# Color boxes
for patch, color in zip(bp['boxes'], colors):
patch.set_facecolor(color)
patch.set_alpha(0.7)
# Add individual points
for i, (cond, values) in enumerate(zip(conditions, data)):
x = np.random.normal(i + 1, 0.04, size=len(values))
ax.scatter(x, values, alpha=0.6, color=colors[i], edgecolor='black', s=50)
ax.set_ylabel(ylabel)
ax.set_title(title)
ax.grid(axis='y', alpha=0.3)
# Rotate labels if needed
plt.xticks(rotation=15, ha='right')
plt.tight_layout()
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def plot_interaction_2x2(
metrics: Dict[str, Any],
metric_path: str,
title: str,
ylabel: str,
output_path: Path,
figsize: tuple = (8, 6)
):
"""Create 2×2 factorial interaction plot."""
if not MATPLOTLIB_AVAILABLE:
return
by_condition = extract_metric_values(metrics, metric_path)
# Check if all 2×2 conditions available
required = ["c1_direct", "c2_expert_only", "c3_attribute_only", "c4_full_pipeline"]
if not all(c in by_condition and by_condition[c] for c in required):
print(f"Insufficient data for 2×2 plot of {metric_path}")
return
fig, ax = plt.subplots(figsize=figsize)
# Calculate means
means = {c: np.mean(by_condition[c]) for c in required}
stds = {c: np.std(by_condition[c], ddof=1) if len(by_condition[c]) > 1 else 0 for c in required}
# X positions: No Experts, With Experts
x = [0, 1]
x_labels = ["Without Experts", "With Experts"]
# Line 1: Without Attributes (C1 -> C2)
y_no_attr = [means["c1_direct"], means["c2_expert_only"]]
err_no_attr = [stds["c1_direct"], stds["c2_expert_only"]]
ax.errorbar(x, y_no_attr, yerr=err_no_attr, marker='o', markersize=10,
linewidth=2, capsize=5, label="Without Attributes",
color="#FF9800", linestyle='--')
# Line 2: With Attributes (C3 -> C4)
y_with_attr = [means["c3_attribute_only"], means["c4_full_pipeline"]]
err_with_attr = [stds["c3_attribute_only"], stds["c4_full_pipeline"]]
ax.errorbar(x, y_with_attr, yerr=err_with_attr, marker='s', markersize=10,
linewidth=2, capsize=5, label="With Attributes",
color="#4CAF50", linestyle='-')
# Annotate points
ax.annotate("C1", (x[0], y_no_attr[0]), textcoords="offset points",
xytext=(-15, -15), fontsize=9)
ax.annotate("C2", (x[1], y_no_attr[1]), textcoords="offset points",
xytext=(5, -15), fontsize=9)
ax.annotate("C3", (x[0], y_with_attr[0]), textcoords="offset points",
xytext=(-15, 10), fontsize=9)
ax.annotate("C4", (x[1], y_with_attr[1]), textcoords="offset points",
xytext=(5, 10), fontsize=9)
ax.set_xticks(x)
ax.set_xticklabels(x_labels)
ax.set_ylabel(ylabel)
ax.set_title(title)
ax.legend(loc='best')
ax.grid(axis='y', alpha=0.3)
# Check for interaction (non-parallel lines)
slope_no_attr = y_no_attr[1] - y_no_attr[0]
slope_with_attr = y_with_attr[1] - y_with_attr[0]
interaction = slope_with_attr - slope_no_attr
interaction_text = f"Interaction: {interaction:+.4f}"
if interaction > 0.01:
interaction_text += " (super-additive)"
elif interaction < -0.01:
interaction_text += " (sub-additive)"
else:
interaction_text += " (additive)"
ax.text(0.02, 0.98, interaction_text, transform=ax.transAxes,
fontsize=10, verticalalignment='top',
bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
plt.tight_layout()
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def plot_survival_rates(
metrics: Dict[str, Any],
output_path: Path,
figsize: tuple = (10, 6)
):
"""Create bar chart of deduplication survival rates."""
if not MATPLOTLIB_AVAILABLE:
return
by_condition = extract_metric_values(metrics, "survival_rate")
ordered_conditions = [
"c1_direct", "c2_expert_only", "c3_attribute_only",
"c4_full_pipeline", "c5_random_perspective"
]
conditions = [c for c in ordered_conditions if c in by_condition]
if not conditions:
print("No survival rate data")
return
fig, ax = plt.subplots(figsize=figsize)
# Calculate means and stds
means = [np.mean(by_condition[c]) * 100 for c in conditions] # Convert to percentage
stds = [np.std(by_condition[c], ddof=1) * 100 if len(by_condition[c]) > 1 else 0 for c in conditions]
labels = [CONDITION_LABELS.get(c, c) for c in conditions]
colors = [CONDITION_COLORS.get(c, "#888888") for c in conditions]
x = np.arange(len(conditions))
bars = ax.bar(x, means, yerr=stds, capsize=5, color=colors, alpha=0.8, edgecolor='black')
# Add value labels on bars
for bar, mean in zip(bars, means):
height = bar.get_height()
ax.annotate(f'{mean:.1f}%',
xy=(bar.get_x() + bar.get_width() / 2, height),
xytext=(0, 3), textcoords="offset points",
ha='center', va='bottom', fontsize=10)
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=15, ha='right')
ax.set_ylabel("Survival Rate (%)")
ax.set_title("Deduplication Survival Rate by Condition\n(Higher = More Diverse Generation)")
ax.set_ylim(0, 110)
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def plot_idea_counts(
metrics: Dict[str, Any],
output_path: Path,
figsize: tuple = (10, 6)
):
"""Create stacked bar chart of raw vs unique idea counts."""
if not MATPLOTLIB_AVAILABLE:
return
raw_counts = extract_metric_values(metrics, "raw_count")
unique_counts = extract_metric_values(metrics, "unique_count")
ordered_conditions = [
"c1_direct", "c2_expert_only", "c3_attribute_only",
"c4_full_pipeline", "c5_random_perspective"
]
conditions = [c for c in ordered_conditions if c in raw_counts and c in unique_counts]
if not conditions:
print("No count data")
return
fig, ax = plt.subplots(figsize=figsize)
# Calculate means
raw_means = [np.mean(raw_counts[c]) for c in conditions]
unique_means = [np.mean(unique_counts[c]) for c in conditions]
removed_means = [r - u for r, u in zip(raw_means, unique_means)]
labels = [CONDITION_LABELS.get(c, c) for c in conditions]
x = np.arange(len(conditions))
width = 0.6
# Stacked bars: unique (bottom) + removed (top)
bars1 = ax.bar(x, unique_means, width, label='Unique Ideas',
color=[CONDITION_COLORS.get(c, "#888888") for c in conditions], alpha=0.9)
bars2 = ax.bar(x, removed_means, width, bottom=unique_means, label='Duplicates Removed',
color='lightgray', alpha=0.7, hatch='//')
# Add value labels
for i, (unique, raw) in enumerate(zip(unique_means, raw_means)):
ax.annotate(f'{unique:.0f}', xy=(x[i], unique / 2),
ha='center', va='center', fontsize=10, fontweight='bold')
ax.annotate(f'({raw:.0f})', xy=(x[i], raw + 1),
ha='center', va='bottom', fontsize=9, color='gray')
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=15, ha='right')
ax.set_ylabel("Number of Ideas")
ax.set_title("Idea Counts by Condition\n(Unique ideas shown, raw total in parentheses)")
ax.legend(loc='upper right')
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def plot_metrics_comparison(
metrics: Dict[str, Any],
output_path: Path,
figsize: tuple = (12, 8)
):
"""Create multi-panel comparison of key metrics."""
if not MATPLOTLIB_AVAILABLE:
return
fig, axes = plt.subplots(2, 2, figsize=figsize)
# Extract metrics
metrics_to_plot = [
("survival_rate", "Survival Rate", axes[0, 0], True),
("post_dedup_diversity.mean_pairwise_distance", "Semantic Diversity", axes[0, 1], False),
("post_dedup_query_distance.mean_distance", "Query Distance (Novelty)", axes[1, 0], False),
("post_dedup_clusters.optimal_clusters", "Number of Clusters", axes[1, 1], False),
]
ordered_conditions = [
"c1_direct", "c2_expert_only", "c3_attribute_only",
"c4_full_pipeline", "c5_random_perspective"
]
for metric_path, title, ax, is_percentage in metrics_to_plot:
by_condition = extract_metric_values(metrics, metric_path)
conditions = [c for c in ordered_conditions if c in by_condition and by_condition[c]]
if not conditions:
ax.text(0.5, 0.5, "No data", ha='center', va='center', transform=ax.transAxes)
ax.set_title(title)
continue
means = [np.mean(by_condition[c]) for c in conditions]
if is_percentage:
means = [m * 100 for m in means]
colors = [CONDITION_COLORS.get(c, "#888888") for c in conditions]
x = np.arange(len(conditions))
bars = ax.bar(x, means, color=colors, alpha=0.8, edgecolor='black')
        # Simplified labels derived from the conditions actually present (avoids mislabeling when some are missing)
        short_labels = [CONDITION_LABELS.get(c, c).split(":")[0] for c in conditions]
ax.set_xticks(x)
ax.set_xticklabels(short_labels)
ax.set_title(title)
ax.grid(axis='y', alpha=0.3)
if is_percentage:
ax.set_ylim(0, 110)
# Add legend
legend_elements = [
mpatches.Patch(facecolor=CONDITION_COLORS[c], label=CONDITION_LABELS[c])
for c in ordered_conditions if c in CONDITION_COLORS
]
fig.legend(handles=legend_elements, loc='lower center', ncol=3, bbox_to_anchor=(0.5, -0.02))
plt.tight_layout()
plt.subplots_adjust(bottom=0.15)
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def generate_all_visualizations(
metrics: Dict[str, Any],
output_dir: Path
):
"""Generate all visualization figures."""
if not MATPLOTLIB_AVAILABLE:
print("matplotlib not available. Cannot generate visualizations.")
return
output_dir.mkdir(parents=True, exist_ok=True)
experiment_id = metrics.get("experiment_id", "experiment")
print(f"\nGenerating visualizations for {experiment_id}...")
# 1. Survival rates bar chart
plot_survival_rates(
metrics,
output_dir / f"{experiment_id}_survival_rates.png"
)
# 2. Idea counts stacked bar
plot_idea_counts(
metrics,
output_dir / f"{experiment_id}_idea_counts.png"
)
# 3. Diversity box plot
plot_box_comparison(
metrics,
"post_dedup_diversity.mean_pairwise_distance",
"Semantic Diversity by Condition (Post-Dedup)",
"Mean Pairwise Distance",
output_dir / f"{experiment_id}_diversity_boxplot.png"
)
# 4. Query distance box plot
plot_box_comparison(
metrics,
"post_dedup_query_distance.mean_distance",
"Query Distance by Condition (Novelty)",
"Distance from Original Query",
output_dir / f"{experiment_id}_query_distance_boxplot.png"
)
# 5. 2×2 interaction plot for diversity
plot_interaction_2x2(
metrics,
"post_dedup_diversity.mean_pairwise_distance",
"2×2 Factorial: Semantic Diversity",
"Mean Pairwise Distance",
output_dir / f"{experiment_id}_interaction_diversity.png"
)
# 6. 2×2 interaction plot for query distance
plot_interaction_2x2(
metrics,
"post_dedup_query_distance.mean_distance",
"2×2 Factorial: Query Distance (Novelty)",
"Distance from Original Query",
output_dir / f"{experiment_id}_interaction_novelty.png"
)
# 7. Multi-panel comparison
plot_metrics_comparison(
metrics,
output_dir / f"{experiment_id}_metrics_comparison.png"
)
print(f"\nAll visualizations saved to: {output_dir}")
def main():
parser = argparse.ArgumentParser(
description="Generate visualizations for experiment results"
)
parser.add_argument(
"--input",
type=str,
required=True,
help="Input metrics JSON file"
)
parser.add_argument(
"--output-dir",
type=str,
help="Output directory for figures (default: results/figures/)"
)
args = parser.parse_args()
input_path = Path(args.input)
if not input_path.exists():
input_path = RESULTS_DIR / args.input
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
sys.exit(1)
# Load metrics
with open(input_path, "r", encoding="utf-8") as f:
metrics = json.load(f)
# Output directory
if args.output_dir:
output_dir = Path(args.output_dir)
else:
output_dir = RESULTS_DIR / "figures"
generate_all_visualizations(metrics, output_dir)
if __name__ == "__main__":
main()
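
For reference, a synthetic metrics dict in the shape extract_metric_values() walks (metrics_by_query → conditions → dotted metric paths); the numbers are placeholders, not real results, and the import path is assumed. With all four 2×2 conditions present, generate_all_visualizations() emits every figure, including the interaction plots:

from pathlib import Path

# from experiments.visualize import generate_all_visualizations  # import path assumed

def fake_condition(div, dist, k, unique, raw=30):
    """Synthetic per-condition metrics matching the dotted paths used above."""
    return {
        "raw_count": raw,
        "unique_count": unique,
        "survival_rate": unique / raw,
        "post_dedup_diversity": {"mean_pairwise_distance": div},
        "post_dedup_query_distance": {"mean_distance": dist},
        "post_dedup_clusters": {"optimal_clusters": k},
    }

metrics = {
    "experiment_id": "pilot_demo",
    "metrics_by_query": [{
        "conditions": {
            "c1_direct": fake_condition(0.41, 0.33, 4, 18),
            "c2_expert_only": fake_condition(0.48, 0.40, 5, 22),
            "c3_attribute_only": fake_condition(0.50, 0.42, 6, 23),
            "c4_full_pipeline": fake_condition(0.57, 0.49, 8, 27),
        }
    }],
}

generate_all_visualizations(metrics, Path("results/figures"))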

frontend/package-lock.json
View File

@@ -155,7 +155,6 @@
      "integrity": "sha512-e7jT4DxYvIDLk1ZHmU/m/mB19rex9sv0c2ftBtjSBv+kVM/902eh0fINUzD7UwLLNR+jU585GxUJ8/EBfAM5fw==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "@babel/code-frame": "^7.27.1",
        "@babel/generator": "^7.28.5",
@@ -2446,7 +2445,6 @@
      "integrity": "sha512-GNWcUTRBgIRJD5zj+Tq0fKOJ5XZajIiBroOF0yvj2bSU1WvNdYS/dn9UxwsujGW4JX06dnHyjV2y9rRaybH0iQ==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "undici-types": "~7.16.0"
      }
@@ -2457,7 +2455,6 @@
      "integrity": "sha512-MWtvHrGZLFttgeEj28VXHxpmwYbor/ATPYbBfSFZEIRK0ecCFLl2Qo55z52Hss+UV9CRN7trSeq1zbgx7YDWWg==",
      "devOptional": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "csstype": "^3.2.2"
      }
@@ -2518,7 +2515,6 @@
      "integrity": "sha512-jCzKdm/QK0Kg4V4IK/oMlRZlY+QOcdjv89U2NgKHZk1CYTj82/RVSx1mV/0gqCVMJ/DA+Zf/S4NBWNF8GQ+eqQ==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "@typescript-eslint/scope-manager": "8.48.0",
        "@typescript-eslint/types": "8.48.0",
@@ -2802,7 +2798,6 @@
      "integrity": "sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "bin": {
        "acorn": "bin/acorn"
      },
@@ -2971,7 +2966,6 @@
        }
      ],
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "baseline-browser-mapping": "^2.8.25",
        "caniuse-lite": "^1.0.30001754",
@@ -3442,7 +3436,6 @@
      "resolved": "https://registry.npmjs.org/d3-selection/-/d3-selection-3.0.0.tgz",
      "integrity": "sha512-fmTRWbNMmsmWq6xJV8D19U/gw/bwrHfNXxrIN+HfZgnzqTHp9jOmKMhsTUjXOJnZOdZY9Q28y4yebKzqDKlxlQ==",
      "license": "ISC",
-     "peer": true,
      "engines": {
        "node": ">=12"
      }
@@ -3531,8 +3524,7 @@
      "version": "1.11.19",
      "resolved": "https://registry.npmjs.org/dayjs/-/dayjs-1.11.19.tgz",
      "integrity": "sha512-t5EcLVS6QPBNqM2z8fakk/NKel+Xzshgt8FFKAn+qwlD1pzZWxh0nVCrvFK7ZDb6XucZeF9z8C7CBWTRIVApAw==",
-     "license": "MIT",
-     "peer": true
+     "license": "MIT"
    },
    "node_modules/debug": {
      "version": "4.4.3",
@@ -3646,7 +3638,6 @@
      "integrity": "sha512-BhHmn2yNOFA9H9JmmIVKJmd288g9hrVRDkdoIgRCRuSySRUHH7r/DI6aAXW9T1WwUuY3DFgrcaqB+deURBLR5g==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "@eslint-community/eslint-utils": "^4.8.0",
        "@eslint-community/regexpp": "^4.12.1",
@@ -4376,7 +4367,6 @@
      "integrity": "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "engines": {
        "node": ">=12"
      },
@@ -4503,7 +4493,6 @@
      "resolved": "https://registry.npmjs.org/react/-/react-19.2.0.tgz",
      "integrity": "sha512-tmbWg6W31tQLeB5cdIBOicJDJRR2KzXsV7uSK9iNfLWQ5bIZfxuPEHp7M8wiHyHnn0DD1i7w3Zmin0FtkrwoCQ==",
      "license": "MIT",
-     "peer": true,
      "engines": {
        "node": ">=0.10.0"
      }
@@ -4513,7 +4502,6 @@
      "resolved": "https://registry.npmjs.org/react-dom/-/react-dom-19.2.0.tgz",
      "integrity": "sha512-UlbRu4cAiGaIewkPyiRGJk0imDN2T3JjieT6spoL2UeSf5od4n5LB/mQ4ejmxhCFT1tYe8IvaFulzynWovsEFQ==",
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "scheduler": "^0.27.0"
      },
@@ -4767,7 +4755,6 @@
      "integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==",
      "dev": true,
      "license": "Apache-2.0",
-     "peer": true,
      "bin": {
        "tsc": "bin/tsc",
        "tsserver": "bin/tsserver"
@@ -4863,7 +4850,6 @@
      "integrity": "sha512-tI2l/nFHC5rLh7+5+o7QjKjSR04ivXDF4jcgV0f/bTQ+OJiITy5S6gaynVsEM+7RqzufMnVbIon6Sr5x1SDYaQ==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "esbuild": "^0.25.0",
        "fdir": "^6.5.0",
@@ -4985,7 +4971,6 @@
      "integrity": "sha512-AvvthqfqrAhNH9dnfmrfKzX5upOdjUVJYFqNSlkmGf64gRaTzlPwz99IHYnVs28qYAybvAlBV+H7pn0saFY4Ig==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "funding": {
        "url": "https://github.com/sponsors/colinhacks"
      }

frontend/src/App.tsx
View File

@@ -1,17 +1,24 @@
-import { useState, useRef, useCallback, useEffect } from 'react';
+import { useState, useRef, useCallback, useEffect, useMemo } from 'react';
-import { ConfigProvider, Layout, theme, Typography, Space, Tabs, Slider, Radio } from 'antd';
+import { ConfigProvider, Layout, theme, Typography, Space, Tabs, Slider, Radio, Switch, Segmented } from 'antd';
-import { ApartmentOutlined, ThunderboltOutlined, FilterOutlined } from '@ant-design/icons';
+import { ApartmentOutlined, ThunderboltOutlined, FilterOutlined, SwapOutlined, FileSearchOutlined, GlobalOutlined } from '@ant-design/icons';
import { ThemeToggle } from './components/ThemeToggle';
import { InputPanel } from './components/InputPanel';
import { TransformationInputPanel } from './components/TransformationInputPanel';
import { MindmapPanel } from './components/MindmapPanel';
import { TransformationPanel } from './components/TransformationPanel';
import { DeduplicationPanel } from './components/DeduplicationPanel';
import { PatentSearchPanel } from './components/PatentSearchPanel';
import { DualPathInputPanel } from './components/DualPathInputPanel';
import { DualPathMindmapPanel } from './components/DualPathMindmapPanel';
import { CrossoverPanel } from './components/CrossoverPanel';
import { useAttribute } from './hooks/useAttribute';
import { useDualPathAttribute } from './hooks/useDualPathAttribute';
import { getModels } from './services/api';
import { crossoverPairsToDAGs, type CrossoverDAGResult } from './utils/crossoverToDAG';
import { DualTransformationPanel } from './components/DualTransformationPanel';
import type { MindmapDAGRef } from './components/MindmapDAG';
import type { TransformationDAGRef } from './components/TransformationDAG';
-import type { CategoryMode, ExpertSource, ExpertTransformationDAGResult, DeduplicationMethod } from './types';
+import type { CategoryMode, ExpertSource, ExpertTransformationDAGResult, DeduplicationMethod, ExpertMode, CrossoverPair, PromptLanguage } from './types';
const { Header, Sider, Content } = Layout;
const { Title } = Typography;
@@ -24,7 +31,15 @@ interface VisualSettings {
function App() {
  const [isDark, setIsDark] = useState(true);
  const [activeTab, setActiveTab] = useState<string>('attribute');
const [dualPathMode, setDualPathMode] = useState(false);
const [promptLanguage, setPromptLanguage] = useState<PromptLanguage>('zh');
// Single path hook
  const { loading, progress, error, currentResult, history, analyze, loadFromHistory } = useAttribute();
// Dual path hook
const dualPath = useDualPathAttribute();
  const [visualSettings, setVisualSettings] = useState<VisualSettings>({
    nodeSpacing: 32,
    fontSize: 14,
@@ -32,6 +47,21 @@ function App() {
  const mindmapRef = useRef<MindmapDAGRef>(null);
  const transformationRef = useRef<TransformationDAGRef>(null);
// Dual path expert mode
const [expertMode, setExpertMode] = useState<ExpertMode>('shared');
const [selectedCrossoverPairs, setSelectedCrossoverPairs] = useState<CrossoverPair[]>([]);
// Convert selected crossover pairs to two separate DAGs for dual transformation
const crossoverDAGs = useMemo((): CrossoverDAGResult | null => {
if (selectedCrossoverPairs.length === 0) return null;
if (!dualPath.pathA.result || !dualPath.pathB.result) return null;
return crossoverPairsToDAGs(
selectedCrossoverPairs,
dualPath.pathA.result,
dualPath.pathB.result
);
}, [selectedCrossoverPairs, dualPath.pathA.result, dualPath.pathB.result]);
  // Transformation Agent settings
  const [transformModel, setTransformModel] = useState<string>('');
  const [transformTemperature, setTransformTemperature] = useState<number>(0.95);
@@ -83,9 +113,10 @@ function App() {
    chainCount?: number,
    categoryMode?: CategoryMode,
    customCategories?: string[],
-   suggestedCategoryCount?: number
+   suggestedCategoryCount?: number,
+   lang?: PromptLanguage
  ) => {
-   await analyze(query, model, temperature, chainCount, categoryMode, customCategories, suggestedCategoryCount);
+   await analyze(query, model, temperature, chainCount, categoryMode, customCategories, suggestedCategoryCount, lang || promptLanguage);
  };
const handleResetView = useCallback(() => { const handleResetView = useCallback(() => {
@@ -96,6 +127,30 @@ function App() {
    setShouldStartTransform(true);
  }, []);
// Dual path analysis handler
const handleDualPathAnalyze = useCallback(async (
queryA: string,
queryB: string,
options?: {
model?: string;
temperature?: number;
chainCount?: number;
categoryMode?: CategoryMode;
customCategories?: string[];
suggestedCategoryCount?: number;
lang?: PromptLanguage;
}
) => {
await dualPath.analyzeParallel(queryA, queryB, { ...options, lang: options?.lang || promptLanguage });
}, [dualPath, promptLanguage]);
// Handle mode switch
const handleModeSwitch = useCallback((checked: boolean) => {
setDualPathMode(checked);
// Reset to attribute tab when switching modes
setActiveTab('attribute');
}, []);
  return (
    <ConfigProvider
      theme={{
@@ -140,7 +195,31 @@ function App() {
            Novelty Seeking
          </Title>
        </Space>
-       <ThemeToggle isDark={isDark} onToggle={setIsDark} />
+       <Space align="center" size="middle">
<Space size="small">
<Typography.Text type="secondary">Single</Typography.Text>
<Switch
checked={dualPathMode}
onChange={handleModeSwitch}
checkedChildren={<SwapOutlined />}
unCheckedChildren={<ApartmentOutlined />}
/>
<Typography.Text type="secondary">Dual</Typography.Text>
</Space>
<Space size="small">
<GlobalOutlined style={{ color: isDark ? '#177ddc' : '#1890ff' }} />
<Segmented
size="small"
value={promptLanguage}
onChange={(value) => setPromptLanguage(value as PromptLanguage)}
options={[
{ label: '中文', value: 'zh' },
{ label: 'EN', value: 'en' },
]}
/>
</Space>
<ThemeToggle isDark={isDark} onToggle={setIsDark} />
</Space>
      </Header>
      <Layout>
        <Content
@@ -155,7 +234,98 @@ function App() {
            onChange={setActiveTab}
            style={{ height: '100%' }}
            tabBarStyle={{ marginBottom: 8 }}
-           items={[
+           items={dualPathMode ? [
// ===== Dual Path Mode Tabs =====
{
key: 'attribute',
label: (
<span>
<SwapOutlined style={{ marginRight: 8 }} />
Dual Path Attribute
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)' }}>
<DualPathMindmapPanel
pathA={dualPath.pathA}
pathB={dualPath.pathB}
isDark={isDark}
visualSettings={visualSettings}
/>
</div>
),
},
{
key: 'crossover',
label: (
<span>
<SwapOutlined style={{ marginRight: 8 }} />
Crossover
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)', padding: 16 }}>
<CrossoverPanel
pathAResult={dualPath.pathA.result}
pathBResult={dualPath.pathB.result}
isDark={isDark}
expertMode={expertMode}
onExpertModeChange={setExpertMode}
onCrossoverReady={setSelectedCrossoverPairs}
/>
</div>
),
},
{
key: 'transformation',
label: (
<span>
<ThunderboltOutlined style={{ marginRight: 8 }} />
Transformation Agent
{crossoverDAGs && (
<span style={{ marginLeft: 4, fontSize: 10, opacity: 0.7 }}>
(A:{crossoverDAGs.pathA.nodes.length} / B:{crossoverDAGs.pathB.nodes.length})
</span>
)}
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)' }}>
<DualTransformationPanel
crossoverDAGA={crossoverDAGs?.pathA ?? null}
crossoverDAGB={crossoverDAGs?.pathB ?? null}
isDark={isDark}
model={transformModel}
temperature={transformTemperature}
expertConfig={expertConfig}
expertSource={expertSource}
expertLanguage={expertLanguage}
lang={promptLanguage}
shouldStartTransform={shouldStartTransform}
onTransformComplete={() => setShouldStartTransform(false)}
onLoadingChange={setTransformLoading}
/>
</div>
),
},
{
key: 'patent',
label: (
<span>
<FileSearchOutlined style={{ marginRight: 8 }} />
Patent Search
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)' }}>
<PatentSearchPanel
isDark={isDark}
/>
</div>
),
},
] : [
// ===== Single Path Mode Tabs =====
              {
                key: 'attribute',
                label: (
@@ -196,6 +366,7 @@ function App() {
                      expertConfig={expertConfig}
                      expertSource={expertSource}
                      expertLanguage={expertLanguage}
lang={promptLanguage}
                      shouldStartTransform={shouldStartTransform}
                      onTransformComplete={() => setShouldStartTransform(false)}
                      onLoadingChange={setTransformLoading}
@@ -221,6 +392,24 @@ function App() {
                      onThresholdChange={setDeduplicationThreshold}
                      method={deduplicationMethod}
                      onMethodChange={setDeduplicationMethod}
lang={promptLanguage}
/>
</div>
),
},
{
key: 'patent',
label: (
<span>
<FileSearchOutlined style={{ marginRight: 8 }} />
Patent Search
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)' }}>
<PatentSearchPanel
descriptions={transformationResult?.results.flatMap(r => r.descriptions)}
isDark={isDark}
                    />
                  </div>
                ),
@@ -236,24 +425,54 @@ function App() {
              overflow: 'auto',
            }}
          >
-           {activeTab === 'attribute' && (
+           {activeTab === 'attribute' && !dualPathMode && (
              <InputPanel
                loading={loading}
                progress={progress}
                history={history}
                currentResult={currentResult}
                onAnalyze={handleAnalyze}
-               onLoadHistory={loadFromHistory}
+               onLoadHistory={(item, lang) => loadFromHistory(item, lang || promptLanguage)}
                onResetView={handleResetView}
                visualSettings={visualSettings}
                onVisualSettingsChange={setVisualSettings}
+               lang={promptLanguage}
              />
            )}
{activeTab === 'attribute' && dualPathMode && (
<DualPathInputPanel
onAnalyze={handleDualPathAnalyze}
loadingA={dualPath.pathA.loading}
loadingB={dualPath.pathB.loading}
progressA={dualPath.pathA.progress}
progressB={dualPath.pathB.progress}
availableModels={availableModels}
lang={promptLanguage}
/>
)}
{activeTab === 'crossover' && dualPathMode && (
<div style={{ padding: 16 }}>
<Typography.Title level={5} style={{ marginBottom: 16 }}>
<SwapOutlined style={{ marginRight: 8 }} />
Crossover Settings
</Typography.Title>
<Typography.Text type="secondary">
Select attribute pairs in the main panel to create crossover combinations.
{selectedCrossoverPairs.length > 0 && (
<div style={{ marginTop: 8 }}>
<Typography.Text strong>
{selectedCrossoverPairs.length} pairs selected
</Typography.Text>
</div>
)}
</Typography.Text>
</div>
)}
            {activeTab === 'transformation' && (
              <TransformationInputPanel
                onTransform={handleTransform}
                loading={transformLoading}
-               hasData={!!currentResult}
+               hasData={dualPathMode ? !!crossoverDAGs : !!currentResult}
                isDark={isDark}
                model={transformModel}
                temperature={transformTemperature}
@@ -270,6 +489,37 @@ function App() {
                availableModels={availableModels}
              />
            )}
{activeTab === 'patent' && (
<div style={{ padding: 16 }}>
<Typography.Title level={5} style={{ marginBottom: 16 }}>
<FileSearchOutlined style={{ marginRight: 8 }} />
Patent Search Info
</Typography.Title>
<Typography.Paragraph type="secondary" style={{ fontSize: 12 }}>
Search patents using the Lens.org API to find prior art and similar inventions.
</Typography.Paragraph>
<Typography.Title level={5} style={{ marginTop: 24, marginBottom: 12 }}>
How to Use
</Typography.Title>
<Typography.Paragraph style={{ fontSize: 12 }}>
<ol style={{ paddingLeft: 16, margin: 0 }}>
<li style={{ marginBottom: 8 }}>Click a generated description on the left to load it into the search box</li>
<li style={{ marginBottom: 8 }}>Edit the description to refine your search query</li>
<li style={{ marginBottom: 8 }}>Click "Search Patents" to find similar patents</li>
<li style={{ marginBottom: 8 }}>Results appear on the right - click to view on Lens.org</li>
</ol>
</Typography.Paragraph>
<Typography.Title level={5} style={{ marginTop: 24, marginBottom: 12 }}>
Result Interpretation
</Typography.Title>
<Typography.Paragraph type="secondary" style={{ fontSize: 12 }}>
<strong>Many results:</strong> Query may overlap with existing prior art - consider making it more specific.
</Typography.Paragraph>
<Typography.Paragraph type="secondary" style={{ fontSize: 12 }}>
<strong>Few/no results:</strong> Potentially novel concept - good candidate for further exploration.
</Typography.Paragraph>
</div>
)}
            {activeTab === 'deduplication' && (
              <div style={{ padding: 16 }}>
                <Typography.Title level={5} style={{ marginBottom: 16 }}>

298
frontend/src/components/CrossoverPanel.tsx Normal file
View File

@@ -0,0 +1,298 @@
import { useEffect, useState } from 'react';
import {
Empty,
Card,
Button,
Statistic,
Row,
Col,
Typography,
Space,
Badge,
Collapse,
Checkbox,
Radio,
} from 'antd';
import {
SwapOutlined,
CheckCircleOutlined,
ReloadOutlined,
UnorderedListOutlined,
TableOutlined,
} from '@ant-design/icons';
import type { AttributeDAG, CrossoverPair, ExpertMode } from '../types';
import { useAttributeCrossover } from '../hooks/useAttributeCrossover';
import { CrossoverCard } from './crossover/CrossoverCard';
import { CrossoverMatrix } from './crossover/CrossoverMatrix';
import { CrossoverPreview } from './crossover/CrossoverPreview';
const { Text } = Typography;
interface CrossoverPanelProps {
pathAResult: AttributeDAG | null;
pathBResult: AttributeDAG | null;
isDark: boolean;
expertMode: ExpertMode;
onExpertModeChange: (mode: ExpertMode) => void;
onCrossoverReady?: (selectedPairs: CrossoverPair[]) => void;
}
type ViewMode = 'list' | 'matrix';
export function CrossoverPanel({
pathAResult,
pathBResult,
isDark,
expertMode,
onExpertModeChange,
onCrossoverReady,
}: CrossoverPanelProps) {
const [viewMode, setViewMode] = useState<ViewMode>('list');
const {
pairs,
selectedPairs,
pairsByType,
crossTypeStats,
applyPairs,
togglePairSelection,
selectPairsByType,
selectAll,
clearPairs,
} = useAttributeCrossover();
// Generate pairs when both results are available
useEffect(() => {
if (pathAResult && pathBResult) {
applyPairs(pathAResult, pathBResult);
} else {
clearPairs();
}
}, [pathAResult, pathBResult, applyPairs, clearPairs]);
// Notify parent when selection changes
useEffect(() => {
onCrossoverReady?.(selectedPairs);
}, [selectedPairs, onCrossoverReady]);
// Render when no data
if (!pathAResult || !pathBResult) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
height: '100%',
}}>
<Empty
description={
<Space direction="vertical" align="center">
<Text>Complete both Path A and Path B analysis first</Text>
<Text type="secondary">
{!pathAResult && !pathBResult
? 'Neither path has been analyzed'
: !pathAResult
? 'Path A has not been analyzed'
: 'Path B has not been analyzed'}
</Text>
</Space>
}
/>
</div>
);
}
// Generate cross type labels dynamically
const getCrossTypeLabel = (crossType: string): string => {
if (crossType.startsWith('same-')) {
const category = crossType.replace('same-', '');
return `Same Category: ${category}`;
}
if (crossType.startsWith('cross-')) {
const parts = crossType.replace('cross-', '').split('-');
if (parts.length >= 2) {
return `Cross: ${parts[0]} × ${parts.slice(1).join('-')}`;
}
}
return crossType;
};
const renderListView = () => {
const crossTypes = Object.keys(pairsByType);
if (crossTypes.length === 0) {
return <Empty description="No crossover pairs generated" />;
}
const collapseItems = crossTypes.map(type => {
const typePairs = pairsByType[type];
const stats = crossTypeStats[type];
const label = getCrossTypeLabel(type);
return {
key: type,
label: (
<div style={{ display: 'flex', alignItems: 'center', gap: 8 }}>
<Checkbox
checked={stats.selected === stats.total}
indeterminate={stats.selected > 0 && stats.selected < stats.total}
onClick={(e) => e.stopPropagation()}
onChange={(e) => selectPairsByType(type, e.target.checked)}
/>
<Text>{label}</Text>
<Badge
count={`${stats.selected}/${stats.total}`}
style={{
backgroundColor: stats.selected > 0 ? '#52c41a' : '#d9d9d9',
}}
/>
</div>
),
children: (
<div style={{
display: 'grid',
gridTemplateColumns: 'repeat(auto-fill, minmax(280px, 1fr))',
gap: 8,
}}>
{typePairs.map(pair => (
<CrossoverCard
key={pair.id}
pair={pair}
onToggle={togglePairSelection}
isDark={isDark}
/>
))}
</div>
),
};
});
return (
<Collapse
items={collapseItems}
defaultActiveKey={crossTypes.filter(t => t.startsWith('same-'))}
/>
);
};
const renderMatrixView = () => {
return (
<CrossoverMatrix
dagA={pathAResult}
dagB={pathBResult}
pairs={pairs}
onTogglePair={togglePairSelection}
isDark={isDark}
/>
);
};
return (
<div style={{ height: '100%', display: 'flex', flexDirection: 'column' }}>
{/* Statistics Header */}
<Card size="small" style={{ marginBottom: 16 }}>
<Row gutter={16}>
<Col span={6}>
<Statistic
title="Total Pairs"
value={pairs.length}
prefix={<SwapOutlined />}
/>
</Col>
<Col span={6}>
<Statistic
title="Selected"
value={selectedPairs.length}
prefix={<CheckCircleOutlined />}
valueStyle={{ color: '#52c41a' }}
/>
</Col>
<Col span={6}>
<Statistic
title="Path A Attrs"
value={pathAResult.nodes.length}
/>
</Col>
<Col span={6}>
<Statistic
title="Path B Attrs"
value={pathBResult.nodes.length}
/>
</Col>
</Row>
</Card>
{/* Selection Preview */}
<CrossoverPreview
selectedPairs={selectedPairs}
dagA={pathAResult}
dagB={pathBResult}
isDark={isDark}
/>
{/* Expert Mode Selection */}
<Card size="small" style={{ marginBottom: 16 }}>
<Space direction="vertical" style={{ width: '100%' }}>
<Text strong>Expert Team Mode</Text>
<Radio.Group
value={expertMode}
onChange={(e) => onExpertModeChange(e.target.value)}
buttonStyle="solid"
>
<Radio.Button value="shared">
Shared Experts
</Radio.Button>
<Radio.Button value="independent">
Independent Experts
</Radio.Button>
</Radio.Group>
<Text type="secondary" style={{ fontSize: 12 }}>
{expertMode === 'shared'
? 'Both paths use the same expert team for crossover transformation'
: 'Each path uses its own expert team, combined for crossover'}
</Text>
</Space>
</Card>
{/* Actions */}
<div style={{ marginBottom: 16, display: 'flex', gap: 8 }}>
<Button
icon={<CheckCircleOutlined />}
onClick={() => selectAll(true)}
>
Select All
</Button>
<Button
onClick={() => selectAll(false)}
>
Deselect All
</Button>
<Button
icon={<ReloadOutlined />}
onClick={() => applyPairs(pathAResult, pathBResult)}
>
Regenerate
</Button>
<div style={{ flex: 1 }} />
<Radio.Group
value={viewMode}
onChange={(e) => setViewMode(e.target.value)}
buttonStyle="solid"
size="small"
>
<Radio.Button value="list">
<UnorderedListOutlined /> List
</Radio.Button>
<Radio.Button value="matrix">
<TableOutlined /> Matrix
</Radio.Button>
</Radio.Group>
</div>
{/* Content */}
<div style={{ flex: 1, overflow: 'auto' }}>
{viewMode === 'list' ? renderListView() : renderMatrixView()}
</div>
</div>
);
}

frontend/src/components/DeduplicationPanel.tsx
View File

@@ -26,6 +26,7 @@ import type {
  ExpertTransformationDAGResult,
  ExpertTransformationDescription,
  DeduplicationMethod,
PromptLanguage,
} from '../types';
const { Title, Text } = Typography;
@@ -37,6 +38,7 @@ interface DeduplicationPanelProps {
  onThresholdChange: (value: number) => void;
  method: DeduplicationMethod;
  onMethodChange?: (method: DeduplicationMethod) => void; // Optional, handled in App.tsx sidebar
lang?: PromptLanguage;
}
/**
@@ -48,6 +50,7 @@ export const DeduplicationPanel: React.FC<DeduplicationPanelProps> = ({
  threshold,
  onThresholdChange,
  method,
lang = 'zh',
  // onMethodChange is handled in App.tsx sidebar
}) => {
  const { loading, result, error, progress, deduplicate, clearResult } = useDeduplication();
@@ -70,7 +73,7 @@ export const DeduplicationPanel: React.FC<DeduplicationPanelProps> = ({
  const handleDeduplicate = () => {
    if (allDescriptions.length > 0) {
-     deduplicate(allDescriptions, threshold, method);
+     deduplicate(allDescriptions, threshold, method, lang);
    }
  };

312
frontend/src/components/DualPathInputPanel.tsx Normal file
View File

@@ -0,0 +1,312 @@
import { useState, useEffect } from 'react';
import {
Input,
Button,
Select,
Typography,
Space,
message,
Slider,
Collapse,
Progress,
Card,
Alert,
Tag,
Divider,
} from 'antd';
import {
SearchOutlined,
LoadingOutlined,
SwapOutlined,
} from '@ant-design/icons';
import type { CategoryMode, DAGProgress, PromptLanguage } from '../types';
import { getModels } from '../services/api';
import { CategorySelector } from './CategorySelector';
const { TextArea } = Input;
const { Text } = Typography;
interface DualPathInputPanelProps {
onAnalyze: (queryA: string, queryB: string, options?: {
model?: string;
temperature?: number;
chainCount?: number;
categoryMode?: CategoryMode;
customCategories?: string[];
suggestedCategoryCount?: number;
lang?: PromptLanguage;
}) => Promise<void>;
loadingA: boolean;
loadingB: boolean;
progressA: DAGProgress;
progressB: DAGProgress;
availableModels?: string[];
lang?: PromptLanguage;
}
export function DualPathInputPanel({
onAnalyze,
loadingA,
loadingB,
progressA,
progressB,
availableModels: propModels,
lang = 'zh',
}: DualPathInputPanelProps) {
const [queryA, setQueryA] = useState('');
const [queryB, setQueryB] = useState('');
const [models, setModels] = useState<string[]>(propModels || []);
const [selectedModel, setSelectedModel] = useState<string | undefined>();
const [loadingModels, setLoadingModels] = useState(false);
const [temperature, setTemperature] = useState(0.7);
const [chainCount, setChainCount] = useState(5);
// Category settings
const [categoryMode, setCategoryMode] = useState<CategoryMode>('dynamic_auto' as CategoryMode);
const [customCategories, setCustomCategories] = useState<string[]>([]);
const [suggestedCategoryCount, setSuggestedCategoryCount] = useState(3);
const isLoading = loadingA || loadingB;
useEffect(() => {
if (propModels && propModels.length > 0) {
setModels(propModels);
if (!selectedModel) {
const defaultModel = propModels.find((m) => m.includes('qwen3')) || propModels[0];
setSelectedModel(defaultModel);
}
return;
}
async function fetchModels() {
setLoadingModels(true);
try {
const response = await getModels();
setModels(response.models);
if (response.models.length > 0 && !selectedModel) {
const defaultModel = response.models.find((m) => m.includes('qwen3')) || response.models[0];
setSelectedModel(defaultModel);
}
} catch {
message.error('Failed to fetch models');
} finally {
setLoadingModels(false);
}
}
fetchModels();
}, [propModels]);
const handleAnalyze = async () => {
if (!queryA.trim() || !queryB.trim()) {
message.warning(lang === 'zh' ? '請輸入兩個路徑的查詢內容' : 'Please enter queries for both paths');
return;
}
try {
await onAnalyze(queryA.trim(), queryB.trim(), {
model: selectedModel,
temperature,
chainCount,
categoryMode,
customCategories: customCategories.length > 0 ? customCategories : undefined,
suggestedCategoryCount,
lang,
});
} catch {
message.error(lang === 'zh' ? '分析失敗' : 'Analysis failed');
}
};
const handleSwapQueries = () => {
const temp = queryA;
setQueryA(queryB);
setQueryB(temp);
};
const renderProgressIndicator = (label: string, progress: DAGProgress, loading: boolean) => {
if (progress.step === 'idle' && !loading) return null;
if (progress.step === 'done') return null;
const percent = progress.step === 'step0'
? 15
: progress.step === 'step1'
? 50
: progress.step === 'relationships'
? 85
: 100;
return (
<div style={{ marginTop: 8 }}>
<Text type="secondary" style={{ fontSize: 12 }}>{label}: {progress.message}</Text>
<Progress
percent={Math.round(percent)}
size="small"
status={progress.step === 'error' ? 'exception' : 'active'}
strokeColor={{ from: '#108ee9', to: '#87d068' }}
/>
</div>
);
};
const collapseItems = [
{
key: 'categories',
label: 'Category Settings',
children: (
<CategorySelector
mode={categoryMode}
onModeChange={setCategoryMode}
customCategories={customCategories}
onCustomCategoriesChange={setCustomCategories}
suggestedCount={suggestedCategoryCount}
onSuggestedCountChange={setSuggestedCategoryCount}
disabled={isLoading}
/>
),
},
{
key: 'llm',
label: 'LLM Parameters',
children: (
<Space direction="vertical" style={{ width: '100%' }} size="middle">
<div>
<Text type="secondary" style={{ fontSize: 12 }}>Temperature: {temperature}</Text>
<Slider
min={0}
max={1}
step={0.1}
value={temperature}
onChange={setTemperature}
marks={{ 0: '0', 0.5: '0.5', 1: '1' }}
disabled={isLoading}
/>
</div>
<div>
<Text type="secondary" style={{ fontSize: 12 }}>Chain Count: {chainCount}</Text>
<Slider
min={1}
max={10}
step={1}
value={chainCount}
onChange={setChainCount}
marks={{ 1: '1', 5: '5', 10: '10' }}
disabled={isLoading}
/>
</div>
</Space>
),
},
];
return (
<div style={{
display: 'flex',
flexDirection: 'column',
height: '100%',
padding: 16,
gap: 16,
}}>
{/* Dual Path Input Card */}
<Card
size="small"
title={<Text strong>Dual Path Analysis</Text>}
styles={{ body: { padding: 12 } }}
>
<Space direction="vertical" style={{ width: '100%' }} size="middle">
{/* Model Selection */}
<Select
style={{ width: '100%' }}
value={selectedModel}
onChange={setSelectedModel}
loading={loadingModels}
placeholder="Select a model"
options={models.map((m) => ({ label: m, value: m }))}
size="middle"
disabled={isLoading}
/>
{/* Path A Input */}
<div>
<Tag color="blue" style={{ marginBottom: 4 }}>Path A</Tag>
<TextArea
value={queryA}
onChange={(e) => setQueryA(e.target.value)}
placeholder="Enter first object (e.g., umbrella)"
autoSize={{ minRows: 1, maxRows: 2 }}
disabled={isLoading}
/>
{renderProgressIndicator('Path A', progressA, loadingA)}
</div>
{/* Swap Button */}
<div style={{ textAlign: 'center' }}>
<Button
icon={<SwapOutlined rotate={90} />}
size="small"
onClick={handleSwapQueries}
disabled={isLoading}
>
Swap
</Button>
</div>
{/* Path B Input */}
<div>
<Tag color="green" style={{ marginBottom: 4 }}>Path B</Tag>
<TextArea
value={queryB}
onChange={(e) => setQueryB(e.target.value)}
placeholder="Enter second object (e.g., bicycle)"
autoSize={{ minRows: 1, maxRows: 2 }}
disabled={isLoading}
/>
{renderProgressIndicator('Path B', progressB, loadingB)}
</div>
{/* Analyze Button */}
<Button
type="primary"
icon={<SearchOutlined />}
onClick={handleAnalyze}
loading={isLoading}
block
size="large"
disabled={!queryA.trim() || !queryB.trim()}
>
{isLoading ? 'Analyzing...' : 'Analyze Both'}
</Button>
</Space>
</Card>
{/* Combined Progress Alert */}
{isLoading && (
<Alert
type="info"
icon={<LoadingOutlined spin />}
message="Parallel Analysis in Progress"
description={
<Space direction="vertical" style={{ width: '100%' }}>
<div>
<Tag color="blue">A</Tag> {progressA.message || 'Waiting...'}
</div>
<div>
<Tag color="green">B</Tag> {progressB.message || 'Waiting...'}
</div>
</Space>
}
showIcon
/>
)}
<Divider style={{ margin: '4px 0' }} />
{/* Settings Collapse */}
<Collapse
items={collapseItems}
defaultActiveKey={[]}
size="small"
style={{ background: 'transparent' }}
/>
</div>
);
}

191
frontend/src/components/DualPathMindmapPanel.tsx Normal file
View File

@@ -0,0 +1,191 @@
import { Empty, Spin, Tag, Typography } from 'antd';
import type { PathState } from '../types';
import { MindmapDAG } from './MindmapDAG';
const { Text } = Typography;
interface VisualSettings {
nodeSpacing: number;
fontSize: number;
}
interface DualPathMindmapPanelProps {
pathA: PathState;
pathB: PathState;
isDark: boolean;
visualSettings: VisualSettings;
}
interface SinglePathViewProps {
path: PathState;
label: string;
color: 'blue' | 'green';
isDark: boolean;
visualSettings: VisualSettings;
}
function SinglePathView({ path, label, color, isDark, visualSettings }: SinglePathViewProps) {
const { result, loading, error, query, progress } = path;
// Header with label
const headerStyle: React.CSSProperties = {
padding: '6px 12px',
background: isDark ? '#1f1f1f' : '#fafafa',
borderBottom: `1px solid ${isDark ? '#303030' : '#f0f0f0'}`,
display: 'flex',
alignItems: 'center',
gap: 8,
minHeight: 36,
};
const contentStyle: React.CSSProperties = {
flex: 1,
position: 'relative',
overflow: 'hidden',
};
const renderContent = () => {
if (loading) {
return (
<div style={{
display: 'flex',
flexDirection: 'column',
justifyContent: 'center',
alignItems: 'center',
height: '100%',
gap: 8,
}}>
<Spin size="large" />
<Text type="secondary">{progress.message || 'Analyzing...'}</Text>
</div>
);
}
if (error) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
height: '100%',
}}>
<Empty description={error} />
</div>
);
}
if (!result) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
height: '100%',
}}>
<Empty description={`Enter a query for ${label}`} />
</div>
);
}
return (
<MindmapDAG
data={result}
isDark={isDark}
visualSettings={visualSettings}
/>
);
};
return (
<div style={{ display: 'flex', flexDirection: 'column', height: '100%' }}>
<div style={headerStyle}>
<Tag color={color}>{label}</Tag>
{result && (
<Text strong style={{ flex: 1 }}>
{result.query}
</Text>
)}
{!result && query && (
<Text type="secondary" style={{ flex: 1 }}>
{query}
</Text>
)}
{result && (
<Text type="secondary" style={{ fontSize: 12 }}>
{result.nodes.length} attributes
</Text>
)}
</div>
<div style={contentStyle}>
{renderContent()}
</div>
</div>
);
}
export function DualPathMindmapPanel({
pathA,
pathB,
isDark,
visualSettings,
}: DualPathMindmapPanelProps) {
const containerStyle: React.CSSProperties = {
display: 'flex',
flexDirection: 'column',
height: '100%',
gap: 2,
};
const pathContainerStyle: React.CSSProperties = {
flex: 1,
minHeight: 0,
borderRadius: 6,
overflow: 'hidden',
border: `1px solid ${isDark ? '#303030' : '#f0f0f0'}`,
};
const dividerStyle: React.CSSProperties = {
height: 4,
background: isDark ? '#303030' : '#f0f0f0',
cursor: 'row-resize',
display: 'flex',
alignItems: 'center',
justifyContent: 'center',
};
return (
<div style={containerStyle}>
{/* Path A - Top Half */}
<div style={pathContainerStyle}>
<SinglePathView
path={pathA}
label="Path A"
color="blue"
isDark={isDark}
visualSettings={visualSettings}
/>
</div>
{/* Divider */}
<div style={dividerStyle}>
<div style={{
width: 40,
height: 3,
borderRadius: 2,
background: isDark ? '#505050' : '#d0d0d0',
}} />
</div>
{/* Path B - Bottom Half */}
<div style={pathContainerStyle}>
<SinglePathView
path={pathB}
label="Path B"
color="green"
isDark={isDark}
visualSettings={visualSettings}
/>
</div>
</div>
);
}

Some files were not shown because too many files have changed in this diff.