Compare commits


3 Commits

Author SHA1 Message Date
43c025e060 feat: Add experiments framework and novelty-driven agent loop
- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation

- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring

- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 10:16:21 +08:00
26a56a2a07 feat: Enhance patent search and update research documentation
- Improve patent search service with expanded functionality
- Update PatentSearchPanel UI component
- Add new research_report.md
- Update experimental protocol, literature review, paper outline, and theoretical framework

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 15:52:33 +08:00
ec48709755 chore: save local changes 2026-01-05 22:32:08 +08:00
126 changed files with 25270 additions and 275 deletions

CLAUDE.md (new file, +101 lines)

@@ -0,0 +1,101 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a creative ideation system that uses LLMs to break "semantic gravity" (the tendency of LLMs to generate ideas clustered around high-probability training distributions). The system analyzes objects through multiple attribute dimensions and transforms them using expert perspectives to generate novel ideas.
## Development Commands
### Starting the Application
```bash
./start.sh # Starts both backend (port 8001) and frontend (port 5173)
./stop.sh # Stops all services
```
### Backend (FastAPI + Python)
```bash
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload
```
### Frontend (React + Vite + TypeScript)
```bash
cd frontend
npm install
npm run dev # Development server
npm run build # TypeScript check + production build
npm run lint # ESLint
```
## Architecture
### Multi-Agent Pipeline
The system uses three interconnected agents that process queries through Server-Sent Events (SSE) for real-time streaming:
```
Query → Attribute Agent → Expert Transformation Agent → Deduplication Agent
                                  ↘ Patent Search (optional)
```
**1. Attribute Agent** (`/api/analyze`)
- Analyzes a query (e.g., "bicycle") through configurable category dimensions
- Step 0: Category analysis (5 modes: FIXED_ONLY, FIXED_PLUS_CUSTOM, FIXED_PLUS_DYNAMIC, CUSTOM_ONLY, DYNAMIC_AUTO)
- Step 1: Generate attributes per category
- Step 2: Build DAG relationships between attributes across categories
- Output: `AttributeDAG` with nodes and edges
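
A minimal client sketch for this endpoint (not part of the repo; it assumes the backend from `start.sh` is running on port 8001, and the payload handling is illustrative):

```python
# Hypothetical SSE consumer for /api/analyze; event names follow the
# step0/step1 pattern used by the backend, payload printing is illustrative.
import asyncio
import json
import httpx

async def analyze(query: str, lang: str = "en") -> None:
    payload = {"query": query, "lang": lang, "category_mode": "dynamic_auto"}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:8001/api/analyze", json=payload
        ) as resp:
            async for line in resp.aiter_lines():
                if line.startswith("event: "):
                    print(line.removeprefix("event: "))   # e.g. step1_start
                elif line.startswith("data: "):
                    data = json.loads(line.removeprefix("data: "))
                    print("  ", str(data)[:100])          # truncated payload

asyncio.run(analyze("bicycle"))
```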
**2. Expert Transformation Agent** (`/api/expert-transformation/category`)
- Takes attributes and transforms them through diverse expert perspectives
- Step 0: Generate expert team (sources: `llm`, `curated`, `dbpedia`, `wikidata`)
- Step 1: Each expert generates keywords for each attribute
- Step 2: Generate descriptions for each keyword
- Formula: `total_keywords = attributes × expert_count × keywords_per_expert`
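
For example, 5 attributes fanned out through 3 experts at 2 keywords each yield 30 keywords before description generation:

```python
# Keyword fan-out for one category, per the formula above.
attributes, expert_count, keywords_per_expert = 5, 3, 2
total_keywords = attributes * expert_count * keywords_per_expert
assert total_keywords == 30  # 5 attributes × 3 experts × 2 keywords each
```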
**3. Deduplication Agent** (`/api/deduplication/deduplicate`)
- Consolidates similar ideas using embedding similarity or LLM judgment
- Groups duplicates while preserving representative descriptions
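
A sketch of the embedding path's grouping idea (names and structure assumed, not the project's actual service code; the 0.85 default matches `similarity_threshold` in `DeduplicationRequest` later in this diff):

```python
# Greedy cosine-similarity grouping sketch for deduplication.
import numpy as np

def group_duplicates(embeddings: np.ndarray, threshold: float = 0.85) -> list[list[int]]:
    """Group row indices whose cosine similarity to a group's representative >= threshold."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    groups: list[list[int]] = []
    for i in range(len(normed)):
        for group in groups:
            if float(normed[i] @ normed[group[0]]) >= threshold:
                group.append(i)  # near-duplicate: attach to existing group
                break
        else:
            groups.append([i])   # novel description: becomes a new representative
    return groups
```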
### Backend Structure (`backend/app/`)
- `routers/` - FastAPI endpoints with SSE streaming
- `services/` - LLM service (Ollama/OpenAI), embedding service, expert source service
- `prompts/` - Bilingual prompt templates (zh/en) for each agent step
- `data/` - Curated occupation lists for expert sourcing (210 professions)
### Frontend Structure (`frontend/src/`)
- `hooks/` - React hooks matching backend agents (`useAttribute`, `useExpertTransformation`, `useDeduplication`)
- `components/` - UI panels for each stage + DAG visualization (D3.js, @xyflow/react)
- `services/api.ts` - SSE stream parsing and API calls
- `types/index.ts` - TypeScript interfaces mirroring backend schemas
### Key Patterns
**SSE Event Flow**: All agent operations stream progress via SSE events:
```typescript
// Frontend callback pattern
// Frontend callback pattern (fired in order as SSE events arrive)
onStep0Start → onStep0Complete → onStep1Start → onStep1Complete → onDone
```
**Bilingual Support**: All prompts and UI support `PromptLanguage = 'zh' | 'en'`. Language flows through the entire pipeline from request to response messages.
**Expert Source Fallback**: If external sources (DBpedia, Wikidata) fail, the system automatically falls back to LLM-based expert generation.
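
The fallback shape is roughly the following (a sketch; the function names are hypothetical stand-ins, not the project's actual API):

```python
# Hypothetical sketch of the expert-source fallback.
async def fetch_external_experts(source: str, count: int) -> list[str]:
    raise ConnectionError(f"{source} unreachable")  # stand-in for a real lookup

async def generate_experts_with_llm(count: int) -> list[str]:
    return [f"expert-{i}" for i in range(count)]    # stand-in for an LLM call

async def resolve_experts(source: str, count: int) -> list[str]:
    if source in ("dbpedia", "wikidata"):
        try:
            return await fetch_external_experts(source, count)
        except Exception:
            pass  # external source failed: fall through to the LLM
    return await generate_experts_with_llm(count)
```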
### Configuration
Backend requires `.env` file:
```
OLLAMA_BASE_URL=http://localhost:11435 # Default Ollama endpoint
DEFAULT_MODEL=qwen3:8b # Default LLM model
OPENAI_API_KEY= # Optional: for OpenAI-compatible APIs
LENS_API_TOKEN= # Optional: for patent search
```
### Dual-Path Mode
The system supports analyzing two queries in parallel (`PathA` and `PathB`) with attribute crossover functionality for comparing and combining ideas across different objects.


@@ -3,10 +3,11 @@ from typing import Optional
 class Settings(BaseSettings):
-    ollama_base_url: str = "http://192.168.30.36:11434"
+    ollama_base_url: str = "http://localhost:11435"
     default_model: str = "qwen3:8b"
     openai_api_key: Optional[str] = None
     openai_base_url: Optional[str] = None
+    lens_api_token: Optional[str] = None

     class Config:
         env_file = ".env"
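
Usage follows the standard pydantic `BaseSettings` pattern; a sketch (the import path is assumed from the repo layout described in CLAUDE.md):

```python
# Sketch: values resolve from .env, falling back to the defaults above.
from app.config import Settings  # module path assumed

settings = Settings()
print(settings.ollama_base_url)  # "http://localhost:11435" unless .env overrides it
print(settings.lens_api_token)   # None unless LENS_API_TOKEN is set
```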


@@ -0,0 +1,120 @@
{
"metadata": {
"source": "ddc",
"language": "en",
"created_at": "2026-01-20",
"total_count": 100,
"description": "Dewey Decimal Classification knowledge domains (10 main classes + 90 divisions)"
},
"domains": [
{"code": "000", "name": "Computer Science, Information & General Works", "level": "class", "parent": null},
{"code": "010", "name": "Bibliographies", "level": "division", "parent": "000"},
{"code": "020", "name": "Library & Information Sciences", "level": "division", "parent": "000"},
{"code": "030", "name": "Encyclopedias & Books of Facts", "level": "division", "parent": "000"},
{"code": "040", "name": "Unassigned", "level": "division", "parent": "000"},
{"code": "050", "name": "Magazines, Journals & Serials", "level": "division", "parent": "000"},
{"code": "060", "name": "Associations, Organizations & Museums", "level": "division", "parent": "000"},
{"code": "070", "name": "News Media, Journalism & Publishing", "level": "division", "parent": "000"},
{"code": "080", "name": "Quotations", "level": "division", "parent": "000"},
{"code": "090", "name": "Manuscripts & Rare Books", "level": "division", "parent": "000"},
{"code": "100", "name": "Philosophy & Psychology", "level": "class", "parent": null},
{"code": "110", "name": "Metaphysics", "level": "division", "parent": "100"},
{"code": "120", "name": "Epistemology", "level": "division", "parent": "100"},
{"code": "130", "name": "Parapsychology & Occultism", "level": "division", "parent": "100"},
{"code": "140", "name": "Philosophical Schools of Thought", "level": "division", "parent": "100"},
{"code": "150", "name": "Psychology", "level": "division", "parent": "100"},
{"code": "160", "name": "Logic", "level": "division", "parent": "100"},
{"code": "170", "name": "Ethics", "level": "division", "parent": "100"},
{"code": "180", "name": "Ancient, Medieval & Eastern Philosophy", "level": "division", "parent": "100"},
{"code": "190", "name": "Modern Western Philosophy", "level": "division", "parent": "100"},
{"code": "200", "name": "Religion", "level": "class", "parent": null},
{"code": "210", "name": "Philosophy & Theory of Religion", "level": "division", "parent": "200"},
{"code": "220", "name": "Bible", "level": "division", "parent": "200"},
{"code": "230", "name": "Christianity & Christian Theology", "level": "division", "parent": "200"},
{"code": "240", "name": "Christian Practice & Observance", "level": "division", "parent": "200"},
{"code": "250", "name": "Christian Orders & Local Churches", "level": "division", "parent": "200"},
{"code": "260", "name": "Christian Social & Ecclesiastical Theology", "level": "division", "parent": "200"},
{"code": "270", "name": "History of Christianity", "level": "division", "parent": "200"},
{"code": "280", "name": "Christian Denominations", "level": "division", "parent": "200"},
{"code": "290", "name": "Other Religions", "level": "division", "parent": "200"},
{"code": "300", "name": "Social Sciences", "level": "class", "parent": null},
{"code": "310", "name": "Statistics", "level": "division", "parent": "300"},
{"code": "320", "name": "Political Science", "level": "division", "parent": "300"},
{"code": "330", "name": "Economics", "level": "division", "parent": "300"},
{"code": "340", "name": "Law", "level": "division", "parent": "300"},
{"code": "350", "name": "Public Administration & Military Science", "level": "division", "parent": "300"},
{"code": "360", "name": "Social Problems & Services", "level": "division", "parent": "300"},
{"code": "370", "name": "Education", "level": "division", "parent": "300"},
{"code": "380", "name": "Commerce, Communications & Transportation", "level": "division", "parent": "300"},
{"code": "390", "name": "Customs, Etiquette & Folklore", "level": "division", "parent": "300"},
{"code": "400", "name": "Language", "level": "class", "parent": null},
{"code": "410", "name": "Linguistics", "level": "division", "parent": "400"},
{"code": "420", "name": "English & Old English Languages", "level": "division", "parent": "400"},
{"code": "430", "name": "German & Related Languages", "level": "division", "parent": "400"},
{"code": "440", "name": "French & Related Languages", "level": "division", "parent": "400"},
{"code": "450", "name": "Italian, Romanian & Related Languages", "level": "division", "parent": "400"},
{"code": "460", "name": "Spanish, Portuguese & Galician", "level": "division", "parent": "400"},
{"code": "470", "name": "Latin & Italic Languages", "level": "division", "parent": "400"},
{"code": "480", "name": "Classical & Modern Greek Languages", "level": "division", "parent": "400"},
{"code": "490", "name": "Other Languages", "level": "division", "parent": "400"},
{"code": "500", "name": "Science", "level": "class", "parent": null},
{"code": "510", "name": "Mathematics", "level": "division", "parent": "500"},
{"code": "520", "name": "Astronomy", "level": "division", "parent": "500"},
{"code": "530", "name": "Physics", "level": "division", "parent": "500"},
{"code": "540", "name": "Chemistry", "level": "division", "parent": "500"},
{"code": "550", "name": "Earth Sciences & Geology", "level": "division", "parent": "500"},
{"code": "560", "name": "Paleontology", "level": "division", "parent": "500"},
{"code": "570", "name": "Biology & Life Sciences", "level": "division", "parent": "500"},
{"code": "580", "name": "Botany", "level": "division", "parent": "500"},
{"code": "590", "name": "Zoology", "level": "division", "parent": "500"},
{"code": "600", "name": "Technology", "level": "class", "parent": null},
{"code": "610", "name": "Medicine & Health", "level": "division", "parent": "600"},
{"code": "620", "name": "Engineering", "level": "division", "parent": "600"},
{"code": "630", "name": "Agriculture", "level": "division", "parent": "600"},
{"code": "640", "name": "Home & Family Management", "level": "division", "parent": "600"},
{"code": "650", "name": "Management & Public Relations", "level": "division", "parent": "600"},
{"code": "660", "name": "Chemical Engineering", "level": "division", "parent": "600"},
{"code": "670", "name": "Manufacturing", "level": "division", "parent": "600"},
{"code": "680", "name": "Manufacture for Specific Uses", "level": "division", "parent": "600"},
{"code": "690", "name": "Construction & Building", "level": "division", "parent": "600"},
{"code": "700", "name": "Arts & Recreation", "level": "class", "parent": null},
{"code": "710", "name": "Landscape & Area Planning", "level": "division", "parent": "700"},
{"code": "720", "name": "Architecture", "level": "division", "parent": "700"},
{"code": "730", "name": "Sculpture, Ceramics & Metalwork", "level": "division", "parent": "700"},
{"code": "740", "name": "Drawing & Decorative Arts", "level": "division", "parent": "700"},
{"code": "750", "name": "Painting", "level": "division", "parent": "700"},
{"code": "760", "name": "Graphic Arts & Printmaking", "level": "division", "parent": "700"},
{"code": "770", "name": "Photography & Computer Art", "level": "division", "parent": "700"},
{"code": "780", "name": "Music", "level": "division", "parent": "700"},
{"code": "790", "name": "Sports, Games & Entertainment", "level": "division", "parent": "700"},
{"code": "800", "name": "Literature", "level": "class", "parent": null},
{"code": "810", "name": "American Literature in English", "level": "division", "parent": "800"},
{"code": "820", "name": "English & Old English Literature", "level": "division", "parent": "800"},
{"code": "830", "name": "German & Related Literature", "level": "division", "parent": "800"},
{"code": "840", "name": "French & Related Literature", "level": "division", "parent": "800"},
{"code": "850", "name": "Italian, Romanian & Related Literature", "level": "division", "parent": "800"},
{"code": "860", "name": "Spanish, Portuguese & Galician Literature", "level": "division", "parent": "800"},
{"code": "870", "name": "Latin & Italic Literature", "level": "division", "parent": "800"},
{"code": "880", "name": "Classical & Modern Greek Literature", "level": "division", "parent": "800"},
{"code": "890", "name": "Other Literatures", "level": "division", "parent": "800"},
{"code": "900", "name": "History & Geography", "level": "class", "parent": null},
{"code": "910", "name": "Geography & Travel", "level": "division", "parent": "900"},
{"code": "920", "name": "Biography & Genealogy", "level": "division", "parent": "900"},
{"code": "930", "name": "History of Ancient World", "level": "division", "parent": "900"},
{"code": "940", "name": "History of Europe", "level": "division", "parent": "900"},
{"code": "950", "name": "History of Asia", "level": "division", "parent": "900"},
{"code": "960", "name": "History of Africa", "level": "division", "parent": "900"},
{"code": "970", "name": "History of North America", "level": "division", "parent": "900"},
{"code": "980", "name": "History of South America", "level": "division", "parent": "900"},
{"code": "990", "name": "History of Other Areas", "level": "division", "parent": "900"}
]
}


@@ -0,0 +1,120 @@
{
"metadata": {
"source": "ddc",
"language": "zh",
"created_at": "2026-01-20",
"total_count": 100,
"description": "杜威十進位圖書分類法知識領域10個大類 + 90個細類"
},
"domains": [
{"code": "000", "name": "電腦科學、資訊與總類", "level": "class", "parent": null},
{"code": "010", "name": "書目學", "level": "division", "parent": "000"},
{"code": "020", "name": "圖書資訊學", "level": "division", "parent": "000"},
{"code": "030", "name": "百科全書與常識書", "level": "division", "parent": "000"},
{"code": "040", "name": "未分配", "level": "division", "parent": "000"},
{"code": "050", "name": "雜誌、期刊與連續出版品", "level": "division", "parent": "000"},
{"code": "060", "name": "協會、組織與博物館", "level": "division", "parent": "000"},
{"code": "070", "name": "新聞媒體、新聞學與出版", "level": "division", "parent": "000"},
{"code": "080", "name": "引用語錄", "level": "division", "parent": "000"},
{"code": "090", "name": "手稿與珍本", "level": "division", "parent": "000"},
{"code": "100", "name": "哲學與心理學", "level": "class", "parent": null},
{"code": "110", "name": "形上學", "level": "division", "parent": "100"},
{"code": "120", "name": "知識論", "level": "division", "parent": "100"},
{"code": "130", "name": "超心理學與神秘學", "level": "division", "parent": "100"},
{"code": "140", "name": "哲學流派", "level": "division", "parent": "100"},
{"code": "150", "name": "心理學", "level": "division", "parent": "100"},
{"code": "160", "name": "邏輯學", "level": "division", "parent": "100"},
{"code": "170", "name": "倫理學", "level": "division", "parent": "100"},
{"code": "180", "name": "古代、中世紀與東方哲學", "level": "division", "parent": "100"},
{"code": "190", "name": "近代西方哲學", "level": "division", "parent": "100"},
{"code": "200", "name": "宗教", "level": "class", "parent": null},
{"code": "210", "name": "宗教哲學與理論", "level": "division", "parent": "200"},
{"code": "220", "name": "聖經", "level": "division", "parent": "200"},
{"code": "230", "name": "基督教與基督神學", "level": "division", "parent": "200"},
{"code": "240", "name": "基督教實踐與禮儀", "level": "division", "parent": "200"},
{"code": "250", "name": "基督教修會與地方教會", "level": "division", "parent": "200"},
{"code": "260", "name": "基督教社會與教會神學", "level": "division", "parent": "200"},
{"code": "270", "name": "基督教歷史", "level": "division", "parent": "200"},
{"code": "280", "name": "基督教教派", "level": "division", "parent": "200"},
{"code": "290", "name": "其他宗教", "level": "division", "parent": "200"},
{"code": "300", "name": "社會科學", "level": "class", "parent": null},
{"code": "310", "name": "統計學", "level": "division", "parent": "300"},
{"code": "320", "name": "政治學", "level": "division", "parent": "300"},
{"code": "330", "name": "經濟學", "level": "division", "parent": "300"},
{"code": "340", "name": "法律", "level": "division", "parent": "300"},
{"code": "350", "name": "公共行政與軍事學", "level": "division", "parent": "300"},
{"code": "360", "name": "社會問題與服務", "level": "division", "parent": "300"},
{"code": "370", "name": "教育", "level": "division", "parent": "300"},
{"code": "380", "name": "商業、通訊與運輸", "level": "division", "parent": "300"},
{"code": "390", "name": "風俗、禮儀與民俗", "level": "division", "parent": "300"},
{"code": "400", "name": "語言", "level": "class", "parent": null},
{"code": "410", "name": "語言學", "level": "division", "parent": "400"},
{"code": "420", "name": "英語與古英語", "level": "division", "parent": "400"},
{"code": "430", "name": "德語及相關語言", "level": "division", "parent": "400"},
{"code": "440", "name": "法語及相關語言", "level": "division", "parent": "400"},
{"code": "450", "name": "義大利語、羅馬尼亞語及相關語言", "level": "division", "parent": "400"},
{"code": "460", "name": "西班牙語、葡萄牙語與加利西亞語", "level": "division", "parent": "400"},
{"code": "470", "name": "拉丁語及義大利語族", "level": "division", "parent": "400"},
{"code": "480", "name": "古典與現代希臘語", "level": "division", "parent": "400"},
{"code": "490", "name": "其他語言", "level": "division", "parent": "400"},
{"code": "500", "name": "自然科學", "level": "class", "parent": null},
{"code": "510", "name": "數學", "level": "division", "parent": "500"},
{"code": "520", "name": "天文學", "level": "division", "parent": "500"},
{"code": "530", "name": "物理學", "level": "division", "parent": "500"},
{"code": "540", "name": "化學", "level": "division", "parent": "500"},
{"code": "550", "name": "地球科學與地質學", "level": "division", "parent": "500"},
{"code": "560", "name": "古生物學", "level": "division", "parent": "500"},
{"code": "570", "name": "生物學與生命科學", "level": "division", "parent": "500"},
{"code": "580", "name": "植物學", "level": "division", "parent": "500"},
{"code": "590", "name": "動物學", "level": "division", "parent": "500"},
{"code": "600", "name": "應用科學與技術", "level": "class", "parent": null},
{"code": "610", "name": "醫學與健康", "level": "division", "parent": "600"},
{"code": "620", "name": "工程學", "level": "division", "parent": "600"},
{"code": "630", "name": "農業", "level": "division", "parent": "600"},
{"code": "640", "name": "家政與家庭管理", "level": "division", "parent": "600"},
{"code": "650", "name": "管理與公共關係", "level": "division", "parent": "600"},
{"code": "660", "name": "化學工程", "level": "division", "parent": "600"},
{"code": "670", "name": "製造業", "level": "division", "parent": "600"},
{"code": "680", "name": "特定用途製造", "level": "division", "parent": "600"},
{"code": "690", "name": "建築與營造", "level": "division", "parent": "600"},
{"code": "700", "name": "藝術與休閒", "level": "class", "parent": null},
{"code": "710", "name": "景觀與區域規劃", "level": "division", "parent": "700"},
{"code": "720", "name": "建築學", "level": "division", "parent": "700"},
{"code": "730", "name": "雕塑、陶瓷與金工", "level": "division", "parent": "700"},
{"code": "740", "name": "繪畫與裝飾藝術", "level": "division", "parent": "700"},
{"code": "750", "name": "繪畫藝術", "level": "division", "parent": "700"},
{"code": "760", "name": "版畫與印刷藝術", "level": "division", "parent": "700"},
{"code": "770", "name": "攝影與電腦藝術", "level": "division", "parent": "700"},
{"code": "780", "name": "音樂", "level": "division", "parent": "700"},
{"code": "790", "name": "運動、遊戲與娛樂", "level": "division", "parent": "700"},
{"code": "800", "name": "文學", "level": "class", "parent": null},
{"code": "810", "name": "美國英語文學", "level": "division", "parent": "800"},
{"code": "820", "name": "英語與古英語文學", "level": "division", "parent": "800"},
{"code": "830", "name": "德語及相關文學", "level": "division", "parent": "800"},
{"code": "840", "name": "法語及相關文學", "level": "division", "parent": "800"},
{"code": "850", "name": "義大利語、羅馬尼亞語及相關文學", "level": "division", "parent": "800"},
{"code": "860", "name": "西班牙語、葡萄牙語與加利西亞語文學", "level": "division", "parent": "800"},
{"code": "870", "name": "拉丁語及義大利語族文學", "level": "division", "parent": "800"},
{"code": "880", "name": "古典與現代希臘文學", "level": "division", "parent": "800"},
{"code": "890", "name": "其他文學", "level": "division", "parent": "800"},
{"code": "900", "name": "歷史與地理", "level": "class", "parent": null},
{"code": "910", "name": "地理與旅遊", "level": "division", "parent": "900"},
{"code": "920", "name": "傳記與家譜", "level": "division", "parent": "900"},
{"code": "930", "name": "古代世界史", "level": "division", "parent": "900"},
{"code": "940", "name": "歐洲史", "level": "division", "parent": "900"},
{"code": "950", "name": "亞洲史", "level": "division", "parent": "900"},
{"code": "960", "name": "非洲史", "level": "division", "parent": "900"},
{"code": "970", "name": "北美洲史", "level": "division", "parent": "900"},
{"code": "980", "name": "南美洲史", "level": "division", "parent": "900"},
{"code": "990", "name": "其他地區史", "level": "division", "parent": "900"}
]
}


@@ -3,10 +3,11 @@ from contextlib import asynccontextmanager
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
-from .routers import attributes, transformation, expert_transformation, deduplication
+from .routers import attributes, transformation, expert_transformation, deduplication, patent_search
 from .services.llm_service import ollama_provider
 from .services.embedding_service import embedding_service
 from .services.llm_deduplication_service import llm_deduplication_service
+from .services.patent_search_service import patent_search_service

 @asynccontextmanager
@@ -15,6 +16,7 @@ async def lifespan(app: FastAPI):
     await ollama_provider.close()
     await embedding_service.close()
     await llm_deduplication_service.close()
+    await patent_search_service.close()

 app = FastAPI(
@@ -36,6 +38,7 @@ app.include_router(attributes.router)
 app.include_router(transformation.router)
 app.include_router(expert_transformation.router)
 app.include_router(deduplication.router)
+app.include_router(patent_search.router)

 @app.get("/")


@@ -1,7 +1,10 @@
 from pydantic import BaseModel
-from typing import Optional, List, Dict
+from typing import Optional, List, Dict, Literal
 from enum import Enum

+# Language type for prompts
+LanguageType = Literal["zh", "en"]

 class AttributeNode(BaseModel):
     name: str
@@ -47,16 +50,19 @@ class CausalChain(BaseModel):
 class StreamAnalyzeRequest(BaseModel):
-    """多步驟分析請求(更新為支持動態類別)"""
+    """Multi-step analysis request (updated to support dynamic categories)"""
     query: str
     model: Optional[str] = None
     temperature: Optional[float] = 0.7
-    chain_count: int = 5  # 用戶可設定要生成多少條因果鏈
+    chain_count: int = 5  # User can set how many causal chains to generate

-    # 新增:動態類別支持
-    category_mode: Optional[str] = "dynamic_auto"  # CategoryMode enum
+    # Dynamic category support
+    category_mode: Optional[str] = "dynamic_auto"  # CategoryMode enum value
     custom_categories: Optional[List[str]] = None
-    suggested_category_count: int = 3  # 建議 LLM 生成的類別數量
+    suggested_category_count: int = 3  # Suggest LLM to generate this many categories
+
+    # Language setting
+    lang: LanguageType = "zh"

 class StreamAnalyzeResponse(BaseModel):
@@ -136,13 +142,14 @@ class DAGRelationship(BaseModel):
 # ===== Transformation Agent schemas =====

 class TransformationRequest(BaseModel):
-    """Transformation Agent 請求"""
-    query: str  # 原始查詢 (e.g., "腳踏車")
-    category: str  # 類別名稱 (e.g., "功能")
-    attributes: List[str]  # 該類別的屬性列表
+    """Transformation Agent request"""
+    query: str  # Original query (e.g., "bicycle")
+    category: str  # Category name (e.g., "Functions")
+    attributes: List[str]  # Attribute list for this category
     model: Optional[str] = None
     temperature: Optional[float] = 0.7
-    keyword_count: int = 3  # 要生成的新關鍵字數量
+    keyword_count: int = 3  # Number of new keywords to generate
+    lang: LanguageType = "zh"  # Language for prompts

 class TransformationDescription(BaseModel):
@@ -215,24 +222,27 @@ class ExpertSource(str, Enum):
 class ExpertTransformationRequest(BaseModel):
-    """Expert Transformation Agent 請求"""
+    """Expert Transformation Agent request"""
     query: str
     category: str
     attributes: List[str]

     # Expert parameters
-    expert_count: int = 3  # 專家數量 (2-8)
-    keywords_per_expert: int = 1  # 每個專家為每個屬性生成幾個關鍵字 (1-3)
-    custom_experts: Optional[List[str]] = None  # 用戶指定專家 ["藥師", "工程師"]
+    expert_count: int = 3  # Number of experts (2-8)
+    keywords_per_expert: int = 1  # Keywords per expert per attribute (1-3)
+    custom_experts: Optional[List[str]] = None  # User-specified experts

     # Expert source parameters
-    expert_source: ExpertSource = ExpertSource.LLM  # 專家來源
-    expert_language: str = "en"  # 外部來源的語言 (目前只有英文資料)
+    expert_source: ExpertSource = ExpertSource.LLM  # Expert source
+    expert_language: str = "en"  # Language for external sources

     # LLM parameters
     model: Optional[str] = None
     temperature: Optional[float] = 0.7
+
+    # Prompt language
+    lang: LanguageType = "zh"

 # ===== Deduplication Agent schemas =====
@@ -243,11 +253,12 @@ class DeduplicationMethod(str, Enum):
 class DeduplicationRequest(BaseModel):
-    """去重請求"""
+    """Deduplication request"""
     descriptions: List[ExpertTransformationDescription]
-    method: DeduplicationMethod = DeduplicationMethod.EMBEDDING  # 去重方法
-    similarity_threshold: float = 0.85  # 餘弦相似度閾值 (0.0-1.0),僅 Embedding 使用
-    model: Optional[str] = None  # Embedding/LLM 模型
+    method: DeduplicationMethod = DeduplicationMethod.EMBEDDING  # Deduplication method
+    similarity_threshold: float = 0.85  # Cosine similarity threshold (0.0-1.0), only for Embedding
+    model: Optional[str] = None  # Embedding/LLM model
+    lang: LanguageType = "zh"  # Prompt language (for LLM method)

 class DescriptionGroup(BaseModel):
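
With this change every request schema carries a `lang` field. A hypothetical request body exercising it (values chosen for illustration):

```python
# Hypothetical StreamAnalyzeRequest payload using the new lang field.
payload = {
    "query": "bicycle",
    "chain_count": 5,
    "category_mode": "dynamic_auto",
    "suggested_category_count": 3,
    "lang": "en",  # new field: selects English prompts; defaults to "zh"
}
```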


@@ -1,21 +1,37 @@
 from typing import List, Optional, Dict
 import json
-
-DEFAULT_CATEGORIES = ["材料", "功能", "用途", "使用族群", "特性"]
-
-CATEGORY_DESCRIPTIONS = {
-    "材料": "物件由什麼材料組成",
-    "功能": "物件能做什麼",
-    "用途": "物件在什麼場景使用",
-    "使用族群": "誰會使用這個物件",
-    "特性": "物件有什麼特徵",
-}
+from .language_config import (
+    LanguageType,
+    DEFAULT_CATEGORIES,
+    CATEGORY_DESCRIPTIONS,
+)

-def get_attribute_prompt(query: str, categories: Optional[List[str]] = None) -> str:
+def get_default_categories(lang: LanguageType = "zh") -> List[str]:
+    return DEFAULT_CATEGORIES.get(lang, DEFAULT_CATEGORIES["zh"])
+
+def get_category_descriptions(lang: LanguageType = "zh") -> Dict[str, str]:
+    return CATEGORY_DESCRIPTIONS.get(lang, CATEGORY_DESCRIPTIONS["zh"])
+
+def get_attribute_prompt(
+    query: str,
+    categories: Optional[List[str]] = None,
+    lang: LanguageType = "zh"
+) -> str:
     """Generate prompt with causal chain structure."""
+    if lang == "en":
+        prompt = f"""Analyze the attributes of "{query}" in a causal chain format: Materials→Functions→Usages→User Groups.
+List 3-5 types of materials, each extending into a complete causal chain.
+JSON format:
+{{"name": "{query}", "children": [{{"name": "Material Name", "category": "Materials", "children": [{{"name": "Function Name", "category": "Functions", "children": [{{"name": "Usage Name", "category": "Usages", "children": [{{"name": "User Group Name", "category": "User Groups"}}]}}]}}]}}]}}
+Return JSON only."""
+    else:
+        prompt = f"""分析「{query}」的屬性,以因果鏈方式呈現:材料→功能→用途→使用族群。

 請列出 3-5 種材料,每種材料延伸出完整因果鏈。
@@ -27,9 +43,18 @@ JSON 格式:
     return prompt

-def get_step1_attributes_prompt(query: str) -> str:
-    """Step 1: 生成各類別的屬性列表(平行結構)"""
-    return f"""/no_think
+def get_step1_attributes_prompt(query: str, lang: LanguageType = "zh") -> str:
+    """Step 1: Generate attribute list for each category (parallel structure)"""
+    if lang == "en":
+        return f"""/no_think
+Analyze "{query}" and list attributes for the following four categories. List 3-5 common attributes for each category.
+Return JSON only, in the following format:
+{{"materials": ["material1", "material2", "material3"], "functions": ["function1", "function2", "function3"], "usages": ["usage1", "usage2", "usage3"], "users": ["user group1", "user group2", "user group3"]}}
+Object: {query}"""
+    else:
+        return f"""/no_think
 分析「{query}」,列出以下四個類別的屬性。每個類別列出 3-5 個常見屬性。

 只回傳 JSON,格式如下:
@@ -45,21 +70,48 @@ def get_step2_causal_chain_prompt(
     usages: List[str],
     users: List[str],
     existing_chains: List[dict],
-    chain_index: int
+    chain_index: int,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 2: 生成單條因果鏈"""
+    """Step 2: Generate a single causal chain"""
     existing_chains_text = ""
-    if existing_chains:
-        chains_list = [
-            f"- {c['material']}→{c['function']}→{c['usage']}→{c['user']}"
-            for c in existing_chains
-        ]
-        existing_chains_text = f"""
+
+    if lang == "en":
+        if existing_chains:
+            chains_list = [
+                f"- {c['material']}→{c['function']}→{c['usage']}→{c['user']}"
+                for c in existing_chains
+            ]
+            existing_chains_text = f"""
+[Already generated causal chains, do not repeat]
+{chr(10).join(chains_list)}
+"""
+        return f"""/no_think
+Generate causal chain #{chain_index} for "{query}".
+[Available Materials] {', '.join(materials)}
+[Available Functions] {', '.join(functions)}
+[Available Usages] {', '.join(usages)}
+[Available User Groups] {', '.join(users)}
+{existing_chains_text}
+[Rules]
+1. Select one attribute from each category to form a logical causal chain
+2. The causal relationship must be logical (materials determine functions, functions determine usages, usages determine user groups)
+3. Do not repeat existing causal chains
+Return JSON only:
+{{"material": "selected material", "function": "selected function", "usage": "selected usage", "user": "selected user group"}}"""
+    else:
+        if existing_chains:
+            chains_list = [
+                f"- {c['material']}→{c['function']}→{c['usage']}→{c['user']}"
+                for c in existing_chains
+            ]
+            existing_chains_text = f"""
 【已生成的因果鏈,請勿重複】
 {chr(10).join(chains_list)}
 """
-    return f"""/no_think
+        return f"""/no_think
 為「{query}」生成第 {chain_index} 條因果鏈。

 【可選材料】{', '.join(materials)}
@@ -76,19 +128,52 @@ def get_step2_causal_chain_prompt(
 {{"material": "選擇的材料", "function": "選擇的功能", "usage": "選擇的用途", "user": "選擇的族群"}}"""

-def get_flat_attribute_prompt(query: str, categories: Optional[List[str]] = None) -> str:
+def get_flat_attribute_prompt(
+    query: str,
+    categories: Optional[List[str]] = None,
+    lang: LanguageType = "zh"
+) -> str:
     """Generate prompt with flat/parallel categories (original design)."""
-    cats = categories if categories else DEFAULT_CATEGORIES
+    cats = categories if categories else get_default_categories(lang)
+    cat_descs = get_category_descriptions(lang)

     # Build category list
     category_lines = []
     for cat in cats:
-        desc = CATEGORY_DESCRIPTIONS.get(cat, f"{cat}的相關屬性")
-        category_lines.append(f"- {cat}:{desc}")
+        desc = cat_descs.get(cat, f"Related attributes of {cat}" if lang == "en" else f"{cat}的相關屬性")
+        category_lines.append(f"- {cat}: {desc}")
     categories_text = "\n".join(category_lines)

-    prompt = f"""/no_think
+    if lang == "en":
+        prompt = f"""/no_think
+You are an object attribute analysis expert. Please break down the user's input object into the following attribute categories.
+
+[Required Categories]
+{categories_text}
+
+[Important] The return format must be valid JSON, and each node must have a "name" field:
+```json
+{{
+  "name": "Object Name",
+  "children": [
+    {{
+      "name": "Category Name",
+      "children": [
+        {{"name": "Attribute 1"}},
+        {{"name": "Attribute 2"}}
+      ]
+    }}
+  ]
+}}
+```
+
+Return JSON only, no other text.
+
+User input: {query}"""
+    else:
+        prompt = f"""/no_think
 你是一個物件屬性分析專家。請將用戶輸入的物件拆解成以下屬性類別。

 【必須包含的類別】
@@ -123,14 +208,42 @@ def get_flat_attribute_prompt(query: str, categories: Optional[List[str]] = None
 def get_step0_category_analysis_prompt(
     query: str,
     suggested_count: int = 3,
-    exclude_categories: List[str] | None = None
+    exclude_categories: List[str] | None = None,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 0: LLM 分析建議類別"""
-    exclude_text = ""
-    if exclude_categories:
-        exclude_text = f"\n【禁止使用的類別】{', '.join(exclude_categories)}(這些已經是固定類別,不要重複建議)\n"
-    return f"""/no_think
+    """Step 0: LLM analyzes and suggests categories"""
+    if lang == "en":
+        exclude_text = ""
+        if exclude_categories:
+            exclude_text = f"\n[Forbidden Categories] {', '.join(exclude_categories)} (These are already fixed categories, do not suggest duplicates)\n"
+        return f"""/no_think
+Analyze "{query}" and suggest {suggested_count} most suitable attribute categories to describe it.
+
+[Common Category References] Characteristics, Shape, Color, Size, Brand, Price Range, Weight, Style, Occasion, Season, Technical Specifications
+{exclude_text}
+[Important]
+1. Choose categories that best describe the essence of this object
+2. Categories should have logical relationships
+3. Do not choose overly abstract or duplicate categories
+4. Must suggest creative categories different from the reference list
+
+Return JSON only:
+{{
+  "categories": [
+    {{"name": "Category1", "description": "Description1", "order": 0}},
+    {{"name": "Category2", "description": "Description2", "order": 1}}
+  ]
+}}
+
+Object: {query}"""
+    else:
+        exclude_text = ""
+        if exclude_categories:
+            exclude_text = f"\n【禁止使用的類別】{', '.join(exclude_categories)}(這些已經是固定類別,不要重複建議)\n"
+        return f"""/no_think
 分析「{query}」,建議 {suggested_count} 個最適合的屬性類別來描述它。

 【常見類別參考】特性、形狀、顏色、尺寸、品牌、價格區間、重量、風格、場合、季節、技術規格
@@ -154,21 +267,35 @@ def get_step0_category_analysis_prompt(
 def get_step1_dynamic_attributes_prompt(
     query: str,
-    categories: List  # List[CategoryDefinition]
+    categories: List,  # List[CategoryDefinition]
+    lang: LanguageType = "zh"
 ) -> str:
-    """動態 Step 1 - 根據類別列表生成屬性"""
-    # 按 order 排序並構建描述
+    """Dynamic Step 1 - Generate attributes based on category list"""
+    # Sort by order and build description
     sorted_cats = sorted(categories, key=lambda x: x.order if hasattr(x, 'order') else x.get('order', 0))
     category_desc = "\n".join([
-        f"- {cat.name if hasattr(cat, 'name') else cat['name']}: {cat.description if hasattr(cat, 'description') else cat.get('description', '相關屬性')}"
+        f"- {cat.name if hasattr(cat, 'name') else cat['name']}: {cat.description if hasattr(cat, 'description') else cat.get('description', 'Related attributes' if lang == 'en' else '相關屬性')}"
         for cat in sorted_cats
     ])
     category_keys = [cat.name if hasattr(cat, 'name') else cat['name'] for cat in sorted_cats]
-    json_template = {cat: ["屬性1", "屬性2", "屬性3"] for cat in category_keys}
-    return f"""/no_think
+
+    if lang == "en":
+        json_template = {cat: ["attribute1", "attribute2", "attribute3"] for cat in category_keys}
+        return f"""/no_think
+Analyze "{query}" and list attributes for the following categories. List 3-5 common attributes for each category.
+
+[Category List]
+{category_desc}
+
+Return JSON only:
+{json.dumps(json_template, ensure_ascii=False, indent=2)}
+
+Object: {query}"""
+    else:
+        json_template = {cat: ["屬性1", "屬性2", "屬性3"] for cat in category_keys}
+        return f"""/no_think
 分析「{query}」,列出以下類別的屬性。每個類別列出 3-5 個常見屬性。

 【類別列表】
@@ -185,30 +312,59 @@ def get_step2_dynamic_causal_chain_prompt(
     categories: List,  # List[CategoryDefinition]
     attributes_by_category: Dict[str, List[str]],
     existing_chains: List[Dict[str, str]],
-    chain_index: int
+    chain_index: int,
+    lang: LanguageType = "zh"
 ) -> str:
-    """動態 Step 2 - 生成動態類別的因果鏈"""
+    """Dynamic Step 2 - Generate causal chains for dynamic categories"""
     sorted_cats = sorted(categories, key=lambda x: x.order if hasattr(x, 'order') else x.get('order', 0))

-    # 構建可選屬性
+    # Build available attributes
     available_attrs = "\n".join([
-        f"【{cat.name if hasattr(cat, 'name') else cat['name']}】{', '.join(attributes_by_category.get(cat.name if hasattr(cat, 'name') else cat['name'], []))}"
+        f"[{cat.name if hasattr(cat, 'name') else cat['name']}] {', '.join(attributes_by_category.get(cat.name if hasattr(cat, 'name') else cat['name'], []))}"
         for cat in sorted_cats
     ])

-    # 已生成的因果鏈
-    existing_text = ""
-    if existing_chains:
-        chains_list = [
-            "→".join([chain.get(cat.name if hasattr(cat, 'name') else cat['name'], '?') for cat in sorted_cats])
-            for chain in existing_chains
-        ]
-        existing_text = f"\n【已生成,請勿重複】\n" + "\n".join([f"- {c}" for c in chains_list])
+    if lang == "en":
+        # Already generated causal chains
+        existing_text = ""
+        if existing_chains:
+            chains_list = [
+                "→".join([chain.get(cat.name if hasattr(cat, 'name') else cat['name'], '?') for cat in sorted_cats])
+                for chain in existing_chains
+            ]
+            existing_text = "\n[Already generated, do not repeat]\n" + "\n".join([f"- {c}" for c in chains_list])

-    # JSON 模板
-    json_template = {cat.name if hasattr(cat, 'name') else cat['name']: f"選擇的{cat.name if hasattr(cat, 'name') else cat['name']}" for cat in sorted_cats}
+        # JSON template
+        json_template = {cat.name if hasattr(cat, 'name') else cat['name']: f"selected {cat.name if hasattr(cat, 'name') else cat['name']}" for cat in sorted_cats}

-    return f"""/no_think
+        return f"""/no_think
+Generate causal chain #{chain_index} for "{query}".
+
+[Available Attributes]
+{available_attrs}
+{existing_text}
+
+[Rules]
+1. Select one attribute from each category
+2. Causal relationships must be logical
+3. Do not repeat
+
+Return JSON only:
+{json.dumps(json_template, ensure_ascii=False, indent=2)}"""
+    else:
+        # 已生成的因果鏈
+        existing_text = ""
+        if existing_chains:
+            chains_list = [
+                "→".join([chain.get(cat.name if hasattr(cat, 'name') else cat['name'], '?') for cat in sorted_cats])
+                for chain in existing_chains
+            ]
+            existing_text = "\n【已生成,請勿重複】\n" + "\n".join([f"- {c}" for c in chains_list])
+
+        # JSON 模板
+        json_template = {cat.name if hasattr(cat, 'name') else cat['name']: f"選擇的{cat.name if hasattr(cat, 'name') else cat['name']}" for cat in sorted_cats}
+
+        return f"""/no_think
 為「{query}」生成第 {chain_index} 條因果鏈。

 【可選屬性】
@@ -230,20 +386,46 @@ def get_step2_dag_relationships_prompt(
     query: str,
     categories: List,  # List[CategoryDefinition]
     attributes_by_category: Dict[str, List[str]],
+    lang: LanguageType = "zh"
 ) -> str:
-    """生成相鄰類別之間的自然關係"""
+    """Generate natural relationships between adjacent categories"""
     sorted_cats = sorted(categories, key=lambda x: x.order if hasattr(x, 'order') else x.get('order', 0))

     # Build attribute listing
     attr_listing = "\n".join([
-        f"【{cat.name if hasattr(cat, 'name') else cat['name']}】{', '.join(attributes_by_category.get(cat.name if hasattr(cat, 'name') else cat['name'], []))}"
+        f"[{cat.name if hasattr(cat, 'name') else cat['name']}] {', '.join(attributes_by_category.get(cat.name if hasattr(cat, 'name') else cat['name'], []))}"
        for cat in sorted_cats
     ])

     # Build direction hints
     direction_hints = "→".join([cat.name if hasattr(cat, 'name') else cat['name'] for cat in sorted_cats])

-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+Analyze the attribute relationships of "{query}".
+
+{attr_listing}
+
+[Relationship Direction] {direction_hints}
+
+[Rules]
+1. Only establish relationships between adjacent categories (e.g., Materials→Functions, Functions→Usages)
+2. Only output pairs that have true causal or associative relationships
+3. An attribute can connect to multiple downstream attributes, or none at all
+4. Not every attribute needs to have connections
+5. Relationships should be reasonable and meaningful
+
+Return JSON:
+{{
+  "relationships": [
+    {{"source_category": "CategoryA", "source": "attribute name", "target_category": "CategoryB", "target": "attribute name"}},
+    ...
+  ]
+}}
+
+Return JSON only."""
+    else:
+        return f"""/no_think
 分析「{query}」的屬性關係。

 {attr_listing}


@@ -1,34 +1,68 @@
-"""Expert Transformation Agent 提示詞模組"""
+"""Expert Transformation Agent prompts module - Bilingual support"""
 from typing import List, Optional
+from .language_config import LanguageType

 def get_expert_generation_prompt(
     query: str,
     categories: List[str],
     expert_count: int,
-    custom_experts: Optional[List[str]] = None
+    custom_experts: Optional[List[str]] = None,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 0: 生成專家團隊(不依賴主題,純隨機多元)"""
+    """Step 0: Generate expert team (not dependent on topic, purely random and diverse)"""
     import time
     import random

-    custom_text = ""
-    if custom_experts and len(custom_experts) > 0:
-        custom_text = f"(已指定:{', '.join(custom_experts[:expert_count])})"
-
-    # 加入時間戳和隨機數來增加多樣性
+    # Add timestamp and random number for diversity
     seed = int(time.time() * 1000) % 10000
-    diversity_hints = [
-        "冷門、非主流、跨領域",
-        "罕見職業、新興領域、邊緣學科",
-        "非傳統、創新、小眾專業",
-        "未來趨向、實驗性、非常規",
-        "跨文化、混合領域、獨特視角"
-    ]
-    hint = random.choice(diversity_hints)

-    return f"""/no_think
+    if lang == "en":
+        custom_text = ""
+        if custom_experts and len(custom_experts) > 0:
+            custom_text = f" (Specified: {', '.join(custom_experts[:expert_count])})"
+        diversity_hints = [
+            "obscure, non-mainstream, cross-disciplinary",
+            "rare occupations, emerging fields, fringe disciplines",
+            "unconventional, innovative, niche specialties",
+            "future-oriented, experimental, non-traditional",
+            "cross-cultural, hybrid fields, unique perspectives"
+        ]
+        hint = random.choice(diversity_hints)
+        return f"""/no_think
+Randomly assemble a team of {expert_count} experts from completely different fields{custom_text}.
+
+[Innovation Requirements] (Random seed: {seed})
+- Prioritize {hint} experts
+- Avoid common professions (such as doctors, engineers, teachers, lawyers, etc.)
+- Each expert must be from a completely unrelated field
+- The rarer and more innovative, the better
+
+Return JSON:
+{{"experts": [{{"id": "expert-0", "name": "profession", "domain": "field", "perspective": "viewpoint"}}, ...]}}
+
+Rules:
+- id should be expert-0 to expert-{expert_count - 1}
+- name is the profession name (not a person's name), 2-5 words
+- domain should be specific and unique, no duplicate types"""
+    else:
+        custom_text = ""
+        if custom_experts and len(custom_experts) > 0:
+            custom_text = f"(已指定:{', '.join(custom_experts[:expert_count])})"
+        diversity_hints = [
+            "冷門、非主流、跨領域",
+            "罕見職業、新興領域、邊緣學科",
+            "非傳統、創新、小眾專業",
+            "未來趨向、實驗性、非常規",
+            "跨文化、混合領域、獨特視角"
+        ]
+        hint = random.choice(diversity_hints)
+        return f"""/no_think
 隨機組建 {expert_count} 個來自完全不同領域的專家團隊{custom_text}。

 【創新要求】(隨機種子:{seed})
@@ -50,13 +84,39 @@ def get_expert_keyword_generation_prompt(
     category: str,
     attribute: str,
     experts: List[dict],  # List[ExpertProfile]
-    keywords_per_expert: int = 1
+    keywords_per_expert: int = 1,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 1: 專家視角關鍵字生成"""
-    # 建立專家列表,格式更清晰
+    """Step 1: Expert perspective keyword generation"""
+    # Build expert list in clearer format
     experts_list = "\n".join([f"- {exp['id']}: {exp['name']}" for exp in experts])

-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+You need to play the role of the following experts to generate innovative keywords for an attribute:
+
+[Expert List]
+{experts_list}
+
+[Task]
+Attribute: "{attribute}" (Category: {category})
+
+For each expert, please:
+1. First understand the professional background, knowledge domain, and work content of that profession
+2. Think about "{attribute}" from that profession's unique perspective
+3. Generate {keywords_per_expert} innovative keyword(s) related to that specialty (2-6 words)
+
+Keywords must reflect that expert's professional thinking style, for example:
+- Accountant viewing "movement" → "cash flow", "cost-benefit"
+- Architect viewing "movement" → "circulation design", "spatial flow"
+- Psychologist viewing "movement" → "behavioral motivation", "emotional transition"
+
+Return JSON:
+{{"keywords": [{{"keyword": "term", "expert_id": "expert-X", "expert_name": "name"}}, ...]}}
+
+Total of {len(experts) * keywords_per_expert} keywords needed, each keyword must be clearly related to the corresponding expert's professional field."""
+    else:
+        return f"""/no_think
 你需要扮演以下專家,為屬性生成創新關鍵字:

 【專家名單】
@@ -86,13 +146,29 @@ def get_single_description_prompt(
     keyword: str,
     expert_id: str,
     expert_name: str,
-    expert_domain: str
+    expert_domain: str,
+    lang: LanguageType = "zh"
 ) -> str:
-    """Step 2: 為單一關鍵字生成描述"""
-    # 如果 domain 是通用的,就只用職業名稱
-    domain_text = f"({expert_domain}領域)" if expert_domain and expert_domain != "Professional Field" else ""
+    """Step 2: Generate description for a single keyword"""
+    if lang == "en":
+        # If domain is generic, just use profession name
+        domain_text = f" ({expert_domain} field)" if expert_domain and expert_domain != "Professional Field" else ""
+        return f"""/no_think
+You are a {expert_name}{domain_text}.
+
+Task: Generate an innovative application description for "{query}".
+
+Keyword: {keyword}
+
+From your professional perspective, explain how to apply the concept of "{keyword}" to "{query}". The description should be specific, creative, 15-30 words.
+
+Return JSON only, no other text:
+{{"description": "your innovative application description"}}"""
+    else:
+        # 如果 domain 是通用的,就只用職業名稱
+        domain_text = f"({expert_domain}領域)" if expert_domain and expert_domain != "Professional Field" else ""
+        return f"""/no_think
 你是一位{expert_name}{domain_text}。

 任務:為「{query}」生成一段創新應用描述。


@@ -0,0 +1,51 @@
"""Language configuration for prompts"""
from enum import Enum
from typing import Literal
class Language(str, Enum):
CHINESE = "zh"
ENGLISH = "en"
LanguageType = Literal["zh", "en"]
# Default categories for each language
DEFAULT_CATEGORIES = {
"zh": ["材料", "功能", "用途", "使用族群", "特性"],
"en": ["Materials", "Functions", "Usages", "User Groups", "Characteristics"],
}
CATEGORY_DESCRIPTIONS = {
"zh": {
"材料": "物件由什麼材料組成",
"功能": "物件能做什麼",
"用途": "物件在什麼場景使用",
"使用族群": "誰會使用這個物件",
"特性": "物件有什麼特徵",
},
"en": {
"Materials": "What materials the object is made of",
"Functions": "What the object can do",
"Usages": "In what scenarios the object is used",
"User Groups": "Who uses this object",
"Characteristics": "What features the object has",
},
}
# Category name mappings between languages
CATEGORY_MAPPING = {
"zh_to_en": {
"材料": "Materials",
"功能": "Functions",
"用途": "Usages",
"使用族群": "User Groups",
"特性": "Characteristics",
},
"en_to_zh": {
"Materials": "材料",
"Functions": "功能",
"Usages": "用途",
"User Groups": "使用族群",
"Characteristics": "特性",
},
}
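
A quick round-trip through the tables above (the import path is assumed from the `backend/app/prompts/` layout described in CLAUDE.md):

```python
# Sanity check of the bilingual mapping tables; import path assumed.
from app.prompts.language_config import CATEGORY_MAPPING, DEFAULT_CATEGORIES

assert CATEGORY_MAPPING["zh_to_en"]["材料"] == "Materials"
assert CATEGORY_MAPPING["en_to_zh"]["Materials"] == "材料"
assert len(DEFAULT_CATEGORIES["en"]) == len(DEFAULT_CATEGORIES["zh"])
```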


@@ -1,22 +1,43 @@
-"""Transformation Agent 提示詞模組"""
+"""Transformation Agent prompts module - Bilingual support"""
 from typing import List
+from .language_config import LanguageType

 def get_keyword_generation_prompt(
     category: str,
     attributes: List[str],
-    keyword_count: int = 3
+    keyword_count: int = 3,
+    lang: LanguageType = "zh"
 ) -> str:
     """
-    Step 1: 生成新關鍵字
-    給定類別和現有屬性,生成全新的、有創意的關鍵字。
-    不考慮原始查詢,只專注於類別本身可能的延伸。
+    Step 1: Generate new keywords
+    Given a category and existing attributes, generate new, creative keywords.
+    Don't consider the original query, focus only on possible extensions of the category itself.
     """
-    attrs_text = "、".join(attributes)
+    attrs_text = ", ".join(attributes) if lang == "en" else "、".join(attributes)

-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+You are a creative brainstorming expert. Given a category and its existing attributes, please generate new, creative keywords or descriptive phrases.
+
+[Category] {category}
+
+[Existing Attributes] {attrs_text}
+
+[Important Rules]
+1. Generate {keyword_count} completely new keywords
+2. Keywords must fit within the scope of "{category}" category
+3. Keywords should be creative and not duplicate or be too similar to existing attributes
+4. Don't consider any specific object, focus only on possible extensions of this category
+5. Each keyword should be 2-6 words
+
+Return JSON only:
+{{
+  "keywords": ["keyword1", "keyword2", "keyword3"]
+}}"""
+    else:
+        return f"""/no_think
 你是一個創意發想專家。給定一個類別和該類別下的現有屬性,請生成全新的、有創意的關鍵字或描述片段。

 【類別】{category}
@@ -38,14 +59,36 @@ def get_keyword_generation_prompt(
 def get_description_generation_prompt(
     query: str,
     category: str,
-    keyword: str
+    keyword: str,
+    lang: LanguageType = "zh"
 ) -> str:
     """
-    Step 2: 結合原始查詢生成描述
-    用新關鍵字創造一個與原始查詢相關的創新應用描述。
+    Step 2: Combine with original query to generate description
+    Use new keyword to create an innovative application description related to the original query.
     """
-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+You are an innovation application expert. Please apply a new keyword concept to a specific object to create an innovative application description.
+
+[Object] {query}
+
+[Category] {category}
+
+[New Keyword] {keyword}
+
+[Task]
+Using the concept of "{keyword}", create an innovative application description for "{query}".
+The description should be a complete sentence or phrase explaining how to apply this new concept to the object.
+
+[Example Format]
+- If the object is "bicycle" and keyword is "monitor", you could generate "bicycle monitors the rider's health status"
+- If the object is "umbrella" and keyword is "generate power", you could generate "umbrella generates electricity using raindrop impacts"
+
+Return JSON only:
+{{
+  "description": "innovative application description"
+}}"""
+    else:
+        return f"""/no_think
 你是一個創新應用專家。請將一個新的關鍵字概念應用到特定物件上,創造出創新的應用描述。

 【物件】{query}
@@ -69,15 +112,35 @@ def get_description_generation_prompt(
 def get_batch_description_prompt(
     query: str,
     category: str,
-    keywords: List[str]
+    keywords: List[str],
+    lang: LanguageType = "zh"
 ) -> str:
     """
-    批次生成描述(可選的優化版本,一次處理多個關鍵字)
+    Batch description generation (optional optimized version, process multiple keywords at once)
     """
-    keywords_text = "、".join(keywords)
+    keywords_text = ", ".join(keywords) if lang == "en" else "、".join(keywords)
     keywords_json = ", ".join([f'"{k}"' for k in keywords])

-    return f"""/no_think
+    if lang == "en":
+        return f"""/no_think
+You are an innovation application expert. Please apply multiple new keyword concepts to a specific object, creating an innovative application description for each keyword.
+
+[Object] {query}
+
+[Category] {category}
+
+[New Keywords] {keywords_text}
+
+[Task]
+Create an innovative application description related to "{query}" for each keyword.
+Each description should be a complete sentence or phrase.
+
+Return JSON only:
+{{
+  "descriptions": [
+    {{"keyword": "keyword1", "description": "description1"}},
+    {{"keyword": "keyword2", "description": "description2"}}
+  ]
+}}"""
+    else:
+        return f"""/no_think
 你是一個創新應用專家。請將多個新的關鍵字概念應用到特定物件上,為每個關鍵字創造創新的應用描述。

 【物件】{query}


@@ -58,7 +58,8 @@ async def execute_step0(
     prompt = get_step0_category_analysis_prompt(
         request.query,
         request.suggested_category_count,
-        exclude_categories=exclude_categories
+        exclude_categories=exclude_categories,
+        lang=request.lang
     )
     temperature = request.temperature if request.temperature is not None else 0.7
     response = await ollama_provider.generate(
@@ -310,7 +311,7 @@ async def generate_sse_events(request: StreamAnalyzeRequest) -> AsyncGenerator[str, None]:
     # ========== Step 1: Generate Attributes (Dynamic) ==========
     yield f"event: step1_start\ndata: {json.dumps({'message': '生成屬性...'}, ensure_ascii=False)}\n\n"

-    step1_prompt = get_step1_dynamic_attributes_prompt(request.query, final_categories)
+    step1_prompt = get_step1_dynamic_attributes_prompt(request.query, final_categories, lang=request.lang)
     logger.info(f"Step 1 prompt: {step1_prompt[:200]}")

     step1_response = await ollama_provider.generate(
@@ -330,6 +331,7 @@ async def generate_sse_events(request: StreamAnalyzeRequest) -> AsyncGenerator[str, None]:
         query=request.query,
         categories=final_categories,
         attributes_by_category=step1_result.attributes,
+        lang=request.lang
     )
     logger.info(f"Step 2 (relationships) prompt: {step2_prompt[:300]}")


@@ -63,7 +63,8 @@ async def deduplicate_descriptions(request: DeduplicationRequest) -> Deduplicati
         # 使用 LLM 成對比較去重
         result = await llm_deduplication_service.deduplicate(
             descriptions=request.descriptions,
-            model=request.model
+            model=request.model,
+            lang=request.lang
         )
         return result
     except ValueError as e:

View File

@@ -68,7 +68,8 @@ async def generate_expert_transformation_events(
         query=request.query,
         categories=all_categories,
         expert_count=request.expert_count,
-        custom_experts=actual_custom_experts if actual_custom_experts else None
+        custom_experts=actual_custom_experts if actual_custom_experts else None,
+        lang=request.lang
     )
     logger.info(f"Expert prompt: {expert_prompt[:200]}")
@@ -119,7 +120,8 @@ async def generate_expert_transformation_events(
             query=request.query,
             categories=all_categories,
             expert_count=request.expert_count,
-            custom_experts=actual_custom_experts if actual_custom_experts else None
+            custom_experts=actual_custom_experts if actual_custom_experts else None,
+            lang=request.lang
         )
         expert_response = await ollama_provider.generate(
@@ -160,7 +162,8 @@ async def generate_expert_transformation_events(
             category=request.category,
             attribute=attribute,
             experts=[e.model_dump() for e in experts],
-            keywords_per_expert=request.keywords_per_expert
+            keywords_per_expert=request.keywords_per_expert,
+            lang=request.lang
         )
         logger.info(f"Keyword prompt for '{attribute}': {kw_prompt[:300]}")
@@ -214,7 +217,8 @@ async def generate_expert_transformation_events(
             keyword=kw.keyword,
             expert_id=kw.expert_id,
             expert_name=kw.expert_name,
-            expert_domain=expert_domain
+            expert_domain=expert_domain,
+            lang=request.lang
         )
         desc_response = await ollama_provider.generate(

View File

@@ -0,0 +1,137 @@
"""Patent Search Router - Search for similar patents using Lens.org API"""
import logging
from typing import Optional, List
from fastapi import APIRouter
from pydantic import BaseModel
from ..services.patent_search_service import patent_search_service
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/patent", tags=["patent"])
# ===== Request/Response Models =====
class PatentSearchRequest(BaseModel):
"""Patent search request"""
query: str # Search query (description or keywords)
max_results: int = 10 # Maximum results to return (1-20)
class PatentResult(BaseModel):
"""Single patent result from Lens.org"""
lens_id: str
doc_number: str
jurisdiction: str
kind: str
title: str
abstract: Optional[str] = None
date_published: Optional[str] = None
applicants: List[str] = []
inventors: List[str] = []
legal_status: Optional[str] = None
classifications_cpc: List[str] = []
families_simple: List[str] = []
url: str
class PatentSearchResponse(BaseModel):
"""Patent search response"""
query: str
total_results: int
patents: List[PatentResult]
error: Optional[str] = None
class BatchPatentSearchRequest(BaseModel):
"""Batch patent search request - search multiple descriptions"""
queries: List[str] # List of descriptions to search
max_results_per_query: int = 5 # Max results per query
class BatchPatentSearchResult(BaseModel):
"""Results for a single query in batch search"""
query: str
total_results: int
patents: List[PatentResult]
error: Optional[str] = None
class BatchPatentSearchResponse(BaseModel):
"""Batch patent search response"""
results: List[BatchPatentSearchResult]
total_queries: int
# ===== Endpoints =====
@router.post("/search", response_model=PatentSearchResponse)
async def search_patents(request: PatentSearchRequest):
"""
Search for patents similar to the given description/query.
Uses Lens.org API to find related patents based on title, abstract, and claims.
"""
logger.info(f"Patent search request: {request.query[:100]}...")
# Limit max_results to reasonable range
max_results = min(max(1, request.max_results), 20)
result = await patent_search_service.search(
query=request.query,
max_results=max_results,
)
return PatentSearchResponse(
query=request.query,
total_results=result.get("total_results", 0),
patents=[PatentResult(**p) for p in result.get("patents", [])],
error=result.get("error"),
)
@router.post("/search/batch", response_model=BatchPatentSearchResponse)
async def batch_search_patents(request: BatchPatentSearchRequest):
"""
Search for patents for multiple descriptions at once.
Useful for checking multiple creative descriptions against patents.
"""
logger.info(f"Batch patent search: {len(request.queries)} queries")
# Limit results per query
max_per_query = min(max(1, request.max_results_per_query), 10)
results: List[BatchPatentSearchResult] = []
for query in request.queries:
result = await patent_search_service.search(
query=query,
max_results=max_per_query,
)
results.append(BatchPatentSearchResult(
query=query,
total_results=result.get("total_results", 0),
patents=[PatentResult(**p) for p in result.get("patents", [])],
error=result.get("error"),
))
return BatchPatentSearchResponse(
results=results,
total_queries=len(request.queries),
)
@router.get("/health")
async def patent_search_health():
"""Check if patent search service is working"""
# Do a simple test search
result = await patent_search_service.search("test", max_results=1)
if result.get("error"):
return {"status": "unhealthy", "error": result["error"]}
return {"status": "healthy"}

View File

@@ -36,7 +36,8 @@ async def generate_transformation_events(
     keyword_prompt = get_keyword_generation_prompt(
         category=request.category,
         attributes=request.attributes,
-        keyword_count=request.keyword_count
+        keyword_count=request.keyword_count,
+        lang=request.lang
     )
     logger.info(f"Keyword prompt: {keyword_prompt[:200]}")
@@ -61,7 +62,8 @@ async def generate_transformation_events(
     desc_prompt = get_batch_description_prompt(
         query=request.query,
         category=request.category,
-        keywords=new_keywords
+        keywords=new_keywords,
+        lang=request.lang
     )
     logger.info(f"Description prompt: {desc_prompt[:300]}")

View File

@@ -26,7 +26,7 @@ class EmbeddingService:
     def __init__(self):
         self.base_url = settings.ollama_base_url
-        self.default_model = "nomic-embed-text"  # Ollama 預設的 embedding 模型
+        self.default_model = "qwen3-embedding:4b"  # Qwen3 embedding model for better semantic understanding
         self.client = httpx.AsyncClient(timeout=120.0)

     async def get_embedding(self, text: str, model: Optional[str] = None) -> List[float]:

View File

@@ -1,12 +1,12 @@
""" """
LLM Deduplication Service - 使用 LLM 成對比較進行去重 LLM Deduplication Service - Using LLM pairwise comparison for deduplication
LLM 判斷兩個描述是否語意重複,透過並行處理加速。 Let LLM determine whether two descriptions are semantically duplicate, accelerated by parallel processing.
""" """
import asyncio import asyncio
import logging import logging
from typing import List, Tuple, Optional from typing import List, Tuple, Optional, Literal
import httpx import httpx
import numpy as np import numpy as np
@@ -18,6 +18,7 @@ from ..models.schemas import (
DeduplicationMethod, DeduplicationMethod,
DescriptionGroup, DescriptionGroup,
) )
from ..prompts.language_config import LanguageType
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -31,27 +32,20 @@ class LLMDeduplicationService:
         self.client = httpx.AsyncClient(timeout=60.0)
         self.max_concurrent = 5  # 最大並行數,避免 Ollama 過載

-    async def compare_pair(
-        self,
-        desc1: str,
-        desc2: str,
-        model: str,
-        semaphore: asyncio.Semaphore
-    ) -> bool:
-        """
-        讓 LLM 判斷兩個描述是否語意重複
-
-        Args:
-            desc1: 第一個描述
-            desc2: 第二個描述
-            model: LLM 模型名稱
-            semaphore: 並行控制信號量
-
-        Returns:
-            bool: 是否為重複描述
-        """
-        async with semaphore:  # 控制並行數
-            prompt = f"""判斷以下兩個創新描述是否表達相同或非常相似的概念:
+    def _get_comparison_prompt(self, desc1: str, desc2: str, lang: LanguageType = "zh") -> str:
+        """Get comparison prompt in the specified language"""
+        if lang == "en":
+            return f"""Determine whether the following two innovative descriptions express the same or very similar concepts:
+
+Description 1: {desc1}
+
+Description 2: {desc2}
+
+If both descriptions essentially express the same or very similar innovative concept, answer "YES"
+If the two descriptions express different innovative concepts, answer "NO"
+Only answer YES or NO, no other text"""
+        else:
+            return f"""判斷以下兩個創新描述是否表達相同或非常相似的概念:

 描述1: {desc1}
@@ -61,6 +55,30 @@ class LLMDeduplicationService:
 如果兩者描述不同的創新概念,回答 "NO"
 只回答 YES 或 NO,不要其他文字"""

+    async def compare_pair(
+        self,
+        desc1: str,
+        desc2: str,
+        model: str,
+        semaphore: asyncio.Semaphore,
+        lang: LanguageType = "zh"
+    ) -> bool:
+        """
+        Let LLM determine whether two descriptions are semantically duplicate
+
+        Args:
+            desc1: First description
+            desc2: Second description
+            model: LLM model name
+            semaphore: Concurrency control semaphore
+            lang: Language for the prompt
+
+        Returns:
+            bool: Whether the descriptions are duplicates
+        """
+        async with semaphore:  # Control concurrency
+            prompt = self._get_comparison_prompt(desc1, desc2, lang)
             try:
                 response = await self.client.post(
                     f"{self.base_url}/api/generate",
@@ -86,26 +104,28 @@ class LLMDeduplicationService:
     async def compare_batch(
         self,
         pairs: List[Tuple[int, int, str, str]],
-        model: str
+        model: str,
+        lang: LanguageType = "zh"
     ) -> List[Tuple[int, int, bool]]:
         """
-        並行批次比較多個描述對
+        Parallel batch comparison of multiple description pairs

         Args:
-            pairs: 待比較的配對列表 [(i, j, desc1, desc2), ...]
-            model: LLM 模型名稱
+            pairs: List of pairs to compare [(i, j, desc1, desc2), ...]
+            model: LLM model name
+            lang: Language for the prompt

         Returns:
-            比較結果列表 [(i, j, is_similar), ...]
+            List of comparison results [(i, j, is_similar), ...]
         """
         semaphore = asyncio.Semaphore(self.max_concurrent)

         async def compare_one(pair: Tuple[int, int, str, str]) -> Tuple[int, int, bool]:
             i, j, desc1, desc2 = pair
-            is_similar = await self.compare_pair(desc1, desc2, model, semaphore)
+            is_similar = await self.compare_pair(desc1, desc2, model, semaphore, lang)
             return (i, j, is_similar)

-        # 使用 asyncio.gather 並行執行所有比較
+        # Use asyncio.gather to execute all comparisons in parallel
         results = await asyncio.gather(*[compare_one(p) for p in pairs])
         return results
@@ -144,17 +164,19 @@ class LLMDeduplicationService:
     async def deduplicate(
         self,
         descriptions: List[ExpertTransformationDescription],
-        model: Optional[str] = None
+        model: Optional[str] = None,
+        lang: LanguageType = "zh"
     ) -> DeduplicationResult:
         """
-        使用 LLM 成對比較進行去重
+        Use LLM pairwise comparison for deduplication

         Args:
-            descriptions: 要去重的描述列表
-            model: LLM 模型名稱
+            descriptions: List of descriptions to deduplicate
+            model: LLM model name
+            lang: Language for the prompt

         Returns:
-            DeduplicationResult: 去重結果
+            DeduplicationResult: Deduplication result
         """
         model = model or self.default_model
@@ -188,10 +210,10 @@ class LLMDeduplicationService:
                 ))

         total_pairs = len(pairs)
-        logger.info(f"LLM deduplication: {total_pairs} pairs to compare (parallel={self.max_concurrent}, model={model})")
+        logger.info(f"LLM deduplication: {total_pairs} pairs to compare (parallel={self.max_concurrent}, model={model}, lang={lang})")

-        # 並行批次比較
-        results = await self.compare_batch(pairs, model)
+        # Parallel batch comparison
+        results = await self.compare_batch(pairs, model, lang)

         # 填入相似度矩陣
         for i, j, is_similar in results:

View File

@@ -0,0 +1,264 @@
"""Patent Search Service using Lens.org API"""
import httpx
import logging
from typing import List, Optional, Dict, Any
from dataclasses import dataclass, asdict
from app.config import settings
logger = logging.getLogger(__name__)
@dataclass
class PatentSearchResult:
"""Single patent search result from Lens.org"""
lens_id: str
doc_number: str
jurisdiction: str
kind: str
title: str
abstract: Optional[str]
date_published: Optional[str]
applicants: List[str]
inventors: List[str]
legal_status: Optional[str]
classifications_cpc: List[str]
families_simple: List[str]
url: str
def to_dict(self) -> Dict[str, Any]:
return asdict(self)
class PatentSearchService:
"""Service for searching patents using Lens.org API"""
LENS_API_URL = "https://api.lens.org/patent/search"
def __init__(self):
self._client: Optional[httpx.AsyncClient] = None
async def _get_client(self) -> httpx.AsyncClient:
if self._client is None or self._client.is_closed:
self._client = httpx.AsyncClient(
timeout=30.0,
follow_redirects=True,
)
return self._client
async def close(self):
if self._client and not self._client.is_closed:
await self._client.aclose()
def _get_headers(self) -> Dict[str, str]:
"""Get headers with authorization token"""
token = settings.lens_api_token
if not token:
raise ValueError("LENS_API_TOKEN environment variable is not set")
return {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
"Accept": "application/json",
}
async def search(
self,
query: str,
max_results: int = 10,
) -> dict:
"""
Search Lens.org for relevant patents
Args:
query: Search query (searches title, abstract, and claims)
max_results: Maximum number of results to return
Returns:
Dict with total_results count and list of patent results
"""
try:
client = await self._get_client()
# Build Lens.org query using query string format for full-text search
request_body = {
"query": query,
"size": max_results,
"sort": [{"_score": "desc"}]
}
logger.info(f"Searching Lens.org patents with query: {query[:100]}...")
response = await client.post(
self.LENS_API_URL,
json=request_body,
headers=self._get_headers(),
)
if response.status_code == 401:
logger.error("Lens.org API authentication failed - check LENS_API_TOKEN")
return {
"total_results": 0,
"patents": [],
"error": "Authentication failed - invalid API token"
}
if response.status_code == 429:
logger.warning("Lens.org API rate limit exceeded")
return {
"total_results": 0,
"patents": [],
"error": "Rate limit exceeded - please try again later"
}
if response.status_code != 200:
logger.error(f"Lens.org API returned status {response.status_code}: {response.text}")
return {
"total_results": 0,
"patents": [],
"error": f"API returned status {response.status_code}"
}
data = response.json()
total_results = data.get("total", 0)
results = data.get("data", [])
patents: List[PatentSearchResult] = []
for item in results:
patent = self._parse_patent(item)
patents.append(patent)
logger.info(f"Found {total_results} total patents, returning {len(patents)}")
return {
"total_results": total_results,
"patents": [p.to_dict() for p in patents],
}
except ValueError as e:
logger.error(f"Configuration error: {e}")
return {
"total_results": 0,
"patents": [],
"error": str(e)
}
except httpx.HTTPError as e:
logger.error(f"HTTP error searching patents: {e}")
return {
"total_results": 0,
"patents": [],
"error": str(e)
}
except Exception as e:
logger.error(f"Error searching patents: {e}")
return {
"total_results": 0,
"patents": [],
"error": str(e)
}
def _parse_patent(self, item: Dict[str, Any]) -> PatentSearchResult:
"""Parse a single patent result from Lens.org response"""
lens_id = item.get("lens_id", "")
jurisdiction = item.get("jurisdiction", "")
doc_number = item.get("doc_number", "")
kind = item.get("kind", "")
# Get biblio section (contains title, parties, classifications)
biblio = item.get("biblio", {})
# Extract title from biblio.invention_title (list with lang info)
title_data = biblio.get("invention_title", [])
title = self._extract_text_with_lang(title_data)
# Extract abstract (top-level, list with lang info)
abstract_data = item.get("abstract", [])
abstract = self._extract_text_with_lang(abstract_data)
# Extract applicants from biblio.parties.applicants
parties = biblio.get("parties", {})
applicants = []
applicant_data = parties.get("applicants", [])
if isinstance(applicant_data, list):
for app in applicant_data:
if isinstance(app, dict):
name = app.get("extracted_name", {}).get("value", "")
if name:
applicants.append(name)
# Extract inventors from biblio.parties.inventors
inventors = []
inventor_data = parties.get("inventors", [])
if isinstance(inventor_data, list):
for inv in inventor_data:
if isinstance(inv, dict):
name = inv.get("extracted_name", {}).get("value", "")
if name:
inventors.append(name)
# Extract legal status
legal_status_data = item.get("legal_status", {})
legal_status = None
if isinstance(legal_status_data, dict):
legal_status = legal_status_data.get("patent_status")
# Extract CPC classifications from biblio.classifications_cpc
classifications_cpc = []
cpc_data = biblio.get("classifications_cpc", [])
if isinstance(cpc_data, list):
for cpc in cpc_data:
if isinstance(cpc, dict):
symbol = cpc.get("symbol", "")
if symbol:
classifications_cpc.append(symbol)
# Extract simple family members
families_simple = []
families_data = item.get("families", {})
if isinstance(families_data, dict):
simple_family = families_data.get("simple", {})
if isinstance(simple_family, dict):
members = simple_family.get("members", [])
if isinstance(members, list):
families_simple = [m.get("lens_id", "") for m in members if isinstance(m, dict) and m.get("lens_id")]
# Build URL to Lens.org patent page
url = f"https://www.lens.org/lens/patent/{lens_id}" if lens_id else ""
return PatentSearchResult(
lens_id=lens_id,
doc_number=doc_number,
jurisdiction=jurisdiction,
kind=kind,
title=title,
abstract=abstract,
date_published=item.get("date_published"),
applicants=applicants,
inventors=inventors,
legal_status=legal_status,
classifications_cpc=classifications_cpc,
families_simple=families_simple,
url=url,
)
def _extract_text_with_lang(self, data: Any, prefer_lang: str = "en") -> str:
"""Extract text from Lens.org language-tagged list, preferring specified language"""
if not data:
return ""
if isinstance(data, str):
return data
if isinstance(data, list) and data:
# Prefer specified language
for item in data:
if isinstance(item, dict) and item.get("lang") == prefer_lang:
return item.get("text", "")
# Fall back to first item
first = data[0]
if isinstance(first, dict):
return first.get("text", "")
return str(first)
return ""
# Singleton instance
patent_search_service = PatentSearchService()

7
experiments/__init__.py Normal file
View File

@@ -0,0 +1,7 @@
"""
Experiment module for 5-condition idea generation study.
This module implements a 2×2 factorial design + control to test
the contributions of attribute decomposition and expert perspectives
to creative ideation quality.
"""

View File

@@ -0,0 +1,546 @@
"""
Statistical analysis for experiment results.
Performs:
- 2×2 ANOVA for main effects (attributes, experts) and interaction
- Post-hoc tests (Tukey HSD)
- Effect sizes (Cohen's d)
- Control comparison (C2 vs C5)
Usage:
python -m experiments.analyze_results --input results/experiment_xxx_metrics.json
"""
import sys
import json
import argparse
from pathlib import Path
from typing import List, Dict, Any, Tuple
from dataclasses import dataclass
import numpy as np
class NumpyEncoder(json.JSONEncoder):
"""JSON encoder that handles numpy types."""
def default(self, obj):
if isinstance(obj, (np.integer, np.int64, np.int32)):
return int(obj)
if isinstance(obj, (np.floating, np.float64, np.float32)):
return float(obj)
if isinstance(obj, (np.bool_, bool)):
return bool(obj)
if isinstance(obj, np.ndarray):
return obj.tolist()
return super().default(obj)
# Add experiments to path
sys.path.insert(0, str(Path(__file__).parent.parent))
from experiments.config import RESULTS_DIR
# Try to import statistical libraries
try:
from scipy import stats
SCIPY_AVAILABLE = True
except ImportError:
SCIPY_AVAILABLE = False
print("Warning: scipy not installed. Some statistical tests will be unavailable.")
try:
import pandas as pd
PANDAS_AVAILABLE = True
except ImportError:
PANDAS_AVAILABLE = False
@dataclass
class EffectSize:
"""Cohen's d effect size with interpretation."""
d: float
interpretation: str # small, medium, large
@staticmethod
def from_groups(group1: List[float], group2: List[float]) -> 'EffectSize':
"""Calculate Cohen's d from two groups."""
n1, n2 = len(group1), len(group2)
if n1 < 2 or n2 < 2:
return EffectSize(d=0, interpretation="insufficient data")
mean1, mean2 = np.mean(group1), np.mean(group2)
var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
# Pooled standard deviation
pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
if pooled_std == 0:
return EffectSize(d=0, interpretation="no variance")
d = (mean1 - mean2) / pooled_std
# Interpretation (Cohen's conventions)
abs_d = abs(d)
if abs_d < 0.2:
interpretation = "negligible"
elif abs_d < 0.5:
interpretation = "small"
elif abs_d < 0.8:
interpretation = "medium"
else:
interpretation = "large"
return EffectSize(d=round(d, 4), interpretation=interpretation)
@dataclass
class TTestResult:
"""Independent samples t-test result."""
t_statistic: float
p_value: float
effect_size: EffectSize
significant: bool # p < 0.05
group1_mean: float
group2_mean: float
group1_std: float
group2_std: float
group1_n: int
group2_n: int
@dataclass
class ANOVAResult:
"""2×2 ANOVA result."""
main_effect_attributes: Dict[str, float] # F, p
main_effect_experts: Dict[str, float] # F, p
interaction: Dict[str, float] # F, p
significant_effects: List[str]
def extract_metric_values(
metrics: Dict[str, Any],
metric_path: str
) -> Dict[str, List[float]]:
"""
Extract values for a specific metric across all queries.
Args:
metrics: Full metrics dict from compute_metrics.py
metric_path: Dot-separated path like "post_dedup_diversity.mean_pairwise_distance"
Returns:
Dict mapping condition name to list of values
"""
by_condition = {}
for query_metrics in metrics.get("metrics_by_query", []):
for condition, cond_metrics in query_metrics.get("conditions", {}).items():
if condition not in by_condition:
by_condition[condition] = []
# Navigate the metric path
value = cond_metrics
for key in metric_path.split("."):
if value is None:
break
if isinstance(value, dict):
value = value.get(key)
else:
value = None
if value is not None and isinstance(value, (int, float)):
by_condition[condition].append(float(value))
return by_condition
def perform_ttest(
group1: List[float],
group2: List[float],
group1_name: str = "Group 1",
group2_name: str = "Group 2"
) -> TTestResult:
"""Perform independent samples t-test."""
if not SCIPY_AVAILABLE:
return None
if len(group1) < 2 or len(group2) < 2:
return None
t_stat, p_value = stats.ttest_ind(group1, group2)
effect = EffectSize.from_groups(group1, group2)
return TTestResult(
t_statistic=round(t_stat, 4),
p_value=round(p_value, 4),
effect_size=effect,
significant=p_value < 0.05,
group1_mean=round(np.mean(group1), 4),
group2_mean=round(np.mean(group2), 4),
group1_std=round(np.std(group1, ddof=1), 4),
group2_std=round(np.std(group2, ddof=1), 4),
group1_n=len(group1),
group2_n=len(group2)
)
def perform_2x2_anova(
c1_direct: List[float], # No attributes, No experts
c2_expert: List[float], # No attributes, With experts
c3_attribute: List[float], # With attributes, No experts
c4_full: List[float] # With attributes, With experts
) -> ANOVAResult:
"""
Perform 2×2 factorial ANOVA.
Factors:
- Attributes: Without (C1, C2) vs With (C3, C4)
- Experts: Without (C1, C3) vs With (C2, C4)
"""
if not SCIPY_AVAILABLE:
return None
# Check minimum data
min_n = min(len(c1_direct), len(c2_expert), len(c3_attribute), len(c4_full))
if min_n < 2:
return None
# For a proper 2×2 ANOVA, we'd use statsmodels or similar
# Here we'll compute main effects and interaction manually
# Main effect of Attributes: (C3 + C4) vs (C1 + C2)
no_attr = c1_direct + c2_expert
with_attr = c3_attribute + c4_full
f_attr, p_attr = stats.f_oneway(no_attr, with_attr)
# Main effect of Experts: (C2 + C4) vs (C1 + C3)
no_expert = c1_direct + c3_attribute
with_expert = c2_expert + c4_full
f_expert, p_expert = stats.f_oneway(no_expert, with_expert)
# Interaction: Compare the difference of differences
# (C4 - C3) - (C2 - C1) = interaction term
# Simplified approach: compare all 4 groups
f_all, p_all = stats.f_oneway(c1_direct, c2_expert, c3_attribute, c4_full)
# Estimate interaction by checking if combination is super-additive
mean1, mean2, mean3, mean4 = np.mean(c1_direct), np.mean(c2_expert), np.mean(c3_attribute), np.mean(c4_full)
expected_additive = mean1 + (mean2 - mean1) + (mean3 - mean1) # Additive prediction
actual_combination = mean4
interaction_strength = actual_combination - expected_additive
significant_effects = []
if p_attr < 0.05:
significant_effects.append("Attributes")
if p_expert < 0.05:
significant_effects.append("Experts")
if p_all < 0.05 and abs(interaction_strength) > 0.01:
significant_effects.append("Interaction")
return ANOVAResult(
main_effect_attributes={"F": round(f_attr, 4), "p": round(p_attr, 4)},
main_effect_experts={"F": round(f_expert, 4), "p": round(p_expert, 4)},
interaction={
"F_all_groups": round(f_all, 4),
"p_all_groups": round(p_all, 4),
"interaction_strength": round(interaction_strength, 4),
"super_additive": interaction_strength > 0
},
significant_effects=significant_effects
)
def analyze_experiment(metrics: Dict[str, Any]) -> Dict[str, Any]:
"""
Perform full statistical analysis on experiment metrics.
Returns analysis results for multiple metrics.
"""
results = {
"analysis_metrics": [],
"research_questions": {}
}
# Define metrics to analyze
metrics_to_analyze = [
("Survival Rate", "survival_rate"),
("Post-Dedup Diversity", "post_dedup_diversity.mean_pairwise_distance"),
("Normalized Diversity", "normalized_diversity.mean_pairwise_distance"),
("Query Distance", "post_dedup_query_distance.mean_distance"),
("Cluster Count", "post_dedup_clusters.optimal_clusters"),
]
for metric_name, metric_path in metrics_to_analyze:
print(f"\n{'='*60}")
print(f"Analyzing: {metric_name}")
print(f"{'='*60}")
# Extract values by condition
by_condition = extract_metric_values(metrics, metric_path)
if not by_condition:
print(f" No data available for {metric_name}")
continue
metric_results = {
"metric_name": metric_name,
"metric_path": metric_path,
"descriptive": {},
"comparisons": {},
"anova": None
}
# Descriptive statistics
print(f"\nDescriptive Statistics:")
print(f"{'Condition':<25} {'Mean':<10} {'Std':<10} {'N':<5}")
print("-" * 50)
for cond, values in sorted(by_condition.items()):
if values:
mean = np.mean(values)
std = np.std(values, ddof=1) if len(values) > 1 else 0
metric_results["descriptive"][cond] = {
"mean": round(mean, 4),
"std": round(std, 4),
"n": len(values)
}
print(f"{cond:<25} {mean:<10.4f} {std:<10.4f} {len(values):<5}")
# Key comparisons
comparisons = []
# 1. C1 (Direct) vs C4 (Full Pipeline) - Main comparison
if "c1_direct" in by_condition and "c4_full_pipeline" in by_condition:
result = perform_ttest(
by_condition["c4_full_pipeline"],
by_condition["c1_direct"],
"Full Pipeline", "Direct"
)
if result:
comparisons.append(("C4 vs C1 (Full vs Direct)", result))
metric_results["comparisons"]["c4_vs_c1"] = {
"t": result.t_statistic,
"p": result.p_value,
"d": result.effect_size.d,
"interpretation": result.effect_size.interpretation,
"significant": result.significant
}
# 2. C2 (Expert) vs C5 (Random) - Control comparison
if "c2_expert_only" in by_condition and "c5_random_perspective" in by_condition:
result = perform_ttest(
by_condition["c2_expert_only"],
by_condition["c5_random_perspective"],
"Expert", "Random"
)
if result:
comparisons.append(("C2 vs C5 (Expert vs Random)", result))
metric_results["comparisons"]["c2_vs_c5"] = {
"t": result.t_statistic,
"p": result.p_value,
"d": result.effect_size.d,
"interpretation": result.effect_size.interpretation,
"significant": result.significant
}
# 3. C2 (Expert-Only) vs C1 (Direct) - Effect of experts alone
if "c2_expert_only" in by_condition and "c1_direct" in by_condition:
result = perform_ttest(
by_condition["c2_expert_only"],
by_condition["c1_direct"],
"Expert-Only", "Direct"
)
if result:
comparisons.append(("C2 vs C1 (Expert effect)", result))
metric_results["comparisons"]["c2_vs_c1"] = {
"t": result.t_statistic,
"p": result.p_value,
"d": result.effect_size.d,
"interpretation": result.effect_size.interpretation,
"significant": result.significant
}
# 4. C3 (Attribute-Only) vs C1 (Direct) - Effect of attributes alone
if "c3_attribute_only" in by_condition and "c1_direct" in by_condition:
result = perform_ttest(
by_condition["c3_attribute_only"],
by_condition["c1_direct"],
"Attribute-Only", "Direct"
)
if result:
comparisons.append(("C3 vs C1 (Attribute effect)", result))
metric_results["comparisons"]["c3_vs_c1"] = {
"t": result.t_statistic,
"p": result.p_value,
"d": result.effect_size.d,
"interpretation": result.effect_size.interpretation,
"significant": result.significant
}
# Print comparisons
if comparisons:
print(f"\nPairwise Comparisons:")
print(f"{'Comparison':<30} {'t':<10} {'p':<10} {'d':<10} {'Sig?':<8}")
print("-" * 68)
for name, result in comparisons:
sig = "Yes*" if result.significant else "No"
print(f"{name:<30} {result.t_statistic:<10.3f} {result.p_value:<10.4f} "
f"{result.effect_size.d:<10.3f} {sig:<8}")
# 2×2 ANOVA (if all conditions available)
if all(c in by_condition for c in ["c1_direct", "c2_expert_only", "c3_attribute_only", "c4_full_pipeline"]):
anova = perform_2x2_anova(
by_condition["c1_direct"],
by_condition["c2_expert_only"],
by_condition["c3_attribute_only"],
by_condition["c4_full_pipeline"]
)
if anova:
metric_results["anova"] = {
"main_effect_attributes": anova.main_effect_attributes,
"main_effect_experts": anova.main_effect_experts,
"interaction": anova.interaction,
"significant_effects": anova.significant_effects
}
print(f"\n2×2 ANOVA Results:")
print(f" Main Effect (Attributes): F={anova.main_effect_attributes['F']:.3f}, "
f"p={anova.main_effect_attributes['p']:.4f}")
print(f" Main Effect (Experts): F={anova.main_effect_experts['F']:.3f}, "
f"p={anova.main_effect_experts['p']:.4f}")
print(f" Interaction Strength: {anova.interaction['interaction_strength']:.4f} "
f"({'super-additive' if anova.interaction['super_additive'] else 'sub-additive'})")
print(f" Significant Effects: {', '.join(anova.significant_effects) or 'None'}")
results["analysis_metrics"].append(metric_results)
# Summarize research questions
results["research_questions"] = summarize_research_questions(results["analysis_metrics"])
return results
def summarize_research_questions(analysis_metrics: List[Dict]) -> Dict[str, str]:
"""Summarize findings for each research question."""
rq = {}
# Find the diversity metric results
diversity_results = None
for m in analysis_metrics:
if "Diversity" in m["metric_name"] and "Normalized" in m["metric_name"]:
diversity_results = m
break
if diversity_results is None:
for m in analysis_metrics:
if "Diversity" in m["metric_name"]:
diversity_results = m
break
if diversity_results:
anova = diversity_results.get("anova", {})
comparisons = diversity_results.get("comparisons", {})
# RQ1: Does attribute decomposition improve diversity?
if anova and "main_effect_attributes" in anova:
p = anova["main_effect_attributes"]["p"]
rq["RQ1_attributes"] = f"Main effect p={p:.4f}. " + \
("Significant effect of attributes." if p < 0.05 else "No significant effect.")
# RQ2: Do expert perspectives improve diversity?
if anova and "main_effect_experts" in anova:
p = anova["main_effect_experts"]["p"]
rq["RQ2_experts"] = f"Main effect p={p:.4f}. " + \
("Significant effect of experts." if p < 0.05 else "No significant effect.")
# RQ3: Interaction effect?
if anova and "interaction" in anova:
strength = anova["interaction"]["interaction_strength"]
super_add = anova["interaction"]["super_additive"]
rq["RQ3_interaction"] = f"Interaction strength={strength:.4f}. " + \
("Super-additive (combination better than sum)." if super_add else "Sub-additive or additive.")
# RQ5: Expert vs Random (C2 vs C5)
if "c2_vs_c5" in comparisons:
comp = comparisons["c2_vs_c5"]
rq["RQ5_expert_vs_random"] = f"d={comp['d']:.3f} ({comp['interpretation']}), p={comp['p']:.4f}. " + \
("Expert knowledge matters." if comp["significant"] and comp["d"] > 0 else "No significant difference from random perspectives.")
return rq
def print_research_summary(results: Dict[str, Any]):
"""Print summary of research question findings."""
print("\n" + "=" * 70)
print("RESEARCH QUESTIONS SUMMARY")
print("=" * 70)
rq = results.get("research_questions", {})
print("\nRQ1: Does attribute decomposition improve semantic diversity?")
print(f"{rq.get('RQ1_attributes', 'Insufficient data')}")
print("\nRQ2: Do expert perspectives improve semantic diversity?")
print(f"{rq.get('RQ2_experts', 'Insufficient data')}")
print("\nRQ3: Is there an interaction effect (Full Pipeline > sum of parts)?")
print(f"{rq.get('RQ3_interaction', 'Insufficient data')}")
print("\nRQ5: Do experts beat random perspectives? (C2 vs C5)")
print(f"{rq.get('RQ5_expert_vs_random', 'Insufficient data')}")
print("\n" + "=" * 70)
print("Note: With pilot data (n=1 query), statistical power is limited.")
print("Full experiment (n=10+ queries) needed for reliable conclusions.")
print("=" * 70)
def main():
parser = argparse.ArgumentParser(
description="Statistical analysis for experiment results"
)
parser.add_argument(
"--input",
type=str,
required=True,
help="Input metrics JSON file"
)
parser.add_argument(
"--output",
type=str,
help="Output file path (default: input_analysis.json)"
)
args = parser.parse_args()
input_path = Path(args.input)
if not input_path.exists():
input_path = RESULTS_DIR / args.input
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
sys.exit(1)
# Load metrics
with open(input_path, "r", encoding="utf-8") as f:
metrics = json.load(f)
# Run analysis
results = analyze_experiment(metrics)
# Print research summary
print_research_summary(results)
# Save results
if args.output:
output_path = Path(args.output)
else:
stem = input_path.stem.replace("_metrics", "")
output_path = input_path.parent / f"{stem}_analysis.json"
with open(output_path, "w", encoding="utf-8") as f:
json.dump(results, f, indent=2, ensure_ascii=False, cls=NumpyEncoder)
print(f"\nAnalysis saved to: {output_path}")
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,314 @@
# Human Assessment Web Interface
A standalone web application for human assessment of generated ideas using Torrance-inspired creativity metrics.
## Overview
This tool enables blind evaluation of creative ideas generated by the novelty-seeking experiment. Raters assess ideas on four dimensions without knowing which experimental condition produced each idea, ensuring unbiased evaluation.
## Quick Start
```bash
cd experiments/assessment
# 1. Prepare assessment data (if not already done)
python3 prepare_data.py
# 2. Start the system
./start.sh
# 3. Open browser
open http://localhost:5174
```
## Directory Structure
```
assessment/
├── backend/
│ ├── app.py # FastAPI backend API
│ ├── database.py # SQLite database operations
│ ├── models.py # Pydantic models & dimension definitions
│ └── requirements.txt # Python dependencies
├── frontend/
│ ├── src/
│ │ ├── components/ # React UI components
│ │ ├── hooks/ # React state management
│ │ ├── services/ # API client
│ │ └── types/ # TypeScript definitions
│ └── package.json
├── data/
│ └── assessment_items.json # Prepared ideas for rating
├── results/
│ └── ratings.db # SQLite database with ratings
├── prepare_data.py # Data preparation script
├── analyze_ratings.py # Inter-rater reliability analysis
├── start.sh # Start both servers
├── stop.sh # Stop all services
└── README.md # This file
```
## Data Preparation
### List Available Experiment Files
```bash
python3 prepare_data.py --list
```
Output:
```
Available experiment files (most recent first):
experiment_20260119_165650_deduped.json (1571.3 KB)
experiment_20260119_163040_deduped.json (156.4 KB)
```
### Prepare Assessment Data
```bash
# Use all ideas (not recommended for human assessment)
python3 prepare_data.py
# RECOMMENDED: Stratified sampling - 4 ideas per condition per query
# Results in ~200 ideas (5 conditions × 4 ideas × 10 queries)
python3 prepare_data.py --per-condition 4
# Alternative: Sample 150 ideas total (proportionally across queries)
python3 prepare_data.py --sample 150
# Limit per query (20 ideas max per query)
python3 prepare_data.py --per-query 20
# Combined: 4 per condition, max 15 per query
python3 prepare_data.py --per-condition 4 --per-query 15
# Specify a different experiment file
python3 prepare_data.py experiment_20260119_163040_deduped.json --per-condition 4
```
### Sampling Options
| Option | Description | Example |
|--------|-------------|---------|
| `--per-condition N` | Max N ideas per condition per query (stratified) | `--per-condition 4` → ~200 ideas |
| `--per-query N` | Max N ideas per query | `--per-query 20` |
| `--sample N` | Total N ideas (proportionally distributed) | `--sample 150` |
| `--seed N` | Random seed for reproducibility | `--seed 42` (default) |
**Recommendation**: Use `--per-condition 4` for balanced assessment across conditions.
The script (see the sketch after this list):
1. Loads the deduped experiment results
2. Extracts all unique ideas with hidden metadata (condition, expert, keyword)
3. Assigns stable IDs to each idea
4. Shuffles ideas within each query (reproducible with seed=42)
5. Outputs `data/assessment_items.json`
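
Combining the sampling options above with the seeded shuffle in step 4, the core logic can be sketched like this (a minimal sketch assuming each idea dict carries the hidden `condition` field used elsewhere in this tool; names are illustrative, not the actual `prepare_data.py` internals):

```python
import random
from collections import defaultdict

def stratified_sample(ideas: list[dict], per_condition: int, seed: int = 42) -> list[dict]:
    """Pick at most `per_condition` ideas from each condition, then shuffle."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_condition: dict[str, list[dict]] = defaultdict(list)
    for idea in ideas:
        by_condition[idea["_hidden"]["condition"]].append(idea)

    sampled: list[dict] = []
    for condition, group in sorted(by_condition.items()):
        rng.shuffle(group)
        sampled.extend(group[:per_condition])

    rng.shuffle(sampled)  # hide condition blocks from raters
    return sampled
```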
## Assessment Dimensions
Raters evaluate each idea on four dimensions using a 1-5 Likert scale:
### Originality
*How unexpected or surprising is this idea?*
| Score | Description |
|-------|-------------|
| 1 | Very common/obvious idea anyone would suggest |
| 2 | Somewhat common, slight variation on expected ideas |
| 3 | Moderately original, some unexpected elements |
| 4 | Quite original, notably different approach |
| 5 | Highly unexpected, truly novel concept |
### Elaboration
*How detailed and well-developed is this idea?*
| Score | Description |
|-------|-------------|
| 1 | Vague, minimal detail, just a concept |
| 2 | Basic idea with little specificity |
| 3 | Moderately detailed, some specifics provided |
| 4 | Well-developed with clear implementation hints |
| 5 | Highly specific, thoroughly developed concept |
### Coherence
*Does this idea make logical sense and relate to the query object?*
| Score | Description |
|-------|-------------|
| 1 | Nonsensical, irrelevant, or incomprehensible |
| 2 | Mostly unclear, weak connection to query |
| 3 | Partially coherent, some logical gaps |
| 4 | Mostly coherent with minor issues |
| 5 | Fully coherent, clearly relates to query |
### Usefulness
*Could this idea have practical value or inspire real innovation?*
| Score | Description |
|-------|-------------|
| 1 | No practical value whatsoever |
| 2 | Minimal usefulness, highly impractical |
| 3 | Some potential value with major limitations |
| 4 | Useful idea with realistic applications |
| 5 | Highly useful, clear practical value |
## Running the System
### Start
```bash
./start.sh
```
This will:
1. Check for `data/assessment_items.json` (runs `prepare_data.py` if missing)
2. Install frontend dependencies if needed
3. Start backend API on port 8002
4. Start frontend dev server on port 5174
### Stop
```bash
./stop.sh
```
Or press `Ctrl+C` in the terminal running `start.sh`.
### Manual Start (Development)
```bash
# Terminal 1: Backend
cd backend
../../../backend/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8002 --reload
# Terminal 2: Frontend
cd frontend
npm run dev
```
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check |
| `/api/info` | GET | Experiment info (total ideas, queries, conditions) |
| `/api/dimensions` | GET | Dimension definitions for UI |
| `/api/raters` | GET | List all raters |
| `/api/raters` | POST | Register/login rater |
| `/api/queries` | GET | List all queries |
| `/api/queries/{id}` | GET | Get query with all ideas |
| `/api/queries/{id}/unrated?rater_id=X` | GET | Get unrated ideas for rater |
| `/api/ratings` | POST | Submit a rating |
| `/api/progress/{rater_id}` | GET | Get rater's progress |
| `/api/statistics` | GET | Overall statistics |
| `/api/export` | GET | Export all ratings with metadata |
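
A minimal rating round-trip against the backend (assuming the default port 8002; the IDs are illustrative):

```python
import httpx

BASE = "http://localhost:8002"

# Register (or re-login) a rater
httpx.post(f"{BASE}/api/raters", json={"rater_id": "rater_01", "name": "Alice"})

# Submit a rating; all four dimensions are required unless skipping
httpx.post(f"{BASE}/api/ratings", json={
    "rater_id": "rater_01",
    "idea_id": "q1_idea_003",   # illustrative ID
    "query_id": "q1",           # illustrative ID
    "originality": 4,
    "elaboration": 3,
    "coherence": 5,
    "usefulness": 4,
    "skipped": False,
})

# Check overall progress for the rater
print(httpx.get(f"{BASE}/api/progress/rater_01").json())
```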
## Analysis
After collecting ratings from multiple raters:
```bash
python3 analyze_ratings.py
```
This calculates:
- **Krippendorff's alpha**: Inter-rater reliability for ordinal data (formula below)
- **ICC(2,1)**: Intraclass Correlation Coefficient with 95% CI
- **Mean ratings per condition**: Compare experimental conditions
- **Kruskal-Wallis test**: Statistical significance between conditions
Output is saved to `results/analysis_results.json`.
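
As implemented in `calculate_krippendorff_alpha` (shown later in this diff), alpha compares observed within-item disagreement to the disagreement expected by chance, using squared differences as the ordinal distance:

$$
\alpha = 1 - \frac{D_o}{D_e}, \qquad
D_o = \operatorname{mean}_{\text{same item}}\bigl[(x_i - x_j)^2\bigr], \qquad
D_e = \operatorname{mean}_{\text{all pairs}}\bigl[(x_u - x_v)^2\bigr]
$$

Values near 1 indicate strong agreement; values at or below 0 indicate agreement no better than chance.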
## Database Schema
SQLite database (`results/ratings.db`):
```sql
-- Raters
CREATE TABLE raters (
rater_id TEXT PRIMARY KEY,
name TEXT,
created_at TIMESTAMP
);
-- Ratings
CREATE TABLE ratings (
id INTEGER PRIMARY KEY,
rater_id TEXT,
idea_id TEXT,
query_id TEXT,
originality INTEGER CHECK(originality BETWEEN 1 AND 5),
elaboration INTEGER CHECK(elaboration BETWEEN 1 AND 5),
coherence INTEGER CHECK(coherence BETWEEN 1 AND 5),
usefulness INTEGER CHECK(usefulness BETWEEN 1 AND 5),
skipped INTEGER DEFAULT 0,
timestamp TIMESTAMP,
UNIQUE(rater_id, idea_id)
);
-- Progress tracking
CREATE TABLE progress (
rater_id TEXT,
query_id TEXT,
completed_count INTEGER,
total_count INTEGER,
PRIMARY KEY (rater_id, query_id)
);
```
## Blind Assessment Design
To ensure unbiased evaluation:
1. **Randomization**: Ideas are shuffled within each query using a fixed seed (42) for reproducibility
2. **Hidden metadata**: Condition, expert name, and keywords are stored but not shown to raters (see the sketch after this list)
3. **Consistent ordering**: All raters see the same randomized order
4. **Context provided**: Only the query text is shown (e.g., "Chair", "Bicycle")
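
The hidden-metadata split can be sketched as follows (illustrative values; the visible shape mirrors `IdeaForRating` in `backend/models.py`):

```python
# What is stored in assessment_items.json (per idea)
stored = {
    "idea_id": "q1_idea_003",  # stable ID (illustrative value)
    "text": "Umbrella that charges phones from raindrop impacts",
    "_hidden": {               # kept server-side, never sent to the rating UI
        "condition": "c4_full_pipeline",
        "expert_name": "Piezoelectric engineer",
        "keyword": "energy harvesting",
    },
}

# What the rating UI receives
visible = {k: stored[k] for k in ("idea_id", "text")}
```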
## Workflow for Raters
1. **Login**: Enter a unique rater ID
2. **Instructions**: Read dimension definitions (shown before first rating)
3. **Rate ideas**: For each idea:
- Read the idea text
- Rate all 4 dimensions (1-5)
- Click "Submit & Next" or "Skip"
4. **Progress**: Track completion per query and overall
5. **Completion**: Summary shown when all ideas are rated
## Troubleshooting
### Backend won't start
```bash
# Check if port 8002 is in use
lsof -i :8002
# Check backend logs
cat /tmp/assessment_backend.log
```
### Frontend won't start
```bash
# Reinstall dependencies
cd frontend
rm -rf node_modules
npm install
```
### Reset database
```bash
rm results/ratings.db
# Database is auto-created on next backend start
```
### Regenerate assessment data
```bash
rm data/assessment_items.json
python3 prepare_data.py
```
## Tech Stack
- **Backend**: Python 3.11+, FastAPI, SQLite, Pydantic
- **Frontend**: React 19, TypeScript, Vite, Ant Design 6.0
- **Analysis**: NumPy, SciPy (for statistical tests)

View File

@@ -0,0 +1,356 @@
#!/usr/bin/env python3
"""
Analyze assessment ratings for inter-rater reliability and condition comparisons.
This script:
1. Loads ratings from the SQLite database
2. Joins with hidden metadata (condition, expert)
3. Calculates inter-rater reliability metrics
4. Computes mean ratings per dimension per condition
5. Performs statistical comparisons between conditions
"""
import json
import sqlite3
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Any
import numpy as np
from scipy import stats
# Paths
RESULTS_DIR = Path(__file__).parent / 'results'
DATA_DIR = Path(__file__).parent / 'data'
DB_PATH = RESULTS_DIR / 'ratings.db'
ASSESSMENT_DATA_PATH = DATA_DIR / 'assessment_items.json'
def load_assessment_data() -> dict[str, Any]:
"""Load the assessment items data with hidden metadata."""
with open(ASSESSMENT_DATA_PATH, 'r', encoding='utf-8') as f:
return json.load(f)
def load_ratings_from_db() -> list[dict[str, Any]]:
"""Load all ratings from the SQLite database."""
if not DB_PATH.exists():
print(f"Database not found at {DB_PATH}")
return []
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
cursor.execute('''
SELECT r.*, rat.name as rater_name
FROM ratings r
LEFT JOIN raters rat ON r.rater_id = rat.rater_id
WHERE r.skipped = 0
''')
ratings = [dict(row) for row in cursor.fetchall()]
conn.close()
return ratings
def build_idea_lookup(assessment_data: dict[str, Any]) -> dict[str, dict[str, Any]]:
"""Build a lookup table from idea_id to metadata."""
lookup = {}
for query in assessment_data['queries']:
for idea in query['ideas']:
lookup[idea['idea_id']] = {
'text': idea['text'],
'query_id': query['query_id'],
'query_text': query['query_text'],
**idea['_hidden']
}
return lookup
def calculate_krippendorff_alpha(ratings_matrix: np.ndarray) -> float:
"""
Calculate Krippendorff's alpha for ordinal data.
Args:
ratings_matrix: 2D array where rows are items and columns are raters.
NaN values indicate missing ratings.
Returns:
Krippendorff's alpha coefficient
"""
# Remove items with fewer than 2 raters
valid_items = ~np.all(np.isnan(ratings_matrix), axis=1)
ratings_matrix = ratings_matrix[valid_items]
if ratings_matrix.shape[0] < 2:
return np.nan
n_items, n_raters = ratings_matrix.shape
# Observed disagreement
observed_disagreement = 0
n_pairs = 0
for i in range(n_items):
values = ratings_matrix[i, ~np.isnan(ratings_matrix[i])]
if len(values) < 2:
continue
# Ordinal distance: squared difference
for j in range(len(values)):
for k in range(j + 1, len(values)):
observed_disagreement += (values[j] - values[k]) ** 2
n_pairs += 1
if n_pairs == 0:
return np.nan
observed_disagreement /= n_pairs
# Expected disagreement (based on marginal distribution)
all_values = ratings_matrix[~np.isnan(ratings_matrix)]
if len(all_values) < 2:
return np.nan
expected_disagreement = 0
n_total_pairs = 0
for i in range(len(all_values)):
for j in range(i + 1, len(all_values)):
expected_disagreement += (all_values[i] - all_values[j]) ** 2
n_total_pairs += 1
if n_total_pairs == 0:
return np.nan
expected_disagreement /= n_total_pairs
if expected_disagreement == 0:
return 1.0
alpha = 1 - (observed_disagreement / expected_disagreement)
return alpha
def calculate_icc(ratings_matrix: np.ndarray) -> tuple[float, float, float]:
"""
Calculate Intraclass Correlation Coefficient (ICC(2,1)).
Args:
ratings_matrix: 2D array where rows are items and columns are raters.
Returns:
Tuple of (ICC, lower_bound, upper_bound)
"""
# Remove rows with any NaN
valid_rows = ~np.any(np.isnan(ratings_matrix), axis=1)
ratings_matrix = ratings_matrix[valid_rows]
if ratings_matrix.shape[0] < 2 or ratings_matrix.shape[1] < 2:
return np.nan, np.nan, np.nan
n, k = ratings_matrix.shape
# Grand mean
grand_mean = np.mean(ratings_matrix)
# Row means (item means)
row_means = np.mean(ratings_matrix, axis=1)
# Column means (rater means)
col_means = np.mean(ratings_matrix, axis=0)
# Sum of squares
ss_total = np.sum((ratings_matrix - grand_mean) ** 2)
ss_rows = k * np.sum((row_means - grand_mean) ** 2)
ss_cols = n * np.sum((col_means - grand_mean) ** 2)
ss_error = ss_total - ss_rows - ss_cols
# Mean squares
ms_rows = ss_rows / (n - 1) if n > 1 else 0
ms_cols = ss_cols / (k - 1) if k > 1 else 0
ms_error = ss_error / ((n - 1) * (k - 1)) if (n > 1 and k > 1) else 0
# ICC(2,1) - two-way random, absolute agreement, single rater
if ms_error + (ms_cols - ms_error) / n == 0:
return np.nan, np.nan, np.nan
icc = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
# Confidence interval (approximate)
# Using F distribution
df1 = n - 1
df2 = (n - 1) * (k - 1)
if ms_error == 0:
return icc, np.nan, np.nan
f_value = ms_rows / ms_error
f_lower = f_value / stats.f.ppf(0.975, df1, df2)
f_upper = f_value / stats.f.ppf(0.025, df1, df2)
icc_lower = (f_lower - 1) / (f_lower + k - 1)
icc_upper = (f_upper - 1) / (f_upper + k - 1)
return icc, icc_lower, icc_upper
def analyze_ratings():
"""Main analysis function."""
print("=" * 60)
print("CREATIVE IDEA ASSESSMENT ANALYSIS")
print("=" * 60)
print()
# Load data
assessment_data = load_assessment_data()
ratings = load_ratings_from_db()
idea_lookup = build_idea_lookup(assessment_data)
if not ratings:
print("No ratings found in database.")
return
print(f"Loaded {len(ratings)} ratings from database")
print(f"Experiment ID: {assessment_data['experiment_id']}")
print()
# Get unique raters
raters = list(set(r['rater_id'] for r in ratings))
print(f"Raters: {raters}")
print()
# Join ratings with metadata
enriched_ratings = []
for r in ratings:
idea_meta = idea_lookup.get(r['idea_id'], {})
enriched_ratings.append({
**r,
'condition': idea_meta.get('condition', 'unknown'),
'expert_name': idea_meta.get('expert_name', ''),
'keyword': idea_meta.get('keyword', ''),
'query_text': idea_meta.get('query_text', ''),
'idea_text': idea_meta.get('text', '')
})
# Dimensions
dimensions = ['originality', 'elaboration', 'coherence', 'usefulness']
# ================================
# Inter-rater reliability
# ================================
print("-" * 60)
print("INTER-RATER RELIABILITY")
print("-" * 60)
print()
if len(raters) >= 2:
# Build ratings matrix per dimension
idea_ids = list(set(r['idea_id'] for r in enriched_ratings))
for dim in dimensions:
# Create matrix: rows = ideas, cols = raters
matrix = np.full((len(idea_ids), len(raters)), np.nan)
idea_to_idx = {idea: idx for idx, idea in enumerate(idea_ids)}
rater_to_idx = {rater: idx for idx, rater in enumerate(raters)}
for r in enriched_ratings:
if r[dim] is not None:
i = idea_to_idx[r['idea_id']]
j = rater_to_idx[r['rater_id']]
matrix[i, j] = r[dim]
# Calculate metrics
alpha = calculate_krippendorff_alpha(matrix)
icc, icc_low, icc_high = calculate_icc(matrix)
print(f"{dim.upper()}:")
print(f" Krippendorff's alpha: {alpha:.3f}")
print(f" ICC(2,1): {icc:.3f} (95% CI: {icc_low:.3f} - {icc_high:.3f})")
print()
else:
print("Need at least 2 raters for inter-rater reliability analysis.")
print()
# ================================
# Condition comparisons
# ================================
print("-" * 60)
print("MEAN RATINGS BY CONDITION")
print("-" * 60)
print()
# Group ratings by condition
condition_ratings: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
for r in enriched_ratings:
condition = r['condition']
for dim in dimensions:
if r[dim] is not None:
condition_ratings[condition][dim].append(r[dim])
# Calculate means and print
condition_stats = {}
for condition in sorted(condition_ratings.keys()):
print(f"\n{condition}:")
condition_stats[condition] = {}
for dim in dimensions:
values = condition_ratings[condition][dim]
if values:
mean = np.mean(values)
std = np.std(values)
n = len(values)
condition_stats[condition][dim] = {'mean': mean, 'std': std, 'n': n}
print(f" {dim}: {mean:.2f} (SD={std:.2f}, n={n})")
else:
print(f" {dim}: no data")
# ================================
# Statistical comparisons
# ================================
print()
print("-" * 60)
print("STATISTICAL COMPARISONS (Kruskal-Wallis)")
print("-" * 60)
print()
conditions = sorted(condition_ratings.keys())
if len(conditions) >= 2:
for dim in dimensions:
groups = [condition_ratings[c][dim] for c in conditions if condition_ratings[c][dim]]
if len(groups) >= 2:
h_stat, p_value = stats.kruskal(*groups)
sig = "*" if p_value < 0.05 else ""
print(f"{dim}: H={h_stat:.2f}, p={p_value:.4f} {sig}")
else:
print(f"{dim}: insufficient data for comparison")
else:
print("Need at least 2 conditions with data for statistical comparison.")
# ================================
# Export results
# ================================
output = {
'analysis_timestamp': datetime.utcnow().isoformat(),
'experiment_id': assessment_data['experiment_id'],
'total_ratings': len(ratings),
'raters': raters,
'rater_count': len(raters),
'condition_stats': condition_stats,
'enriched_ratings': enriched_ratings
}
output_path = RESULTS_DIR / 'analysis_results.json'
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(output, f, ensure_ascii=False, indent=2, default=str)
print()
print("-" * 60)
print(f"Results exported to: {output_path}")
print("=" * 60)
if __name__ == '__main__':
analyze_ratings()

View File

@@ -0,0 +1 @@
"""Assessment backend package."""

View File

@@ -0,0 +1,374 @@
"""
FastAPI backend for human assessment of creative ideas.
"""
import json
from datetime import datetime
from pathlib import Path
from typing import Any
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
try:
from . import database as db
from .models import (
DIMENSION_DEFINITIONS,
ExportData,
ExportRating,
IdeaForRating,
Progress,
QueryInfo,
QueryWithIdeas,
Rater,
RaterCreate,
RaterProgress,
Rating,
RatingSubmit,
Statistics,
)
except ImportError:
import database as db
from models import (
DIMENSION_DEFINITIONS,
ExportData,
ExportRating,
IdeaForRating,
Progress,
QueryInfo,
QueryWithIdeas,
Rater,
RaterCreate,
RaterProgress,
Rating,
RatingSubmit,
Statistics,
)
# Load assessment data
DATA_PATH = Path(__file__).parent.parent / 'data' / 'assessment_items.json'
def load_assessment_data() -> dict[str, Any]:
"""Load the assessment items data."""
if not DATA_PATH.exists():
raise RuntimeError(f"Assessment data not found at {DATA_PATH}. Run prepare_data.py first.")
with open(DATA_PATH, 'r', encoding='utf-8') as f:
return json.load(f)
# Initialize FastAPI app
app = FastAPI(
title="Creative Idea Assessment API",
description="API for human assessment of creative ideas using Torrance-inspired metrics",
version="1.0.0"
)
# CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Cache for assessment data
_assessment_data: dict[str, Any] | None = None
def get_assessment_data() -> dict[str, Any]:
"""Get cached assessment data."""
global _assessment_data
if _assessment_data is None:
_assessment_data = load_assessment_data()
return _assessment_data
# Rater endpoints
@app.get("/api/raters", response_model=list[Rater])
def list_raters() -> list[dict[str, Any]]:
"""List all registered raters."""
return db.list_raters()
@app.post("/api/raters", response_model=Rater)
def create_or_get_rater(rater_data: RaterCreate) -> dict[str, Any]:
"""Register a new rater or get existing one."""
return db.create_rater(rater_data.rater_id, rater_data.name)
@app.get("/api/raters/{rater_id}", response_model=Rater)
def get_rater(rater_id: str) -> dict[str, Any]:
"""Get a specific rater."""
rater = db.get_rater(rater_id)
if not rater:
raise HTTPException(status_code=404, detail="Rater not found")
return rater
# Query endpoints
@app.get("/api/queries", response_model=list[QueryInfo])
def list_queries() -> list[dict[str, Any]]:
"""List all queries available for assessment."""
data = get_assessment_data()
return [
{
'query_id': q['query_id'],
'query_text': q['query_text'],
'category': q.get('category', ''),
'idea_count': q['idea_count']
}
for q in data['queries']
]
@app.get("/api/queries/{query_id}", response_model=QueryWithIdeas)
def get_query_with_ideas(query_id: str) -> dict[str, Any]:
"""Get a query with all its ideas for rating (without hidden metadata)."""
data = get_assessment_data()
for query in data['queries']:
if query['query_id'] == query_id:
ideas = [
IdeaForRating(
idea_id=idea['idea_id'],
text=idea['text'],
index=idx
)
for idx, idea in enumerate(query['ideas'])
]
return QueryWithIdeas(
query_id=query['query_id'],
query_text=query['query_text'],
category=query.get('category', ''),
ideas=ideas,
total_count=len(ideas)
)
raise HTTPException(status_code=404, detail="Query not found")
@app.get("/api/queries/{query_id}/unrated", response_model=QueryWithIdeas)
def get_unrated_ideas(query_id: str, rater_id: str) -> dict[str, Any]:
"""Get unrated ideas for a query by a specific rater."""
data = get_assessment_data()
for query in data['queries']:
if query['query_id'] == query_id:
# Get already rated idea IDs
rated_ids = db.get_rated_idea_ids(rater_id, query_id)
# Filter to unrated ideas
unrated_ideas = [
IdeaForRating(
idea_id=idea['idea_id'],
text=idea['text'],
index=idx
)
for idx, idea in enumerate(query['ideas'])
if idea['idea_id'] not in rated_ids
]
return QueryWithIdeas(
query_id=query['query_id'],
query_text=query['query_text'],
category=query.get('category', ''),
ideas=unrated_ideas,
total_count=query['idea_count']
)
raise HTTPException(status_code=404, detail="Query not found")
# Rating endpoints
@app.post("/api/ratings", response_model=dict[str, Any])
def submit_rating(rating: RatingSubmit) -> dict[str, Any]:
"""Submit a rating for an idea."""
# Validate that rater exists
rater = db.get_rater(rating.rater_id)
if not rater:
raise HTTPException(status_code=404, detail="Rater not found. Please register first.")
# Validate idea exists
data = get_assessment_data()
idea_found = False
for query in data['queries']:
for idea in query['ideas']:
if idea['idea_id'] == rating.idea_id:
idea_found = True
break
if idea_found:
break
if not idea_found:
raise HTTPException(status_code=404, detail="Idea not found")
# If not skipped, require all ratings
if not rating.skipped:
if rating.originality is None or rating.elaboration is None or rating.coherence is None or rating.usefulness is None:
raise HTTPException(
status_code=400,
detail="All dimensions must be rated unless skipping"
)
# Save rating
return db.save_rating(
rater_id=rating.rater_id,
idea_id=rating.idea_id,
query_id=rating.query_id,
originality=rating.originality,
elaboration=rating.elaboration,
coherence=rating.coherence,
usefulness=rating.usefulness,
skipped=rating.skipped
)
@app.get("/api/ratings/{rater_id}/{idea_id}", response_model=Rating | None)
def get_rating(rater_id: str, idea_id: str) -> dict[str, Any] | None:
"""Get a specific rating."""
return db.get_rating(rater_id, idea_id)
@app.get("/api/ratings/rater/{rater_id}", response_model=list[Rating])
def get_ratings_by_rater(rater_id: str) -> list[dict[str, Any]]:
"""Get all ratings by a rater."""
return db.get_ratings_by_rater(rater_id)
# Progress endpoints
@app.get("/api/progress/{rater_id}", response_model=RaterProgress)
def get_rater_progress(rater_id: str) -> RaterProgress:
"""Get complete progress for a rater."""
rater = db.get_rater(rater_id)
if not rater:
raise HTTPException(status_code=404, detail="Rater not found")
data = get_assessment_data()
# Get rated idea counts per query
ratings = db.get_ratings_by_rater(rater_id)
ratings_per_query: dict[str, int] = {}
for r in ratings:
qid = r['query_id']
ratings_per_query[qid] = ratings_per_query.get(qid, 0) + 1
# Build progress list
query_progress = []
total_completed = 0
total_ideas = 0
for query in data['queries']:
qid = query['query_id']
completed = ratings_per_query.get(qid, 0)
total = query['idea_count']
query_progress.append(Progress(
rater_id=rater_id,
query_id=qid,
completed_count=completed,
total_count=total
))
total_completed += completed
total_ideas += total
percentage = (total_completed / total_ideas * 100) if total_ideas > 0 else 0
return RaterProgress(
rater_id=rater_id,
queries=query_progress,
total_completed=total_completed,
total_ideas=total_ideas,
percentage=round(percentage, 1)
)
# Statistics endpoint
@app.get("/api/statistics", response_model=Statistics)
def get_statistics() -> Statistics:
"""Get overall assessment statistics."""
stats = db.get_statistics()
return Statistics(**stats)
# Dimension definitions endpoint
@app.get("/api/dimensions")
def get_dimensions() -> dict[str, Any]:
"""Get dimension definitions for the UI."""
return DIMENSION_DEFINITIONS
# Export endpoint
@app.get("/api/export", response_model=ExportData)
def export_ratings() -> ExportData:
"""Export all ratings with hidden metadata for analysis."""
data = get_assessment_data()
all_ratings = db.get_all_ratings()
# Build idea lookup with hidden metadata
idea_lookup: dict[str, dict[str, Any]] = {}
query_lookup: dict[str, str] = {}
for query in data['queries']:
query_lookup[query['query_id']] = query['query_text']
for idea in query['ideas']:
idea_lookup[idea['idea_id']] = {
'text': idea['text'],
'condition': idea['_hidden']['condition'],
'expert_name': idea['_hidden']['expert_name'],
'keyword': idea['_hidden']['keyword']
}
# Build export ratings
export_ratings = []
for r in all_ratings:
idea_data = idea_lookup.get(r['idea_id'], {})
export_ratings.append(ExportRating(
rater_id=r['rater_id'],
idea_id=r['idea_id'],
query_id=r['query_id'],
query_text=query_lookup.get(r['query_id'], ''),
idea_text=idea_data.get('text', ''),
originality=r['originality'],
elaboration=r['elaboration'],
coherence=r['coherence'],
usefulness=r['usefulness'],
skipped=bool(r['skipped']),
condition=idea_data.get('condition', ''),
expert_name=idea_data.get('expert_name', ''),
keyword=idea_data.get('keyword', ''),
timestamp=r['timestamp']
))
return ExportData(
experiment_id=data['experiment_id'],
export_timestamp=datetime.utcnow(),
rater_count=len(db.list_raters()),
rating_count=len(export_ratings),
ratings=export_ratings
)
# Health check
@app.get("/api/health")
def health_check() -> dict[str, str]:
"""Health check endpoint."""
return {"status": "healthy"}
# Info endpoint
@app.get("/api/info")
def get_info() -> dict[str, Any]:
"""Get assessment session info."""
data = get_assessment_data()
return {
'experiment_id': data['experiment_id'],
'total_ideas': data['total_ideas'],
'query_count': data['query_count'],
'conditions': data['conditions'],
'randomization_seed': data['randomization_seed']
}
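
For reference, a minimal sketch of a client driving these endpoints end to end (hypothetical rater `r01`; assumes the backend is reachable on port 8002, the target of the Vite dev proxy later in this diff, and that the `requests` package is installed, which is not in the backend requirements):

```python
# Hypothetical smoke test for the assessment API; not part of the diff.
import requests

BASE = "http://localhost:8002/api"

# Register (or resume as) a rater.
rater = requests.post(f"{BASE}/raters",
                      json={"rater_id": "r01", "name": "Pilot Rater"}).json()

# Fetch the first query's unrated ideas for this rater.
queries = requests.get(f"{BASE}/queries").json()
unrated = requests.get(
    f"{BASE}/queries/{queries[0]['query_id']}/unrated",
    params={"rater_id": rater["rater_id"]},
).json()

# Submit a full rating for the first unrated idea; all four dimensions
# are required unless skipped=True.
if unrated["ideas"]:
    idea = unrated["ideas"][0]
    saved = requests.post(f"{BASE}/ratings", json={
        "rater_id": rater["rater_id"],
        "idea_id": idea["idea_id"],
        "query_id": unrated["query_id"],
        "originality": 4, "elaboration": 3, "coherence": 5, "usefulness": 3,
        "skipped": False,
    }).json()
    print(saved)  # {'rater_id': 'r01', 'idea_id': '...', 'saved': True}
```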


@@ -0,0 +1,309 @@
"""
SQLite database setup and operations for assessment ratings storage.
"""
import sqlite3
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path
from typing import Any, Generator
# Database path
DB_PATH = Path(__file__).parent.parent / 'results' / 'ratings.db'
def get_db_path() -> Path:
"""Get the database path, ensuring directory exists."""
DB_PATH.parent.mkdir(parents=True, exist_ok=True)
return DB_PATH
@contextmanager
def get_connection() -> Generator[sqlite3.Connection, None, None]:
"""Get a database connection as a context manager."""
conn = sqlite3.connect(get_db_path())
conn.row_factory = sqlite3.Row
try:
yield conn
finally:
conn.close()
def init_db() -> None:
"""Initialize the database with required tables."""
with get_connection() as conn:
cursor = conn.cursor()
# Raters table
cursor.execute('''
CREATE TABLE IF NOT EXISTS raters (
rater_id TEXT PRIMARY KEY,
name TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Ratings table
cursor.execute('''
CREATE TABLE IF NOT EXISTS ratings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
rater_id TEXT NOT NULL,
idea_id TEXT NOT NULL,
query_id TEXT NOT NULL,
originality INTEGER CHECK(originality BETWEEN 1 AND 5),
elaboration INTEGER CHECK(elaboration BETWEEN 1 AND 5),
coherence INTEGER CHECK(coherence BETWEEN 1 AND 5),
usefulness INTEGER CHECK(usefulness BETWEEN 1 AND 5),
skipped INTEGER DEFAULT 0,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (rater_id) REFERENCES raters(rater_id),
UNIQUE(rater_id, idea_id)
)
''')
# Progress table
cursor.execute('''
CREATE TABLE IF NOT EXISTS progress (
rater_id TEXT NOT NULL,
query_id TEXT NOT NULL,
completed_count INTEGER DEFAULT 0,
total_count INTEGER DEFAULT 0,
started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (rater_id, query_id),
FOREIGN KEY (rater_id) REFERENCES raters(rater_id)
)
''')
# Create indexes for common queries
cursor.execute('''
CREATE INDEX IF NOT EXISTS idx_ratings_rater
ON ratings(rater_id)
''')
cursor.execute('''
CREATE INDEX IF NOT EXISTS idx_ratings_idea
ON ratings(idea_id)
''')
conn.commit()
# Rater operations
def create_rater(rater_id: str, name: str | None = None) -> dict[str, Any]:
"""Create a new rater."""
with get_connection() as conn:
cursor = conn.cursor()
try:
cursor.execute(
'INSERT INTO raters (rater_id, name) VALUES (?, ?)',
(rater_id, name or rater_id)
)
conn.commit()
return {'rater_id': rater_id, 'name': name or rater_id, 'created': True}
except sqlite3.IntegrityError:
# Rater already exists
return get_rater(rater_id)
def get_rater(rater_id: str) -> dict[str, Any] | None:
"""Get a rater by ID."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('SELECT * FROM raters WHERE rater_id = ?', (rater_id,))
row = cursor.fetchone()
if row:
return dict(row)
return None
def list_raters() -> list[dict[str, Any]]:
"""List all raters."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('SELECT * FROM raters ORDER BY created_at')
return [dict(row) for row in cursor.fetchall()]
# Rating operations
def save_rating(
rater_id: str,
idea_id: str,
query_id: str,
originality: int | None,
elaboration: int | None,
coherence: int | None,
usefulness: int | None,
skipped: bool = False
) -> dict[str, Any]:
"""Save or update a rating."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('''
INSERT INTO ratings (rater_id, idea_id, query_id, originality, elaboration, coherence, usefulness, skipped, timestamp)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(rater_id, idea_id) DO UPDATE SET
originality = excluded.originality,
elaboration = excluded.elaboration,
coherence = excluded.coherence,
usefulness = excluded.usefulness,
skipped = excluded.skipped,
timestamp = excluded.timestamp
''', (rater_id, idea_id, query_id, originality, elaboration, coherence, usefulness, int(skipped), datetime.utcnow()))
conn.commit()
# Update progress
update_progress(rater_id, query_id)
return {
'rater_id': rater_id,
'idea_id': idea_id,
'saved': True
}
def get_rating(rater_id: str, idea_id: str) -> dict[str, Any] | None:
"""Get a specific rating."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM ratings WHERE rater_id = ? AND idea_id = ?',
(rater_id, idea_id)
)
row = cursor.fetchone()
if row:
return dict(row)
return None
def get_ratings_by_rater(rater_id: str) -> list[dict[str, Any]]:
"""Get all ratings by a rater."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM ratings WHERE rater_id = ? ORDER BY timestamp',
(rater_id,)
)
return [dict(row) for row in cursor.fetchall()]
def get_ratings_by_idea(idea_id: str) -> list[dict[str, Any]]:
"""Get all ratings for an idea."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM ratings WHERE idea_id = ? ORDER BY rater_id',
(idea_id,)
)
return [dict(row) for row in cursor.fetchall()]
def get_all_ratings() -> list[dict[str, Any]]:
"""Get all ratings."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('SELECT * FROM ratings ORDER BY timestamp')
return [dict(row) for row in cursor.fetchall()]
# Progress operations
def update_progress(rater_id: str, query_id: str) -> None:
"""Update progress for a rater on a query."""
with get_connection() as conn:
cursor = conn.cursor()
# Count completed ratings for this query
cursor.execute('''
SELECT COUNT(*) as count FROM ratings
WHERE rater_id = ? AND query_id = ?
''', (rater_id, query_id))
completed = cursor.fetchone()['count']
# Update or insert progress
cursor.execute('''
INSERT INTO progress (rater_id, query_id, completed_count, updated_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(rater_id, query_id) DO UPDATE SET
completed_count = excluded.completed_count,
updated_at = excluded.updated_at
''', (rater_id, query_id, completed, datetime.utcnow()))
conn.commit()
def set_progress_total(rater_id: str, query_id: str, total: int) -> None:
"""Set the total count for a query's progress."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('''
INSERT INTO progress (rater_id, query_id, total_count, completed_count)
VALUES (?, ?, ?, 0)
ON CONFLICT(rater_id, query_id) DO UPDATE SET
total_count = excluded.total_count
''', (rater_id, query_id, total))
conn.commit()
def get_progress(rater_id: str) -> list[dict[str, Any]]:
"""Get progress for all queries for a rater."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM progress WHERE rater_id = ? ORDER BY query_id',
(rater_id,)
)
return [dict(row) for row in cursor.fetchall()]
def get_progress_for_query(rater_id: str, query_id: str) -> dict[str, Any] | None:
"""Get progress for a specific query."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT * FROM progress WHERE rater_id = ? AND query_id = ?',
(rater_id, query_id)
)
row = cursor.fetchone()
if row:
return dict(row)
return None
def get_rated_idea_ids(rater_id: str, query_id: str) -> set[str]:
"""Get the set of idea IDs already rated by a rater for a query."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute(
'SELECT idea_id FROM ratings WHERE rater_id = ? AND query_id = ?',
(rater_id, query_id)
)
return {row['idea_id'] for row in cursor.fetchall()}
# Statistics
def get_statistics() -> dict[str, Any]:
"""Get overall statistics."""
with get_connection() as conn:
cursor = conn.cursor()
cursor.execute('SELECT COUNT(*) as count FROM raters')
rater_count = cursor.fetchone()['count']
cursor.execute('SELECT COUNT(*) as count FROM ratings WHERE skipped = 0')
rating_count = cursor.fetchone()['count']
cursor.execute('SELECT COUNT(*) as count FROM ratings WHERE skipped = 1')
skip_count = cursor.fetchone()['count']
cursor.execute('SELECT COUNT(DISTINCT idea_id) as count FROM ratings')
rated_ideas = cursor.fetchone()['count']
return {
'rater_count': rater_count,
'rating_count': rating_count,
'skip_count': skip_count,
'rated_ideas': rated_ideas
}
# Initialize on import
init_db()
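
A minimal sketch of the upsert semantics above. Note that importing this module creates `results/ratings.db` as a side effect of the `init_db()` call, so run this against a scratch checkout; the rater and idea IDs are made up:

```python
# Hypothetical demo of save_rating's ON CONFLICT upsert; not part of the diff.
import database as db  # the non-package import path, per the fallback in main.py

db.create_rater("demo_rater")
db.save_rating("demo_rater", "Q01_I000", "Q01",
               originality=2, elaboration=2, coherence=3, usefulness=2)

# A second save for the same (rater_id, idea_id) pair hits the UNIQUE
# constraint and updates the existing row instead of inserting a new one.
db.save_rating("demo_rater", "Q01_I000", "Q01",
               originality=5, elaboration=4, coherence=5, usefulness=4)

print(db.get_rating("demo_rater", "Q01_I000")["originality"])             # 5
print(db.get_progress_for_query("demo_rater", "Q01")["completed_count"])  # 1
```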


@@ -0,0 +1,183 @@
"""
Pydantic models for the assessment API.
"""
from datetime import datetime
from pydantic import BaseModel, Field
# Request models
class RaterCreate(BaseModel):
    """Request to create or log in as a rater."""
rater_id: str = Field(..., min_length=1, max_length=50, description="Unique rater identifier")
name: str | None = Field(None, max_length=100, description="Optional display name")
class RatingSubmit(BaseModel):
"""Request to submit a rating."""
rater_id: str = Field(..., description="Rater identifier")
idea_id: str = Field(..., description="Idea identifier")
query_id: str = Field(..., description="Query identifier")
originality: int | None = Field(None, ge=1, le=5, description="Originality score 1-5")
elaboration: int | None = Field(None, ge=1, le=5, description="Elaboration score 1-5")
coherence: int | None = Field(None, ge=1, le=5, description="Coherence score 1-5")
usefulness: int | None = Field(None, ge=1, le=5, description="Usefulness score 1-5")
skipped: bool = Field(False, description="Whether the idea was skipped")
# Response models
class Rater(BaseModel):
"""Rater information."""
rater_id: str
name: str | None
created_at: datetime | None = None
class Rating(BaseModel):
"""A single rating."""
id: int
rater_id: str
idea_id: str
query_id: str
originality: int | None
elaboration: int | None
coherence: int | None
usefulness: int | None
skipped: int
timestamp: datetime | None
class Progress(BaseModel):
"""Progress for a rater on a query."""
rater_id: str
query_id: str
completed_count: int
total_count: int
started_at: datetime | None = None
updated_at: datetime | None = None
class QueryInfo(BaseModel):
"""Information about a query."""
query_id: str
query_text: str
category: str
idea_count: int
class IdeaForRating(BaseModel):
"""An idea presented for rating (without hidden metadata)."""
idea_id: str
text: str
index: int # Position in the randomized list for this query
class QueryWithIdeas(BaseModel):
"""A query with its ideas for rating."""
query_id: str
query_text: str
category: str
ideas: list[IdeaForRating]
total_count: int
class Statistics(BaseModel):
"""Overall statistics."""
rater_count: int
rating_count: int
skip_count: int
rated_ideas: int
class RaterProgress(BaseModel):
"""Complete progress summary for a rater."""
rater_id: str
queries: list[Progress]
total_completed: int
total_ideas: int
percentage: float
# Export response models
class ExportRating(BaseModel):
"""Rating with hidden metadata for export."""
rater_id: str
idea_id: str
query_id: str
query_text: str
idea_text: str
originality: int | None
elaboration: int | None
coherence: int | None
usefulness: int | None
skipped: bool
condition: str
expert_name: str
keyword: str
timestamp: datetime | None
class ExportData(BaseModel):
"""Full export data structure."""
experiment_id: str
export_timestamp: datetime
rater_count: int
rating_count: int
ratings: list[ExportRating]
# Dimension definitions (for frontend)
DIMENSION_DEFINITIONS = {
"originality": {
"name": "Originality",
"question": "How unexpected or surprising is this idea? Would most people NOT think of this?",
"scale": {
1: "Very common/obvious idea anyone would suggest",
2: "Somewhat common, slight variation on expected ideas",
3: "Moderately original, some unexpected elements",
4: "Quite original, notably different approach",
5: "Highly unexpected, truly novel concept"
},
"low_label": "Common",
"high_label": "Unexpected"
},
"elaboration": {
"name": "Elaboration",
"question": "How detailed and well-developed is this idea?",
"scale": {
1: "Vague, minimal detail, just a concept",
2: "Basic idea with little specificity",
3: "Moderately detailed, some specifics provided",
4: "Well-developed with clear implementation hints",
5: "Highly specific, thoroughly developed concept"
},
"low_label": "Vague",
"high_label": "Detailed"
},
"coherence": {
"name": "Coherence",
"question": "Does this idea make logical sense and relate to the query object?",
"scale": {
1: "Nonsensical, irrelevant, or incomprehensible",
2: "Mostly unclear, weak connection to query",
3: "Partially coherent, some logical gaps",
4: "Mostly coherent with minor issues",
5: "Fully coherent, clearly relates to query"
},
"low_label": "Nonsense",
"high_label": "Coherent"
},
"usefulness": {
"name": "Usefulness",
"question": "Could this idea have practical value or inspire real innovation?",
"scale": {
1: "No practical value whatsoever",
2: "Minimal usefulness, highly impractical",
3: "Some potential value with major limitations",
4: "Useful idea with realistic applications",
5: "Highly useful, clear practical value"
},
"low_label": "Useless",
"high_label": "Useful"
}
}
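
A quick sketch of the validation the API inherits from these models (assuming the module is importable as `models`; in pydantic v2 an out-of-range score raises a `ValidationError` whose first error has type `less_than_equal`):

```python
# Hypothetical check of the 1-5 bounds on RatingSubmit; not part of the diff.
from pydantic import ValidationError
from models import RatingSubmit

RatingSubmit(rater_id="r01", idea_id="Q01_I000", query_id="Q01",
             originality=5, elaboration=3, coherence=4, usefulness=2)  # ok

try:
    RatingSubmit(rater_id="r01", idea_id="Q01_I000", query_id="Q01",
                 originality=6, elaboration=3, coherence=4, usefulness=2)
except ValidationError as e:
    print(e.errors()[0]["type"])  # less_than_equal
```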


@@ -0,0 +1,3 @@
fastapi>=0.109.0
uvicorn>=0.27.0
pydantic>=2.5.0

File diff suppressed because it is too large


@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/vite.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Creative Idea Assessment</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

File diff suppressed because it is too large


@@ -0,0 +1,32 @@
{
"name": "assessment-frontend",
"private": true,
"version": "1.0.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc -b && vite build",
"lint": "eslint .",
"preview": "vite preview"
},
"dependencies": {
"@ant-design/icons": "^6.1.0",
"antd": "^6.0.0",
"react": "^19.2.0",
"react-dom": "^19.2.0"
},
"devDependencies": {
"@eslint/js": "^9.39.1",
"@types/node": "^24.10.1",
"@types/react": "^19.2.5",
"@types/react-dom": "^19.2.3",
"@vitejs/plugin-react": "^5.1.1",
"eslint": "^9.39.1",
"eslint-plugin-react-hooks": "^7.0.1",
"eslint-plugin-react-refresh": "^0.4.24",
"globals": "^16.5.0",
"typescript": "~5.9.3",
"typescript-eslint": "^8.46.4",
"vite": "^7.2.4"
}
}


@@ -0,0 +1,109 @@
/**
* Main application component for the assessment interface.
*/
import { ConfigProvider, theme, Spin } from 'antd';
import { useAssessment } from './hooks/useAssessment';
import { RaterLogin } from './components/RaterLogin';
import { InstructionsPage } from './components/InstructionsPage';
import { AssessmentPage } from './components/AssessmentPage';
import { CompletionPage } from './components/CompletionPage';
function App() {
const assessment = useAssessment();
const renderContent = () => {
// Show loading spinner for initial load
if (assessment.loading && !assessment.rater) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
minHeight: '100vh'
}}>
<Spin size="large" />
</div>
);
}
switch (assessment.view) {
case 'login':
return (
<RaterLogin
onLogin={assessment.login}
loading={assessment.loading}
error={assessment.error}
/>
);
case 'instructions':
return (
<InstructionsPage
dimensions={assessment.dimensions}
onStart={assessment.startAssessment}
loading={assessment.loading}
/>
);
case 'assessment':
if (!assessment.rater || !assessment.currentQuery || !assessment.currentIdea || !assessment.dimensions) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
minHeight: '100vh'
}}>
<Spin size="large" tip="Loading..." />
</div>
);
}
return (
<AssessmentPage
raterId={assessment.rater.rater_id}
queryId={assessment.currentQuery.query_id}
queryText={assessment.currentQuery.query_text}
idea={assessment.currentIdea}
ideaIndex={assessment.currentIdeaIndex}
totalIdeas={assessment.currentQuery.total_count}
dimensions={assessment.dimensions}
progress={assessment.progress}
onNext={assessment.nextIdea}
onPrev={assessment.prevIdea}
onShowDefinitions={assessment.showInstructions}
onLogout={assessment.logout}
canGoPrev={assessment.currentIdeaIndex > 0}
/>
);
case 'completion':
return (
<CompletionPage
raterId={assessment.rater?.rater_id ?? ''}
progress={assessment.progress}
onLogout={assessment.logout}
/>
);
default:
return null;
}
};
return (
<ConfigProvider
theme={{
algorithm: theme.defaultAlgorithm,
token: {
colorPrimary: '#1677ff',
borderRadius: 6,
},
}}
>
{renderContent()}
</ConfigProvider>
);
}
export default App;


@@ -0,0 +1,199 @@
/**
* Main assessment page for rating ideas.
*/
import { Card, Button, Space, Alert, Typography } from 'antd';
import {
ArrowLeftOutlined,
ArrowRightOutlined,
ForwardOutlined,
BookOutlined,
LogoutOutlined
} from '@ant-design/icons';
import type { IdeaForRating, DimensionDefinitions, RaterProgress } from '../types';
import { useRatings } from '../hooks/useRatings';
import { IdeaCard } from './IdeaCard';
import { RatingSlider } from './RatingSlider';
import { ProgressBar } from './ProgressBar';
const { Text } = Typography;
interface AssessmentPageProps {
raterId: string;
queryId: string;
queryText: string;
idea: IdeaForRating;
ideaIndex: number;
totalIdeas: number;
dimensions: DimensionDefinitions;
progress: RaterProgress | null;
onNext: () => void;
onPrev: () => void;
onShowDefinitions: () => void;
onLogout: () => void;
canGoPrev: boolean;
}
export function AssessmentPage({
raterId,
queryId,
queryText,
idea,
ideaIndex,
totalIdeas,
dimensions,
progress,
onNext,
onPrev,
onShowDefinitions,
onLogout,
canGoPrev
}: AssessmentPageProps) {
const {
ratings,
setRating,
isComplete,
submit,
skip,
submitting,
error
} = useRatings({
raterId,
queryId,
ideaId: idea.idea_id,
onSuccess: onNext
});
const handleSubmit = async () => {
await submit();
};
const handleSkip = async () => {
await skip();
};
// Calculate query progress
const queryProgress = progress?.queries.find(q => q.query_id === queryId);
const queryCompleted = queryProgress?.completed_count ?? ideaIndex;
const queryTotal = totalIdeas;
return (
<div style={{ maxWidth: 800, margin: '0 auto', padding: 24 }}>
{/* Header with query info and overall progress */}
<Card size="small" style={{ marginBottom: 16 }}>
<div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center', marginBottom: 8 }}>
<Text strong style={{ fontSize: 16 }}>Query: "{queryText}"</Text>
<Space>
<Button
icon={<BookOutlined />}
onClick={onShowDefinitions}
size="small"
>
Definitions
</Button>
<Button
icon={<LogoutOutlined />}
onClick={onLogout}
size="small"
danger
>
Exit
</Button>
</Space>
</div>
<ProgressBar
completed={queryCompleted}
total={queryTotal}
label="Query Progress"
/>
{progress && (
<div style={{ marginTop: 8 }}>
<ProgressBar
completed={progress.total_completed}
total={progress.total_ideas}
label="Overall Progress"
/>
</div>
)}
</Card>
{/* Error display */}
{error && (
<Alert
message={error}
type="error"
showIcon
closable
style={{ marginBottom: 16 }}
/>
)}
{/* Idea card */}
<IdeaCard
ideaNumber={ideaIndex + 1}
text={idea.text}
queryText={queryText}
/>
{/* Rating inputs */}
<Card style={{ marginBottom: 16 }}>
<RatingSlider
dimension={dimensions.originality}
value={ratings.originality}
onChange={(v) => setRating('originality', v)}
disabled={submitting}
/>
<RatingSlider
dimension={dimensions.elaboration}
value={ratings.elaboration}
onChange={(v) => setRating('elaboration', v)}
disabled={submitting}
/>
<RatingSlider
dimension={dimensions.coherence}
value={ratings.coherence}
onChange={(v) => setRating('coherence', v)}
disabled={submitting}
/>
<RatingSlider
dimension={dimensions.usefulness}
value={ratings.usefulness}
onChange={(v) => setRating('usefulness', v)}
disabled={submitting}
/>
</Card>
{/* Navigation buttons */}
<Card>
<div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<Button
icon={<ArrowLeftOutlined />}
onClick={onPrev}
disabled={!canGoPrev || submitting}
>
Back
</Button>
<Space>
<Button
icon={<ForwardOutlined />}
onClick={handleSkip}
loading={submitting}
>
Skip
</Button>
<Button
type="primary"
icon={<ArrowRightOutlined />}
onClick={handleSubmit}
loading={submitting}
disabled={!isComplete()}
>
Submit & Next
</Button>
</Space>
</div>
</Card>
</div>
);
}


@@ -0,0 +1,105 @@
/**
* Completion page shown when all ideas have been rated.
*/
import { Card, Button, Typography, Space, Result, Statistic, Row, Col } from 'antd';
import { CheckCircleOutlined, BarChartOutlined, LogoutOutlined } from '@ant-design/icons';
import type { RaterProgress } from '../types';
const { Title, Text } = Typography;
interface CompletionPageProps {
raterId: string;
progress: RaterProgress | null;
onLogout: () => void;
}
export function CompletionPage({ raterId, progress, onLogout }: CompletionPageProps) {
const completed = progress?.total_completed ?? 0;
const total = progress?.total_ideas ?? 0;
const percentage = progress?.percentage ?? 0;
const isFullyComplete = completed >= total;
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
minHeight: '100vh',
padding: 24
}}>
<Card style={{ maxWidth: 600, width: '100%' }}>
<Result
status={isFullyComplete ? 'success' : 'info'}
icon={isFullyComplete ? <CheckCircleOutlined /> : <BarChartOutlined />}
title={isFullyComplete ? 'Assessment Complete!' : 'Session Summary'}
subTitle={
isFullyComplete
? 'Thank you for completing the assessment.'
: 'You have made progress on the assessment.'
}
extra={[
<Button
type="primary"
key="logout"
icon={<LogoutOutlined />}
onClick={onLogout}
>
Exit
</Button>
]}
>
<Row gutter={16} style={{ marginTop: 24 }}>
<Col span={8}>
<Statistic
title="Ideas Rated"
value={completed}
suffix={`/ ${total}`}
/>
</Col>
<Col span={8}>
<Statistic
title="Progress"
value={percentage}
suffix="%"
precision={1}
/>
</Col>
<Col span={8}>
<Statistic
title="Rater ID"
value={raterId}
valueStyle={{ fontSize: 16 }}
/>
</Col>
</Row>
{progress && progress.queries.length > 0 && (
<div style={{ marginTop: 24 }}>
<Title level={5}>Progress by Query</Title>
<Space direction="vertical" style={{ width: '100%' }}>
{progress.queries.map((q) => (
<div
key={q.query_id}
style={{
display: 'flex',
justifyContent: 'space-between',
padding: '4px 0'
}}
>
<Text>{q.query_id}</Text>
<Text type={q.completed_count >= q.total_count ? 'success' : 'secondary'}>
{q.completed_count} / {q.total_count}
{q.completed_count >= q.total_count && ' ✓'}
</Text>
</div>
))}
</Space>
</div>
)}
</Result>
</Card>
</div>
);
}


@@ -0,0 +1,36 @@
/**
* Card displaying a single idea for rating.
*/
import { Card, Typography, Tag } from 'antd';
const { Text, Paragraph } = Typography;
interface IdeaCardProps {
ideaNumber: number;
text: string;
queryText: string;
}
export function IdeaCard({ ideaNumber, text, queryText }: IdeaCardProps) {
return (
<Card
title={
<div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<Text strong>IDEA #{ideaNumber}</Text>
<Tag color="blue">Query: {queryText}</Tag>
</div>
}
style={{ marginBottom: 24 }}
>
<Paragraph style={{
fontSize: 16,
lineHeight: 1.8,
margin: 0,
padding: '8px 0'
}}>
"{text}"
</Paragraph>
</Card>
);
}


@@ -0,0 +1,134 @@
/**
* Instructions page showing dimension definitions.
*/
import { Fragment, useState } from 'react';
import { Card, Button, Typography, Space, Checkbox, Divider, Tag } from 'antd';
import { PlayCircleOutlined } from '@ant-design/icons';
import type { DimensionDefinitions } from '../types';
const { Title, Text, Paragraph } = Typography;
interface InstructionsPageProps {
dimensions: DimensionDefinitions | null;
onStart: () => void;
onBack?: () => void;
loading: boolean;
isReturning?: boolean;
}
export function InstructionsPage({
dimensions,
onStart,
onBack,
loading,
isReturning = false
}: InstructionsPageProps) {
const [acknowledged, setAcknowledged] = useState(isReturning);
if (!dimensions) {
return (
<div style={{ padding: 24, textAlign: 'center' }}>
<Text>Loading instructions...</Text>
</div>
);
}
const dimensionOrder = ['originality', 'elaboration', 'coherence', 'usefulness'] as const;
return (
<div style={{
maxWidth: 800,
margin: '0 auto',
padding: 24
}}>
<Card>
<Space direction="vertical" size="large" style={{ width: '100%' }}>
<div style={{ textAlign: 'center' }}>
<Title level={2}>Assessment Instructions</Title>
<Paragraph type="secondary">
You will rate creative ideas on 4 dimensions using a 1-5 scale.
Please read each definition carefully before beginning.
</Paragraph>
</div>
<Divider />
{dimensionOrder.map((key) => {
const dim = dimensions[key];
return (
<Card
key={key}
size="small"
title={
<Space>
<Tag color="blue">{dim.name}</Tag>
<Text type="secondary">{dim.question}</Text>
</Space>
}
style={{ marginBottom: 16 }}
>
<div style={{
display: 'grid',
gridTemplateColumns: 'auto 1fr',
gap: '8px 16px',
fontSize: 14
}}>
{([1, 2, 3, 4, 5] as const).map((score) => (
  // The list key must sit on the outermost element returned from the
  // map; a shorthand fragment (<>) cannot carry one, so use Fragment.
  <Fragment key={score}>
    <Tag color={score <= 2 ? 'red' : score === 3 ? 'orange' : 'green'}>
      {score}
    </Tag>
    <Text>{dim.scale[score]}</Text>
  </Fragment>
))}
</div>
<Divider style={{ margin: '12px 0' }} />
<div style={{ display: 'flex', justifyContent: 'space-between' }}>
<Text type="secondary">{dim.low_label}</Text>
<Text type="secondary">{dim.high_label}</Text>
</div>
</Card>
);
})}
<Divider />
<Space direction="vertical" style={{ width: '100%' }}>
{!isReturning && (
<Checkbox
checked={acknowledged}
onChange={(e) => setAcknowledged(e.target.checked)}
>
I have read and understood the instructions
</Checkbox>
)}
<Space style={{ width: '100%', justifyContent: 'center' }}>
{onBack && (
<Button onClick={onBack}>
Back to Assessment
</Button>
)}
<Button
type="primary"
size="large"
icon={<PlayCircleOutlined />}
onClick={onStart}
loading={loading}
disabled={!acknowledged}
>
{isReturning ? 'Continue Rating' : 'Begin Rating'}
</Button>
</Space>
</Space>
</Space>
</Card>
</div>
);
}


@@ -0,0 +1,39 @@
/**
* Progress bar component showing assessment progress.
*/
import { Progress, Typography, Space } from 'antd';
const { Text } = Typography;
interface ProgressBarProps {
completed: number;
total: number;
label?: string;
}
export function ProgressBar({ completed, total, label }: ProgressBarProps) {
const percentage = total > 0 ? Math.round((completed / total) * 100) : 0;
return (
<div style={{ width: '100%' }}>
{label && (
<Space style={{ marginBottom: 4, justifyContent: 'space-between', width: '100%' }}>
<Text type="secondary">{label}</Text>
<Text type="secondary">
{completed}/{total} ({percentage}%)
</Text>
</Space>
)}
<Progress
percent={percentage}
showInfo={!label}
status="active"
strokeColor={{
'0%': '#108ee9',
'100%': '#87d068',
}}
/>
</div>
);
}


@@ -0,0 +1,116 @@
/**
* Rater login component.
*/
import { useState, useEffect } from 'react';
import { Card, Input, Button, Typography, Space, List, Alert } from 'antd';
import { UserOutlined, LoginOutlined } from '@ant-design/icons';
import * as api from '../services/api';
import type { Rater } from '../types';
const { Title, Text } = Typography;
interface RaterLoginProps {
onLogin: (raterId: string, name?: string) => void;
loading: boolean;
error: string | null;
}
export function RaterLogin({ onLogin, loading, error }: RaterLoginProps) {
const [raterId, setRaterId] = useState('');
const [existingRaters, setExistingRaters] = useState<Rater[]>([]);
useEffect(() => {
api.listRaters()
.then(setExistingRaters)
.catch(console.error);
}, []);
const handleLogin = () => {
if (raterId.trim()) {
onLogin(raterId.trim());
}
};
const handleQuickLogin = (rater: Rater) => {
onLogin(rater.rater_id);
};
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
minHeight: '100vh',
padding: 24
}}>
<Card
style={{ width: 400, maxWidth: '100%' }}
styles={{ body: { padding: 32 } }}
>
<Space direction="vertical" size="large" style={{ width: '100%' }}>
<div style={{ textAlign: 'center' }}>
<Title level={3} style={{ marginBottom: 8 }}>
Creative Idea Assessment
</Title>
<Text type="secondary">
Enter your rater ID to begin
</Text>
</div>
{error && (
<Alert message={error} type="error" showIcon />
)}
<Input
size="large"
placeholder="Enter your rater ID"
prefix={<UserOutlined />}
value={raterId}
onChange={(e) => setRaterId(e.target.value)}
onPressEnter={handleLogin}
disabled={loading}
/>
<Button
type="primary"
size="large"
icon={<LoginOutlined />}
onClick={handleLogin}
loading={loading}
disabled={!raterId.trim()}
block
>
Start Assessment
</Button>
{existingRaters.length > 0 && (
<div>
<Text type="secondary" style={{ display: 'block', marginBottom: 8 }}>
Existing raters:
</Text>
<List
size="small"
bordered
dataSource={existingRaters}
renderItem={(rater) => (
<List.Item
style={{ cursor: 'pointer' }}
onClick={() => handleQuickLogin(rater)}
>
<Text code>{rater.rater_id}</Text>
{rater.name && rater.name !== rater.rater_id && (
<Text type="secondary" style={{ marginLeft: 8 }}>
({rater.name})
</Text>
)}
</List.Item>
)}
/>
</div>
)}
</Space>
</Card>
</div>
);
}


@@ -0,0 +1,74 @@
/**
* Rating input component with radio buttons for 1-5 scale.
*/
import { Radio, Typography, Space, Tooltip, Button } from 'antd';
import { QuestionCircleOutlined } from '@ant-design/icons';
import type { DimensionDefinition } from '../types';
const { Text } = Typography;
interface RatingSliderProps {
dimension: DimensionDefinition;
value: number | null;
onChange: (value: number | null) => void;
disabled?: boolean;
}
export function RatingSlider({ dimension, value, onChange, disabled }: RatingSliderProps) {
return (
<div style={{ marginBottom: 24 }}>
<div style={{ display: 'flex', alignItems: 'center', marginBottom: 8 }}>
<Text strong style={{ marginRight: 8 }}>
{dimension.name.toUpperCase()}
</Text>
<Tooltip
title={
<div>
<p style={{ marginBottom: 8 }}>{dimension.question}</p>
{([1, 2, 3, 4, 5] as const).map((score) => (
<div key={score} style={{ marginBottom: 4 }}>
<strong>{score}:</strong> {dimension.scale[score]}
</div>
))}
</div>
}
placement="right"
overlayStyle={{ maxWidth: 400 }}
>
<Button
type="text"
size="small"
icon={<QuestionCircleOutlined />}
style={{ padding: 0, height: 'auto' }}
/>
</Tooltip>
</div>
<div style={{ display: 'flex', alignItems: 'center', gap: 16 }}>
<Text type="secondary" style={{ minWidth: 80, textAlign: 'right' }}>
{dimension.low_label}
</Text>
<Radio.Group
value={value}
onChange={(e) => onChange(e.target.value)}
disabled={disabled}
style={{ flex: 1 }}
>
<Space size="large">
{[1, 2, 3, 4, 5].map((score) => (
<Radio key={score} value={score}>
{score}
</Radio>
))}
</Space>
</Radio.Group>
<Text type="secondary" style={{ minWidth: 80 }}>
{dimension.high_label}
</Text>
</div>
</div>
);
}


@@ -0,0 +1,272 @@
/**
* Hook for managing the assessment session state.
*/
import { useState, useCallback, useEffect } from 'react';
import type {
AppView,
DimensionDefinitions,
QueryInfo,
QueryWithIdeas,
Rater,
RaterProgress,
} from '../types';
import * as api from '../services/api';
interface AssessmentState {
view: AppView;
rater: Rater | null;
queries: QueryInfo[];
currentQueryIndex: number;
currentQuery: QueryWithIdeas | null;
currentIdeaIndex: number;
progress: RaterProgress | null;
dimensions: DimensionDefinitions | null;
loading: boolean;
error: string | null;
}
const initialState: AssessmentState = {
view: 'login',
rater: null,
queries: [],
currentQueryIndex: 0,
currentQuery: null,
currentIdeaIndex: 0,
progress: null,
dimensions: null,
loading: false,
error: null,
};
export function useAssessment() {
const [state, setState] = useState<AssessmentState>(initialState);
// Load dimension definitions on mount
useEffect(() => {
api.getDimensionDefinitions()
.then((dimensions) => setState((s) => ({ ...s, dimensions })))
.catch((err) => console.error('Failed to load dimensions:', err));
}, []);
// Login as a rater
const login = useCallback(async (raterId: string, name?: string) => {
setState((s) => ({ ...s, loading: true, error: null }));
try {
const rater = await api.createOrGetRater({ rater_id: raterId, name });
const queries = await api.listQueries();
const progress = await api.getRaterProgress(raterId);
setState((s) => ({
...s,
rater,
queries,
progress,
view: 'instructions',
loading: false,
}));
} catch (err) {
setState((s) => ({
...s,
error: err instanceof Error ? err.message : 'Login failed',
loading: false,
}));
}
}, []);
// Start assessment (move from instructions to assessment)
const startAssessment = useCallback(async () => {
if (!state.rater || state.queries.length === 0) return;
setState((s) => ({ ...s, loading: true }));
try {
// Find first query with unrated ideas
let queryIndex = 0;
let queryData: QueryWithIdeas | null = null;
for (let i = 0; i < state.queries.length; i++) {
const unrated = await api.getUnratedIdeas(state.queries[i].query_id, state.rater.rater_id);
if (unrated.ideas.length > 0) {
queryIndex = i;
queryData = unrated;
break;
}
}
if (!queryData) {
// All done
setState((s) => ({
...s,
view: 'completion',
loading: false,
}));
return;
}
setState((s) => ({
...s,
view: 'assessment',
currentQueryIndex: queryIndex,
currentQuery: queryData,
currentIdeaIndex: 0,
loading: false,
}));
} catch (err) {
setState((s) => ({
...s,
error: err instanceof Error ? err.message : 'Failed to start assessment',
loading: false,
}));
}
}, [state.rater, state.queries]);
// Move to next idea
const nextIdea = useCallback(async () => {
if (!state.currentQuery || !state.rater) return;
const nextIndex = state.currentIdeaIndex + 1;
if (nextIndex < state.currentQuery.ideas.length) {
// More ideas in current query
setState((s) => ({ ...s, currentIdeaIndex: nextIndex }));
} else {
// Query complete, try to move to next query
const nextQueryIndex = state.currentQueryIndex + 1;
if (nextQueryIndex < state.queries.length) {
setState((s) => ({ ...s, loading: true }));
try {
const unrated = await api.getUnratedIdeas(
state.queries[nextQueryIndex].query_id,
state.rater.rater_id
);
if (unrated.ideas.length > 0) {
setState((s) => ({
...s,
currentQueryIndex: nextQueryIndex,
currentQuery: unrated,
currentIdeaIndex: 0,
loading: false,
}));
} else {
// Try to find next query with unrated ideas
for (let i = nextQueryIndex + 1; i < state.queries.length; i++) {
const nextUnrated = await api.getUnratedIdeas(
state.queries[i].query_id,
state.rater.rater_id
);
if (nextUnrated.ideas.length > 0) {
setState((s) => ({
...s,
currentQueryIndex: i,
currentQuery: nextUnrated,
currentIdeaIndex: 0,
loading: false,
}));
return;
}
}
// All queries complete
setState((s) => ({
...s,
view: 'completion',
loading: false,
}));
}
} catch (err) {
setState((s) => ({
...s,
error: err instanceof Error ? err.message : 'Failed to load next query',
loading: false,
}));
}
} else {
// All queries complete
setState((s) => ({ ...s, view: 'completion' }));
}
}
// Refresh progress
try {
const progress = await api.getRaterProgress(state.rater.rater_id);
setState((s) => ({ ...s, progress }));
} catch (err) {
console.error('Failed to refresh progress:', err);
}
}, [state.currentQuery, state.currentIdeaIndex, state.currentQueryIndex, state.queries, state.rater]);
// Move to previous idea
const prevIdea = useCallback(() => {
if (state.currentIdeaIndex > 0) {
setState((s) => ({ ...s, currentIdeaIndex: s.currentIdeaIndex - 1 }));
}
}, [state.currentIdeaIndex]);
// Jump to a specific query
const jumpToQuery = useCallback(async (queryIndex: number) => {
if (!state.rater || queryIndex < 0 || queryIndex >= state.queries.length) return;
setState((s) => ({ ...s, loading: true }));
try {
const queryData = await api.getQueryWithIdeas(state.queries[queryIndex].query_id);
setState((s) => ({
...s,
currentQueryIndex: queryIndex,
currentQuery: queryData,
currentIdeaIndex: 0,
view: 'assessment',
loading: false,
}));
} catch (err) {
setState((s) => ({
...s,
error: err instanceof Error ? err.message : 'Failed to load query',
loading: false,
}));
}
}, [state.rater, state.queries]);
// Refresh progress
const refreshProgress = useCallback(async () => {
if (!state.rater) return;
try {
const progress = await api.getRaterProgress(state.rater.rater_id);
setState((s) => ({ ...s, progress }));
} catch (err) {
console.error('Failed to refresh progress:', err);
}
}, [state.rater]);
// Show definitions
const showInstructions = useCallback(() => {
setState((s) => ({ ...s, view: 'instructions' }));
}, []);
// Return to assessment
const returnToAssessment = useCallback(() => {
setState((s) => ({ ...s, view: 'assessment' }));
}, []);
// Logout
const logout = useCallback(() => {
setState(initialState);
}, []);
// Get current idea
const currentIdea = state.currentQuery?.ideas[state.currentIdeaIndex] ?? null;
return {
...state,
currentIdea,
login,
startAssessment,
nextIdea,
prevIdea,
jumpToQuery,
refreshProgress,
showInstructions,
returnToAssessment,
logout,
};
}


@@ -0,0 +1,133 @@
/**
* Hook for managing rating submission.
*/
import { useState, useCallback } from 'react';
import type { RatingState, DimensionKey } from '../types';
import * as api from '../services/api';
interface UseRatingsOptions {
raterId: string | null;
queryId: string | null;
ideaId: string | null;
onSuccess?: () => void;
}
export function useRatings({ raterId, queryId, ideaId, onSuccess }: UseRatingsOptions) {
const [ratings, setRatings] = useState<RatingState>({
originality: null,
elaboration: null,
coherence: null,
usefulness: null,
});
const [submitting, setSubmitting] = useState(false);
const [error, setError] = useState<string | null>(null);
// Set a single rating
const setRating = useCallback((dimension: DimensionKey, value: number | null) => {
setRatings((prev) => ({ ...prev, [dimension]: value }));
}, []);
// Reset all ratings
const resetRatings = useCallback(() => {
setRatings({
originality: null,
elaboration: null,
coherence: null,
usefulness: null,
});
setError(null);
}, []);
// Check if all ratings are set
const isComplete = useCallback(() => {
return (
ratings.originality !== null &&
ratings.elaboration !== null &&
ratings.coherence !== null &&
ratings.usefulness !== null
);
}, [ratings]);
// Submit rating
const submit = useCallback(async () => {
if (!raterId || !queryId || !ideaId) {
setError('Missing required information');
return false;
}
if (!isComplete()) {
setError('Please rate all dimensions');
return false;
}
setSubmitting(true);
setError(null);
try {
await api.submitRating({
rater_id: raterId,
idea_id: ideaId,
query_id: queryId,
originality: ratings.originality,
elaboration: ratings.elaboration,
coherence: ratings.coherence,
usefulness: ratings.usefulness,
skipped: false,
});
resetRatings();
onSuccess?.();
return true;
} catch (err) {
setError(err instanceof Error ? err.message : 'Failed to submit rating');
return false;
} finally {
setSubmitting(false);
}
}, [raterId, queryId, ideaId, ratings, isComplete, resetRatings, onSuccess]);
// Skip idea
const skip = useCallback(async () => {
if (!raterId || !queryId || !ideaId) {
setError('Missing required information');
return false;
}
setSubmitting(true);
setError(null);
try {
await api.submitRating({
rater_id: raterId,
idea_id: ideaId,
query_id: queryId,
originality: null,
elaboration: null,
coherence: null,
usefulness: null,
skipped: true,
});
resetRatings();
onSuccess?.();
return true;
} catch (err) {
setError(err instanceof Error ? err.message : 'Failed to skip idea');
return false;
} finally {
setSubmitting(false);
}
}, [raterId, queryId, ideaId, resetRatings, onSuccess]);
return {
ratings,
setRating,
resetRatings,
isComplete,
submit,
skip,
submitting,
error,
};
}


@@ -0,0 +1,43 @@
:root {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
line-height: 1.5;
font-weight: 400;
color-scheme: light;
color: rgba(0, 0, 0, 0.88);
background-color: #f5f5f5;
font-synthesis: none;
text-rendering: optimizeLegibility;
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
}
body {
margin: 0;
min-height: 100vh;
}
#root {
min-height: 100vh;
}
/* Custom scrollbar */
::-webkit-scrollbar {
width: 8px;
height: 8px;
}
::-webkit-scrollbar-track {
background: #f1f1f1;
border-radius: 4px;
}
::-webkit-scrollbar-thumb {
background: #c1c1c1;
border-radius: 4px;
}
::-webkit-scrollbar-thumb:hover {
background: #a8a8a8;
}


@@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App'
createRoot(document.getElementById('root')!).render(
<StrictMode>
<App />
</StrictMode>,
)


@@ -0,0 +1,116 @@
/**
* API client for the assessment backend.
*/
import type {
DimensionDefinitions,
QueryInfo,
QueryWithIdeas,
Rater,
RaterCreate,
RaterProgress,
Rating,
RatingSubmit,
SessionInfo,
Statistics,
} from '../types';
const API_BASE = '/api';
async function fetchJson<T>(url: string, options?: RequestInit): Promise<T> {
const response = await fetch(`${API_BASE}${url}`, {
  ...options,
  // Merge headers after spreading options so a caller-supplied headers
  // object cannot silently drop the Content-Type default.
  headers: {
    'Content-Type': 'application/json',
    ...options?.headers,
  },
});
if (!response.ok) {
const error = await response.json().catch(() => ({ detail: response.statusText }));
throw new Error(error.detail || 'API request failed');
}
return response.json();
}
// Rater API
export async function listRaters(): Promise<Rater[]> {
return fetchJson<Rater[]>('/raters');
}
export async function createOrGetRater(data: RaterCreate): Promise<Rater> {
return fetchJson<Rater>('/raters', {
method: 'POST',
body: JSON.stringify(data),
});
}
export async function getRater(raterId: string): Promise<Rater> {
return fetchJson<Rater>(`/raters/${encodeURIComponent(raterId)}`);
}
// Query API
export async function listQueries(): Promise<QueryInfo[]> {
return fetchJson<QueryInfo[]>('/queries');
}
export async function getQueryWithIdeas(queryId: string): Promise<QueryWithIdeas> {
return fetchJson<QueryWithIdeas>(`/queries/${encodeURIComponent(queryId)}`);
}
export async function getUnratedIdeas(queryId: string, raterId: string): Promise<QueryWithIdeas> {
return fetchJson<QueryWithIdeas>(
`/queries/${encodeURIComponent(queryId)}/unrated?rater_id=${encodeURIComponent(raterId)}`
);
}
// Rating API
export async function submitRating(rating: RatingSubmit): Promise<{ saved: boolean }> {
return fetchJson<{ saved: boolean }>('/ratings', {
method: 'POST',
body: JSON.stringify(rating),
});
}
export async function getRating(raterId: string, ideaId: string): Promise<Rating | null> {
try {
return await fetchJson<Rating>(`/ratings/${encodeURIComponent(raterId)}/${encodeURIComponent(ideaId)}`);
} catch {
return null;
}
}
export async function getRatingsByRater(raterId: string): Promise<Rating[]> {
return fetchJson<Rating[]>(`/ratings/rater/${encodeURIComponent(raterId)}`);
}
// Progress API
export async function getRaterProgress(raterId: string): Promise<RaterProgress> {
return fetchJson<RaterProgress>(`/progress/${encodeURIComponent(raterId)}`);
}
// Statistics API
export async function getStatistics(): Promise<Statistics> {
return fetchJson<Statistics>('/statistics');
}
// Dimension definitions API
export async function getDimensionDefinitions(): Promise<DimensionDefinitions> {
return fetchJson<DimensionDefinitions>('/dimensions');
}
// Session info API
export async function getSessionInfo(): Promise<SessionInfo> {
return fetchJson<SessionInfo>('/info');
}
// Health check
export async function healthCheck(): Promise<boolean> {
try {
await fetchJson<{ status: string }>('/health');
return true;
} catch {
return false;
}
}


@@ -0,0 +1,142 @@
/**
* TypeScript types for the assessment frontend.
*/
// Rater types
export interface Rater {
rater_id: string;
name: string | null;
created_at?: string;
}
export interface RaterCreate {
rater_id: string;
name?: string;
}
// Query types
export interface QueryInfo {
query_id: string;
query_text: string;
category: string;
idea_count: number;
}
export interface IdeaForRating {
idea_id: string;
text: string;
index: number;
}
export interface QueryWithIdeas {
query_id: string;
query_text: string;
category: string;
ideas: IdeaForRating[];
total_count: number;
}
// Rating types
export interface RatingSubmit {
rater_id: string;
idea_id: string;
query_id: string;
originality: number | null;
elaboration: number | null;
coherence: number | null;
usefulness: number | null;
skipped: boolean;
}
export interface Rating {
id: number;
rater_id: string;
idea_id: string;
query_id: string;
originality: number | null;
elaboration: number | null;
coherence: number | null;
usefulness: number | null;
skipped: number;
timestamp: string | null;
}
// Progress types
export interface QueryProgress {
rater_id: string;
query_id: string;
completed_count: number;
total_count: number;
started_at?: string;
updated_at?: string;
}
export interface RaterProgress {
rater_id: string;
queries: QueryProgress[];
total_completed: number;
total_ideas: number;
percentage: number;
}
// Statistics types
export interface Statistics {
rater_count: number;
rating_count: number;
skip_count: number;
rated_ideas: number;
}
// Dimension definition types
export interface DimensionScale {
1: string;
2: string;
3: string;
4: string;
5: string;
}
export interface DimensionDefinition {
name: string;
question: string;
scale: DimensionScale;
low_label: string;
high_label: string;
}
export interface DimensionDefinitions {
originality: DimensionDefinition;
elaboration: DimensionDefinition;
coherence: DimensionDefinition;
usefulness: DimensionDefinition;
}
// Session info
export interface SessionInfo {
experiment_id: string;
total_ideas: number;
query_count: number;
conditions: string[];
randomization_seed: number;
}
// UI State types
export type AppView = 'login' | 'instructions' | 'assessment' | 'completion';
export interface RatingState {
originality: number | null;
elaboration: number | null;
coherence: number | null;
usefulness: number | null;
}
export const EMPTY_RATING_STATE: RatingState = {
originality: null,
elaboration: null,
coherence: null,
usefulness: null,
};
export type DimensionKey = keyof RatingState;
export const DIMENSION_KEYS: DimensionKey[] = ['originality', 'elaboration', 'coherence', 'usefulness'];


@@ -0,0 +1,20 @@
{
"compilerOptions": {
"target": "ES2020",
"useDefineForClassFields": true,
"lib": ["ES2020", "DOM", "DOM.Iterable"],
"module": "ESNext",
"skipLibCheck": true,
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"isolatedModules": true,
"moduleDetection": "force",
"noEmit": true,
"jsx": "react-jsx",
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"noFallthroughCasesInSwitch": true
},
"include": ["src"]
}


@@ -0,0 +1,16 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
export default defineConfig({
plugins: [react()],
server: {
host: '0.0.0.0',
port: 5174,
proxy: {
'/api': {
target: 'http://localhost:8002',
changeOrigin: true
}
}
},
})


@@ -0,0 +1,375 @@
#!/usr/bin/env python3
"""
Prepare assessment data from experiment results.
Extracts unique ideas from deduped experiment results, assigns stable IDs,
and randomizes the order within each query for unbiased human assessment.
Usage:
python prepare_data.py # Use latest, all ideas
python prepare_data.py --sample 100 # Sample 100 ideas total
python prepare_data.py --per-query 10 # 10 ideas per query
python prepare_data.py --per-condition 5 # 5 ideas per condition per query
python prepare_data.py --list # List available files
"""
import argparse
import json
import random
from pathlib import Path
from typing import Any
def load_experiment_data(filepath: Path) -> dict[str, Any]:
"""Load experiment data from JSON file."""
with open(filepath, 'r', encoding='utf-8') as f:
return json.load(f)
def sample_ideas_stratified(
ideas: list[dict[str, Any]],
per_condition: int | None = None,
total_limit: int | None = None,
rng: random.Random | None = None
) -> list[dict[str, Any]]:
"""
Sample ideas with stratification by condition.
Args:
ideas: List of ideas with _hidden.condition metadata
per_condition: Max ideas per condition (stratified sampling)
total_limit: Max total ideas (after stratified sampling)
rng: Random number generator for reproducibility
Returns:
Sampled list of ideas
"""
if rng is None:
rng = random.Random()
if per_condition is None and total_limit is None:
return ideas
# Group by condition
by_condition: dict[str, list[dict[str, Any]]] = {}
for idea in ideas:
condition = idea['_hidden']['condition']
if condition not in by_condition:
by_condition[condition] = []
by_condition[condition].append(idea)
# Sample per condition
sampled = []
for condition, cond_ideas in by_condition.items():
rng.shuffle(cond_ideas)
if per_condition is not None:
cond_ideas = cond_ideas[:per_condition]
sampled.extend(cond_ideas)
# Apply total limit if specified
if total_limit is not None and len(sampled) > total_limit:
rng.shuffle(sampled)
sampled = sampled[:total_limit]
return sampled
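# Worked example (hypothetical numbers): with the study's five conditions and
# per_condition=4, a query that produced 60 ideas is cut to at most 20,
# balanced across conditions, e.g.:
#   rng = random.Random(42)
#   sampled = sample_ideas_stratified(ideas, per_condition=4, rng=rng)
#   assert len(sampled) <= 4 * len({i['_hidden']['condition'] for i in ideas})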
def extract_ideas_from_condition(
query_id: str,
condition_name: str,
condition_data: dict[str, Any],
idea_counter: dict[str, int]
) -> list[dict[str, Any]]:
"""Extract ideas from a single condition with hidden metadata."""
ideas = []
dedup_data = condition_data.get('dedup', {})
unique_ideas_with_source = dedup_data.get('unique_ideas_with_source', [])
for item in unique_ideas_with_source:
idea_text = item.get('idea', '')
if not idea_text:
continue
# Generate stable idea ID
current_count = idea_counter.get(query_id, 0)
idea_id = f"{query_id}_I{current_count:03d}"
idea_counter[query_id] = current_count + 1
ideas.append({
'idea_id': idea_id,
'text': idea_text,
'_hidden': {
'condition': condition_name,
'expert_name': item.get('expert_name', ''),
'keyword': item.get('keyword', '')
}
})
return ideas
def prepare_assessment_data(
experiment_filepath: Path,
output_filepath: Path,
seed: int = 42,
sample_total: int | None = None,
per_query: int | None = None,
per_condition: int | None = None
) -> dict[str, Any]:
"""
Prepare assessment data from experiment results.
Args:
experiment_filepath: Path to deduped experiment JSON
output_filepath: Path to write assessment items JSON
seed: Random seed for reproducible shuffling
sample_total: Total number of ideas to sample (across all queries)
per_query: Maximum ideas per query
per_condition: Maximum ideas per condition per query (stratified)
Returns:
Assessment data structure
"""
rng = random.Random(seed)
# Load experiment data
data = load_experiment_data(experiment_filepath)
experiment_id = data.get('experiment_id', 'unknown')
conditions = data.get('conditions', [])
results = data.get('results', [])
print(f"Loading experiment: {experiment_id}")
print(f"Conditions: {conditions}")
print(f"Number of queries: {len(results)}")
# Show sampling config
if sample_total or per_query or per_condition:
print(f"Sampling config: total={sample_total}, per_query={per_query}, per_condition={per_condition}")
assessment_queries = []
total_ideas = 0
idea_counter: dict[str, int] = {}
for result in results:
query_id = result.get('query_id', '')
query_text = result.get('query', '')
category = result.get('category', '')
query_ideas = []
# Extract ideas from all conditions
conditions_data = result.get('conditions', {})
for condition_name, condition_data in conditions_data.items():
ideas = extract_ideas_from_condition(
query_id, condition_name, condition_data, idea_counter
)
query_ideas.extend(ideas)
# Apply stratified sampling if per_condition is specified
if per_condition is not None:
query_ideas = sample_ideas_stratified(
query_ideas,
per_condition=per_condition,
rng=rng
)
# Apply per-query limit
if per_query is not None and len(query_ideas) > per_query:
rng.shuffle(query_ideas)
query_ideas = query_ideas[:per_query]
# Shuffle ideas within this query
rng.shuffle(query_ideas)
assessment_queries.append({
'query_id': query_id,
'query_text': query_text,
'category': category,
'ideas': query_ideas,
'idea_count': len(query_ideas)
})
total_ideas += len(query_ideas)
print(f" Query '{query_text}' ({query_id}): {len(query_ideas)} ideas")
# Apply total sample limit across all queries (proportionally)
if sample_total is not None and total_ideas > sample_total:
print(f"\nApplying total sample limit: {sample_total} (from {total_ideas})")
# Calculate proportion to keep
keep_ratio = sample_total / total_ideas
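        # max(1, ...) below keeps at least one idea per query, so the final
        # total may drift slightly above or below sample_total.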
new_total = 0
for query in assessment_queries:
n_keep = max(1, int(len(query['ideas']) * keep_ratio))
rng.shuffle(query['ideas'])
query['ideas'] = query['ideas'][:n_keep]
query['idea_count'] = len(query['ideas'])
new_total += len(query['ideas'])
total_ideas = new_total
# Build output structure
assessment_data = {
'experiment_id': experiment_id,
'queries': assessment_queries,
'total_ideas': total_ideas,
'query_count': len(assessment_queries),
'conditions': conditions,
'randomization_seed': seed,
'sampling': {
'sample_total': sample_total,
'per_query': per_query,
'per_condition': per_condition
},
'metadata': {
'source_file': str(experiment_filepath.name),
'prepared_for': 'human_assessment'
}
}
# Write output
output_filepath.parent.mkdir(parents=True, exist_ok=True)
with open(output_filepath, 'w', encoding='utf-8') as f:
json.dump(assessment_data, f, ensure_ascii=False, indent=2)
print(f"\nTotal ideas for assessment: {total_ideas}")
print(f"Output written to: {output_filepath}")
return assessment_data
def list_experiment_files(results_dir: Path) -> list[Path]:
"""List available deduped experiment files."""
return sorted(results_dir.glob('*_deduped.json'), key=lambda p: p.stat().st_mtime, reverse=True)
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(
description='Prepare assessment data from experiment results.',
formatter_class=argparse.RawDescriptionHelpFormatter,
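        # RawDescriptionHelpFormatter preserves the epilog's line breaks as written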
epilog="""
Examples:
python prepare_data.py # Use latest, all ideas
python prepare_data.py --sample 100 # Sample 100 ideas total
python prepare_data.py --per-query 20 # Max 20 ideas per query
python prepare_data.py --per-condition 4 # 4 ideas per condition per query
python prepare_data.py --per-condition 4 --per-query 15 # Combined limits
python prepare_data.py --list # List available files
Recommended for human assessment:
# 5 conditions × 4 ideas × 10 queries = 200 ideas (balanced)
python prepare_data.py --per-condition 4
# Or limit total to ~150 ideas
python prepare_data.py --sample 150
"""
)
parser.add_argument(
'experiment_file',
nargs='?',
default=None,
help='Experiment file name (e.g., experiment_20260119_165650_deduped.json)'
)
parser.add_argument(
'--list', '-l',
action='store_true',
help='List available experiment files'
)
parser.add_argument(
'--sample',
type=int,
default=None,
metavar='N',
help='Total number of ideas to sample (proportionally across queries)'
)
parser.add_argument(
'--per-query',
type=int,
default=None,
metavar='N',
help='Maximum ideas per query'
)
parser.add_argument(
'--per-condition',
type=int,
default=None,
metavar='N',
help='Maximum ideas per condition per query (stratified sampling)'
)
parser.add_argument(
'--seed', '-s',
type=int,
default=42,
help='Random seed for shuffling (default: 42)'
)
args = parser.parse_args()
# Paths
base_dir = Path(__file__).parent.parent
results_dir = base_dir / 'results'
output_file = Path(__file__).parent / 'data' / 'assessment_items.json'
# List available files
available_files = list_experiment_files(results_dir)
if args.list:
print("Available experiment files (most recent first):")
for f in available_files:
size_kb = f.stat().st_size / 1024
print(f" {f.name} ({size_kb:.1f} KB)")
return
# Determine which file to use
if args.experiment_file:
experiment_file = results_dir / args.experiment_file
if not experiment_file.exists():
            # Try appending a .json extension
experiment_file = results_dir / f"{args.experiment_file}.json"
else:
# Use the latest deduped file
if not available_files:
print("Error: No deduped experiment files found in results directory.")
return
experiment_file = available_files[0]
print(f"Using latest experiment file: {experiment_file.name}")
if not experiment_file.exists():
print(f"Error: Experiment file not found: {experiment_file}")
print("\nAvailable files:")
for f in available_files:
print(f" {f.name}")
return
prepare_assessment_data(
experiment_file,
output_file,
seed=args.seed,
sample_total=args.sample,
per_query=args.per_query,
per_condition=args.per_condition
)
# Verify output
    with open(output_file, 'r', encoding='utf-8') as f:
data = json.load(f)
print("\n--- Verification ---")
print(f"Queries: {data['query_count']}")
print(f"Total ideas: {data['total_ideas']}")
# Show distribution by condition (from hidden metadata)
condition_counts: dict[str, int] = {}
for query in data['queries']:
for idea in query['ideas']:
condition = idea['_hidden']['condition']
condition_counts[condition] = condition_counts.get(condition, 0) + 1
print("\nIdeas per condition:")
for condition, count in sorted(condition_counts.items()):
print(f" {condition}: {count}")
if __name__ == '__main__':
main()

Binary file not shown.

101
experiments/assessment/start.sh Executable file

@@ -0,0 +1,101 @@
#!/bin/bash
# Human Assessment Web Interface Start Script
# This script starts both the backend API and frontend dev server
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}Creative Idea Assessment System${NC}"
echo -e "${GREEN}================================${NC}"
echo
# Find Python with FastAPI (use project venv or system)
VENV_PYTHON="$SCRIPT_DIR/../../backend/venv/bin/python"
if [ -x "$VENV_PYTHON" ]; then
PYTHON_CMD="$VENV_PYTHON"
UVICORN_CMD="$SCRIPT_DIR/../../backend/venv/bin/uvicorn"
else
PYTHON_CMD="python3"
UVICORN_CMD="uvicorn"
fi
# Check if assessment data exists
if [ ! -f "data/assessment_items.json" ]; then
echo -e "${YELLOW}Assessment data not found. Running prepare_data.py...${NC}"
$PYTHON_CMD prepare_data.py
echo
fi
# Check if node_modules exist in frontend
if [ ! -d "frontend/node_modules" ]; then
echo -e "${YELLOW}Installing frontend dependencies...${NC}"
cd frontend
npm install
cd ..
echo
fi
# Function to cleanup background processes on exit
cleanup() {
echo
echo -e "${YELLOW}Shutting down...${NC}"
kill $BACKEND_PID 2>/dev/null || true
kill $FRONTEND_PID 2>/dev/null || true
exit 0
}
trap cleanup SIGINT SIGTERM
# Start backend
echo -e "${GREEN}Starting backend API on port 8002...${NC}"
cd backend
$UVICORN_CMD app:app --host 0.0.0.0 --port 8002 --reload &
BACKEND_PID=$!
cd ..
# Wait for backend to start
echo "Waiting for backend to initialize..."
sleep 2
# Check if backend is running
if ! curl -s http://localhost:8002/api/health > /dev/null 2>&1; then
echo -e "${RED}Backend failed to start. Check for errors above.${NC}"
kill $BACKEND_PID 2>/dev/null || true
exit 1
fi
echo -e "${GREEN}Backend is running.${NC}"
echo
# Start frontend
echo -e "${GREEN}Starting frontend on port 5174...${NC}"
cd frontend
npm run dev &
FRONTEND_PID=$!
cd ..
# Wait for frontend to start
sleep 3
echo
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}Assessment system is running!${NC}"
echo -e "${GREEN}================================${NC}"
echo
echo -e "Backend API: ${YELLOW}http://localhost:8002${NC}"
echo -e "Frontend UI: ${YELLOW}http://localhost:5174${NC}"
echo
echo -e "Press Ctrl+C to stop all services"
echo
# Wait for any process to exit
wait

13
experiments/assessment/stop.sh Executable file

@@ -0,0 +1,13 @@
#!/bin/bash
# Stop the assessment system
echo "Stopping assessment system..."
# Kill backend (uvicorn on port 8002)
pkill -f "uvicorn app:app.*8002" 2>/dev/null && echo "Backend stopped" || echo "Backend not running"
# Kill frontend (vite on port 5174)
pkill -f "vite.*5174" 2>/dev/null && echo "Frontend stopped" || echo "Frontend not running"
echo "Done"

File diff suppressed because it is too large.


@@ -0,0 +1,666 @@
"""
Compute metrics for experiment results.
Computes metrics BOTH before and after deduplication:
- Pre-dedup: Measures raw generation capability
- Post-dedup: Measures quality of unique ideas
Also normalizes idea counts for fair cross-condition comparison.
Usage:
python -m experiments.compute_metrics --input results/experiment_xxx_deduped.json
"""
import sys
import json
import argparse
import asyncio
import logging
import random
from pathlib import Path
from typing import List, Dict, Any, Optional, Tuple
from dataclasses import dataclass, asdict
import numpy as np
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent / "backend"))
from app.services.embedding_service import embedding_service
from app.services.llm_service import ollama_provider, extract_json_from_response
from experiments.config import RESULTS_DIR, MODEL, RANDOM_SEED
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
@dataclass
class DiversityMetrics:
"""Semantic diversity metrics for a set of ideas."""
mean_pairwise_distance: float
std_pairwise_distance: float
min_pairwise_distance: float
max_pairwise_distance: float
idea_count: int
@dataclass
class ClusterMetrics:
"""Cluster analysis metrics."""
optimal_clusters: int
silhouette_score: float
cluster_sizes: List[int]
@dataclass
class QueryDistanceMetrics:
"""Distance from original query metrics."""
mean_distance: float
std_distance: float
min_distance: float
max_distance: float
distances: List[float]
@dataclass
class RelevanceMetrics:
"""LLM-as-judge relevance metrics (for hallucination detection)."""
relevance_rate: float # Score >= 2
nonsense_rate: float # Score == 1
mean_score: float
score_distribution: Dict[int, int] # {1: count, 2: count, 3: count}
@dataclass
class ConditionMetrics:
"""All metrics for a single condition."""
condition: str
query: str
# Idea counts
raw_count: int
unique_count: int
survival_rate: float
# Pre-dedup metrics (on raw ideas)
pre_dedup_diversity: Optional[DiversityMetrics]
# Post-dedup metrics (on unique ideas)
post_dedup_diversity: Optional[DiversityMetrics]
post_dedup_clusters: Optional[ClusterMetrics]
post_dedup_query_distance: Optional[QueryDistanceMetrics]
# Normalized metrics (on equal-sized samples)
normalized_diversity: Optional[DiversityMetrics]
normalized_sample_size: int
# Relevance/hallucination (post-dedup only)
relevance: Optional[RelevanceMetrics]
# ============================================================
# Embedding-based metrics
# ============================================================
async def get_embeddings(texts: List[str]) -> List[List[float]]:
"""Get embeddings for a list of texts."""
if not texts:
return []
return await embedding_service.get_embeddings_batch(texts)
def compute_pairwise_distances(embeddings: List[List[float]]) -> List[float]:
"""Compute all pairwise cosine distances."""
n = len(embeddings)
if n < 2:
return []
distances = []
for i in range(n):
for j in range(i + 1, n):
sim = embedding_service.cosine_similarity(embeddings[i], embeddings[j])
dist = 1 - sim # Convert similarity to distance
distances.append(dist)
return distances
async def compute_diversity_metrics(ideas: List[str]) -> Optional[DiversityMetrics]:
"""Compute semantic diversity metrics for a set of ideas."""
if len(ideas) < 2:
return None
embeddings = await get_embeddings(ideas)
distances = compute_pairwise_distances(embeddings)
if not distances:
return None
return DiversityMetrics(
mean_pairwise_distance=float(np.mean(distances)),
std_pairwise_distance=float(np.std(distances)),
min_pairwise_distance=float(np.min(distances)),
max_pairwise_distance=float(np.max(distances)),
idea_count=len(ideas)
)
async def compute_query_distance_metrics(
query: str,
ideas: List[str]
) -> Optional[QueryDistanceMetrics]:
"""Compute distance of ideas from the original query."""
if not ideas:
return None
# Get query embedding
query_emb = await embedding_service.get_embedding(query)
idea_embs = await get_embeddings(ideas)
distances = []
for emb in idea_embs:
sim = embedding_service.cosine_similarity(query_emb, emb)
dist = 1 - sim
distances.append(dist)
return QueryDistanceMetrics(
mean_distance=float(np.mean(distances)),
std_distance=float(np.std(distances)),
min_distance=float(np.min(distances)),
max_distance=float(np.max(distances)),
distances=distances
)
async def compute_cluster_metrics(ideas: List[str]) -> Optional[ClusterMetrics]:
"""Compute cluster analysis metrics."""
if len(ideas) < 3:
return None
try:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
except ImportError:
logger.warning("sklearn not installed, skipping cluster metrics")
return None
embeddings = await get_embeddings(ideas)
embeddings_np = np.array(embeddings)
# Find optimal k using silhouette score
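    # Silhouette scores lie in [-1, 1]; higher means tighter, better-separated
    # clusters, so we keep the k that maximizes the score.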
max_k = min(len(ideas) - 1, 10)
if max_k < 2:
return None
best_k = 2
best_score = -1
for k in range(2, max_k + 1):
try:
kmeans = KMeans(n_clusters=k, random_state=RANDOM_SEED, n_init=10)
labels = kmeans.fit_predict(embeddings_np)
score = silhouette_score(embeddings_np, labels)
if score > best_score:
best_score = score
best_k = k
except Exception as e:
logger.warning(f"Clustering failed for k={k}: {e}")
continue
# Get cluster sizes for optimal k
kmeans = KMeans(n_clusters=best_k, random_state=RANDOM_SEED, n_init=10)
labels = kmeans.fit_predict(embeddings_np)
cluster_sizes = [int(np.sum(labels == i)) for i in range(best_k)]
return ClusterMetrics(
optimal_clusters=best_k,
silhouette_score=float(best_score),
cluster_sizes=sorted(cluster_sizes, reverse=True)
)
# ============================================================
# LLM-as-Judge relevance metrics
# ============================================================
async def judge_relevance(query: str, idea: str, model: str = None) -> Dict[str, Any]:
"""Use LLM to judge if an idea is relevant to the query."""
model = model or MODEL
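    # The leading /no_think tag switches off Qwen3's thinking mode so the model
    # returns a compact JSON answer quickly.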
prompt = f"""/no_think
You are evaluating whether a generated idea is relevant and applicable to an original query.
Original query: {query}
Generated idea: {idea}
Rate the relevance on a scale of 1-3:
1 = Nonsense/completely irrelevant (no logical connection to the query)
2 = Weak but valid connection (requires stretch but has some relevance)
3 = Clearly relevant and applicable (directly relates to the query)
Return JSON only:
{{"score": N, "reason": "brief explanation (10-20 words)"}}
"""
try:
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=0.3 # Lower temperature for more consistent judgments
)
result = extract_json_from_response(response)
return {
"score": result.get("score", 2),
"reason": result.get("reason", "")
}
except Exception as e:
logger.warning(f"Relevance judgment failed: {e}")
return {"score": 2, "reason": "judgment failed"}
async def compute_relevance_metrics(
query: str,
ideas: List[str],
model: str = None,
sample_size: int = None
) -> Optional[RelevanceMetrics]:
"""Compute LLM-as-judge relevance metrics for ideas."""
if not ideas:
return None
# Optionally sample to reduce API calls
if sample_size and len(ideas) > sample_size:
rng = random.Random(RANDOM_SEED)
ideas_to_judge = rng.sample(ideas, sample_size)
else:
ideas_to_judge = ideas
scores = []
for idea in ideas_to_judge:
result = await judge_relevance(query, idea, model)
scores.append(result["score"])
# Compute distribution
distribution = {1: 0, 2: 0, 3: 0}
for s in scores:
if s in distribution:
distribution[s] += 1
nonsense_count = distribution[1]
relevant_count = distribution[2] + distribution[3]
return RelevanceMetrics(
relevance_rate=relevant_count / len(scores) if scores else 0,
nonsense_rate=nonsense_count / len(scores) if scores else 0,
mean_score=float(np.mean(scores)) if scores else 0,
score_distribution=distribution
)
# ============================================================
# Main metrics computation
# ============================================================
async def compute_condition_metrics(
query: str,
condition: str,
raw_ideas: List[str],
unique_ideas: List[str],
normalized_sample_size: int,
compute_relevance: bool = False
) -> ConditionMetrics:
"""Compute all metrics for a single condition."""
raw_count = len(raw_ideas)
unique_count = len(unique_ideas)
survival_rate = unique_count / raw_count if raw_count > 0 else 1.0
logger.info(f" Computing metrics for {condition}...")
logger.info(f" Raw: {raw_count}, Unique: {unique_count}, Survival: {survival_rate:.1%}")
# Pre-dedup diversity (on raw ideas)
logger.info(f" Computing pre-dedup diversity...")
pre_dedup_diversity = await compute_diversity_metrics(raw_ideas)
# Post-dedup diversity (on unique ideas)
logger.info(f" Computing post-dedup diversity...")
post_dedup_diversity = await compute_diversity_metrics(unique_ideas)
# Cluster analysis (post-dedup)
logger.info(f" Computing cluster metrics...")
post_dedup_clusters = await compute_cluster_metrics(unique_ideas)
# Query distance (post-dedup)
logger.info(f" Computing query distance...")
post_dedup_query_distance = await compute_query_distance_metrics(query, unique_ideas)
# Normalized diversity (equal-sized sample for fair comparison)
normalized_diversity = None
if len(unique_ideas) >= normalized_sample_size and normalized_sample_size > 1:
logger.info(f" Computing normalized diversity (n={normalized_sample_size})...")
rng = random.Random(RANDOM_SEED)
sampled_ideas = rng.sample(unique_ideas, normalized_sample_size)
normalized_diversity = await compute_diversity_metrics(sampled_ideas)
# Relevance metrics (optional, expensive)
relevance = None
if compute_relevance and unique_ideas:
logger.info(f" Computing relevance metrics (LLM-as-judge)...")
# Sample up to 10 ideas to reduce cost
relevance = await compute_relevance_metrics(
query, unique_ideas, sample_size=min(10, len(unique_ideas))
)
return ConditionMetrics(
condition=condition,
query=query,
raw_count=raw_count,
unique_count=unique_count,
survival_rate=survival_rate,
pre_dedup_diversity=pre_dedup_diversity,
post_dedup_diversity=post_dedup_diversity,
post_dedup_clusters=post_dedup_clusters,
post_dedup_query_distance=post_dedup_query_distance,
normalized_diversity=normalized_diversity,
normalized_sample_size=normalized_sample_size,
relevance=relevance
)
async def process_experiment_results(
input_file: Path,
output_file: Optional[Path] = None,
compute_relevance: bool = False
) -> Dict[str, Any]:
"""
Process experiment results and compute all metrics.
Args:
input_file: Path to deduped experiment results JSON
output_file: Path for output (default: input with _metrics suffix)
compute_relevance: Whether to compute LLM-as-judge relevance
Returns:
Results with computed metrics
"""
# Load experiment results
with open(input_file, "r", encoding="utf-8") as f:
experiment = json.load(f)
logger.info(f"Processing experiment: {experiment.get('experiment_id', 'unknown')}")
# Determine normalized sample size (minimum unique count across all conditions)
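    # Pairwise-diversity metrics are sensitive to set size, so each condition is
    # also scored on an equal-sized random subsample (capped at 10 for cost).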
min_unique_count = float('inf')
for query_result in experiment["results"]:
for condition, cond_result in query_result["conditions"].items():
if cond_result.get("success", False):
dedup = cond_result.get("dedup", {})
unique_count = len(dedup.get("unique_ideas", cond_result.get("ideas", [])))
if unique_count > 0:
min_unique_count = min(min_unique_count, unique_count)
normalized_sample_size = min(int(min_unique_count), 10) if min_unique_count != float('inf') else 5
logger.info(f"Normalized sample size: {normalized_sample_size}")
# Process each query
all_metrics = []
for query_result in experiment["results"]:
query = query_result["query"]
query_id = query_result["query_id"]
logger.info(f"\nProcessing query: {query} ({query_id})")
query_metrics = {
"query_id": query_id,
"query": query,
"conditions": {}
}
for condition, cond_result in query_result["conditions"].items():
if not cond_result.get("success", False):
logger.warning(f" Skipping failed condition: {condition}")
continue
# Get raw and unique ideas
raw_ideas = cond_result.get("ideas", [])
dedup = cond_result.get("dedup", {})
unique_ideas = dedup.get("unique_ideas", raw_ideas)
# Compute metrics
metrics = await compute_condition_metrics(
query=query,
condition=condition,
raw_ideas=raw_ideas,
unique_ideas=unique_ideas,
normalized_sample_size=normalized_sample_size,
compute_relevance=compute_relevance
)
# Convert to dict for JSON serialization
query_metrics["conditions"][condition] = asdict(metrics)
all_metrics.append(query_metrics)
# Calculate aggregate statistics
aggregate = calculate_aggregate_metrics(all_metrics)
# Build output
output = {
"experiment_id": experiment.get("experiment_id"),
"config": experiment.get("config"),
"normalized_sample_size": normalized_sample_size,
"metrics_by_query": all_metrics,
"aggregate": aggregate
}
# Save results
if output_file is None:
stem = input_file.stem.replace("_deduped", "").replace("_complete", "")
output_file = input_file.parent / f"{stem}_metrics.json"
with open(output_file, "w", encoding="utf-8") as f:
json.dump(output, f, indent=2, ensure_ascii=False)
logger.info(f"\nMetrics saved to: {output_file}")
return output
def calculate_aggregate_metrics(all_metrics: List[Dict]) -> Dict[str, Any]:
"""Calculate aggregate statistics across all queries."""
aggregate = {}
# Collect metrics by condition
by_condition = {}
for query_metrics in all_metrics:
for condition, metrics in query_metrics["conditions"].items():
if condition not in by_condition:
by_condition[condition] = {
"raw_counts": [],
"unique_counts": [],
"survival_rates": [],
"pre_dedup_diversity": [],
"post_dedup_diversity": [],
"normalized_diversity": [],
"query_distances": [],
"cluster_counts": [],
"silhouette_scores": [],
"relevance_rates": [],
"nonsense_rates": []
}
bc = by_condition[condition]
bc["raw_counts"].append(metrics["raw_count"])
bc["unique_counts"].append(metrics["unique_count"])
bc["survival_rates"].append(metrics["survival_rate"])
if metrics.get("pre_dedup_diversity"):
bc["pre_dedup_diversity"].append(
metrics["pre_dedup_diversity"]["mean_pairwise_distance"]
)
if metrics.get("post_dedup_diversity"):
bc["post_dedup_diversity"].append(
metrics["post_dedup_diversity"]["mean_pairwise_distance"]
)
if metrics.get("normalized_diversity"):
bc["normalized_diversity"].append(
metrics["normalized_diversity"]["mean_pairwise_distance"]
)
if metrics.get("post_dedup_query_distance"):
bc["query_distances"].append(
metrics["post_dedup_query_distance"]["mean_distance"]
)
if metrics.get("post_dedup_clusters"):
bc["cluster_counts"].append(
metrics["post_dedup_clusters"]["optimal_clusters"]
)
bc["silhouette_scores"].append(
metrics["post_dedup_clusters"]["silhouette_score"]
)
if metrics.get("relevance"):
bc["relevance_rates"].append(metrics["relevance"]["relevance_rate"])
bc["nonsense_rates"].append(metrics["relevance"]["nonsense_rate"])
# Calculate means and stds
for condition, data in by_condition.items():
aggregate[condition] = {}
for metric_name, values in data.items():
if values:
aggregate[condition][metric_name] = {
"mean": float(np.mean(values)),
"std": float(np.std(values)),
"min": float(np.min(values)),
"max": float(np.max(values)),
"n": len(values)
}
return aggregate
def print_metrics_summary(metrics: Dict[str, Any]):
"""Print a formatted summary of computed metrics."""
print("\n" + "=" * 80)
print("METRICS SUMMARY")
print("=" * 80)
print(f"\nNormalized sample size: {metrics.get('normalized_sample_size', 'N/A')}")
aggregate = metrics.get("aggregate", {})
# Idea counts
print("\n--- Idea Counts ---")
print(f"{'Condition':<25} {'Raw':<10} {'Unique':<10} {'Survival':<10}")
print("-" * 55)
for cond, data in aggregate.items():
raw = data.get("raw_counts", {}).get("mean", 0)
unique = data.get("unique_counts", {}).get("mean", 0)
survival = data.get("survival_rates", {}).get("mean", 0)
print(f"{cond:<25} {raw:<10.1f} {unique:<10.1f} {survival:<10.1%}")
# Diversity metrics
print("\n--- Semantic Diversity (Mean Pairwise Distance) ---")
print(f"{'Condition':<25} {'Pre-Dedup':<12} {'Post-Dedup':<12} {'Normalized':<12}")
print("-" * 61)
for cond, data in aggregate.items():
pre = data.get("pre_dedup_diversity", {}).get("mean", 0)
post = data.get("post_dedup_diversity", {}).get("mean", 0)
norm = data.get("normalized_diversity", {}).get("mean", 0)
print(f"{cond:<25} {pre:<12.4f} {post:<12.4f} {norm:<12.4f}")
# Query distance
print("\n--- Query Distance (Novelty) ---")
print(f"{'Condition':<25} {'Mean Distance':<15} {'Std':<10}")
print("-" * 50)
for cond, data in aggregate.items():
dist = data.get("query_distances", {})
mean = dist.get("mean", 0)
std = dist.get("std", 0)
print(f"{cond:<25} {mean:<15.4f} {std:<10.4f}")
# Cluster metrics
print("\n--- Cluster Analysis ---")
print(f"{'Condition':<25} {'Clusters':<12} {'Silhouette':<12}")
print("-" * 49)
for cond, data in aggregate.items():
clusters = data.get("cluster_counts", {}).get("mean", 0)
silhouette = data.get("silhouette_scores", {}).get("mean", 0)
print(f"{cond:<25} {clusters:<12.1f} {silhouette:<12.4f}")
# Relevance (if computed)
has_relevance = any(
"relevance_rates" in data and data["relevance_rates"].get("n", 0) > 0
for data in aggregate.values()
)
if has_relevance:
print("\n--- Relevance (LLM-as-Judge) ---")
print(f"{'Condition':<25} {'Relevance':<12} {'Nonsense':<12}")
print("-" * 49)
for cond, data in aggregate.items():
rel = data.get("relevance_rates", {}).get("mean", 0)
non = data.get("nonsense_rates", {}).get("mean", 0)
print(f"{cond:<25} {rel:<12.1%} {non:<12.1%}")
print("\n" + "=" * 80)
print("Interpretation:")
print("- Higher pairwise distance = more diverse ideas")
print("- Higher query distance = more novel (farther from original)")
print("- More clusters = more distinct themes")
print("- Higher silhouette = cleaner cluster separation")
print("=" * 80)
async def main():
parser = argparse.ArgumentParser(
description="Compute metrics for experiment results"
)
parser.add_argument(
"--input",
type=str,
required=True,
help="Input deduped experiment results JSON file"
)
parser.add_argument(
"--output",
type=str,
help="Output file path (default: input_metrics.json)"
)
parser.add_argument(
"--relevance",
action="store_true",
help="Compute LLM-as-judge relevance metrics (expensive)"
)
args = parser.parse_args()
input_path = Path(args.input)
if not input_path.exists():
input_path = RESULTS_DIR / args.input
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
sys.exit(1)
output_path = Path(args.output) if args.output else None
metrics = await process_experiment_results(
input_file=input_path,
output_file=output_path,
compute_relevance=args.relevance
)
print_metrics_summary(metrics)
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,23 @@
"""
Condition implementations for the 5-condition experiment.
C1: Direct generation (baseline)
C2: Expert-only (no attributes)
C3: Attribute-only (no experts)
C4: Full pipeline (attributes + experts)
C5: Random-perspective (random words instead of experts)
"""
from .c1_direct import generate_ideas as c1_generate
from .c2_expert_only import generate_ideas as c2_generate
from .c3_attribute_only import generate_ideas as c3_generate
from .c4_full_pipeline import generate_ideas as c4_generate
from .c5_random_perspective import generate_ideas as c5_generate
__all__ = [
"c1_generate",
"c2_generate",
"c3_generate",
"c4_generate",
"c5_generate",
]


@@ -0,0 +1,111 @@
"""
Condition 1: Direct Generation (Baseline)
Single LLM call asking for creative ideas directly.
No attribute decomposition, no expert perspectives.
"""
import sys
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from experiments.config import MODEL, TEMPERATURE, IDEAS_DIRECT, PROMPT_LANGUAGE
def get_direct_generation_prompt(query: str, idea_count: int, lang: str = "en") -> str:
"""Generate prompt for direct idea generation."""
if lang == "en":
return f"""/no_think
Generate {idea_count} creative and innovative ideas for "{query}".
Requirements:
1. Each idea should be specific and actionable
2. Ideas should be diverse, covering different aspects and applications
3. Include both practical improvements and creative innovations
4. Ideas should be 15-30 words each
Return JSON only:
{{"ideas": ["idea 1", "idea 2", "idea 3", ...]}}
Generate exactly {idea_count} ideas."""
else:
return f"""/no_think
為「{query}」生成 {idea_count} 個創意點子。
要求:
1. 每個點子要具體可行
2. 點子要多元,涵蓋不同面向和應用
3. 包含實用改進和創意創新
4. 每個點子 15-30 字
只回傳 JSON
{{"ideas": ["點子1", "點子2", "點子3", ...]}}
生成正好 {idea_count} 個點子。"""
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
idea_count: int = None,
lang: str = None
) -> Dict[str, Any]:
"""
Generate ideas using direct LLM generation (C1 baseline).
Args:
query: The object/concept to generate ideas for
model: LLM model to use (default from config)
temperature: Generation temperature (default from config)
idea_count: Number of ideas to generate (default from config)
lang: Language for prompts (default from config)
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
idea_count = idea_count or IDEAS_DIRECT
lang = lang or PROMPT_LANGUAGE
prompt = get_direct_generation_prompt(query, idea_count, lang)
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=temperature
)
result = extract_json_from_response(response)
ideas = result.get("ideas", [])
return {
"condition": "c1_direct",
"query": query,
"ideas": ideas,
"idea_count": len(ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"mechanism": "direct_llm_generation"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas:")
for i, idea in enumerate(result['ideas'], 1):
print(f" {i}. {idea}")
asyncio.run(test())


@@ -0,0 +1,176 @@
"""
Condition 2: Expert-Only (No Attributes)
Uses expert perspectives to generate ideas, but without
attribute decomposition. Each expert generates ideas directly
for the query from their professional perspective.
"""
import sys
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from app.services.expert_source_service import expert_source_service
from experiments.config import (
MODEL, TEMPERATURE, EXPERT_COUNT, EXPERT_SOURCE,
IDEAS_PER_EXPERT, PROMPT_LANGUAGE
)
def get_expert_idea_generation_prompt(
query: str,
expert_name: str,
expert_domain: str,
idea_count: int,
lang: str = "en"
) -> str:
"""Generate prompt for expert-based idea generation."""
if lang == "en":
domain_text = f" ({expert_domain} field)" if expert_domain else ""
return f"""/no_think
You are a {expert_name}{domain_text}.
Task: Generate {idea_count} creative and innovative ideas for "{query}" from your professional perspective.
Requirements:
1. Each idea should reflect your professional expertise and unique viewpoint
2. Think about how concepts from your field could improve or reimagine "{query}"
3. Ideas should be specific and actionable (15-30 words each)
4. Combine your professional knowledge with creative thinking
Return JSON only:
{{"ideas": ["idea 1", "idea 2", "idea 3", ...]}}
Generate exactly {idea_count} ideas from your perspective as a {expert_name}."""
else:
domain_text = f"{expert_domain}領域)" if expert_domain else ""
return f"""/no_think
你是一位{expert_name}{domain_text}
任務:從你的專業角度,為「{query}」生成 {idea_count} 個創意點子。
要求:
1. 每個點子要反映你的專業知識和獨特觀點
2. 思考你領域的概念如何改進或重新想像「{query}
3. 點子要具體可行(每個 15-30 字)
4. 結合專業知識和創意思維
只回傳 JSON
{{"ideas": ["點子1", "點子2", "點子3", ...]}}
從你作為{expert_name}的角度生成正好 {idea_count} 個點子。"""
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
expert_count: int = None,
expert_source: str = None,
ideas_per_expert: int = None,
lang: str = None
) -> Dict[str, Any]:
"""
Generate ideas using expert perspectives only (C2).
Args:
query: The object/concept to generate ideas for
model: LLM model to use
temperature: Generation temperature
expert_count: Number of experts to use
expert_source: Source of experts (curated, dbpedia, etc.)
ideas_per_expert: Ideas each expert generates
lang: Language for prompts
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
expert_count = expert_count or EXPERT_COUNT
expert_source = expert_source or EXPERT_SOURCE
ideas_per_expert = ideas_per_expert or IDEAS_PER_EXPERT
lang = lang or PROMPT_LANGUAGE
    # Get experts from the configured source (curated by default)
experts, actual_source = expert_source_service.get_experts(
source=expert_source,
count=expert_count,
language=lang
)
all_ideas = []
expert_details = []
for expert in experts:
expert_name = expert.get("name", "Expert")
expert_domain = expert.get("domain", "")
prompt = get_expert_idea_generation_prompt(
query=query,
expert_name=expert_name,
expert_domain=expert_domain,
idea_count=ideas_per_expert,
lang=lang
)
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=temperature
)
result = extract_json_from_response(response)
ideas = result.get("ideas", [])
# Tag ideas with expert source
for idea in ideas:
all_ideas.append({
"idea": idea,
"expert_name": expert_name,
"expert_domain": expert_domain
})
expert_details.append({
"name": expert_name,
"domain": expert_domain,
"ideas_generated": len(ideas)
})
return {
"condition": "c2_expert_only",
"query": query,
"ideas": [item["idea"] for item in all_ideas],
"ideas_with_source": all_ideas,
"idea_count": len(all_ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"expert_count": expert_count,
"expert_source": actual_source,
"ideas_per_expert": ideas_per_expert,
"experts": expert_details,
"mechanism": "expert_perspectives_only"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas from {len(result['metadata']['experts'])} experts:")
for exp in result['metadata']['experts']:
print(f" - {exp['name']}: {exp['ideas_generated']} ideas")
print("\nSample ideas:")
for i, item in enumerate(result['ideas_with_source'][:5], 1):
print(f" {i}. [{item['expert_name']}] {item['idea']}")
asyncio.run(test())


@@ -0,0 +1,181 @@
"""
Condition 3: Attribute-Only (No Experts)
Uses attribute decomposition to break down the query into
structured categories, then generates ideas from each attribute.
No expert perspectives involved.
"""
import sys
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from app.prompts.attribute_prompt import get_step1_dynamic_attributes_prompt
from experiments.config import (
MODEL, TEMPERATURE, FIXED_CATEGORIES, PROMPT_LANGUAGE
)
def get_attribute_idea_generation_prompt(
query: str,
category: str,
attribute: str,
idea_count: int,
lang: str = "en"
) -> str:
"""Generate prompt for attribute-based idea generation."""
if lang == "en":
return f"""/no_think
Generate {idea_count} creative ideas for "{query}" focusing on the attribute "{attribute}" (Category: {category}).
Requirements:
1. Each idea should be directly inspired by the attribute "{attribute}"
2. Think about how this attribute could be improved, reimagined, or applied in new ways
3. Ideas should be specific and actionable (15-30 words each)
4. Be creative while maintaining relevance to the attribute
Return JSON only:
{{"ideas": ["idea 1", "idea 2", ...]}}
Generate exactly {idea_count} ideas based on the attribute "{attribute}"."""
else:
return f"""/no_think
為「{query}」生成 {idea_count} 個創意點子,聚焦於屬性「{attribute}」(類別:{category})。
要求:
1. 每個點子要直接受屬性「{attribute}」啟發
2. 思考如何改進、重新想像或以新方式應用這個屬性
3. 點子要具體可行(每個 15-30 字)
4. 保持創意同時與屬性相關
只回傳 JSON
{{"ideas": ["點子1", "點子2", ...]}}
基於屬性「{attribute}」生成正好 {idea_count} 個點子。"""
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
categories: List[str] = None,
ideas_per_attribute: int = 1,
lang: str = None
) -> Dict[str, Any]:
"""
Generate ideas using attribute decomposition only (C3).
Args:
query: The object/concept to generate ideas for
model: LLM model to use
temperature: Generation temperature
categories: Categories to use for decomposition
ideas_per_attribute: Ideas to generate per attribute
lang: Language for prompts
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
categories = categories or FIXED_CATEGORIES
lang = lang or PROMPT_LANGUAGE
# Step 1: Generate attributes using existing prompt
# Build category definitions for the prompt
category_defs = [
{"name": cat, "description": f"Related {cat.lower()} of the object", "order": i}
for i, cat in enumerate(categories)
]
attr_prompt = get_step1_dynamic_attributes_prompt(
query=query,
categories=category_defs,
lang=lang
)
attr_response = await ollama_provider.generate(
prompt=attr_prompt,
model=model,
temperature=temperature
)
attributes_by_category = extract_json_from_response(attr_response)
# Step 2: Generate ideas for each attribute
all_ideas = []
attribute_details = []
for category in categories:
attrs = attributes_by_category.get(category, [])
for attr in attrs:
prompt = get_attribute_idea_generation_prompt(
query=query,
category=category,
attribute=attr,
idea_count=ideas_per_attribute,
lang=lang
)
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=temperature
)
result = extract_json_from_response(response)
ideas = result.get("ideas", [])
# Tag ideas with attribute source
for idea in ideas:
all_ideas.append({
"idea": idea,
"category": category,
"attribute": attr
})
attribute_details.append({
"category": category,
"attribute": attr,
"ideas_generated": len(ideas)
})
return {
"condition": "c3_attribute_only",
"query": query,
"ideas": [item["idea"] for item in all_ideas],
"ideas_with_source": all_ideas,
"idea_count": len(all_ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"categories": categories,
"attributes_by_category": attributes_by_category,
"attribute_count": sum(len(v) for v in attributes_by_category.values()),
"ideas_per_attribute": ideas_per_attribute,
"attributes": attribute_details,
"mechanism": "attribute_decomposition_only"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas from {result['metadata']['attribute_count']} attributes:")
for cat, attrs in result['metadata']['attributes_by_category'].items():
print(f" {cat}: {', '.join(attrs)}")
print("\nSample ideas:")
for i, item in enumerate(result['ideas_with_source'][:5], 1):
print(f" {i}. [{item['category']}/{item['attribute']}] {item['idea']}")
asyncio.run(test())


@@ -0,0 +1,214 @@
"""
Condition 4: Full Pipeline (Attributes + Experts)
The complete novelty-seeking system:
1. Attribute decomposition into categories
2. Expert team generation
3. Expert keyword generation for each attribute
4. Description generation for each keyword
"""
import sys
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from app.services.expert_source_service import expert_source_service
from app.prompts.attribute_prompt import get_step1_dynamic_attributes_prompt
from app.prompts.expert_transformation_prompt import (
get_expert_keyword_generation_prompt,
get_single_description_prompt
)
from experiments.config import (
MODEL, TEMPERATURE, FIXED_CATEGORIES, EXPERT_COUNT,
EXPERT_SOURCE, KEYWORDS_PER_EXPERT, PROMPT_LANGUAGE
)
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
categories: List[str] = None,
expert_count: int = None,
expert_source: str = None,
keywords_per_expert: int = None,
lang: str = None
) -> Dict[str, Any]:
"""
Generate ideas using the full pipeline (C4).
Args:
query: The object/concept to generate ideas for
model: LLM model to use
temperature: Generation temperature
categories: Categories for attribute decomposition
expert_count: Number of experts
expert_source: Source of experts
keywords_per_expert: Keywords each expert generates per attribute
lang: Language for prompts
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
categories = categories or FIXED_CATEGORIES
expert_count = expert_count or EXPERT_COUNT
expert_source = expert_source or EXPERT_SOURCE
keywords_per_expert = keywords_per_expert or KEYWORDS_PER_EXPERT
lang = lang or PROMPT_LANGUAGE
    # Step 0: Get experts from the configured source (curated by default)
experts_data, actual_source = expert_source_service.get_experts(
source=expert_source,
count=expert_count,
language=lang
)
# Convert to expected format
experts = [
{
"id": f"expert-{i}",
"name": exp.get("name", "Expert"),
"domain": exp.get("domain", ""),
"perspective": exp.get("perspective", "")
}
for i, exp in enumerate(experts_data)
]
# Step 1: Generate attributes
category_defs = [
{"name": cat, "description": f"Related {cat.lower()} of the object", "order": i}
for i, cat in enumerate(categories)
]
attr_prompt = get_step1_dynamic_attributes_prompt(
query=query,
categories=category_defs,
lang=lang
)
attr_response = await ollama_provider.generate(
prompt=attr_prompt,
model=model,
temperature=temperature
)
attributes_by_category = extract_json_from_response(attr_response)
# Step 2: Expert keyword generation for each category/attribute
all_keywords = []
for category in categories:
attrs = attributes_by_category.get(category, [])
for attr in attrs:
# Generate keywords from all experts for this attribute
keyword_prompt = get_expert_keyword_generation_prompt(
category=category,
attribute=attr,
experts=experts,
keywords_per_expert=keywords_per_expert,
lang=lang
)
keyword_response = await ollama_provider.generate(
prompt=keyword_prompt,
model=model,
temperature=temperature
)
keyword_result = extract_json_from_response(keyword_response)
keywords = keyword_result.get("keywords", [])
for kw in keywords:
all_keywords.append({
"category": category,
"attribute": attr,
"keyword": kw.get("keyword", ""),
"expert_id": kw.get("expert_id", ""),
"expert_name": kw.get("expert_name", "")
})
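    # Each keyword yields exactly one description below, so the final idea count
    # equals attribute_count x expert_count x keywords_per_expert.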
# Step 3: Generate descriptions for each keyword
all_ideas = []
for kw_info in all_keywords:
# Find expert details
expert = next(
(e for e in experts if e["id"] == kw_info["expert_id"]),
{"name": kw_info["expert_name"], "domain": "", "id": kw_info["expert_id"]}
)
desc_prompt = get_single_description_prompt(
query=query,
keyword=kw_info["keyword"],
expert_id=expert["id"],
expert_name=expert["name"],
expert_domain=expert.get("domain", ""),
lang=lang
)
desc_response = await ollama_provider.generate(
prompt=desc_prompt,
model=model,
temperature=temperature
)
desc_result = extract_json_from_response(desc_response)
description = desc_result.get("description", "")
all_ideas.append({
"idea": description,
"keyword": kw_info["keyword"],
"category": kw_info["category"],
"attribute": kw_info["attribute"],
"expert_name": expert["name"],
"expert_domain": expert.get("domain", "")
})
return {
"condition": "c4_full_pipeline",
"query": query,
"ideas": [item["idea"] for item in all_ideas],
"ideas_with_source": all_ideas,
"idea_count": len(all_ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"categories": categories,
"attributes_by_category": attributes_by_category,
"attribute_count": sum(len(v) for v in attributes_by_category.values()),
"expert_count": expert_count,
"expert_source": actual_source,
"keywords_per_expert": keywords_per_expert,
"total_keywords": len(all_keywords),
"experts": [{"name": e["name"], "domain": e["domain"]} for e in experts],
"mechanism": "full_pipeline_attributes_plus_experts"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas using full pipeline:")
print(f" Attributes: {result['metadata']['attribute_count']}")
print(f" Experts: {result['metadata']['expert_count']}")
print(f" Keywords: {result['metadata']['total_keywords']}")
print("\nExperts used:")
for exp in result['metadata']['experts']:
print(f" - {exp['name']} ({exp['domain']})")
print("\nSample ideas:")
for i, item in enumerate(result['ideas_with_source'][:5], 1):
print(f" {i}. [{item['expert_name']}] {item['keyword']}: {item['idea']}")
asyncio.run(test())


@@ -0,0 +1,178 @@
"""
Condition 5: Random-Perspective Control
Uses random words as "perspectives" instead of domain experts.
Tests whether the benefit from expert perspectives comes from
domain knowledge or simply from any perspective shift.
"""
import sys
import json
import random
import hashlib
from pathlib import Path
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "backend"))
from typing import List, Dict, Any
from app.services.llm_service import ollama_provider, extract_json_from_response
from experiments.config import (
MODEL, TEMPERATURE, EXPERT_COUNT, IDEAS_PER_EXPERT,
PROMPT_LANGUAGE, RANDOM_SEED, DATA_DIR
)
def load_random_words() -> List[str]:
"""Load the random word pool from data file."""
words_file = DATA_DIR / "random_words.json"
with open(words_file, "r", encoding="utf-8") as f:
data = json.load(f)
return data.get("words", [])
def get_random_perspective_prompt(
query: str,
perspective_word: str,
idea_count: int,
lang: str = "en"
) -> str:
"""Generate prompt for random-perspective idea generation."""
if lang == "en":
return f"""/no_think
Generate {idea_count} creative and innovative ideas for "{query}" inspired by the concept of "{perspective_word}".
Requirements:
1. Each idea should draw inspiration from "{perspective_word}" - its qualities, characteristics, or associations
2. Think about how concepts related to "{perspective_word}" could improve or reimagine "{query}"
3. Ideas should be specific and actionable (15-30 words each)
4. Be creative in connecting "{perspective_word}" to "{query}"
Return JSON only:
{{"ideas": ["idea 1", "idea 2", "idea 3", ...]}}
Generate exactly {idea_count} ideas inspired by "{perspective_word}"."""
else:
return f"""/no_think
為「{query}」生成 {idea_count} 個創意點子,靈感來自「{perspective_word}」這個概念。
要求:
1. 每個點子要從「{perspective_word}」獲得靈感——它的特質、特徵或聯想
2. 思考與「{perspective_word}」相關的概念如何改進或重新想像「{query}
3. 點子要具體可行(每個 15-30 字)
4. 創意地連接「{perspective_word}」和「{query}
只回傳 JSON
{{"ideas": ["點子1", "點子2", "點子3", ...]}}
生成正好 {idea_count} 個受「{perspective_word}」啟發的點子。"""
async def generate_ideas(
query: str,
model: str = None,
temperature: float = None,
word_count: int = None,
ideas_per_word: int = None,
lang: str = None,
seed: int = None
) -> Dict[str, Any]:
"""
Generate ideas using random word perspectives (C5 control).
Args:
query: The object/concept to generate ideas for
model: LLM model to use
temperature: Generation temperature
word_count: Number of random words to use (matches expert count)
ideas_per_word: Ideas to generate per word
lang: Language for prompts
seed: Random seed for reproducibility
Returns:
Dict with ideas and metadata
"""
model = model or MODEL
    temperature = TEMPERATURE if temperature is None else temperature  # keep an explicit 0.0
word_count = word_count or EXPERT_COUNT
ideas_per_word = ideas_per_word or IDEAS_PER_EXPERT
lang = lang or PROMPT_LANGUAGE
    seed = RANDOM_SEED if seed is None else seed  # keep an explicit seed of 0
# Load word pool and sample random words
word_pool = load_random_words()
    # Use seeded random for reproducibility.
    # Derive a per-query seed so different queries draw different words while the
    # same query draws the same words across runs. A stable digest is used because
    # Python's built-in hash() is salted per process and is not reproducible.
    query_seed = seed + int(hashlib.md5(query.encode("utf-8")).hexdigest(), 16) % 10000
    rng = random.Random(query_seed)
selected_words = rng.sample(word_pool, min(word_count, len(word_pool)))
all_ideas = []
word_details = []
for word in selected_words:
prompt = get_random_perspective_prompt(
query=query,
perspective_word=word,
idea_count=ideas_per_word,
lang=lang
)
response = await ollama_provider.generate(
prompt=prompt,
model=model,
temperature=temperature
)
result = extract_json_from_response(response)
ideas = result.get("ideas", [])
# Tag ideas with perspective word source
for idea in ideas:
all_ideas.append({
"idea": idea,
"perspective_word": word
})
word_details.append({
"word": word,
"ideas_generated": len(ideas)
})
return {
"condition": "c5_random_perspective",
"query": query,
"ideas": [item["idea"] for item in all_ideas],
"ideas_with_source": all_ideas,
"idea_count": len(all_ideas),
"metadata": {
"model": model,
"temperature": temperature,
"prompt_language": lang,
"word_count": word_count,
"ideas_per_word": ideas_per_word,
"random_seed": seed,
"query_seed": query_seed,
"selected_words": selected_words,
"word_details": word_details,
"word_pool_size": len(word_pool),
"mechanism": "random_perspective_control"
}
}
# For testing
if __name__ == "__main__":
import asyncio
async def test():
result = await generate_ideas("Chair")
print(f"Generated {result['idea_count']} ideas from {len(result['metadata']['selected_words'])} random words:")
print(f" Words used: {', '.join(result['metadata']['selected_words'])}")
print(f" Seed: {result['metadata']['random_seed']}, Query seed: {result['metadata']['query_seed']}")
print("\nSample ideas:")
for i, item in enumerate(result['ideas_with_source'][:5], 1):
print(f" {i}. [{item['perspective_word']}] {item['idea']}")
asyncio.run(test())

72
experiments/config.py Normal file

@@ -0,0 +1,72 @@
"""
Experiment configuration for 5-condition idea generation study.
"""
from typing import Literal
from pathlib import Path
# Paths
EXPERIMENTS_DIR = Path(__file__).parent
DATA_DIR = EXPERIMENTS_DIR / "data"
RESULTS_DIR = EXPERIMENTS_DIR / "results"
DOCS_DIR = EXPERIMENTS_DIR / "docs"
# LLM Settings
MODEL = "qwen3:8b"
TEMPERATURE = 0.9
# Expert Settings
EXPERT_COUNT = 4
EXPERT_SOURCE: Literal["curated", "llm", "dbpedia", "wikidata"] = "curated"
KEYWORDS_PER_EXPERT = 1
# Language Settings
PROMPT_LANGUAGE: Literal["en", "zh"] = "en"
# Attribute Settings
FIXED_CATEGORIES = ["Functions", "Usages", "User Groups", "Characteristics"]
# Deduplication Settings
DEDUP_THRESHOLD = 0.85
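# Cosine-similarity cutoff: idea pairs scoring >= 0.85 are grouped as duplicates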
DEDUP_METHOD: Literal["embedding", "llm"] = "embedding"
# Reproducibility
RANDOM_SEED = 42
# Idea Generation Settings
IDEAS_PER_EXPERT = 5 # For C2 and C5
IDEAS_DIRECT = 20 # For C1
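# With EXPERT_COUNT = 4 and IDEAS_PER_EXPERT = 5, C2/C5 also target 20 raw ideas,
# keeping the raw generation budget comparable to C1's IDEAS_DIRECT.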
# Condition Names
CONDITIONS = [
"c1_direct",
"c2_expert_only",
"c3_attribute_only",
"c4_full_pipeline",
"c5_random_perspective",
]
# Condition Display Names
CONDITION_NAMES = {
"c1_direct": "C1: Direct Generation",
"c2_expert_only": "C2: Expert-Only",
"c3_attribute_only": "C3: Attribute-Only",
"c4_full_pipeline": "C4: Full Pipeline",
"c5_random_perspective": "C5: Random-Perspective",
}
# Summary Config Dict (for logging/reporting)
EXPERIMENT_CONFIG = {
"model": MODEL,
"temperature": TEMPERATURE,
"expert_count": EXPERT_COUNT,
"expert_source": EXPERT_SOURCE,
"keywords_per_expert": KEYWORDS_PER_EXPERT,
"prompt_language": PROMPT_LANGUAGE,
"random_seed": RANDOM_SEED,
"categories": FIXED_CATEGORIES,
"dedup_threshold": DEDUP_THRESHOLD,
"dedup_method": DEDUP_METHOD,
"ideas_per_expert": IDEAS_PER_EXPERT,
"ideas_direct": IDEAS_DIRECT,
}


@@ -0,0 +1,66 @@
{
"description": "10 pilot queries for the 5-condition experiment, balanced across categories",
"version": "1.0",
"queries": [
{
"id": "A1",
"query": "Chair",
"category": "everyday",
"description": "Common household furniture"
},
{
"id": "A5",
"query": "Bicycle",
"category": "everyday",
"description": "Personal transportation device"
},
{
"id": "A7",
"query": "Smartphone",
"category": "everyday",
"description": "Mobile communication device"
},
{
"id": "B1",
"query": "Solar panel",
"category": "technology",
"description": "Renewable energy technology"
},
{
"id": "B3",
"query": "3D printer",
"category": "technology",
"description": "Additive manufacturing device"
},
{
"id": "B4",
"query": "Drone",
"category": "technology",
"description": "Unmanned aerial vehicle"
},
{
"id": "C1",
"query": "Food delivery service",
"category": "services",
"description": "Restaurant meal delivery platform"
},
{
"id": "C2",
"query": "Online education platform",
"category": "services",
"description": "Digital learning service"
},
{
"id": "C4",
"query": "Public transportation",
"category": "services",
"description": "Mass transit system"
},
{
"id": "C9",
"query": "Elderly care service",
"category": "services",
"description": "Senior citizen support service"
}
]
}


@@ -0,0 +1,28 @@
{
"description": "Word pool for C5 random-perspective condition",
"version": "1.0",
"selection_criteria": [
"Concrete and evocative (easy to generate associations)",
"Diverse domains (no overlap with typical expert knowledge)",
"No obvious connection to test queries",
"Equal representation across conceptual categories"
],
"categories": {
"nature": ["ocean", "mountain", "forest", "desert", "cave"],
"optics": ["microscope", "telescope", "kaleidoscope", "prism", "lens"],
"animals": ["butterfly", "elephant", "octopus", "eagle", "ant"],
"weather": ["sunrise", "thunderstorm", "rainbow", "fog", "aurora"],
"art": ["clockwork", "origami", "mosaic", "symphony", "ballet"],
"temporal": ["ancient", "futuristic", "organic", "crystalline", "liquid"],
"sensory": ["whisper", "explosion", "rhythm", "silence", "echo"]
},
"words": [
"ocean", "mountain", "forest", "desert", "cave",
"microscope", "telescope", "kaleidoscope", "prism", "lens",
"butterfly", "elephant", "octopus", "eagle", "ant",
"sunrise", "thunderstorm", "rainbow", "fog", "aurora",
"clockwork", "origami", "mosaic", "symphony", "ballet",
"ancient", "futuristic", "organic", "crystalline", "liquid",
"whisper", "explosion", "rhythm", "silence", "echo"
]
}


@@ -0,0 +1,328 @@
"""
Post-generation deduplication for experiment results.
Applies embedding-based deduplication uniformly to all conditions
to normalize idea counts and measure "dedup survival rate".
Usage:
python -m experiments.deduplication --input results/experiment_xxx.json
"""
import sys
import json
import argparse
import asyncio
import logging
from pathlib import Path
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent / "backend"))
from app.services.embedding_service import embedding_service
from app.models.schemas import ExpertTransformationDescription
from experiments.config import DEDUP_THRESHOLD, RESULTS_DIR
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
@dataclass
class DedupStats:
"""Deduplication statistics for a single condition."""
condition: str
pre_dedup_count: int
post_dedup_count: int
duplicates_removed: int
survival_rate: float
groups: List[Dict[str, Any]]
def ideas_to_descriptions(
ideas: List[str],
ideas_with_source: Optional[List[Dict[str, Any]]] = None
) -> List[ExpertTransformationDescription]:
"""
Convert experiment ideas to ExpertTransformationDescription format
for compatibility with the embedding service.
"""
descriptions = []
if ideas_with_source:
# Use source information if available
for i, item in enumerate(ideas_with_source):
desc = ExpertTransformationDescription(
keyword=item.get("keyword", item.get("attribute", item.get("perspective_word", ""))),
expert_id=f"source-{i}",
expert_name=item.get("expert_name", item.get("perspective_word", "direct")),
description=item.get("idea", "")
)
descriptions.append(desc)
else:
# Simple conversion for ideas without source
for i, idea in enumerate(ideas):
desc = ExpertTransformationDescription(
keyword="",
expert_id=f"idea-{i}",
expert_name="direct",
description=idea
)
descriptions.append(desc)
return descriptions
async def deduplicate_condition(
ideas: List[str],
ideas_with_source: Optional[List[Dict[str, Any]]] = None,
threshold: float = DEDUP_THRESHOLD
) -> Dict[str, Any]:
"""
Apply deduplication to ideas from a single condition.
Returns:
Dict with deduplicated ideas and statistics
"""
if not ideas:
return {
"unique_ideas": [],
"unique_ideas_with_source": [],
"groups": [],
"stats": {
"pre_dedup_count": 0,
"post_dedup_count": 0,
"duplicates_removed": 0,
"survival_rate": 1.0
}
}
# Convert to description format
descriptions = ideas_to_descriptions(ideas, ideas_with_source)
# Run deduplication
result = await embedding_service.deduplicate(
descriptions=descriptions,
threshold=threshold
)
# Extract unique ideas (representatives from each group)
unique_ideas = []
unique_ideas_with_source = []
groups_info = []
for group in result.groups:
rep = group.representative
unique_ideas.append(rep.description)
# Reconstruct source info
source_info = {
"idea": rep.description,
"keyword": rep.keyword,
"expert_name": rep.expert_name
}
unique_ideas_with_source.append(source_info)
# Group info for analysis
group_info = {
"representative": rep.description,
"duplicates": [d.description for d in group.duplicates],
"duplicate_count": len(group.duplicates),
"similarity_scores": group.similarity_scores
}
groups_info.append(group_info)
pre_count = len(ideas)
post_count = len(unique_ideas)
survival_rate = post_count / pre_count if pre_count > 0 else 1.0
return {
"unique_ideas": unique_ideas,
"unique_ideas_with_source": unique_ideas_with_source,
"groups": groups_info,
"stats": {
"pre_dedup_count": pre_count,
"post_dedup_count": post_count,
"duplicates_removed": pre_count - post_count,
"survival_rate": survival_rate
}
}
async def process_experiment_results(
input_file: Path,
output_file: Optional[Path] = None,
threshold: float = DEDUP_THRESHOLD
) -> Dict[str, Any]:
"""
Process an experiment results file and apply deduplication.
Args:
input_file: Path to experiment results JSON
output_file: Path for output (default: input_file with _deduped suffix)
threshold: Similarity threshold for deduplication
Returns:
Processed results with deduplication applied
"""
# Load experiment results
with open(input_file, "r", encoding="utf-8") as f:
experiment = json.load(f)
logger.info(f"Processing experiment: {experiment.get('experiment_id', 'unknown')}")
logger.info(f"Deduplication threshold: {threshold}")
# Process each query's conditions
dedup_summary = {
"threshold": threshold,
"conditions": {}
}
for query_result in experiment["results"]:
query = query_result["query"]
query_id = query_result["query_id"]
logger.info(f"\nProcessing query: {query} ({query_id})")
for condition, cond_result in query_result["conditions"].items():
if not cond_result.get("success", False):
logger.warning(f" Skipping failed condition: {condition}")
continue
logger.info(f" Deduplicating {condition}...")
ideas = cond_result.get("ideas", [])
ideas_with_source = cond_result.get("ideas_with_source", [])
dedup_result = await deduplicate_condition(
ideas=ideas,
ideas_with_source=ideas_with_source,
threshold=threshold
)
# Add dedup results to condition
cond_result["dedup"] = dedup_result
# Update summary stats
if condition not in dedup_summary["conditions"]:
dedup_summary["conditions"][condition] = {
"total_pre_dedup": 0,
"total_post_dedup": 0,
"total_removed": 0,
"query_stats": []
}
stats = dedup_result["stats"]
cond_summary = dedup_summary["conditions"][condition]
cond_summary["total_pre_dedup"] += stats["pre_dedup_count"]
cond_summary["total_post_dedup"] += stats["post_dedup_count"]
cond_summary["total_removed"] += stats["duplicates_removed"]
cond_summary["query_stats"].append({
"query_id": query_id,
"query": query,
**stats
})
logger.info(f" {stats['pre_dedup_count']} -> {stats['post_dedup_count']} "
f"(survival: {stats['survival_rate']:.1%})")
# Calculate overall survival rates
for condition, cond_stats in dedup_summary["conditions"].items():
if cond_stats["total_pre_dedup"] > 0:
cond_stats["overall_survival_rate"] = (
cond_stats["total_post_dedup"] / cond_stats["total_pre_dedup"]
)
else:
cond_stats["overall_survival_rate"] = 1.0
# Add dedup summary to experiment
experiment["dedup_summary"] = dedup_summary
# Save results
if output_file is None:
stem = input_file.stem.replace("_complete", "").replace("_intermediate", "")
output_file = input_file.parent / f"{stem}_deduped.json"
with open(output_file, "w", encoding="utf-8") as f:
json.dump(experiment, f, indent=2, ensure_ascii=False)
logger.info(f"\nResults saved to: {output_file}")
return experiment
def print_dedup_summary(experiment: Dict[str, Any]):
"""Print formatted deduplication summary."""
dedup = experiment.get("dedup_summary", {})
print("\n" + "=" * 70)
print("DEDUPLICATION SUMMARY")
print("=" * 70)
print(f"Threshold: {dedup.get('threshold', 'N/A')}")
print("\nResults by condition:")
print("-" * 70)
print(f"{'Condition':<30} {'Pre-Dedup':<12} {'Post-Dedup':<12} {'Survival':<10}")
print("-" * 70)
for condition, stats in dedup.get("conditions", {}).items():
pre = stats.get("total_pre_dedup", 0)
post = stats.get("total_post_dedup", 0)
survival = stats.get("overall_survival_rate", 1.0)
print(f"{condition:<30} {pre:<12} {post:<12} {survival:<10.1%}")
print("-" * 70)
print("\nInterpretation:")
print("- Higher survival rate = more diverse/unique ideas")
print("- Lower survival rate = more redundant ideas removed")
async def main():
parser = argparse.ArgumentParser(
description="Apply deduplication to experiment results"
)
parser.add_argument(
"--input",
type=str,
required=True,
help="Input experiment results JSON file"
)
parser.add_argument(
"--output",
type=str,
help="Output file path (default: input_deduped.json)"
)
parser.add_argument(
"--threshold",
type=float,
default=DEDUP_THRESHOLD,
help=f"Similarity threshold (default: {DEDUP_THRESHOLD})"
)
args = parser.parse_args()
input_path = Path(args.input)
if not input_path.exists():
# Try relative to results dir
input_path = RESULTS_DIR / args.input
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
sys.exit(1)
output_path = Path(args.output) if args.output else None
experiment = await process_experiment_results(
input_file=input_path,
output_file=output_path,
threshold=args.threshold
)
print_dedup_summary(experiment)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,301 @@
# AUT Flexibility Assessment Methods
## What Is the AUT (Alternative Uses Task)?
The AUT (Alternative Uses Task) is a classic **divergent thinking test** introduced by Guilford in 1967.
**Test format:**
```
Prompt: "List all possible uses for a brick"
Typical answers:
1. Build a house
2. Use as a doorstop
3. Weigh down papers
4. Use as a weapon
5. Prop things up
...
```
---
## The Four Torrance Creativity Dimensions
| Dimension | Definition | Measurement |
|------|------|----------|
| **Fluency** | How many ideas are produced | Count the ideas |
| **Flexibility** | How many distinct categories the ideas span | Count the categories |
| **Originality** | How rare the ideas are | Statistical rarity |
| **Elaboration** | How detailed the ideas are | Assess the detail |
---
## The Three Flexibility Assessment Methods We Implemented
### Method 1: LLM Two-Stage Classification (Hadas & Hershkovitz 2024)
**Principle:** have a large language model identify the semantic categories of the ideas, then count the categories
```
Stage 1: ask the LLM to identify semantic categories across all ideas
Input: 195 creative ideas for "chair"
Output: ["transportation", "art & decoration", "healthcare", "education", "storage", ...]
Stage 2: assign each idea to a category
Idea 1 "solar-powered charging chair" → technology
Idea 2 "chair converted into a stretcher" → healthcare
Idea 3 "chair legs as drumsticks" → art
Flexibility score = number of distinct categories used
```
**Pros:** category names carry semantic meaning; highly interpretable
**Cons:** depends on LLM consistency; parsing errors are possible
---
### Method 2: Embedding-Based Hierarchical Clustering (arXiv:2405.00899)
**Principle:** convert ideas into vectors and cluster them automatically
```
Step 1: convert each idea into an embedding vector
"Solar-powered charging chair" → [0.12, -0.34, 0.56, ...] (1024 dims)
Step 2: hierarchical clustering with Ward linkage
Compute cosine distances between all ideas
Merge the most similar groups bottom-up
Step 3: cut the dendrogram at a similarity threshold of ≥ 0.7
Ensures ideas within a cluster are sufficiently similar
Flexibility score = number of resulting clusters
```
**Pros:** objective, reproducible, independent of LLM judgment
**Cons:** clusters carry no semantic labels; human interpretation is needed
---
### Method 3: Combined Jump Signal Analysis (arXiv:2405.00899)
**Principle:** use a stricter definition of a "true jump" to reduce false positives
```
Combined jump = category jump ∧ semantic jump
Category jump (jumpcat): consecutive ideas fall in different embedding clusters
Semantic jump (jumpSS): semantic similarity of consecutive ideas < 0.7
True jump = both conditions must hold
```
**Why combine the two signals?**
```
Problem: category jumps alone can produce false positives
Example: "ergonomic chair" and "adjustable chair"
- May land in different clusters (category jump = True)
- But are semantically similar (semantic jump = False)
- Should not count as a true "creative jump"
Solution: the combined jump requires both conditions, which is more accurate
```
| Jump ratio | Exploration mode | Meaning |
|----------|----------|------|
| High (>45%) | Flexible | Broad category switching, leaping thought |
| Medium (30-45%) | Mixed | Moderate switching |
| Low (<30%) | Persistent | Deep, focused development within one domain |
**Application:** distinguishing the creative patterns of LLMs vs. humans
---
## Findings
### Finding 1: Novelty and Flexibility Are Independent Dimensions
| Condition | Novelty score | Flexibility (clusters) | Mean similarity | Pattern |
|------|:----------:|:--------------:|:----------:|------|
| C4 Full Pipeline | **0.395** (highest) | 10 | 0.583 | High novelty, moderate flexibility |
| C5 Random-Perspective | 0.365 | **15** (highest) | 0.521 | High novelty, high flexibility |
| C2 Expert-Only | 0.315 | 13 | 0.517 | Moderate novelty, high flexibility |
| C3 Attribute-Only | 0.337 | 12 | - | Moderate novelty, moderate flexibility |
| C1 Direct | 0.273 (lowest) | **1** (lowest) | 0.647 | Low novelty, low flexibility |
**Visual interpretation:**
```
C1 Direct ideas:
┌─────────────────────────────────────┐
│ ○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○ │ ← all ideas sit in one "ordinary region"
│ (similar to each other, all typical) │ (low novelty + low flexibility)
└─────────────────────────────────────┘
C5 Random-Perspective ideas:
┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐
│ ★ │ │ ★ │ │ ★ │ │ ★ │ │ ★ │ ← scattered across several "novel regions"
└───┘ └───┘ └───┘ └───┘ └───┘ (high novelty + high flexibility)
  ↑     ↑     ↑     ↑     ↑
transport health art  educ  tech
C4 Full Pipeline ideas:
      ┌─────────────────┐
   ┌──┤ ★★★★★★★★★★★★ ├──┐ ← concentrated in one "novel region" with several sub-categories
   │  └─────────────────┘  │ (highest novelty + moderate flexibility)
   │          ↓            │
   └── 10 semantic clusters ┘
```
### Finding 2: Combined Jump Signal Results
| Condition | Category jumps | Semantic jumps | **Combined jumps** | Flexibility profile |
|------|:--------:|:--------:|:------------:|:--------:|
| C2 Expert-Only | 54 | 125 | **48** | Persistent |
| C3 Attribute-Only | 34 | 107 | **33** | Persistent |
| C5 Random-Perspective | 22 | 116 | **20** | Persistent |
| C4 Full Pipeline | 13 | 348 | **13** | Persistent |
| C1 Direct | 0 | 104 | **0** | Persistent |
**Combined jump ratios:**
| Condition | Combined jump ratio | Profile | Reading |
|------|:------------:|:--------:|------|
| C3 Attribute-Only | **26.6%** | Persistent | Moderate category switching |
| C2 Expert-Only | **24.4%** | Persistent | Moderate category switching |
| C5 Random-Perspective | 10.1% | Persistent | Lower category switching |
| C4 Full Pipeline | **3.2%** | Persistent | Very focused exploration |
| C1 Direct | 0.0% | Persistent | Single cluster (no jumps) |
**Key insight:** combined jumps ≤ category jumps (as expected). All conditions show the "persistent exploration" pattern.
---
### Finding 3: 🔑 Originality-Flexibility Correlation (Key Finding)
**Findings in the paper (arXiv:2405.00899):**
- **Humans:** originality and flexibility are **uncorrelated** (r ≈ 0)
- **Typical LLMs:** **positively correlated** — more flexible LLMs are more original
**Our results:**
| Metric | Value | Reading |
|------|:----:|------|
| **Pearson r** | **0.071** | Near-zero correlation |
| Pattern | **Human-like** | Breaks the typical LLM pattern |
**Per-condition data:**
| Condition | Novelty score | Flexibility (combined jumps) |
|------|:----------:|:------------------:|
| C4 Full Pipeline | **0.395** (highest) | **13** (lowest) |
| C5 Random-Perspective | 0.365 | 20 |
| C3 Attribute-Only | 0.337 | 33 |
| C2 Expert-Only | 0.315 | 48 (highest) |
| C1 Direct | 0.273 (lowest) | 0 |
**Major finding:** the attribute+expert pipeline (C4) achieves the **highest novelty with the lowest flexibility**,
showing that structured context-free generation produces **focused novelty** rather than scattered exploration.
**What does this mean?**
```
Typical LLM pattern:
High flexibility → high novelty (positive correlation)
The more scattered the ideas, the better the odds of hitting novel concepts
Our pipeline (C4):
Low flexibility + high novelty (breaks the pattern)
Focused exploration of one novel region rather than jumping around
This is a "human-like" creative pattern!
Human experts usually explore one domain deeply rather than ranging widely but shallowly
```
---
## What This Means for Creativity Research
1. **Creativity is multidimensional**
   - Novelty and flexibility are **independent dimensions**
   - High novelty does not imply high flexibility, and vice versa
   - Fluency, flexibility, originality, and elaboration all need to be considered together
2. **Pipeline design trade-offs**
| Strategy | Novelty | Flexibility | Character |
|------|:------:|:----:|------|
| Direct (C1) | Low | Low | Fast but ordinary |
| Expert-Only (C2) | Medium | High | Diverse viewpoints |
| Random-Perspective (C5) | High | **Highest** | Forced jumps |
| Full Pipeline (C4) | **Highest** | Medium | Structured novelty |
3. **Why do expert/random perspectives produce more categories?**
```
C1 Direct:
No external stimulus → the LLM stays in the single "furniture improvement" domain
Mean similarity 0.647 (highest) → ideas closely resemble each other
C2 Expert-Only:
4 experts from different fields → different thinking frames
Mean similarity 0.517 (lower) → ideas more spread out
C5 Random-Perspective:
Random words force jumps → unexpected connections
Mean similarity 0.521 → the most semantic categories (15)
```
4. **Practical recommendations**
   - For **high novelty**: use the Full Pipeline (C4)
   - For **high flexibility/diversity**: use Random-Perspective (C5) or Expert-Only (C2)
   - For **both**: a hybrid strategy is probably needed
---
## Methodology Correction Note
### Problem with the Original Algorithm
The initial clustering algorithm had a logic error:
```
Original logic (wrong):
Goal: find clusters with within-cluster similarity >= 0.7
Problem: when ideas are spread out (low similarity),
no tight cluster can satisfy the threshold
→ the algorithm gives up and returns 1 cluster
Result: the scattered C2/C5 ideas were mislabeled as "1 cluster"
```
### Corrected Algorithm
```
Corrected logic:
Method: hierarchical clustering with average linkage
Threshold: cut the dendrogram at distance 0.5
(i.e., split where similarity < 0.5)
Result: scattered ideas are correctly split into multiple clusters
```
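A minimal Python sketch of the corrected step, assuming `embeddings` is an (n_ideas × 1024) NumPy array from qwen3-embedding:4b; the function name is illustrative, not the project's API:
```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def count_clusters(embeddings: np.ndarray, distance_cutoff: float = 0.5) -> int:
    """Flexibility = number of clusters from average-linkage clustering
    on cosine distances, cutting the dendrogram at `distance_cutoff`."""
    dists = pdist(embeddings, metric="cosine")   # condensed pairwise distance matrix
    tree = linkage(dists, method="average")      # bottom-up hierarchical merging
    labels = fcluster(tree, t=distance_cutoff, criterion="distance")
    return int(labels.max())                     # cluster ids run 1..k
```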
### Before/After Comparison
| Condition | Clusters before fix | Clusters after fix | Mean similarity |
|------|:------------:|:------------:|:----------:|
| C1 Direct | 29 | **1** | 0.647 (high) |
| C2 Expert-Only | 1 | **13** | 0.517 (low) |
| C5 Random-Perspective | 1 | **15** | 0.521 (low) |
**Key insight:** low similarity = high diversity = high flexibility score
---
## References
1. Hadas & Hershkovitz (2024). "Using Large Language Models to Evaluate Alternative Uses Task Flexibility Score." *Thinking Skills and Creativity*, Vol. 52.
2. arXiv:2405.00899 - "Characterising the Creative Process in Humans and Large Language Models" - jump signal methodology
3. Guilford, J.P. (1967). *The Nature of Human Intelligence*. McGraw-Hill.
4. Torrance, E.P. (1974). *Torrance Tests of Creative Thinking*. Scholastic Testing Service.

View File

@@ -0,0 +1,477 @@
# Creative-Process Characterization Metrics Explained
## Methodology Based on arXiv:2405.00899
**Paper:** "Characterising the Creative Process in Humans and Large Language Models"
**Source:** [arXiv:2405.00899](https://arxiv.org/html/2405.00899v2)
This document explains in detail the creative-process metrics we adopted from that paper, and the key findings those metrics revealed in our experiments.
---
## 1. Combined Jump Signal
### 1.1 What Is a "Jump"?
In creative divergent thinking, a "jump" is a **switch of semantic category** between consecutively generated ideas.
```
Example idea sequence:
1. Solar-powered charging chair → technology
2. Smart temperature-controlled seat → technology (no jump)
3. Chair converted into a stretcher → healthcare (jump!)
4. Wheelchair stand-assist function → healthcare (no jump)
5. Chair legs as drumsticks → art (jump!)
```
### 1.2 Why a "Combined" Jump?
**Problem with the raw signal:**
Using category jumps (jumpcat) alone can produce **false positives**:
```
Problem scenario:
Idea A "foldable camping chair" → cluster 1
Idea B "portable picnic chair" → cluster 2
Category jump = True (different clusters)
But the two ideas are semantically very similar!
This should not count as a true "creative jump"
```
**The paper's solution: the combined jump signal**
```
Combined jump = category jump ∧ semantic jump
where:
Category jump (jumpcat): consecutive ideas fall in different embedding clusters
Semantic jump (jumpSS): cosine similarity of consecutive ideas < 0.7
True jump = both conditions must hold
```
### 1.3 Mathematical Definition
For consecutive ideas $i$ and $i-1$:
$$
\text{jump}_i = \text{jump}_{cat,i} \land \text{jump}_{SS,i}
$$
where:
- $\text{jump}_{cat,i} = \mathbb{1}[c_i \neq c_{i-1}]$ (whether the category changed)
- $\text{jump}_{SS,i} = \mathbb{1}[\text{sim}(e_i, e_{i-1}) < 0.7]$ (whether similarity falls below the threshold)
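A minimal sketch of this definition, assuming `labels` holds each idea's cluster id in generation order and `embeddings` the matching unit-normalized vectors; the names are illustrative, not the project's API:
```python
import numpy as np

def combined_jumps(labels: list[int], embeddings: np.ndarray,
                   sim_threshold: float = 0.7) -> list[bool]:
    """Boolean jump signal: True where jump_cat AND jump_ss both hold."""
    jumps = []
    for i in range(1, len(labels)):
        jump_cat = labels[i] != labels[i - 1]           # cluster changed
        sim = float(embeddings[i] @ embeddings[i - 1])  # cosine sim (unit vectors)
        jump_ss = sim < sim_threshold                   # semantically distant
        jumps.append(jump_cat and jump_ss)              # true jump = both
    return jumps
```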
### 1.4 Our Experimental Results
| Condition | Category jumps | Semantic jumps | **Combined jumps** | Combined ratio |
|------|:--------:|:--------:|:------------:|:--------:|
| C2 Expert-Only | 54 | 125 | **48** | 24.4% |
| C3 Attribute-Only | 34 | 107 | **33** | 26.6% |
| C5 Random-Perspective | 22 | 116 | **20** | 10.1% |
| C4 Full Pipeline | 13 | 348 | **13** | 3.2% |
| C1 Direct | 0 | 104 | **0** | 0.0% |
**Key observations:**
- Combined jumps ≤ category jumps (validates the method)
- C4 has many semantic jumps (348) but few category jumps (13) → its ideas are semantically spread out yet stay within similar categories
- C1 has no category jumps → all of its ideas sit inside a single semantic cluster
---
## 2. Flexibility Profile Classification
### 2.1 Three Modes of Creative Exploration
The paper distinguishes three modes of creative exploration:
| Profile | Jump ratio | Characteristics |
|------|:--------:|------|
| **Persistent** | < 30% | Deep exploration of a single domain, focused idea development |
| **Mixed** | 30-45% | Moderate switching, balancing depth and breadth |
| **Flexible** | > 45% | Frequent jumps, ranging broadly across domains |
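The classification itself is a simple threshold rule over the combined jump ratio; a sketch using the thresholds from the table above (the function name is illustrative):
```python
def classify_profile(jump_ratio: float) -> str:
    """jump_ratio = combined jumps / (number of ideas - 1)."""
    if jump_ratio < 0.30:
        return "Persistent"
    if jump_ratio <= 0.45:
        return "Mixed"
    return "Flexible"
```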
### 2.2 Visual Intuition
```
Persistent:
┌─────────────────────────────────────┐
│ ●→●→●→●→●→●→●→●→●→●                │ deep exploration of one domain
│ technology                          │ occasional switches (<30%)
│ ↓                                   │
│ ●→●→●→●                            │
│ healthcare                          │
└─────────────────────────────────────┘
Flexible:
┌─────────────────────────────────────┐
│ ●→ ●→ ●→ ●→ ●→ ●→ ●→ ●            │ frequent jumps between domains
│ tech med art edu tech soc env tech  │ short stays in each (>45% jumps)
└─────────────────────────────────────┘
Mixed:
┌─────────────────────────────────────┐
│ ●→●→●→●→ ●→●→●→ ●→●→●→●           │ a moderate balance
│ technology healthcare art           │ (30-45% jumps)
└─────────────────────────────────────┘
```
### 2.3 Our Experimental Results
| Condition | Combined jump ratio | Profile | Reading |
|------|:------------:|:--------:|------|
| C3 Attribute-Only | 26.6% | Persistent | Near the Mixed boundary |
| C2 Expert-Only | 24.4% | Persistent | Moderate category switching |
| C5 Random-Perspective | 10.1% | Persistent | Fewer switches |
| **C4 Full Pipeline** | **3.2%** | **Persistent** | Very focused exploration |
| C1 Direct | 0.0% | Persistent | Single cluster |
**Important finding:** every condition shows the "persistent exploration" pattern, but to very different degrees.
---
## 3. Originality-Flexibility Correlation Analysis
### 3.1 The Paper's Core Finding
arXiv:2405.00899 reports a key difference:
| Subject | Originality-flexibility relation | Reading |
|------|:------------------:|------|
| **Humans** | r ≈ 0 (no correlation) | Originality and flexibility are independent abilities |
| **Typical LLMs** | r > 0 (positive) | More flexible LLMs are more original |
**Why the difference?**
```
Human creative pattern:
- Some people excel at deep exploration (low flexibility, high originality)
- Others excel at broad association (high flexibility, high originality)
- The two abilities are independent dimensions
Typical LLM pattern:
- LLMs produce diversity through "randomness"
- Higher temperature → more jumps → more accidental discoveries
- Flexibility and originality get tied together by randomness
```
### 3.2 Our Experimental Results
**Pearson correlation coefficient: r = 0.071**
| Metric | Value | Reading |
|------|:----:|------|
| **Pearson r** | **0.071** | Near zero |
| Statistical meaning | No significant correlation | The two dimensions are independent |
| **Pattern** | **Human-like** | Breaks the typical LLM pattern |
**Per-condition detail:**
| Condition | Novelty (centroid distance) | Flexibility (combined jumps) | Combination |
|------|:------------------:|:------------------:|------|
| C4 Full Pipeline | **0.395** (highest) | **13** (lowest) | High novelty + low flexibility |
| C5 Random-Perspective | 0.365 | 20 | High novelty + low flexibility |
| C3 Attribute-Only | 0.337 | 33 | Medium novelty + medium flexibility |
| C2 Expert-Only | 0.315 | **48** (highest) | Medium novelty + high flexibility |
| C1 Direct | 0.273 (lowest) | 0 | Low novelty + low flexibility |
### 3.3 Why This Finding Matters
```
┌─────────────────────────────────────────────────────────────┐
│ Originality-flexibility space                               │
│                                                             │
│ High originality │ C4●                                      │
│                  │        C5●                               │
│                  │              C3●                         │
│                  │                    C2●                   │
│                  │                                          │
│ Low originality  │ C1●                                      │
│                  └─────────────────────────────────────     │
│               Low flexibility          High flexibility     │
│                                                             │
│ r = 0.071 → nearly flat against the diagonal → no           │
│ correlation → human-like!                                   │
└─────────────────────────────────────────────────────────────┘
Compare a typical LLM (r > 0.3):
┌─────────────────────────────────────────────────────────────┐
│ High originality │                              ●           │
│                  │                       ●                  │
│                  │                ●                         │
│                  │         ●                                │
│ Low originality  │  ●                                       │
│                  └─────────────────────────────────────     │
│               Low flexibility          High flexibility     │
│                                                             │
│ r > 0.3 → spread along the diagonal → positive              │
│ correlation → typical LLM pattern                           │
└─────────────────────────────────────────────────────────────┘
```
---
## 4. Cumulative Jump Profile
### 4.1 What Is a Cumulative Jump Profile?
It tracks how the number of jumps accumulates over the course of idea generation.
```
Idea position:  1 2 3 4 5 6 7 8 9 10
Jump occurred:  - - ✓ - ✓ - ✓ ✓ - ✓
Cumulative:     0 0 1 1 2 2 3 4 4 5
Profile:
5 │                                    ●
4 │                        ●────●
3 │                  ●────●
2 │            ●────●
1 │      ●────●
0 │●────●
  └────────────────────────────────────────
   1    2    3    4    5    6    7    8    9    10
                 idea position
```
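A sketch of the computation, assuming `jumps` is the boolean combined-jump signal from Section 1:
```python
import numpy as np

profile = np.cumsum(np.asarray(jumps, dtype=int))  # value at i = jumps seen up to idea i
```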
### 4.2 Reading the Profile
| Profile feature | Meaning | Creative pattern |
|----------|------|----------|
| **Steep slope** | Jumps accumulate quickly | Frequent category switching |
| **Flat region** | Jumps pause | Deep exploration of the current category |
| **Staircase** | Sudden bursts of jumps | Moving on after exhausting a category |
| **Nearly horizontal** | Almost no jumps | Staying within a single domain |
### 4.3 Our Experimental Visualization
![Cumulative jump profiles](../results/cumulative_jump_profiles.png)
**Per-condition profile readings:**
| Condition | Profile shape | Creative strategy |
|------|----------|----------|
| C2 Expert-Only | Steady climb | Continuous category switching |
| C3 Attribute-Only | Steady climb | Continuous category switching |
| C5 Random-Perspective | Slow climb | Fewer switches |
| C4 Full Pipeline | Nearly flat | Very focused single-domain exploration |
| C1 Direct | Completely flat | No category switching at all |
---
## 5. What the Findings Mean Together
### 5.1 Core Findings Summary
| Finding | Content | Significance |
|------|------|------|
| **Finding 1** | Originality-flexibility correlation r = 0.071 | The pipeline yields a "human-like" creative pattern |
| **Finding 2** | C4: highest novelty + lowest flexibility | Structured methods produce focused novelty |
| **Finding 3** | All conditions are Persistent | LLMs favor depth over breadth |
| **Finding 4** | Combined jumps < category jumps | Validates the methodology |
### 5.2 Why Can C4 Break the LLM Pattern?
```
The typical LLM problem:
┌─────────────────────────────────────────────────────────────┐
│ Direct generation: "Give me innovative uses for a chair"    │
│                                                             │
│ The LLM relies on temperature for diversity                 │
│ → higher temperature = more randomness                      │
│ → more randomness = more jumps (high flexibility)           │
│ → more jumps = better odds of novel ideas (high originality)│
│                                                             │
│ Result: flexibility and originality are tied together       │
│ (positive correlation)                                      │
└─────────────────────────────────────────────────────────────┘
The C4 pipeline's breakthrough:
┌─────────────────────────────────────────────────────────────┐
│ Structured generation:                                      │
│                                                             │
│ Step 1: attribute decomposition                             │
│   "chair" → [portable, stackable, ergonomic, ...]           │
│                                                             │
│ Step 2: context-free expert keywords                        │
│   Accountant + "portable" → "mobile assets"                 │
│   (never sees the chair!)                                   │
│                                                             │
│ Step 3: recombination                                       │
│   "chair" + "mobile assets" + the accountant's perspective  │
│   → "corporate chairs with RFID asset tracking"             │
│                                                             │
│ Key mechanisms:                                             │
│ - Structure forces a leap out of the typical semantic       │
│   space (high novelty)                                      │
│ - But every idea stays anchored to the same attribute set   │
│   (low flexibility)                                         │
│ - Novelty comes from "forced bisociation", not "random      │
│   exploration"                                              │
│                                                             │
│ Result: high novelty + low flexibility → breaks the         │
│ correlation → human-like                                    │
└─────────────────────────────────────────────────────────────┘
```
### 5.3 Implications for Creative AI Research
**Theoretical contributions:**
1. **LLMs can produce "human-like" creative patterns**
   - Not by imitating human data
   - But through structured creative pipeline design
2. **Originality and flexibility can be controlled independently**
   - The usual assumption is that high originality requires high randomness
   - We show that structured constraints can also reach high originality
3. **"Focused novelty" vs. "scattered exploration"**
   - C4: dig deep into one novel domain (specialist strategy)
   - C5: touch many domains broadly (generalist strategy)
   - Both are valuable, but the mechanisms differ
**Practical applications:**
| Goal | Recommended strategy | Reason |
|------|----------|------|
| Maximize novelty | C4 Full Pipeline | Highest centroid-distance score |
| Maximize category diversity | C2 Expert-Only | Most combined jumps |
| Balance novelty and diversity | C3 Attribute-Only | Middle ground on both |
| Fast generation | C1 Direct | Fewest API calls |
---
## 6. Methodological Validation
### 6.1 Combined Jumps ≤ Category Jumps
A necessary-condition check on the method:
```
Derivation:
Combined jump = category jump ∧ semantic jump
When category jump = False:
combined jump = False ∧ ? = False
When category jump = True:
combined jump = True ∧ semantic jump = semantic jump (True or False)
Therefore: combined jumps ≤ category jumps (always holds)
```
**Empirical check:**
| Condition | Category jumps | Combined jumps | Check |
|------|:--------:|:--------:|:----:|
| C2 | 54 | 48 | ✓ |
| C3 | 34 | 33 | ✓ |
| C5 | 22 | 20 | ✓ |
| C4 | 13 | 13 | ✓ |
| C1 | 0 | 0 | ✓ |
### 6.2 Choice of Flexibility Profile Thresholds
The paper's thresholds (30%, 45%) come from the distribution of its human data. In our LLM experiments every condition falls in the Persistent band, which is itself a finding:
```
Human distribution (paper data):
Persistent: ~33%
Mixed: ~34%
Flexible: ~33%
Our LLM distribution:
Persistent: 100% (all conditions)
Mixed: 0%
Flexible: 0%
Reading:
LLMs (even with expert/attribute guidance) still lean toward persistent exploration
This may be an inherent property of the LLM architecture
```
---
## 7. Integration with Other Metrics
### 7.1 The Full Metric System
| Dimension | Metric | Source | C4 result |
|------|------|------|:-------:|
| **Fluency** | Idea count | Torrance | 402 (most) |
| **Flexibility** | Combined jumps | arXiv:2405.00899 | 13 (lowest) |
| **Originality** | Centroid distance | This study | 0.395 (highest) |
| **Elaboration** | Mean word count | Torrance | 26.2 |
### 7.2 C4's Unique Position
```
Position in creativity space:
        High originality
   C4 ●│
       │        C5●
       │              C3●
       │                    C2●
  C1 ● │
       └──────────────────── High flexibility
        Low originality
C4 occupies the unusual "high originality + low flexibility" corner
Common among human creators (the specialist type), rare in LLMs
```
---
## 8. Future Research Directions
Suggested follow-ups based on these findings:
1. **Cross-model validation**
   - Repeat the experiments on GPT-4, Claude, and Llama-3
   - Confirm whether the findings are a general phenomenon
2. **Temperature sensitivity tests**
   - The paper found LLMs insensitive to temperature
   - Test whether our pipeline shares this property
3. **Human baseline comparison**
   - Collect human data on the same tasks
   - Directly compare flexibility profile distributions
4. **Pipeline variants**
   - Vary the number of attributes and experts
   - Find the best balance point
---
## References
1. **arXiv:2405.00899** - "Characterising the Creative Process in Humans and Large Language Models"
   - Source of the combined jump signal and the flexibility profile classification
2. **Hadas & Hershkovitz (2024)** - "Using LLMs to Evaluate AUT Flexibility Score"
   - Source of the LLM two-stage classification method
3. **Torrance (1974)** - *Torrance Tests of Creative Thinking*
   - The four-dimension creativity framework
4. **Koestler (1964)** - *The Act of Creation*
   - Theoretical basis of bisociation
---
## Appendix: Code Reference
The analysis code lives in:
- `experiments/aut_flexibility_analysis.py`
  - `compute_jump_signal()` - combined jump computation
  - `classify_flexibility_profile()` - flexibility profile classification
  - `analyze_originality_flexibility_correlation()` - correlation analysis
  - `compute_cumulative_jump_profile()` - cumulative jump profile
  - `plot_cumulative_jump_profiles()` - visualization
Run the analysis:
```bash
cd experiments
source ../backend/venv/bin/activate
python aut_flexibility_analysis.py experiment_20260119_165650_deduped.json
```

View File

@@ -0,0 +1,259 @@
# Experiment Design: 5-Condition Idea Generation Study
**Date:** January 19, 2026
**Version:** 1.0
**Status:** Pilot Implementation
## Overview
This experiment tests whether the novelty-seeking system's two key mechanisms—**attribute decomposition** and **expert transformation**—independently and jointly improve creative ideation quality compared to direct LLM generation.
## Research Questions
1. Does decomposing a query into structured attributes improve idea diversity?
2. Do expert perspectives improve idea novelty?
3. Do these mechanisms have synergistic effects when combined?
4. Is the benefit from experts due to domain knowledge, or simply perspective-shifting?
## Experimental Design
### 2×2 Factorial Design + Control
| | No Attributes | With Attributes |
|--------------------|---------------|-----------------|
| **No Experts** | C1: Direct | C3: Attr-Only |
| **With Experts** | C2: Expert-Only | C4: Full Pipeline |
**Plus:** C5: Random-Perspective (tests perspective-shifting without domain knowledge)
### Condition Descriptions
#### C1: Direct Generation (Baseline)
- Single LLM call: "Generate 20 creative ideas for [query]"
- No attribute decomposition
- No expert perspectives
- Purpose: Baseline for standard LLM ideation
#### C2: Expert-Only
- 4 experts from curated occupations
- Each expert generates 5 ideas directly for the query
- No attribute decomposition
- Purpose: Isolate expert contribution
#### C3: Attribute-Only
- Decompose query into 4 fixed categories
- Generate attributes per category
- Direct idea generation per attribute (no expert framing)
- Purpose: Isolate attribute decomposition contribution
#### C4: Full Pipeline
- Full attribute decomposition (4 categories)
- Expert transformation (4 experts × 1 keyword per attribute)
- Purpose: Test combined mechanism (main system)
#### C5: Random-Perspective
- 4 random words per query (from curated pool)
- Each word used as a "perspective" to generate 5 ideas
- Purpose: Control for perspective-shifting vs. expert knowledge
---
## Key Design Decisions & Rationale
### 1. Why 5 Conditions?
C1-C4 form a 2×2 factorial design that isolates the independent contributions of:
- **Attribute decomposition** (C1 vs C3, C2 vs C4)
- **Expert perspectives** (C1 vs C2, C3 vs C4)
C5 addresses a critical confound: if experts improve ideation, is it because of their **domain knowledge** or simply because any **perspective shift** helps? By using random words instead of domain experts, C5 tests whether the perspective-taking mechanism alone provides benefits.
### 2. Why Random Words in C5 (Not Fixed)?
**Decision:** Use randomly sampled words (with seed) rather than a fixed set.
**Rationale:**
- Stronger generalization: results hold across many word combinations
- Avoids cherry-picking accusation ("you just picked easy words")
- Reproducible via random seed (seed=42)
- Each query gets different random words, increasing robustness (see the sketch below)
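A minimal sketch of this seeded sampling, assuming the pool lives in `experiments/data/random_words.json` with a top-level `words` list (as in this repository); the query list and variable names are illustrative:
```python
import json
import random

rng = random.Random(42)  # seed from the experiment config

with open("experiments/data/random_words.json", encoding="utf-8") as f:
    pool = json.load(f)["words"]

# The generator state advances between queries, so each query draws a
# different (but reproducible) set of 4 perspective words.
words_by_query = {q: rng.sample(pool, 4) for q in ["chair", "bicycle", "smartphone"]}
```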
### 3. Why Apply Deduplication Uniformly?
**Decision:** Apply embedding-based deduplication (threshold=0.85) to ALL conditions after generation.
**Rationale:**
- Fair comparison: all conditions normalized to unique ideas
- Creates "dedup survival rate" as an additional metric
- Hypothesis: Full Pipeline ideas are diverse (low redundancy), not just numerous
- Direct generation may produce many similar ideas that collapse after dedup
### 4. Why FIXED_ONLY Categories?
**Decision:** Use 4 fixed categories: Functions, Usages, User Groups, Characteristics
**Rationale:**
- Best for proof power: isolates "attribute decomposition" effect
- No confound from dynamic category selection variability
- Universal applicability: these 4 categories apply to objects, technology, and services
- Dropped "Materials" category as it doesn't apply well to services
### 5. Why Curated Expert Source?
**Decision:** Use curated occupations (210 professions) rather than LLM-generated experts.
**Rationale:**
- Reproducibility: same occupation pool across runs
- Consistency: no variance from LLM expert generation
- Control: we know exactly which experts are available
- Validation: occupations were manually curated for diversity
### 6. Why Temperature 0.9?
**Decision:** Use temperature=0.9 for all conditions.
**Rationale:**
- Higher temperature encourages more diverse/creative outputs
- Matches typical creative task settings
- Consistent across conditions for fair comparison
- Lower temperatures (0.7) showed more repetitive outputs in testing
### 7. Why 10 Pilot Queries?
**Decision:** Start with 10 queries before scaling to full 30.
**Rationale:**
- Validate pipeline works before full investment
- Catch implementation bugs early
- Balanced across categories (3 everyday, 3 technology, 4 services)
- Sufficient for initial pattern detection
---
## Configuration Summary
| Setting | Value | Rationale |
|---------|-------|-----------|
| **LLM Model** | qwen3:8b | Local, fast, consistent |
| **Temperature** | 0.9 | Encourages creativity |
| **Expert Count** | 4 | Balance diversity vs. cost |
| **Expert Source** | Curated | Reproducibility |
| **Keywords/Expert** | 1 | Simplifies analysis |
| **Language** | English | Consistency |
| **Categories** | Functions, Usages, User Groups, Characteristics | Universal applicability |
| **Dedup Threshold** | 0.85 | Standard similarity cutoff |
| **Random Seed** | 42 | Reproducibility |
| **Pilot Queries** | 10 | Validation before scaling |
---
## Query Selection
### Pilot Queries (10)
| ID | Query | Category |
|----|-------|----------|
| A1 | Chair | Everyday |
| A5 | Bicycle | Everyday |
| A7 | Smartphone | Everyday |
| B1 | Solar panel | Technology |
| B3 | 3D printer | Technology |
| B4 | Drone | Technology |
| C1 | Food delivery service | Services |
| C2 | Online education platform | Services |
| C4 | Public transportation | Services |
| C9 | Elderly care service | Services |
### Selection Criteria
- Balanced across 3 domains (everyday objects, technology, services)
- Varying complexity levels
- Different user familiarity levels
- Subset from full 30-query experimental protocol
---
## Random Word Pool (C5)
35 words selected across 7 conceptual categories:
| Category | Words |
|----------|-------|
| Nature | ocean, mountain, forest, desert, cave |
| Optics | microscope, telescope, kaleidoscope, prism, lens |
| Animals | butterfly, elephant, octopus, eagle, ant |
| Weather | sunrise, thunderstorm, rainbow, fog, aurora |
| Art | clockwork, origami, mosaic, symphony, ballet |
| Temporal | ancient, futuristic, organic, crystalline, liquid |
| Sensory | whisper, explosion, rhythm, silence, echo |
**Selection Criteria:**
- Concrete and evocative (easy to generate associations)
- Diverse domains (no overlap with typical expert knowledge)
- No obvious connection to test queries
- Equal representation across categories
---
## Expected Outputs
### Per Condition Per Query
| Condition | Expected Ideas (pre-dedup) | Mechanism |
|-----------|---------------------------|-----------|
| C1 | 20 | Direct request |
| C2 | 20 | 4 experts × 5 ideas |
| C3 | ~20 | Varies by attribute count |
| C4 | ~20 | 4 experts × ~5 keywords × 1 description |
| C5 | 20 | 4 words × 5 ideas |
### Metrics to Collect
1. **Pre-deduplication count**: Raw ideas generated
2. **Post-deduplication count**: Unique ideas after similarity filtering
3. **Dedup survival rate**: post/pre ratio
4. **Generation metadata**: Experts/words used, attributes generated
---
## File Structure
```
experiments/
├── __init__.py
├── config.py # Experiment configuration
├── docs/
│ └── experiment_design_2026-01-19.md # This file
├── conditions/
│ ├── __init__.py
│ ├── c1_direct.py
│ ├── c2_expert_only.py
│ ├── c3_attribute_only.py
│ ├── c4_full_pipeline.py
│ └── c5_random_perspective.py
├── data/
│ ├── queries.json # 10 pilot queries
│ └── random_words.json # Word pool for C5
├── generate_ideas.py # Main runner
├── deduplication.py # Post-processing
└── results/ # Output (gitignored)
```
---
## Verification Checklist
- [ ] Each condition produces expected number of ideas
- [ ] Deduplication reduces count meaningfully
- [ ] Results JSON contains all required metadata
- [ ] Random seed produces reproducible C5 results
- [ ] No runtime errors on all 10 pilot queries
---
## Next Steps After Pilot
1. Analyze pilot results for obvious issues
2. Adjust parameters if needed (idea count normalization, etc.)
3. Scale to full 30 queries
4. Human evaluation of idea quality (novelty, usefulness, feasibility)
5. Statistical analysis of condition differences

View File

@@ -0,0 +1,813 @@
---
marp: true
theme: default
paginate: true
backgroundColor: #fff
style: |
section {
font-size: 24px;
}
h1 {
color: #2c3e50;
}
h2 {
color: #34495e;
}
table {
font-size: 18px;
}
.columns {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 1rem;
}
---
# Breaking Semantic Gravity in LLM-Based Creative Ideation
## A Pilot Study on Attribute Decomposition and Expert Perspectives
**Date:** January 19, 2026
**Model:** Qwen3:8b (Temperature: 0.9)
**Queries:** 10 pilot queries
---
# Research Problem
## The "Semantic Gravity" Challenge
LLMs tend to generate ideas clustered around **high-probability training distributions**
```
Query: "Chair"
Typical LLM output:
- Ergonomic office chair
- Comfortable reading chair
- Foldable portable chair
← All within "furniture comfort" semantic cluster
```
**Goal:** Break this gravitational pull toward obvious solutions
---
# Theoretical Framework
## Bisociation Theory (Koestler, 1964)
Creative thinking occurs when two unrelated "matrices of thought" collide
**Our Approach:**
1. **Attribute Decomposition** → Break object into structural components
2. **Expert Perspectives** → Introduce distant domain knowledge
3. **Context-Free Keywords** → Force unexpected conceptual leaps
---
# Experimental Design
## 2×2 Factorial + Control
| Condition | Attributes | Experts | Description |
|-----------|:----------:|:-------:|-------------|
| **C1** Direct | - | - | Baseline: Direct LLM generation |
| **C2** Expert-Only | - | ✓ | Expert perspectives without structure |
| **C3** Attribute-Only | ✓ | - | Structure without expert knowledge |
| **C4** Full Pipeline | ✓ | ✓ | Combined approach |
| **C5** Random-Perspective | - | Random | Control: Random words as "experts" |
---
# Research Questions
1. **RQ1:** Does attribute decomposition increase idea diversity?
2. **RQ2:** Do expert perspectives increase idea diversity?
3. **RQ3:** Is there a synergistic (super-additive) interaction effect?
4. **RQ4:** Do domain-relevant experts outperform random perspectives?
---
# Pipeline Architecture
## C4: Full Pipeline Process
```
Query: "Chair"
Step 1: Attribute Decomposition
→ "portable", "stackable", "ergonomic", ...
Step 2: Context-Free Keyword Generation (Expert sees ONLY attribute)
→ Accountant + "portable" → "mobile assets"
→ Architect + "portable" → "modular units"
Step 3: Idea Synthesis (Reunite with query)
→ "Chair" + "mobile assets" + Accountant perspective
→ "Asset-tracking chairs for corporate inventory management"
```
---
# Key Design Decision
## Context-Free Keyword Generation
The expert **never sees the original query** when generating keywords
```python
# Step 2: Expert sees only attribute
prompt = f"As a {expert}, what keyword comes to mind for '{attribute}'?"
# Input: "portable" (NOT "portable chair")
# Step 3: Reunite with query
prompt = f"Apply '{keyword}' to '{query}' from {expert}'s perspective"
# Input: "mobile assets" + "Chair" + "Accountant"
```
**Purpose:** Force bisociation by preventing obvious associations
---
# Pilot Study Parameters
## Model & Generation Settings
| Parameter | Value |
|-----------|-------|
| LLM Model | Qwen3:8b (Ollama) |
| Temperature | 0.9 |
| Ollama Endpoint | localhost:11435 |
| Language | English |
| Random Seed | 42 |
---
# Pilot Study Parameters (cont.)
## Pipeline Configuration
| Parameter | Value |
|-----------|-------|
| Queries | 10 (Chair, Bicycle, Smartphone, Solar panel, 3D printer, Drone, Food delivery, Online education, Public transport, Elderly care) |
| Attribute Categories | 4 (Functions, Usages, User Groups, Characteristics) |
| Attributes per Category | 5 |
| Expert Source | Curated (210 occupations) |
| Experts per Query | 4 |
| Keywords per Expert | 1 |
---
# Pilot Study Parameters (cont.)
## Output & Evaluation
| Parameter | Value |
|-----------|-------|
| Total Ideas Generated | 1,119 (after deduplication) |
| Ideas by Condition | C1: 195, C2: 198, C3: 125, C4: 402, C5: 199 |
| Deduplication Threshold | 0.90 (cosine similarity) |
| Embedding Model | qwen3-embedding:4b (1024D) |
---
# Background: Embedding Models Evolution
## From Static to Contextual Representations
| Generation | Model | Characteristics | Limitation |
|------------|-------|-----------------|------------|
| **1st Gen** | Word2Vec, GloVe | Static vectors, one vector per word | "bank" = same vector (river vs finance) |
| **2nd Gen** | BERT, Sentence-BERT | Contextual, transformer-based | Limited context window, older training |
| **3rd Gen** | Qwen3-embedding | LLM-based, instruction-tuned | Requires more compute |
---
# Background: Transformer vs LLM-based Embedding
## Architecture Differences
| Aspect | Transformer (BERT) | LLM-based (Qwen3) |
|--------|-------------------|-------------------|
| **Architecture** | Encoder-only | Decoder-only (GPT-style) |
| **Training objective** | MLM (masked language modeling) | Next-token prediction |
| **Training data** | ~16GB (Wikipedia + Books) | Several TB (web, code, books) |
| **Parameters** | 110M - 340M | 4B+ |
| **Context** | 512 tokens | 8K - 128K tokens |
---
# Background
## Key Comparison
```
1. Trained on more knowledge
BERT: only knows pre-2019 knowledge
Qwen3: knows modern concepts like "drone delivery", "AI-powered", "IoT"
2. Broader semantic understanding
BERT: "chair for elderly" ≈ "elderly chair" (bag-of-words similarity)
Qwen3: understands the difference between "mobility assistance" and "comfort seating"
3. Instruction tuning
Traditional models: cannot adapt to task intent
Qwen3: can follow "find the semantic differences between creative ideas"
```
---
# Background: Why Qwen3-Embedding?
## Comparison with Traditional Methods
```
Traditional Sentence-BERT (all-MiniLM-L6-v2):
- 384-dim vectors
- Trained on pre-2021 data
- Good on short sentences, limited long-text understanding
- Encoder-only, MLM training
Qwen3-Embedding (qwen3-embedding:4b):
- 1024-dim vectors (richer semantic representation)
- Built on the Qwen3 LLM (2024+ training data)
- Long-context support (8K tokens)
- Instruction-tuned → adapts to the task intent
- Inherits part of the LLM's capabilities
```
**Why we chose it:** creative ideas tend to be long and semantically complex, and need stronger contextual understanding
---
# Background: How Embedding Works
## Semantic Similarity via Vector Space
```
Step 1: Convert text into a vector
"Solar-powered charging chair" → [0.12, -0.34, 0.56, ..., 0.78] (1024D)
Step 2: Compute cosine similarity
similarity = cos(θ) = (A · B) / (|A| × |B|)
Step 3: Interpret the similarity
1.0 = identical
0.9 = very similar (likely a duplicate idea)
0.5 = moderately related
0.0 = unrelated
```
**Uses:** deduplication (similarity > 0.9), flexibility analysis (clustering), novelty (centroid distance)
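A sketch of step 2, assuming `a` and `b` are 1024-dim NumPy vectors:
```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```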
---
# Results: Semantic Diversity
## Mean Pairwise Distance (Higher = More Diverse)
> **Method:** We convert each idea into a vector embedding (qwen3-embedding:4b), then calculate the average cosine distance between all pairs of ideas within each condition. Higher values indicate ideas are more spread out in semantic space.
| Condition | Mean | SD | vs C1 (Cohen's d) |
|-----------|:----:|:--:|:-----------------:|
| C1 Direct | 0.294 | 0.039 | - |
| C2 Expert-Only | 0.400 | 0.028 | **3.15*** |
| C3 Attribute-Only | 0.377 | 0.036 | **2.20*** |
| C4 Full Pipeline | 0.395 | 0.019 | **3.21*** |
| C5 Random | 0.405 | 0.062 | **2.72*** |
*p < 0.001, Large effect sizes (d > 0.8)
> **Cohen's d:** Measures effect size (how big the difference is). d > 0.8 = large effect, d > 0.5 = medium, d > 0.2 = small.
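A sketch of the metric, assuming `emb` is the (n × 1024) embedding matrix for one condition:
```python
from scipy.spatial.distance import pdist

diversity = pdist(emb, metric="cosine").mean()  # mean pairwise cosine distance
```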
---
# Results: ANOVA Summary
## Normalized Diversity Metric
> **Method:** Two-way ANOVA tests whether Attributes and Experts each have independent effects on diversity, and whether combining them produces extra benefit (interaction). F-statistic measures variance between groups vs within groups.
| Effect | F | p | Significant |
|--------|:-:|:-:|:-----------:|
| **Attributes (RQ1)** | 5.31 | 0.027 | Yes |
| **Experts (RQ2)** | 26.07 | <0.001 | Yes |
| **Interaction (RQ3)** | - | - | Sub-additive |
**Key Finding:** Both factors work, but combination is **not synergistic**
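A sketch of the test, assuming a per-query DataFrame `df` with columns `diversity`, `attributes`, `experts`; the factor coding is illustrative, not the project's exact analysis script:
```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

model = smf.ols("diversity ~ C(attributes) * C(experts)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects + interaction term
```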
---
# Results: Expert vs Random (RQ4)
## C2 (Expert-Only) vs C5 (Random-Perspective)
| Metric | C2 Expert | C5 Random | p-value | Effect |
|--------|:---------:|:---------:|:-------:|:------:|
| Diversity | 0.399 | 0.414 | 0.463 | n.s. |
| Query Distance | 0.448 | 0.437 | 0.654 | n.s. |
**Finding:** Random words perform as well as domain experts
Implication: The value may be in **perspective shift itself**, not expert knowledge
---
# Results: Efficiency Analysis
## Diversity per Idea Generated
| Condition | Mean Ideas | Diversity | Efficiency |
|-----------|:----------:|:---------:|:----------:|
| C1 Direct | 20.0 | 0.293 | 1.46 |
| C2 Expert-Only | 20.0 | 0.399 | **1.99** |
| C3 Attribute-Only | 12.8 | 0.376 | **3.01** |
| C4 Full Pipeline | 51.9 | 0.393 | 0.78 |
| C5 Random | 20.0 | 0.405 | 2.02 |
**C4 produces 2.6× more ideas but achieves same diversity**
---
# Visualization: Diversity by Condition
![height:450px](../results/figures/20260119_165650_diversity_boxplot.png)
---
# Visualization: Query Distance
![height:450px](../results/figures/20260119_165650_query_distance_boxplot.png)
---
# Advanced Analysis: Lexical Diversity
## Type-Token Ratio & Vocabulary Richness
> **Method:** Type-Token Ratio (TTR) = unique words ÷ total words. High TTR means more varied vocabulary; low TTR means more word repetition. Vocabulary size counts total unique words across all ideas in a condition.
| Condition | TTR | Vocabulary | Avg Words/Idea |
|-----------|:---:|:----------:|:--------------:|
| C1 Direct | **0.382** | 853 | 11.5 |
| C2 Expert-Only | 0.330 | 1,358 | 20.8 |
| C3 Attribute-Only | 0.330 | 1,098 | 26.6 |
| C4 Full Pipeline | 0.189 | **1,992** | 26.2 |
| C5 Random | 0.320 | 1,331 | 20.9 |
**Finding:** C4 has largest vocabulary (1,992) but lowest TTR (0.189)
→ More words but more repetition across ideas
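A sketch of both counts, assuming `ideas` is the list of idea strings for one condition:
```python
tokens = " ".join(ideas).lower().split()
ttr = len(set(tokens)) / len(tokens)   # Type-Token Ratio
vocabulary = len(set(tokens))          # vocabulary size
```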
---
# Advanced Analysis: Concept Extraction
## Top Keywords by Condition
> **Method:** Extract meaningful keywords from idea texts using NLP (removing stopwords, lemmatization). Top keywords show most frequent concepts; unique keywords count distinct terms. Domain coverage checks if ideas span different knowledge areas.
| Condition | Top Keywords | Unique Keywords |
|-----------|--------------|:---------------:|
| C1 Direct | solar, powered, smart, delivery, drone | 805 |
| C2 Expert | real, create, design, time, develop | 1,306 |
| C3 Attribute | real, time, create, develop, powered | 1,046 |
| C4 Pipeline | time, real, data, ensuring, enhancing | **1,937** |
| C5 Random | like, solar, inspired, energy, uses | 1,286 |
**Finding:** C5 Random shows "inspired" → suggests analogical thinking
All conditions cover 6 domain categories
---
# Advanced Analysis: Novelty Scores
## Distance from Global Centroid (Higher = More Novel)
> **Method:** Compute the centroid (average vector) of ALL ideas across all conditions. Then measure each idea's distance from this "typical idea" center. Ideas far from the centroid are semantically unusual compared to the overall pool.
| Condition | Mean | Std | Interpretation |
|-----------|:----:|:---:|----------------|
| C1 Direct | 0.273 | 0.037 | Closest to "typical" ideas |
| C2 Expert-Only | 0.315 | 0.062 | Moderate novelty |
| C3 Attribute-Only | 0.337 | 0.066 | Moderate novelty |
| C5 Random | 0.365 | 0.069 | High novelty |
| **C4 Full Pipeline** | **0.395** | 0.083 | **Highest novelty** |
**Finding:** C4 produces ideas furthest from the "average" idea space
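A sketch of the score, assuming `emb` pools the (n × 1024) embeddings of all conditions:
```python
import numpy as np
from scipy.spatial.distance import cosine

centroid = emb.mean(axis=0)                             # the "typical idea" vector
novelty = np.array([cosine(e, centroid) for e in emb])  # higher = farther from typical
```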
---
# Advanced Analysis: Cross-Condition Cohesion
## % Nearest Neighbors from Same Condition
> **Method:** For each idea, find its K nearest neighbors in embedding space. Cohesion = percentage of neighbors from the same condition. High cohesion means ideas from that condition cluster together; low cohesion means they're scattered among other conditions.
| Condition | Cohesion | Interpretation |
|-----------|:--------:|----------------|
| **C4 Full Pipeline** | **88.6%** | Highly distinct idea cluster |
| C2 Expert-Only | 72.7% | Moderate clustering |
| C5 Random | 71.4% | Moderate clustering |
| C1 Direct | 70.8% | Moderate clustering |
| C3 Attribute-Only | 51.2% | Ideas scattered, overlap with others |
**Finding:** C4 ideas form a distinct cluster in semantic space
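A sketch of the measure, assuming `emb` is the pooled (n × 1024) embedding matrix and `cond` a length-n NumPy array of condition labels; K is illustrative:
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

K = 10
nn = NearestNeighbors(n_neighbors=K + 1, metric="cosine").fit(emb)
_, idx = nn.kneighbors(emb)               # idx[:, 0] is each point itself
same = cond[idx[:, 1:]] == cond[:, None]  # neighbors sharing the condition
cohesion = same.mean(axis=1)              # per-idea cohesion; average per condition
```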
---
# Advanced Analysis: AUT Flexibility
## Semantic Category Diversity (Hadas & Hershkovitz 2024)
> **Method:** Uses the Alternative Uses Task (AUT) flexibility framework. Embedding-based: Hierarchical clustering with average linkage, cut at distance threshold 0.5. Higher cluster count = more semantic categories covered = higher flexibility.
| Condition | Embedding Clusters | Mean Pairwise Similarity |
|-----------|:------------------:|:------------------------:|
| **C5 Random** | **15** | 0.521 (most diverse) |
| **C2 Expert-Only** | **13** | 0.517 |
| C3 Attribute-Only | 12 | - |
| C4 Full Pipeline | 10 | 0.583 |
| C1 Direct | **1** | 0.647 (most similar) |
**Finding:** Expert perspectives (C2, C5) produce more diverse categories than direct generation (C1)
---
# Advanced Analysis: Combined Jump Signal
## Enhanced Method from arXiv:2405.00899
> **Method:** Combined jump signal uses logical AND of two conditions:
> - **jumpcat:** Category changes between consecutive ideas (from embedding clustering)
> - **jumpSS:** Semantic similarity < 0.7 (ideas are semantically dissimilar)
>
> **True jump = jumpcat ∧ jumpSS** — reduces false positives where similar ideas happen to be in different clusters.
| Condition | Cat-Only | Sem-Only | **Combined** | Profile |
|-----------|:--------:|:--------:|:------------:|---------|
| C2 Expert-Only | 54 | 125 | **48** | Persistent |
| C3 Attribute-Only | 34 | 107 | **33** | Persistent |
| C5 Random | 22 | 116 | **20** | Persistent |
| C4 Full Pipeline | 13 | 348 | **13** | Persistent |
| C1 Direct | 0 | 104 | **0** | Persistent |
**Finding:** Combined jumps ≤ category jumps (as expected). All conditions show "Persistent" exploration pattern.
---
# Advanced Analysis: Flexibility Profiles
## Classification Based on Combined Jump Ratio
> **Method:** Classify creativity style based on normalized jump ratio (jumps / transitions):
> - **Persistent:** ratio < 0.30 (deep exploration within categories)
> - **Flexible:** ratio > 0.45 (broad exploration across categories)
> - **Mixed:** 0.30 ≤ ratio ≤ 0.45
| Condition | Combined Jump Ratio | Profile | Interpretation |
|-----------|:-------------------:|:-------:|----------------|
| C3 Attribute-Only | **26.6%** | Persistent | Moderate category switching |
| C2 Expert-Only | **24.4%** | Persistent | Moderate category switching |
| C5 Random | 10.1% | Persistent | Low category switching |
| **C4 Full Pipeline** | **3.2%** | Persistent | Very deep within-category exploration |
| C1 Direct | 0.0% | Persistent | Single semantic cluster |
**Key Insight:** C4's low jump ratio indicates focused, persistent exploration within novel semantic territory
---
# Key Finding: Originality-Flexibility Correlation
## Does Our Pipeline Break the Typical LLM Pattern?
> **Paper Finding (arXiv:2405.00899):**
> - **Humans:** No correlation between flexibility and originality (r ≈ 0)
> - **LLMs:** Positive correlation — flexible LLMs score higher on originality
**Our Results:**
| Metric | Value | Interpretation |
|--------|:-----:|----------------|
| **Pearson r** | **0.071** | Near zero correlation |
| Interpretation | **Human-like pattern** | Breaks typical LLM pattern |
**Per-Condition Breakdown:**
| Condition | Novelty | Flexibility (combined jumps) |
|-----------|:-------:|:----------------------------:|
| C4 Full Pipeline | **0.395** (highest) | **13** (lowest) |
| C5 Random | 0.365 | 20 |
| C3 Attribute-Only | 0.337 | 33 |
| C2 Expert-Only | 0.315 | 48 (highest) |
| C1 Direct | 0.273 (lowest) | 0 |
**Critical Finding:** The attribute+expert pipeline (C4) achieves **highest novelty with lowest flexibility**, demonstrating that structured context-free generation produces **focused novelty** rather than scattered exploration.
---
# Cumulative Jump Profile Visualization
## Exploration Patterns Over Generation Sequence
> **Method:** Track cumulative jump count at each response position. Steep slopes indicate rapid category switching; flat regions indicate persistent exploration within categories.
![height:400px](../results/cumulative_jump_profiles.png)
**Visual Pattern:**
- C2/C3 show steady accumulation of jumps → regular category switching
- C4/C5 show flatter profiles → persistent within-category exploration
- C1 is flat (0 jumps) → all ideas in single cluster
---
# Flexibility vs Novelty: Key Insight
## Novelty and Flexibility are Orthogonal Dimensions
| Condition | Novelty (centroid dist) | Flexibility (combined jumps) | Pattern |
|-----------|:-----------------------:|:----------------------------:|---------|
| C4 Pipeline | **0.395** (highest) | **13** (lowest) | High novel, low flex |
| C5 Random | 0.365 | 20 | High novel, low flex |
| C2 Expert | 0.315 | **48** (highest) | Moderate novel, high flex |
| C3 Attribute | 0.337 | 33 | Moderate both |
| C1 Direct | 0.273 (lowest) | 0 | Typical, single category |
**Interpretation:**
- **C1 Direct** produces similar ideas within one typical category (low novelty, no jumps)
- **C4 Full Pipeline** produces the most novel ideas with focused exploration (low jump ratio)
- **C2 Expert-Only** produces the most category switching but moderate novelty
- **r = 0.071** confirms these are orthogonal dimensions (human-like pattern)
---
# Embedding Visualization: PCA
> **Method:** Principal Component Analysis reduces high-dimensional embeddings (1024D) to 2D for visualization by finding directions of maximum variance. Points close together = semantically similar ideas. Colors represent conditions.
![height:450px](../results/embedding_pca.png)
---
# Embedding Visualization: t-SNE
> **Method:** t-SNE (t-distributed Stochastic Neighbor Embedding) preserves local neighborhood structure when reducing to 2D. Better at revealing clusters than PCA, but distances between clusters are less meaningful. Good for seeing if conditions form distinct groups.
![height:450px](../results/embedding_tsne.png)
---
# Integrated Findings
## What the Advanced Analysis Reveals
| Analysis | C4 Full Pipeline Characteristic |
|----------|--------------------------------|
| Lexical | Largest vocabulary (1,992 words) |
| Novelty | Highest distance from centroid (0.395) |
| Cohesion | Tightest cluster (88.6% same-condition NN) |
| Diversity | High pairwise distance (0.395) |
| **Flexibility** | **Lowest combined jumps (13) = focused exploration** |
**Interpretation:** C4 creates a **distinct semantic territory** -
novel ideas that are internally coherent but far from other conditions.
Low flexibility (3.2% jump ratio) indicates deep, focused exploration within a novel space.
## Understanding Novelty vs Flexibility
| Condition | Novelty | Flexibility (jumps) | Strategy |
|-----------|:-------:|:-------------------:|----------|
| C1 Direct | Low | Lowest (0) | Typical, single category |
| C2 Expert | Medium | **Highest (48)** | Experts = diverse exploration |
| C3 Attribute | Medium | Medium (33) | Structured exploration |
| C5 Random | High | Low (20) | Random but focused |
| **C4 Pipeline** | **Highest** | **Low (13)** | **Focused novelty** |
---
# Critical Limitation
## Embedding Distance ≠ True Novelty
Current metrics measure **semantic spread**, not **creative value**
| What We Measure | What We Miss |
|-----------------|--------------|
| Vector distance | Practical usefulness |
| Cluster spread | Conceptual surprise |
| Query distance | Non-obviousness |
| | Feasibility |
```
"Quantum entanglement chair" → High distance, Low novelty
"Chair legs as drumsticks" → Low distance, High novelty
```
---
# Torrance Creativity Framework
## What True Novelty Assessment Requires
| Dimension | Definition | Our Coverage |
|-----------|------------|:------------:|
| **Fluency** | Number of ideas | ✓ Measured |
| **Flexibility** | Category diversity | ✓ Measured (LLM + embedding) |
| **Originality** | Statistical rarity | Not measured |
| **Elaboration** | Detail & development | Not measured |
**Originality requires human judgment or LLM-as-Judge**
---
# Discussion: The Attribute Anchoring Effect
## Why C4 Has Highest Novelty but Lowest Flexibility
```
C2 (Expert-Only): HIGHEST FLEXIBILITY (48 combined jumps)
Architect → "load-bearing furniture"
Chef → "dining experience design"
← Each expert explores freely, frequent category switching
C4 (Full Pipeline): LOWEST FLEXIBILITY (13 combined jumps, 3.2% ratio)
All experts respond to same attribute set
Architect + "portable" → "modular portable"
Chef + "portable" → "portable serving"
← Attribute anchoring constrains category switching
← BUT forced bisociation produces HIGHEST NOVELTY
```
**Key Mechanism:** Attributes anchor experts to similar conceptual space (low flexibility),
but context-free keyword generation forces novel associations (high novelty).
**Result:** "Focused novelty" — deep exploration in a distant semantic territory
---
# Key Findings Summary
| RQ | Question | Answer |
|----|----------|--------|
| RQ1 | Attributes increase diversity? | **Yes** (p=0.027) |
| RQ2 | Experts increase diversity? | **Yes** (p<0.001) |
| RQ3 | Synergistic interaction? | **No** (sub-additive) |
| RQ4 | Experts > Random? | **No** (p=0.463) |
**Additional Findings (arXiv:2405.00899 Metrics):**
- Full Pipeline (C4) has **highest novelty** but **lowest flexibility**
- **Originality-Flexibility correlation r=0.071** (human-like, breaks typical LLM pattern)
- Novelty and Flexibility are **orthogonal dimensions**
- All conditions show **Persistent** exploration profile (combined jump ratio < 30%)
- Direct generation (C1) produces ideas in a **single semantic cluster**
---
# Limitations
1. **Sample Size:** 10 queries (pilot study)
2. **Novelty Measurement:** Embedding-based metrics only measure semantic distance, not true creative value
3. **Single Model:** Results may vary with different LLMs
4. **No Human Evaluation:** No validation of idea quality or usefulness
5. **Fixed Categories:** 4 attribute categories may limit exploration
---
# Future Work
## Immediate Next Steps
1. **Human Assessment Interface** (Built)
- Web-based rating tool with Torrance dimensions
- Stratified sampling: 200 ideas (4 per condition × 10 queries)
- 4 dimensions: Originality, Elaboration, Coherence, Usefulness
2. **Multi-Model Validation** (Priority)
- Replicate on GPT-4, Claude, Llama-3
- Verify findings generalize across LLMs
3. **LLM-as-Judge evaluation** for full-scale scoring
4. **Scale to 30 queries** for statistical power
5. **Alternative pipeline designs** to address attribute anchoring
**Documentation:**
- `experiments/docs/future_research_plan_zh.md` - Detailed research plan
- `experiments/docs/creative_process_metrics_zh.md` - arXiv:2405.00899 metrics explanation
---
# Conclusion
## Key Takeaways
1. **Both attribute decomposition and expert perspectives significantly increase semantic diversity** compared to direct generation
2. **The combination is sub-additive**, suggesting attribute structure may constrain expert creativity
3. **Random perspectives work as well as domain experts**, implying the value is in perspective shift, not expert knowledge
4. **Novelty and Flexibility are orthogonal creativity dimensions** - high novelty ≠ high flexibility
- C4 Full Pipeline: Highest novelty, lowest flexibility
- C5 Random: Higher flexibility, moderate novelty
5. **🔑 Key Finding:** The pipeline produces **human-like originality-flexibility patterns** (r=0.071)
- Typical LLMs show positive correlation (flexible → more original)
- Our method breaks this pattern: high novelty with focused exploration
6. **True novelty assessment requires judgment-based evaluation** beyond embedding metrics
---
# Appendix: Statistical Details
## T-test Results (vs C1 Baseline)
| Comparison | t | p | Cohen's d |
|------------|:-:|:-:|:---------:|
| C4 vs C1 | 8.55 | <0.001 | 4.05 |
| C2 vs C1 | 7.67 | <0.001 | 3.43 |
| C3 vs C1 | 4.23 | <0.001 | 1.89 |
All experimental conditions significantly outperform baseline
---
# Appendix: Experiment Configuration
```python
EXPERIMENT_CONFIG = {
"model": "qwen3:8b",
"temperature": 0.9,
"expert_count": 4,
"expert_source": "curated", # 210 occupations
"keywords_per_expert": 1,
"categories": ["Functions", "Usages",
"User Groups", "Characteristics"],
"dedup_threshold": 0.90,
"random_seed": 42
}
```
---
# Thank You
## Questions?
**Repository:** novelty-seeking
**Experiment Date:** January 19, 2026
**Contact:** [Your Email]
---
# Backup Slides
---
# Backup: Deduplication Threshold Analysis
Original threshold (0.85) was too aggressive:
- 40.5% of removed pairs were borderline (0.85-0.87)
- Many genuinely different concepts were grouped
Raised to 0.90:
- RQ1 (Attributes) became significant (p: 0.052 → 0.027)
- Preserved ~103 additional unique ideas
---
# Backup: Sample Ideas by Condition
## Query: "Chair"
**C1 Direct:**
- Ergonomic office chair with lumbar support
- Foldable camping chair
**C2 Expert-Only (Architect):**
- Load-bearing furniture integrated into building structure
**C4 Full Pipeline:**
- Asset-tracking chairs with RFID for corporate inventory
- (Accountant + "portable" → "mobile assets")
---
# Backup: Efficiency Calculation
$$\text{Efficiency} = \frac{\text{Mean Pairwise Distance}}{\text{Idea Count}} \times 100$$
| Condition | Calculation | Result |
|-----------|-------------|:------:|
| C3 Attribute | 0.376 / 12.8 × 100 | 3.01 |
| C4 Pipeline | 0.393 / 51.9 × 100 | 0.78 |
C3 achieves 96% of C4's diversity with 25% of the ideas

View File

@@ -0,0 +1,342 @@
# Publication Plan and Future Work
**Created:** 2026-01-19
**Project:** Breaking Semantic Gravity in LLM-Based Creative Ideation
---
## 1. Publication Feasibility Assessment
### Coverage of Existing Work
| Topic | Representative paper | Our difference |
|------|----------|------------|
| LLM creativity evaluation | Organisciak et al. (2023) | They evaluate LLM creativity; we **enhance** it |
| AUT flexibility scoring | Hadas & Hershkovitz (2024) | Theirs is an evaluation method; ours is a **generation method** |
| Prompt engineering | Zhou et al. (2023) | They optimize prompts; we build a **structured pipeline** |
| LLM-as-Judge | Zheng et al. (2023) | An evaluation tool, not the core contribution |
### This Study's Unique Contributions
| Uniqueness | Description | Academic value |
|--------|------|----------|
| Context-Free Keyword Generation | Experts never see the original query, forcing bisociation | Methodological innovation |
| Sub-additive interaction | Attributes × experts = sub-additive | Empirical finding |
| Random perspectives ≈ domain experts | The perspective shift matters more than domain expertise | Theoretical contribution |
| Novelty-flexibility orthogonality | First verified in LLM creative generation | Theoretical validation |
---
## 2. Current Research Status
### Completed ✓
| Item | Status | Details |
|------|:----:|------|
| Theoretical framework | ✓ | Bisociation Theory + Torrance Creativity Framework |
| Experimental design | ✓ | 2×2 factorial + control (5 conditions) |
| Pipeline implementation | ✓ | Attribute decomposition → expert transformation → deduplication |
| Automatic metrics | ✓ | Novelty, flexibility, diversity, cohesion, jump signal |
| Human assessment interface | ✓ | Web-based Torrance rating tool |
| Statistical analysis | ✓ | ANOVA, effect sizes, correlation analysis |
| Pilot experiment | ✓ | 10 queries, Qwen3:8b, 1,119 ideas |
### Still Needed ✗
| Gap | Importance | Notes |
|------|:------:|------|
| Multi-model validation | **High** | Only Qwen3:8b so far |
| Human assessment data | **High** | Interface built, but no data collected yet |
| Larger sample | **Medium** | 10 → 30-50 queries |
| Baseline comparison | **Medium** | Compare against other creativity-enhancement methods |
| LLM-as-Judge | Medium | Validate its correlation with human ratings |
---
## 3. Publication Strategy Options
### Option A: Full Paper (Top Conference / Journal)
**Target venues:**
- ACL / EMNLP (top NLP conferences)
- CHI (top HCI conference)
- Creativity Research Journal
- Thinking Skills and Creativity
**Suggested title:**
> "Breaking Semantic Gravity: Context-Free Expert Perspectives for LLM Creative Ideation"
**Work still required:**
| Task | Estimated time | Priority |
|----------|:--------:|:------:|
| GPT-4 experiments | 1 week | P0 |
| Claude experiments | 1 week | P0 |
| Llama-3 experiments | 1 week | P1 |
| Human assessment collection | 2-3 weeks | P0 |
| Sample expansion (30 queries) | 1 week | P1 |
| Baseline comparison experiments | 1-2 weeks | P1 |
| Paper writing | 2-3 weeks | - |
**Total estimate:** 2-3 months
---
### Option B: Short Paper / Workshop Paper
**Targets:**
- ACL/EMNLP Workshop on Creativity and AI
- NeurIPS Workshop on Creativity and Design
- ICCC (International Conference on Computational Creativity)
**Work still required:**
| Task | Estimated time | Priority |
|----------|:--------:|:------:|
| GPT-4 experiments | 1 week | P0 |
| Small-scale human assessment (50-100 ideas) | 1 week | P0 |
| Paper writing | 1 week | - |
**Total estimate:** 2-4 weeks
---
## 4. Supplementary Experiment Plan
### Phase 1: Multi-Model Validation (Priority P0)
```
Goal: verify that the method generalizes
Model list:
□ GPT-4 / GPT-4o (OpenAI)
□ Claude 3.5 Sonnet (Anthropic)
□ Llama-3-70B (Meta)
□ Gemini Pro (Google) [optional]
Design:
- Same 10 queries
- Same 5 conditions
- Same evaluation metrics
Expected outputs:
- Cross-model consistency analysis
- Identification of model-specific effects
```
### Phase 2: Human Assessment (Priority P0)
```
Goal: validate how well the automatic metrics track human judgment
Rating dimensions (Torrance framework):
1. Originality - 1-5 Likert
2. Elaboration - 1-5 Likert
3. Feasibility - 1-5 Likert
4. Nonsense - binary
Sampling strategy:
- Stratified: 4 ideas per condition per query
- Total: 5 × 10 × 4 = 200 ideas
- Raters: 3-5 (compute ICC)
Interface:
- Built: experiments/assessment/
- Needed: recruit raters, collect data
```
### Phase 3: Sample Expansion (Priority P1)
```
Goal: increase statistical power
Expansion plan:
- Current: 10 queries
- Target: 30-50 queries
Query sources:
- Objects: furniture, tools, appliances, vehicles
- Concepts: services, systems, processes
- Hybrids: combined physical and digital elements
Power analysis:
- Current effect sizes d ≈ 2-3 (large)
- 30 queries should suffice for power > 0.95
```
### Phase 4: Baseline Comparison (Priority P1)
```
Goal: compare against existing methods
Baselines:
1. Vanilla Prompting
"Generate creative uses for [object]"
2. Chain-of-Thought (CoT)
"Think step by step about creative uses..."
3. Few-shot Examples
Provide 3-5 creative examples
4. Role-Playing (Standard)
"As a [expert], suggest uses for [object]"
(the expert sees the full query)
Comparison metrics:
- Novelty, flexibility, diversity
- Idea count, generation time
- Human assessment scores
```
---
## 5. Draft Paper Outline
### Title
"Breaking Semantic Gravity: Context-Free Expert Perspectives for Enhanced LLM Creative Ideation"
### Abstract
- Problem: LLMs generate ideas clustered around training distributions
- Method: Attribute decomposition + context-free expert transformation
- Results: Sub-additive interaction, random ≈ expert, novelty ⊥ flexibility
- Contribution: Novel pipeline + empirical findings
### 1. Introduction
- Semantic gravity problem in LLM creativity
- Bisociation theory and creative thinking
- Research questions (RQ1-4)
### 2. Related Work
- LLM creativity evaluation
- Prompt engineering for creativity
- Computational creativity methods
### 3. Method
- Pipeline architecture
- Context-free keyword generation
- Experimental design (2×2 + control)
### 4. Evaluation Framework
- Automatic metrics (novelty, flexibility, diversity)
- Human evaluation (Torrance dimensions)
- LLM-as-Judge validation
### 5. Results
- RQ1: Attribute effect
- RQ2: Expert effect
- RQ3: Interaction effect
- RQ4: Expert vs Random
- Cross-model validation
### 6. Discussion
- Attribute anchoring effect
- Value of perspective shift
- Novelty vs flexibility orthogonality
### 7. Conclusion
- Contributions
- Limitations
- Future work
---
## VI. Timeline
### Fast-Track Route (Workshop Paper)
```
Week 1-2: Multi-model experiments (GPT-4, Claude)
Week 2-3: Small-scale human evaluation
Week 3-4: Paper writing and submission
Target: 2026 Q1 workshop deadline
```
### Full-Paper Route
```
Month 1:
- Week 1-2: Multi-model experiments
- Week 3-4: Sample expansion
Month 2:
- Week 1-2: Human evaluation collection
- Week 3-4: Baseline comparison experiments
Month 3:
- Week 1-2: Data analysis and statistics
- Week 3-4: Paper writing
Target: ACL 2026 / EMNLP 2026
```
---
## VII. Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|:------:|:----:|----------|
| Inconsistent cross-model results | Medium | High | Report as "model-specific findings" |
| Low human-rating ICC | Medium | Medium | Add raters, revise the rating guidelines |
| Effect vanishes at larger sample | Low | High | Current effect sizes are large, so risk is low |
| Scooped by a competing paper | Low | High | Submit to a workshop first to establish priority |
---
## VIII. Resource Requirements
### Compute
| Resource | Purpose | Estimated Cost |
|------|------|:--------:|
| OpenAI API | GPT-4 experiments | ~$50-100 |
| Anthropic API | Claude experiments | ~$50-100 |
| Local GPU | Llama experiments | On hand |
| Ollama | Embeddings | On hand |
### People
| Role | Need | Notes |
|------|------|------|
| Human raters | 3-5 | Recruit classmates or crowdsource |
| Statistics consultant | Optional | For complex statistical analyses |
---
## IX. Success Criteria
### Short term (within 1 month)
- [ ] Complete GPT-4 experiments
- [ ] Complete Claude experiments
- [ ] Collect at least 100 human rating samples
### Medium term (within 3 months)
- [ ] Complete all model experiments
- [ ] Complete human evaluation (200+ samples, ICC > 0.7)
- [ ] Complete baseline comparisons
- [ ] Submit the first paper
### Long term (within 6 months)
- [ ] Paper accepted
- [ ] Open-source the code and dataset
- [ ] Extend to other creative tasks
---
## X. References
1. Hadas, S., & Hershkovitz, A. (2024). Using Large Language Models to Evaluate Alternative Uses Task Flexibility Score. *Thinking Skills and Creativity*, 52, 101549.
2. Organisciak, P., et al. (2023). Beyond Semantic Distance: Automated Scoring of Divergent Thinking Greatly Improves with Large Language Models. *Thinking Skills and Creativity*, 49, 101356.
3. Koestler, A. (1964). *The Act of Creation*. Hutchinson.
4. Torrance, E.P. (1974). *Torrance Tests of Creative Thinking*. Scholastic Testing Service.
5. Stevenson, C., et al. (2024). Characterizing Creative Processes in Humans and Large Language Models. *arXiv:2405.00899*.
6. Zheng, L., et al. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. *NeurIPS 2023*.


@@ -0,0 +1,178 @@
# Presentation Notes
---
## Opening (1-2 minutes)
**Problem:** LLMs show "semantic gravity" when generating ideas
- Ask for "innovative uses for a chair" → you get "ergonomic chair", "folding chair"
- Ideas cluster in the high-frequency regions of the training data
**Our fix:** Bisociation
- Attribute decomposition + expert perspectives + context-free keywords
- Forces unexpected connections
---
## 實驗設計1 分鐘)
**五個條件2×2 + 控制組:**
| 條件 | 記法 | 重點 |
|------|------|------|
| C1 | 直接生成 | Baseline |
| C2 | 只有專家 | 專家自由發揮 |
| C3 | 只有屬性 | 結構但無專家 |
| C4 | 完整管線 | 屬性 + 專家 |
| C5 | 隨機詞彙 | 控制組:隨機 vs 專家 |
**關鍵設計:** 專家生成關鍵字時**看不到原始查詢**
- 會計師 + 「便攜」→ 「流動資產」(不知道是椅子)
- 再把「流動資產」+ 「椅子」結合
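A schematic of the two-stage prompting (function names and wording are illustrative, not the pipeline's actual prompts):

```python
# Stage 1: the expert sees ONLY the attribute, never the query.
def keyword_prompt(expert: str, attribute: str) -> str:
    return (f"You are a {expert}. For the concept '{attribute}', "
            f"give one keyword from your professional vocabulary.")

# Stage 2: only now is the original query revealed and combined.
def idea_prompt(query: str, keyword: str) -> str:
    return (f"Combine '{query}' with the concept '{keyword}' "
            f"to propose one creative use.")
```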
---
## Answers to the Four Research Questions
| RQ | Question | Answer | One-liner |
|----|------|:----:|--------|
| RQ1 | Do attributes help? | ✓ Yes | p=0.027 |
| RQ2 | Do experts help? | ✓ Yes | p<0.001 |
| RQ3 | Any synergy? | ✗ No | Sub-additive |
| RQ4 | Experts > random? | ✗ No | p=0.463 |
**Surprise finding:** random words do as well as experts → the value lies in the perspective shift itself
---
## Core Numbers (memorize these)
### Novelty (distance from centroid; higher = more novel)
```
C4: 0.395 ← highest!
C5: 0.365
C3: 0.337
C2: 0.315
C1: 0.273 ← lowest (most typical)
```
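"Distance from centroid" here is the cosine distance implemented in `novelty_metrics.py`:

```
novelty(x) = 1 - cos(e_x, centroid(previous idea embeddings))
```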
### Flexibility (combined jump count; higher = more scattered)
```
C2: 48 ← highest! (experts explore freely)
C3: 33
C5: 20
C4: 13 ← lowest! (focused exploration)
C1: 0  ← single cluster
```
---
## 🔑 Key Findings (the heart of it)
### Finding 1: Originality-flexibility correlation
**The literature:**
- Humans: r ≈ 0 (no correlation)
- Typical LLMs: r > 0 (positive correlation)
**Our result: r = 0.071 (near zero)**
**The pipeline yields a "human-like" creativity pattern!**
### Finding 2: C4's unique position
```
C4 = highest novelty + lowest flexibility
That is "focused novelty":
- Not hopping everywhere (high flexibility)
- But digging into one novel region (low flexibility, high novelty)
- Like a human expert's creative pattern
```
### Finding 3: Why it happens
```
Attribute anchoring effect:
All experts respond to the same attribute set
→ ideas anchored in a similar concept space (low flexibility)
→ but context-free keywords force novel associations (high novelty)
Result: focused novelty
```
---
## Methodological Highlights
### Combined jump signal
- Old method: category switches only
- New method: category switch **and** semantic dissimilarity
- Fewer false positives, more accurate (sketch below)
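A sketch of the signal (names are illustrative; the 0.7 default mirrors the `similarity_threshold` used in `novelty_metrics.py`):

```python
import numpy as np

# A transition is a combined jump only if the AUT category changes
# AND the consecutive idea embeddings are semantically dissimilar.
def is_combined_jump(cat_prev, cat_next, emb_prev, emb_next,
                     sim_threshold: float = 0.7) -> bool:
    cos = float(np.dot(emb_prev, emb_next)
                / (np.linalg.norm(emb_prev) * np.linalg.norm(emb_next)))
    return cat_prev != cat_next and cos < sim_threshold
```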
### Flexibility profiles
| Profile | Jump Ratio | Our Result |
|------|:--------:|:----------:|
| Persistent | <30% | All conditions |
| Mixed | 30-45% | None |
| Flexible | >45% | None |
→ LLMs lean toward "persistent exploration" rather than "flexible jumping" (thresholds encoded below)
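The profile assignment is just a threshold on the jump ratio, encoded directly:

```python
# Direct encoding of the profile thresholds from the table above.
def flexibility_profile(jump_ratio: float) -> str:
    if jump_ratio < 0.30:
        return "Persistent"
    if jump_ratio <= 0.45:
        return "Mixed"
    return "Flexible"
```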
---
## Limitations (being honest)
1. **Small sample:** 10 queries (pilot study)
2. **No human evaluation:** embedding metrics only
3. **Single model:** Qwen3:8b only
4. **Semantic distance ≠ true novelty:** a "quantum-entanglement chair" is distant but not genuinely novel
---
## Next Steps (if asked)
1. **Human assessment interface** (already built)
2. **Multi-model validation** (GPT-4, Claude)
3. **LLM-as-Judge** scoring at scale
4. **30 queries** to raise statistical power
---
## One-Sentence Summary
> **Our attribute + expert pipeline gives the LLM a "human-expert-like" creative pattern:
> high novelty with focused exploration, breaking the typical LLM "flexibility = novelty" correlation.**
---
## Rapid-Fire Q&A
**Q: Why do random words do as well as experts?**
A: The value lies in the perspective shift itself, not in domain knowledge
**Q: Why is C4 lowest in flexibility but highest in novelty?**
A: The attributes anchor every expert in the same concept space, while context-free keywords force novel connections
**Q: What does r=0.071 mean?**
A: Novelty and flexibility are uncorrelated, as in humans, breaking the typical LLM positive correlation
**Q: Is a persistent profile good or bad?**
A: Neither; it is an exploration strategy. C4 shows you can stay persistent and still be novel
**Q: What is the practical takeaway?**
A: Want maximum novelty → use C4; want diverse categories → use C2
---
## Numbers Cheat Sheet
| Metric | C1 | C2 | C3 | C4 | C5 |
|------|:--:|:--:|:--:|:--:|:--:|
| Idea count | 195 | 198 | 125 | **402** | 199 |
| Novelty | 0.273 | 0.315 | 0.337 | **0.395** | 0.365 |
| Flexibility (jumps) | 0 | **48** | 33 | 13 | 20 |
| Jump ratio | 0% | 24% | 27% | **3%** | 10% |
| Cohesion | 71% | 73% | 51% | **89%** | 71% |
**Mnemonic:** C4 = most novel, most cohesive, least flexible = "focused novelty"


@@ -0,0 +1,290 @@
"""
Main experiment runner for the 5-condition idea generation study.
Usage:
# Run single query through all conditions
python -m experiments.generate_ideas --pilot --query "Chair"
# Run all pilot queries
python -m experiments.generate_ideas --pilot
# Run specific conditions
python -m experiments.generate_ideas --query "Bicycle" --conditions c1_direct c4_full_pipeline
"""
import sys
import json
import argparse
import asyncio
import logging
from datetime import datetime
from pathlib import Path
from typing import List, Dict, Any, Optional
# Add backend to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent / "backend"))
from experiments.config import (
CONDITIONS, CONDITION_NAMES, DATA_DIR, RESULTS_DIR, EXPERIMENT_CONFIG
)
from experiments.conditions import (
c1_generate, c2_generate, c3_generate, c4_generate, c5_generate
)
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# Condition function mapping
CONDITION_FUNCTIONS = {
"c1_direct": c1_generate,
"c2_expert_only": c2_generate,
"c3_attribute_only": c3_generate,
"c4_full_pipeline": c4_generate,
"c5_random_perspective": c5_generate,
}
def load_queries() -> List[Dict[str, Any]]:
"""Load pilot queries from data file."""
queries_file = DATA_DIR / "queries.json"
with open(queries_file, "r", encoding="utf-8") as f:
data = json.load(f)
return data.get("queries", [])
def save_results(results: List[Dict[str, Any]], filename: str) -> Path:
"""Save results to JSON file."""
RESULTS_DIR.mkdir(parents=True, exist_ok=True)
output_path = RESULTS_DIR / filename
with open(output_path, "w", encoding="utf-8") as f:
json.dump(results, f, indent=2, ensure_ascii=False)
return output_path
async def run_condition(
query: str,
condition: str
) -> Dict[str, Any]:
"""Run a single condition for a query."""
if condition not in CONDITION_FUNCTIONS:
raise ValueError(f"Unknown condition: {condition}")
generate_fn = CONDITION_FUNCTIONS[condition]
result = await generate_fn(query)
return result
async def run_experiment(
queries: Optional[List[str]] = None,
conditions: Optional[List[str]] = None,
save_intermediate: bool = True
) -> Dict[str, Any]:
"""
Run the full experiment.
Args:
queries: List of queries to run (None = all pilot queries)
conditions: List of conditions to run (None = all conditions)
save_intermediate: Whether to save results after each query
Returns:
Complete experiment results
"""
# Load queries if not provided
if queries is None:
query_data = load_queries()
queries_to_run = [(q["id"], q["query"], q["category"]) for q in query_data]
else:
queries_to_run = [(f"Q{i}", q, "custom") for i, q in enumerate(queries)]
# Default to all conditions
conditions = conditions or CONDITIONS
logger.info(f"Starting experiment with {len(queries_to_run)} queries and {len(conditions)} conditions")
logger.info(f"Conditions: {', '.join(conditions)}")
experiment_results = {
"experiment_id": datetime.now().strftime("%Y%m%d_%H%M%S"),
"config": EXPERIMENT_CONFIG,
"conditions": conditions,
"query_count": len(queries_to_run),
"results": [],
"summary": {}
}
for query_id, query, category in queries_to_run:
logger.info(f"\n{'='*60}")
logger.info(f"Processing query: {query} (ID: {query_id}, Category: {category})")
logger.info(f"{'='*60}")
query_results = {
"query_id": query_id,
"query": query,
"category": category,
"conditions": {}
}
for condition in conditions:
logger.info(f"\n Running {CONDITION_NAMES.get(condition, condition)}...")
try:
result = await run_condition(query, condition)
query_results["conditions"][condition] = {
"success": True,
"idea_count": result["idea_count"],
"ideas": result["ideas"],
"ideas_with_source": result.get("ideas_with_source", []),
"metadata": result["metadata"]
}
logger.info(f" Generated {result['idea_count']} ideas")
except Exception as e:
logger.error(f" Error in {condition}: {e}")
query_results["conditions"][condition] = {
"success": False,
"error": str(e),
"idea_count": 0,
"ideas": []
}
experiment_results["results"].append(query_results)
# Save intermediate results
if save_intermediate:
save_results(
experiment_results,
f"experiment_{experiment_results['experiment_id']}_intermediate.json"
)
# Calculate summary statistics
experiment_results["summary"] = calculate_summary(experiment_results)
# Save final results
output_path = save_results(
experiment_results,
f"experiment_{experiment_results['experiment_id']}_complete.json"
)
logger.info(f"\n{'='*60}")
logger.info("Experiment complete!")
logger.info(f"Results saved to: {output_path}")
logger.info(f"{'='*60}")
return experiment_results
def calculate_summary(results: Dict[str, Any]) -> Dict[str, Any]:
"""Calculate summary statistics for the experiment."""
summary = {
"total_queries": len(results["results"]),
"conditions": {}
}
for condition in results["conditions"]:
condition_stats = {
"total_ideas": 0,
"successful_queries": 0,
"failed_queries": 0,
"avg_ideas_per_query": 0
}
for query_result in results["results"]:
cond_result = query_result["conditions"].get(condition, {})
if cond_result.get("success", False):
condition_stats["successful_queries"] += 1
condition_stats["total_ideas"] += cond_result.get("idea_count", 0)
else:
condition_stats["failed_queries"] += 1
if condition_stats["successful_queries"] > 0:
condition_stats["avg_ideas_per_query"] = (
condition_stats["total_ideas"] / condition_stats["successful_queries"]
)
summary["conditions"][condition] = condition_stats
return summary
def print_summary(results: Dict[str, Any]):
"""Print a formatted summary of the experiment."""
print("\n" + "=" * 70)
print("EXPERIMENT SUMMARY")
print("=" * 70)
summary = results.get("summary", {})
print(f"\nTotal queries processed: {summary.get('total_queries', 0)}")
print("\nResults by condition:")
print("-" * 70)
print(f"{'Condition':<30} {'Success':<10} {'Total Ideas':<15} {'Avg/Query':<10}")
print("-" * 70)
for condition, stats in summary.get("conditions", {}).items():
name = CONDITION_NAMES.get(condition, condition)
success = stats.get("successful_queries", 0)
total = stats.get("total_ideas", 0)
avg = stats.get("avg_ideas_per_query", 0)
print(f"{name:<30} {success:<10} {total:<15} {avg:<10.1f}")
print("-" * 70)
async def main():
parser = argparse.ArgumentParser(
description="Run the 5-condition idea generation experiment"
)
parser.add_argument(
"--pilot",
action="store_true",
help="Run pilot experiment with all 10 queries"
)
parser.add_argument(
"--query",
type=str,
help="Run single query (e.g., 'Chair')"
)
parser.add_argument(
"--conditions",
nargs="+",
choices=CONDITIONS,
help="Specific conditions to run"
)
parser.add_argument(
"--no-save-intermediate",
action="store_true",
help="Don't save intermediate results"
)
args = parser.parse_args()
# Determine queries to run
if args.query:
queries = [args.query]
elif args.pilot:
queries = None # Will load all pilot queries
else:
parser.print_help()
print("\nError: Must specify --pilot or --query")
sys.exit(1)
# Run experiment
results = await run_experiment(
queries=queries,
conditions=args.conditions,
save_intermediate=not args.no_save_intermediate
)
# Print summary
print_summary(results)
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,253 @@
# Novelty-Driven LLM Agent Loop
An autonomous LLM agent that generates tasks in a while loop, using **novelty assessment as the termination condition** to help the agent "jump out" of its training data distribution (semantic gravity).
## Concept
Traditional LLM-based idea generation tends to produce outputs clustered around high-probability regions of the training distribution. This "semantic gravity" limits creative exploration.
This module implements a novel approach: use **novelty scores** to dynamically control when the agent should stop. Instead of fixed iteration counts, the agent continues until it finds something truly novel (a "breakthrough").
```
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
```
## Research Foundation
This work builds on established research:
- **Novelty Search** (Lehman & Stanley): Reward novelty, not objectives
- **Curiosity-driven Exploration** (Pathak et al.): Intrinsic motivation via prediction error
- **Quality-Diversity** (MAP-Elites): Maintain diverse high-quality solutions
- **Open-ended Learning**: Endless innovation through novelty pressure
The unique contribution is using **novelty as a termination condition** rather than just a reward signal.
## Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│ Novelty-Driven Task Generation Loop │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ │
│ │ Seed │ "Design a better bicycle" │
│ │ Problem │ │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ WHILE novelty < threshold AND iterations < max: │ │
│ │ │ │
│ │ 1. Sample random expert (curated occupations) │ │
│ │ e.g., "marine biologist", "choreographer" │ │
│ │ │ │
│ │ 2. Generate task from expert perspective │ │
│ │ "What task would a {expert} assign to improve │ │
│ │ {seed_problem}?" │ │
│ │ │ │
│ │ 3. Embed task, compute novelty vs. centroid │ │
│ │ │ │
│ │ 4. If novelty > threshold → STOP (breakthrough!) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Output: │ Novel task that "jumped out" of typical space │
│ │ Task │ + trajectory of exploration │
│ └──────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
```
## Installation
The module uses the existing project infrastructure. Ensure you have:
1. **Ollama** running with the required models:
```bash
ollama pull qwen3:8b
ollama pull qwen3-embedding:4b
```
2. **Python dependencies** (from project root):
```bash
cd backend
source venv/bin/activate
pip install httpx numpy
```
## Quick Start
### Basic Usage
```bash
cd experiments/novelty_loop
python demo.py "Improve urban transportation"
```
### Example Output
```
Iteration 1
Expert: Architect (Architecture & Design)
Task: Design multi-modal transit hubs that integrate pedestrian, cycling, and public transport seamlessly
Novelty: [████████░░░░░░░░░░░░] 0.1234
Iteration 2
Expert: Chef (Culinary)
Task: Create food delivery route optimization algorithms inspired by kitchen workflow efficiency
Novelty: [███████████░░░░░░░░░] 0.1823
Iteration 3
Expert: Marine Biologist (Science)
Task: Study fish schooling behavior to develop organic traffic flow algorithms
Novelty: [██████████████░░░░░░] 0.3521
Iteration 4
Expert: Choreographer (Performing Arts)
Task: Design pedestrian movement as urban dance, creating rhythmic crossing patterns
Novelty: [████████████████████] 0.5234
★ BREAKTHROUGH! ★
```
## Termination Strategies
### 1. Seek Breakthrough (Default)
Stop when novelty exceeds threshold. Finds the first truly novel task.
```bash
python demo.py "Your problem" --strategy breakthrough --threshold 0.4
```
### 2. Exhaust Frontier
Continue while novelty is high, stop when average novelty drops. Explores more thoroughly.
```bash
python demo.py "Your problem" --strategy exhaust --exhaust-threshold 0.15
```
### 3. Coverage Target
Continue until N distinct conceptual clusters are covered. Ensures diversity.
```bash
python demo.py "Your problem" --strategy coverage --clusters 5
```
## API Usage
```python
import asyncio
from experiments.novelty_loop.agent import NoveltyDrivenTaskAgent
async def main():
agent = NoveltyDrivenTaskAgent(
novelty_threshold=0.4,
max_iterations=20,
language="en"
)
result = await agent.run("Design a better bicycle")
print(f"Found breakthrough: {result.breakthrough_task.task}")
print(f"Novelty score: {result.breakthrough_task.novelty_score}")
print(f"From expert: {result.breakthrough_task.expert}")
await agent.close()
asyncio.run(main())
```
## Novelty Metrics
The `novelty_metrics.py` module provides:
- **Centroid Distance**: Primary novelty metric - how far from the average of all previous outputs
- **Min Distance**: Distance to nearest neighbor (detect duplicates)
- **Jump Detection**: Identifies significant semantic shifts between consecutive outputs
- **Trajectory Tracking**: Cumulative novelty, jump ratio, etc.
```python
from experiments.novelty_loop.novelty_metrics import NoveltyMetrics
metrics = NoveltyMetrics(similarity_threshold=0.7)
# Add embeddings one by one
for embedding in embeddings:
novelty = metrics.compute_novelty(embedding)
metrics.add_embedding(embedding, novelty)
print(f"Novelty: {novelty.score:.4f}, Is Jump: {novelty.is_jump}")
# Get trajectory stats
print(f"Mean novelty: {metrics.trajectory.mean_novelty}")
print(f"Max novelty: {metrics.trajectory.max_novelty}")
print(f"Jump ratio: {metrics.trajectory.jump_ratio}")
```
## CLI Options
```
positional arguments:
seed_problem The seed problem or challenge to explore
options:
--strategy {breakthrough,exhaust,coverage}
Termination strategy (default: breakthrough)
--threshold, -t Novelty threshold for breakthrough (default: 0.4)
--max-iter, -m Maximum iterations (default: 20)
--language, -l {en,zh}
Language for prompts and experts (default: en)
--model LLM model for task generation (default: qwen3:8b)
--embedding-model Embedding model (default: qwen3-embedding:4b)
--temperature LLM temperature (default: 0.7)
--output, -o Save results to JSON file
--quiet, -q Suppress iteration output
--verbose, -v Enable verbose logging
```
## File Structure
```
experiments/novelty_loop/
├── README.md # This file
├── agent.py # Core NoveltyDrivenTaskAgent and variants
├── novelty_metrics.py # Novelty computation utilities
└── demo.py # Interactive CLI demo
```
## Design Decisions
| Question | Decision | Rationale |
|----------|----------|-----------|
| Output Type | **Tasks** | Self-generated sub-goals for autonomous problem decomposition |
| Termination | **Seek Breakthrough** | Stop when novelty exceeds threshold - find truly novel task |
| Perturbation | **Expert Perspectives** | Experts have task-oriented knowledge; more natural than abstract domains |
| Novelty Reference | **Centroid** | Dynamic, adapts as exploration progresses |
## Connection to Main Project
This module integrates with the main novelty-seeking project:
- Uses the same **curated occupation data** (`backend/app/data/curated_occupations_*.json`)
- Uses the same **embedding model** (qwen3-embedding:4b)
- Builds on the **AUT flexibility analysis** metrics for novelty computation
- Can use **DDC domain data** for alternative perturbation strategies
## Future Work
1. **Hybrid Perturbation**: Combine expert + domain perspectives
2. **Contrastive Prompting**: Explicitly ask for outputs unlike recent ones
3. **Semantic Steering**: Guide generation away from centroid direction
4. **Multi-Agent Exploration**: Parallel agents with different strategies
5. **Quality-Diversity Archive**: Maintain diverse high-quality solutions
## References
- Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone.
- Pathak, D., et al. (2017). Curiosity-driven exploration by self-supervised prediction.
- Mouret, J. B., & Clune, J. (2015). Illuminating search spaces by mapping elites.
- Stevenson, C., et al. (2024). Characterizing Creative Processes in Humans and Large Language Models. arXiv:2405.00899.


@@ -0,0 +1,42 @@
"""
Novelty-Driven LLM Agent Loop
An autonomous agent that generates tasks using novelty as the termination condition.
"""
from .agent import (
NoveltyDrivenTaskAgent,
ExhaustFrontierAgent,
CoverageTargetAgent,
GeneratedTask,
TaskGenerationResult,
ExpertProvider,
DomainProvider,
)
from .novelty_metrics import (
NoveltyMetrics,
NoveltyScore,
NoveltyTrajectory,
compute_batch_novelty,
find_most_novel,
)
__all__ = [
# Agents
"NoveltyDrivenTaskAgent",
"ExhaustFrontierAgent",
"CoverageTargetAgent",
# Data classes
"GeneratedTask",
"TaskGenerationResult",
"NoveltyScore",
"NoveltyTrajectory",
# Providers
"ExpertProvider",
"DomainProvider",
# Metrics
"NoveltyMetrics",
"compute_batch_novelty",
"find_most_novel",
]


@@ -0,0 +1,725 @@
"""
Novelty-Driven Task Agent - An autonomous agent that generates tasks using novelty as termination condition.
This agent operates in a while loop, generating tasks from diverse expert perspectives,
and terminates when it finds a task that exceeds the novelty threshold (a "breakthrough").
The core innovation is using novelty assessment to help the agent "jump out" of its
training data distribution (semantic gravity), finding truly novel ideas.
Architecture:
Seed Problem → Expert Sample → Task Generation → Novelty Assessment → Continue/Stop
Termination Strategy: "Seek Breakthrough"
- Continue until novelty > threshold
- Find the first truly novel task and stop
Research Foundation:
- Novelty Search (Lehman & Stanley): Reward novelty, not objectives
- Curiosity-driven Exploration (Pathak et al.): Intrinsic motivation via prediction error
- Quality-Diversity (MAP-Elites): Maintain diverse high-quality solutions
"""
import asyncio
import json
import logging
import random
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, List, Optional
import httpx
import numpy as np
from .novelty_metrics import NoveltyMetrics, NoveltyScore, NoveltyTrajectory
logger = logging.getLogger(__name__)
# ============================================================================
# Data Classes
# ============================================================================
@dataclass
class GeneratedTask:
"""A single generated task with metadata."""
task: str
expert: str
expert_domain: str
novelty_score: float
iteration: int
is_breakthrough: bool = False
embedding: Optional[np.ndarray] = None
@dataclass
class TaskGenerationResult:
"""Result of a complete novelty-driven task generation session."""
seed_problem: str
breakthrough_task: Optional[GeneratedTask] = None
trajectory: List[GeneratedTask] = field(default_factory=list)
total_iterations: int = 0
terminated_by: str = "unknown" # "breakthrough", "max_iterations", "error"
novelty_trajectory: Optional[NoveltyTrajectory] = None
start_time: Optional[str] = None
end_time: Optional[str] = None
config: dict = field(default_factory=dict)
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"seed_problem": self.seed_problem,
"breakthrough_task": {
"task": self.breakthrough_task.task,
"expert": self.breakthrough_task.expert,
"expert_domain": self.breakthrough_task.expert_domain,
"novelty_score": self.breakthrough_task.novelty_score,
"iteration": self.breakthrough_task.iteration
} if self.breakthrough_task else None,
"trajectory": [
{
"task": t.task,
"expert": t.expert,
"expert_domain": t.expert_domain,
"novelty_score": t.novelty_score,
"iteration": t.iteration,
"is_breakthrough": t.is_breakthrough
}
for t in self.trajectory
],
"total_iterations": self.total_iterations,
"terminated_by": self.terminated_by,
"novelty_stats": {
"mean_novelty": self.novelty_trajectory.mean_novelty if self.novelty_trajectory else 0,
"max_novelty": self.novelty_trajectory.max_novelty if self.novelty_trajectory else 0,
"jump_ratio": self.novelty_trajectory.jump_ratio if self.novelty_trajectory else 0,
"cumulative_novelty": self.novelty_trajectory.final_cumulative_novelty if self.novelty_trajectory else 0
},
"start_time": self.start_time,
"end_time": self.end_time,
"config": self.config
}
# ============================================================================
# Expert/Domain Providers
# ============================================================================
class ExpertProvider:
"""Provides random experts from curated occupation lists."""
def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
"""
Args:
data_dir: Path to data directory containing occupation JSON files
language: Language code ("en" or "zh")
"""
if data_dir is None:
# Default to backend data directory
data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"
self.data_dir = data_dir
self.language = language
self._occupations: List[dict] = []
self._load_occupations()
def _load_occupations(self):
"""Load occupations from JSON file."""
file_path = self.data_dir / f"curated_occupations_{self.language}.json"
if not file_path.exists():
logger.warning(f"Occupation file not found: {file_path}")
# Fallback to some default experts
self._occupations = [
{"name": "Marine Biologist", "domain": "Science"},
{"name": "Choreographer", "domain": "Arts"},
{"name": "Urban Planner", "domain": "Architecture"},
{"name": "Chef", "domain": "Culinary"},
{"name": "Astronomer", "domain": "Science"},
]
return
try:
with open(file_path, "r", encoding="utf-8") as f:
data = json.load(f)
self._occupations = data.get("occupations", [])
logger.info(f"Loaded {len(self._occupations)} occupations from {file_path.name}")
except Exception as e:
logger.error(f"Error loading occupations: {e}")
self._occupations = []
def get_random_expert(self) -> dict:
"""Get a random expert with name and domain."""
if not self._occupations:
return {"name": "Expert", "domain": "General"}
return random.choice(self._occupations)
def get_random_experts(self, count: int) -> List[dict]:
"""Get multiple random experts without replacement."""
if len(self._occupations) <= count:
return self._occupations.copy()
return random.sample(self._occupations, count)
class DomainProvider:
"""Provides random knowledge domains from DDC classification."""
def __init__(self, data_dir: Optional[Path] = None, language: str = "en"):
if data_dir is None:
data_dir = Path(__file__).parent.parent.parent / "backend" / "app" / "data"
self.data_dir = data_dir
self.language = language
self._domains: List[dict] = []
self._load_domains()
def _load_domains(self):
"""Load domains from JSON file."""
file_path = self.data_dir / f"ddc_domains_{self.language}.json"
if not file_path.exists():
logger.warning(f"Domain file not found: {file_path}")
self._domains = []
return
try:
with open(file_path, "r", encoding="utf-8") as f:
data = json.load(f)
self._domains = data.get("domains", [])
logger.info(f"Loaded {len(self._domains)} domains from {file_path.name}")
except Exception as e:
logger.error(f"Error loading domains: {e}")
def get_random_domain(self, level: Optional[str] = None) -> dict:
"""Get a random domain, optionally filtered by level."""
domains = self._domains
if level:
domains = [d for d in domains if d.get("level") == level]
if not domains:
return {"name": "General Knowledge", "code": "000"}
return random.choice(domains)
# ============================================================================
# Novelty-Driven Task Agent
# ============================================================================
class NoveltyDrivenTaskAgent:
"""
An autonomous agent that generates tasks using novelty as the termination condition.
The agent operates in a loop:
1. Sample a random expert perspective
2. Generate a task from that expert's viewpoint
3. Compute the task's novelty (distance from centroid of previous tasks)
4. If novelty > threshold → STOP (found breakthrough!)
5. Otherwise → Continue with next expert
Example:
agent = NoveltyDrivenTaskAgent(novelty_threshold=0.4)
result = await agent.run("Improve urban transportation")
# result.breakthrough_task contains the novel task found
# result.trajectory shows the exploration path
"""
def __init__(
self,
novelty_threshold: float = 0.4,
max_iterations: int = 20,
ollama_base_url: str = "http://localhost:11435",
llm_model: str = "qwen3:8b",
embedding_model: str = "qwen3-embedding:4b",
language: str = "en",
data_dir: Optional[Path] = None,
on_iteration: Optional[Callable[[GeneratedTask], None]] = None,
temperature: float = 0.7
):
"""
Args:
novelty_threshold: Novelty score threshold for breakthrough (0.0-1.0)
max_iterations: Maximum iterations before stopping
ollama_base_url: Ollama API endpoint
llm_model: Model for task generation
embedding_model: Model for embeddings
language: Language for prompts and experts ("en" or "zh")
data_dir: Path to data directory for expert/domain files
on_iteration: Callback function called after each iteration
temperature: LLM temperature for generation
"""
self.novelty_threshold = novelty_threshold
self.max_iterations = max_iterations
self.ollama_base_url = ollama_base_url
self.llm_model = llm_model
self.embedding_model = embedding_model
self.language = language
self.temperature = temperature
self.on_iteration = on_iteration
# Initialize providers
self.expert_provider = ExpertProvider(data_dir, language)
self.domain_provider = DomainProvider(data_dir, language)
# Initialize novelty metrics
self.novelty_metrics = NoveltyMetrics(
similarity_threshold=0.7,
jump_detection_enabled=True
)
# HTTP client
self._client: Optional[httpx.AsyncClient] = None
async def _get_client(self) -> httpx.AsyncClient:
"""Get or create HTTP client."""
if self._client is None:
self._client = httpx.AsyncClient(timeout=120.0)
return self._client
async def close(self):
"""Close HTTP client."""
if self._client is not None:
await self._client.aclose()
self._client = None
async def _generate_text(self, prompt: str) -> str:
"""Generate text using Ollama LLM."""
client = await self._get_client()
url = f"{self.ollama_base_url}/api/generate"
# Add /no_think prefix for qwen models to disable thinking
if self.llm_model.lower().startswith("qwen"):
prompt = f"/no_think\n{prompt}"
try:
response = await client.post(url, json={
"model": self.llm_model,
"prompt": prompt,
"stream": False,
"options": {
"temperature": self.temperature
}
})
response.raise_for_status()
result = response.json()
return result.get("response", "").strip()
except Exception as e:
logger.error(f"LLM generation error: {e}")
raise
async def _get_embedding(self, text: str) -> np.ndarray:
"""Get embedding vector for text."""
client = await self._get_client()
url = f"{self.ollama_base_url}/api/embed"
try:
response = await client.post(url, json={
"model": self.embedding_model,
"input": text
})
response.raise_for_status()
result = response.json()
return np.array(result["embeddings"][0])
except Exception as e:
logger.error(f"Embedding error: {e}")
raise
def _build_task_prompt(
self,
seed_problem: str,
expert: dict,
previous_tasks: List[str]
) -> str:
"""Build the prompt for task generation."""
expert_name = expert.get("name", "Expert")
expert_domain = expert.get("domain", "General")
# Build context from previous tasks (if any)
context = ""
if previous_tasks:
recent = previous_tasks[-3:] # Last 3 tasks
context = "\n\nPrevious suggestions (generate something DIFFERENT):\n"
for t in recent:
context += f"- {t}\n"
if self.language == "zh":
prompt = f"""你是一位 {expert_name}{expert_domain})。
给定问题:{seed_problem}
请从你的专业角度出发,提出一个独特的改进任务或探索方向。
这个任务应该结合你的专业知识,提供一个非传统但有价值的视角。
{context}
请直接给出任务描述,不要添加解释。任务应该具体、可行、且与众不同。
任务:"""
else:
prompt = f"""You are a {expert_name} ({expert_domain}).
Given problem: {seed_problem}
From your professional perspective, propose a unique task or exploration direction to improve or innovate on this problem.
The task should leverage your domain expertise to provide an unconventional but valuable angle.
{context}
Provide just the task description without explanation. The task should be specific, actionable, and distinctive.
Task:"""
return prompt
async def _generate_task(
self,
seed_problem: str,
expert: dict,
previous_tasks: List[str]
) -> str:
"""Generate a task from an expert's perspective."""
prompt = self._build_task_prompt(seed_problem, expert, previous_tasks)
task = await self._generate_text(prompt)
# Clean up the response
task = task.strip()
# Remove common prefixes
for prefix in ["Task:", "任务:", "Here's", "I suggest", "Based on"]:
if task.lower().startswith(prefix.lower()):
task = task[len(prefix):].strip()
return task
async def run(
self,
seed_problem: str,
used_experts: Optional[List[dict]] = None
) -> TaskGenerationResult:
"""
Run the novelty-driven task generation loop.
Args:
seed_problem: The initial problem/challenge to explore
used_experts: Optional list of experts to avoid (for multi-run scenarios)
Returns:
TaskGenerationResult with breakthrough task (if found) and full trajectory
"""
# Reset state
self.novelty_metrics.reset()
result = TaskGenerationResult(
seed_problem=seed_problem,
start_time=datetime.now(timezone.utc).isoformat(),
config={
"novelty_threshold": self.novelty_threshold,
"max_iterations": self.max_iterations,
"llm_model": self.llm_model,
"embedding_model": self.embedding_model,
"language": self.language
}
)
used_expert_names = set()
if used_experts:
used_expert_names = {e["name"] for e in used_experts}
previous_tasks: List[str] = []
logger.info(f"Starting novelty loop: '{seed_problem}' (threshold={self.novelty_threshold})")
try:
for iteration in range(self.max_iterations):
# 1. Sample a random expert (avoid duplicates)
attempts = 0
expert = self.expert_provider.get_random_expert()
while expert["name"] in used_expert_names and attempts < 10:
expert = self.expert_provider.get_random_expert()
attempts += 1
used_expert_names.add(expert["name"])
logger.info(f"Iteration {iteration + 1}: Expert = {expert['name']} ({expert['domain']})")
# 2. Generate task
task = await self._generate_task(seed_problem, expert, previous_tasks)
previous_tasks.append(task)
# 3. Get embedding
embedding = await self._get_embedding(task)
# 4. Compute novelty
novelty = self.novelty_metrics.compute_novelty(embedding)
self.novelty_metrics.add_embedding(embedding, novelty)
# 5. Create task record
generated_task = GeneratedTask(
task=task,
expert=expert["name"],
expert_domain=expert["domain"],
novelty_score=novelty.score,
iteration=iteration + 1,
is_breakthrough=novelty.score > self.novelty_threshold,
embedding=embedding
)
result.trajectory.append(generated_task)
logger.info(f" Task: {task[:80]}...")
logger.info(f" Novelty: {novelty.score:.4f} (threshold: {self.novelty_threshold})")
# Callback
if self.on_iteration:
self.on_iteration(generated_task)
# 6. Check for breakthrough
if novelty.score > self.novelty_threshold:
result.breakthrough_task = generated_task
result.terminated_by = "breakthrough"
result.total_iterations = iteration + 1
logger.info(f" BREAKTHROUGH! Stopping after {iteration + 1} iterations")
break
else:
# Max iterations reached without breakthrough
result.terminated_by = "max_iterations"
result.total_iterations = self.max_iterations
logger.info(f"Max iterations ({self.max_iterations}) reached without breakthrough")
# Find the most novel task as a fallback
if result.trajectory:
best_task = max(result.trajectory, key=lambda t: t.novelty_score)
best_task.is_breakthrough = True # Mark as best found
result.breakthrough_task = best_task
except Exception as e:
logger.error(f"Error during generation: {e}")
result.terminated_by = f"error: {str(e)}"
result.total_iterations = len(result.trajectory)
# Finalize
result.end_time = datetime.now(timezone.utc).isoformat()
result.novelty_trajectory = self.novelty_metrics.trajectory
return result
# ============================================================================
# Alternative Termination Strategies
# ============================================================================
class ExhaustFrontierAgent(NoveltyDrivenTaskAgent):
"""
Alternative strategy: Continue while novelty is high, stop when it drops.
This explores the "novelty frontier" more thoroughly, finding multiple novel
ideas before stopping when exploration becomes repetitive.
"""
def __init__(
self,
exhaustion_threshold: float = 0.15,
window_size: int = 3,
min_iterations: int = 5,
**kwargs
):
"""
Args:
exhaustion_threshold: Stop when recent average novelty drops below this
window_size: Number of recent iterations to average
min_iterations: Minimum iterations before checking exhaustion
**kwargs: Passed to parent class
"""
super().__init__(**kwargs)
self.exhaustion_threshold = exhaustion_threshold
self.window_size = window_size
self.min_iterations = min_iterations
async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
"""Override to use exhaustion-based termination."""
# Reset state
self.novelty_metrics.reset()
result = TaskGenerationResult(
seed_problem=seed_problem,
start_time=datetime.now(timezone.utc).isoformat(),
config={
"strategy": "exhaust_frontier",
"exhaustion_threshold": self.exhaustion_threshold,
"window_size": self.window_size,
"min_iterations": self.min_iterations,
"max_iterations": self.max_iterations,
"llm_model": self.llm_model
}
)
used_expert_names = set()
previous_tasks: List[str] = []
novelty_history: List[float] = []
try:
for iteration in range(self.max_iterations):
# Sample expert
expert = self.expert_provider.get_random_expert()
while expert["name"] in used_expert_names and len(used_expert_names) < 200:
expert = self.expert_provider.get_random_expert()
used_expert_names.add(expert["name"])
# Generate and evaluate
task = await self._generate_task(seed_problem, expert, previous_tasks)
previous_tasks.append(task)
embedding = await self._get_embedding(task)
novelty = self.novelty_metrics.compute_novelty(embedding)
self.novelty_metrics.add_embedding(embedding, novelty)
novelty_history.append(novelty.score)
generated_task = GeneratedTask(
task=task,
expert=expert["name"],
expert_domain=expert["domain"],
novelty_score=novelty.score,
iteration=iteration + 1
)
result.trajectory.append(generated_task)
if self.on_iteration:
self.on_iteration(generated_task)
# Check exhaustion condition
if iteration >= self.min_iterations:
recent_avg = np.mean(novelty_history[-self.window_size:])
if recent_avg < self.exhaustion_threshold:
result.terminated_by = f"exhaustion (avg={recent_avg:.3f})"
result.total_iterations = iteration + 1
break
else:
result.terminated_by = "max_iterations"
result.total_iterations = self.max_iterations
# Find all "novel" tasks
novel_tasks = [t for t in result.trajectory if t.novelty_score > self.exhaustion_threshold]
if novel_tasks:
result.breakthrough_task = max(novel_tasks, key=lambda t: t.novelty_score)
result.breakthrough_task.is_breakthrough = True
except Exception as e:
result.terminated_by = f"error: {str(e)}"
result.total_iterations = len(result.trajectory)
result.end_time = datetime.now(timezone.utc).isoformat()
result.novelty_trajectory = self.novelty_metrics.trajectory
return result
class CoverageTargetAgent(NoveltyDrivenTaskAgent):
"""
Alternative strategy: Continue until N distinct clusters are covered.
This ensures a diverse portfolio of ideas across different conceptual areas.
"""
def __init__(
self,
target_clusters: int = 5,
cluster_threshold: float = 0.7,
**kwargs
):
"""
Args:
target_clusters: Target number of distinct clusters to find
cluster_threshold: Similarity threshold for cluster membership
**kwargs: Passed to parent class
"""
super().__init__(**kwargs)
self.target_clusters = target_clusters
self.cluster_threshold = cluster_threshold
def _count_clusters(self, embeddings: List[np.ndarray]) -> int:
"""Count distinct clusters using greedy clustering."""
if not embeddings:
return 0
clusters = []
for emb in embeddings:
found_cluster = False
for cluster_centroid in clusters:
similarity = NoveltyMetrics.cosine_similarity(emb, cluster_centroid)
if similarity >= self.cluster_threshold:
found_cluster = True
break
if not found_cluster:
clusters.append(emb)
return len(clusters)
async def run(self, seed_problem: str, **kwargs) -> TaskGenerationResult:
"""Override to use coverage-based termination."""
self.novelty_metrics.reset()
result = TaskGenerationResult(
seed_problem=seed_problem,
start_time=datetime.now(timezone.utc).isoformat(),
config={
"strategy": "coverage_target",
"target_clusters": self.target_clusters,
"cluster_threshold": self.cluster_threshold,
"max_iterations": self.max_iterations
}
)
used_expert_names = set()
previous_tasks: List[str] = []
all_embeddings: List[np.ndarray] = []
try:
for iteration in range(self.max_iterations):
expert = self.expert_provider.get_random_expert()
while expert["name"] in used_expert_names and len(used_expert_names) < 200:
expert = self.expert_provider.get_random_expert()
used_expert_names.add(expert["name"])
task = await self._generate_task(seed_problem, expert, previous_tasks)
previous_tasks.append(task)
embedding = await self._get_embedding(task)
all_embeddings.append(embedding)
novelty = self.novelty_metrics.compute_novelty(embedding)
self.novelty_metrics.add_embedding(embedding, novelty)
generated_task = GeneratedTask(
task=task,
expert=expert["name"],
expert_domain=expert["domain"],
novelty_score=novelty.score,
iteration=iteration + 1
)
result.trajectory.append(generated_task)
if self.on_iteration:
self.on_iteration(generated_task)
# Check coverage
cluster_count = self._count_clusters(all_embeddings)
if cluster_count >= self.target_clusters:
result.terminated_by = f"coverage ({cluster_count} clusters)"
result.total_iterations = iteration + 1
break
else:
final_clusters = self._count_clusters(all_embeddings)
result.terminated_by = f"max_iterations ({final_clusters} clusters)"
result.total_iterations = self.max_iterations
# Find most novel task
if result.trajectory:
best_task = max(result.trajectory, key=lambda t: t.novelty_score)
best_task.is_breakthrough = True
result.breakthrough_task = best_task
except Exception as e:
result.terminated_by = f"error: {str(e)}"
result.total_iterations = len(result.trajectory)
result.end_time = datetime.now(timezone.utc).isoformat()
result.novelty_trajectory = self.novelty_metrics.trajectory
return result

experiments/novelty_loop/demo.py Executable file
@@ -0,0 +1,313 @@
#!/usr/bin/env python3
"""
Novelty-Driven Task Generation Demo
Interactive CLI for exploring the novelty-driven task generation agent.
Examples:
# Basic usage with default settings
python demo.py "Improve urban transportation"
# Custom threshold and iterations
python demo.py "Design a better bicycle" --threshold 0.35 --max-iter 15
# Use Chinese language
python demo.py "改进城市交通" --language zh
# Use exhaustion strategy (explore until stuck)
python demo.py "Sustainable energy solutions" --strategy exhaust
# Use coverage strategy (find N distinct clusters)
python demo.py "Future of education" --strategy coverage --clusters 5
# Save results to file
python demo.py "Smart home innovations" --output results.json
# Verbose mode with detailed logging
python demo.py "Healthcare improvements" --verbose
"""
import argparse
import asyncio
import json
import logging
import sys
from datetime import datetime
from pathlib import Path
# Add parent directory to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
from experiments.novelty_loop.agent import (
NoveltyDrivenTaskAgent,
ExhaustFrontierAgent,
CoverageTargetAgent,
GeneratedTask,
TaskGenerationResult
)
# ANSI color codes for terminal output
class Colors:
HEADER = '\033[95m'
BLUE = '\033[94m'
CYAN = '\033[96m'
GREEN = '\033[92m'
YELLOW = '\033[93m'
RED = '\033[91m'
BOLD = '\033[1m'
UNDERLINE = '\033[4m'
END = '\033[0m'
def print_header(text: str):
"""Print a styled header."""
print(f"\n{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}")
print(f"{Colors.BOLD}{Colors.HEADER}{text.center(60)}{Colors.END}")
print(f"{Colors.BOLD}{Colors.HEADER}{'='*60}{Colors.END}\n")
def print_iteration(task: GeneratedTask):
"""Print iteration result with colors."""
status_color = Colors.GREEN if task.is_breakthrough else Colors.CYAN
print(f"\n{Colors.BOLD}Iteration {task.iteration}{Colors.END}")
print(f" {Colors.YELLOW}Expert:{Colors.END} {task.expert} ({task.expert_domain})")
print(f" {Colors.YELLOW}Task:{Colors.END} {task.task}")
novelty_bar = "█" * int(task.novelty_score * 20) + "░" * (20 - int(task.novelty_score * 20))
print(f" {Colors.YELLOW}Novelty:{Colors.END} [{novelty_bar}] {task.novelty_score:.4f}")
if task.is_breakthrough:
print(f" {Colors.GREEN}{Colors.BOLD}★ BREAKTHROUGH! ★{Colors.END}")
def print_result(result: TaskGenerationResult):
"""Print final result summary."""
print_header("RESULTS")
print(f"{Colors.BOLD}Seed Problem:{Colors.END} {result.seed_problem}")
print(f"{Colors.BOLD}Total Iterations:{Colors.END} {result.total_iterations}")
print(f"{Colors.BOLD}Terminated By:{Colors.END} {result.terminated_by}")
if result.novelty_trajectory:
print(f"\n{Colors.BOLD}Novelty Statistics:{Colors.END}")
print(f" Mean Novelty: {result.novelty_trajectory.mean_novelty:.4f}")
print(f" Max Novelty: {result.novelty_trajectory.max_novelty:.4f}")
print(f" Jump Ratio: {result.novelty_trajectory.jump_ratio:.2%}")
if result.breakthrough_task:
print(f"\n{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
print(f"{Colors.GREEN}{Colors.BOLD}BREAKTHROUGH TASK{Colors.END}")
print(f"{Colors.GREEN}{Colors.BOLD}{'='*60}{Colors.END}")
print(f"\n{Colors.BOLD}Expert:{Colors.END} {result.breakthrough_task.expert}")
print(f"{Colors.BOLD}Domain:{Colors.END} {result.breakthrough_task.expert_domain}")
print(f"{Colors.BOLD}Task:{Colors.END}")
print(f" {Colors.CYAN}{result.breakthrough_task.task}{Colors.END}")
print(f"\n{Colors.BOLD}Novelty Score:{Colors.END} {result.breakthrough_task.novelty_score:.4f}")
print(f"{Colors.BOLD}Found at Iteration:{Colors.END} {result.breakthrough_task.iteration}")
# Show trajectory summary
print(f"\n{Colors.BOLD}Exploration Trajectory:{Colors.END}")
for task in result.trajectory:
marker = "★" if task.is_breakthrough else " "
novelty_indicator = "█" * int(task.novelty_score * 10)
print(f" {marker} [{task.iteration:2d}] {task.expert:20s} | {novelty_indicator:10s} {task.novelty_score:.3f}")
def save_result(result: TaskGenerationResult, output_path: str):
"""Save result to JSON file."""
with open(output_path, "w", encoding="utf-8") as f:
json.dump(result.to_dict(), f, ensure_ascii=False, indent=2)
print(f"\n{Colors.GREEN}Results saved to: {output_path}{Colors.END}")
async def run_demo(args):
"""Run the novelty-driven task generation demo."""
print_header("NOVELTY-DRIVEN TASK GENERATION")
print(f"{Colors.BOLD}Configuration:{Colors.END}")
print(f" Seed Problem: {args.seed_problem}")
print(f" Strategy: {args.strategy}")
print(f" Novelty Threshold: {args.threshold}")
print(f" Max Iterations: {args.max_iter}")
print(f" Language: {args.language}")
print(f" LLM Model: {args.model}")
# Create appropriate agent based on strategy
common_kwargs = {
"max_iterations": args.max_iter,
"llm_model": args.model,
"embedding_model": args.embedding_model,
"language": args.language,
"temperature": args.temperature,
"on_iteration": print_iteration if not args.quiet else None
}
if args.strategy == "breakthrough":
agent = NoveltyDrivenTaskAgent(
novelty_threshold=args.threshold,
**common_kwargs
)
elif args.strategy == "exhaust":
agent = ExhaustFrontierAgent(
exhaustion_threshold=args.exhaust_threshold,
window_size=args.window_size,
min_iterations=args.min_iter,
**common_kwargs
)
elif args.strategy == "coverage":
agent = CoverageTargetAgent(
target_clusters=args.clusters,
cluster_threshold=args.cluster_threshold,
**common_kwargs
)
else:
print(f"{Colors.RED}Unknown strategy: {args.strategy}{Colors.END}")
return
print(f"\n{Colors.BOLD}Starting generation loop...{Colors.END}")
print("-" * 60)
try:
result = await agent.run(args.seed_problem)
print_result(result)
if args.output:
save_result(result, args.output)
except Exception as e:
print(f"\n{Colors.RED}Error: {e}{Colors.END}")
if args.verbose:
import traceback
traceback.print_exc()
finally:
await agent.close()
def main():
parser = argparse.ArgumentParser(
description="Novelty-Driven Task Generation Demo",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__
)
# Required argument
parser.add_argument(
"seed_problem",
help="The seed problem or challenge to explore"
)
# Strategy selection
parser.add_argument(
"--strategy", "-s",
choices=["breakthrough", "exhaust", "coverage"],
default="breakthrough",
help="Termination strategy (default: breakthrough)"
)
# Common options
parser.add_argument(
"--threshold", "-t",
type=float,
default=0.4,
help="Novelty threshold for breakthrough (default: 0.4)"
)
parser.add_argument(
"--max-iter", "-m",
type=int,
default=20,
help="Maximum iterations (default: 20)"
)
parser.add_argument(
"--language", "-l",
choices=["en", "zh"],
default="en",
help="Language for prompts and experts (default: en)"
)
# Model options
parser.add_argument(
"--model",
default="qwen3:8b",
help="LLM model for task generation (default: qwen3:8b)"
)
parser.add_argument(
"--embedding-model",
default="qwen3-embedding:4b",
help="Embedding model (default: qwen3-embedding:4b)"
)
parser.add_argument(
"--temperature",
type=float,
default=0.7,
help="LLM temperature (default: 0.7)"
)
# Exhaust strategy options
parser.add_argument(
"--exhaust-threshold",
type=float,
default=0.15,
help="Exhaustion threshold for 'exhaust' strategy (default: 0.15)"
)
parser.add_argument(
"--window-size",
type=int,
default=3,
help="Window size for exhaustion check (default: 3)"
)
parser.add_argument(
"--min-iter",
type=int,
default=5,
help="Minimum iterations before exhaustion check (default: 5)"
)
# Coverage strategy options
parser.add_argument(
"--clusters",
type=int,
default=5,
help="Target clusters for 'coverage' strategy (default: 5)"
)
parser.add_argument(
"--cluster-threshold",
type=float,
default=0.7,
help="Cluster similarity threshold (default: 0.7)"
)
# Output options
parser.add_argument(
"--output", "-o",
help="Save results to JSON file"
)
parser.add_argument(
"--quiet", "-q",
action="store_true",
help="Suppress iteration output"
)
parser.add_argument(
"--verbose", "-v",
action="store_true",
help="Enable verbose logging"
)
args = parser.parse_args()
# Configure logging
if args.verbose:
logging.basicConfig(
level=logging.DEBUG,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
else:
logging.basicConfig(level=logging.WARNING)
# Run the demo
asyncio.run(run_demo(args))
if __name__ == "__main__":
main()


@@ -0,0 +1,269 @@
"""
Novelty Metrics Module - Compute novelty scores for generated outputs.
This module provides embedding-based novelty metrics adapted from the AUT flexibility
analysis framework for use in novelty-driven agent loops.
Key Metrics:
- Centroid Distance: Measures how far a new output is from the centroid of previous outputs
- Cumulative Novelty: Tracks novelty over the generation sequence
- Jump Detection: Identifies significant semantic shifts between consecutive outputs
"""
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np
@dataclass
class NoveltyScore:
"""Result of novelty computation for a single output."""
score: float # Main novelty score (0.0 = identical to centroid, 1.0 = maximally distant)
distance_from_centroid: float
min_distance_to_existing: float # Nearest neighbor distance
is_jump: bool # Whether this represents a significant semantic jump
jump_magnitude: Optional[float] = None # Similarity to previous output (if applicable)
@dataclass
class NoveltyTrajectory:
"""Tracks novelty scores over a generation sequence."""
scores: List[float] = field(default_factory=list)
cumulative_novelty: List[float] = field(default_factory=list)
jump_positions: List[int] = field(default_factory=list)
centroid_history: List[np.ndarray] = field(default_factory=list)
@property
def mean_novelty(self) -> float:
"""Average novelty across all outputs."""
return float(np.mean(self.scores)) if self.scores else 0.0
@property
def max_novelty(self) -> float:
"""Maximum novelty achieved."""
return float(max(self.scores)) if self.scores else 0.0
@property
def jump_ratio(self) -> float:
"""Proportion of transitions that were jumps."""
if len(self.scores) < 2:
return 0.0
return len(self.jump_positions) / (len(self.scores) - 1)
@property
def final_cumulative_novelty(self) -> float:
"""Total accumulated novelty."""
return self.cumulative_novelty[-1] if self.cumulative_novelty else 0.0
class NoveltyMetrics:
"""
Computes novelty metrics for embeddings in a streaming fashion.
Designed for use in an agent loop where outputs are generated one at a time
and we need to assess novelty incrementally.
"""
def __init__(
self,
similarity_threshold: float = 0.7,
jump_detection_enabled: bool = True
):
"""
Args:
similarity_threshold: Threshold for semantic similarity (below = jump)
jump_detection_enabled: Whether to track semantic jumps
"""
self.similarity_threshold = similarity_threshold
self.jump_detection_enabled = jump_detection_enabled
# State
self.embeddings: List[np.ndarray] = []
self.trajectory = NoveltyTrajectory()
self._centroid: Optional[np.ndarray] = None
def reset(self):
"""Reset all state for a new generation session."""
self.embeddings = []
self.trajectory = NoveltyTrajectory()
self._centroid = None
@staticmethod
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
"""Compute cosine similarity between two vectors."""
norm_a = np.linalg.norm(a)
norm_b = np.linalg.norm(b)
if norm_a == 0 or norm_b == 0:
return 0.0
return float(np.dot(a, b) / (norm_a * norm_b))
@staticmethod
def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
"""Compute cosine distance (1 - similarity) between two vectors."""
return 1.0 - NoveltyMetrics.cosine_similarity(a, b)
def compute_centroid(self) -> Optional[np.ndarray]:
"""Compute centroid of all current embeddings."""
if not self.embeddings:
return None
return np.mean(self.embeddings, axis=0)
def compute_novelty(self, embedding: np.ndarray) -> NoveltyScore:
"""
Compute novelty score for a new embedding.
This does NOT add the embedding to the history - call add_embedding() for that.
Args:
embedding: The embedding vector to evaluate
Returns:
NoveltyScore with computed metrics
"""
embedding = np.array(embedding)
# The first output has nothing to compare against: treat it as the
# baseline (score 0.0) rather than maximally novel, so the
# breakthrough check cannot fire on iteration 1.
if not self.embeddings:
return NoveltyScore(
score=0.0,
distance_from_centroid=0.0,
min_distance_to_existing=1.0,
is_jump=False,
jump_magnitude=None
)
# Distance from centroid (primary novelty metric)
centroid = self.compute_centroid()
distance_from_centroid = self.cosine_distance(embedding, centroid)
# Minimum distance to any existing embedding (nearest neighbor)
min_distance = min(
self.cosine_distance(embedding, existing)
for existing in self.embeddings
)
# Jump detection (similarity to previous output)
is_jump = False
jump_magnitude = None
if self.jump_detection_enabled and self.embeddings:
similarity_to_prev = self.cosine_similarity(embedding, self.embeddings[-1])
jump_magnitude = similarity_to_prev
is_jump = similarity_to_prev < self.similarity_threshold
# Primary novelty score is distance from centroid
# Normalized to [0, 1] range where higher = more novel
novelty_score = distance_from_centroid
return NoveltyScore(
score=novelty_score,
distance_from_centroid=distance_from_centroid,
min_distance_to_existing=min_distance,
is_jump=is_jump,
jump_magnitude=jump_magnitude
)
def add_embedding(self, embedding: np.ndarray, novelty: Optional[NoveltyScore] = None):
"""
Add an embedding to the history and update trajectory.
Args:
embedding: The embedding to add
novelty: Pre-computed novelty score (computed if not provided)
"""
embedding = np.array(embedding)
if novelty is None:
novelty = self.compute_novelty(embedding)
# Update state
self.embeddings.append(embedding)
self._centroid = self.compute_centroid()
# Update trajectory
self.trajectory.scores.append(novelty.score)
# Cumulative novelty
prev_cumulative = self.trajectory.cumulative_novelty[-1] if self.trajectory.cumulative_novelty else 0.0
self.trajectory.cumulative_novelty.append(prev_cumulative + novelty.score)
# Track jumps
if novelty.is_jump:
self.trajectory.jump_positions.append(len(self.embeddings) - 1)
# Store centroid history
if self._centroid is not None:
self.trajectory.centroid_history.append(self._centroid.copy())
def get_current_state(self) -> dict:
"""Get current state as a dictionary for logging/debugging."""
return {
"num_embeddings": len(self.embeddings),
"mean_novelty": self.trajectory.mean_novelty,
"max_novelty": self.trajectory.max_novelty,
"jump_ratio": self.trajectory.jump_ratio,
"cumulative_novelty": self.trajectory.final_cumulative_novelty,
"recent_scores": self.trajectory.scores[-5:] if self.trajectory.scores else []
}
def compute_batch_novelty(
embeddings: List[np.ndarray],
reference_embeddings: Optional[List[np.ndarray]] = None
) -> List[float]:
"""
Compute novelty scores for a batch of embeddings.
Useful for post-hoc analysis of generated outputs.
Args:
embeddings: List of embeddings to evaluate
        reference_embeddings: Optional reference set (defaults to the batch itself if not provided)
Returns:
List of novelty scores (distance from centroid)
"""
if not embeddings:
return []
embeddings_arr = np.array(embeddings)
if reference_embeddings is not None:
centroid = np.mean(reference_embeddings, axis=0)
else:
centroid = np.mean(embeddings_arr, axis=0)
scores = []
for emb in embeddings_arr:
distance = NoveltyMetrics.cosine_distance(emb, centroid)
scores.append(distance)
return scores
def find_most_novel(
embeddings: List[np.ndarray],
texts: List[str],
top_k: int = 5
) -> List[tuple]:
"""
Find the most novel outputs from a batch.
Args:
embeddings: List of embeddings
texts: Corresponding text outputs
top_k: Number of top results to return
Returns:
List of (text, novelty_score, index) tuples, sorted by novelty descending
"""
scores = compute_batch_novelty(embeddings)
indexed_results = [
(texts[i], scores[i], i)
for i in range(len(texts))
]
# Sort by novelty score descending
indexed_results.sort(key=lambda x: x[1], reverse=True)
return indexed_results[:top_k]
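
A minimal usage sketch for the novelty loop above — NoveltyMetrics, NoveltyScore, and get_current_state() are as defined in this file, while the import path, constructor defaults, and the embed() helper are illustrative assumptions (any sentence-embedding model would stand in):

import numpy as np

# from experiments.novelty_loop.novelty_metrics import NoveltyMetrics  # import path assumed

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; replace with a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.normal(size=384)
    return vec / np.linalg.norm(vec)

metrics = NoveltyMetrics()  # constructor arguments assumed to have sensible defaults
for idea in ["umbrella hat", "solar-charging umbrella", "umbrella that plants seeds"]:
    vec = embed(idea)
    score = metrics.compute_novelty(vec)  # scoring does not mutate history
    metrics.add_embedding(vec, score)     # committing updates embeddings and trajectory
    print(f"{idea!r}: novelty={score.score:.3f}, jump={score.is_jump}")

print(metrics.get_current_state())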

5
experiments/results/.gitignore vendored Normal file
View File

@@ -0,0 +1,5 @@
# Ignore all experiment result files
*.json
# But keep this .gitignore
!.gitignore

17 binary image files not shown (newly added experiment figures, 54–285 KiB each).

521
experiments/visualize.py Normal file
View File

@@ -0,0 +1,521 @@
"""
Visualization for experiment results.
Generates:
- Box plots of diversity by condition
- 2×2 interaction plots
- Bar charts of survival rates
- t-SNE/UMAP of idea embeddings (optional)
Usage:
python -m experiments.visualize --input results/experiment_xxx_metrics.json
"""
import sys
import json
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional
import numpy as np
# Add experiments to path
sys.path.insert(0, str(Path(__file__).parent.parent))
from experiments.config import RESULTS_DIR
# Try to import visualization libraries
try:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
MATPLOTLIB_AVAILABLE = True
except ImportError:
MATPLOTLIB_AVAILABLE = False
print("Warning: matplotlib not installed. Visualization unavailable.")
print("Install with: pip install matplotlib")
# Condition display names and colors
CONDITION_LABELS = {
"c1_direct": "C1: Direct",
"c2_expert_only": "C2: Expert-Only",
"c3_attribute_only": "C3: Attr-Only",
"c4_full_pipeline": "C4: Full Pipeline",
"c5_random_perspective": "C5: Random"
}
CONDITION_COLORS = {
"c1_direct": "#808080", # Gray (baseline)
"c2_expert_only": "#2196F3", # Blue
"c3_attribute_only": "#FF9800", # Orange
"c4_full_pipeline": "#4CAF50", # Green (main)
"c5_random_perspective": "#9C27B0" # Purple (control)
}
# 2×2 factorial structure
FACTORIAL_2X2 = {
"no_attr_no_expert": "c1_direct",
"no_attr_with_expert": "c2_expert_only",
"with_attr_no_expert": "c3_attribute_only",
"with_attr_with_expert": "c4_full_pipeline"
}
def extract_metric_values(
metrics: Dict[str, Any],
metric_path: str
) -> Dict[str, List[float]]:
"""Extract values for a specific metric across all queries."""
by_condition = {}
for query_metrics in metrics.get("metrics_by_query", []):
for condition, cond_metrics in query_metrics.get("conditions", {}).items():
if condition not in by_condition:
by_condition[condition] = []
value = cond_metrics
for key in metric_path.split("."):
if value is None:
break
if isinstance(value, dict):
value = value.get(key)
else:
value = None
if value is not None and isinstance(value, (int, float)):
by_condition[condition].append(float(value))
return by_condition
def plot_box_comparison(
metrics: Dict[str, Any],
metric_path: str,
title: str,
ylabel: str,
output_path: Path,
figsize: tuple = (10, 6)
):
"""Create box plot comparing conditions."""
if not MATPLOTLIB_AVAILABLE:
return
by_condition = extract_metric_values(metrics, metric_path)
# Order conditions
ordered_conditions = [
"c1_direct", "c2_expert_only", "c3_attribute_only",
"c4_full_pipeline", "c5_random_perspective"
]
conditions = [c for c in ordered_conditions if c in by_condition]
if not conditions:
print(f"No data for {metric_path}")
return
fig, ax = plt.subplots(figsize=figsize)
# Prepare data
data = [by_condition[c] for c in conditions]
labels = [CONDITION_LABELS.get(c, c) for c in conditions]
colors = [CONDITION_COLORS.get(c, "#888888") for c in conditions]
# Create box plot
bp = ax.boxplot(data, labels=labels, patch_artist=True)
# Color boxes
for patch, color in zip(bp['boxes'], colors):
patch.set_facecolor(color)
patch.set_alpha(0.7)
# Add individual points
for i, (cond, values) in enumerate(zip(conditions, data)):
x = np.random.normal(i + 1, 0.04, size=len(values))
ax.scatter(x, values, alpha=0.6, color=colors[i], edgecolor='black', s=50)
ax.set_ylabel(ylabel)
ax.set_title(title)
ax.grid(axis='y', alpha=0.3)
# Rotate labels if needed
plt.xticks(rotation=15, ha='right')
plt.tight_layout()
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def plot_interaction_2x2(
metrics: Dict[str, Any],
metric_path: str,
title: str,
ylabel: str,
output_path: Path,
figsize: tuple = (8, 6)
):
"""Create 2×2 factorial interaction plot."""
if not MATPLOTLIB_AVAILABLE:
return
by_condition = extract_metric_values(metrics, metric_path)
# Check if all 2×2 conditions available
required = ["c1_direct", "c2_expert_only", "c3_attribute_only", "c4_full_pipeline"]
if not all(c in by_condition and by_condition[c] for c in required):
print(f"Insufficient data for 2×2 plot of {metric_path}")
return
fig, ax = plt.subplots(figsize=figsize)
# Calculate means
means = {c: np.mean(by_condition[c]) for c in required}
stds = {c: np.std(by_condition[c], ddof=1) if len(by_condition[c]) > 1 else 0 for c in required}
# X positions: No Experts, With Experts
x = [0, 1]
x_labels = ["Without Experts", "With Experts"]
# Line 1: Without Attributes (C1 -> C2)
y_no_attr = [means["c1_direct"], means["c2_expert_only"]]
err_no_attr = [stds["c1_direct"], stds["c2_expert_only"]]
ax.errorbar(x, y_no_attr, yerr=err_no_attr, marker='o', markersize=10,
linewidth=2, capsize=5, label="Without Attributes",
color="#FF9800", linestyle='--')
# Line 2: With Attributes (C3 -> C4)
y_with_attr = [means["c3_attribute_only"], means["c4_full_pipeline"]]
err_with_attr = [stds["c3_attribute_only"], stds["c4_full_pipeline"]]
ax.errorbar(x, y_with_attr, yerr=err_with_attr, marker='s', markersize=10,
linewidth=2, capsize=5, label="With Attributes",
color="#4CAF50", linestyle='-')
# Annotate points
ax.annotate("C1", (x[0], y_no_attr[0]), textcoords="offset points",
xytext=(-15, -15), fontsize=9)
ax.annotate("C2", (x[1], y_no_attr[1]), textcoords="offset points",
xytext=(5, -15), fontsize=9)
ax.annotate("C3", (x[0], y_with_attr[0]), textcoords="offset points",
xytext=(-15, 10), fontsize=9)
ax.annotate("C4", (x[1], y_with_attr[1]), textcoords="offset points",
xytext=(5, 10), fontsize=9)
ax.set_xticks(x)
ax.set_xticklabels(x_labels)
ax.set_ylabel(ylabel)
ax.set_title(title)
ax.legend(loc='best')
ax.grid(axis='y', alpha=0.3)
# Check for interaction (non-parallel lines)
slope_no_attr = y_no_attr[1] - y_no_attr[0]
slope_with_attr = y_with_attr[1] - y_with_attr[0]
interaction = slope_with_attr - slope_no_attr
interaction_text = f"Interaction: {interaction:+.4f}"
if interaction > 0.01:
interaction_text += " (super-additive)"
elif interaction < -0.01:
interaction_text += " (sub-additive)"
else:
interaction_text += " (additive)"
ax.text(0.02, 0.98, interaction_text, transform=ax.transAxes,
fontsize=10, verticalalignment='top',
bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
plt.tight_layout()
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def plot_survival_rates(
metrics: Dict[str, Any],
output_path: Path,
figsize: tuple = (10, 6)
):
"""Create bar chart of deduplication survival rates."""
if not MATPLOTLIB_AVAILABLE:
return
by_condition = extract_metric_values(metrics, "survival_rate")
ordered_conditions = [
"c1_direct", "c2_expert_only", "c3_attribute_only",
"c4_full_pipeline", "c5_random_perspective"
]
conditions = [c for c in ordered_conditions if c in by_condition]
if not conditions:
print("No survival rate data")
return
fig, ax = plt.subplots(figsize=figsize)
# Calculate means and stds
means = [np.mean(by_condition[c]) * 100 for c in conditions] # Convert to percentage
stds = [np.std(by_condition[c], ddof=1) * 100 if len(by_condition[c]) > 1 else 0 for c in conditions]
labels = [CONDITION_LABELS.get(c, c) for c in conditions]
colors = [CONDITION_COLORS.get(c, "#888888") for c in conditions]
x = np.arange(len(conditions))
bars = ax.bar(x, means, yerr=stds, capsize=5, color=colors, alpha=0.8, edgecolor='black')
# Add value labels on bars
for bar, mean in zip(bars, means):
height = bar.get_height()
ax.annotate(f'{mean:.1f}%',
xy=(bar.get_x() + bar.get_width() / 2, height),
xytext=(0, 3), textcoords="offset points",
ha='center', va='bottom', fontsize=10)
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=15, ha='right')
ax.set_ylabel("Survival Rate (%)")
ax.set_title("Deduplication Survival Rate by Condition\n(Higher = More Diverse Generation)")
ax.set_ylim(0, 110)
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def plot_idea_counts(
metrics: Dict[str, Any],
output_path: Path,
figsize: tuple = (10, 6)
):
"""Create stacked bar chart of raw vs unique idea counts."""
if not MATPLOTLIB_AVAILABLE:
return
raw_counts = extract_metric_values(metrics, "raw_count")
unique_counts = extract_metric_values(metrics, "unique_count")
ordered_conditions = [
"c1_direct", "c2_expert_only", "c3_attribute_only",
"c4_full_pipeline", "c5_random_perspective"
]
conditions = [c for c in ordered_conditions if c in raw_counts and c in unique_counts]
if not conditions:
print("No count data")
return
fig, ax = plt.subplots(figsize=figsize)
# Calculate means
raw_means = [np.mean(raw_counts[c]) for c in conditions]
unique_means = [np.mean(unique_counts[c]) for c in conditions]
removed_means = [r - u for r, u in zip(raw_means, unique_means)]
labels = [CONDITION_LABELS.get(c, c) for c in conditions]
x = np.arange(len(conditions))
width = 0.6
# Stacked bars: unique (bottom) + removed (top)
bars1 = ax.bar(x, unique_means, width, label='Unique Ideas',
color=[CONDITION_COLORS.get(c, "#888888") for c in conditions], alpha=0.9)
bars2 = ax.bar(x, removed_means, width, bottom=unique_means, label='Duplicates Removed',
color='lightgray', alpha=0.7, hatch='//')
# Add value labels
for i, (unique, raw) in enumerate(zip(unique_means, raw_means)):
ax.annotate(f'{unique:.0f}', xy=(x[i], unique / 2),
ha='center', va='center', fontsize=10, fontweight='bold')
ax.annotate(f'({raw:.0f})', xy=(x[i], raw + 1),
ha='center', va='bottom', fontsize=9, color='gray')
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=15, ha='right')
ax.set_ylabel("Number of Ideas")
ax.set_title("Idea Counts by Condition\n(Unique ideas shown, raw total in parentheses)")
ax.legend(loc='upper right')
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def plot_metrics_comparison(
metrics: Dict[str, Any],
output_path: Path,
figsize: tuple = (12, 8)
):
"""Create multi-panel comparison of key metrics."""
if not MATPLOTLIB_AVAILABLE:
return
fig, axes = plt.subplots(2, 2, figsize=figsize)
# Extract metrics
metrics_to_plot = [
("survival_rate", "Survival Rate", axes[0, 0], True),
("post_dedup_diversity.mean_pairwise_distance", "Semantic Diversity", axes[0, 1], False),
("post_dedup_query_distance.mean_distance", "Query Distance (Novelty)", axes[1, 0], False),
("post_dedup_clusters.optimal_clusters", "Number of Clusters", axes[1, 1], False),
]
ordered_conditions = [
"c1_direct", "c2_expert_only", "c3_attribute_only",
"c4_full_pipeline", "c5_random_perspective"
]
for metric_path, title, ax, is_percentage in metrics_to_plot:
by_condition = extract_metric_values(metrics, metric_path)
conditions = [c for c in ordered_conditions if c in by_condition and by_condition[c]]
if not conditions:
ax.text(0.5, 0.5, "No data", ha='center', va='center', transform=ax.transAxes)
ax.set_title(title)
continue
means = [np.mean(by_condition[c]) for c in conditions]
if is_percentage:
means = [m * 100 for m in means]
colors = [CONDITION_COLORS.get(c, "#888888") for c in conditions]
x = np.arange(len(conditions))
bars = ax.bar(x, means, color=colors, alpha=0.8, edgecolor='black')
        # Simplified labels derived from the conditions actually present (avoids mislabeling when some are missing)
        short_labels = [CONDITION_LABELS.get(c, c).split(":")[0] for c in conditions]
ax.set_xticks(x)
ax.set_xticklabels(short_labels)
ax.set_title(title)
ax.grid(axis='y', alpha=0.3)
if is_percentage:
ax.set_ylim(0, 110)
# Add legend
legend_elements = [
mpatches.Patch(facecolor=CONDITION_COLORS[c], label=CONDITION_LABELS[c])
for c in ordered_conditions if c in CONDITION_COLORS
]
fig.legend(handles=legend_elements, loc='lower center', ncol=3, bbox_to_anchor=(0.5, -0.02))
plt.tight_layout()
plt.subplots_adjust(bottom=0.15)
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {output_path}")
def generate_all_visualizations(
metrics: Dict[str, Any],
output_dir: Path
):
"""Generate all visualization figures."""
if not MATPLOTLIB_AVAILABLE:
print("matplotlib not available. Cannot generate visualizations.")
return
output_dir.mkdir(parents=True, exist_ok=True)
experiment_id = metrics.get("experiment_id", "experiment")
print(f"\nGenerating visualizations for {experiment_id}...")
# 1. Survival rates bar chart
plot_survival_rates(
metrics,
output_dir / f"{experiment_id}_survival_rates.png"
)
# 2. Idea counts stacked bar
plot_idea_counts(
metrics,
output_dir / f"{experiment_id}_idea_counts.png"
)
# 3. Diversity box plot
plot_box_comparison(
metrics,
"post_dedup_diversity.mean_pairwise_distance",
"Semantic Diversity by Condition (Post-Dedup)",
"Mean Pairwise Distance",
output_dir / f"{experiment_id}_diversity_boxplot.png"
)
# 4. Query distance box plot
plot_box_comparison(
metrics,
"post_dedup_query_distance.mean_distance",
"Query Distance by Condition (Novelty)",
"Distance from Original Query",
output_dir / f"{experiment_id}_query_distance_boxplot.png"
)
# 5. 2×2 interaction plot for diversity
plot_interaction_2x2(
metrics,
"post_dedup_diversity.mean_pairwise_distance",
"2×2 Factorial: Semantic Diversity",
"Mean Pairwise Distance",
output_dir / f"{experiment_id}_interaction_diversity.png"
)
# 6. 2×2 interaction plot for query distance
plot_interaction_2x2(
metrics,
"post_dedup_query_distance.mean_distance",
"2×2 Factorial: Query Distance (Novelty)",
"Distance from Original Query",
output_dir / f"{experiment_id}_interaction_novelty.png"
)
# 7. Multi-panel comparison
plot_metrics_comparison(
metrics,
output_dir / f"{experiment_id}_metrics_comparison.png"
)
print(f"\nAll visualizations saved to: {output_dir}")
def main():
parser = argparse.ArgumentParser(
description="Generate visualizations for experiment results"
)
parser.add_argument(
"--input",
type=str,
required=True,
help="Input metrics JSON file"
)
parser.add_argument(
"--output-dir",
type=str,
help="Output directory for figures (default: results/figures/)"
)
args = parser.parse_args()
input_path = Path(args.input)
if not input_path.exists():
input_path = RESULTS_DIR / args.input
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
sys.exit(1)
# Load metrics
with open(input_path, "r", encoding="utf-8") as f:
metrics = json.load(f)
# Output directory
if args.output_dir:
output_dir = Path(args.output_dir)
else:
output_dir = RESULTS_DIR / "figures"
generate_all_visualizations(metrics, output_dir)
if __name__ == "__main__":
main()
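
For reference, a synthetic metrics dict in the shape extract_metric_values() walks (metrics_by_query → conditions → dotted metric paths); the numbers are placeholders, not real results, and the import path is assumed. With all four 2×2 conditions present, generate_all_visualizations() emits every figure, including the interaction plots:

from pathlib import Path

# from experiments.visualize import generate_all_visualizations  # import path assumed

def fake_condition(div, dist, k, unique, raw=30):
    """Synthetic per-condition metrics matching the dotted paths used above."""
    return {
        "raw_count": raw,
        "unique_count": unique,
        "survival_rate": unique / raw,
        "post_dedup_diversity": {"mean_pairwise_distance": div},
        "post_dedup_query_distance": {"mean_distance": dist},
        "post_dedup_clusters": {"optimal_clusters": k},
    }

metrics = {
    "experiment_id": "pilot_demo",
    "metrics_by_query": [{
        "conditions": {
            "c1_direct": fake_condition(0.41, 0.33, 4, 18),
            "c2_expert_only": fake_condition(0.48, 0.40, 5, 22),
            "c3_attribute_only": fake_condition(0.50, 0.42, 6, 23),
            "c4_full_pipeline": fake_condition(0.57, 0.49, 8, 27),
        }
    }],
}

generate_all_visualizations(metrics, Path("results/figures"))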

frontend/package-lock.json
View File

@@ -155,7 +155,6 @@
      "integrity": "sha512-e7jT4DxYvIDLk1ZHmU/m/mB19rex9sv0c2ftBtjSBv+kVM/902eh0fINUzD7UwLLNR+jU585GxUJ8/EBfAM5fw==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "@babel/code-frame": "^7.27.1",
        "@babel/generator": "^7.28.5",
@@ -2446,7 +2445,6 @@
      "integrity": "sha512-GNWcUTRBgIRJD5zj+Tq0fKOJ5XZajIiBroOF0yvj2bSU1WvNdYS/dn9UxwsujGW4JX06dnHyjV2y9rRaybH0iQ==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "undici-types": "~7.16.0"
      }
@@ -2457,7 +2455,6 @@
      "integrity": "sha512-MWtvHrGZLFttgeEj28VXHxpmwYbor/ATPYbBfSFZEIRK0ecCFLl2Qo55z52Hss+UV9CRN7trSeq1zbgx7YDWWg==",
      "devOptional": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "csstype": "^3.2.2"
      }
@@ -2518,7 +2515,6 @@
      "integrity": "sha512-jCzKdm/QK0Kg4V4IK/oMlRZlY+QOcdjv89U2NgKHZk1CYTj82/RVSx1mV/0gqCVMJ/DA+Zf/S4NBWNF8GQ+eqQ==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "@typescript-eslint/scope-manager": "8.48.0",
        "@typescript-eslint/types": "8.48.0",
@@ -2802,7 +2798,6 @@
      "integrity": "sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "bin": {
        "acorn": "bin/acorn"
      },
@@ -2971,7 +2966,6 @@
        }
      ],
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "baseline-browser-mapping": "^2.8.25",
        "caniuse-lite": "^1.0.30001754",
@@ -3442,7 +3436,6 @@
      "resolved": "https://registry.npmjs.org/d3-selection/-/d3-selection-3.0.0.tgz",
      "integrity": "sha512-fmTRWbNMmsmWq6xJV8D19U/gw/bwrHfNXxrIN+HfZgnzqTHp9jOmKMhsTUjXOJnZOdZY9Q28y4yebKzqDKlxlQ==",
      "license": "ISC",
-     "peer": true,
      "engines": {
        "node": ">=12"
      }
@@ -3531,8 +3524,7 @@
      "version": "1.11.19",
      "resolved": "https://registry.npmjs.org/dayjs/-/dayjs-1.11.19.tgz",
      "integrity": "sha512-t5EcLVS6QPBNqM2z8fakk/NKel+Xzshgt8FFKAn+qwlD1pzZWxh0nVCrvFK7ZDb6XucZeF9z8C7CBWTRIVApAw==",
-     "license": "MIT",
-     "peer": true
+     "license": "MIT"
    },
    "node_modules/debug": {
      "version": "4.4.3",
@@ -3646,7 +3638,6 @@
      "integrity": "sha512-BhHmn2yNOFA9H9JmmIVKJmd288g9hrVRDkdoIgRCRuSySRUHH7r/DI6aAXW9T1WwUuY3DFgrcaqB+deURBLR5g==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "@eslint-community/eslint-utils": "^4.8.0",
        "@eslint-community/regexpp": "^4.12.1",
@@ -4376,7 +4367,6 @@
      "integrity": "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "engines": {
        "node": ">=12"
      },
@@ -4503,7 +4493,6 @@
      "resolved": "https://registry.npmjs.org/react/-/react-19.2.0.tgz",
      "integrity": "sha512-tmbWg6W31tQLeB5cdIBOicJDJRR2KzXsV7uSK9iNfLWQ5bIZfxuPEHp7M8wiHyHnn0DD1i7w3Zmin0FtkrwoCQ==",
      "license": "MIT",
-     "peer": true,
      "engines": {
        "node": ">=0.10.0"
      }
@@ -4513,7 +4502,6 @@
      "resolved": "https://registry.npmjs.org/react-dom/-/react-dom-19.2.0.tgz",
      "integrity": "sha512-UlbRu4cAiGaIewkPyiRGJk0imDN2T3JjieT6spoL2UeSf5od4n5LB/mQ4ejmxhCFT1tYe8IvaFulzynWovsEFQ==",
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "scheduler": "^0.27.0"
      },
@@ -4767,7 +4755,6 @@
      "integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==",
      "dev": true,
      "license": "Apache-2.0",
-     "peer": true,
      "bin": {
        "tsc": "bin/tsc",
        "tsserver": "bin/tsserver"
@@ -4863,7 +4850,6 @@
      "integrity": "sha512-tI2l/nFHC5rLh7+5+o7QjKjSR04ivXDF4jcgV0f/bTQ+OJiITy5S6gaynVsEM+7RqzufMnVbIon6Sr5x1SDYaQ==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "dependencies": {
        "esbuild": "^0.25.0",
        "fdir": "^6.5.0",
@@ -4985,7 +4971,6 @@
      "integrity": "sha512-AvvthqfqrAhNH9dnfmrfKzX5upOdjUVJYFqNSlkmGf64gRaTzlPwz99IHYnVs28qYAybvAlBV+H7pn0saFY4Ig==",
      "dev": true,
      "license": "MIT",
-     "peer": true,
      "funding": {
        "url": "https://github.com/sponsors/colinhacks"
      }

frontend/src/App.tsx
View File

@@ -1,17 +1,24 @@
-import { useState, useRef, useCallback, useEffect } from 'react';
+import { useState, useRef, useCallback, useEffect, useMemo } from 'react';
-import { ConfigProvider, Layout, theme, Typography, Space, Tabs, Slider, Radio } from 'antd';
+import { ConfigProvider, Layout, theme, Typography, Space, Tabs, Slider, Radio, Switch, Segmented } from 'antd';
-import { ApartmentOutlined, ThunderboltOutlined, FilterOutlined } from '@ant-design/icons';
+import { ApartmentOutlined, ThunderboltOutlined, FilterOutlined, SwapOutlined, FileSearchOutlined, GlobalOutlined } from '@ant-design/icons';
import { ThemeToggle } from './components/ThemeToggle';
import { InputPanel } from './components/InputPanel';
import { TransformationInputPanel } from './components/TransformationInputPanel';
import { MindmapPanel } from './components/MindmapPanel';
import { TransformationPanel } from './components/TransformationPanel';
import { DeduplicationPanel } from './components/DeduplicationPanel';
import { PatentSearchPanel } from './components/PatentSearchPanel';
import { DualPathInputPanel } from './components/DualPathInputPanel';
import { DualPathMindmapPanel } from './components/DualPathMindmapPanel';
import { CrossoverPanel } from './components/CrossoverPanel';
import { useAttribute } from './hooks/useAttribute';
import { useDualPathAttribute } from './hooks/useDualPathAttribute';
import { getModels } from './services/api';
import { crossoverPairsToDAGs, type CrossoverDAGResult } from './utils/crossoverToDAG';
import { DualTransformationPanel } from './components/DualTransformationPanel';
import type { MindmapDAGRef } from './components/MindmapDAG';
import type { TransformationDAGRef } from './components/TransformationDAG';
-import type { CategoryMode, ExpertSource, ExpertTransformationDAGResult, DeduplicationMethod } from './types';
+import type { CategoryMode, ExpertSource, ExpertTransformationDAGResult, DeduplicationMethod, ExpertMode, CrossoverPair, PromptLanguage } from './types';
const { Header, Sider, Content } = Layout;
const { Title } = Typography;
@@ -24,7 +31,15 @@ interface VisualSettings {
function App() {
  const [isDark, setIsDark] = useState(true);
  const [activeTab, setActiveTab] = useState<string>('attribute');
const [dualPathMode, setDualPathMode] = useState(false);
const [promptLanguage, setPromptLanguage] = useState<PromptLanguage>('zh');
// Single path hook
  const { loading, progress, error, currentResult, history, analyze, loadFromHistory } = useAttribute();
// Dual path hook
const dualPath = useDualPathAttribute();
  const [visualSettings, setVisualSettings] = useState<VisualSettings>({
    nodeSpacing: 32,
    fontSize: 14,
@@ -32,6 +47,21 @@ function App() {
  const mindmapRef = useRef<MindmapDAGRef>(null);
  const transformationRef = useRef<TransformationDAGRef>(null);
// Dual path expert mode
const [expertMode, setExpertMode] = useState<ExpertMode>('shared');
const [selectedCrossoverPairs, setSelectedCrossoverPairs] = useState<CrossoverPair[]>([]);
// Convert selected crossover pairs to two separate DAGs for dual transformation
const crossoverDAGs = useMemo((): CrossoverDAGResult | null => {
if (selectedCrossoverPairs.length === 0) return null;
if (!dualPath.pathA.result || !dualPath.pathB.result) return null;
return crossoverPairsToDAGs(
selectedCrossoverPairs,
dualPath.pathA.result,
dualPath.pathB.result
);
}, [selectedCrossoverPairs, dualPath.pathA.result, dualPath.pathB.result]);
  // Transformation Agent settings
  const [transformModel, setTransformModel] = useState<string>('');
  const [transformTemperature, setTransformTemperature] = useState<number>(0.95);
@@ -83,9 +113,10 @@ function App() {
    chainCount?: number,
    categoryMode?: CategoryMode,
    customCategories?: string[],
-   suggestedCategoryCount?: number
+   suggestedCategoryCount?: number,
+   lang?: PromptLanguage
  ) => {
-   await analyze(query, model, temperature, chainCount, categoryMode, customCategories, suggestedCategoryCount);
+   await analyze(query, model, temperature, chainCount, categoryMode, customCategories, suggestedCategoryCount, lang || promptLanguage);
  };
const handleResetView = useCallback(() => { const handleResetView = useCallback(() => {
@@ -96,6 +127,30 @@ function App() {
    setShouldStartTransform(true);
  }, []);
// Dual path analysis handler
const handleDualPathAnalyze = useCallback(async (
queryA: string,
queryB: string,
options?: {
model?: string;
temperature?: number;
chainCount?: number;
categoryMode?: CategoryMode;
customCategories?: string[];
suggestedCategoryCount?: number;
lang?: PromptLanguage;
}
) => {
await dualPath.analyzeParallel(queryA, queryB, { ...options, lang: options?.lang || promptLanguage });
}, [dualPath, promptLanguage]);
// Handle mode switch
const handleModeSwitch = useCallback((checked: boolean) => {
setDualPathMode(checked);
// Reset to attribute tab when switching modes
setActiveTab('attribute');
}, []);
  return (
    <ConfigProvider
      theme={{
@@ -140,7 +195,31 @@ function App() {
            Novelty Seeking
          </Title>
        </Space>
-       <ThemeToggle isDark={isDark} onToggle={setIsDark} />
+       <Space align="center" size="middle">
<Space size="small">
<Typography.Text type="secondary">Single</Typography.Text>
<Switch
checked={dualPathMode}
onChange={handleModeSwitch}
checkedChildren={<SwapOutlined />}
unCheckedChildren={<ApartmentOutlined />}
/>
<Typography.Text type="secondary">Dual</Typography.Text>
</Space>
<Space size="small">
<GlobalOutlined style={{ color: isDark ? '#177ddc' : '#1890ff' }} />
<Segmented
size="small"
value={promptLanguage}
onChange={(value) => setPromptLanguage(value as PromptLanguage)}
options={[
{ label: '中文', value: 'zh' },
{ label: 'EN', value: 'en' },
]}
/>
</Space>
<ThemeToggle isDark={isDark} onToggle={setIsDark} />
</Space>
      </Header>
      <Layout>
        <Content
@@ -155,7 +234,98 @@ function App() {
            onChange={setActiveTab}
            style={{ height: '100%' }}
            tabBarStyle={{ marginBottom: 8 }}
-           items={[
+           items={dualPathMode ? [
// ===== Dual Path Mode Tabs =====
{
key: 'attribute',
label: (
<span>
<SwapOutlined style={{ marginRight: 8 }} />
Dual Path Attribute
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)' }}>
<DualPathMindmapPanel
pathA={dualPath.pathA}
pathB={dualPath.pathB}
isDark={isDark}
visualSettings={visualSettings}
/>
</div>
),
},
{
key: 'crossover',
label: (
<span>
<SwapOutlined style={{ marginRight: 8 }} />
Crossover
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)', padding: 16 }}>
<CrossoverPanel
pathAResult={dualPath.pathA.result}
pathBResult={dualPath.pathB.result}
isDark={isDark}
expertMode={expertMode}
onExpertModeChange={setExpertMode}
onCrossoverReady={setSelectedCrossoverPairs}
/>
</div>
),
},
{
key: 'transformation',
label: (
<span>
<ThunderboltOutlined style={{ marginRight: 8 }} />
Transformation Agent
{crossoverDAGs && (
<span style={{ marginLeft: 4, fontSize: 10, opacity: 0.7 }}>
(A:{crossoverDAGs.pathA.nodes.length} / B:{crossoverDAGs.pathB.nodes.length})
</span>
)}
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)' }}>
<DualTransformationPanel
crossoverDAGA={crossoverDAGs?.pathA ?? null}
crossoverDAGB={crossoverDAGs?.pathB ?? null}
isDark={isDark}
model={transformModel}
temperature={transformTemperature}
expertConfig={expertConfig}
expertSource={expertSource}
expertLanguage={expertLanguage}
lang={promptLanguage}
shouldStartTransform={shouldStartTransform}
onTransformComplete={() => setShouldStartTransform(false)}
onLoadingChange={setTransformLoading}
/>
</div>
),
},
{
key: 'patent',
label: (
<span>
<FileSearchOutlined style={{ marginRight: 8 }} />
Patent Search
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)' }}>
<PatentSearchPanel
isDark={isDark}
/>
</div>
),
},
] : [
// ===== Single Path Mode Tabs =====
              {
                key: 'attribute',
                label: (
@@ -196,6 +366,7 @@ function App() {
                      expertConfig={expertConfig}
                      expertSource={expertSource}
                      expertLanguage={expertLanguage}
lang={promptLanguage}
                      shouldStartTransform={shouldStartTransform}
                      onTransformComplete={() => setShouldStartTransform(false)}
                      onLoadingChange={setTransformLoading}
@@ -221,6 +392,24 @@ function App() {
                      onThresholdChange={setDeduplicationThreshold}
                      method={deduplicationMethod}
                      onMethodChange={setDeduplicationMethod}
lang={promptLanguage}
/>
</div>
),
},
{
key: 'patent',
label: (
<span>
<FileSearchOutlined style={{ marginRight: 8 }} />
Patent Search
</span>
),
children: (
<div style={{ height: 'calc(100vh - 140px)' }}>
<PatentSearchPanel
descriptions={transformationResult?.results.flatMap(r => r.descriptions)}
isDark={isDark}
                    />
                  </div>
                ),
@@ -236,24 +425,54 @@ function App() {
              overflow: 'auto',
            }}
          >
-           {activeTab === 'attribute' && (
+           {activeTab === 'attribute' && !dualPathMode && (
              <InputPanel
                loading={loading}
                progress={progress}
                history={history}
                currentResult={currentResult}
                onAnalyze={handleAnalyze}
-               onLoadHistory={loadFromHistory}
+               onLoadHistory={(item, lang) => loadFromHistory(item, lang || promptLanguage)}
                onResetView={handleResetView}
                visualSettings={visualSettings}
                onVisualSettingsChange={setVisualSettings}
+               lang={promptLanguage}
              />
            )}
{activeTab === 'attribute' && dualPathMode && (
<DualPathInputPanel
onAnalyze={handleDualPathAnalyze}
loadingA={dualPath.pathA.loading}
loadingB={dualPath.pathB.loading}
progressA={dualPath.pathA.progress}
progressB={dualPath.pathB.progress}
availableModels={availableModels}
lang={promptLanguage}
/>
)}
{activeTab === 'crossover' && dualPathMode && (
<div style={{ padding: 16 }}>
<Typography.Title level={5} style={{ marginBottom: 16 }}>
<SwapOutlined style={{ marginRight: 8 }} />
Crossover Settings
</Typography.Title>
<Typography.Text type="secondary">
Select attribute pairs in the main panel to create crossover combinations.
{selectedCrossoverPairs.length > 0 && (
<div style={{ marginTop: 8 }}>
<Typography.Text strong>
{selectedCrossoverPairs.length} pairs selected
</Typography.Text>
</div>
)}
</Typography.Text>
</div>
)}
            {activeTab === 'transformation' && (
              <TransformationInputPanel
                onTransform={handleTransform}
                loading={transformLoading}
-               hasData={!!currentResult}
+               hasData={dualPathMode ? !!crossoverDAGs : !!currentResult}
                isDark={isDark}
                model={transformModel}
                temperature={transformTemperature}
@@ -270,6 +489,37 @@ function App() {
                availableModels={availableModels}
              />
            )}
{activeTab === 'patent' && (
<div style={{ padding: 16 }}>
<Typography.Title level={5} style={{ marginBottom: 16 }}>
<FileSearchOutlined style={{ marginRight: 8 }} />
Patent Search Info
</Typography.Title>
<Typography.Paragraph type="secondary" style={{ fontSize: 12 }}>
Search patents using the Lens.org API to find prior art and similar inventions.
</Typography.Paragraph>
<Typography.Title level={5} style={{ marginTop: 24, marginBottom: 12 }}>
How to Use
</Typography.Title>
<Typography.Paragraph style={{ fontSize: 12 }}>
<ol style={{ paddingLeft: 16, margin: 0 }}>
<li style={{ marginBottom: 8 }}>Click a generated description on the left to load it into the search box</li>
<li style={{ marginBottom: 8 }}>Edit the description to refine your search query</li>
<li style={{ marginBottom: 8 }}>Click "Search Patents" to find similar patents</li>
<li style={{ marginBottom: 8 }}>Results appear on the right - click to view on Lens.org</li>
</ol>
</Typography.Paragraph>
<Typography.Title level={5} style={{ marginTop: 24, marginBottom: 12 }}>
Result Interpretation
</Typography.Title>
<Typography.Paragraph type="secondary" style={{ fontSize: 12 }}>
<strong>Many results:</strong> Query may overlap with existing prior art - consider making it more specific.
</Typography.Paragraph>
<Typography.Paragraph type="secondary" style={{ fontSize: 12 }}>
<strong>Few/no results:</strong> Potentially novel concept - good candidate for further exploration.
</Typography.Paragraph>
</div>
)}
            {activeTab === 'deduplication' && (
              <div style={{ padding: 16 }}>
                <Typography.Title level={5} style={{ marginBottom: 16 }}>

298
frontend/src/components/CrossoverPanel.tsx Normal file
View File

@@ -0,0 +1,298 @@
import { useEffect, useState } from 'react';
import {
Empty,
Card,
Button,
Statistic,
Row,
Col,
Typography,
Space,
Badge,
Collapse,
Checkbox,
Radio,
} from 'antd';
import {
SwapOutlined,
CheckCircleOutlined,
ReloadOutlined,
UnorderedListOutlined,
TableOutlined,
} from '@ant-design/icons';
import type { AttributeDAG, CrossoverPair, ExpertMode } from '../types';
import { useAttributeCrossover } from '../hooks/useAttributeCrossover';
import { CrossoverCard } from './crossover/CrossoverCard';
import { CrossoverMatrix } from './crossover/CrossoverMatrix';
import { CrossoverPreview } from './crossover/CrossoverPreview';
const { Text } = Typography;
interface CrossoverPanelProps {
pathAResult: AttributeDAG | null;
pathBResult: AttributeDAG | null;
isDark: boolean;
expertMode: ExpertMode;
onExpertModeChange: (mode: ExpertMode) => void;
onCrossoverReady?: (selectedPairs: CrossoverPair[]) => void;
}
type ViewMode = 'list' | 'matrix';
export function CrossoverPanel({
pathAResult,
pathBResult,
isDark,
expertMode,
onExpertModeChange,
onCrossoverReady,
}: CrossoverPanelProps) {
const [viewMode, setViewMode] = useState<ViewMode>('list');
const {
pairs,
selectedPairs,
pairsByType,
crossTypeStats,
applyPairs,
togglePairSelection,
selectPairsByType,
selectAll,
clearPairs,
} = useAttributeCrossover();
// Generate pairs when both results are available
useEffect(() => {
if (pathAResult && pathBResult) {
applyPairs(pathAResult, pathBResult);
} else {
clearPairs();
}
}, [pathAResult, pathBResult, applyPairs, clearPairs]);
// Notify parent when selection changes
useEffect(() => {
onCrossoverReady?.(selectedPairs);
}, [selectedPairs, onCrossoverReady]);
// Render when no data
if (!pathAResult || !pathBResult) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
height: '100%',
}}>
<Empty
description={
<Space direction="vertical" align="center">
<Text>Complete both Path A and Path B analysis first</Text>
<Text type="secondary">
{!pathAResult && !pathBResult
? 'Neither path has been analyzed'
: !pathAResult
? 'Path A has not been analyzed'
: 'Path B has not been analyzed'}
</Text>
</Space>
}
/>
</div>
);
}
// Generate cross type labels dynamically
const getCrossTypeLabel = (crossType: string): string => {
if (crossType.startsWith('same-')) {
const category = crossType.replace('same-', '');
return `Same Category: ${category}`;
}
if (crossType.startsWith('cross-')) {
const parts = crossType.replace('cross-', '').split('-');
if (parts.length >= 2) {
return `Cross: ${parts[0]} × ${parts.slice(1).join('-')}`;
}
}
return crossType;
};
const renderListView = () => {
const crossTypes = Object.keys(pairsByType);
if (crossTypes.length === 0) {
return <Empty description="No crossover pairs generated" />;
}
const collapseItems = crossTypes.map(type => {
const typePairs = pairsByType[type];
const stats = crossTypeStats[type];
const label = getCrossTypeLabel(type);
return {
key: type,
label: (
<div style={{ display: 'flex', alignItems: 'center', gap: 8 }}>
<Checkbox
checked={stats.selected === stats.total}
indeterminate={stats.selected > 0 && stats.selected < stats.total}
onClick={(e) => e.stopPropagation()}
onChange={(e) => selectPairsByType(type, e.target.checked)}
/>
<Text>{label}</Text>
<Badge
count={`${stats.selected}/${stats.total}`}
style={{
backgroundColor: stats.selected > 0 ? '#52c41a' : '#d9d9d9',
}}
/>
</div>
),
children: (
<div style={{
display: 'grid',
gridTemplateColumns: 'repeat(auto-fill, minmax(280px, 1fr))',
gap: 8,
}}>
{typePairs.map(pair => (
<CrossoverCard
key={pair.id}
pair={pair}
onToggle={togglePairSelection}
isDark={isDark}
/>
))}
</div>
),
};
});
return (
<Collapse
items={collapseItems}
defaultActiveKey={crossTypes.filter(t => t.startsWith('same-'))}
/>
);
};
const renderMatrixView = () => {
return (
<CrossoverMatrix
dagA={pathAResult}
dagB={pathBResult}
pairs={pairs}
onTogglePair={togglePairSelection}
isDark={isDark}
/>
);
};
return (
<div style={{ height: '100%', display: 'flex', flexDirection: 'column' }}>
{/* Statistics Header */}
<Card size="small" style={{ marginBottom: 16 }}>
<Row gutter={16}>
<Col span={6}>
<Statistic
title="Total Pairs"
value={pairs.length}
prefix={<SwapOutlined />}
/>
</Col>
<Col span={6}>
<Statistic
title="Selected"
value={selectedPairs.length}
prefix={<CheckCircleOutlined />}
valueStyle={{ color: '#52c41a' }}
/>
</Col>
<Col span={6}>
<Statistic
title="Path A Attrs"
value={pathAResult.nodes.length}
/>
</Col>
<Col span={6}>
<Statistic
title="Path B Attrs"
value={pathBResult.nodes.length}
/>
</Col>
</Row>
</Card>
{/* Selection Preview */}
<CrossoverPreview
selectedPairs={selectedPairs}
dagA={pathAResult}
dagB={pathBResult}
isDark={isDark}
/>
{/* Expert Mode Selection */}
<Card size="small" style={{ marginBottom: 16 }}>
<Space direction="vertical" style={{ width: '100%' }}>
<Text strong>Expert Team Mode</Text>
<Radio.Group
value={expertMode}
onChange={(e) => onExpertModeChange(e.target.value)}
buttonStyle="solid"
>
<Radio.Button value="shared">
Shared Experts
</Radio.Button>
<Radio.Button value="independent">
Independent Experts
</Radio.Button>
</Radio.Group>
<Text type="secondary" style={{ fontSize: 12 }}>
{expertMode === 'shared'
? 'Both paths use the same expert team for crossover transformation'
: 'Each path uses its own expert team, combined for crossover'}
</Text>
</Space>
</Card>
{/* Actions */}
<div style={{ marginBottom: 16, display: 'flex', gap: 8 }}>
<Button
icon={<CheckCircleOutlined />}
onClick={() => selectAll(true)}
>
Select All
</Button>
<Button
onClick={() => selectAll(false)}
>
Deselect All
</Button>
<Button
icon={<ReloadOutlined />}
onClick={() => applyPairs(pathAResult, pathBResult)}
>
Regenerate
</Button>
<div style={{ flex: 1 }} />
<Radio.Group
value={viewMode}
onChange={(e) => setViewMode(e.target.value)}
buttonStyle="solid"
size="small"
>
<Radio.Button value="list">
<UnorderedListOutlined /> List
</Radio.Button>
<Radio.Button value="matrix">
<TableOutlined /> Matrix
</Radio.Button>
</Radio.Group>
</div>
{/* Content */}
<div style={{ flex: 1, overflow: 'auto' }}>
{viewMode === 'list' ? renderListView() : renderMatrixView()}
</div>
</div>
);
}

frontend/src/components/DeduplicationPanel.tsx
View File

@@ -26,6 +26,7 @@ import type {
  ExpertTransformationDAGResult,
  ExpertTransformationDescription,
  DeduplicationMethod,
PromptLanguage,
} from '../types';
const { Title, Text } = Typography;
@@ -37,6 +38,7 @@ interface DeduplicationPanelProps {
  onThresholdChange: (value: number) => void;
  method: DeduplicationMethod;
  onMethodChange?: (method: DeduplicationMethod) => void; // Optional, handled in App.tsx sidebar
lang?: PromptLanguage;
}
/**
@@ -48,6 +50,7 @@ export const DeduplicationPanel: React.FC<DeduplicationPanelProps> = ({
  threshold,
  onThresholdChange,
  method,
lang = 'zh',
  // onMethodChange is handled in App.tsx sidebar
}) => {
  const { loading, result, error, progress, deduplicate, clearResult } = useDeduplication();
@@ -70,7 +73,7 @@ export const DeduplicationPanel: React.FC<DeduplicationPanelProps> = ({
  const handleDeduplicate = () => {
    if (allDescriptions.length > 0) {
-     deduplicate(allDescriptions, threshold, method);
+     deduplicate(allDescriptions, threshold, method, lang);
    }
  };

312
frontend/src/components/DualPathInputPanel.tsx Normal file
View File

@@ -0,0 +1,312 @@
import { useState, useEffect } from 'react';
import {
Input,
Button,
Select,
Typography,
Space,
message,
Slider,
Collapse,
Progress,
Card,
Alert,
Tag,
Divider,
} from 'antd';
import {
SearchOutlined,
LoadingOutlined,
SwapOutlined,
} from '@ant-design/icons';
import type { CategoryMode, DAGProgress, PromptLanguage } from '../types';
import { getModels } from '../services/api';
import { CategorySelector } from './CategorySelector';
const { TextArea } = Input;
const { Text } = Typography;
interface DualPathInputPanelProps {
onAnalyze: (queryA: string, queryB: string, options?: {
model?: string;
temperature?: number;
chainCount?: number;
categoryMode?: CategoryMode;
customCategories?: string[];
suggestedCategoryCount?: number;
lang?: PromptLanguage;
}) => Promise<void>;
loadingA: boolean;
loadingB: boolean;
progressA: DAGProgress;
progressB: DAGProgress;
availableModels?: string[];
lang?: PromptLanguage;
}
export function DualPathInputPanel({
onAnalyze,
loadingA,
loadingB,
progressA,
progressB,
availableModels: propModels,
lang = 'zh',
}: DualPathInputPanelProps) {
const [queryA, setQueryA] = useState('');
const [queryB, setQueryB] = useState('');
const [models, setModels] = useState<string[]>(propModels || []);
const [selectedModel, setSelectedModel] = useState<string | undefined>();
const [loadingModels, setLoadingModels] = useState(false);
const [temperature, setTemperature] = useState(0.7);
const [chainCount, setChainCount] = useState(5);
// Category settings
const [categoryMode, setCategoryMode] = useState<CategoryMode>('dynamic_auto' as CategoryMode);
const [customCategories, setCustomCategories] = useState<string[]>([]);
const [suggestedCategoryCount, setSuggestedCategoryCount] = useState(3);
const isLoading = loadingA || loadingB;
useEffect(() => {
if (propModels && propModels.length > 0) {
setModels(propModels);
if (!selectedModel) {
const defaultModel = propModels.find((m) => m.includes('qwen3')) || propModels[0];
setSelectedModel(defaultModel);
}
return;
}
async function fetchModels() {
setLoadingModels(true);
try {
const response = await getModels();
setModels(response.models);
if (response.models.length > 0 && !selectedModel) {
const defaultModel = response.models.find((m) => m.includes('qwen3')) || response.models[0];
setSelectedModel(defaultModel);
}
} catch {
message.error('Failed to fetch models');
} finally {
setLoadingModels(false);
}
}
fetchModels();
}, [propModels]);
const handleAnalyze = async () => {
if (!queryA.trim() || !queryB.trim()) {
message.warning(lang === 'zh' ? '請輸入兩個路徑的查詢內容' : 'Please enter queries for both paths');
return;
}
try {
await onAnalyze(queryA.trim(), queryB.trim(), {
model: selectedModel,
temperature,
chainCount,
categoryMode,
customCategories: customCategories.length > 0 ? customCategories : undefined,
suggestedCategoryCount,
lang,
});
} catch {
message.error(lang === 'zh' ? '分析失敗' : 'Analysis failed');
}
};
const handleSwapQueries = () => {
const temp = queryA;
setQueryA(queryB);
setQueryB(temp);
};
const renderProgressIndicator = (label: string, progress: DAGProgress, loading: boolean) => {
if (progress.step === 'idle' && !loading) return null;
if (progress.step === 'done') return null;
const percent = progress.step === 'step0'
? 15
: progress.step === 'step1'
? 50
: progress.step === 'relationships'
? 85
: 100;
return (
<div style={{ marginTop: 8 }}>
<Text type="secondary" style={{ fontSize: 12 }}>{label}: {progress.message}</Text>
<Progress
percent={Math.round(percent)}
size="small"
status={progress.step === 'error' ? 'exception' : 'active'}
strokeColor={{ from: '#108ee9', to: '#87d068' }}
/>
</div>
);
};
const collapseItems = [
{
key: 'categories',
label: 'Category Settings',
children: (
<CategorySelector
mode={categoryMode}
onModeChange={setCategoryMode}
customCategories={customCategories}
onCustomCategoriesChange={setCustomCategories}
suggestedCount={suggestedCategoryCount}
onSuggestedCountChange={setSuggestedCategoryCount}
disabled={isLoading}
/>
),
},
{
key: 'llm',
label: 'LLM Parameters',
children: (
<Space direction="vertical" style={{ width: '100%' }} size="middle">
<div>
<Text type="secondary" style={{ fontSize: 12 }}>Temperature: {temperature}</Text>
<Slider
min={0}
max={1}
step={0.1}
value={temperature}
onChange={setTemperature}
marks={{ 0: '0', 0.5: '0.5', 1: '1' }}
disabled={isLoading}
/>
</div>
<div>
<Text type="secondary" style={{ fontSize: 12 }}>Chain Count: {chainCount}</Text>
<Slider
min={1}
max={10}
step={1}
value={chainCount}
onChange={setChainCount}
marks={{ 1: '1', 5: '5', 10: '10' }}
disabled={isLoading}
/>
</div>
</Space>
),
},
];
return (
<div style={{
display: 'flex',
flexDirection: 'column',
height: '100%',
padding: 16,
gap: 16,
}}>
{/* Dual Path Input Card */}
<Card
size="small"
title={<Text strong>Dual Path Analysis</Text>}
styles={{ body: { padding: 12 } }}
>
<Space direction="vertical" style={{ width: '100%' }} size="middle">
{/* Model Selection */}
<Select
style={{ width: '100%' }}
value={selectedModel}
onChange={setSelectedModel}
loading={loadingModels}
placeholder="Select a model"
options={models.map((m) => ({ label: m, value: m }))}
size="middle"
disabled={isLoading}
/>
{/* Path A Input */}
<div>
<Tag color="blue" style={{ marginBottom: 4 }}>Path A</Tag>
<TextArea
value={queryA}
onChange={(e) => setQueryA(e.target.value)}
placeholder="Enter first object (e.g., umbrella)"
autoSize={{ minRows: 1, maxRows: 2 }}
disabled={isLoading}
/>
{renderProgressIndicator('Path A', progressA, loadingA)}
</div>
{/* Swap Button */}
<div style={{ textAlign: 'center' }}>
<Button
icon={<SwapOutlined rotate={90} />}
size="small"
onClick={handleSwapQueries}
disabled={isLoading}
>
Swap
</Button>
</div>
{/* Path B Input */}
<div>
<Tag color="green" style={{ marginBottom: 4 }}>Path B</Tag>
<TextArea
value={queryB}
onChange={(e) => setQueryB(e.target.value)}
placeholder="Enter second object (e.g., bicycle)"
autoSize={{ minRows: 1, maxRows: 2 }}
disabled={isLoading}
/>
{renderProgressIndicator('Path B', progressB, loadingB)}
</div>
{/* Analyze Button */}
<Button
type="primary"
icon={<SearchOutlined />}
onClick={handleAnalyze}
loading={isLoading}
block
size="large"
disabled={!queryA.trim() || !queryB.trim()}
>
{isLoading ? 'Analyzing...' : 'Analyze Both'}
</Button>
</Space>
</Card>
{/* Combined Progress Alert */}
{isLoading && (
<Alert
type="info"
icon={<LoadingOutlined spin />}
message="Parallel Analysis in Progress"
description={
<Space direction="vertical" style={{ width: '100%' }}>
<div>
<Tag color="blue">A</Tag> {progressA.message || 'Waiting...'}
</div>
<div>
<Tag color="green">B</Tag> {progressB.message || 'Waiting...'}
</div>
</Space>
}
showIcon
/>
)}
<Divider style={{ margin: '4px 0' }} />
{/* Settings Collapse */}
<Collapse
items={collapseItems}
defaultActiveKey={[]}
size="small"
style={{ background: 'transparent' }}
/>
</div>
);
}

191
frontend/src/components/DualPathMindmapPanel.tsx Normal file
View File

@@ -0,0 +1,191 @@
import { Empty, Spin, Tag, Typography } from 'antd';
import type { PathState } from '../types';
import { MindmapDAG } from './MindmapDAG';
const { Text } = Typography;
interface VisualSettings {
nodeSpacing: number;
fontSize: number;
}
interface DualPathMindmapPanelProps {
pathA: PathState;
pathB: PathState;
isDark: boolean;
visualSettings: VisualSettings;
}
interface SinglePathViewProps {
path: PathState;
label: string;
color: 'blue' | 'green';
isDark: boolean;
visualSettings: VisualSettings;
}
function SinglePathView({ path, label, color, isDark, visualSettings }: SinglePathViewProps) {
const { result, loading, error, query, progress } = path;
// Header with label
const headerStyle: React.CSSProperties = {
padding: '6px 12px',
background: isDark ? '#1f1f1f' : '#fafafa',
borderBottom: `1px solid ${isDark ? '#303030' : '#f0f0f0'}`,
display: 'flex',
alignItems: 'center',
gap: 8,
minHeight: 36,
};
const contentStyle: React.CSSProperties = {
flex: 1,
position: 'relative',
overflow: 'hidden',
};
const renderContent = () => {
if (loading) {
return (
<div style={{
display: 'flex',
flexDirection: 'column',
justifyContent: 'center',
alignItems: 'center',
height: '100%',
gap: 8,
}}>
<Spin size="large" />
<Text type="secondary">{progress.message || 'Analyzing...'}</Text>
</div>
);
}
if (error) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
height: '100%',
}}>
<Empty description={error} />
</div>
);
}
if (!result) {
return (
<div style={{
display: 'flex',
justifyContent: 'center',
alignItems: 'center',
height: '100%',
}}>
<Empty description={`Enter a query for ${label}`} />
</div>
);
}
return (
<MindmapDAG
data={result}
isDark={isDark}
visualSettings={visualSettings}
/>
);
};
return (
<div style={{ display: 'flex', flexDirection: 'column', height: '100%' }}>
<div style={headerStyle}>
<Tag color={color}>{label}</Tag>
{result && (
<Text strong style={{ flex: 1 }}>
{result.query}
</Text>
)}
{!result && query && (
<Text type="secondary" style={{ flex: 1 }}>
{query}
</Text>
)}
{result && (
<Text type="secondary" style={{ fontSize: 12 }}>
{result.nodes.length} attributes
</Text>
)}
</div>
<div style={contentStyle}>
{renderContent()}
</div>
</div>
);
}
export function DualPathMindmapPanel({
pathA,
pathB,
isDark,
visualSettings,
}: DualPathMindmapPanelProps) {
const containerStyle: React.CSSProperties = {
display: 'flex',
flexDirection: 'column',
height: '100%',
gap: 2,
};
const pathContainerStyle: React.CSSProperties = {
flex: 1,
minHeight: 0,
borderRadius: 6,
overflow: 'hidden',
border: `1px solid ${isDark ? '#303030' : '#f0f0f0'}`,
};
const dividerStyle: React.CSSProperties = {
height: 4,
background: isDark ? '#303030' : '#f0f0f0',
cursor: 'row-resize',
display: 'flex',
alignItems: 'center',
justifyContent: 'center',
};
return (
<div style={containerStyle}>
{/* Path A - Top Half */}
<div style={pathContainerStyle}>
<SinglePathView
path={pathA}
label="Path A"
color="blue"
isDark={isDark}
visualSettings={visualSettings}
/>
</div>
{/* Divider */}
<div style={dividerStyle}>
<div style={{
width: 40,
height: 3,
borderRadius: 2,
background: isDark ? '#505050' : '#d0d0d0',
}} />
</div>
{/* Path B - Bottom Half */}
<div style={pathContainerStyle}>
<SinglePathView
path={pathB}
label="Path B"
color="green"
isDark={isDark}
visualSettings={visualSettings}
/>
</div>
</div>
);
}

Some files were not shown because too many files have changed in this diff.