feat: Add experiments framework and novelty-driven agent loop
- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation
- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring
- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
314
experiments/assessment/README.md
Normal file
@@ -0,0 +1,314 @@
# Human Assessment Web Interface

A standalone web application for human assessment of generated ideas using Torrance-inspired creativity metrics.

## Overview

This tool enables blind evaluation of creative ideas generated by the novelty-seeking experiment. Raters assess ideas on four dimensions without knowing which experimental condition produced each idea, ensuring unbiased evaluation.

## Quick Start

```bash
cd experiments/assessment

# 1. Prepare assessment data (if not already done)
python3 prepare_data.py

# 2. Start the system
./start.sh

# 3. Open browser
open http://localhost:5174
```

## Directory Structure

```
assessment/
├── backend/
│   ├── app.py               # FastAPI backend API
│   ├── database.py          # SQLite database operations
│   ├── models.py            # Pydantic models & dimension definitions
│   └── requirements.txt     # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── components/      # React UI components
│   │   ├── hooks/           # React state management
│   │   ├── services/        # API client
│   │   └── types/           # TypeScript definitions
│   └── package.json
├── data/
│   └── assessment_items.json  # Prepared ideas for rating
├── results/
│   └── ratings.db           # SQLite database with ratings
├── prepare_data.py          # Data preparation script
├── analyze_ratings.py       # Inter-rater reliability analysis
├── start.sh                 # Start both servers
├── stop.sh                  # Stop all services
└── README.md                # This file
```

## Data Preparation

### List Available Experiment Files

```bash
python3 prepare_data.py --list
```

Output:
```
Available experiment files (most recent first):
experiment_20260119_165650_deduped.json (1571.3 KB)
experiment_20260119_163040_deduped.json (156.4 KB)
```

### Prepare Assessment Data

```bash
# Use all ideas (not recommended for human assessment)
python3 prepare_data.py

# RECOMMENDED: Stratified sampling - 4 ideas per condition per query
# Results in ~200 ideas (5 conditions × 4 ideas × 10 queries)
python3 prepare_data.py --per-condition 4

# Alternative: Sample 150 ideas total (proportionally across queries)
python3 prepare_data.py --sample 150

# Limit per query (20 ideas max per query)
python3 prepare_data.py --per-query 20

# Combined: 4 per condition, max 15 per query
python3 prepare_data.py --per-condition 4 --per-query 15

# Specify a different experiment file
python3 prepare_data.py experiment_20260119_163040_deduped.json --per-condition 4
```

### Sampling Options

| Option | Description | Example |
|--------|-------------|---------|
| `--per-condition N` | Max N ideas per condition per query (stratified) | `--per-condition 4` → ~200 ideas |
| `--per-query N` | Max N ideas per query | `--per-query 20` |
| `--sample N` | Total N ideas (proportionally distributed) | `--sample 150` |
| `--seed N` | Random seed for reproducibility | `--seed 42` (default) |

**Recommendation**: Use `--per-condition 4` for balanced assessment across conditions.

The script:
1. Loads the deduped experiment results
2. Extracts all unique ideas with hidden metadata (condition, expert, keyword)
3. Assigns stable IDs to each idea
4. Shuffles ideas within each query (reproducible with seed=42)
5. Outputs `data/assessment_items.json`
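
The stratified `--per-condition` mode can be sketched with the standard library; the helper below is illustrative (the record fields `query_id` and `condition` stand in for the real experiment schema, and `stratified_sample` is not the actual function name in `prepare_data.py`):

```python
import random
from collections import defaultdict

def stratified_sample(ideas, per_condition, seed=42):
    """Pick at most `per_condition` ideas per (query, condition) stratum.

    `ideas` is a list of dicts with 'query_id' and 'condition' keys,
    a simplified stand-in for the deduped experiment records.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for idea in ideas:
        strata[(idea['query_id'], idea['condition'])].append(idea)
    sampled = []
    for key in sorted(strata):  # sorted keys make iteration order stable
        group = strata[key]
        sampled.extend(rng.sample(group, min(per_condition, len(group))))
    return sampled
```

With 5 conditions, 4 ideas per condition, and 10 queries this yields the ~200 ideas mentioned above.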

## Assessment Dimensions

Raters evaluate each idea on four dimensions using a 1-5 Likert scale:

### Originality
*How unexpected or surprising is this idea?*

| Score | Description |
|-------|-------------|
| 1 | Very common/obvious idea anyone would suggest |
| 2 | Somewhat common, slight variation on expected ideas |
| 3 | Moderately original, some unexpected elements |
| 4 | Quite original, notably different approach |
| 5 | Highly unexpected, truly novel concept |

### Elaboration
*How detailed and well-developed is this idea?*

| Score | Description |
|-------|-------------|
| 1 | Vague, minimal detail, just a concept |
| 2 | Basic idea with little specificity |
| 3 | Moderately detailed, some specifics provided |
| 4 | Well-developed with clear implementation hints |
| 5 | Highly specific, thoroughly developed concept |

### Coherence
*Does this idea make logical sense and relate to the query object?*

| Score | Description |
|-------|-------------|
| 1 | Nonsensical, irrelevant, or incomprehensible |
| 2 | Mostly unclear, weak connection to query |
| 3 | Partially coherent, some logical gaps |
| 4 | Mostly coherent with minor issues |
| 5 | Fully coherent, clearly relates to query |

### Usefulness
*Could this idea have practical value or inspire real innovation?*

| Score | Description |
|-------|-------------|
| 1 | No practical value whatsoever |
| 2 | Minimal usefulness, highly impractical |
| 3 | Some potential value with major limitations |
| 4 | Useful idea with realistic applications |
| 5 | Highly useful, clear practical value |

## Running the System

### Start

```bash
./start.sh
```

This will:
1. Check for `data/assessment_items.json` (runs `prepare_data.py` if missing)
2. Install frontend dependencies if needed
3. Start backend API on port 8002
4. Start frontend dev server on port 5174

### Stop

```bash
./stop.sh
```

Or press `Ctrl+C` in the terminal running `start.sh`.

### Manual Start (Development)

```bash
# Terminal 1: Backend
cd backend
../../../backend/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8002 --reload

# Terminal 2: Frontend
cd frontend
npm run dev
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check |
| `/api/info` | GET | Experiment info (total ideas, queries, conditions) |
| `/api/dimensions` | GET | Dimension definitions for UI |
| `/api/raters` | GET | List all raters |
| `/api/raters` | POST | Register/login rater |
| `/api/queries` | GET | List all queries |
| `/api/queries/{id}` | GET | Get query with all ideas |
| `/api/queries/{id}/unrated?rater_id=X` | GET | Get unrated ideas for rater |
| `/api/ratings` | POST | Submit a rating |
| `/api/progress/{rater_id}` | GET | Get rater's progress |
| `/api/statistics` | GET | Overall statistics |
| `/api/export` | GET | Export all ratings with metadata |
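
A rating submission to `POST /api/ratings` can be sketched as follows. The payload field names mirror the rating model used by the backend; the `validate_rating` helper and the ID values are purely illustrative, not part of the codebase:

```python
# Hypothetical client-side sketch; IDs below are made up for illustration.
DIMENSIONS = ('originality', 'elaboration', 'coherence', 'usefulness')

def validate_rating(payload):
    """Mirror the server-side rule: all four dimensions rated 1-5 unless skipped."""
    if payload.get('skipped'):
        return True
    return all(isinstance(payload.get(d), int) and 1 <= payload[d] <= 5
               for d in DIMENSIONS)

payload = {
    'rater_id': 'rater_01',
    'idea_id': 'idea_0001',
    'query_id': 'query_01',
    'originality': 4, 'elaboration': 3, 'coherence': 5, 'usefulness': 4,
    'skipped': False,
}
assert validate_rating(payload)

# To submit against a running backend:
#   import json, urllib.request
#   req = urllib.request.Request('http://localhost:8002/api/ratings',
#                                data=json.dumps(payload).encode(),
#                                headers={'Content-Type': 'application/json'})
#   urllib.request.urlopen(req)
```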

## Analysis

After collecting ratings from multiple raters:

```bash
python3 analyze_ratings.py
```

This calculates:
- **Krippendorff's alpha**: Inter-rater reliability for ordinal data
- **ICC(2,1)**: Intraclass Correlation Coefficient with 95% CI
- **Mean ratings per condition**: Compare experimental conditions
- **Kruskal-Wallis test**: Statistical significance between conditions

Output is saved to `results/analysis_results.json`.

## Database Schema

SQLite database (`results/ratings.db`):

```sql
-- Raters
CREATE TABLE raters (
    rater_id TEXT PRIMARY KEY,
    name TEXT,
    created_at TIMESTAMP
);

-- Ratings
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY,
    rater_id TEXT,
    idea_id TEXT,
    query_id TEXT,
    originality INTEGER CHECK(originality BETWEEN 1 AND 5),
    elaboration INTEGER CHECK(elaboration BETWEEN 1 AND 5),
    coherence INTEGER CHECK(coherence BETWEEN 1 AND 5),
    usefulness INTEGER CHECK(usefulness BETWEEN 1 AND 5),
    skipped INTEGER DEFAULT 0,
    timestamp TIMESTAMP,
    UNIQUE(rater_id, idea_id)
);

-- Progress tracking
CREATE TABLE progress (
    rater_id TEXT,
    query_id TEXT,
    completed_count INTEGER,
    total_count INTEGER,
    PRIMARY KEY (rater_id, query_id)
);
```
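
The `CHECK` and `UNIQUE` constraints do the enforcement at the database layer. A minimal sketch against an in-memory SQLite database (using only a subset of the `ratings` columns above) shows both in action:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY,
    rater_id TEXT,
    idea_id TEXT,
    originality INTEGER CHECK(originality BETWEEN 1 AND 5),
    UNIQUE(rater_id, idea_id)
)
""")
conn.execute("INSERT INTO ratings (rater_id, idea_id, originality) VALUES (?, ?, ?)",
             ('r1', 'idea_1', 4))

# The CHECK constraint rejects out-of-range scores...
try:
    conn.execute("INSERT INTO ratings (rater_id, idea_id, originality) VALUES (?, ?, ?)",
                 ('r1', 'idea_2', 6))
except sqlite3.IntegrityError:
    pass  # score 6 violates CHECK(... BETWEEN 1 AND 5)

# ...and UNIQUE(rater_id, idea_id) enforces one rating per rater per idea.
try:
    conn.execute("INSERT INTO ratings (rater_id, idea_id, originality) VALUES (?, ?, ?)",
                 ('r1', 'idea_1', 2))
except sqlite3.IntegrityError:
    pass  # duplicate (rater_id, idea_id) pair
```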

## Blind Assessment Design

To ensure unbiased evaluation:

1. **Randomization**: Ideas are shuffled within each query using a fixed seed (42) for reproducibility
2. **Hidden metadata**: Condition, expert name, and keywords are stored but not shown to raters
3. **Consistent ordering**: All raters see the same randomized order
4. **Context provided**: Only the query text is shown (e.g., "Chair", "Bicycle")
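
The seeded, per-query shuffle behind points 1 and 3 can be sketched like this; the helper name and the `(seed, query_id)` seed composition are illustrative choices, not the exact code in `prepare_data.py`:

```python
import random

def shuffle_ideas(ideas, query_id, seed=42):
    """Deterministically shuffle a query's ideas so every rater sees the same order.

    A fresh Random instance seeded per query means each query gets its own
    stable ordering, and rerunning preparation reproduces it exactly.
    """
    rng = random.Random(f"{seed}:{query_id}")
    shuffled = list(ideas)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled
```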

## Workflow for Raters

1. **Login**: Enter a unique rater ID
2. **Instructions**: Read dimension definitions (shown before first rating)
3. **Rate ideas**: For each idea:
   - Read the idea text
   - Rate all 4 dimensions (1-5)
   - Click "Submit & Next" or "Skip"
4. **Progress**: Track completion per query and overall
5. **Completion**: Summary shown when all ideas are rated

## Troubleshooting

### Backend won't start
```bash
# Check if port 8002 is in use
lsof -i :8002

# Check backend logs
cat /tmp/assessment_backend.log
```

### Frontend won't start
```bash
# Reinstall dependencies
cd frontend
rm -rf node_modules
npm install
```

### Reset database
```bash
rm results/ratings.db
# Database is auto-created on next backend start
```

### Regenerate assessment data
```bash
rm data/assessment_items.json
python3 prepare_data.py
```

## Tech Stack

- **Backend**: Python 3.11+, FastAPI, SQLite, Pydantic
- **Frontend**: React 19, TypeScript, Vite, Ant Design 6.0
- **Analysis**: NumPy, SciPy (for statistical tests)
356
experiments/assessment/analyze_ratings.py
Executable file
@@ -0,0 +1,356 @@
#!/usr/bin/env python3
"""
Analyze assessment ratings for inter-rater reliability and condition comparisons.

This script:
1. Loads ratings from the SQLite database
2. Joins with hidden metadata (condition, expert)
3. Calculates inter-rater reliability metrics
4. Computes mean ratings per dimension per condition
5. Performs statistical comparisons between conditions
"""

import json
import sqlite3
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Any

import numpy as np
from scipy import stats


# Paths
RESULTS_DIR = Path(__file__).parent / 'results'
DATA_DIR = Path(__file__).parent / 'data'
DB_PATH = RESULTS_DIR / 'ratings.db'
ASSESSMENT_DATA_PATH = DATA_DIR / 'assessment_items.json'


def load_assessment_data() -> dict[str, Any]:
    """Load the assessment items data with hidden metadata."""
    with open(ASSESSMENT_DATA_PATH, 'r', encoding='utf-8') as f:
        return json.load(f)


def load_ratings_from_db() -> list[dict[str, Any]]:
    """Load all ratings from the SQLite database."""
    if not DB_PATH.exists():
        print(f"Database not found at {DB_PATH}")
        return []

    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()

    cursor.execute('''
        SELECT r.*, rat.name as rater_name
        FROM ratings r
        LEFT JOIN raters rat ON r.rater_id = rat.rater_id
        WHERE r.skipped = 0
    ''')

    ratings = [dict(row) for row in cursor.fetchall()]
    conn.close()

    return ratings


def build_idea_lookup(assessment_data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Build a lookup table from idea_id to metadata."""
    lookup = {}
    for query in assessment_data['queries']:
        for idea in query['ideas']:
            lookup[idea['idea_id']] = {
                'text': idea['text'],
                'query_id': query['query_id'],
                'query_text': query['query_text'],
                **idea['_hidden']
            }
    return lookup


def calculate_krippendorff_alpha(ratings_matrix: np.ndarray) -> float:
    """
    Calculate Krippendorff's alpha for ordinal data.

    Args:
        ratings_matrix: 2D array where rows are items and columns are raters.
            NaN values indicate missing ratings.

    Returns:
        Krippendorff's alpha coefficient
    """
    # Remove items with fewer than 2 raters
    valid_items = ~np.all(np.isnan(ratings_matrix), axis=1)
    ratings_matrix = ratings_matrix[valid_items]

    if ratings_matrix.shape[0] < 2:
        return np.nan

    n_items, n_raters = ratings_matrix.shape

    # Observed disagreement
    observed_disagreement = 0
    n_pairs = 0

    for i in range(n_items):
        values = ratings_matrix[i, ~np.isnan(ratings_matrix[i])]
        if len(values) < 2:
            continue
        # Ordinal distance: squared difference
        for j in range(len(values)):
            for k in range(j + 1, len(values)):
                observed_disagreement += (values[j] - values[k]) ** 2
                n_pairs += 1

    if n_pairs == 0:
        return np.nan

    observed_disagreement /= n_pairs

    # Expected disagreement (based on marginal distribution)
    all_values = ratings_matrix[~np.isnan(ratings_matrix)]
    if len(all_values) < 2:
        return np.nan

    expected_disagreement = 0
    n_total_pairs = 0
    for i in range(len(all_values)):
        for j in range(i + 1, len(all_values)):
            expected_disagreement += (all_values[i] - all_values[j]) ** 2
            n_total_pairs += 1

    if n_total_pairs == 0:
        return np.nan

    expected_disagreement /= n_total_pairs

    if expected_disagreement == 0:
        return 1.0

    alpha = 1 - (observed_disagreement / expected_disagreement)
    return alpha


def calculate_icc(ratings_matrix: np.ndarray) -> tuple[float, float, float]:
    """
    Calculate Intraclass Correlation Coefficient (ICC(2,1)).

    Args:
        ratings_matrix: 2D array where rows are items and columns are raters.

    Returns:
        Tuple of (ICC, lower_bound, upper_bound)
    """
    # Remove rows with any NaN
    valid_rows = ~np.any(np.isnan(ratings_matrix), axis=1)
    ratings_matrix = ratings_matrix[valid_rows]

    if ratings_matrix.shape[0] < 2 or ratings_matrix.shape[1] < 2:
        return np.nan, np.nan, np.nan

    n, k = ratings_matrix.shape

    # Grand mean
    grand_mean = np.mean(ratings_matrix)

    # Row means (item means)
    row_means = np.mean(ratings_matrix, axis=1)

    # Column means (rater means)
    col_means = np.mean(ratings_matrix, axis=0)

    # Sum of squares
    ss_total = np.sum((ratings_matrix - grand_mean) ** 2)
    ss_rows = k * np.sum((row_means - grand_mean) ** 2)
    ss_cols = n * np.sum((col_means - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1) if n > 1 else 0
    ms_cols = ss_cols / (k - 1) if k > 1 else 0
    ms_error = ss_error / ((n - 1) * (k - 1)) if (n > 1 and k > 1) else 0

    # ICC(2,1) - two-way random, absolute agreement, single rater
    if ms_error + (ms_cols - ms_error) / n == 0:
        return np.nan, np.nan, np.nan

    icc = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

    # Confidence interval (approximate)
    # Using F distribution
    df1 = n - 1
    df2 = (n - 1) * (k - 1)

    if ms_error == 0:
        return icc, np.nan, np.nan

    f_value = ms_rows / ms_error
    f_lower = f_value / stats.f.ppf(0.975, df1, df2)
    f_upper = f_value / stats.f.ppf(0.025, df1, df2)

    icc_lower = (f_lower - 1) / (f_lower + k - 1)
    icc_upper = (f_upper - 1) / (f_upper + k - 1)

    return icc, icc_lower, icc_upper


def analyze_ratings():
    """Main analysis function."""
    print("=" * 60)
    print("CREATIVE IDEA ASSESSMENT ANALYSIS")
    print("=" * 60)
    print()

    # Load data
    assessment_data = load_assessment_data()
    ratings = load_ratings_from_db()
    idea_lookup = build_idea_lookup(assessment_data)

    if not ratings:
        print("No ratings found in database.")
        return

    print(f"Loaded {len(ratings)} ratings from database")
    print(f"Experiment ID: {assessment_data['experiment_id']}")
    print()

    # Get unique raters
    raters = list(set(r['rater_id'] for r in ratings))
    print(f"Raters: {raters}")
    print()

    # Join ratings with metadata
    enriched_ratings = []
    for r in ratings:
        idea_meta = idea_lookup.get(r['idea_id'], {})
        enriched_ratings.append({
            **r,
            'condition': idea_meta.get('condition', 'unknown'),
            'expert_name': idea_meta.get('expert_name', ''),
            'keyword': idea_meta.get('keyword', ''),
            'query_text': idea_meta.get('query_text', ''),
            'idea_text': idea_meta.get('text', '')
        })

    # Dimensions
    dimensions = ['originality', 'elaboration', 'coherence', 'usefulness']

    # ================================
    # Inter-rater reliability
    # ================================
    print("-" * 60)
    print("INTER-RATER RELIABILITY")
    print("-" * 60)
    print()

    if len(raters) >= 2:
        # Build ratings matrix per dimension
        idea_ids = list(set(r['idea_id'] for r in enriched_ratings))

        for dim in dimensions:
            # Create matrix: rows = ideas, cols = raters
            matrix = np.full((len(idea_ids), len(raters)), np.nan)
            idea_to_idx = {idea: idx for idx, idea in enumerate(idea_ids)}
            rater_to_idx = {rater: idx for idx, rater in enumerate(raters)}

            for r in enriched_ratings:
                if r[dim] is not None:
                    i = idea_to_idx[r['idea_id']]
                    j = rater_to_idx[r['rater_id']]
                    matrix[i, j] = r[dim]

            # Calculate metrics
            alpha = calculate_krippendorff_alpha(matrix)
            icc, icc_low, icc_high = calculate_icc(matrix)

            print(f"{dim.upper()}:")
            print(f"  Krippendorff's alpha: {alpha:.3f}")
            print(f"  ICC(2,1): {icc:.3f} (95% CI: {icc_low:.3f} - {icc_high:.3f})")
            print()
    else:
        print("Need at least 2 raters for inter-rater reliability analysis.")
        print()

    # ================================
    # Condition comparisons
    # ================================
    print("-" * 60)
    print("MEAN RATINGS BY CONDITION")
    print("-" * 60)
    print()

    # Group ratings by condition
    condition_ratings: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))

    for r in enriched_ratings:
        condition = r['condition']
        for dim in dimensions:
            if r[dim] is not None:
                condition_ratings[condition][dim].append(r[dim])

    # Calculate means and print
    condition_stats = {}
    for condition in sorted(condition_ratings.keys()):
        print(f"\n{condition}:")
        condition_stats[condition] = {}
        for dim in dimensions:
            values = condition_ratings[condition][dim]
            if values:
                mean = np.mean(values)
                std = np.std(values)
                n = len(values)
                condition_stats[condition][dim] = {'mean': mean, 'std': std, 'n': n}
                print(f"  {dim}: {mean:.2f} (SD={std:.2f}, n={n})")
            else:
                print(f"  {dim}: no data")

    # ================================
    # Statistical comparisons
    # ================================
    print()
    print("-" * 60)
    print("STATISTICAL COMPARISONS (Kruskal-Wallis)")
    print("-" * 60)
    print()

    conditions = sorted(condition_ratings.keys())
    if len(conditions) >= 2:
        for dim in dimensions:
            groups = [condition_ratings[c][dim] for c in conditions if condition_ratings[c][dim]]
            if len(groups) >= 2:
                h_stat, p_value = stats.kruskal(*groups)
                sig = "*" if p_value < 0.05 else ""
                print(f"{dim}: H={h_stat:.2f}, p={p_value:.4f} {sig}")
            else:
                print(f"{dim}: insufficient data for comparison")
    else:
        print("Need at least 2 conditions with data for statistical comparison.")

    # ================================
    # Export results
    # ================================
    output = {
        'analysis_timestamp': datetime.utcnow().isoformat(),
        'experiment_id': assessment_data['experiment_id'],
        'total_ratings': len(ratings),
        'raters': raters,
        'rater_count': len(raters),
        'condition_stats': condition_stats,
        'enriched_ratings': enriched_ratings
    }

    output_path = RESULTS_DIR / 'analysis_results.json'
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(output, f, ensure_ascii=False, indent=2, default=str)

    print()
    print("-" * 60)
    print(f"Results exported to: {output_path}")
    print("=" * 60)


if __name__ == '__main__':
    analyze_ratings()
1
experiments/assessment/backend/__init__.py
Normal file
@@ -0,0 +1 @@
"""Assessment backend package."""
374
experiments/assessment/backend/app.py
Normal file
@@ -0,0 +1,374 @@
|
||||
"""
|
||||
FastAPI backend for human assessment of creative ideas.
|
||||
"""
|
||||
|
||||
import json
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from fastapi import FastAPI, HTTPException
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
|
||||
try:
|
||||
from . import database as db
|
||||
from .models import (
|
||||
DIMENSION_DEFINITIONS,
|
||||
ExportData,
|
||||
ExportRating,
|
||||
IdeaForRating,
|
||||
Progress,
|
||||
QueryInfo,
|
||||
QueryWithIdeas,
|
||||
Rater,
|
||||
RaterCreate,
|
||||
RaterProgress,
|
||||
Rating,
|
||||
RatingSubmit,
|
||||
Statistics,
|
||||
)
|
||||
except ImportError:
|
||||
import database as db
|
||||
from models import (
|
||||
DIMENSION_DEFINITIONS,
|
||||
ExportData,
|
||||
ExportRating,
|
||||
IdeaForRating,
|
||||
Progress,
|
||||
QueryInfo,
|
||||
QueryWithIdeas,
|
||||
Rater,
|
||||
RaterCreate,
|
||||
RaterProgress,
|
||||
Rating,
|
||||
RatingSubmit,
|
||||
Statistics,
|
||||
)
|
||||
|
||||
|
||||
# Load assessment data
|
||||
DATA_PATH = Path(__file__).parent.parent / 'data' / 'assessment_items.json'
|
||||
|
||||
|
||||
def load_assessment_data() -> dict[str, Any]:
|
||||
"""Load the assessment items data."""
|
||||
if not DATA_PATH.exists():
|
||||
raise RuntimeError(f"Assessment data not found at {DATA_PATH}. Run prepare_data.py first.")
|
||||
with open(DATA_PATH, 'r', encoding='utf-8') as f:
|
||||
return json.load(f)
|
||||
|
||||
|
||||
# Initialize FastAPI app
|
||||
app = FastAPI(
|
||||
title="Creative Idea Assessment API",
|
||||
description="API for human assessment of creative ideas using Torrance-inspired metrics",
|
||||
version="1.0.0"
|
||||
)
|
||||
|
||||
# CORS middleware
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"],
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
|
||||
# Cache for assessment data
|
||||
_assessment_data: dict[str, Any] | None = None
|
||||
|
||||
|
||||
def get_assessment_data() -> dict[str, Any]:
|
||||
"""Get cached assessment data."""
|
||||
global _assessment_data
|
||||
if _assessment_data is None:
|
||||
_assessment_data = load_assessment_data()
|
||||
return _assessment_data
|
||||
|
||||
|
||||
# Rater endpoints
|
||||
@app.get("/api/raters", response_model=list[Rater])
|
||||
def list_raters() -> list[dict[str, Any]]:
|
||||
"""List all registered raters."""
|
||||
return db.list_raters()
|
||||
|
||||
|
||||
@app.post("/api/raters", response_model=Rater)
|
||||
def create_or_get_rater(rater_data: RaterCreate) -> dict[str, Any]:
|
||||
"""Register a new rater or get existing one."""
|
||||
return db.create_rater(rater_data.rater_id, rater_data.name)
|
||||
|
||||
|
||||
@app.get("/api/raters/{rater_id}", response_model=Rater)
|
||||
def get_rater(rater_id: str) -> dict[str, Any]:
|
||||
"""Get a specific rater."""
|
||||
rater = db.get_rater(rater_id)
|
||||
if not rater:
|
||||
raise HTTPException(status_code=404, detail="Rater not found")
|
||||
return rater
|
||||
|
||||
|
||||
# Query endpoints
|
||||
@app.get("/api/queries", response_model=list[QueryInfo])
|
||||
def list_queries() -> list[dict[str, Any]]:
|
||||
"""List all queries available for assessment."""
|
||||
data = get_assessment_data()
|
||||
return [
|
||||
{
|
||||
'query_id': q['query_id'],
|
||||
'query_text': q['query_text'],
|
||||
'category': q.get('category', ''),
|
||||
'idea_count': q['idea_count']
|
||||
}
|
||||
for q in data['queries']
|
||||
]
|
||||
|
||||
|
||||
@app.get("/api/queries/{query_id}", response_model=QueryWithIdeas)
|
||||
def get_query_with_ideas(query_id: str) -> dict[str, Any]:
|
||||
"""Get a query with all its ideas for rating (without hidden metadata)."""
|
||||
data = get_assessment_data()
|
||||
|
||||
for query in data['queries']:
|
||||
if query['query_id'] == query_id:
|
||||
ideas = [
|
||||
IdeaForRating(
|
||||
idea_id=idea['idea_id'],
|
||||
text=idea['text'],
|
||||
index=idx
|
||||
)
|
||||
for idx, idea in enumerate(query['ideas'])
|
||||
]
|
||||
return QueryWithIdeas(
|
||||
query_id=query['query_id'],
|
||||
query_text=query['query_text'],
|
||||
category=query.get('category', ''),
|
||||
ideas=ideas,
|
||||
total_count=len(ideas)
|
||||
)
|
||||
|
||||
raise HTTPException(status_code=404, detail="Query not found")
|
||||
|
||||
|
||||
@app.get("/api/queries/{query_id}/unrated", response_model=QueryWithIdeas)
|
||||
def get_unrated_ideas(query_id: str, rater_id: str) -> dict[str, Any]:
|
||||
"""Get unrated ideas for a query by a specific rater."""
|
||||
data = get_assessment_data()
|
||||
|
||||
for query in data['queries']:
|
||||
if query['query_id'] == query_id:
|
||||
# Get already rated idea IDs
|
||||
rated_ids = db.get_rated_idea_ids(rater_id, query_id)
|
||||
|
||||
# Filter to unrated ideas
|
||||
unrated_ideas = [
|
||||
IdeaForRating(
|
||||
idea_id=idea['idea_id'],
|
||||
text=idea['text'],
|
||||
index=idx
|
||||
)
|
||||
for idx, idea in enumerate(query['ideas'])
|
||||
if idea['idea_id'] not in rated_ids
|
||||
]
|
||||
|
||||
return QueryWithIdeas(
|
||||
query_id=query['query_id'],
|
||||
query_text=query['query_text'],
|
||||
category=query.get('category', ''),
|
||||
ideas=unrated_ideas,
|
||||
total_count=query['idea_count']
|
||||
)
|
||||
|
||||
raise HTTPException(status_code=404, detail="Query not found")
|
||||
|
||||
|
||||
# Rating endpoints
|
||||
@app.post("/api/ratings", response_model=dict[str, Any])
|
||||
def submit_rating(rating: RatingSubmit) -> dict[str, Any]:
|
||||
"""Submit a rating for an idea."""
|
||||
# Validate that rater exists
|
||||
rater = db.get_rater(rating.rater_id)
|
||||
if not rater:
|
||||
raise HTTPException(status_code=404, detail="Rater not found. Please register first.")
|
||||
|
||||
# Validate idea exists
|
||||
data = get_assessment_data()
|
||||
idea_found = False
|
||||
for query in data['queries']:
|
||||
for idea in query['ideas']:
|
||||
if idea['idea_id'] == rating.idea_id:
|
||||
idea_found = True
|
||||
break
|
||||
if idea_found:
|
||||
break
|
||||
|
||||
    if not idea_found:
        raise HTTPException(status_code=404, detail="Idea not found")

    # If not skipped, require all ratings
    if not rating.skipped:
        if rating.originality is None or rating.elaboration is None or rating.coherence is None or rating.usefulness is None:
            raise HTTPException(
                status_code=400,
                detail="All dimensions must be rated unless skipping"
            )

    # Save rating
    return db.save_rating(
        rater_id=rating.rater_id,
        idea_id=rating.idea_id,
        query_id=rating.query_id,
        originality=rating.originality,
        elaboration=rating.elaboration,
        coherence=rating.coherence,
        usefulness=rating.usefulness,
        skipped=rating.skipped
    )


@app.get("/api/ratings/{rater_id}/{idea_id}", response_model=Rating | None)
def get_rating(rater_id: str, idea_id: str) -> dict[str, Any] | None:
    """Get a specific rating."""
    return db.get_rating(rater_id, idea_id)


@app.get("/api/ratings/rater/{rater_id}", response_model=list[Rating])
def get_ratings_by_rater(rater_id: str) -> list[dict[str, Any]]:
    """Get all ratings by a rater."""
    return db.get_ratings_by_rater(rater_id)


# Progress endpoints
@app.get("/api/progress/{rater_id}", response_model=RaterProgress)
def get_rater_progress(rater_id: str) -> RaterProgress:
    """Get complete progress for a rater."""
    rater = db.get_rater(rater_id)
    if not rater:
        raise HTTPException(status_code=404, detail="Rater not found")

    data = get_assessment_data()

    # Get rated idea counts per query
    ratings = db.get_ratings_by_rater(rater_id)
    ratings_per_query: dict[str, int] = {}
    for r in ratings:
        qid = r['query_id']
        ratings_per_query[qid] = ratings_per_query.get(qid, 0) + 1

    # Build progress list
    query_progress = []
    total_completed = 0
    total_ideas = 0

    for query in data['queries']:
        qid = query['query_id']
        completed = ratings_per_query.get(qid, 0)
        total = query['idea_count']

        query_progress.append(Progress(
            rater_id=rater_id,
            query_id=qid,
            completed_count=completed,
            total_count=total
        ))

        total_completed += completed
        total_ideas += total

    percentage = (total_completed / total_ideas * 100) if total_ideas > 0 else 0

    return RaterProgress(
        rater_id=rater_id,
        queries=query_progress,
        total_completed=total_completed,
        total_ideas=total_ideas,
        percentage=round(percentage, 1)
    )


# Statistics endpoint
@app.get("/api/statistics", response_model=Statistics)
def get_statistics() -> Statistics:
    """Get overall assessment statistics."""
    stats = db.get_statistics()
    return Statistics(**stats)


# Dimension definitions endpoint
@app.get("/api/dimensions")
def get_dimensions() -> dict[str, Any]:
    """Get dimension definitions for the UI."""
    return DIMENSION_DEFINITIONS


# Export endpoint
@app.get("/api/export", response_model=ExportData)
def export_ratings() -> ExportData:
    """Export all ratings with hidden metadata for analysis."""
    data = get_assessment_data()
    all_ratings = db.get_all_ratings()

    # Build idea lookup with hidden metadata
    idea_lookup: dict[str, dict[str, Any]] = {}
    query_lookup: dict[str, str] = {}

    for query in data['queries']:
        query_lookup[query['query_id']] = query['query_text']
        for idea in query['ideas']:
            idea_lookup[idea['idea_id']] = {
                'text': idea['text'],
                'condition': idea['_hidden']['condition'],
                'expert_name': idea['_hidden']['expert_name'],
                'keyword': idea['_hidden']['keyword']
            }

    # Build export ratings
    export_ratings = []
    for r in all_ratings:
        idea_data = idea_lookup.get(r['idea_id'], {})
        export_ratings.append(ExportRating(
            rater_id=r['rater_id'],
            idea_id=r['idea_id'],
            query_id=r['query_id'],
            query_text=query_lookup.get(r['query_id'], ''),
            idea_text=idea_data.get('text', ''),
            originality=r['originality'],
            elaboration=r['elaboration'],
            coherence=r['coherence'],
            usefulness=r['usefulness'],
            skipped=bool(r['skipped']),
            condition=idea_data.get('condition', ''),
            expert_name=idea_data.get('expert_name', ''),
            keyword=idea_data.get('keyword', ''),
            timestamp=r['timestamp']
        ))

    return ExportData(
        experiment_id=data['experiment_id'],
        export_timestamp=datetime.utcnow(),
        rater_count=len(db.list_raters()),
        rating_count=len(export_ratings),
        ratings=export_ratings
    )


# Health check
@app.get("/api/health")
def health_check() -> dict[str, str]:
    """Health check endpoint."""
    return {"status": "healthy"}


# Info endpoint
@app.get("/api/info")
def get_info() -> dict[str, Any]:
    """Get assessment session info."""
    data = get_assessment_data()
    return {
        'experiment_id': data['experiment_id'],
        'total_ideas': data['total_ideas'],
        'query_count': data['query_count'],
        'conditions': data['conditions'],
        'randomization_seed': data['randomization_seed']
    }
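The submit endpoint's "all dimensions or skip" check can be restated as a standalone predicate. This is a minimal sketch for illustration only; `rating_is_valid` is a hypothetical helper, not a function in app.py:

```python
def rating_is_valid(originality, elaboration, coherence, usefulness, skipped=False):
    """A skipped rating needs no scores; otherwise all four dimensions are required."""
    if skipped:
        return True
    # Any missing dimension makes the submission invalid (the API answers 400).
    return None not in (originality, elaboration, coherence, usefulness)

print(rating_is_valid(4, 3, 5, 2))                             # complete rating
print(rating_is_valid(4, None, 5, 2))                          # missing coherence -> rejected
print(rating_is_valid(None, None, None, None, skipped=True))   # skip bypasses the check
```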
309
experiments/assessment/backend/database.py
Normal file
@@ -0,0 +1,309 @@
"""
SQLite database setup and operations for assessment ratings storage.
"""

import sqlite3
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path
from typing import Any, Generator


# Database path
DB_PATH = Path(__file__).parent.parent / 'results' / 'ratings.db'


def get_db_path() -> Path:
    """Get the database path, ensuring directory exists."""
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    return DB_PATH


@contextmanager
def get_connection() -> Generator[sqlite3.Connection, None, None]:
    """Get a database connection as a context manager."""
    conn = sqlite3.connect(get_db_path())
    conn.row_factory = sqlite3.Row
    try:
        yield conn
    finally:
        conn.close()


def init_db() -> None:
    """Initialize the database with required tables."""
    with get_connection() as conn:
        cursor = conn.cursor()

        # Raters table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS raters (
                rater_id TEXT PRIMARY KEY,
                name TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        ''')

        # Ratings table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS ratings (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                rater_id TEXT NOT NULL,
                idea_id TEXT NOT NULL,
                query_id TEXT NOT NULL,
                originality INTEGER CHECK(originality BETWEEN 1 AND 5),
                elaboration INTEGER CHECK(elaboration BETWEEN 1 AND 5),
                coherence INTEGER CHECK(coherence BETWEEN 1 AND 5),
                usefulness INTEGER CHECK(usefulness BETWEEN 1 AND 5),
                skipped INTEGER DEFAULT 0,
                timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                FOREIGN KEY (rater_id) REFERENCES raters(rater_id),
                UNIQUE(rater_id, idea_id)
            )
        ''')

        # Progress table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS progress (
                rater_id TEXT NOT NULL,
                query_id TEXT NOT NULL,
                completed_count INTEGER DEFAULT 0,
                total_count INTEGER DEFAULT 0,
                started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                PRIMARY KEY (rater_id, query_id),
                FOREIGN KEY (rater_id) REFERENCES raters(rater_id)
            )
        ''')

        # Create indexes for common queries
        cursor.execute('''
            CREATE INDEX IF NOT EXISTS idx_ratings_rater
            ON ratings(rater_id)
        ''')
        cursor.execute('''
            CREATE INDEX IF NOT EXISTS idx_ratings_idea
            ON ratings(idea_id)
        ''')

        conn.commit()


# Rater operations
def create_rater(rater_id: str, name: str | None = None) -> dict[str, Any]:
    """Create a new rater."""
    with get_connection() as conn:
        cursor = conn.cursor()
        try:
            cursor.execute(
                'INSERT INTO raters (rater_id, name) VALUES (?, ?)',
                (rater_id, name or rater_id)
            )
            conn.commit()
            return {'rater_id': rater_id, 'name': name or rater_id, 'created': True}
        except sqlite3.IntegrityError:
            # Rater already exists
            return get_rater(rater_id)


def get_rater(rater_id: str) -> dict[str, Any] | None:
    """Get a rater by ID."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('SELECT * FROM raters WHERE rater_id = ?', (rater_id,))
        row = cursor.fetchone()
        if row:
            return dict(row)
        return None


def list_raters() -> list[dict[str, Any]]:
    """List all raters."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('SELECT * FROM raters ORDER BY created_at')
        return [dict(row) for row in cursor.fetchall()]


# Rating operations
def save_rating(
    rater_id: str,
    idea_id: str,
    query_id: str,
    originality: int | None,
    elaboration: int | None,
    coherence: int | None,
    usefulness: int | None,
    skipped: bool = False
) -> dict[str, Any]:
    """Save or update a rating."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('''
            INSERT INTO ratings (rater_id, idea_id, query_id, originality, elaboration, coherence, usefulness, skipped, timestamp)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
            ON CONFLICT(rater_id, idea_id) DO UPDATE SET
                originality = excluded.originality,
                elaboration = excluded.elaboration,
                coherence = excluded.coherence,
                usefulness = excluded.usefulness,
                skipped = excluded.skipped,
                timestamp = excluded.timestamp
        ''', (rater_id, idea_id, query_id, originality, elaboration, coherence, usefulness, int(skipped), datetime.utcnow()))
        conn.commit()

    # Update progress
    update_progress(rater_id, query_id)

    return {
        'rater_id': rater_id,
        'idea_id': idea_id,
        'saved': True
    }


def get_rating(rater_id: str, idea_id: str) -> dict[str, Any] | None:
    """Get a specific rating."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM ratings WHERE rater_id = ? AND idea_id = ?',
            (rater_id, idea_id)
        )
        row = cursor.fetchone()
        if row:
            return dict(row)
        return None


def get_ratings_by_rater(rater_id: str) -> list[dict[str, Any]]:
    """Get all ratings by a rater."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM ratings WHERE rater_id = ? ORDER BY timestamp',
            (rater_id,)
        )
        return [dict(row) for row in cursor.fetchall()]


def get_ratings_by_idea(idea_id: str) -> list[dict[str, Any]]:
    """Get all ratings for an idea."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM ratings WHERE idea_id = ? ORDER BY rater_id',
            (idea_id,)
        )
        return [dict(row) for row in cursor.fetchall()]


def get_all_ratings() -> list[dict[str, Any]]:
    """Get all ratings."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('SELECT * FROM ratings ORDER BY timestamp')
        return [dict(row) for row in cursor.fetchall()]


# Progress operations
def update_progress(rater_id: str, query_id: str) -> None:
    """Update progress for a rater on a query."""
    with get_connection() as conn:
        cursor = conn.cursor()

        # Count completed ratings for this query
        cursor.execute('''
            SELECT COUNT(*) as count FROM ratings
            WHERE rater_id = ? AND query_id = ?
        ''', (rater_id, query_id))
        completed = cursor.fetchone()['count']

        # Update or insert progress
        cursor.execute('''
            INSERT INTO progress (rater_id, query_id, completed_count, updated_at)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(rater_id, query_id) DO UPDATE SET
                completed_count = excluded.completed_count,
                updated_at = excluded.updated_at
        ''', (rater_id, query_id, completed, datetime.utcnow()))
        conn.commit()


def set_progress_total(rater_id: str, query_id: str, total: int) -> None:
    """Set the total count for a query's progress."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('''
            INSERT INTO progress (rater_id, query_id, total_count, completed_count)
            VALUES (?, ?, ?, 0)
            ON CONFLICT(rater_id, query_id) DO UPDATE SET
                total_count = excluded.total_count
        ''', (rater_id, query_id, total))
        conn.commit()


def get_progress(rater_id: str) -> list[dict[str, Any]]:
    """Get progress for all queries for a rater."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM progress WHERE rater_id = ? ORDER BY query_id',
            (rater_id,)
        )
        return [dict(row) for row in cursor.fetchall()]


def get_progress_for_query(rater_id: str, query_id: str) -> dict[str, Any] | None:
    """Get progress for a specific query."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM progress WHERE rater_id = ? AND query_id = ?',
            (rater_id, query_id)
        )
        row = cursor.fetchone()
        if row:
            return dict(row)
        return None


def get_rated_idea_ids(rater_id: str, query_id: str) -> set[str]:
    """Get the set of idea IDs already rated by a rater for a query."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT idea_id FROM ratings WHERE rater_id = ? AND query_id = ?',
            (rater_id, query_id)
        )
        return {row['idea_id'] for row in cursor.fetchall()}


# Statistics
def get_statistics() -> dict[str, Any]:
    """Get overall statistics."""
    with get_connection() as conn:
        cursor = conn.cursor()

        cursor.execute('SELECT COUNT(*) as count FROM raters')
        rater_count = cursor.fetchone()['count']

        cursor.execute('SELECT COUNT(*) as count FROM ratings WHERE skipped = 0')
        rating_count = cursor.fetchone()['count']

        cursor.execute('SELECT COUNT(*) as count FROM ratings WHERE skipped = 1')
        skip_count = cursor.fetchone()['count']

        cursor.execute('SELECT COUNT(DISTINCT idea_id) as count FROM ratings')
        rated_ideas = cursor.fetchone()['count']

        return {
            'rater_count': rater_count,
            'rating_count': rating_count,
            'skip_count': skip_count,
            'rated_ideas': rated_ideas
        }


# Initialize on import
init_db()
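The `UNIQUE(rater_id, idea_id)` constraint plus `ON CONFLICT ... DO UPDATE` is what lets a rater revise a score without creating duplicate rows. A minimal in-memory sketch of that upsert pattern, with the table trimmed to three columns for illustration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row
conn.execute(
    'CREATE TABLE ratings (rater_id TEXT, idea_id TEXT, originality INTEGER, '
    'UNIQUE(rater_id, idea_id))'
)

# The first execute inserts; the second hits the UNIQUE constraint and
# updates the existing row in place instead of raising IntegrityError.
upsert = '''
    INSERT INTO ratings (rater_id, idea_id, originality) VALUES (?, ?, ?)
    ON CONFLICT(rater_id, idea_id) DO UPDATE SET originality = excluded.originality
'''
conn.execute(upsert, ('r1', 'idea-001', 3))
conn.execute(upsert, ('r1', 'idea-001', 5))  # rater revises their score

rows = conn.execute('SELECT * FROM ratings').fetchall()  # still a single row
```

Note that the `ON CONFLICT` upsert syntax requires SQLite 3.24 or newer.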
183
experiments/assessment/backend/models.py
Normal file
@@ -0,0 +1,183 @@
"""
Pydantic models for the assessment API.
"""

from datetime import datetime
from pydantic import BaseModel, Field


# Request models
class RaterCreate(BaseModel):
    """Request to create or login as a rater."""
    rater_id: str = Field(..., min_length=1, max_length=50, description="Unique rater identifier")
    name: str | None = Field(None, max_length=100, description="Optional display name")


class RatingSubmit(BaseModel):
    """Request to submit a rating."""
    rater_id: str = Field(..., description="Rater identifier")
    idea_id: str = Field(..., description="Idea identifier")
    query_id: str = Field(..., description="Query identifier")
    originality: int | None = Field(None, ge=1, le=5, description="Originality score 1-5")
    elaboration: int | None = Field(None, ge=1, le=5, description="Elaboration score 1-5")
    coherence: int | None = Field(None, ge=1, le=5, description="Coherence score 1-5")
    usefulness: int | None = Field(None, ge=1, le=5, description="Usefulness score 1-5")
    skipped: bool = Field(False, description="Whether the idea was skipped")


# Response models
class Rater(BaseModel):
    """Rater information."""
    rater_id: str
    name: str | None
    created_at: datetime | None = None


class Rating(BaseModel):
    """A single rating."""
    id: int
    rater_id: str
    idea_id: str
    query_id: str
    originality: int | None
    elaboration: int | None
    coherence: int | None
    usefulness: int | None
    skipped: int
    timestamp: datetime | None


class Progress(BaseModel):
    """Progress for a rater on a query."""
    rater_id: str
    query_id: str
    completed_count: int
    total_count: int
    started_at: datetime | None = None
    updated_at: datetime | None = None


class QueryInfo(BaseModel):
    """Information about a query."""
    query_id: str
    query_text: str
    category: str
    idea_count: int


class IdeaForRating(BaseModel):
    """An idea presented for rating (without hidden metadata)."""
    idea_id: str
    text: str
    index: int  # Position in the randomized list for this query


class QueryWithIdeas(BaseModel):
    """A query with its ideas for rating."""
    query_id: str
    query_text: str
    category: str
    ideas: list[IdeaForRating]
    total_count: int


class Statistics(BaseModel):
    """Overall statistics."""
    rater_count: int
    rating_count: int
    skip_count: int
    rated_ideas: int


class RaterProgress(BaseModel):
    """Complete progress summary for a rater."""
    rater_id: str
    queries: list[Progress]
    total_completed: int
    total_ideas: int
    percentage: float


# Export response models
class ExportRating(BaseModel):
    """Rating with hidden metadata for export."""
    rater_id: str
    idea_id: str
    query_id: str
    query_text: str
    idea_text: str
    originality: int | None
    elaboration: int | None
    coherence: int | None
    usefulness: int | None
    skipped: bool
    condition: str
    expert_name: str
    keyword: str
    timestamp: datetime | None


class ExportData(BaseModel):
    """Full export data structure."""
    experiment_id: str
    export_timestamp: datetime
    rater_count: int
    rating_count: int
    ratings: list[ExportRating]


# Dimension definitions (for frontend)
DIMENSION_DEFINITIONS = {
    "originality": {
        "name": "Originality",
        "question": "How unexpected or surprising is this idea? Would most people NOT think of this?",
        "scale": {
            1: "Very common/obvious idea anyone would suggest",
            2: "Somewhat common, slight variation on expected ideas",
            3: "Moderately original, some unexpected elements",
            4: "Quite original, notably different approach",
            5: "Highly unexpected, truly novel concept"
        },
        "low_label": "Common",
        "high_label": "Unexpected"
    },
    "elaboration": {
        "name": "Elaboration",
        "question": "How detailed and well-developed is this idea?",
        "scale": {
            1: "Vague, minimal detail, just a concept",
            2: "Basic idea with little specificity",
            3: "Moderately detailed, some specifics provided",
            4: "Well-developed with clear implementation hints",
            5: "Highly specific, thoroughly developed concept"
        },
        "low_label": "Vague",
        "high_label": "Detailed"
    },
    "coherence": {
        "name": "Coherence",
        "question": "Does this idea make logical sense and relate to the query object?",
        "scale": {
            1: "Nonsensical, irrelevant, or incomprehensible",
            2: "Mostly unclear, weak connection to query",
            3: "Partially coherent, some logical gaps",
            4: "Mostly coherent with minor issues",
            5: "Fully coherent, clearly relates to query"
        },
        "low_label": "Nonsense",
        "high_label": "Coherent"
    },
    "usefulness": {
        "name": "Usefulness",
        "question": "Could this idea have practical value or inspire real innovation?",
        "scale": {
            1: "No practical value whatsoever",
            2: "Minimal usefulness, highly impractical",
            3: "Some potential value with major limitations",
            4: "Useful idea with realistic applications",
            5: "Highly useful, clear practical value"
        },
        "low_label": "Useless",
        "high_label": "Useful"
    }
}
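The frontend fetches these definitions from `/api/dimensions` rather than hard-coding them. A sketch of how a client might turn one entry into slider endpoint marks; `slider_marks` and the trimmed dict below are illustrative only, not part of the codebase:

```python
# Hypothetical trimmed copy of one DIMENSION_DEFINITIONS entry from models.py.
DIMENSION_DEFINITIONS = {
    "originality": {
        "name": "Originality",
        "scale": {1: "Very common/obvious idea anyone would suggest",
                  5: "Highly unexpected, truly novel concept"},
        "low_label": "Common",
        "high_label": "Unexpected",
    },
}

def slider_marks(dimension: str) -> dict[int, str]:
    """Map a dimension's endpoint labels onto a 1-5 slider, as a UI might."""
    d = DIMENSION_DEFINITIONS[dimension]
    return {1: d["low_label"], 5: d["high_label"]}

print(slider_marks("originality"))  # {1: 'Common', 5: 'Unexpected'}
```

Keeping the scale text server-side means the rating instructions shown to raters and the exported analysis share one source of truth.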
3
experiments/assessment/backend/requirements.txt
Normal file
@@ -0,0 +1,3 @@
fastapi>=0.109.0
uvicorn>=0.27.0
pydantic>=2.5.0
1832
experiments/assessment/data/assessment_items.json
Normal file
File diff suppressed because it is too large
13
experiments/assessment/frontend/index.html
Normal file
@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Creative Idea Assessment</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
</html>
4221
experiments/assessment/frontend/package-lock.json
generated
Normal file
File diff suppressed because it is too large
32
experiments/assessment/frontend/package.json
Normal file
@@ -0,0 +1,32 @@
{
  "name": "assessment-frontend",
  "private": true,
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "tsc -b && vite build",
    "lint": "eslint .",
    "preview": "vite preview"
  },
  "dependencies": {
    "@ant-design/icons": "^6.1.0",
    "antd": "^6.0.0",
    "react": "^19.2.0",
    "react-dom": "^19.2.0"
  },
  "devDependencies": {
    "@eslint/js": "^9.39.1",
    "@types/node": "^24.10.1",
    "@types/react": "^19.2.5",
    "@types/react-dom": "^19.2.3",
    "@vitejs/plugin-react": "^5.1.1",
    "eslint": "^9.39.1",
    "eslint-plugin-react-hooks": "^7.0.1",
    "eslint-plugin-react-refresh": "^0.4.24",
    "globals": "^16.5.0",
    "typescript": "~5.9.3",
    "typescript-eslint": "^8.46.4",
    "vite": "^7.2.4"
  }
}
109
experiments/assessment/frontend/src/App.tsx
Normal file
@@ -0,0 +1,109 @@
/**
 * Main application component for the assessment interface.
 */

import { ConfigProvider, theme, Spin } from 'antd';
import { useAssessment } from './hooks/useAssessment';
import { RaterLogin } from './components/RaterLogin';
import { InstructionsPage } from './components/InstructionsPage';
import { AssessmentPage } from './components/AssessmentPage';
import { CompletionPage } from './components/CompletionPage';

function App() {
  const assessment = useAssessment();

  const renderContent = () => {
    // Show loading spinner for initial load
    if (assessment.loading && !assessment.rater) {
      return (
        <div style={{
          display: 'flex',
          justifyContent: 'center',
          alignItems: 'center',
          minHeight: '100vh'
        }}>
          <Spin size="large" />
        </div>
      );
    }

    switch (assessment.view) {
      case 'login':
        return (
          <RaterLogin
            onLogin={assessment.login}
            loading={assessment.loading}
            error={assessment.error}
          />
        );

      case 'instructions':
        return (
          <InstructionsPage
            dimensions={assessment.dimensions}
            onStart={assessment.startAssessment}
            loading={assessment.loading}
          />
        );

      case 'assessment':
        if (!assessment.rater || !assessment.currentQuery || !assessment.currentIdea || !assessment.dimensions) {
          return (
            <div style={{
              display: 'flex',
              justifyContent: 'center',
              alignItems: 'center',
              minHeight: '100vh'
            }}>
              <Spin size="large" tip="Loading..." />
            </div>
          );
        }
        return (
          <AssessmentPage
            raterId={assessment.rater.rater_id}
            queryId={assessment.currentQuery.query_id}
            queryText={assessment.currentQuery.query_text}
            idea={assessment.currentIdea}
            ideaIndex={assessment.currentIdeaIndex}
            totalIdeas={assessment.currentQuery.total_count}
            dimensions={assessment.dimensions}
            progress={assessment.progress}
            onNext={assessment.nextIdea}
            onPrev={assessment.prevIdea}
            onShowDefinitions={assessment.showInstructions}
            onLogout={assessment.logout}
            canGoPrev={assessment.currentIdeaIndex > 0}
          />
        );

      case 'completion':
        return (
          <CompletionPage
            raterId={assessment.rater?.rater_id ?? ''}
            progress={assessment.progress}
            onLogout={assessment.logout}
          />
        );

      default:
        return null;
    }
  };

  return (
    <ConfigProvider
      theme={{
        algorithm: theme.defaultAlgorithm,
        token: {
          colorPrimary: '#1677ff',
          borderRadius: 6,
        },
      }}
    >
      {renderContent()}
    </ConfigProvider>
  );
}

export default App;
@@ -0,0 +1,199 @@
/**
 * Main assessment page for rating ideas.
 */

import { Card, Button, Space, Alert, Typography } from 'antd';
import {
  ArrowLeftOutlined,
  ArrowRightOutlined,
  ForwardOutlined,
  BookOutlined,
  LogoutOutlined
} from '@ant-design/icons';
import type { IdeaForRating, DimensionDefinitions, RaterProgress } from '../types';
import { useRatings } from '../hooks/useRatings';
import { IdeaCard } from './IdeaCard';
import { RatingSlider } from './RatingSlider';
import { ProgressBar } from './ProgressBar';

const { Text } = Typography;

interface AssessmentPageProps {
  raterId: string;
  queryId: string;
  queryText: string;
  idea: IdeaForRating;
  ideaIndex: number;
  totalIdeas: number;
  dimensions: DimensionDefinitions;
  progress: RaterProgress | null;
  onNext: () => void;
  onPrev: () => void;
  onShowDefinitions: () => void;
  onLogout: () => void;
  canGoPrev: boolean;
}

export function AssessmentPage({
  raterId,
  queryId,
  queryText,
  idea,
  ideaIndex,
  totalIdeas,
  dimensions,
  progress,
  onNext,
  onPrev,
  onShowDefinitions,
  onLogout,
  canGoPrev
}: AssessmentPageProps) {
  const {
    ratings,
    setRating,
    isComplete,
    submit,
    skip,
    submitting,
    error
  } = useRatings({
    raterId,
    queryId,
    ideaId: idea.idea_id,
    onSuccess: onNext
  });

  const handleSubmit = async () => {
    await submit();
  };

  const handleSkip = async () => {
    await skip();
  };

  // Calculate query progress
  const queryProgress = progress?.queries.find(q => q.query_id === queryId);
  const queryCompleted = queryProgress?.completed_count ?? ideaIndex;
  const queryTotal = totalIdeas;

  return (
    <div style={{ maxWidth: 800, margin: '0 auto', padding: 24 }}>
      {/* Header with query info and overall progress */}
      <Card size="small" style={{ marginBottom: 16 }}>
        <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center', marginBottom: 8 }}>
          <Text strong style={{ fontSize: 16 }}>Query: "{queryText}"</Text>
          <Space>
            <Button
              icon={<BookOutlined />}
              onClick={onShowDefinitions}
              size="small"
            >
              Definitions
            </Button>
            <Button
              icon={<LogoutOutlined />}
              onClick={onLogout}
              size="small"
              danger
            >
              Exit
            </Button>
          </Space>
        </div>
        <ProgressBar
          completed={queryCompleted}
          total={queryTotal}
          label="Query Progress"
        />
        {progress && (
          <div style={{ marginTop: 8 }}>
            <ProgressBar
              completed={progress.total_completed}
              total={progress.total_ideas}
              label="Overall Progress"
            />
          </div>
        )}
      </Card>

      {/* Error display */}
      {error && (
        <Alert
          message={error}
          type="error"
          showIcon
          closable
          style={{ marginBottom: 16 }}
        />
      )}

      {/* Idea card */}
      <IdeaCard
        ideaNumber={ideaIndex + 1}
        text={idea.text}
        queryText={queryText}
      />

      {/* Rating inputs */}
      <Card style={{ marginBottom: 16 }}>
        <RatingSlider
          dimension={dimensions.originality}
          value={ratings.originality}
          onChange={(v) => setRating('originality', v)}
          disabled={submitting}
        />
        <RatingSlider
          dimension={dimensions.elaboration}
          value={ratings.elaboration}
          onChange={(v) => setRating('elaboration', v)}
          disabled={submitting}
        />
        <RatingSlider
          dimension={dimensions.coherence}
          value={ratings.coherence}
          onChange={(v) => setRating('coherence', v)}
          disabled={submitting}
        />
        <RatingSlider
          dimension={dimensions.usefulness}
          value={ratings.usefulness}
          onChange={(v) => setRating('usefulness', v)}
          disabled={submitting}
        />
      </Card>

      {/* Navigation buttons */}
      <Card>
        <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
          <Button
            icon={<ArrowLeftOutlined />}
            onClick={onPrev}
            disabled={!canGoPrev || submitting}
          >
            Back
          </Button>

          <Space>
            <Button
              icon={<ForwardOutlined />}
              onClick={handleSkip}
              loading={submitting}
            >
              Skip
            </Button>
            <Button
              type="primary"
              icon={<ArrowRightOutlined />}
              onClick={handleSubmit}
              loading={submitting}
              disabled={!isComplete()}
            >
              Submit & Next
            </Button>
          </Space>
        </div>
      </Card>
    </div>
  );
}
@@ -0,0 +1,105 @@
/**
 * Completion page shown when all ideas have been rated.
 */

import { Card, Button, Typography, Space, Result, Statistic, Row, Col } from 'antd';
import { CheckCircleOutlined, BarChartOutlined, LogoutOutlined } from '@ant-design/icons';
import type { RaterProgress } from '../types';

const { Title, Text } = Typography;

interface CompletionPageProps {
  raterId: string;
  progress: RaterProgress | null;
  onLogout: () => void;
}

export function CompletionPage({ raterId, progress, onLogout }: CompletionPageProps) {
  const completed = progress?.total_completed ?? 0;
  const total = progress?.total_ideas ?? 0;
  const percentage = progress?.percentage ?? 0;

  // Guard against total === 0 (progress not loaded), which would make
  // completed >= total vacuously true.
  const isFullyComplete = total > 0 && completed >= total;

  return (
    <div style={{
      display: 'flex',
      justifyContent: 'center',
      alignItems: 'center',
      minHeight: '100vh',
      padding: 24
    }}>
      <Card style={{ maxWidth: 600, width: '100%' }}>
        <Result
          status={isFullyComplete ? 'success' : 'info'}
          icon={isFullyComplete ? <CheckCircleOutlined /> : <BarChartOutlined />}
          title={isFullyComplete ? 'Assessment Complete!' : 'Session Summary'}
          subTitle={
            isFullyComplete
              ? 'Thank you for completing the assessment.'
              : 'You have made progress on the assessment.'
          }
          extra={[
            <Button
              type="primary"
              key="logout"
              icon={<LogoutOutlined />}
              onClick={onLogout}
            >
              Exit
            </Button>
          ]}
        >
          <Row gutter={16} style={{ marginTop: 24 }}>
            <Col span={8}>
              <Statistic
                title="Ideas Rated"
                value={completed}
                suffix={`/ ${total}`}
              />
            </Col>
            <Col span={8}>
              <Statistic
                title="Progress"
                value={percentage}
                suffix="%"
                precision={1}
              />
            </Col>
            <Col span={8}>
              <Statistic
                title="Rater ID"
                value={raterId}
                valueStyle={{ fontSize: 16 }}
              />
            </Col>
          </Row>

          {progress && progress.queries.length > 0 && (
            <div style={{ marginTop: 24 }}>
              <Title level={5}>Progress by Query</Title>
              <Space direction="vertical" style={{ width: '100%' }}>
                {progress.queries.map((q) => (
                  <div
                    key={q.query_id}
                    style={{
                      display: 'flex',
                      justifyContent: 'space-between',
                      padding: '4px 0'
                    }}
                  >
                    <Text>{q.query_id}</Text>
                    <Text type={q.completed_count >= q.total_count ? 'success' : 'secondary'}>
                      {q.completed_count} / {q.total_count}
                      {q.completed_count >= q.total_count && ' ✓'}
                    </Text>
                  </div>
                ))}
              </Space>
            </div>
          )}
        </Result>
      </Card>
    </div>
  );
}
36
experiments/assessment/frontend/src/components/IdeaCard.tsx
Normal file
@@ -0,0 +1,36 @@
/**
 * Card displaying a single idea for rating.
 */

import { Card, Typography, Tag } from 'antd';

const { Text, Paragraph } = Typography;

interface IdeaCardProps {
  ideaNumber: number;
  text: string;
  queryText: string;
}

export function IdeaCard({ ideaNumber, text, queryText }: IdeaCardProps) {
  return (
    <Card
      title={
        <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
          <Text strong>IDEA #{ideaNumber}</Text>
          <Tag color="blue">Query: {queryText}</Tag>
        </div>
      }
      style={{ marginBottom: 24 }}
    >
      <Paragraph style={{
        fontSize: 16,
        lineHeight: 1.8,
        margin: 0,
        padding: '8px 0'
      }}>
        "{text}"
      </Paragraph>
    </Card>
  );
}
@@ -0,0 +1,134 @@
/**
 * Instructions page showing dimension definitions.
 */

import { Fragment, useState } from 'react';
import { Card, Button, Typography, Space, Checkbox, Divider, Tag } from 'antd';
import { PlayCircleOutlined } from '@ant-design/icons';
import type { DimensionDefinitions } from '../types';

const { Title, Text, Paragraph } = Typography;

interface InstructionsPageProps {
  dimensions: DimensionDefinitions | null;
  onStart: () => void;
  onBack?: () => void;
  loading: boolean;
  isReturning?: boolean;
}

export function InstructionsPage({
  dimensions,
  onStart,
  onBack,
  loading,
  isReturning = false
}: InstructionsPageProps) {
  const [acknowledged, setAcknowledged] = useState(isReturning);

  if (!dimensions) {
    return (
      <div style={{ padding: 24, textAlign: 'center' }}>
        <Text>Loading instructions...</Text>
      </div>
    );
  }

  const dimensionOrder = ['originality', 'elaboration', 'coherence', 'usefulness'] as const;

  return (
    <div style={{
      maxWidth: 800,
      margin: '0 auto',
      padding: 24
    }}>
      <Card>
        <Space direction="vertical" size="large" style={{ width: '100%' }}>
          <div style={{ textAlign: 'center' }}>
            <Title level={2}>Assessment Instructions</Title>
            <Paragraph type="secondary">
              You will rate creative ideas on 4 dimensions using a 1-5 scale.
              Please read each definition carefully before beginning.
            </Paragraph>
          </div>

          <Divider />

          {dimensionOrder.map((key) => {
            const dim = dimensions[key];
            return (
              <Card
                key={key}
                size="small"
                title={
                  <Space>
                    <Tag color="blue">{dim.name}</Tag>
                    <Text type="secondary">{dim.question}</Text>
                  </Space>
                }
                style={{ marginBottom: 16 }}
              >
                <div style={{
                  display: 'grid',
                  gridTemplateColumns: 'auto 1fr',
                  gap: '8px 16px',
                  fontSize: 14
                }}>
                  {/* Key the Fragment itself: keys on children of an unkeyed
                      fragment do not satisfy React's list-key requirement. */}
                  {([1, 2, 3, 4, 5] as const).map((score) => (
                    <Fragment key={score}>
                      <Tag color={score <= 2 ? 'red' : score === 3 ? 'orange' : 'green'}>
                        {score}
                      </Tag>
                      <Text>{dim.scale[score]}</Text>
                    </Fragment>
                  ))}
                </div>
                <Divider style={{ margin: '12px 0' }} />
                <div style={{ display: 'flex', justifyContent: 'space-between' }}>
                  <Text type="secondary">{dim.low_label}</Text>
                  <Text type="secondary">{dim.high_label}</Text>
                </div>
              </Card>
            );
          })}

          <Divider />

          <Space direction="vertical" style={{ width: '100%' }}>
            {!isReturning && (
              <Checkbox
                checked={acknowledged}
                onChange={(e) => setAcknowledged(e.target.checked)}
              >
                I have read and understood the instructions
              </Checkbox>
            )}

            <Space style={{ width: '100%', justifyContent: 'center' }}>
              {onBack && (
                <Button onClick={onBack}>
                  Back to Assessment
                </Button>
              )}
              <Button
                type="primary"
                size="large"
                icon={<PlayCircleOutlined />}
                onClick={onStart}
                loading={loading}
                disabled={!acknowledged}
              >
                {isReturning ? 'Continue Rating' : 'Begin Rating'}
              </Button>
            </Space>
          </Space>
        </Space>
      </Card>
    </div>
  );
}
@@ -0,0 +1,39 @@
/**
 * Progress bar component showing assessment progress.
 */

import { Progress, Typography, Space } from 'antd';

const { Text } = Typography;

interface ProgressBarProps {
  completed: number;
  total: number;
  label?: string;
}

export function ProgressBar({ completed, total, label }: ProgressBarProps) {
  const percentage = total > 0 ? Math.round((completed / total) * 100) : 0;

  return (
    <div style={{ width: '100%' }}>
      {label && (
        <Space style={{ marginBottom: 4, justifyContent: 'space-between', width: '100%' }}>
          <Text type="secondary">{label}</Text>
          <Text type="secondary">
            {completed}/{total} ({percentage}%)
          </Text>
        </Space>
      )}
      <Progress
        percent={percentage}
        showInfo={!label}
        status="active"
        strokeColor={{
          '0%': '#108ee9',
          '100%': '#87d068',
        }}
      />
    </div>
  );
}
116
experiments/assessment/frontend/src/components/RaterLogin.tsx
Normal file
@@ -0,0 +1,116 @@
/**
 * Rater login component.
 */

import { useState, useEffect } from 'react';
import { Card, Input, Button, Typography, Space, List, Alert } from 'antd';
import { UserOutlined, LoginOutlined } from '@ant-design/icons';
import * as api from '../services/api';
import type { Rater } from '../types';

const { Title, Text } = Typography;

interface RaterLoginProps {
  onLogin: (raterId: string, name?: string) => void;
  loading: boolean;
  error: string | null;
}

export function RaterLogin({ onLogin, loading, error }: RaterLoginProps) {
  const [raterId, setRaterId] = useState('');
  const [existingRaters, setExistingRaters] = useState<Rater[]>([]);

  useEffect(() => {
    api.listRaters()
      .then(setExistingRaters)
      .catch(console.error);
  }, []);

  const handleLogin = () => {
    if (raterId.trim()) {
      onLogin(raterId.trim());
    }
  };

  const handleQuickLogin = (rater: Rater) => {
    onLogin(rater.rater_id);
  };

  return (
    <div style={{
      display: 'flex',
      justifyContent: 'center',
      alignItems: 'center',
      minHeight: '100vh',
      padding: 24
    }}>
      <Card
        style={{ width: 400, maxWidth: '100%' }}
        styles={{ body: { padding: 32 } }}
      >
        <Space direction="vertical" size="large" style={{ width: '100%' }}>
          <div style={{ textAlign: 'center' }}>
            <Title level={3} style={{ marginBottom: 8 }}>
              Creative Idea Assessment
            </Title>
            <Text type="secondary">
              Enter your rater ID to begin
            </Text>
          </div>

          {error && (
            <Alert message={error} type="error" showIcon />
          )}

          <Input
            size="large"
            placeholder="Enter your rater ID"
            prefix={<UserOutlined />}
            value={raterId}
            onChange={(e) => setRaterId(e.target.value)}
            onPressEnter={handleLogin}
            disabled={loading}
          />

          <Button
            type="primary"
            size="large"
            icon={<LoginOutlined />}
            onClick={handleLogin}
            loading={loading}
            disabled={!raterId.trim()}
            block
          >
            Start Assessment
          </Button>

          {existingRaters.length > 0 && (
            <div>
              <Text type="secondary" style={{ display: 'block', marginBottom: 8 }}>
                Existing raters:
              </Text>
              <List
                size="small"
                bordered
                dataSource={existingRaters}
                renderItem={(rater) => (
                  <List.Item
                    style={{ cursor: 'pointer' }}
                    onClick={() => handleQuickLogin(rater)}
                  >
                    <Text code>{rater.rater_id}</Text>
                    {rater.name && rater.name !== rater.rater_id && (
                      <Text type="secondary" style={{ marginLeft: 8 }}>
                        ({rater.name})
                      </Text>
                    )}
                  </List.Item>
                )}
              />
            </div>
          )}
        </Space>
      </Card>
    </div>
  );
}
@@ -0,0 +1,74 @@
/**
 * Rating input component with radio buttons for 1-5 scale.
 */

import { Radio, Typography, Space, Tooltip, Button } from 'antd';
import { QuestionCircleOutlined } from '@ant-design/icons';
import type { DimensionDefinition } from '../types';

const { Text } = Typography;

interface RatingSliderProps {
  dimension: DimensionDefinition;
  value: number | null;
  onChange: (value: number | null) => void;
  disabled?: boolean;
}

export function RatingSlider({ dimension, value, onChange, disabled }: RatingSliderProps) {
  return (
    <div style={{ marginBottom: 24 }}>
      <div style={{ display: 'flex', alignItems: 'center', marginBottom: 8 }}>
        <Text strong style={{ marginRight: 8 }}>
          {dimension.name.toUpperCase()}
        </Text>
        <Tooltip
          title={
            <div>
              <p style={{ marginBottom: 8 }}>{dimension.question}</p>
              {([1, 2, 3, 4, 5] as const).map((score) => (
                <div key={score} style={{ marginBottom: 4 }}>
                  <strong>{score}:</strong> {dimension.scale[score]}
                </div>
              ))}
            </div>
          }
          placement="right"
          overlayStyle={{ maxWidth: 400 }}
        >
          <Button
            type="text"
            size="small"
            icon={<QuestionCircleOutlined />}
            style={{ padding: 0, height: 'auto' }}
          />
        </Tooltip>
      </div>

      <div style={{ display: 'flex', alignItems: 'center', gap: 16 }}>
        <Text type="secondary" style={{ minWidth: 80, textAlign: 'right' }}>
          {dimension.low_label}
        </Text>

        <Radio.Group
          value={value}
          onChange={(e) => onChange(e.target.value)}
          disabled={disabled}
          style={{ flex: 1 }}
        >
          <Space size="large">
            {[1, 2, 3, 4, 5].map((score) => (
              <Radio key={score} value={score}>
                {score}
              </Radio>
            ))}
          </Space>
        </Radio.Group>

        <Text type="secondary" style={{ minWidth: 80 }}>
          {dimension.high_label}
        </Text>
      </div>
    </div>
  );
}
272
experiments/assessment/frontend/src/hooks/useAssessment.ts
Normal file
@@ -0,0 +1,272 @@
/**
 * Hook for managing the assessment session state.
 */

import { useState, useCallback, useEffect } from 'react';
import type {
  AppView,
  DimensionDefinitions,
  QueryInfo,
  QueryWithIdeas,
  Rater,
  RaterProgress,
} from '../types';
import * as api from '../services/api';

interface AssessmentState {
  view: AppView;
  rater: Rater | null;
  queries: QueryInfo[];
  currentQueryIndex: number;
  currentQuery: QueryWithIdeas | null;
  currentIdeaIndex: number;
  progress: RaterProgress | null;
  dimensions: DimensionDefinitions | null;
  loading: boolean;
  error: string | null;
}

const initialState: AssessmentState = {
  view: 'login',
  rater: null,
  queries: [],
  currentQueryIndex: 0,
  currentQuery: null,
  currentIdeaIndex: 0,
  progress: null,
  dimensions: null,
  loading: false,
  error: null,
};

export function useAssessment() {
  const [state, setState] = useState<AssessmentState>(initialState);

  // Load dimension definitions on mount
  useEffect(() => {
    api.getDimensionDefinitions()
      .then((dimensions) => setState((s) => ({ ...s, dimensions })))
      .catch((err) => console.error('Failed to load dimensions:', err));
  }, []);

  // Login as a rater
  const login = useCallback(async (raterId: string, name?: string) => {
    setState((s) => ({ ...s, loading: true, error: null }));
    try {
      const rater = await api.createOrGetRater({ rater_id: raterId, name });
      const queries = await api.listQueries();
      const progress = await api.getRaterProgress(raterId);

      setState((s) => ({
        ...s,
        rater,
        queries,
        progress,
        view: 'instructions',
        loading: false,
      }));
    } catch (err) {
      setState((s) => ({
        ...s,
        error: err instanceof Error ? err.message : 'Login failed',
        loading: false,
      }));
    }
  }, []);

  // Start assessment (move from instructions to assessment)
  const startAssessment = useCallback(async () => {
    if (!state.rater || state.queries.length === 0) return;

    setState((s) => ({ ...s, loading: true }));
    try {
      // Find first query with unrated ideas
      let queryIndex = 0;
      let queryData: QueryWithIdeas | null = null;

      for (let i = 0; i < state.queries.length; i++) {
        const unrated = await api.getUnratedIdeas(state.queries[i].query_id, state.rater.rater_id);
        if (unrated.ideas.length > 0) {
          queryIndex = i;
          queryData = unrated;
          break;
        }
      }

      if (!queryData) {
        // All done
        setState((s) => ({
          ...s,
          view: 'completion',
          loading: false,
        }));
        return;
      }

      setState((s) => ({
        ...s,
        view: 'assessment',
        currentQueryIndex: queryIndex,
        currentQuery: queryData,
        currentIdeaIndex: 0,
        loading: false,
      }));
    } catch (err) {
      setState((s) => ({
        ...s,
        error: err instanceof Error ? err.message : 'Failed to start assessment',
        loading: false,
      }));
    }
  }, [state.rater, state.queries]);

  // Move to next idea
  const nextIdea = useCallback(async () => {
    if (!state.currentQuery || !state.rater) return;

    const nextIndex = state.currentIdeaIndex + 1;

    if (nextIndex < state.currentQuery.ideas.length) {
      // More ideas in current query
      setState((s) => ({ ...s, currentIdeaIndex: nextIndex }));
    } else {
      // Query complete, try to move to next query
      const nextQueryIndex = state.currentQueryIndex + 1;

      if (nextQueryIndex < state.queries.length) {
        setState((s) => ({ ...s, loading: true }));
        try {
          const unrated = await api.getUnratedIdeas(
            state.queries[nextQueryIndex].query_id,
            state.rater.rater_id
          );

          if (unrated.ideas.length > 0) {
            setState((s) => ({
              ...s,
              currentQueryIndex: nextQueryIndex,
              currentQuery: unrated,
              currentIdeaIndex: 0,
              loading: false,
            }));
          } else {
            // Try to find next query with unrated ideas
            for (let i = nextQueryIndex + 1; i < state.queries.length; i++) {
              const nextUnrated = await api.getUnratedIdeas(
                state.queries[i].query_id,
                state.rater.rater_id
              );
              if (nextUnrated.ideas.length > 0) {
                setState((s) => ({
                  ...s,
                  currentQueryIndex: i,
                  currentQuery: nextUnrated,
                  currentIdeaIndex: 0,
                  loading: false,
                }));
                return;
              }
            }
            // All queries complete
            setState((s) => ({
              ...s,
              view: 'completion',
              loading: false,
            }));
          }
        } catch (err) {
          setState((s) => ({
            ...s,
            error: err instanceof Error ? err.message : 'Failed to load next query',
            loading: false,
          }));
        }
      } else {
        // All queries complete
        setState((s) => ({ ...s, view: 'completion' }));
      }
    }

    // Refresh progress
    try {
      const progress = await api.getRaterProgress(state.rater.rater_id);
      setState((s) => ({ ...s, progress }));
    } catch (err) {
      console.error('Failed to refresh progress:', err);
    }
  }, [state.currentQuery, state.currentIdeaIndex, state.currentQueryIndex, state.queries, state.rater]);

  // Move to previous idea
  const prevIdea = useCallback(() => {
    if (state.currentIdeaIndex > 0) {
      setState((s) => ({ ...s, currentIdeaIndex: s.currentIdeaIndex - 1 }));
    }
  }, [state.currentIdeaIndex]);

  // Jump to a specific query
  const jumpToQuery = useCallback(async (queryIndex: number) => {
    if (!state.rater || queryIndex < 0 || queryIndex >= state.queries.length) return;

    setState((s) => ({ ...s, loading: true }));
    try {
      const queryData = await api.getQueryWithIdeas(state.queries[queryIndex].query_id);
      setState((s) => ({
        ...s,
        currentQueryIndex: queryIndex,
        currentQuery: queryData,
        currentIdeaIndex: 0,
        view: 'assessment',
        loading: false,
      }));
    } catch (err) {
      setState((s) => ({
        ...s,
        error: err instanceof Error ? err.message : 'Failed to load query',
        loading: false,
      }));
    }
  }, [state.rater, state.queries]);

  // Refresh progress
  const refreshProgress = useCallback(async () => {
    if (!state.rater) return;
    try {
      const progress = await api.getRaterProgress(state.rater.rater_id);
      setState((s) => ({ ...s, progress }));
    } catch (err) {
      console.error('Failed to refresh progress:', err);
    }
  }, [state.rater]);

  // Show definitions
  const showInstructions = useCallback(() => {
    setState((s) => ({ ...s, view: 'instructions' }));
  }, []);

  // Return to assessment
  const returnToAssessment = useCallback(() => {
    setState((s) => ({ ...s, view: 'assessment' }));
  }, []);

  // Logout
  const logout = useCallback(() => {
    setState(initialState);
  }, []);

  // Get current idea
  const currentIdea = state.currentQuery?.ideas[state.currentIdeaIndex] ?? null;

  return {
    ...state,
    currentIdea,
    login,
    startAssessment,
    nextIdea,
    prevIdea,
    jumpToQuery,
    refreshProgress,
    showInstructions,
    returnToAssessment,
    logout,
  };
}
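`startAssessment` above resumes a rater's session by scanning queries in order and picking the first one that still has unrated ideas. A minimal standalone sketch of that resume scan, over plain data instead of API calls (the `QueryIdeas` shape and `findResumePoint` name are illustrative, not part of the codebase):

```typescript
// Sketch of the resume logic: pick the first query with work left.
// `unratedCount` stands in for the length of getUnratedIdeas(...).ideas.
interface QueryIdeas {
  query_id: string;
  unratedCount: number;
}

function findResumePoint(queries: QueryIdeas[]): number {
  // Index of the first query with unrated ideas, or -1 when everything is rated
  // (the hook's equivalent of switching to the 'completion' view).
  return queries.findIndex((q) => q.unratedCount > 0);
}

const queries: QueryIdeas[] = [
  { query_id: 'q1', unratedCount: 0 },
  { query_id: 'q2', unratedCount: 3 },
  { query_id: 'q3', unratedCount: 5 },
];

console.log(findResumePoint(queries)); // → 1
console.log(findResumePoint(queries.map((q) => ({ ...q, unratedCount: 0 })))); // → -1
```

The real hook performs this scan with sequential awaited requests so it never fetches more queries than needed to find the resume point.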
133
experiments/assessment/frontend/src/hooks/useRatings.ts
Normal file
@@ -0,0 +1,133 @@
/**
 * Hook for managing rating submission.
 */

import { useState, useCallback } from 'react';
import type { RatingState, DimensionKey } from '../types';
import * as api from '../services/api';

interface UseRatingsOptions {
  raterId: string | null;
  queryId: string | null;
  ideaId: string | null;
  onSuccess?: () => void;
}

export function useRatings({ raterId, queryId, ideaId, onSuccess }: UseRatingsOptions) {
  const [ratings, setRatings] = useState<RatingState>({
    originality: null,
    elaboration: null,
    coherence: null,
    usefulness: null,
  });
  const [submitting, setSubmitting] = useState(false);
  const [error, setError] = useState<string | null>(null);

  // Set a single rating
  const setRating = useCallback((dimension: DimensionKey, value: number | null) => {
    setRatings((prev) => ({ ...prev, [dimension]: value }));
  }, []);

  // Reset all ratings
  const resetRatings = useCallback(() => {
    setRatings({
      originality: null,
      elaboration: null,
      coherence: null,
      usefulness: null,
    });
    setError(null);
  }, []);

  // Check if all ratings are set
  const isComplete = useCallback(() => {
    return (
      ratings.originality !== null &&
      ratings.elaboration !== null &&
      ratings.coherence !== null &&
      ratings.usefulness !== null
    );
  }, [ratings]);

  // Submit rating
  const submit = useCallback(async () => {
    if (!raterId || !queryId || !ideaId) {
      setError('Missing required information');
      return false;
    }

    if (!isComplete()) {
      setError('Please rate all dimensions');
      return false;
    }

    setSubmitting(true);
    setError(null);

    try {
      await api.submitRating({
        rater_id: raterId,
        idea_id: ideaId,
        query_id: queryId,
        originality: ratings.originality,
        elaboration: ratings.elaboration,
        coherence: ratings.coherence,
        usefulness: ratings.usefulness,
        skipped: false,
      });

      resetRatings();
      onSuccess?.();
      return true;
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Failed to submit rating');
      return false;
    } finally {
      setSubmitting(false);
    }
  }, [raterId, queryId, ideaId, ratings, isComplete, resetRatings, onSuccess]);

  // Skip idea
  const skip = useCallback(async () => {
    if (!raterId || !queryId || !ideaId) {
      setError('Missing required information');
      return false;
    }

    setSubmitting(true);
    setError(null);

    try {
      await api.submitRating({
        rater_id: raterId,
        idea_id: ideaId,
        query_id: queryId,
        originality: null,
        elaboration: null,
        coherence: null,
        usefulness: null,
        skipped: true,
      });

      resetRatings();
      onSuccess?.();
      return true;
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Failed to skip idea');
      return false;
    } finally {
      setSubmitting(false);
    }
  }, [raterId, queryId, ideaId, resetRatings, onSuccess]);

  return {
    ratings,
    setRating,
    resetRatings,
    isComplete,
    submit,
    skip,
    submitting,
    error,
  };
}
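The hook accepts a submission only when all four dimensions are set; skips deliberately bypass that check and post `null`s with `skipped: true`. The gate can be sketched as a pure function over the `RatingState` shape (standalone, no React; the local type mirrors the one in `../types`):

```typescript
// Standalone sketch of the useRatings completeness gate.
type RatingState = {
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
};

function isComplete(ratings: RatingState): boolean {
  // Submit is enabled only when every dimension has a 1-5 value.
  return Object.values(ratings).every((v) => v !== null);
}

const partial: RatingState = { originality: 4, elaboration: null, coherence: 3, usefulness: 5 };
const full: RatingState = { originality: 4, elaboration: 2, coherence: 3, usefulness: 5 };

console.log(isComplete(partial)); // → false
console.log(isComplete(full)); // → true
```

Keeping the gate client-side means a rater can never store a partially rated idea as a normal rating; the only partial records are explicit skips.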
43
experiments/assessment/frontend/src/index.css
Normal file
@@ -0,0 +1,43 @@
:root {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
  line-height: 1.5;
  font-weight: 400;

  color-scheme: light;
  color: rgba(0, 0, 0, 0.88);
  background-color: #f5f5f5;

  font-synthesis: none;
  text-rendering: optimizeLegibility;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

body {
  margin: 0;
  min-height: 100vh;
}

#root {
  min-height: 100vh;
}

/* Custom scrollbar */
::-webkit-scrollbar {
  width: 8px;
  height: 8px;
}

::-webkit-scrollbar-track {
  background: #f1f1f1;
  border-radius: 4px;
}

::-webkit-scrollbar-thumb {
  background: #c1c1c1;
  border-radius: 4px;
}

::-webkit-scrollbar-thumb:hover {
  background: #a8a8a8;
}
10
experiments/assessment/frontend/src/main.tsx
Normal file
@@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App'

createRoot(document.getElementById('root')!).render(
  <StrictMode>
    <App />
  </StrictMode>,
)
116
experiments/assessment/frontend/src/services/api.ts
Normal file
@@ -0,0 +1,116 @@
/**
 * API client for the assessment backend.
 */

import type {
  DimensionDefinitions,
  QueryInfo,
  QueryWithIdeas,
  Rater,
  RaterCreate,
  RaterProgress,
  Rating,
  RatingSubmit,
  SessionInfo,
  Statistics,
} from '../types';

const API_BASE = '/api';

async function fetchJson<T>(url: string, options?: RequestInit): Promise<T> {
  // Spread options first so the merged headers below are not clobbered by
  // a caller-supplied `headers` property.
  const response = await fetch(`${API_BASE}${url}`, {
    ...options,
    headers: {
      'Content-Type': 'application/json',
      ...options?.headers,
    },
  });

  if (!response.ok) {
    const error = await response.json().catch(() => ({ detail: response.statusText }));
    throw new Error(error.detail || 'API request failed');
  }

  return response.json();
}

// Rater API
export async function listRaters(): Promise<Rater[]> {
  return fetchJson<Rater[]>('/raters');
}

export async function createOrGetRater(data: RaterCreate): Promise<Rater> {
  return fetchJson<Rater>('/raters', {
    method: 'POST',
    body: JSON.stringify(data),
  });
}

export async function getRater(raterId: string): Promise<Rater> {
  return fetchJson<Rater>(`/raters/${encodeURIComponent(raterId)}`);
}

// Query API
export async function listQueries(): Promise<QueryInfo[]> {
  return fetchJson<QueryInfo[]>('/queries');
}

export async function getQueryWithIdeas(queryId: string): Promise<QueryWithIdeas> {
  return fetchJson<QueryWithIdeas>(`/queries/${encodeURIComponent(queryId)}`);
}

export async function getUnratedIdeas(queryId: string, raterId: string): Promise<QueryWithIdeas> {
  return fetchJson<QueryWithIdeas>(
    `/queries/${encodeURIComponent(queryId)}/unrated?rater_id=${encodeURIComponent(raterId)}`
  );
}

// Rating API
export async function submitRating(rating: RatingSubmit): Promise<{ saved: boolean }> {
  return fetchJson<{ saved: boolean }>('/ratings', {
    method: 'POST',
    body: JSON.stringify(rating),
  });
}

export async function getRating(raterId: string, ideaId: string): Promise<Rating | null> {
  try {
    return await fetchJson<Rating>(`/ratings/${encodeURIComponent(raterId)}/${encodeURIComponent(ideaId)}`);
  } catch {
    return null;
  }
}

export async function getRatingsByRater(raterId: string): Promise<Rating[]> {
  return fetchJson<Rating[]>(`/ratings/rater/${encodeURIComponent(raterId)}`);
}

// Progress API
export async function getRaterProgress(raterId: string): Promise<RaterProgress> {
  return fetchJson<RaterProgress>(`/progress/${encodeURIComponent(raterId)}`);
}

// Statistics API
export async function getStatistics(): Promise<Statistics> {
  return fetchJson<Statistics>('/statistics');
}

// Dimension definitions API
export async function getDimensionDefinitions(): Promise<DimensionDefinitions> {
  return fetchJson<DimensionDefinitions>('/dimensions');
}

// Session info API
export async function getSessionInfo(): Promise<SessionInfo> {
  return fetchJson<SessionInfo>('/info');
}

// Health check
export async function healthCheck(): Promise<boolean> {
  try {
    await fetchJson<{ status: string }>('/health');
    return true;
  } catch {
    return false;
  }
}
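On a non-OK response, `fetchJson` tries to surface the FastAPI `detail` message and falls back to the HTTP status text when the body is not JSON. A standalone sketch of that error-extraction step, using a mocked response object rather than a real `fetch` (the `FakeResponse` type and `extractError` name are illustrative only):

```typescript
// Sketch of fetchJson's error handling: prefer the backend's `detail`,
// fall back to statusText when the body cannot be parsed as JSON.
type FakeResponse = {
  ok: boolean;
  statusText: string;
  json: () => Promise<{ detail?: string }>;
};

async function extractError(response: FakeResponse): Promise<string> {
  const error = await response.json().catch(() => ({ detail: response.statusText }));
  return error.detail || 'API request failed';
}

// A 500 whose body is an HTML error page, not JSON:
const notJson: FakeResponse = {
  ok: false,
  statusText: 'Internal Server Error',
  json: () => Promise.reject(new Error('not json')),
};

// A 422 with a structured FastAPI error body:
const withDetail: FakeResponse = {
  ok: false,
  statusText: 'Unprocessable Entity',
  json: () => Promise.resolve({ detail: 'Please rate all dimensions' }),
};

extractError(notJson).then((msg) => console.log(msg)); // → "Internal Server Error"
extractError(withDetail).then((msg) => console.log(msg)); // → "Please rate all dimensions"
```

This keeps backend validation messages readable in the UI while still producing a sensible error when the server returns something unexpected.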
142
experiments/assessment/frontend/src/types/index.ts
Normal file
@@ -0,0 +1,142 @@
/**
 * TypeScript types for the assessment frontend.
 */

// Rater types
export interface Rater {
  rater_id: string;
  name: string | null;
  created_at?: string;
}

export interface RaterCreate {
  rater_id: string;
  name?: string;
}

// Query types
export interface QueryInfo {
  query_id: string;
  query_text: string;
  category: string;
  idea_count: number;
}

export interface IdeaForRating {
  idea_id: string;
  text: string;
  index: number;
}

export interface QueryWithIdeas {
  query_id: string;
  query_text: string;
  category: string;
  ideas: IdeaForRating[];
  total_count: number;
}

// Rating types
export interface RatingSubmit {
  rater_id: string;
  idea_id: string;
  query_id: string;
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
  skipped: boolean;
}

export interface Rating {
  id: number;
  rater_id: string;
  idea_id: string;
  query_id: string;
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
  skipped: number;
  timestamp: string | null;
}

// Progress types
export interface QueryProgress {
  rater_id: string;
  query_id: string;
  completed_count: number;
  total_count: number;
  started_at?: string;
  updated_at?: string;
}

export interface RaterProgress {
  rater_id: string;
  queries: QueryProgress[];
  total_completed: number;
  total_ideas: number;
  percentage: number;
}

// Statistics types
export interface Statistics {
  rater_count: number;
  rating_count: number;
  skip_count: number;
  rated_ideas: number;
}

// Dimension definition types
export interface DimensionScale {
  1: string;
  2: string;
  3: string;
  4: string;
  5: string;
}

export interface DimensionDefinition {
  name: string;
  question: string;
  scale: DimensionScale;
  low_label: string;
  high_label: string;
}

export interface DimensionDefinitions {
  originality: DimensionDefinition;
  elaboration: DimensionDefinition;
  coherence: DimensionDefinition;
  usefulness: DimensionDefinition;
}

// Session info
export interface SessionInfo {
  experiment_id: string;
  total_ideas: number;
  query_count: number;
  conditions: string[];
  randomization_seed: number;
}

// UI State types
export type AppView = 'login' | 'instructions' | 'assessment' | 'completion';

export interface RatingState {
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
}

export const EMPTY_RATING_STATE: RatingState = {
  originality: null,
  elaboration: null,
  coherence: null,
  usefulness: null,
};

export type DimensionKey = keyof RatingState;

export const DIMENSION_KEYS: DimensionKey[] = ['originality', 'elaboration', 'coherence', 'usefulness'];
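The `RatingState` shape and `DIMENSION_KEYS` list above are what the UI can iterate to decide when a rating is submittable. A minimal standalone sketch of that check; `isComplete` is a hypothetical helper for illustration, not part of the committed frontend:

```typescript
// Mirrors RatingState / DIMENSION_KEYS from types/index.ts.
// isComplete is a hypothetical helper, not part of the committed frontend.
type RatingState = {
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
};

const DIMENSION_KEYS = ['originality', 'elaboration', 'coherence', 'usefulness'] as const;

// A rating is submittable once every dimension has a score (null = unanswered).
function isComplete(state: RatingState): boolean {
  return DIMENSION_KEYS.every((k) => state[k] !== null);
}

const partial: RatingState = { originality: 4, elaboration: null, coherence: 3, usefulness: 2 };
const full: RatingState = { originality: 4, elaboration: 5, coherence: 3, usefulness: 2 };
```

Keying the check off `DIMENSION_KEYS` means adding a fifth dimension only touches the types file, not every component that gates on completeness.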
20
experiments/assessment/frontend/tsconfig.json
Normal file
@@ -0,0 +1,20 @@
{
  "compilerOptions": {
    "target": "ES2020",
    "useDefineForClassFields": true,
    "lib": ["ES2020", "DOM", "DOM.Iterable"],
    "module": "ESNext",
    "skipLibCheck": true,
    "moduleResolution": "bundler",
    "allowImportingTsExtensions": true,
    "isolatedModules": true,
    "moduleDetection": "force",
    "noEmit": true,
    "jsx": "react-jsx",
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noFallthroughCasesInSwitch": true
  },
  "include": ["src"]
}
16
experiments/assessment/frontend/vite.config.ts
Normal file
@@ -0,0 +1,16 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  plugins: [react()],
  server: {
    host: '0.0.0.0',
    port: 5174,
    proxy: {
      '/api': {
        target: 'http://localhost:8002',
        changeOrigin: true
      }
    }
  },
})
375
experiments/assessment/prepare_data.py
Executable file
@@ -0,0 +1,375 @@
#!/usr/bin/env python3
"""
Prepare assessment data from experiment results.

Extracts unique ideas from deduped experiment results, assigns stable IDs,
and randomizes the order within each query for unbiased human assessment.

Usage:
    python prepare_data.py                    # Use latest, all ideas
    python prepare_data.py --sample 100       # Sample 100 ideas total
    python prepare_data.py --per-query 10     # 10 ideas per query
    python prepare_data.py --per-condition 5  # 5 ideas per condition per query
    python prepare_data.py --list             # List available files
"""

import argparse
import json
import random
from pathlib import Path
from typing import Any


def load_experiment_data(filepath: Path) -> dict[str, Any]:
    """Load experiment data from JSON file."""
    with open(filepath, 'r', encoding='utf-8') as f:
        return json.load(f)


def sample_ideas_stratified(
    ideas: list[dict[str, Any]],
    per_condition: int | None = None,
    total_limit: int | None = None,
    rng: random.Random | None = None
) -> list[dict[str, Any]]:
    """
    Sample ideas with stratification by condition.

    Args:
        ideas: List of ideas with _hidden.condition metadata
        per_condition: Max ideas per condition (stratified sampling)
        total_limit: Max total ideas (after stratified sampling)
        rng: Random number generator for reproducibility

    Returns:
        Sampled list of ideas
    """
    if rng is None:
        rng = random.Random()

    if per_condition is None and total_limit is None:
        return ideas

    # Group by condition
    by_condition: dict[str, list[dict[str, Any]]] = {}
    for idea in ideas:
        condition = idea['_hidden']['condition']
        if condition not in by_condition:
            by_condition[condition] = []
        by_condition[condition].append(idea)

    # Sample per condition
    sampled = []
    for condition, cond_ideas in by_condition.items():
        rng.shuffle(cond_ideas)
        if per_condition is not None:
            cond_ideas = cond_ideas[:per_condition]
        sampled.extend(cond_ideas)

    # Apply total limit if specified
    if total_limit is not None and len(sampled) > total_limit:
        rng.shuffle(sampled)
        sampled = sampled[:total_limit]

    return sampled


def extract_ideas_from_condition(
    query_id: str,
    condition_name: str,
    condition_data: dict[str, Any],
    idea_counter: dict[str, int]
) -> list[dict[str, Any]]:
    """Extract ideas from a single condition with hidden metadata."""
    ideas = []

    dedup_data = condition_data.get('dedup', {})
    unique_ideas_with_source = dedup_data.get('unique_ideas_with_source', [])

    for item in unique_ideas_with_source:
        idea_text = item.get('idea', '')
        if not idea_text:
            continue

        # Generate stable idea ID
        current_count = idea_counter.get(query_id, 0)
        idea_id = f"{query_id}_I{current_count:03d}"
        idea_counter[query_id] = current_count + 1

        ideas.append({
            'idea_id': idea_id,
            'text': idea_text,
            '_hidden': {
                'condition': condition_name,
                'expert_name': item.get('expert_name', ''),
                'keyword': item.get('keyword', '')
            }
        })

    return ideas


def prepare_assessment_data(
    experiment_filepath: Path,
    output_filepath: Path,
    seed: int = 42,
    sample_total: int | None = None,
    per_query: int | None = None,
    per_condition: int | None = None
) -> dict[str, Any]:
    """
    Prepare assessment data from experiment results.

    Args:
        experiment_filepath: Path to deduped experiment JSON
        output_filepath: Path to write assessment items JSON
        seed: Random seed for reproducible shuffling
        sample_total: Total number of ideas to sample (across all queries)
        per_query: Maximum ideas per query
        per_condition: Maximum ideas per condition per query (stratified)

    Returns:
        Assessment data structure
    """
    rng = random.Random(seed)

    # Load experiment data
    data = load_experiment_data(experiment_filepath)
    experiment_id = data.get('experiment_id', 'unknown')
    conditions = data.get('conditions', [])
    results = data.get('results', [])

    print(f"Loading experiment: {experiment_id}")
    print(f"Conditions: {conditions}")
    print(f"Number of queries: {len(results)}")

    # Show sampling config
    if sample_total or per_query or per_condition:
        print(f"Sampling config: total={sample_total}, per_query={per_query}, per_condition={per_condition}")

    assessment_queries = []
    total_ideas = 0
    idea_counter: dict[str, int] = {}

    for result in results:
        query_id = result.get('query_id', '')
        query_text = result.get('query', '')
        category = result.get('category', '')

        query_ideas = []

        # Extract ideas from all conditions
        conditions_data = result.get('conditions', {})
        for condition_name, condition_data in conditions_data.items():
            ideas = extract_ideas_from_condition(
                query_id, condition_name, condition_data, idea_counter
            )
            query_ideas.extend(ideas)

        # Apply stratified sampling if per_condition is specified
        if per_condition is not None:
            query_ideas = sample_ideas_stratified(
                query_ideas,
                per_condition=per_condition,
                rng=rng
            )

        # Apply per-query limit
        if per_query is not None and len(query_ideas) > per_query:
            rng.shuffle(query_ideas)
            query_ideas = query_ideas[:per_query]

        # Shuffle ideas within this query
        rng.shuffle(query_ideas)

        assessment_queries.append({
            'query_id': query_id,
            'query_text': query_text,
            'category': category,
            'ideas': query_ideas,
            'idea_count': len(query_ideas)
        })

        total_ideas += len(query_ideas)
        print(f"  Query '{query_text}' ({query_id}): {len(query_ideas)} ideas")

    # Apply total sample limit across all queries (proportionally)
    if sample_total is not None and total_ideas > sample_total:
        print(f"\nApplying total sample limit: {sample_total} (from {total_ideas})")
        # Calculate proportion to keep
        keep_ratio = sample_total / total_ideas
        new_total = 0

        for query in assessment_queries:
            n_keep = max(1, int(len(query['ideas']) * keep_ratio))
            rng.shuffle(query['ideas'])
            query['ideas'] = query['ideas'][:n_keep]
            query['idea_count'] = len(query['ideas'])
            new_total += len(query['ideas'])

        total_ideas = new_total

    # Build output structure
    assessment_data = {
        'experiment_id': experiment_id,
        'queries': assessment_queries,
        'total_ideas': total_ideas,
        'query_count': len(assessment_queries),
        'conditions': conditions,
        'randomization_seed': seed,
        'sampling': {
            'sample_total': sample_total,
            'per_query': per_query,
            'per_condition': per_condition
        },
        'metadata': {
            'source_file': str(experiment_filepath.name),
            'prepared_for': 'human_assessment'
        }
    }

    # Write output
    output_filepath.parent.mkdir(parents=True, exist_ok=True)
    with open(output_filepath, 'w', encoding='utf-8') as f:
        json.dump(assessment_data, f, ensure_ascii=False, indent=2)

    print(f"\nTotal ideas for assessment: {total_ideas}")
    print(f"Output written to: {output_filepath}")

    return assessment_data


def list_experiment_files(results_dir: Path) -> list[Path]:
    """List available deduped experiment files."""
    return sorted(results_dir.glob('*_deduped.json'), key=lambda p: p.stat().st_mtime, reverse=True)


def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(
        description='Prepare assessment data from experiment results.',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python prepare_data.py                    # Use latest, all ideas
  python prepare_data.py --sample 100       # Sample 100 ideas total
  python prepare_data.py --per-query 20     # Max 20 ideas per query
  python prepare_data.py --per-condition 4  # 4 ideas per condition per query
  python prepare_data.py --per-condition 4 --per-query 15  # Combined limits
  python prepare_data.py --list             # List available files

Recommended for human assessment:
  # 5 conditions × 4 ideas × 10 queries = 200 ideas (balanced)
  python prepare_data.py --per-condition 4

  # Or limit total to ~150 ideas
  python prepare_data.py --sample 150
"""
    )
    parser.add_argument(
        'experiment_file',
        nargs='?',
        default=None,
        help='Experiment file name (e.g., experiment_20260119_165650_deduped.json)'
    )
    parser.add_argument(
        '--list', '-l',
        action='store_true',
        help='List available experiment files'
    )
    parser.add_argument(
        '--sample',
        type=int,
        default=None,
        metavar='N',
        help='Total number of ideas to sample (proportionally across queries)'
    )
    parser.add_argument(
        '--per-query',
        type=int,
        default=None,
        metavar='N',
        help='Maximum ideas per query'
    )
    parser.add_argument(
        '--per-condition',
        type=int,
        default=None,
        metavar='N',
        help='Maximum ideas per condition per query (stratified sampling)'
    )
    parser.add_argument(
        '--seed', '-s',
        type=int,
        default=42,
        help='Random seed for shuffling (default: 42)'
    )
    args = parser.parse_args()

    # Paths
    base_dir = Path(__file__).parent.parent
    results_dir = base_dir / 'results'
    output_file = Path(__file__).parent / 'data' / 'assessment_items.json'

    # List available files
    available_files = list_experiment_files(results_dir)

    if args.list:
        print("Available experiment files (most recent first):")
        for f in available_files:
            size_kb = f.stat().st_size / 1024
            print(f"  {f.name} ({size_kb:.1f} KB)")
        return

    # Determine which file to use
    if args.experiment_file:
        experiment_file = results_dir / args.experiment_file
        if not experiment_file.exists():
            # Retry with a .json extension appended
            experiment_file = results_dir / f"{args.experiment_file}.json"
    else:
        # Use the latest deduped file
        if not available_files:
            print("Error: No deduped experiment files found in results directory.")
            return
        experiment_file = available_files[0]
        print(f"Using latest experiment file: {experiment_file.name}")

    if not experiment_file.exists():
        print(f"Error: Experiment file not found: {experiment_file}")
        print("\nAvailable files:")
        for f in available_files:
            print(f"  {f.name}")
        return

    prepare_assessment_data(
        experiment_file,
        output_file,
        seed=args.seed,
        sample_total=args.sample,
        per_query=args.per_query,
        per_condition=args.per_condition
    )

    # Verify output
    with open(output_file, 'r') as f:
        data = json.load(f)

    print("\n--- Verification ---")
    print(f"Queries: {data['query_count']}")
    print(f"Total ideas: {data['total_ideas']}")

    # Show distribution by condition (from hidden metadata)
    condition_counts: dict[str, int] = {}
    for query in data['queries']:
        for idea in query['ideas']:
            condition = idea['_hidden']['condition']
            condition_counts[condition] = condition_counts.get(condition, 0) + 1

    print("\nIdeas per condition:")
    for condition, count in sorted(condition_counts.items()):
        print(f"  {condition}: {count}")


if __name__ == '__main__':
    main()
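The `--per-condition` path is what keeps the assessment pool balanced across experimental conditions. A minimal standalone sketch of that grouping-and-truncation behavior (a re-implementation with toy data for illustration, not an import of the script):

```python
import random

# Standalone re-implementation of the per-condition truncation used by
# sample_ideas_stratified: group by hidden condition, shuffle, take N each.
def per_condition_sample(ideas, per_condition, rng):
    by_condition = {}
    for idea in ideas:
        by_condition.setdefault(idea['_hidden']['condition'], []).append(idea)
    sampled = []
    for cond_ideas in by_condition.values():
        rng.shuffle(cond_ideas)
        sampled.extend(cond_ideas[:per_condition])
    return sampled

# Toy pool: 5 ideas each for two of the experiment's conditions.
ideas = [
    {'idea_id': f'Q1_{cond}_{i}', '_hidden': {'condition': cond}}
    for cond in ('direct', 'full-pipeline')
    for i in range(5)
]
sampled = per_condition_sample(ideas, per_condition=2, rng=random.Random(42))
print(len(sampled))  # 2 conditions × 2 ideas = 4
```

Seeding with `random.Random(42)` mirrors the script's `--seed` flag: the same inputs always produce the same assessment pool, which matters for reproducing a published rating set.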
BIN
experiments/assessment/results/ratings.db
Normal file
Binary file not shown.
101
experiments/assessment/start.sh
Executable file
@@ -0,0 +1,101 @@
#!/bin/bash

# Human Assessment Web Interface Start Script
# This script starts both the backend API and frontend dev server

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}Creative Idea Assessment System${NC}"
echo -e "${GREEN}================================${NC}"
echo

# Find Python with FastAPI (use project venv or system)
VENV_PYTHON="$SCRIPT_DIR/../../backend/venv/bin/python"
if [ -x "$VENV_PYTHON" ]; then
    PYTHON_CMD="$VENV_PYTHON"
    UVICORN_CMD="$SCRIPT_DIR/../../backend/venv/bin/uvicorn"
else
    PYTHON_CMD="python3"
    UVICORN_CMD="uvicorn"
fi

# Check if assessment data exists
if [ ! -f "data/assessment_items.json" ]; then
    echo -e "${YELLOW}Assessment data not found. Running prepare_data.py...${NC}"
    $PYTHON_CMD prepare_data.py
    echo
fi

# Check if node_modules exists in frontend
if [ ! -d "frontend/node_modules" ]; then
    echo -e "${YELLOW}Installing frontend dependencies...${NC}"
    cd frontend
    npm install
    cd ..
    echo
fi

# Function to clean up background processes on exit
cleanup() {
    echo
    echo -e "${YELLOW}Shutting down...${NC}"
    kill $BACKEND_PID 2>/dev/null || true
    kill $FRONTEND_PID 2>/dev/null || true
    exit 0
}

trap cleanup SIGINT SIGTERM

# Start backend
echo -e "${GREEN}Starting backend API on port 8002...${NC}"
cd backend
$UVICORN_CMD app:app --host 0.0.0.0 --port 8002 --reload &
BACKEND_PID=$!
cd ..

# Wait for backend to start
echo "Waiting for backend to initialize..."
sleep 2

# Check if backend is running
if ! curl -s http://localhost:8002/api/health > /dev/null 2>&1; then
    echo -e "${RED}Backend failed to start. Check for errors above.${NC}"
    kill $BACKEND_PID 2>/dev/null || true
    exit 1
fi
echo -e "${GREEN}Backend is running.${NC}"
echo

# Start frontend
echo -e "${GREEN}Starting frontend on port 5174...${NC}"
cd frontend
npm run dev &
FRONTEND_PID=$!
cd ..

# Wait for frontend to start
sleep 3

echo
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}Assessment system is running!${NC}"
echo -e "${GREEN}================================${NC}"
echo
echo -e "Backend API: ${YELLOW}http://localhost:8002${NC}"
echo -e "Frontend UI: ${YELLOW}http://localhost:5174${NC}"
echo
echo -e "Press Ctrl+C to stop all services"
echo

# Wait for any process to exit
wait
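The fixed `sleep 2` before the health check races against a slow backend start. A hedged alternative sketch (a generic helper for illustration, not part of the committed script) polls until the probe succeeds or a retry budget runs out:

```shell
# wait_for_service: retry a command until it succeeds or TRIES attempts elapse.
# Usage: wait_for_service TRIES CMD [ARGS...]
wait_for_service() {
  tries="$1"; shift
  while [ "$tries" -gt 0 ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# e.g. wait_for_service 10 curl -sf http://localhost:8002/api/health
```

This makes startup tolerant of cold venvs or slow machines without lengthening the happy path, since the loop returns as soon as the probe first succeeds.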
13
experiments/assessment/stop.sh
Executable file
@@ -0,0 +1,13 @@
#!/bin/bash

# Stop the assessment system

echo "Stopping assessment system..."

# Kill backend (uvicorn on port 8002)
pkill -f "uvicorn app:app.*8002" 2>/dev/null && echo "Backend stopped" || echo "Backend not running"

# Kill frontend (vite on port 5174)
pkill -f "vite.*5174" 2>/dev/null && echo "Frontend stopped" || echo "Frontend not running"

echo "Done"