feat: Add experiments framework and novelty-driven agent loop
- Add complete experiments directory with pilot study infrastructure
  - 5 experimental conditions (direct, expert-only, attribute-only, full-pipeline, random-perspective)
  - Human assessment tool with React frontend and FastAPI backend
  - AUT flexibility analysis with jump signal detection
  - Result visualization and metrics computation
- Add novelty-driven agent loop module (experiments/novelty_loop/)
  - NoveltyDrivenTaskAgent with expert perspective perturbation
  - Three termination strategies: breakthrough, exhaust, coverage
  - Interactive CLI demo with colored output
  - Embedding-based novelty scoring
- Add DDC knowledge domain classification data (en/zh)
- Add CLAUDE.md project documentation
- Update research report with experiment findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
314
experiments/assessment/README.md
Normal file
@@ -0,0 +1,314 @@
# Human Assessment Web Interface

A standalone web application for human assessment of generated ideas using Torrance-inspired creativity metrics.

## Overview

This tool enables blind evaluation of creative ideas generated by the novelty-seeking experiment. Raters assess ideas on four dimensions without knowing which experimental condition produced each idea, ensuring unbiased evaluation.

## Quick Start

```bash
cd experiments/assessment

# 1. Prepare assessment data (if not already done)
python3 prepare_data.py

# 2. Start the system
./start.sh

# 3. Open browser
open http://localhost:5174
```

## Directory Structure

```
assessment/
├── backend/
│   ├── app.py               # FastAPI backend API
│   ├── database.py          # SQLite database operations
│   ├── models.py            # Pydantic models & dimension definitions
│   └── requirements.txt     # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── components/      # React UI components
│   │   ├── hooks/           # React state management
│   │   ├── services/        # API client
│   │   └── types/           # TypeScript definitions
│   └── package.json
├── data/
│   └── assessment_items.json  # Prepared ideas for rating
├── results/
│   └── ratings.db           # SQLite database with ratings
├── prepare_data.py          # Data preparation script
├── analyze_ratings.py       # Inter-rater reliability analysis
├── start.sh                 # Start both servers
├── stop.sh                  # Stop all services
└── README.md                # This file
```

## Data Preparation

### List Available Experiment Files

```bash
python3 prepare_data.py --list
```

Output:
```
Available experiment files (most recent first):
experiment_20260119_165650_deduped.json (1571.3 KB)
experiment_20260119_163040_deduped.json (156.4 KB)
```

### Prepare Assessment Data

```bash
# Use all ideas (not recommended for human assessment)
python3 prepare_data.py

# RECOMMENDED: Stratified sampling - 4 ideas per condition per query
# Results in ~200 ideas (5 conditions × 4 ideas × 10 queries)
python3 prepare_data.py --per-condition 4

# Alternative: Sample 150 ideas total (proportionally across queries)
python3 prepare_data.py --sample 150

# Limit per query (20 ideas max per query)
python3 prepare_data.py --per-query 20

# Combined: 4 per condition, max 15 per query
python3 prepare_data.py --per-condition 4 --per-query 15

# Specify a different experiment file
python3 prepare_data.py experiment_20260119_163040_deduped.json --per-condition 4
```

### Sampling Options

| Option | Description | Example |
|--------|-------------|---------|
| `--per-condition N` | Max N ideas per condition per query (stratified) | `--per-condition 4` → ~200 ideas |
| `--per-query N` | Max N ideas per query | `--per-query 20` |
| `--sample N` | Total N ideas (proportionally distributed) | `--sample 150` |
| `--seed N` | Random seed for reproducibility | `--seed 42` (default) |

**Recommendation**: Use `--per-condition 4` for balanced assessment across conditions.

The script:
1. Loads the deduped experiment results
2. Extracts all unique ideas with hidden metadata (condition, expert, keyword)
3. Assigns stable IDs to each idea
4. Shuffles ideas within each query (reproducible with seed=42)
5. Outputs `data/assessment_items.json`
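
The stratified `--per-condition` mode can be sketched with the standard library; the helper below is illustrative (the record fields `query_id` and `condition` stand in for the real experiment schema, and `stratified_sample` is not the actual function name in `prepare_data.py`):

```python
import random
from collections import defaultdict

def stratified_sample(ideas, per_condition, seed=42):
    """Pick at most `per_condition` ideas per (query, condition) stratum.

    `ideas` is a list of dicts with 'query_id' and 'condition' keys,
    a simplified stand-in for the deduped experiment records.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for idea in ideas:
        strata[(idea['query_id'], idea['condition'])].append(idea)
    sampled = []
    for key in sorted(strata):  # sorted keys make iteration order stable
        group = strata[key]
        sampled.extend(rng.sample(group, min(per_condition, len(group))))
    return sampled
```

With 5 conditions, 4 ideas per condition, and 10 queries this yields the ~200 ideas mentioned above.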

## Assessment Dimensions

Raters evaluate each idea on four dimensions using a 1-5 Likert scale:

### Originality
*How unexpected or surprising is this idea?*

| Score | Description |
|-------|-------------|
| 1 | Very common/obvious idea anyone would suggest |
| 2 | Somewhat common, slight variation on expected ideas |
| 3 | Moderately original, some unexpected elements |
| 4 | Quite original, notably different approach |
| 5 | Highly unexpected, truly novel concept |

### Elaboration
*How detailed and well-developed is this idea?*

| Score | Description |
|-------|-------------|
| 1 | Vague, minimal detail, just a concept |
| 2 | Basic idea with little specificity |
| 3 | Moderately detailed, some specifics provided |
| 4 | Well-developed with clear implementation hints |
| 5 | Highly specific, thoroughly developed concept |

### Coherence
*Does this idea make logical sense and relate to the query object?*

| Score | Description |
|-------|-------------|
| 1 | Nonsensical, irrelevant, or incomprehensible |
| 2 | Mostly unclear, weak connection to query |
| 3 | Partially coherent, some logical gaps |
| 4 | Mostly coherent with minor issues |
| 5 | Fully coherent, clearly relates to query |

### Usefulness
*Could this idea have practical value or inspire real innovation?*

| Score | Description |
|-------|-------------|
| 1 | No practical value whatsoever |
| 2 | Minimal usefulness, highly impractical |
| 3 | Some potential value with major limitations |
| 4 | Useful idea with realistic applications |
| 5 | Highly useful, clear practical value |

## Running the System

### Start

```bash
./start.sh
```

This will:
1. Check for `data/assessment_items.json` (runs `prepare_data.py` if missing)
2. Install frontend dependencies if needed
3. Start backend API on port 8002
4. Start frontend dev server on port 5174

### Stop

```bash
./stop.sh
```

Or press `Ctrl+C` in the terminal running `start.sh`.

### Manual Start (Development)

```bash
# Terminal 1: Backend
cd backend
../../../backend/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8002 --reload

# Terminal 2: Frontend
cd frontend
npm run dev
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check |
| `/api/info` | GET | Experiment info (total ideas, queries, conditions) |
| `/api/dimensions` | GET | Dimension definitions for UI |
| `/api/raters` | GET | List all raters |
| `/api/raters` | POST | Register/login rater |
| `/api/queries` | GET | List all queries |
| `/api/queries/{id}` | GET | Get query with all ideas |
| `/api/queries/{id}/unrated?rater_id=X` | GET | Get unrated ideas for rater |
| `/api/ratings` | POST | Submit a rating |
| `/api/progress/{rater_id}` | GET | Get rater's progress |
| `/api/statistics` | GET | Overall statistics |
| `/api/export` | GET | Export all ratings with metadata |
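
A rating submission to `POST /api/ratings` can be sketched as follows. The payload field names mirror the rating model used by the backend; the `validate_rating` helper and the ID values are purely illustrative, not part of the codebase:

```python
# Hypothetical client-side sketch; IDs below are made up for illustration.
DIMENSIONS = ('originality', 'elaboration', 'coherence', 'usefulness')

def validate_rating(payload):
    """Mirror the server-side rule: all four dimensions rated 1-5 unless skipped."""
    if payload.get('skipped'):
        return True
    return all(isinstance(payload.get(d), int) and 1 <= payload[d] <= 5
               for d in DIMENSIONS)

payload = {
    'rater_id': 'rater_01',
    'idea_id': 'idea_0001',
    'query_id': 'query_01',
    'originality': 4, 'elaboration': 3, 'coherence': 5, 'usefulness': 4,
    'skipped': False,
}
assert validate_rating(payload)

# To submit against a running backend:
#   import json, urllib.request
#   req = urllib.request.Request('http://localhost:8002/api/ratings',
#                                data=json.dumps(payload).encode(),
#                                headers={'Content-Type': 'application/json'})
#   urllib.request.urlopen(req)
```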

## Analysis

After collecting ratings from multiple raters:

```bash
python3 analyze_ratings.py
```

This calculates:
- **Krippendorff's alpha**: Inter-rater reliability for ordinal data
- **ICC(2,1)**: Intraclass Correlation Coefficient with 95% CI
- **Mean ratings per condition**: Compare experimental conditions
- **Kruskal-Wallis test**: Statistical significance between conditions

Output is saved to `results/analysis_results.json`.

## Database Schema

SQLite database (`results/ratings.db`):

```sql
-- Raters
CREATE TABLE raters (
    rater_id TEXT PRIMARY KEY,
    name TEXT,
    created_at TIMESTAMP
);

-- Ratings
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY,
    rater_id TEXT,
    idea_id TEXT,
    query_id TEXT,
    originality INTEGER CHECK(originality BETWEEN 1 AND 5),
    elaboration INTEGER CHECK(elaboration BETWEEN 1 AND 5),
    coherence INTEGER CHECK(coherence BETWEEN 1 AND 5),
    usefulness INTEGER CHECK(usefulness BETWEEN 1 AND 5),
    skipped INTEGER DEFAULT 0,
    timestamp TIMESTAMP,
    UNIQUE(rater_id, idea_id)
);

-- Progress tracking
CREATE TABLE progress (
    rater_id TEXT,
    query_id TEXT,
    completed_count INTEGER,
    total_count INTEGER,
    PRIMARY KEY (rater_id, query_id)
);
```
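
The `CHECK` and `UNIQUE` constraints do the enforcement at the database layer. A minimal sketch against an in-memory SQLite database (using only a subset of the `ratings` columns above) shows both in action:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY,
    rater_id TEXT,
    idea_id TEXT,
    originality INTEGER CHECK(originality BETWEEN 1 AND 5),
    UNIQUE(rater_id, idea_id)
)
""")
conn.execute("INSERT INTO ratings (rater_id, idea_id, originality) VALUES (?, ?, ?)",
             ('r1', 'idea_1', 4))

# The CHECK constraint rejects out-of-range scores...
try:
    conn.execute("INSERT INTO ratings (rater_id, idea_id, originality) VALUES (?, ?, ?)",
                 ('r1', 'idea_2', 6))
except sqlite3.IntegrityError:
    pass  # score 6 violates CHECK(... BETWEEN 1 AND 5)

# ...and UNIQUE(rater_id, idea_id) enforces one rating per rater per idea.
try:
    conn.execute("INSERT INTO ratings (rater_id, idea_id, originality) VALUES (?, ?, ?)",
                 ('r1', 'idea_1', 2))
except sqlite3.IntegrityError:
    pass  # duplicate (rater_id, idea_id) pair
```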

## Blind Assessment Design

To ensure unbiased evaluation:

1. **Randomization**: Ideas are shuffled within each query using a fixed seed (42) for reproducibility
2. **Hidden metadata**: Condition, expert name, and keywords are stored but not shown to raters
3. **Consistent ordering**: All raters see the same randomized order
4. **Context provided**: Only the query text is shown (e.g., "Chair", "Bicycle")
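
The seeded, per-query shuffle behind points 1 and 3 can be sketched like this; the helper name and the `(seed, query_id)` seed composition are illustrative choices, not the exact code in `prepare_data.py`:

```python
import random

def shuffle_ideas(ideas, query_id, seed=42):
    """Deterministically shuffle a query's ideas so every rater sees the same order.

    A fresh Random instance seeded per query means each query gets its own
    stable ordering, and rerunning preparation reproduces it exactly.
    """
    rng = random.Random(f"{seed}:{query_id}")
    shuffled = list(ideas)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled
```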

## Workflow for Raters

1. **Login**: Enter a unique rater ID
2. **Instructions**: Read dimension definitions (shown before first rating)
3. **Rate ideas**: For each idea:
   - Read the idea text
   - Rate all 4 dimensions (1-5)
   - Click "Submit & Next" or "Skip"
4. **Progress**: Track completion per query and overall
5. **Completion**: Summary shown when all ideas are rated

## Troubleshooting

### Backend won't start
```bash
# Check if port 8002 is in use
lsof -i :8002

# Check backend logs
cat /tmp/assessment_backend.log
```

### Frontend won't start
```bash
# Reinstall dependencies
cd frontend
rm -rf node_modules
npm install
```

### Reset database
```bash
rm results/ratings.db
# Database is auto-created on next backend start
```

### Regenerate assessment data
```bash
rm data/assessment_items.json
python3 prepare_data.py
```

## Tech Stack

- **Backend**: Python 3.11+, FastAPI, SQLite, Pydantic
- **Frontend**: React 19, TypeScript, Vite, Ant Design 6.0
- **Analysis**: NumPy, SciPy (for statistical tests)
356
experiments/assessment/analyze_ratings.py
Executable file
@@ -0,0 +1,356 @@
#!/usr/bin/env python3
"""
Analyze assessment ratings for inter-rater reliability and condition comparisons.

This script:
1. Loads ratings from the SQLite database
2. Joins with hidden metadata (condition, expert)
3. Calculates inter-rater reliability metrics
4. Computes mean ratings per dimension per condition
5. Performs statistical comparisons between conditions
"""

import json
import sqlite3
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Any

import numpy as np
from scipy import stats


# Paths
RESULTS_DIR = Path(__file__).parent / 'results'
DATA_DIR = Path(__file__).parent / 'data'
DB_PATH = RESULTS_DIR / 'ratings.db'
ASSESSMENT_DATA_PATH = DATA_DIR / 'assessment_items.json'


def load_assessment_data() -> dict[str, Any]:
    """Load the assessment items data with hidden metadata."""
    with open(ASSESSMENT_DATA_PATH, 'r', encoding='utf-8') as f:
        return json.load(f)


def load_ratings_from_db() -> list[dict[str, Any]]:
    """Load all ratings from the SQLite database."""
    if not DB_PATH.exists():
        print(f"Database not found at {DB_PATH}")
        return []

    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()

    cursor.execute('''
        SELECT r.*, rat.name as rater_name
        FROM ratings r
        LEFT JOIN raters rat ON r.rater_id = rat.rater_id
        WHERE r.skipped = 0
    ''')

    ratings = [dict(row) for row in cursor.fetchall()]
    conn.close()

    return ratings


def build_idea_lookup(assessment_data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Build a lookup table from idea_id to metadata."""
    lookup = {}
    for query in assessment_data['queries']:
        for idea in query['ideas']:
            lookup[idea['idea_id']] = {
                'text': idea['text'],
                'query_id': query['query_id'],
                'query_text': query['query_text'],
                **idea['_hidden']
            }
    return lookup


def calculate_krippendorff_alpha(ratings_matrix: np.ndarray) -> float:
    """
    Calculate Krippendorff's alpha for ordinal data.

    Args:
        ratings_matrix: 2D array where rows are items and columns are raters.
            NaN values indicate missing ratings.

    Returns:
        Krippendorff's alpha coefficient
    """
    # Remove items with fewer than 2 raters
    valid_items = ~np.all(np.isnan(ratings_matrix), axis=1)
    ratings_matrix = ratings_matrix[valid_items]

    if ratings_matrix.shape[0] < 2:
        return np.nan

    n_items, n_raters = ratings_matrix.shape

    # Observed disagreement
    observed_disagreement = 0
    n_pairs = 0

    for i in range(n_items):
        values = ratings_matrix[i, ~np.isnan(ratings_matrix[i])]
        if len(values) < 2:
            continue
        # Ordinal distance: squared difference
        for j in range(len(values)):
            for k in range(j + 1, len(values)):
                observed_disagreement += (values[j] - values[k]) ** 2
                n_pairs += 1

    if n_pairs == 0:
        return np.nan

    observed_disagreement /= n_pairs

    # Expected disagreement (based on marginal distribution)
    all_values = ratings_matrix[~np.isnan(ratings_matrix)]
    if len(all_values) < 2:
        return np.nan

    expected_disagreement = 0
    n_total_pairs = 0
    for i in range(len(all_values)):
        for j in range(i + 1, len(all_values)):
            expected_disagreement += (all_values[i] - all_values[j]) ** 2
            n_total_pairs += 1

    if n_total_pairs == 0:
        return np.nan

    expected_disagreement /= n_total_pairs

    if expected_disagreement == 0:
        return 1.0

    alpha = 1 - (observed_disagreement / expected_disagreement)
    return alpha


def calculate_icc(ratings_matrix: np.ndarray) -> tuple[float, float, float]:
    """
    Calculate Intraclass Correlation Coefficient (ICC(2,1)).

    Args:
        ratings_matrix: 2D array where rows are items and columns are raters.

    Returns:
        Tuple of (ICC, lower_bound, upper_bound)
    """
    # Remove rows with any NaN
    valid_rows = ~np.any(np.isnan(ratings_matrix), axis=1)
    ratings_matrix = ratings_matrix[valid_rows]

    if ratings_matrix.shape[0] < 2 or ratings_matrix.shape[1] < 2:
        return np.nan, np.nan, np.nan

    n, k = ratings_matrix.shape

    # Grand mean
    grand_mean = np.mean(ratings_matrix)

    # Row means (item means)
    row_means = np.mean(ratings_matrix, axis=1)

    # Column means (rater means)
    col_means = np.mean(ratings_matrix, axis=0)

    # Sum of squares
    ss_total = np.sum((ratings_matrix - grand_mean) ** 2)
    ss_rows = k * np.sum((row_means - grand_mean) ** 2)
    ss_cols = n * np.sum((col_means - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1) if n > 1 else 0
    ms_cols = ss_cols / (k - 1) if k > 1 else 0
    ms_error = ss_error / ((n - 1) * (k - 1)) if (n > 1 and k > 1) else 0

    # ICC(2,1) - two-way random, absolute agreement, single rater
    if ms_error + (ms_cols - ms_error) / n == 0:
        return np.nan, np.nan, np.nan

    icc = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

    # Confidence interval (approximate)
    # Using F distribution
    df1 = n - 1
    df2 = (n - 1) * (k - 1)

    if ms_error == 0:
        return icc, np.nan, np.nan

    f_value = ms_rows / ms_error
    f_lower = f_value / stats.f.ppf(0.975, df1, df2)
    f_upper = f_value / stats.f.ppf(0.025, df1, df2)

    icc_lower = (f_lower - 1) / (f_lower + k - 1)
    icc_upper = (f_upper - 1) / (f_upper + k - 1)

    return icc, icc_lower, icc_upper


def analyze_ratings():
    """Main analysis function."""
    print("=" * 60)
    print("CREATIVE IDEA ASSESSMENT ANALYSIS")
    print("=" * 60)
    print()

    # Load data
    assessment_data = load_assessment_data()
    ratings = load_ratings_from_db()
    idea_lookup = build_idea_lookup(assessment_data)

    if not ratings:
        print("No ratings found in database.")
        return

    print(f"Loaded {len(ratings)} ratings from database")
    print(f"Experiment ID: {assessment_data['experiment_id']}")
    print()

    # Get unique raters
    raters = list(set(r['rater_id'] for r in ratings))
    print(f"Raters: {raters}")
    print()

    # Join ratings with metadata
    enriched_ratings = []
    for r in ratings:
        idea_meta = idea_lookup.get(r['idea_id'], {})
        enriched_ratings.append({
            **r,
            'condition': idea_meta.get('condition', 'unknown'),
            'expert_name': idea_meta.get('expert_name', ''),
            'keyword': idea_meta.get('keyword', ''),
            'query_text': idea_meta.get('query_text', ''),
            'idea_text': idea_meta.get('text', '')
        })

    # Dimensions
    dimensions = ['originality', 'elaboration', 'coherence', 'usefulness']

    # ================================
    # Inter-rater reliability
    # ================================
    print("-" * 60)
    print("INTER-RATER RELIABILITY")
    print("-" * 60)
    print()

    if len(raters) >= 2:
        # Build ratings matrix per dimension
        idea_ids = list(set(r['idea_id'] for r in enriched_ratings))

        for dim in dimensions:
            # Create matrix: rows = ideas, cols = raters
            matrix = np.full((len(idea_ids), len(raters)), np.nan)
            idea_to_idx = {idea: idx for idx, idea in enumerate(idea_ids)}
            rater_to_idx = {rater: idx for idx, rater in enumerate(raters)}

            for r in enriched_ratings:
                if r[dim] is not None:
                    i = idea_to_idx[r['idea_id']]
                    j = rater_to_idx[r['rater_id']]
                    matrix[i, j] = r[dim]

            # Calculate metrics
            alpha = calculate_krippendorff_alpha(matrix)
            icc, icc_low, icc_high = calculate_icc(matrix)

            print(f"{dim.upper()}:")
            print(f"  Krippendorff's alpha: {alpha:.3f}")
            print(f"  ICC(2,1): {icc:.3f} (95% CI: {icc_low:.3f} - {icc_high:.3f})")
            print()
    else:
        print("Need at least 2 raters for inter-rater reliability analysis.")
        print()

    # ================================
    # Condition comparisons
    # ================================
    print("-" * 60)
    print("MEAN RATINGS BY CONDITION")
    print("-" * 60)
    print()

    # Group ratings by condition
    condition_ratings: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))

    for r in enriched_ratings:
        condition = r['condition']
        for dim in dimensions:
            if r[dim] is not None:
                condition_ratings[condition][dim].append(r[dim])

    # Calculate means and print
    condition_stats = {}
    for condition in sorted(condition_ratings.keys()):
        print(f"\n{condition}:")
        condition_stats[condition] = {}
        for dim in dimensions:
            values = condition_ratings[condition][dim]
            if values:
                mean = np.mean(values)
                std = np.std(values)
                n = len(values)
                condition_stats[condition][dim] = {'mean': mean, 'std': std, 'n': n}
                print(f"  {dim}: {mean:.2f} (SD={std:.2f}, n={n})")
            else:
                print(f"  {dim}: no data")

    # ================================
    # Statistical comparisons
    # ================================
    print()
    print("-" * 60)
    print("STATISTICAL COMPARISONS (Kruskal-Wallis)")
    print("-" * 60)
    print()

    conditions = sorted(condition_ratings.keys())
    if len(conditions) >= 2:
        for dim in dimensions:
            groups = [condition_ratings[c][dim] for c in conditions if condition_ratings[c][dim]]
            if len(groups) >= 2:
                h_stat, p_value = stats.kruskal(*groups)
                sig = "*" if p_value < 0.05 else ""
                print(f"{dim}: H={h_stat:.2f}, p={p_value:.4f} {sig}")
            else:
                print(f"{dim}: insufficient data for comparison")
    else:
        print("Need at least 2 conditions with data for statistical comparison.")

    # ================================
    # Export results
    # ================================
    output = {
        'analysis_timestamp': datetime.utcnow().isoformat(),
        'experiment_id': assessment_data['experiment_id'],
        'total_ratings': len(ratings),
        'raters': raters,
        'rater_count': len(raters),
        'condition_stats': condition_stats,
        'enriched_ratings': enriched_ratings
    }

    output_path = RESULTS_DIR / 'analysis_results.json'
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(output, f, ensure_ascii=False, indent=2, default=str)

    print()
    print("-" * 60)
    print(f"Results exported to: {output_path}")
    print("=" * 60)


if __name__ == '__main__':
    analyze_ratings()
1
experiments/assessment/backend/__init__.py
Normal file
@@ -0,0 +1 @@
"""Assessment backend package."""
374
experiments/assessment/backend/app.py
Normal file
@@ -0,0 +1,374 @@
|
||||
"""
|
||||
FastAPI backend for human assessment of creative ideas.
|
||||
"""
|
||||
|
||||
import json
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from fastapi import FastAPI, HTTPException
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
|
||||
try:
|
||||
from . import database as db
|
||||
from .models import (
|
||||
DIMENSION_DEFINITIONS,
|
||||
ExportData,
|
||||
ExportRating,
|
||||
IdeaForRating,
|
||||
Progress,
|
||||
QueryInfo,
|
||||
QueryWithIdeas,
|
||||
Rater,
|
||||
RaterCreate,
|
||||
RaterProgress,
|
||||
Rating,
|
||||
RatingSubmit,
|
||||
Statistics,
|
||||
)
|
||||
except ImportError:
|
||||
import database as db
|
||||
from models import (
|
||||
DIMENSION_DEFINITIONS,
|
||||
ExportData,
|
||||
ExportRating,
|
||||
IdeaForRating,
|
||||
Progress,
|
||||
QueryInfo,
|
||||
QueryWithIdeas,
|
||||
Rater,
|
||||
RaterCreate,
|
||||
RaterProgress,
|
||||
Rating,
|
||||
RatingSubmit,
|
||||
Statistics,
|
||||
)
|
||||
|
||||
|
||||
# Load assessment data
|
||||
DATA_PATH = Path(__file__).parent.parent / 'data' / 'assessment_items.json'
|
||||
|
||||
|
||||
def load_assessment_data() -> dict[str, Any]:
|
||||
"""Load the assessment items data."""
|
||||
if not DATA_PATH.exists():
|
||||
raise RuntimeError(f"Assessment data not found at {DATA_PATH}. Run prepare_data.py first.")
|
||||
with open(DATA_PATH, 'r', encoding='utf-8') as f:
|
||||
return json.load(f)
|
||||
|
||||
|
||||
# Initialize FastAPI app
|
||||
app = FastAPI(
|
||||
title="Creative Idea Assessment API",
|
||||
description="API for human assessment of creative ideas using Torrance-inspired metrics",
|
||||
version="1.0.0"
|
||||
)
|
||||
|
||||
# CORS middleware
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"],
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
|
||||
# Cache for assessment data
|
||||
_assessment_data: dict[str, Any] | None = None
|
||||
|
||||
|
||||
def get_assessment_data() -> dict[str, Any]:
|
||||
"""Get cached assessment data."""
|
||||
global _assessment_data
|
||||
if _assessment_data is None:
|
||||
_assessment_data = load_assessment_data()
|
||||
return _assessment_data
|
||||
|
||||
|
||||
# Rater endpoints
|
||||
@app.get("/api/raters", response_model=list[Rater])
|
||||
def list_raters() -> list[dict[str, Any]]:
|
||||
"""List all registered raters."""
|
||||
return db.list_raters()
|
||||
|
||||
|
||||
@app.post("/api/raters", response_model=Rater)
|
||||
def create_or_get_rater(rater_data: RaterCreate) -> dict[str, Any]:
|
||||
"""Register a new rater or get existing one."""
|
||||
return db.create_rater(rater_data.rater_id, rater_data.name)
|
||||
|
||||
|
||||
@app.get("/api/raters/{rater_id}", response_model=Rater)
|
||||
def get_rater(rater_id: str) -> dict[str, Any]:
|
||||
"""Get a specific rater."""
|
||||
rater = db.get_rater(rater_id)
|
||||
if not rater:
|
||||
raise HTTPException(status_code=404, detail="Rater not found")
|
||||
return rater
|
||||
|
||||
|
||||
# Query endpoints
|
||||
@app.get("/api/queries", response_model=list[QueryInfo])
|
||||
def list_queries() -> list[dict[str, Any]]:
|
||||
"""List all queries available for assessment."""
|
||||
data = get_assessment_data()
|
||||
return [
|
||||
{
|
||||
'query_id': q['query_id'],
|
||||
'query_text': q['query_text'],
|
||||
'category': q.get('category', ''),
|
||||
'idea_count': q['idea_count']
|
||||
}
|
||||
for q in data['queries']
|
||||
]
|
||||
|
||||
|
||||
@app.get("/api/queries/{query_id}", response_model=QueryWithIdeas)
|
||||
def get_query_with_ideas(query_id: str) -> dict[str, Any]:
|
||||
"""Get a query with all its ideas for rating (without hidden metadata)."""
|
||||
data = get_assessment_data()
|
||||
|
||||
for query in data['queries']:
|
||||
if query['query_id'] == query_id:
|
||||
ideas = [
|
||||
IdeaForRating(
|
||||
idea_id=idea['idea_id'],
|
||||
text=idea['text'],
|
||||
index=idx
|
||||
)
|
||||
for idx, idea in enumerate(query['ideas'])
|
||||
]
|
||||
return QueryWithIdeas(
|
||||
query_id=query['query_id'],
|
||||
query_text=query['query_text'],
|
||||
category=query.get('category', ''),
|
||||
ideas=ideas,
|
||||
total_count=len(ideas)
|
||||
)
|
||||
|
||||
raise HTTPException(status_code=404, detail="Query not found")
|
||||
|
||||
|
||||
@app.get("/api/queries/{query_id}/unrated", response_model=QueryWithIdeas)
|
||||
def get_unrated_ideas(query_id: str, rater_id: str) -> dict[str, Any]:
|
||||
"""Get unrated ideas for a query by a specific rater."""
|
||||
data = get_assessment_data()
|
||||
|
||||
for query in data['queries']:
|
||||
if query['query_id'] == query_id:
|
||||
# Get already rated idea IDs
|
||||
rated_ids = db.get_rated_idea_ids(rater_id, query_id)
|
||||
|
||||
# Filter to unrated ideas
|
||||
unrated_ideas = [
|
||||
IdeaForRating(
|
||||
idea_id=idea['idea_id'],
|
||||
text=idea['text'],
|
||||
index=idx
|
||||
)
|
||||
for idx, idea in enumerate(query['ideas'])
|
||||
if idea['idea_id'] not in rated_ids
|
||||
]
|
||||
|
||||
return QueryWithIdeas(
|
||||
query_id=query['query_id'],
|
||||
query_text=query['query_text'],
|
||||
category=query.get('category', ''),
|
||||
ideas=unrated_ideas,
|
||||
total_count=query['idea_count']
|
||||
)
|
||||
|
||||
raise HTTPException(status_code=404, detail="Query not found")
|
||||
|
||||
|
||||
# Rating endpoints
|
||||
@app.post("/api/ratings", response_model=dict[str, Any])
|
||||
def submit_rating(rating: RatingSubmit) -> dict[str, Any]:
|
||||
"""Submit a rating for an idea."""
|
||||
# Validate that rater exists
|
||||
rater = db.get_rater(rating.rater_id)
|
||||
if not rater:
|
||||
raise HTTPException(status_code=404, detail="Rater not found. Please register first.")
|
||||
|
||||
# Validate idea exists
|
||||
data = get_assessment_data()
|
||||
idea_found = False
|
||||
for query in data['queries']:
|
||||
for idea in query['ideas']:
|
||||
if idea['idea_id'] == rating.idea_id:
|
||||
idea_found = True
|
||||
break
|
||||
if idea_found:
|
||||
break
|
||||
|
||||
    if not idea_found:
        raise HTTPException(status_code=404, detail="Idea not found")

    # If not skipped, require all ratings
    if not rating.skipped:
        if rating.originality is None or rating.elaboration is None or rating.coherence is None or rating.usefulness is None:
            raise HTTPException(
                status_code=400,
                detail="All dimensions must be rated unless skipping"
            )

    # Save rating
    return db.save_rating(
        rater_id=rating.rater_id,
        idea_id=rating.idea_id,
        query_id=rating.query_id,
        originality=rating.originality,
        elaboration=rating.elaboration,
        coherence=rating.coherence,
        usefulness=rating.usefulness,
        skipped=rating.skipped
    )


@app.get("/api/ratings/{rater_id}/{idea_id}", response_model=Rating | None)
def get_rating(rater_id: str, idea_id: str) -> dict[str, Any] | None:
    """Get a specific rating."""
    return db.get_rating(rater_id, idea_id)


@app.get("/api/ratings/rater/{rater_id}", response_model=list[Rating])
def get_ratings_by_rater(rater_id: str) -> list[dict[str, Any]]:
    """Get all ratings by a rater."""
    return db.get_ratings_by_rater(rater_id)


# Progress endpoints
@app.get("/api/progress/{rater_id}", response_model=RaterProgress)
def get_rater_progress(rater_id: str) -> RaterProgress:
    """Get complete progress for a rater."""
    rater = db.get_rater(rater_id)
    if not rater:
        raise HTTPException(status_code=404, detail="Rater not found")

    data = get_assessment_data()

    # Get rated idea counts per query
    ratings = db.get_ratings_by_rater(rater_id)
    ratings_per_query: dict[str, int] = {}
    for r in ratings:
        qid = r['query_id']
        ratings_per_query[qid] = ratings_per_query.get(qid, 0) + 1

    # Build progress list
    query_progress = []
    total_completed = 0
    total_ideas = 0

    for query in data['queries']:
        qid = query['query_id']
        completed = ratings_per_query.get(qid, 0)
        total = query['idea_count']

        query_progress.append(Progress(
            rater_id=rater_id,
            query_id=qid,
            completed_count=completed,
            total_count=total
        ))

        total_completed += completed
        total_ideas += total

    percentage = (total_completed / total_ideas * 100) if total_ideas > 0 else 0

    return RaterProgress(
        rater_id=rater_id,
        queries=query_progress,
        total_completed=total_completed,
        total_ideas=total_ideas,
        percentage=round(percentage, 1)
    )


# Statistics endpoint
@app.get("/api/statistics", response_model=Statistics)
def get_statistics() -> Statistics:
    """Get overall assessment statistics."""
    stats = db.get_statistics()
    return Statistics(**stats)


# Dimension definitions endpoint
@app.get("/api/dimensions")
def get_dimensions() -> dict[str, Any]:
    """Get dimension definitions for the UI."""
    return DIMENSION_DEFINITIONS


# Export endpoint
@app.get("/api/export", response_model=ExportData)
def export_ratings() -> ExportData:
    """Export all ratings with hidden metadata for analysis."""
    data = get_assessment_data()
    all_ratings = db.get_all_ratings()

    # Build idea lookup with hidden metadata
    idea_lookup: dict[str, dict[str, Any]] = {}
    query_lookup: dict[str, str] = {}

    for query in data['queries']:
        query_lookup[query['query_id']] = query['query_text']
        for idea in query['ideas']:
            idea_lookup[idea['idea_id']] = {
                'text': idea['text'],
                'condition': idea['_hidden']['condition'],
                'expert_name': idea['_hidden']['expert_name'],
                'keyword': idea['_hidden']['keyword']
            }

    # Build export ratings
    export_ratings = []
    for r in all_ratings:
        idea_data = idea_lookup.get(r['idea_id'], {})
        export_ratings.append(ExportRating(
            rater_id=r['rater_id'],
            idea_id=r['idea_id'],
            query_id=r['query_id'],
            query_text=query_lookup.get(r['query_id'], ''),
            idea_text=idea_data.get('text', ''),
            originality=r['originality'],
            elaboration=r['elaboration'],
            coherence=r['coherence'],
            usefulness=r['usefulness'],
            skipped=bool(r['skipped']),
            condition=idea_data.get('condition', ''),
            expert_name=idea_data.get('expert_name', ''),
            keyword=idea_data.get('keyword', ''),
            timestamp=r['timestamp']
        ))

    return ExportData(
        experiment_id=data['experiment_id'],
        export_timestamp=datetime.utcnow(),
        rater_count=len(db.list_raters()),
        rating_count=len(export_ratings),
        ratings=export_ratings
    )


# Health check
@app.get("/api/health")
def health_check() -> dict[str, str]:
    """Health check endpoint."""
    return {"status": "healthy"}


# Info endpoint
@app.get("/api/info")
def get_info() -> dict[str, Any]:
    """Get assessment session info."""
    data = get_assessment_data()
    return {
        'experiment_id': data['experiment_id'],
        'total_ideas': data['total_ideas'],
        'query_count': data['query_count'],
        'conditions': data['conditions'],
        'randomization_seed': data['randomization_seed']
    }
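The submit endpoint's "all dimensions or skip" check can be restated as a standalone predicate. This is a minimal sketch for illustration only; `rating_is_valid` is a hypothetical helper, not a function in app.py:

```python
def rating_is_valid(originality, elaboration, coherence, usefulness, skipped=False):
    """A skipped rating needs no scores; otherwise all four dimensions are required."""
    if skipped:
        return True
    # Any missing dimension makes the submission invalid (the API answers 400).
    return None not in (originality, elaboration, coherence, usefulness)

print(rating_is_valid(4, 3, 5, 2))                             # complete rating
print(rating_is_valid(4, None, 5, 2))                          # missing coherence -> rejected
print(rating_is_valid(None, None, None, None, skipped=True))   # skip bypasses the check
```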
309
experiments/assessment/backend/database.py
Normal file
@@ -0,0 +1,309 @@
"""
SQLite database setup and operations for assessment ratings storage.
"""

import sqlite3
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path
from typing import Any, Generator


# Database path
DB_PATH = Path(__file__).parent.parent / 'results' / 'ratings.db'


def get_db_path() -> Path:
    """Get the database path, ensuring directory exists."""
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    return DB_PATH


@contextmanager
def get_connection() -> Generator[sqlite3.Connection, None, None]:
    """Get a database connection as a context manager."""
    conn = sqlite3.connect(get_db_path())
    conn.row_factory = sqlite3.Row
    try:
        yield conn
    finally:
        conn.close()


def init_db() -> None:
    """Initialize the database with required tables."""
    with get_connection() as conn:
        cursor = conn.cursor()

        # Raters table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS raters (
                rater_id TEXT PRIMARY KEY,
                name TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        ''')

        # Ratings table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS ratings (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                rater_id TEXT NOT NULL,
                idea_id TEXT NOT NULL,
                query_id TEXT NOT NULL,
                originality INTEGER CHECK(originality BETWEEN 1 AND 5),
                elaboration INTEGER CHECK(elaboration BETWEEN 1 AND 5),
                coherence INTEGER CHECK(coherence BETWEEN 1 AND 5),
                usefulness INTEGER CHECK(usefulness BETWEEN 1 AND 5),
                skipped INTEGER DEFAULT 0,
                timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                FOREIGN KEY (rater_id) REFERENCES raters(rater_id),
                UNIQUE(rater_id, idea_id)
            )
        ''')

        # Progress table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS progress (
                rater_id TEXT NOT NULL,
                query_id TEXT NOT NULL,
                completed_count INTEGER DEFAULT 0,
                total_count INTEGER DEFAULT 0,
                started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                PRIMARY KEY (rater_id, query_id),
                FOREIGN KEY (rater_id) REFERENCES raters(rater_id)
            )
        ''')

        # Create indexes for common queries
        cursor.execute('''
            CREATE INDEX IF NOT EXISTS idx_ratings_rater
            ON ratings(rater_id)
        ''')
        cursor.execute('''
            CREATE INDEX IF NOT EXISTS idx_ratings_idea
            ON ratings(idea_id)
        ''')

        conn.commit()


# Rater operations
def create_rater(rater_id: str, name: str | None = None) -> dict[str, Any]:
    """Create a new rater."""
    with get_connection() as conn:
        cursor = conn.cursor()
        try:
            cursor.execute(
                'INSERT INTO raters (rater_id, name) VALUES (?, ?)',
                (rater_id, name or rater_id)
            )
            conn.commit()
            return {'rater_id': rater_id, 'name': name or rater_id, 'created': True}
        except sqlite3.IntegrityError:
            # Rater already exists
            return get_rater(rater_id)


def get_rater(rater_id: str) -> dict[str, Any] | None:
    """Get a rater by ID."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('SELECT * FROM raters WHERE rater_id = ?', (rater_id,))
        row = cursor.fetchone()
        if row:
            return dict(row)
        return None


def list_raters() -> list[dict[str, Any]]:
    """List all raters."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('SELECT * FROM raters ORDER BY created_at')
        return [dict(row) for row in cursor.fetchall()]


# Rating operations
def save_rating(
    rater_id: str,
    idea_id: str,
    query_id: str,
    originality: int | None,
    elaboration: int | None,
    coherence: int | None,
    usefulness: int | None,
    skipped: bool = False
) -> dict[str, Any]:
    """Save or update a rating."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('''
            INSERT INTO ratings (rater_id, idea_id, query_id, originality, elaboration, coherence, usefulness, skipped, timestamp)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
            ON CONFLICT(rater_id, idea_id) DO UPDATE SET
                originality = excluded.originality,
                elaboration = excluded.elaboration,
                coherence = excluded.coherence,
                usefulness = excluded.usefulness,
                skipped = excluded.skipped,
                timestamp = excluded.timestamp
        ''', (rater_id, idea_id, query_id, originality, elaboration, coherence, usefulness, int(skipped), datetime.utcnow()))
        conn.commit()

    # Update progress
    update_progress(rater_id, query_id)

    return {
        'rater_id': rater_id,
        'idea_id': idea_id,
        'saved': True
    }


def get_rating(rater_id: str, idea_id: str) -> dict[str, Any] | None:
    """Get a specific rating."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM ratings WHERE rater_id = ? AND idea_id = ?',
            (rater_id, idea_id)
        )
        row = cursor.fetchone()
        if row:
            return dict(row)
        return None


def get_ratings_by_rater(rater_id: str) -> list[dict[str, Any]]:
    """Get all ratings by a rater."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM ratings WHERE rater_id = ? ORDER BY timestamp',
            (rater_id,)
        )
        return [dict(row) for row in cursor.fetchall()]


def get_ratings_by_idea(idea_id: str) -> list[dict[str, Any]]:
    """Get all ratings for an idea."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM ratings WHERE idea_id = ? ORDER BY rater_id',
            (idea_id,)
        )
        return [dict(row) for row in cursor.fetchall()]


def get_all_ratings() -> list[dict[str, Any]]:
    """Get all ratings."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('SELECT * FROM ratings ORDER BY timestamp')
        return [dict(row) for row in cursor.fetchall()]


# Progress operations
def update_progress(rater_id: str, query_id: str) -> None:
    """Update progress for a rater on a query."""
    with get_connection() as conn:
        cursor = conn.cursor()

        # Count completed ratings for this query
        cursor.execute('''
            SELECT COUNT(*) as count FROM ratings
            WHERE rater_id = ? AND query_id = ?
        ''', (rater_id, query_id))
        completed = cursor.fetchone()['count']

        # Update or insert progress
        cursor.execute('''
            INSERT INTO progress (rater_id, query_id, completed_count, updated_at)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(rater_id, query_id) DO UPDATE SET
                completed_count = excluded.completed_count,
                updated_at = excluded.updated_at
        ''', (rater_id, query_id, completed, datetime.utcnow()))
        conn.commit()


def set_progress_total(rater_id: str, query_id: str, total: int) -> None:
    """Set the total count for a query's progress."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute('''
            INSERT INTO progress (rater_id, query_id, total_count, completed_count)
            VALUES (?, ?, ?, 0)
            ON CONFLICT(rater_id, query_id) DO UPDATE SET
                total_count = excluded.total_count
        ''', (rater_id, query_id, total))
        conn.commit()


def get_progress(rater_id: str) -> list[dict[str, Any]]:
    """Get progress for all queries for a rater."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM progress WHERE rater_id = ? ORDER BY query_id',
            (rater_id,)
        )
        return [dict(row) for row in cursor.fetchall()]


def get_progress_for_query(rater_id: str, query_id: str) -> dict[str, Any] | None:
    """Get progress for a specific query."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT * FROM progress WHERE rater_id = ? AND query_id = ?',
            (rater_id, query_id)
        )
        row = cursor.fetchone()
        if row:
            return dict(row)
        return None


def get_rated_idea_ids(rater_id: str, query_id: str) -> set[str]:
    """Get the set of idea IDs already rated by a rater for a query."""
    with get_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            'SELECT idea_id FROM ratings WHERE rater_id = ? AND query_id = ?',
            (rater_id, query_id)
        )
        return {row['idea_id'] for row in cursor.fetchall()}


# Statistics
def get_statistics() -> dict[str, Any]:
    """Get overall statistics."""
    with get_connection() as conn:
        cursor = conn.cursor()

        cursor.execute('SELECT COUNT(*) as count FROM raters')
        rater_count = cursor.fetchone()['count']

        cursor.execute('SELECT COUNT(*) as count FROM ratings WHERE skipped = 0')
        rating_count = cursor.fetchone()['count']

        cursor.execute('SELECT COUNT(*) as count FROM ratings WHERE skipped = 1')
        skip_count = cursor.fetchone()['count']

        cursor.execute('SELECT COUNT(DISTINCT idea_id) as count FROM ratings')
        rated_ideas = cursor.fetchone()['count']

        return {
            'rater_count': rater_count,
            'rating_count': rating_count,
            'skip_count': skip_count,
            'rated_ideas': rated_ideas
        }


# Initialize on import
init_db()
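The `UNIQUE(rater_id, idea_id)` constraint plus `ON CONFLICT ... DO UPDATE` is what lets a rater revise a score without creating duplicate rows. A minimal in-memory sketch of that upsert pattern, with the table trimmed to three columns for illustration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row
conn.execute(
    'CREATE TABLE ratings (rater_id TEXT, idea_id TEXT, originality INTEGER, '
    'UNIQUE(rater_id, idea_id))'
)

# The first execute inserts; the second hits the UNIQUE constraint and
# updates the existing row in place instead of raising IntegrityError.
upsert = '''
    INSERT INTO ratings (rater_id, idea_id, originality) VALUES (?, ?, ?)
    ON CONFLICT(rater_id, idea_id) DO UPDATE SET originality = excluded.originality
'''
conn.execute(upsert, ('r1', 'idea-001', 3))
conn.execute(upsert, ('r1', 'idea-001', 5))  # rater revises their score

rows = conn.execute('SELECT * FROM ratings').fetchall()  # still a single row
```

Note that the `ON CONFLICT` upsert syntax requires SQLite 3.24 or newer.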
183
experiments/assessment/backend/models.py
Normal file
@@ -0,0 +1,183 @@
"""
Pydantic models for the assessment API.
"""

from datetime import datetime
from pydantic import BaseModel, Field


# Request models
class RaterCreate(BaseModel):
    """Request to create or login as a rater."""
    rater_id: str = Field(..., min_length=1, max_length=50, description="Unique rater identifier")
    name: str | None = Field(None, max_length=100, description="Optional display name")


class RatingSubmit(BaseModel):
    """Request to submit a rating."""
    rater_id: str = Field(..., description="Rater identifier")
    idea_id: str = Field(..., description="Idea identifier")
    query_id: str = Field(..., description="Query identifier")
    originality: int | None = Field(None, ge=1, le=5, description="Originality score 1-5")
    elaboration: int | None = Field(None, ge=1, le=5, description="Elaboration score 1-5")
    coherence: int | None = Field(None, ge=1, le=5, description="Coherence score 1-5")
    usefulness: int | None = Field(None, ge=1, le=5, description="Usefulness score 1-5")
    skipped: bool = Field(False, description="Whether the idea was skipped")


# Response models
class Rater(BaseModel):
    """Rater information."""
    rater_id: str
    name: str | None
    created_at: datetime | None = None


class Rating(BaseModel):
    """A single rating."""
    id: int
    rater_id: str
    idea_id: str
    query_id: str
    originality: int | None
    elaboration: int | None
    coherence: int | None
    usefulness: int | None
    skipped: int
    timestamp: datetime | None


class Progress(BaseModel):
    """Progress for a rater on a query."""
    rater_id: str
    query_id: str
    completed_count: int
    total_count: int
    started_at: datetime | None = None
    updated_at: datetime | None = None


class QueryInfo(BaseModel):
    """Information about a query."""
    query_id: str
    query_text: str
    category: str
    idea_count: int


class IdeaForRating(BaseModel):
    """An idea presented for rating (without hidden metadata)."""
    idea_id: str
    text: str
    index: int  # Position in the randomized list for this query


class QueryWithIdeas(BaseModel):
    """A query with its ideas for rating."""
    query_id: str
    query_text: str
    category: str
    ideas: list[IdeaForRating]
    total_count: int


class Statistics(BaseModel):
    """Overall statistics."""
    rater_count: int
    rating_count: int
    skip_count: int
    rated_ideas: int


class RaterProgress(BaseModel):
    """Complete progress summary for a rater."""
    rater_id: str
    queries: list[Progress]
    total_completed: int
    total_ideas: int
    percentage: float


# Export response models
class ExportRating(BaseModel):
    """Rating with hidden metadata for export."""
    rater_id: str
    idea_id: str
    query_id: str
    query_text: str
    idea_text: str
    originality: int | None
    elaboration: int | None
    coherence: int | None
    usefulness: int | None
    skipped: bool
    condition: str
    expert_name: str
    keyword: str
    timestamp: datetime | None


class ExportData(BaseModel):
    """Full export data structure."""
    experiment_id: str
    export_timestamp: datetime
    rater_count: int
    rating_count: int
    ratings: list[ExportRating]


# Dimension definitions (for frontend)
DIMENSION_DEFINITIONS = {
    "originality": {
        "name": "Originality",
        "question": "How unexpected or surprising is this idea? Would most people NOT think of this?",
        "scale": {
            1: "Very common/obvious idea anyone would suggest",
            2: "Somewhat common, slight variation on expected ideas",
            3: "Moderately original, some unexpected elements",
            4: "Quite original, notably different approach",
            5: "Highly unexpected, truly novel concept"
        },
        "low_label": "Common",
        "high_label": "Unexpected"
    },
    "elaboration": {
        "name": "Elaboration",
        "question": "How detailed and well-developed is this idea?",
        "scale": {
            1: "Vague, minimal detail, just a concept",
            2: "Basic idea with little specificity",
            3: "Moderately detailed, some specifics provided",
            4: "Well-developed with clear implementation hints",
            5: "Highly specific, thoroughly developed concept"
        },
        "low_label": "Vague",
        "high_label": "Detailed"
    },
    "coherence": {
        "name": "Coherence",
        "question": "Does this idea make logical sense and relate to the query object?",
        "scale": {
            1: "Nonsensical, irrelevant, or incomprehensible",
            2: "Mostly unclear, weak connection to query",
            3: "Partially coherent, some logical gaps",
            4: "Mostly coherent with minor issues",
            5: "Fully coherent, clearly relates to query"
        },
        "low_label": "Nonsense",
        "high_label": "Coherent"
    },
    "usefulness": {
        "name": "Usefulness",
        "question": "Could this idea have practical value or inspire real innovation?",
        "scale": {
            1: "No practical value whatsoever",
            2: "Minimal usefulness, highly impractical",
            3: "Some potential value with major limitations",
            4: "Useful idea with realistic applications",
            5: "Highly useful, clear practical value"
        },
        "low_label": "Useless",
        "high_label": "Useful"
    }
}
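The frontend fetches these definitions from `/api/dimensions` rather than hard-coding them. A sketch of how a client might turn one entry into slider endpoint marks; `slider_marks` and the trimmed dict below are illustrative only, not part of the codebase:

```python
# Hypothetical trimmed copy of one DIMENSION_DEFINITIONS entry from models.py.
DIMENSION_DEFINITIONS = {
    "originality": {
        "name": "Originality",
        "scale": {1: "Very common/obvious idea anyone would suggest",
                  5: "Highly unexpected, truly novel concept"},
        "low_label": "Common",
        "high_label": "Unexpected",
    },
}

def slider_marks(dimension: str) -> dict[int, str]:
    """Map a dimension's endpoint labels onto a 1-5 slider, as a UI might."""
    d = DIMENSION_DEFINITIONS[dimension]
    return {1: d["low_label"], 5: d["high_label"]}

print(slider_marks("originality"))  # {1: 'Common', 5: 'Unexpected'}
```

Keeping the scale text server-side means the rating instructions shown to raters and the exported analysis share one source of truth.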
3
experiments/assessment/backend/requirements.txt
Normal file
@@ -0,0 +1,3 @@
fastapi>=0.109.0
uvicorn>=0.27.0
pydantic>=2.5.0
1832
experiments/assessment/data/assessment_items.json
Normal file
File diff suppressed because it is too large
13
experiments/assessment/frontend/index.html
Normal file
@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Creative Idea Assessment</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
</html>
4221
experiments/assessment/frontend/package-lock.json
generated
Normal file
File diff suppressed because it is too large
32
experiments/assessment/frontend/package.json
Normal file
@@ -0,0 +1,32 @@
{
  "name": "assessment-frontend",
  "private": true,
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "tsc -b && vite build",
    "lint": "eslint .",
    "preview": "vite preview"
  },
  "dependencies": {
    "@ant-design/icons": "^6.1.0",
    "antd": "^6.0.0",
    "react": "^19.2.0",
    "react-dom": "^19.2.0"
  },
  "devDependencies": {
    "@eslint/js": "^9.39.1",
    "@types/node": "^24.10.1",
    "@types/react": "^19.2.5",
    "@types/react-dom": "^19.2.3",
    "@vitejs/plugin-react": "^5.1.1",
    "eslint": "^9.39.1",
    "eslint-plugin-react-hooks": "^7.0.1",
    "eslint-plugin-react-refresh": "^0.4.24",
    "globals": "^16.5.0",
    "typescript": "~5.9.3",
    "typescript-eslint": "^8.46.4",
    "vite": "^7.2.4"
  }
}
109
experiments/assessment/frontend/src/App.tsx
Normal file
@@ -0,0 +1,109 @@
/**
 * Main application component for the assessment interface.
 */

import { ConfigProvider, theme, Spin } from 'antd';
import { useAssessment } from './hooks/useAssessment';
import { RaterLogin } from './components/RaterLogin';
import { InstructionsPage } from './components/InstructionsPage';
import { AssessmentPage } from './components/AssessmentPage';
import { CompletionPage } from './components/CompletionPage';

function App() {
  const assessment = useAssessment();

  const renderContent = () => {
    // Show loading spinner for initial load
    if (assessment.loading && !assessment.rater) {
      return (
        <div style={{
          display: 'flex',
          justifyContent: 'center',
          alignItems: 'center',
          minHeight: '100vh'
        }}>
          <Spin size="large" />
        </div>
      );
    }

    switch (assessment.view) {
      case 'login':
        return (
          <RaterLogin
            onLogin={assessment.login}
            loading={assessment.loading}
            error={assessment.error}
          />
        );

      case 'instructions':
        return (
          <InstructionsPage
            dimensions={assessment.dimensions}
            onStart={assessment.startAssessment}
            loading={assessment.loading}
          />
        );

      case 'assessment':
        if (!assessment.rater || !assessment.currentQuery || !assessment.currentIdea || !assessment.dimensions) {
          return (
            <div style={{
              display: 'flex',
              justifyContent: 'center',
              alignItems: 'center',
              minHeight: '100vh'
            }}>
              <Spin size="large" tip="Loading..." />
            </div>
          );
        }
        return (
          <AssessmentPage
            raterId={assessment.rater.rater_id}
            queryId={assessment.currentQuery.query_id}
            queryText={assessment.currentQuery.query_text}
            idea={assessment.currentIdea}
            ideaIndex={assessment.currentIdeaIndex}
            totalIdeas={assessment.currentQuery.total_count}
            dimensions={assessment.dimensions}
            progress={assessment.progress}
            onNext={assessment.nextIdea}
            onPrev={assessment.prevIdea}
            onShowDefinitions={assessment.showInstructions}
            onLogout={assessment.logout}
            canGoPrev={assessment.currentIdeaIndex > 0}
          />
        );

      case 'completion':
        return (
          <CompletionPage
            raterId={assessment.rater?.rater_id ?? ''}
            progress={assessment.progress}
            onLogout={assessment.logout}
          />
        );

      default:
        return null;
    }
  };

  return (
    <ConfigProvider
      theme={{
        algorithm: theme.defaultAlgorithm,
        token: {
          colorPrimary: '#1677ff',
          borderRadius: 6,
        },
      }}
    >
      {renderContent()}
    </ConfigProvider>
  );
}

export default App;
@@ -0,0 +1,199 @@
/**
 * Main assessment page for rating ideas.
 */

import { Card, Button, Space, Alert, Typography } from 'antd';
import {
  ArrowLeftOutlined,
  ArrowRightOutlined,
  ForwardOutlined,
  BookOutlined,
  LogoutOutlined
} from '@ant-design/icons';
import type { IdeaForRating, DimensionDefinitions, RaterProgress } from '../types';
import { useRatings } from '../hooks/useRatings';
import { IdeaCard } from './IdeaCard';
import { RatingSlider } from './RatingSlider';
import { ProgressBar } from './ProgressBar';

const { Text } = Typography;

interface AssessmentPageProps {
  raterId: string;
  queryId: string;
  queryText: string;
  idea: IdeaForRating;
  ideaIndex: number;
  totalIdeas: number;
  dimensions: DimensionDefinitions;
  progress: RaterProgress | null;
  onNext: () => void;
  onPrev: () => void;
  onShowDefinitions: () => void;
  onLogout: () => void;
  canGoPrev: boolean;
}

export function AssessmentPage({
  raterId,
  queryId,
  queryText,
  idea,
  ideaIndex,
  totalIdeas,
  dimensions,
  progress,
  onNext,
  onPrev,
  onShowDefinitions,
  onLogout,
  canGoPrev
}: AssessmentPageProps) {
  const {
    ratings,
    setRating,
    isComplete,
    submit,
    skip,
    submitting,
    error
  } = useRatings({
    raterId,
    queryId,
    ideaId: idea.idea_id,
    onSuccess: onNext
  });

  const handleSubmit = async () => {
    await submit();
  };

  const handleSkip = async () => {
    await skip();
  };

  // Calculate query progress
  const queryProgress = progress?.queries.find(q => q.query_id === queryId);
  const queryCompleted = queryProgress?.completed_count ?? ideaIndex;
  const queryTotal = totalIdeas;

  return (
    <div style={{ maxWidth: 800, margin: '0 auto', padding: 24 }}>
      {/* Header with query info and overall progress */}
      <Card size="small" style={{ marginBottom: 16 }}>
        <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center', marginBottom: 8 }}>
          <Text strong style={{ fontSize: 16 }}>Query: "{queryText}"</Text>
          <Space>
            <Button
              icon={<BookOutlined />}
              onClick={onShowDefinitions}
              size="small"
            >
              Definitions
            </Button>
            <Button
              icon={<LogoutOutlined />}
              onClick={onLogout}
              size="small"
              danger
            >
              Exit
            </Button>
          </Space>
        </div>
        <ProgressBar
          completed={queryCompleted}
          total={queryTotal}
          label="Query Progress"
        />
        {progress && (
          <div style={{ marginTop: 8 }}>
            <ProgressBar
              completed={progress.total_completed}
              total={progress.total_ideas}
              label="Overall Progress"
            />
          </div>
        )}
      </Card>

      {/* Error display */}
      {error && (
        <Alert
          message={error}
          type="error"
          showIcon
          closable
          style={{ marginBottom: 16 }}
        />
      )}

      {/* Idea card */}
      <IdeaCard
        ideaNumber={ideaIndex + 1}
        text={idea.text}
        queryText={queryText}
      />

      {/* Rating inputs */}
      <Card style={{ marginBottom: 16 }}>
        <RatingSlider
          dimension={dimensions.originality}
          value={ratings.originality}
          onChange={(v) => setRating('originality', v)}
          disabled={submitting}
        />
        <RatingSlider
          dimension={dimensions.elaboration}
          value={ratings.elaboration}
          onChange={(v) => setRating('elaboration', v)}
          disabled={submitting}
        />
        <RatingSlider
          dimension={dimensions.coherence}
          value={ratings.coherence}
          onChange={(v) => setRating('coherence', v)}
          disabled={submitting}
        />
        <RatingSlider
          dimension={dimensions.usefulness}
          value={ratings.usefulness}
          onChange={(v) => setRating('usefulness', v)}
          disabled={submitting}
        />
      </Card>

      {/* Navigation buttons */}
      <Card>
        <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
          <Button
            icon={<ArrowLeftOutlined />}
            onClick={onPrev}
            disabled={!canGoPrev || submitting}
          >
            Back
          </Button>

          <Space>
            <Button
              icon={<ForwardOutlined />}
              onClick={handleSkip}
              loading={submitting}
            >
              Skip
            </Button>
            <Button
              type="primary"
              icon={<ArrowRightOutlined />}
              onClick={handleSubmit}
              loading={submitting}
              disabled={!isComplete()}
            >
              Submit & Next
            </Button>
          </Space>
        </div>
      </Card>
    </div>
  );
}
@@ -0,0 +1,105 @@
/**
 * Completion page shown when all ideas have been rated.
 */

import { Card, Button, Typography, Space, Result, Statistic, Row, Col } from 'antd';
import { CheckCircleOutlined, BarChartOutlined, LogoutOutlined } from '@ant-design/icons';
import type { RaterProgress } from '../types';

const { Title, Text } = Typography;

interface CompletionPageProps {
  raterId: string;
  progress: RaterProgress | null;
  onLogout: () => void;
}

export function CompletionPage({ raterId, progress, onLogout }: CompletionPageProps) {
  const completed = progress?.total_completed ?? 0;
  const total = progress?.total_ideas ?? 0;
  const percentage = progress?.percentage ?? 0;

  // Guard against total === 0 (progress not loaded), which would make
  // completed >= total vacuously true.
  const isFullyComplete = total > 0 && completed >= total;

  return (
    <div style={{
      display: 'flex',
      justifyContent: 'center',
      alignItems: 'center',
      minHeight: '100vh',
      padding: 24
    }}>
      <Card style={{ maxWidth: 600, width: '100%' }}>
        <Result
          status={isFullyComplete ? 'success' : 'info'}
          icon={isFullyComplete ? <CheckCircleOutlined /> : <BarChartOutlined />}
          title={isFullyComplete ? 'Assessment Complete!' : 'Session Summary'}
          subTitle={
            isFullyComplete
              ? 'Thank you for completing the assessment.'
              : 'You have made progress on the assessment.'
          }
          extra={[
            <Button
              type="primary"
              key="logout"
              icon={<LogoutOutlined />}
              onClick={onLogout}
            >
              Exit
            </Button>
          ]}
        >
          <Row gutter={16} style={{ marginTop: 24 }}>
            <Col span={8}>
              <Statistic
                title="Ideas Rated"
                value={completed}
                suffix={`/ ${total}`}
              />
            </Col>
            <Col span={8}>
              <Statistic
                title="Progress"
                value={percentage}
                suffix="%"
                precision={1}
              />
            </Col>
            <Col span={8}>
              <Statistic
                title="Rater ID"
                value={raterId}
                valueStyle={{ fontSize: 16 }}
              />
            </Col>
          </Row>

          {progress && progress.queries.length > 0 && (
            <div style={{ marginTop: 24 }}>
              <Title level={5}>Progress by Query</Title>
              <Space direction="vertical" style={{ width: '100%' }}>
                {progress.queries.map((q) => (
                  <div
                    key={q.query_id}
                    style={{
                      display: 'flex',
                      justifyContent: 'space-between',
                      padding: '4px 0'
                    }}
                  >
                    <Text>{q.query_id}</Text>
                    <Text type={q.completed_count >= q.total_count ? 'success' : 'secondary'}>
                      {q.completed_count} / {q.total_count}
                      {q.completed_count >= q.total_count && ' ✓'}
                    </Text>
                  </div>
                ))}
              </Space>
            </div>
          )}
        </Result>
      </Card>
    </div>
  );
}
36
experiments/assessment/frontend/src/components/IdeaCard.tsx
Normal file
@@ -0,0 +1,36 @@
/**
 * Card displaying a single idea for rating.
 */

import { Card, Typography, Tag } from 'antd';

const { Text, Paragraph } = Typography;

interface IdeaCardProps {
  ideaNumber: number;
  text: string;
  queryText: string;
}

export function IdeaCard({ ideaNumber, text, queryText }: IdeaCardProps) {
  return (
    <Card
      title={
        <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
          <Text strong>IDEA #{ideaNumber}</Text>
          <Tag color="blue">Query: {queryText}</Tag>
        </div>
      }
      style={{ marginBottom: 24 }}
    >
      <Paragraph style={{
        fontSize: 16,
        lineHeight: 1.8,
        margin: 0,
        padding: '8px 0'
      }}>
        "{text}"
      </Paragraph>
    </Card>
  );
}
@@ -0,0 +1,134 @@
/**
 * Instructions page showing dimension definitions.
 */

import { Fragment, useState } from 'react';
import { Card, Button, Typography, Space, Checkbox, Divider, Tag } from 'antd';
import { PlayCircleOutlined } from '@ant-design/icons';
import type { DimensionDefinitions } from '../types';

const { Title, Text, Paragraph } = Typography;

interface InstructionsPageProps {
  dimensions: DimensionDefinitions | null;
  onStart: () => void;
  onBack?: () => void;
  loading: boolean;
  isReturning?: boolean;
}

export function InstructionsPage({
  dimensions,
  onStart,
  onBack,
  loading,
  isReturning = false
}: InstructionsPageProps) {
  const [acknowledged, setAcknowledged] = useState(isReturning);

  if (!dimensions) {
    return (
      <div style={{ padding: 24, textAlign: 'center' }}>
        <Text>Loading instructions...</Text>
      </div>
    );
  }

  const dimensionOrder = ['originality', 'elaboration', 'coherence', 'usefulness'] as const;

  return (
    <div style={{
      maxWidth: 800,
      margin: '0 auto',
      padding: 24
    }}>
      <Card>
        <Space direction="vertical" size="large" style={{ width: '100%' }}>
          <div style={{ textAlign: 'center' }}>
            <Title level={2}>Assessment Instructions</Title>
            <Paragraph type="secondary">
              You will rate creative ideas on 4 dimensions using a 1-5 scale.
              Please read each definition carefully before beginning.
            </Paragraph>
          </div>

          <Divider />

          {dimensionOrder.map((key) => {
            const dim = dimensions[key];
            return (
              <Card
                key={key}
                size="small"
                title={
                  <Space>
                    <Tag color="blue">{dim.name}</Tag>
                    <Text type="secondary">{dim.question}</Text>
                  </Space>
                }
                style={{ marginBottom: 16 }}
              >
                <div style={{
                  display: 'grid',
                  gridTemplateColumns: 'auto 1fr',
                  gap: '8px 16px',
                  fontSize: 14
                }}>
                  {/* Key the Fragment itself: keys on children of an unkeyed
                      fragment do not satisfy React's list-key requirement. */}
                  {([1, 2, 3, 4, 5] as const).map((score) => (
                    <Fragment key={score}>
                      <Tag color={score <= 2 ? 'red' : score === 3 ? 'orange' : 'green'}>
                        {score}
                      </Tag>
                      <Text>{dim.scale[score]}</Text>
                    </Fragment>
                  ))}
                </div>
                <Divider style={{ margin: '12px 0' }} />
                <div style={{ display: 'flex', justifyContent: 'space-between' }}>
                  <Text type="secondary">{dim.low_label}</Text>
                  <Text type="secondary">{dim.high_label}</Text>
                </div>
              </Card>
            );
          })}

          <Divider />

          <Space direction="vertical" style={{ width: '100%' }}>
            {!isReturning && (
              <Checkbox
                checked={acknowledged}
                onChange={(e) => setAcknowledged(e.target.checked)}
              >
                I have read and understood the instructions
              </Checkbox>
            )}

            <Space style={{ width: '100%', justifyContent: 'center' }}>
              {onBack && (
                <Button onClick={onBack}>
                  Back to Assessment
                </Button>
              )}
              <Button
                type="primary"
                size="large"
                icon={<PlayCircleOutlined />}
                onClick={onStart}
                loading={loading}
                disabled={!acknowledged}
              >
                {isReturning ? 'Continue Rating' : 'Begin Rating'}
              </Button>
            </Space>
          </Space>
        </Space>
      </Card>
    </div>
  );
}
@@ -0,0 +1,39 @@
/**
 * Progress bar component showing assessment progress.
 */

import { Progress, Typography, Space } from 'antd';

const { Text } = Typography;

interface ProgressBarProps {
  completed: number;
  total: number;
  label?: string;
}

export function ProgressBar({ completed, total, label }: ProgressBarProps) {
  const percentage = total > 0 ? Math.round((completed / total) * 100) : 0;

  return (
    <div style={{ width: '100%' }}>
      {label && (
        <Space style={{ marginBottom: 4, justifyContent: 'space-between', width: '100%' }}>
          <Text type="secondary">{label}</Text>
          <Text type="secondary">
            {completed}/{total} ({percentage}%)
          </Text>
        </Space>
      )}
      <Progress
        percent={percentage}
        showInfo={!label}
        status="active"
        strokeColor={{
          '0%': '#108ee9',
          '100%': '#87d068',
        }}
      />
    </div>
  );
}
116
experiments/assessment/frontend/src/components/RaterLogin.tsx
Normal file
@@ -0,0 +1,116 @@
/**
 * Rater login component.
 */

import { useState, useEffect } from 'react';
import { Card, Input, Button, Typography, Space, List, Alert } from 'antd';
import { UserOutlined, LoginOutlined } from '@ant-design/icons';
import * as api from '../services/api';
import type { Rater } from '../types';

const { Title, Text } = Typography;

interface RaterLoginProps {
  onLogin: (raterId: string, name?: string) => void;
  loading: boolean;
  error: string | null;
}

export function RaterLogin({ onLogin, loading, error }: RaterLoginProps) {
  const [raterId, setRaterId] = useState('');
  const [existingRaters, setExistingRaters] = useState<Rater[]>([]);

  useEffect(() => {
    api.listRaters()
      .then(setExistingRaters)
      .catch(console.error);
  }, []);

  const handleLogin = () => {
    if (raterId.trim()) {
      onLogin(raterId.trim());
    }
  };

  const handleQuickLogin = (rater: Rater) => {
    onLogin(rater.rater_id);
  };

  return (
    <div style={{
      display: 'flex',
      justifyContent: 'center',
      alignItems: 'center',
      minHeight: '100vh',
      padding: 24
    }}>
      <Card
        style={{ width: 400, maxWidth: '100%' }}
        styles={{ body: { padding: 32 } }}
      >
        <Space direction="vertical" size="large" style={{ width: '100%' }}>
          <div style={{ textAlign: 'center' }}>
            <Title level={3} style={{ marginBottom: 8 }}>
              Creative Idea Assessment
            </Title>
            <Text type="secondary">
              Enter your rater ID to begin
            </Text>
          </div>

          {error && (
            <Alert message={error} type="error" showIcon />
          )}

          <Input
            size="large"
            placeholder="Enter your rater ID"
            prefix={<UserOutlined />}
            value={raterId}
            onChange={(e) => setRaterId(e.target.value)}
            onPressEnter={handleLogin}
            disabled={loading}
          />

          <Button
            type="primary"
            size="large"
            icon={<LoginOutlined />}
            onClick={handleLogin}
            loading={loading}
            disabled={!raterId.trim()}
            block
          >
            Start Assessment
          </Button>

          {existingRaters.length > 0 && (
            <div>
              <Text type="secondary" style={{ display: 'block', marginBottom: 8 }}>
                Existing raters:
              </Text>
              <List
                size="small"
                bordered
                dataSource={existingRaters}
                renderItem={(rater) => (
                  <List.Item
                    style={{ cursor: 'pointer' }}
                    onClick={() => handleQuickLogin(rater)}
                  >
                    <Text code>{rater.rater_id}</Text>
                    {rater.name && rater.name !== rater.rater_id && (
                      <Text type="secondary" style={{ marginLeft: 8 }}>
                        ({rater.name})
                      </Text>
                    )}
                  </List.Item>
                )}
              />
            </div>
          )}
        </Space>
      </Card>
    </div>
  );
}
@@ -0,0 +1,74 @@
/**
 * Rating input component with radio buttons for 1-5 scale.
 */

import { Radio, Typography, Space, Tooltip, Button } from 'antd';
import { QuestionCircleOutlined } from '@ant-design/icons';
import type { DimensionDefinition } from '../types';

const { Text } = Typography;

interface RatingSliderProps {
  dimension: DimensionDefinition;
  value: number | null;
  onChange: (value: number | null) => void;
  disabled?: boolean;
}

export function RatingSlider({ dimension, value, onChange, disabled }: RatingSliderProps) {
  return (
    <div style={{ marginBottom: 24 }}>
      <div style={{ display: 'flex', alignItems: 'center', marginBottom: 8 }}>
        <Text strong style={{ marginRight: 8 }}>
          {dimension.name.toUpperCase()}
        </Text>
        <Tooltip
          title={
            <div>
              <p style={{ marginBottom: 8 }}>{dimension.question}</p>
              {([1, 2, 3, 4, 5] as const).map((score) => (
                <div key={score} style={{ marginBottom: 4 }}>
                  <strong>{score}:</strong> {dimension.scale[score]}
                </div>
              ))}
            </div>
          }
          placement="right"
          overlayStyle={{ maxWidth: 400 }}
        >
          <Button
            type="text"
            size="small"
            icon={<QuestionCircleOutlined />}
            style={{ padding: 0, height: 'auto' }}
          />
        </Tooltip>
      </div>

      <div style={{ display: 'flex', alignItems: 'center', gap: 16 }}>
        <Text type="secondary" style={{ minWidth: 80, textAlign: 'right' }}>
          {dimension.low_label}
        </Text>

        <Radio.Group
          value={value}
          onChange={(e) => onChange(e.target.value)}
          disabled={disabled}
          style={{ flex: 1 }}
        >
          <Space size="large">
            {[1, 2, 3, 4, 5].map((score) => (
              <Radio key={score} value={score}>
                {score}
              </Radio>
            ))}
          </Space>
        </Radio.Group>

        <Text type="secondary" style={{ minWidth: 80 }}>
          {dimension.high_label}
        </Text>
      </div>
    </div>
  );
}
272
experiments/assessment/frontend/src/hooks/useAssessment.ts
Normal file
@@ -0,0 +1,272 @@
/**
 * Hook for managing the assessment session state.
 */

import { useState, useCallback, useEffect } from 'react';
import type {
  AppView,
  DimensionDefinitions,
  QueryInfo,
  QueryWithIdeas,
  Rater,
  RaterProgress,
} from '../types';
import * as api from '../services/api';

interface AssessmentState {
  view: AppView;
  rater: Rater | null;
  queries: QueryInfo[];
  currentQueryIndex: number;
  currentQuery: QueryWithIdeas | null;
  currentIdeaIndex: number;
  progress: RaterProgress | null;
  dimensions: DimensionDefinitions | null;
  loading: boolean;
  error: string | null;
}

const initialState: AssessmentState = {
  view: 'login',
  rater: null,
  queries: [],
  currentQueryIndex: 0,
  currentQuery: null,
  currentIdeaIndex: 0,
  progress: null,
  dimensions: null,
  loading: false,
  error: null,
};

export function useAssessment() {
  const [state, setState] = useState<AssessmentState>(initialState);

  // Load dimension definitions on mount
  useEffect(() => {
    api.getDimensionDefinitions()
      .then((dimensions) => setState((s) => ({ ...s, dimensions })))
      .catch((err) => console.error('Failed to load dimensions:', err));
  }, []);

  // Login as a rater
  const login = useCallback(async (raterId: string, name?: string) => {
    setState((s) => ({ ...s, loading: true, error: null }));
    try {
      const rater = await api.createOrGetRater({ rater_id: raterId, name });
      const queries = await api.listQueries();
      const progress = await api.getRaterProgress(raterId);

      setState((s) => ({
        ...s,
        rater,
        queries,
        progress,
        view: 'instructions',
        loading: false,
      }));
    } catch (err) {
      setState((s) => ({
        ...s,
        error: err instanceof Error ? err.message : 'Login failed',
        loading: false,
      }));
    }
  }, []);

  // Start assessment (move from instructions to assessment)
  const startAssessment = useCallback(async () => {
    if (!state.rater || state.queries.length === 0) return;

    setState((s) => ({ ...s, loading: true }));
    try {
      // Find first query with unrated ideas
      let queryIndex = 0;
      let queryData: QueryWithIdeas | null = null;

      for (let i = 0; i < state.queries.length; i++) {
        const unrated = await api.getUnratedIdeas(state.queries[i].query_id, state.rater.rater_id);
        if (unrated.ideas.length > 0) {
          queryIndex = i;
          queryData = unrated;
          break;
        }
      }

      if (!queryData) {
        // All done
        setState((s) => ({
          ...s,
          view: 'completion',
          loading: false,
        }));
        return;
      }

      setState((s) => ({
        ...s,
        view: 'assessment',
        currentQueryIndex: queryIndex,
        currentQuery: queryData,
        currentIdeaIndex: 0,
        loading: false,
      }));
    } catch (err) {
      setState((s) => ({
        ...s,
        error: err instanceof Error ? err.message : 'Failed to start assessment',
        loading: false,
      }));
    }
  }, [state.rater, state.queries]);

  // Move to next idea
  const nextIdea = useCallback(async () => {
    if (!state.currentQuery || !state.rater) return;

    const nextIndex = state.currentIdeaIndex + 1;

    if (nextIndex < state.currentQuery.ideas.length) {
      // More ideas in current query
      setState((s) => ({ ...s, currentIdeaIndex: nextIndex }));
    } else {
      // Query complete, try to move to next query
      const nextQueryIndex = state.currentQueryIndex + 1;

      if (nextQueryIndex < state.queries.length) {
        setState((s) => ({ ...s, loading: true }));
        try {
          const unrated = await api.getUnratedIdeas(
            state.queries[nextQueryIndex].query_id,
            state.rater.rater_id
          );

          if (unrated.ideas.length > 0) {
            setState((s) => ({
              ...s,
              currentQueryIndex: nextQueryIndex,
              currentQuery: unrated,
              currentIdeaIndex: 0,
              loading: false,
            }));
          } else {
            // Try to find next query with unrated ideas
            for (let i = nextQueryIndex + 1; i < state.queries.length; i++) {
              const nextUnrated = await api.getUnratedIdeas(
                state.queries[i].query_id,
                state.rater.rater_id
              );
              if (nextUnrated.ideas.length > 0) {
                setState((s) => ({
                  ...s,
                  currentQueryIndex: i,
                  currentQuery: nextUnrated,
                  currentIdeaIndex: 0,
                  loading: false,
                }));
                return;
              }
            }
            // All queries complete
            setState((s) => ({
              ...s,
              view: 'completion',
              loading: false,
            }));
          }
        } catch (err) {
          setState((s) => ({
            ...s,
            error: err instanceof Error ? err.message : 'Failed to load next query',
            loading: false,
          }));
        }
      } else {
        // All queries complete
        setState((s) => ({ ...s, view: 'completion' }));
      }
    }

    // Refresh progress
    try {
      const progress = await api.getRaterProgress(state.rater.rater_id);
      setState((s) => ({ ...s, progress }));
    } catch (err) {
      console.error('Failed to refresh progress:', err);
    }
  }, [state.currentQuery, state.currentIdeaIndex, state.currentQueryIndex, state.queries, state.rater]);

  // Move to previous idea
  const prevIdea = useCallback(() => {
    if (state.currentIdeaIndex > 0) {
      setState((s) => ({ ...s, currentIdeaIndex: s.currentIdeaIndex - 1 }));
    }
  }, [state.currentIdeaIndex]);

  // Jump to a specific query
  const jumpToQuery = useCallback(async (queryIndex: number) => {
    if (!state.rater || queryIndex < 0 || queryIndex >= state.queries.length) return;

    setState((s) => ({ ...s, loading: true }));
    try {
      const queryData = await api.getQueryWithIdeas(state.queries[queryIndex].query_id);
      setState((s) => ({
        ...s,
        currentQueryIndex: queryIndex,
        currentQuery: queryData,
        currentIdeaIndex: 0,
        view: 'assessment',
        loading: false,
      }));
    } catch (err) {
      setState((s) => ({
        ...s,
        error: err instanceof Error ? err.message : 'Failed to load query',
        loading: false,
      }));
    }
  }, [state.rater, state.queries]);

  // Refresh progress
  const refreshProgress = useCallback(async () => {
    if (!state.rater) return;
    try {
      const progress = await api.getRaterProgress(state.rater.rater_id);
      setState((s) => ({ ...s, progress }));
    } catch (err) {
      console.error('Failed to refresh progress:', err);
    }
  }, [state.rater]);

  // Show definitions
  const showInstructions = useCallback(() => {
    setState((s) => ({ ...s, view: 'instructions' }));
  }, []);

  // Return to assessment
  const returnToAssessment = useCallback(() => {
    setState((s) => ({ ...s, view: 'assessment' }));
  }, []);

  // Logout
  const logout = useCallback(() => {
    setState(initialState);
  }, []);

  // Get current idea
  const currentIdea = state.currentQuery?.ideas[state.currentIdeaIndex] ?? null;

  return {
    ...state,
    currentIdea,
    login,
    startAssessment,
    nextIdea,
    prevIdea,
    jumpToQuery,
    refreshProgress,
    showInstructions,
    returnToAssessment,
    logout,
  };
}
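`startAssessment` above resumes a rater's session by scanning queries in order and picking the first one that still has unrated ideas. A minimal standalone sketch of that resume scan, over plain data instead of API calls (the `QueryIdeas` shape and `findResumePoint` name are illustrative, not part of the codebase):

```typescript
// Sketch of the resume logic: pick the first query with work left.
// `unratedCount` stands in for the length of getUnratedIdeas(...).ideas.
interface QueryIdeas {
  query_id: string;
  unratedCount: number;
}

function findResumePoint(queries: QueryIdeas[]): number {
  // Index of the first query with unrated ideas, or -1 when everything is rated
  // (the hook's equivalent of switching to the 'completion' view).
  return queries.findIndex((q) => q.unratedCount > 0);
}

const queries: QueryIdeas[] = [
  { query_id: 'q1', unratedCount: 0 },
  { query_id: 'q2', unratedCount: 3 },
  { query_id: 'q3', unratedCount: 5 },
];

console.log(findResumePoint(queries)); // → 1
console.log(findResumePoint(queries.map((q) => ({ ...q, unratedCount: 0 })))); // → -1
```

The real hook performs this scan with sequential awaited requests so it never fetches more queries than needed to find the resume point.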
133
experiments/assessment/frontend/src/hooks/useRatings.ts
Normal file
@@ -0,0 +1,133 @@
/**
 * Hook for managing rating submission.
 */

import { useState, useCallback } from 'react';
import type { RatingState, DimensionKey } from '../types';
import * as api from '../services/api';

interface UseRatingsOptions {
  raterId: string | null;
  queryId: string | null;
  ideaId: string | null;
  onSuccess?: () => void;
}

export function useRatings({ raterId, queryId, ideaId, onSuccess }: UseRatingsOptions) {
  const [ratings, setRatings] = useState<RatingState>({
    originality: null,
    elaboration: null,
    coherence: null,
    usefulness: null,
  });
  const [submitting, setSubmitting] = useState(false);
  const [error, setError] = useState<string | null>(null);

  // Set a single rating
  const setRating = useCallback((dimension: DimensionKey, value: number | null) => {
    setRatings((prev) => ({ ...prev, [dimension]: value }));
  }, []);

  // Reset all ratings
  const resetRatings = useCallback(() => {
    setRatings({
      originality: null,
      elaboration: null,
      coherence: null,
      usefulness: null,
    });
    setError(null);
  }, []);

  // Check if all ratings are set
  const isComplete = useCallback(() => {
    return (
      ratings.originality !== null &&
      ratings.elaboration !== null &&
      ratings.coherence !== null &&
      ratings.usefulness !== null
    );
  }, [ratings]);

  // Submit rating
  const submit = useCallback(async () => {
    if (!raterId || !queryId || !ideaId) {
      setError('Missing required information');
      return false;
    }

    if (!isComplete()) {
      setError('Please rate all dimensions');
      return false;
    }

    setSubmitting(true);
    setError(null);

    try {
      await api.submitRating({
        rater_id: raterId,
        idea_id: ideaId,
        query_id: queryId,
        originality: ratings.originality,
        elaboration: ratings.elaboration,
        coherence: ratings.coherence,
        usefulness: ratings.usefulness,
        skipped: false,
      });

      resetRatings();
      onSuccess?.();
      return true;
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Failed to submit rating');
      return false;
    } finally {
      setSubmitting(false);
    }
  }, [raterId, queryId, ideaId, ratings, isComplete, resetRatings, onSuccess]);

  // Skip idea
  const skip = useCallback(async () => {
    if (!raterId || !queryId || !ideaId) {
      setError('Missing required information');
      return false;
    }

    setSubmitting(true);
    setError(null);

    try {
      await api.submitRating({
        rater_id: raterId,
        idea_id: ideaId,
        query_id: queryId,
        originality: null,
        elaboration: null,
        coherence: null,
        usefulness: null,
        skipped: true,
      });

      resetRatings();
      onSuccess?.();
      return true;
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Failed to skip idea');
      return false;
    } finally {
      setSubmitting(false);
    }
  }, [raterId, queryId, ideaId, resetRatings, onSuccess]);

  return {
    ratings,
    setRating,
    resetRatings,
    isComplete,
    submit,
    skip,
    submitting,
    error,
  };
}
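The hook accepts a submission only when all four dimensions are set; skips deliberately bypass that check and post `null`s with `skipped: true`. The gate can be sketched as a pure function over the `RatingState` shape (standalone, no React; the local type mirrors the one in `../types`):

```typescript
// Standalone sketch of the useRatings completeness gate.
type RatingState = {
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
};

function isComplete(ratings: RatingState): boolean {
  // Submit is enabled only when every dimension has a 1-5 value.
  return Object.values(ratings).every((v) => v !== null);
}

const partial: RatingState = { originality: 4, elaboration: null, coherence: 3, usefulness: 5 };
const full: RatingState = { originality: 4, elaboration: 2, coherence: 3, usefulness: 5 };

console.log(isComplete(partial)); // → false
console.log(isComplete(full)); // → true
```

Keeping the gate client-side means a rater can never store a partially rated idea as a normal rating; the only partial records are explicit skips.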
43
experiments/assessment/frontend/src/index.css
Normal file
@@ -0,0 +1,43 @@
:root {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
  line-height: 1.5;
  font-weight: 400;

  color-scheme: light;
  color: rgba(0, 0, 0, 0.88);
  background-color: #f5f5f5;

  font-synthesis: none;
  text-rendering: optimizeLegibility;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

body {
  margin: 0;
  min-height: 100vh;
}

#root {
  min-height: 100vh;
}

/* Custom scrollbar */
::-webkit-scrollbar {
  width: 8px;
  height: 8px;
}

::-webkit-scrollbar-track {
  background: #f1f1f1;
  border-radius: 4px;
}

::-webkit-scrollbar-thumb {
  background: #c1c1c1;
  border-radius: 4px;
}

::-webkit-scrollbar-thumb:hover {
  background: #a8a8a8;
}
10
experiments/assessment/frontend/src/main.tsx
Normal file
@@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App'

createRoot(document.getElementById('root')!).render(
  <StrictMode>
    <App />
  </StrictMode>,
)
116
experiments/assessment/frontend/src/services/api.ts
Normal file
@@ -0,0 +1,116 @@
/**
 * API client for the assessment backend.
 */

import type {
  DimensionDefinitions,
  QueryInfo,
  QueryWithIdeas,
  Rater,
  RaterCreate,
  RaterProgress,
  Rating,
  RatingSubmit,
  SessionInfo,
  Statistics,
} from '../types';

const API_BASE = '/api';

async function fetchJson<T>(url: string, options?: RequestInit): Promise<T> {
  // Spread options first so the merged headers below are not clobbered by
  // a caller-supplied `headers` property.
  const response = await fetch(`${API_BASE}${url}`, {
    ...options,
    headers: {
      'Content-Type': 'application/json',
      ...options?.headers,
    },
  });

  if (!response.ok) {
    const error = await response.json().catch(() => ({ detail: response.statusText }));
    throw new Error(error.detail || 'API request failed');
  }

  return response.json();
}

// Rater API
export async function listRaters(): Promise<Rater[]> {
  return fetchJson<Rater[]>('/raters');
}

export async function createOrGetRater(data: RaterCreate): Promise<Rater> {
  return fetchJson<Rater>('/raters', {
    method: 'POST',
    body: JSON.stringify(data),
  });
}

export async function getRater(raterId: string): Promise<Rater> {
  return fetchJson<Rater>(`/raters/${encodeURIComponent(raterId)}`);
}

// Query API
export async function listQueries(): Promise<QueryInfo[]> {
  return fetchJson<QueryInfo[]>('/queries');
}

export async function getQueryWithIdeas(queryId: string): Promise<QueryWithIdeas> {
  return fetchJson<QueryWithIdeas>(`/queries/${encodeURIComponent(queryId)}`);
}

export async function getUnratedIdeas(queryId: string, raterId: string): Promise<QueryWithIdeas> {
  return fetchJson<QueryWithIdeas>(
    `/queries/${encodeURIComponent(queryId)}/unrated?rater_id=${encodeURIComponent(raterId)}`
  );
}

// Rating API
export async function submitRating(rating: RatingSubmit): Promise<{ saved: boolean }> {
  return fetchJson<{ saved: boolean }>('/ratings', {
    method: 'POST',
    body: JSON.stringify(rating),
  });
}

export async function getRating(raterId: string, ideaId: string): Promise<Rating | null> {
  try {
    return await fetchJson<Rating>(`/ratings/${encodeURIComponent(raterId)}/${encodeURIComponent(ideaId)}`);
  } catch {
    return null;
  }
}

export async function getRatingsByRater(raterId: string): Promise<Rating[]> {
  return fetchJson<Rating[]>(`/ratings/rater/${encodeURIComponent(raterId)}`);
}

// Progress API
export async function getRaterProgress(raterId: string): Promise<RaterProgress> {
  return fetchJson<RaterProgress>(`/progress/${encodeURIComponent(raterId)}`);
}

// Statistics API
export async function getStatistics(): Promise<Statistics> {
  return fetchJson<Statistics>('/statistics');
}

// Dimension definitions API
export async function getDimensionDefinitions(): Promise<DimensionDefinitions> {
  return fetchJson<DimensionDefinitions>('/dimensions');
}

// Session info API
export async function getSessionInfo(): Promise<SessionInfo> {
  return fetchJson<SessionInfo>('/info');
}

// Health check
export async function healthCheck(): Promise<boolean> {
  try {
    await fetchJson<{ status: string }>('/health');
    return true;
  } catch {
    return false;
  }
}
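On a non-OK response, `fetchJson` tries to surface the FastAPI `detail` message and falls back to the HTTP status text when the body is not JSON. A standalone sketch of that error-extraction step, using a mocked response object rather than a real `fetch` (the `FakeResponse` type and `extractError` name are illustrative only):

```typescript
// Sketch of fetchJson's error handling: prefer the backend's `detail`,
// fall back to statusText when the body cannot be parsed as JSON.
type FakeResponse = {
  ok: boolean;
  statusText: string;
  json: () => Promise<{ detail?: string }>;
};

async function extractError(response: FakeResponse): Promise<string> {
  const error = await response.json().catch(() => ({ detail: response.statusText }));
  return error.detail || 'API request failed';
}

// A 500 whose body is an HTML error page, not JSON:
const notJson: FakeResponse = {
  ok: false,
  statusText: 'Internal Server Error',
  json: () => Promise.reject(new Error('not json')),
};

// A 422 with a structured FastAPI error body:
const withDetail: FakeResponse = {
  ok: false,
  statusText: 'Unprocessable Entity',
  json: () => Promise.resolve({ detail: 'Please rate all dimensions' }),
};

extractError(notJson).then((msg) => console.log(msg)); // → "Internal Server Error"
extractError(withDetail).then((msg) => console.log(msg)); // → "Please rate all dimensions"
```

This keeps backend validation messages readable in the UI while still producing a sensible error when the server returns something unexpected.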
142
experiments/assessment/frontend/src/types/index.ts
Normal file
@@ -0,0 +1,142 @@
/**
 * TypeScript types for the assessment frontend.
 */

// Rater types
export interface Rater {
  rater_id: string;
  name: string | null;
  created_at?: string;
}

export interface RaterCreate {
  rater_id: string;
  name?: string;
}

// Query types
export interface QueryInfo {
  query_id: string;
  query_text: string;
  category: string;
  idea_count: number;
}

export interface IdeaForRating {
  idea_id: string;
  text: string;
  index: number;
}

export interface QueryWithIdeas {
  query_id: string;
  query_text: string;
  category: string;
  ideas: IdeaForRating[];
  total_count: number;
}

// Rating types
export interface RatingSubmit {
  rater_id: string;
  idea_id: string;
  query_id: string;
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
  skipped: boolean;
}

export interface Rating {
  id: number;
  rater_id: string;
  idea_id: string;
  query_id: string;
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
  skipped: number;
  timestamp: string | null;
}

// Progress types
export interface QueryProgress {
  rater_id: string;
  query_id: string;
  completed_count: number;
  total_count: number;
  started_at?: string;
  updated_at?: string;
}

export interface RaterProgress {
  rater_id: string;
  queries: QueryProgress[];
  total_completed: number;
  total_ideas: number;
  percentage: number;
}

// Statistics types
export interface Statistics {
  rater_count: number;
  rating_count: number;
  skip_count: number;
  rated_ideas: number;
}

// Dimension definition types
export interface DimensionScale {
  1: string;
  2: string;
  3: string;
  4: string;
  5: string;
}

export interface DimensionDefinition {
  name: string;
  question: string;
  scale: DimensionScale;
  low_label: string;
  high_label: string;
}

export interface DimensionDefinitions {
  originality: DimensionDefinition;
  elaboration: DimensionDefinition;
  coherence: DimensionDefinition;
  usefulness: DimensionDefinition;
}

// Session info
export interface SessionInfo {
  experiment_id: string;
  total_ideas: number;
  query_count: number;
  conditions: string[];
  randomization_seed: number;
}

// UI State types
export type AppView = 'login' | 'instructions' | 'assessment' | 'completion';

export interface RatingState {
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
}

export const EMPTY_RATING_STATE: RatingState = {
  originality: null,
  elaboration: null,
  coherence: null,
  usefulness: null,
};

export type DimensionKey = keyof RatingState;

export const DIMENSION_KEYS: DimensionKey[] = ['originality', 'elaboration', 'coherence', 'usefulness'];
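The `RatingState` shape and `DIMENSION_KEYS` list above are what the UI can iterate to decide when a rating is submittable. A minimal standalone sketch of that check; `isComplete` is a hypothetical helper for illustration, not part of the committed frontend:

```typescript
// Mirrors RatingState / DIMENSION_KEYS from types/index.ts.
// isComplete is a hypothetical helper, not part of the committed frontend.
type RatingState = {
  originality: number | null;
  elaboration: number | null;
  coherence: number | null;
  usefulness: number | null;
};

const DIMENSION_KEYS = ['originality', 'elaboration', 'coherence', 'usefulness'] as const;

// A rating is submittable once every dimension has a score (null = unanswered).
function isComplete(state: RatingState): boolean {
  return DIMENSION_KEYS.every((k) => state[k] !== null);
}

const partial: RatingState = { originality: 4, elaboration: null, coherence: 3, usefulness: 2 };
const full: RatingState = { originality: 4, elaboration: 5, coherence: 3, usefulness: 2 };
```

Keying the check off `DIMENSION_KEYS` means adding a fifth dimension only touches the types file, not every component that gates on completeness.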
20
experiments/assessment/frontend/tsconfig.json
Normal file
@@ -0,0 +1,20 @@
{
  "compilerOptions": {
    "target": "ES2020",
    "useDefineForClassFields": true,
    "lib": ["ES2020", "DOM", "DOM.Iterable"],
    "module": "ESNext",
    "skipLibCheck": true,
    "moduleResolution": "bundler",
    "allowImportingTsExtensions": true,
    "isolatedModules": true,
    "moduleDetection": "force",
    "noEmit": true,
    "jsx": "react-jsx",
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noFallthroughCasesInSwitch": true
  },
  "include": ["src"]
}
16
experiments/assessment/frontend/vite.config.ts
Normal file
@@ -0,0 +1,16 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  plugins: [react()],
  server: {
    host: '0.0.0.0',
    port: 5174,
    proxy: {
      '/api': {
        target: 'http://localhost:8002',
        changeOrigin: true
      }
    }
  },
})
375
experiments/assessment/prepare_data.py
Executable file
@@ -0,0 +1,375 @@
#!/usr/bin/env python3
"""
Prepare assessment data from experiment results.

Extracts unique ideas from deduped experiment results, assigns stable IDs,
and randomizes the order within each query for unbiased human assessment.

Usage:
    python prepare_data.py                    # Use latest, all ideas
    python prepare_data.py --sample 100       # Sample 100 ideas total
    python prepare_data.py --per-query 10     # 10 ideas per query
    python prepare_data.py --per-condition 5  # 5 ideas per condition per query
    python prepare_data.py --list             # List available files
"""

import argparse
import json
import random
from pathlib import Path
from typing import Any


def load_experiment_data(filepath: Path) -> dict[str, Any]:
    """Load experiment data from JSON file."""
    with open(filepath, 'r', encoding='utf-8') as f:
        return json.load(f)


def sample_ideas_stratified(
    ideas: list[dict[str, Any]],
    per_condition: int | None = None,
    total_limit: int | None = None,
    rng: random.Random | None = None
) -> list[dict[str, Any]]:
    """
    Sample ideas with stratification by condition.

    Args:
        ideas: List of ideas with _hidden.condition metadata
        per_condition: Max ideas per condition (stratified sampling)
        total_limit: Max total ideas (after stratified sampling)
        rng: Random number generator for reproducibility

    Returns:
        Sampled list of ideas
    """
    if rng is None:
        rng = random.Random()

    if per_condition is None and total_limit is None:
        return ideas

    # Group by condition
    by_condition: dict[str, list[dict[str, Any]]] = {}
    for idea in ideas:
        condition = idea['_hidden']['condition']
        if condition not in by_condition:
            by_condition[condition] = []
        by_condition[condition].append(idea)

    # Sample per condition
    sampled = []
    for condition, cond_ideas in by_condition.items():
        rng.shuffle(cond_ideas)
        if per_condition is not None:
            cond_ideas = cond_ideas[:per_condition]
        sampled.extend(cond_ideas)

    # Apply total limit if specified
    if total_limit is not None and len(sampled) > total_limit:
        rng.shuffle(sampled)
        sampled = sampled[:total_limit]

    return sampled


def extract_ideas_from_condition(
    query_id: str,
    condition_name: str,
    condition_data: dict[str, Any],
    idea_counter: dict[str, int]
) -> list[dict[str, Any]]:
    """Extract ideas from a single condition with hidden metadata."""
    ideas = []

    dedup_data = condition_data.get('dedup', {})
    unique_ideas_with_source = dedup_data.get('unique_ideas_with_source', [])

    for item in unique_ideas_with_source:
        idea_text = item.get('idea', '')
        if not idea_text:
            continue

        # Generate stable idea ID
        current_count = idea_counter.get(query_id, 0)
        idea_id = f"{query_id}_I{current_count:03d}"
        idea_counter[query_id] = current_count + 1

        ideas.append({
            'idea_id': idea_id,
            'text': idea_text,
            '_hidden': {
                'condition': condition_name,
                'expert_name': item.get('expert_name', ''),
                'keyword': item.get('keyword', '')
            }
        })

    return ideas


def prepare_assessment_data(
    experiment_filepath: Path,
    output_filepath: Path,
    seed: int = 42,
    sample_total: int | None = None,
    per_query: int | None = None,
    per_condition: int | None = None
) -> dict[str, Any]:
    """
    Prepare assessment data from experiment results.

    Args:
        experiment_filepath: Path to deduped experiment JSON
        output_filepath: Path to write assessment items JSON
        seed: Random seed for reproducible shuffling
        sample_total: Total number of ideas to sample (across all queries)
        per_query: Maximum ideas per query
        per_condition: Maximum ideas per condition per query (stratified)

    Returns:
        Assessment data structure
    """
    rng = random.Random(seed)

    # Load experiment data
    data = load_experiment_data(experiment_filepath)
    experiment_id = data.get('experiment_id', 'unknown')
    conditions = data.get('conditions', [])
    results = data.get('results', [])

    print(f"Loading experiment: {experiment_id}")
    print(f"Conditions: {conditions}")
    print(f"Number of queries: {len(results)}")

    # Show sampling config
    if sample_total or per_query or per_condition:
        print(f"Sampling config: total={sample_total}, per_query={per_query}, per_condition={per_condition}")

    assessment_queries = []
    total_ideas = 0
    idea_counter: dict[str, int] = {}

    for result in results:
        query_id = result.get('query_id', '')
        query_text = result.get('query', '')
        category = result.get('category', '')

        query_ideas = []

        # Extract ideas from all conditions
        conditions_data = result.get('conditions', {})
        for condition_name, condition_data in conditions_data.items():
            ideas = extract_ideas_from_condition(
                query_id, condition_name, condition_data, idea_counter
            )
            query_ideas.extend(ideas)

        # Apply stratified sampling if per_condition is specified
        if per_condition is not None:
            query_ideas = sample_ideas_stratified(
                query_ideas,
                per_condition=per_condition,
                rng=rng
            )

        # Apply per-query limit
        if per_query is not None and len(query_ideas) > per_query:
            rng.shuffle(query_ideas)
            query_ideas = query_ideas[:per_query]

        # Shuffle ideas within this query
        rng.shuffle(query_ideas)

        assessment_queries.append({
            'query_id': query_id,
            'query_text': query_text,
            'category': category,
            'ideas': query_ideas,
            'idea_count': len(query_ideas)
        })

        total_ideas += len(query_ideas)
        print(f"  Query '{query_text}' ({query_id}): {len(query_ideas)} ideas")

    # Apply total sample limit across all queries (proportionally)
    if sample_total is not None and total_ideas > sample_total:
        print(f"\nApplying total sample limit: {sample_total} (from {total_ideas})")
        # Calculate proportion to keep
        keep_ratio = sample_total / total_ideas
        new_total = 0

        for query in assessment_queries:
            n_keep = max(1, int(len(query['ideas']) * keep_ratio))
            rng.shuffle(query['ideas'])
            query['ideas'] = query['ideas'][:n_keep]
            query['idea_count'] = len(query['ideas'])
            new_total += len(query['ideas'])

        total_ideas = new_total

    # Build output structure
    assessment_data = {
        'experiment_id': experiment_id,
        'queries': assessment_queries,
        'total_ideas': total_ideas,
        'query_count': len(assessment_queries),
        'conditions': conditions,
        'randomization_seed': seed,
        'sampling': {
            'sample_total': sample_total,
            'per_query': per_query,
            'per_condition': per_condition
        },
        'metadata': {
            'source_file': str(experiment_filepath.name),
            'prepared_for': 'human_assessment'
        }
    }

    # Write output
    output_filepath.parent.mkdir(parents=True, exist_ok=True)
    with open(output_filepath, 'w', encoding='utf-8') as f:
        json.dump(assessment_data, f, ensure_ascii=False, indent=2)

    print(f"\nTotal ideas for assessment: {total_ideas}")
    print(f"Output written to: {output_filepath}")

    return assessment_data


def list_experiment_files(results_dir: Path) -> list[Path]:
    """List available deduped experiment files."""
    return sorted(results_dir.glob('*_deduped.json'), key=lambda p: p.stat().st_mtime, reverse=True)


def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(
        description='Prepare assessment data from experiment results.',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python prepare_data.py                    # Use latest, all ideas
  python prepare_data.py --sample 100       # Sample 100 ideas total
  python prepare_data.py --per-query 20     # Max 20 ideas per query
  python prepare_data.py --per-condition 4  # 4 ideas per condition per query
  python prepare_data.py --per-condition 4 --per-query 15  # Combined limits
  python prepare_data.py --list             # List available files

Recommended for human assessment:
  # 5 conditions × 4 ideas × 10 queries = 200 ideas (balanced)
  python prepare_data.py --per-condition 4

  # Or limit total to ~150 ideas
  python prepare_data.py --sample 150
"""
    )
    parser.add_argument(
        'experiment_file',
        nargs='?',
        default=None,
        help='Experiment file name (e.g., experiment_20260119_165650_deduped.json)'
    )
    parser.add_argument(
        '--list', '-l',
        action='store_true',
        help='List available experiment files'
    )
    parser.add_argument(
        '--sample',
        type=int,
        default=None,
        metavar='N',
        help='Total number of ideas to sample (proportionally across queries)'
    )
    parser.add_argument(
        '--per-query',
        type=int,
        default=None,
        metavar='N',
        help='Maximum ideas per query'
    )
    parser.add_argument(
        '--per-condition',
        type=int,
        default=None,
        metavar='N',
        help='Maximum ideas per condition per query (stratified sampling)'
    )
    parser.add_argument(
        '--seed', '-s',
        type=int,
        default=42,
        help='Random seed for shuffling (default: 42)'
    )
    args = parser.parse_args()

    # Paths
    base_dir = Path(__file__).parent.parent
    results_dir = base_dir / 'results'
    output_file = Path(__file__).parent / 'data' / 'assessment_items.json'

    # List available files
    available_files = list_experiment_files(results_dir)

    if args.list:
        print("Available experiment files (most recent first):")
        for f in available_files:
            size_kb = f.stat().st_size / 1024
            print(f"  {f.name} ({size_kb:.1f} KB)")
        return

    # Determine which file to use
    if args.experiment_file:
        experiment_file = results_dir / args.experiment_file
        if not experiment_file.exists():
            # Retry with a .json extension appended
            experiment_file = results_dir / f"{args.experiment_file}.json"
    else:
        # Use the latest deduped file
        if not available_files:
            print("Error: No deduped experiment files found in results directory.")
            return
        experiment_file = available_files[0]
        print(f"Using latest experiment file: {experiment_file.name}")

    if not experiment_file.exists():
        print(f"Error: Experiment file not found: {experiment_file}")
        print("\nAvailable files:")
        for f in available_files:
            print(f"  {f.name}")
        return

    prepare_assessment_data(
        experiment_file,
        output_file,
        seed=args.seed,
        sample_total=args.sample,
        per_query=args.per_query,
        per_condition=args.per_condition
    )

    # Verify output
    with open(output_file, 'r') as f:
        data = json.load(f)

    print("\n--- Verification ---")
    print(f"Queries: {data['query_count']}")
    print(f"Total ideas: {data['total_ideas']}")

    # Show distribution by condition (from hidden metadata)
    condition_counts: dict[str, int] = {}
    for query in data['queries']:
        for idea in query['ideas']:
            condition = idea['_hidden']['condition']
            condition_counts[condition] = condition_counts.get(condition, 0) + 1

    print("\nIdeas per condition:")
    for condition, count in sorted(condition_counts.items()):
        print(f"  {condition}: {count}")


if __name__ == '__main__':
    main()
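The `--per-condition` path is what keeps the assessment pool balanced across experimental conditions. A minimal standalone sketch of that grouping-and-truncation behavior (a re-implementation with toy data for illustration, not an import of the script):

```python
import random

# Standalone re-implementation of the per-condition truncation used by
# sample_ideas_stratified: group by hidden condition, shuffle, take N each.
def per_condition_sample(ideas, per_condition, rng):
    by_condition = {}
    for idea in ideas:
        by_condition.setdefault(idea['_hidden']['condition'], []).append(idea)
    sampled = []
    for cond_ideas in by_condition.values():
        rng.shuffle(cond_ideas)
        sampled.extend(cond_ideas[:per_condition])
    return sampled

# Toy pool: 5 ideas each for two of the experiment's conditions.
ideas = [
    {'idea_id': f'Q1_{cond}_{i}', '_hidden': {'condition': cond}}
    for cond in ('direct', 'full-pipeline')
    for i in range(5)
]
sampled = per_condition_sample(ideas, per_condition=2, rng=random.Random(42))
print(len(sampled))  # 2 conditions × 2 ideas = 4
```

Seeding with `random.Random(42)` mirrors the script's `--seed` flag: the same inputs always produce the same assessment pool, which matters for reproducing a published rating set.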
BIN
experiments/assessment/results/ratings.db
Normal file
Binary file not shown.
101
experiments/assessment/start.sh
Executable file
@@ -0,0 +1,101 @@
#!/bin/bash

# Human Assessment Web Interface Start Script
# This script starts both the backend API and frontend dev server

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}Creative Idea Assessment System${NC}"
echo -e "${GREEN}================================${NC}"
echo

# Find Python with FastAPI (use project venv or system)
VENV_PYTHON="$SCRIPT_DIR/../../backend/venv/bin/python"
if [ -x "$VENV_PYTHON" ]; then
    PYTHON_CMD="$VENV_PYTHON"
    UVICORN_CMD="$SCRIPT_DIR/../../backend/venv/bin/uvicorn"
else
    PYTHON_CMD="python3"
    UVICORN_CMD="uvicorn"
fi

# Check if assessment data exists
if [ ! -f "data/assessment_items.json" ]; then
    echo -e "${YELLOW}Assessment data not found. Running prepare_data.py...${NC}"
    $PYTHON_CMD prepare_data.py
    echo
fi

# Check if node_modules exists in frontend
if [ ! -d "frontend/node_modules" ]; then
    echo -e "${YELLOW}Installing frontend dependencies...${NC}"
    cd frontend
    npm install
    cd ..
    echo
fi

# Function to clean up background processes on exit
cleanup() {
    echo
    echo -e "${YELLOW}Shutting down...${NC}"
    kill $BACKEND_PID 2>/dev/null || true
    kill $FRONTEND_PID 2>/dev/null || true
    exit 0
}

trap cleanup SIGINT SIGTERM

# Start backend
echo -e "${GREEN}Starting backend API on port 8002...${NC}"
cd backend
$UVICORN_CMD app:app --host 0.0.0.0 --port 8002 --reload &
BACKEND_PID=$!
cd ..

# Wait for backend to start
echo "Waiting for backend to initialize..."
sleep 2

# Check if backend is running
if ! curl -s http://localhost:8002/api/health > /dev/null 2>&1; then
    echo -e "${RED}Backend failed to start. Check for errors above.${NC}"
    kill $BACKEND_PID 2>/dev/null || true
    exit 1
fi
echo -e "${GREEN}Backend is running.${NC}"
echo

# Start frontend
echo -e "${GREEN}Starting frontend on port 5174...${NC}"
cd frontend
npm run dev &
FRONTEND_PID=$!
cd ..

# Wait for frontend to start
sleep 3

echo
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}Assessment system is running!${NC}"
echo -e "${GREEN}================================${NC}"
echo
echo -e "Backend API: ${YELLOW}http://localhost:8002${NC}"
echo -e "Frontend UI: ${YELLOW}http://localhost:5174${NC}"
echo
echo -e "Press Ctrl+C to stop all services"
echo

# Wait for any process to exit
wait
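The fixed `sleep 2` before the health check races against a slow backend start. A hedged alternative sketch (a generic helper for illustration, not part of the committed script) polls until the probe succeeds or a retry budget runs out:

```shell
# wait_for_service: retry a command until it succeeds or TRIES attempts elapse.
# Usage: wait_for_service TRIES CMD [ARGS...]
wait_for_service() {
  tries="$1"; shift
  while [ "$tries" -gt 0 ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# e.g. wait_for_service 10 curl -sf http://localhost:8002/api/health
```

This makes startup tolerant of cold venvs or slow machines without lengthening the happy path, since the loop returns as soon as the probe first succeeds.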
13
experiments/assessment/stop.sh
Executable file
@@ -0,0 +1,13 @@
#!/bin/bash

# Stop the assessment system

echo "Stopping assessment system..."

# Kill backend (uvicorn on port 8002)
pkill -f "uvicorn app:app.*8002" 2>/dev/null && echo "Backend stopped" || echo "Backend not running"

# Kill frontend (vite on port 5174)
pkill -f "vite.*5174" 2>/dev/null && echo "Frontend stopped" || echo "Frontend not running"

echo "Done"