feat: Add curated expert occupations with local data sources

- Add curated occupations seed files (210 entries in zh/en) with specific domains
- Add DBpedia occupations data (2164 entries) for external source option
- Refactor expert_source_service to read from local JSON files
- Improve keyword generation prompts to leverage expert domain context
- Add architecture analysis documentation (ARCHITECTURE_ANALYSIS.md)
- Fix expert source selection bug (proper handling of empty custom_experts)
- Update frontend to support curated/dbpedia/wikidata expert sources

Key changes:
- backend/app/data/: Local occupation data files
- backend/app/services/expert_source_service.py: Simplified local file reading
- backend/app/prompts/expert_transformation_prompt.py: Better domain-aware prompts
- Removed expert_cache.py (no longer needed with local files)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 16:34:35 +08:00
parent 8777e27cbb
commit 5571076406
15 changed files with 9970 additions and 380 deletions

ARCHITECTURE_ANALYSIS.md

@@ -0,0 +1,277 @@
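# novelty-seeking: System Flow and Coupling Analysis
> Generated: 2025-12-04
## 1. System Architecture Overview
novelty-seeking is a system for guiding innovative thinking. It is built from three core agents:
- **Attribute Agent**: maps a query to attribute nodes
- **Transformation Agent**: transforms attributes into new keywords
- **Expert Transformation Agent**: transforms attributes from the viewpoints of multiple experts
---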
## 2. End-to-End Data Flow
```
┌─────────────────────────────────────────────────────────────────────┐
│ Attribute Agent │
├─────────────────────────────────────────────────────────────────────┤
│ User input: Query (e.g. "bicycle") │
│ ↓ │
│ Step 0: Category analysis (determined by category_mode) │
│ → Output: CategoryDefinition[] (e.g. Materials / Functions / Usages / User Groups) │
│ ↓ │
│ Step 1: Attribute list generation │
│ → Output: {Materials: [steel, wood, carbon fiber], Functions: [carrying, storage], ...} │
│ ↓ │
│ Step 2: Relationship generation (DAG edges) │
│ → Output: AttributeDAG (nodes + edges) │
└─────────────────────────────────────────────────────────────────────┘
↓ (high coupling)
┌─────────────────────────────────────────────────────────────────────┐
│ Expert Transformation Agent │
├─────────────────────────────────────────────────────────────────────┤
│ Input: Query + Category + Attributes (from the Attribute Agent) │
│ ↓ │
│ Step 0: Expert team generation │
│ → Determined by expert_source: llm / curated / dbpedia / wikidata │
│ → Output: ExpertProfile[] (e.g. accountant / psychologist / ecologist) │
│ ↓ │
│ Step 1: Expert-perspective keyword generation (for each attribute) │
│ → Output: ExpertKeyword[] (keyword + source expert + source attribute) │
│ ↓ │
│ Step 2: Description generation (for each keyword) │
│ → Output: ExpertTransformationDescription[] │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 3. Attribute Agent in Detail
### 3.1 Flow
```
User query (Query)
┌─────────────────────────────────────────────┐
│ Step 0: Category analysis (set by Category Mode) │
├─────────────────────────────────────────────┤
│ Input: query, suggested_category_count │
│ Processing: │
│ - FIXED_ONLY: use the 4 fixed categories │
│ - FIXED_PLUS_CUSTOM: fixed + user-defined │
│ - FIXED_PLUS_DYNAMIC: fixed + LLM-recommended │
│ - CUSTOM_ONLY: LLM-recommended only │
│ - DYNAMIC_AUTO: purely LLM-recommended (default) │
│ Output: Step0Result (recommended categories) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ Step 1: Attribute list generation (Attributes) │
├─────────────────────────────────────────────┤
│ Input: query, final_categories │
│ LLM processing: │
│ - Analyze the query's attributes under each category │
│ - Generate 3-5 attributes per category │
│ Output: DynamicStep1Result │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ Step 2: Relationship mapping (Relationships → DAG) │
├─────────────────────────────────────────────┤
│ Input: query, categories, attributes │
│ LLM processing: │
│ - Analyze causal relationships between adjacent categories │
│ - Generate (source, target) relationship pairs │
│ Output: AttributeDAG │
└─────────────────────────────────────────────┘
```
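
The Step 2 mapping above (relationship pairs between adjacent categories, collected into a DAG) can be sketched roughly as follows. This is an illustration only, not the project's code: `AttributeDAGSketch` and `build_dag` are hypothetical names, the real `AttributeDAG` schema is not shown in this document, and the sketch assumes attribute names are unique across categories.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeDAGSketch:
    """Hypothetical stand-in for AttributeDAG (the real schema is not shown here)."""
    nodes: dict = field(default_factory=dict)  # attribute name -> category name
    edges: list = field(default_factory=list)  # (source, target) attribute pairs


def build_dag(categories, attributes, pairs):
    """Keep only pairs that link an attribute to one in the next category."""
    position = {category: index for index, category in enumerate(categories)}
    dag = AttributeDAGSketch()
    for category, names in attributes.items():
        for name in names:
            dag.nodes[name] = category
    for source, target in pairs:
        if source not in dag.nodes or target not in dag.nodes:
            continue  # the pair references an attribute that was never generated
        step = position[dag.nodes[target]] - position[dag.nodes[source]]
        if step == 1:  # adjacent categories only, always pointing forward
            dag.edges.append((source, target))
    return dag


dag = build_dag(
    categories=["Materials", "Functions"],
    attributes={"Materials": ["steel"], "Functions": ["carrying"]},
    pairs=[("steel", "carrying"), ("carrying", "steel"), ("steel", "ghost")],
)
print(dag.edges)  # [('steel', 'carrying')]
```

Restricting edges to forward steps between neighbouring categories is what keeps the graph acyclic in this sketch; whether the real service enforces this or relies on the prompt is not stated in the document.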
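### 3.2 Key Input Variables
| Variable | Source | Scope | Purpose |
|------|------|--------|------|
| `query` | User input | All of Steps 0-2 | Determines the object being analyzed |
| `category_mode` | User selection | Step 0 | Determines which categories are used |
| `suggested_category_count` | User setting | Step 0 | Number of categories the LLM recommends |
| `temperature` | User setting | Steps 0-2 | Controls the diversity of LLM output |
| `model` | User selection | Steps 0-2 | Selects which LLM model to use |
---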
## 4. Expert Transformation Agent in Detail
### 4.1 Flow
```
Attribute list (attributes from Attribute Agent)
┌─────────────────────────────────────────────┐
│ Step 0: Expert team generation (Expert Generation) │
├─────────────────────────────────────────────┤
│ Decided by: │
│ - expert_source = 'llm' → generated by the LLM │
│ - expert_source ∈ ['curated', 'dbpedia', │
│ 'wikidata'] → random pick from local files │
│ - custom_experts present → combined with the LLM │
│ │
│ Output: ExpertProfile[] │
│ [{id, name, domain, perspective}] │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ Step 1: Expert-perspective keyword generation (Keywords) │
├─────────────────────────────────────────────┤
│ Loop: for each attribute in attributes: │
│ the LLM generates keywords_per_expert │
│ keywords for each expert │
│ │
│ Output: ExpertKeyword[] │
│ [{keyword, expert_id, expert_name, │
│ source_attribute}] │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ Step 2: Description generation (Descriptions) │
├─────────────────────────────────────────────┤
│ Loop: for each expert_keyword: │
│ the LLM generates a 15-30 character description of an innovative application │
│ │
│ Output: ExpertTransformationDescription[] │
└─────────────────────────────────────────────┘
```
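
The Step 0 decision above can be condensed into a small sketch. It mirrors the filtering shown in the router diff of this commit (blank custom experts are dropped before deciding whether to call the LLM), but the function names `clean_custom_experts` and `should_use_llm` are hypothetical and the source is passed as a plain string rather than the `ExpertSource` enum.

```python
from typing import List, Optional


def clean_custom_experts(custom_experts: Optional[List[str]]) -> List[str]:
    """Drop None, empty, and whitespace-only entries; trim the rest."""
    return [e.strip() for e in (custom_experts or []) if e and e.strip()]


def should_use_llm(expert_source: str, custom_experts: Optional[List[str]]) -> bool:
    """Use the LLM when it is chosen explicitly or real custom experts exist."""
    return expert_source == "llm" or len(clean_custom_experts(custom_experts)) > 0


print(should_use_llm("curated", [""]))        # False: the empty entry is ignored
print(should_use_llm("curated", [" chef "]))  # True: a real custom expert exists
print(should_use_llm("llm", None))            # True: LLM selected explicitly
```

Testing the length of the filtered list, rather than the truthiness of the raw list, is what keeps a list such as `[""]` from overriding a `curated` or `dbpedia` selection.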
### 4.2 Key Input Variables
| Variable | Source | Scope | Purpose |
|------|------|--------|------|
| `expert_source` | User selection | Step 0 | Determines the expert source (llm/curated/dbpedia/wikidata) |
| `expert_count` | User setting | Step 0 | Number of experts (2-8) |
| `keywords_per_expert` | User setting | Step 1 | Keywords per expert per attribute (1-3) |
| `custom_experts` | User input | Step 0 | Expert names specified by the user |
| `temperature` | User setting | Steps 0-2 | Controls diversity |
### 4.3 Keyword Count Formula
```
Total keywords = len(attributes) × expert_count × keywords_per_expert
Worked example:
├─ 3 attributes (carrying, storage, display)
├─ 3 experts (accountant, psychologist, ecologist)
├─ 1 keyword per expert
└─ = 3 × 3 × 1 = 9 keywords
```
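
As a quick check of the formula, here is a minimal sketch. The helper name `total_keywords` is hypothetical; the range checks reuse the limits listed in section 4.2 (2-8 experts, 1-3 keywords per expert), and whether the backend enforces them in the same way is an assumption.

```python
def total_keywords(attribute_count: int, expert_count: int, keywords_per_expert: int) -> int:
    """Total keywords = attributes x experts x keywords per expert."""
    if not 2 <= expert_count <= 8:
        raise ValueError("expert_count is documented as 2-8")
    if not 1 <= keywords_per_expert <= 3:
        raise ValueError("keywords_per_expert is documented as 1-3")
    return attribute_count * expert_count * keywords_per_expert


print(total_keywords(3, 3, 1))  # 9
```

Since Step 2 generates one description per keyword, this number is also a rough guide to how many description requests a run produces.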
---
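## 5. Factors That Influence Keyword Generation
| Stage | Variable | Impact | Notes |
|------|---------|---------|------|
| **Attribute generation** | `query` | Very high | Sets the semantic basis of the LLM's analysis |
| | `category_mode` | High | Determines the category dimensions |
| | `temperature` | Medium | Higher values give more varied output |
| | `model` | Medium | Different models draw on different knowledge |
| **Expert generation** | `expert_source` | High | Determines where experts come from and their quality |
| | `expert_count` | Medium | 2-8 experts |
| | `custom_experts` | Medium | Combined with the LLM |
| **Keyword generation** | `experts[].domain` | Very high | Directly determines the keyword perspective |
| | `keywords_per_expert` | Low | Controls quantity only |
| | `source_attribute` | High | Sets the starting point for ideation |
---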
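## 6. Coupling Analysis
### 6.1 High-Coupling Links ⚠️
| Link | Coupling | Cause | Risk |
|------|--------|------|------|
| Attribute → Expert Transform | High | Expert Transform depends on Attribute output | Structural changes must be made on both sides |
| Expert generation → Keyword generation | High | `domain` directly shapes the keywords | Poor domain quality → irrelevant keywords |
| Prompt → LLM output structure | High | The prompt defines the JSON format | Changing the prompt requires changing the schema |
### 6.2 Low-Coupling Links ✓
| Link | Coupling | Cause | Benefit |
|------|--------|------|------|
| curated/dbpedia/wikidata | Low | Independent local files | Can be updated individually |
| SSE message format | Low | Standardized, decoupled | Backward compatible |
| useAttribute/useExpertTransformation | Low | Independent hooks | Can be reused individually |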
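
The SSE format in the table above is the standard event-stream framing: an `event:` line, a `data:` line carrying JSON, then a blank line, which is the shape the router in this commit yields for `expert_start`. A minimal sketch of a formatter and a matching parser follows; `format_sse` and `parse_sse` are hypothetical helper names, not functions from the repository.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Build one SSE frame; ensure_ascii=False keeps non-ASCII text readable."""
    body = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {body}\n\n"


def parse_sse(frame: str):
    """Parse a single-event frame back into (event name, decoded payload)."""
    event, data = None, None
    for line in frame.strip().split("\n"):
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            data = json.loads(line[len("data: "):])
    return event, data


frame = format_sse("expert_start", {"message": "正在組建專家團隊...", "source": "curated"})
print(parse_sse(frame))
```

Keeping the framing in one helper means the frontend hooks only depend on event names and payload keys, which is the decoupling the table refers to.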
### 6.3 Coupling Matrix
| | Attribute | Transformation | Expert Transform |
|----|-----------|---------------|----|
| **Attribute** | - | Low | High |
| **Transformation** | Low | - | Low |
| **Expert Transform** | High | Low | - |
---
## 7. Expert Source Comparison
| Source | File | Entries | Domain quality | Notes |
|------|------|------|------------|------|
| `llm` | - | Dynamic | High | The LLM generates experts relevant to the query |
| `curated` | curated_occupations_zh/en.json | 210 | High | Hand-picked occupations with specific domains |
| `dbpedia` | dbpedia_occupations_en.json | 2164 | Low | Every domain is "Professional Field" |
| `wikidata` | - | - | - | Local data not implemented |
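
For the file-backed sources, reading and sampling could look roughly like the sketch below. It relies only on the JSON layout visible in this commit (`metadata.total_count` plus an `occupations` list of `name`/`domain` objects) and on the `expert-X` id pattern used in the prompts. The function names `parse_occupations` and `pick_experts` are hypothetical, and this is not the actual `expert_source_service` code.

```python
import json
import random


def parse_occupations(raw_json: str) -> list:
    """Return the occupations list and warn if the declared count disagrees."""
    document = json.loads(raw_json)
    occupations = document["occupations"]
    declared = document.get("metadata", {}).get("total_count")
    if declared is not None and declared != len(occupations):
        print(f"warning: metadata.total_count={declared}, found {len(occupations)}")
    return occupations


def pick_experts(occupations: list, count: int, rng=None) -> list:
    """Randomly pick up to `count` distinct occupations as expert records."""
    rng = rng or random.Random()
    chosen = rng.sample(occupations, k=min(count, len(occupations)))
    return [
        {"id": f"expert-{i}", "name": item["name"], "domain": item.get("domain", "")}
        for i, item in enumerate(chosen, start=1)
    ]


raw = json.dumps({
    "metadata": {"source": "curated", "total_count": 3},
    "occupations": [
        {"name": "Surgeon", "domain": "Healthcare"},
        {"name": "Lawyer", "domain": "Law & Policy"},
        {"name": "Pilot", "domain": "Transportation & Logistics"},
    ],
})
experts = pick_experts(parse_occupations(raw), count=2, rng=random.Random(0))
print([e["id"] for e in experts])  # ['expert-1', 'expert-2']
```

In practice the text would come from a file, for example via `pathlib.Path(...).read_text(encoding="utf-8")`. Passing a seeded `random.Random` makes the selection reproducible in tests.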
---
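## 8. Example: Tracing the Decisions
```
Query: "bicycle"
Category Mode: DYNAMIC_AUTO
→ LLM suggests [Materials, Functions, Usages, User Groups]
Expert Source: "curated"
→ Random pick [Surgeon (Healthcare), Software Engineer (Information Technology), Executive Chef (Hospitality & Service)]
Attribute "carrying" + Expert "Surgeon"
→ LLM reasoning: carrying seen from a medical perspective
→ Keyword: "organ transport", "emergency logistics"
Description generation:
→ "From an emergency-medicine perspective, a bicycle could be adapted into an emergency medical transport vehicle..."
```
---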
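## 9. Suggested Improvements
| Issue | Current state | Suggested improvement |
|------|------|---------|
| Domain quality | DBpedia domains are all one generic value | ✅ Curated occupations created |
| Repeated expert generation | Experts are regenerated for every category | Consider making experts global |
| Uniform temperature | One value for the whole pipeline | Could be set per step |
| No caching | Re-analyzed on every request | Add an Attribute cache layer |
| Language support | Mainly Chinese | ✅ English version created |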
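
One possible shape for the Attribute cache layer suggested in the table is a small in-memory cache with a time-to-live. Everything in this sketch is an assumption: the class name `AttributeCache`, the `(query, category_mode, model)` key, and the one-hour default are illustrative and do not come from the repository.

```python
import time


class AttributeCache:
    """In-memory TTL cache; entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (stored_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)


cache = AttributeCache(ttl_seconds=600)
key = ("bicycle", "DYNAMIC_AUTO", "example-model")  # hypothetical key layout
if cache.get(key) is None:
    cache.put(key, {"Functions": ["carrying", "storage"]})
print(cache.get(key))
```

A trade-off to weigh: with a non-zero `temperature`, serving cached attributes removes the run-to-run variety the system may rely on, so the cache key or TTL would need to reflect that.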
---
## 10. Key Files
### Backend
- `app/routers/analyze.py` - Attribute Agent routes
- `app/routers/expert_transformation.py` - Expert Transformation routes
- `app/prompts/step_prompts.py` - Attribute Agent prompts
- `app/prompts/expert_transformation_prompt.py` - Expert Transformation prompts
- `app/services/expert_source_service.py` - Expert source service
- `app/services/llm_service.py` - LLM invocation service
- `app/data/curated_occupations_zh.json` - Curated occupations (Chinese)
- `app/data/curated_occupations_en.json` - Curated occupations (English)
- `app/data/dbpedia_occupations_en.json` - DBpedia occupations
### Frontend
- `src/App.tsx` - Main state management
- `src/hooks/useAttribute.ts` - Attribute Agent hook
- `src/hooks/useExpertTransformation.ts` - Expert Transformation hook
- `src/components/TransformationInputPanel.tsx` - Transformation control panel
- `src/types/index.ts` - Type definitions

@@ -0,0 +1,9 @@
{
"metadata": {
"source": "conceptnet",
"language": "en",
"fetched_at": "2025-12-04T07:26:30.695936+00:00",
"total_count": 0
},
"occupations": []
}

@@ -0,0 +1,9 @@
{
"metadata": {
"source": "conceptnet",
"language": "zh",
"fetched_at": "2025-12-04T07:26:26.994914+00:00",
"total_count": 0
},
"occupations": []
}

@@ -0,0 +1,216 @@
{
"metadata": {
"source": "curated",
"language": "en",
"created_at": "2025-12-04",
"total_count": 210,
"description": "Curated common professional occupations with specific domains"
},
"occupations": [
{"name": "Surgeon", "domain": "Healthcare"},
{"name": "Internist", "domain": "Healthcare"},
{"name": "Dentist", "domain": "Healthcare"},
{"name": "Ophthalmologist", "domain": "Healthcare"},
{"name": "Psychiatrist", "domain": "Healthcare"},
{"name": "Pediatrician", "domain": "Healthcare"},
{"name": "Nurse", "domain": "Healthcare"},
{"name": "Pharmacist", "domain": "Healthcare"},
{"name": "Clinical Psychologist", "domain": "Healthcare"},
{"name": "Physical Therapist", "domain": "Healthcare"},
{"name": "Occupational Therapist", "domain": "Healthcare"},
{"name": "Nutritionist", "domain": "Healthcare"},
{"name": "Traditional Chinese Medicine Doctor", "domain": "Healthcare"},
{"name": "Veterinarian", "domain": "Healthcare"},
{"name": "Software Engineer", "domain": "Information Technology"},
{"name": "Frontend Developer", "domain": "Information Technology"},
{"name": "Backend Developer", "domain": "Information Technology"},
{"name": "Data Scientist", "domain": "Information Technology"},
{"name": "Data Engineer", "domain": "Information Technology"},
{"name": "Machine Learning Engineer", "domain": "Information Technology"},
{"name": "Cybersecurity Engineer", "domain": "Information Technology"},
{"name": "DevOps Engineer", "domain": "Information Technology"},
{"name": "UI Designer", "domain": "Information Technology"},
{"name": "UX Designer", "domain": "Information Technology"},
{"name": "Product Manager", "domain": "Information Technology"},
{"name": "Systems Analyst", "domain": "Information Technology"},
{"name": "Network Engineer", "domain": "Information Technology"},
{"name": "Cloud Architect", "domain": "Information Technology"},
{"name": "Accountant", "domain": "Finance & Business"},
{"name": "Financial Analyst", "domain": "Finance & Business"},
{"name": "Investment Manager", "domain": "Finance & Business"},
{"name": "Risk Manager", "domain": "Finance & Business"},
{"name": "Actuary", "domain": "Finance & Business"},
{"name": "Bank Manager", "domain": "Finance & Business"},
{"name": "Securities Analyst", "domain": "Finance & Business"},
{"name": "Tax Consultant", "domain": "Finance & Business"},
{"name": "Business Consultant", "domain": "Finance & Business"},
{"name": "HR Manager", "domain": "Finance & Business"},
{"name": "Marketing Manager", "domain": "Finance & Business"},
{"name": "Sales Manager", "domain": "Finance & Business"},
{"name": "Procurement Manager", "domain": "Finance & Business"},
{"name": "Entrepreneur", "domain": "Finance & Business"},
{"name": "Lawyer", "domain": "Law & Policy"},
{"name": "Judge", "domain": "Law & Policy"},
{"name": "Prosecutor", "domain": "Law & Policy"},
{"name": "Notary", "domain": "Law & Policy"},
{"name": "Legal Counsel", "domain": "Law & Policy"},
{"name": "IP Attorney", "domain": "Law & Policy"},
{"name": "Policy Analyst", "domain": "Law & Policy"},
{"name": "Diplomat", "domain": "Law & Policy"},
{"name": "Civil Servant", "domain": "Law & Policy"},
{"name": "Legislator", "domain": "Law & Policy"},
{"name": "Mediator", "domain": "Law & Policy"},
{"name": "Legal Scholar", "domain": "Law & Policy"},
{"name": "University Professor", "domain": "Education & Academia"},
{"name": "High School Teacher", "domain": "Education & Academia"},
{"name": "Middle School Teacher", "domain": "Education & Academia"},
{"name": "Elementary School Teacher", "domain": "Education & Academia"},
{"name": "Preschool Teacher", "domain": "Education & Academia"},
{"name": "Special Education Teacher", "domain": "Education & Academia"},
{"name": "Tutor", "domain": "Education & Academia"},
{"name": "Researcher", "domain": "Education & Academia"},
{"name": "Librarian", "domain": "Education & Academia"},
{"name": "Education Administrator", "domain": "Education & Academia"},
{"name": "Academic Editor", "domain": "Education & Academia"},
{"name": "Education Consultant", "domain": "Education & Academia"},
{"name": "Speech Therapist", "domain": "Education & Academia"},
{"name": "Painter", "domain": "Arts & Creativity"},
{"name": "Sculptor", "domain": "Arts & Creativity"},
{"name": "Musician", "domain": "Arts & Creativity"},
{"name": "Composer", "domain": "Arts & Creativity"},
{"name": "Conductor", "domain": "Arts & Creativity"},
{"name": "Dancer", "domain": "Arts & Creativity"},
{"name": "Actor", "domain": "Arts & Creativity"},
{"name": "Film Director", "domain": "Arts & Creativity"},
{"name": "Screenwriter", "domain": "Arts & Creativity"},
{"name": "Photographer", "domain": "Arts & Creativity"},
{"name": "Illustrator", "domain": "Arts & Creativity"},
{"name": "Animator", "domain": "Arts & Creativity"},
{"name": "Graphic Designer", "domain": "Arts & Creativity"},
{"name": "Fashion Designer", "domain": "Arts & Creativity"},
{"name": "Jewelry Designer", "domain": "Arts & Creativity"},
{"name": "Mechanical Engineer", "domain": "Engineering & Manufacturing"},
{"name": "Electrical Engineer", "domain": "Engineering & Manufacturing"},
{"name": "Electronics Engineer", "domain": "Engineering & Manufacturing"},
{"name": "Chemical Engineer", "domain": "Engineering & Manufacturing"},
{"name": "Materials Engineer", "domain": "Engineering & Manufacturing"},
{"name": "Industrial Engineer", "domain": "Engineering & Manufacturing"},
{"name": "Automation Engineer", "domain": "Engineering & Manufacturing"},
{"name": "Quality Control Engineer", "domain": "Engineering & Manufacturing"},
{"name": "Process Engineer", "domain": "Engineering & Manufacturing"},
{"name": "R&D Engineer", "domain": "Engineering & Manufacturing"},
{"name": "Production Manager", "domain": "Engineering & Manufacturing"},
{"name": "Factory Manager", "domain": "Engineering & Manufacturing"},
{"name": "Technician", "domain": "Engineering & Manufacturing"},
{"name": "Architect", "domain": "Architecture & Space"},
{"name": "Interior Designer", "domain": "Architecture & Space"},
{"name": "Landscape Designer", "domain": "Architecture & Space"},
{"name": "Urban Planner", "domain": "Architecture & Space"},
{"name": "Structural Engineer", "domain": "Architecture & Space"},
{"name": "Civil Engineer", "domain": "Architecture & Space"},
{"name": "Construction Engineer", "domain": "Architecture & Space"},
{"name": "Site Supervisor", "domain": "Architecture & Space"},
{"name": "Surveyor", "domain": "Architecture & Space"},
{"name": "Architectural Drafter", "domain": "Architecture & Space"},
{"name": "Exhibition Designer", "domain": "Architecture & Space"},
{"name": "Lighting Designer", "domain": "Architecture & Space"},
{"name": "Journalist", "domain": "Media & Communications"},
{"name": "News Anchor", "domain": "Media & Communications"},
{"name": "Editor", "domain": "Media & Communications"},
{"name": "Copy Editor", "domain": "Media & Communications"},
{"name": "Video Editor", "domain": "Media & Communications"},
{"name": "PR Specialist", "domain": "Media & Communications"},
{"name": "Advertising Planner", "domain": "Media & Communications"},
{"name": "Social Media Manager", "domain": "Media & Communications"},
{"name": "Content Creator", "domain": "Media & Communications"},
{"name": "Podcast Host", "domain": "Media & Communications"},
{"name": "Publisher", "domain": "Media & Communications"},
{"name": "Translator", "domain": "Media & Communications"},
{"name": "Interpreter", "domain": "Media & Communications"},
{"name": "Agronomist", "domain": "Agriculture & Environment"},
{"name": "Horticulturist", "domain": "Agriculture & Environment"},
{"name": "Livestock Specialist", "domain": "Agriculture & Environment"},
{"name": "Aquaculture Specialist", "domain": "Agriculture & Environment"},
{"name": "Environmental Engineer", "domain": "Agriculture & Environment"},
{"name": "Ecologist", "domain": "Agriculture & Environment"},
{"name": "Forest Ranger", "domain": "Agriculture & Environment"},
{"name": "Meteorologist", "domain": "Agriculture & Environment"},
{"name": "Geologist", "domain": "Agriculture & Environment"},
{"name": "Environmental Inspector", "domain": "Agriculture & Environment"},
{"name": "Sustainability Consultant", "domain": "Agriculture & Environment"},
{"name": "Organic Farmer", "domain": "Agriculture & Environment"},
{"name": "Executive Chef", "domain": "Hospitality & Service"},
{"name": "Pastry Chef", "domain": "Hospitality & Service"},
{"name": "Bartender", "domain": "Hospitality & Service"},
{"name": "Sommelier", "domain": "Hospitality & Service"},
{"name": "Restaurant Manager", "domain": "Hospitality & Service"},
{"name": "Hotel Manager", "domain": "Hospitality & Service"},
{"name": "Travel Planner", "domain": "Hospitality & Service"},
{"name": "Tour Guide", "domain": "Hospitality & Service"},
{"name": "Barista", "domain": "Hospitality & Service"},
{"name": "Food Critic", "domain": "Hospitality & Service"},
{"name": "Wedding Planner", "domain": "Hospitality & Service"},
{"name": "Event Planner", "domain": "Hospitality & Service"},
{"name": "Sports Coach", "domain": "Sports & Fitness"},
{"name": "Personal Trainer", "domain": "Sports & Fitness"},
{"name": "Yoga Instructor", "domain": "Sports & Fitness"},
{"name": "Athletic Trainer", "domain": "Sports & Fitness"},
{"name": "Physical Education Teacher", "domain": "Sports & Fitness"},
{"name": "Sports Psychologist", "domain": "Sports & Fitness"},
{"name": "Sports Nutritionist", "domain": "Sports & Fitness"},
{"name": "Professional Athlete", "domain": "Sports & Fitness"},
{"name": "Referee", "domain": "Sports & Fitness"},
{"name": "Strength Coach", "domain": "Sports & Fitness"},
{"name": "Sports Agent", "domain": "Sports & Fitness"},
{"name": "Social Worker", "domain": "Social Services"},
{"name": "Counselor", "domain": "Social Services"},
{"name": "Guidance Counselor", "domain": "Social Services"},
{"name": "Volunteer Coordinator", "domain": "Social Services"},
{"name": "Nonprofit Manager", "domain": "Social Services"},
{"name": "Community Organizer", "domain": "Social Services"},
{"name": "Elderly Care Worker", "domain": "Social Services"},
{"name": "Youth Counselor", "domain": "Social Services"},
{"name": "Family Therapist", "domain": "Social Services"},
{"name": "Career Counselor", "domain": "Social Services"},
{"name": "Addiction Counselor", "domain": "Social Services"},
{"name": "Pilot", "domain": "Transportation & Logistics"},
{"name": "Ship Captain", "domain": "Transportation & Logistics"},
{"name": "Train Operator", "domain": "Transportation & Logistics"},
{"name": "Air Traffic Controller", "domain": "Transportation & Logistics"},
{"name": "Logistics Manager", "domain": "Transportation & Logistics"},
{"name": "Supply Chain Manager", "domain": "Transportation & Logistics"},
{"name": "Warehouse Manager", "domain": "Transportation & Logistics"},
{"name": "Customs Broker", "domain": "Transportation & Logistics"},
{"name": "Traffic Engineer", "domain": "Transportation & Logistics"},
{"name": "Port Authority Officer", "domain": "Transportation & Logistics"},
{"name": "Physicist", "domain": "Scientific Research"},
{"name": "Chemist", "domain": "Scientific Research"},
{"name": "Biologist", "domain": "Scientific Research"},
{"name": "Astronomer", "domain": "Scientific Research"},
{"name": "Mathematician", "domain": "Scientific Research"},
{"name": "Statistician", "domain": "Scientific Research"},
{"name": "Geneticist", "domain": "Scientific Research"},
{"name": "Neuroscientist", "domain": "Scientific Research"},
{"name": "Oceanographer", "domain": "Scientific Research"},
{"name": "Archaeologist", "domain": "Scientific Research"},
{"name": "Anthropologist", "domain": "Scientific Research"},
{"name": "Sociologist", "domain": "Scientific Research"},
{"name": "Economist", "domain": "Scientific Research"},
{"name": "Historian", "domain": "Scientific Research"},
{"name": "Philosopher", "domain": "Scientific Research"}
]
}

@@ -0,0 +1,216 @@
{
"metadata": {
"source": "curated",
"language": "zh",
"created_at": "2025-12-04",
"total_count": 210,
"description": "精選常見專家職業,含具體專業領域"
},
"occupations": [
{"name": "外科醫師", "domain": "醫療與健康"},
{"name": "內科醫師", "domain": "醫療與健康"},
{"name": "牙醫師", "domain": "醫療與健康"},
{"name": "眼科醫師", "domain": "醫療與健康"},
{"name": "精神科醫師", "domain": "醫療與健康"},
{"name": "小兒科醫師", "domain": "醫療與健康"},
{"name": "護理師", "domain": "醫療與健康"},
{"name": "藥師", "domain": "醫療與健康"},
{"name": "臨床心理師", "domain": "醫療與健康"},
{"name": "物理治療師", "domain": "醫療與健康"},
{"name": "職能治療師", "domain": "醫療與健康"},
{"name": "營養師", "domain": "醫療與健康"},
{"name": "中醫師", "domain": "醫療與健康"},
{"name": "獸醫師", "domain": "醫療與健康"},
{"name": "軟體工程師", "domain": "資訊科技"},
{"name": "前端工程師", "domain": "資訊科技"},
{"name": "後端工程師", "domain": "資訊科技"},
{"name": "資料科學家", "domain": "資訊科技"},
{"name": "資料工程師", "domain": "資訊科技"},
{"name": "機器學習工程師", "domain": "資訊科技"},
{"name": "資安工程師", "domain": "資訊科技"},
{"name": "DevOps工程師", "domain": "資訊科技"},
{"name": "UI設計師", "domain": "資訊科技"},
{"name": "UX設計師", "domain": "資訊科技"},
{"name": "產品經理", "domain": "資訊科技"},
{"name": "系統分析師", "domain": "資訊科技"},
{"name": "網路工程師", "domain": "資訊科技"},
{"name": "雲端架構師", "domain": "資訊科技"},
{"name": "會計師", "domain": "金融與商業"},
{"name": "財務分析師", "domain": "金融與商業"},
{"name": "投資經理", "domain": "金融與商業"},
{"name": "風險管理師", "domain": "金融與商業"},
{"name": "精算師", "domain": "金融與商業"},
{"name": "銀行經理", "domain": "金融與商業"},
{"name": "證券分析師", "domain": "金融與商業"},
{"name": "稅務顧問", "domain": "金融與商業"},
{"name": "企業顧問", "domain": "金融與商業"},
{"name": "人資經理", "domain": "金融與商業"},
{"name": "行銷經理", "domain": "金融與商業"},
{"name": "業務經理", "domain": "金融與商業"},
{"name": "採購經理", "domain": "金融與商業"},
{"name": "創業家", "domain": "金融與商業"},
{"name": "律師", "domain": "法律與政策"},
{"name": "法官", "domain": "法律與政策"},
{"name": "檢察官", "domain": "法律與政策"},
{"name": "公證人", "domain": "法律與政策"},
{"name": "法務專員", "domain": "法律與政策"},
{"name": "智財律師", "domain": "法律與政策"},
{"name": "政策分析師", "domain": "法律與政策"},
{"name": "外交官", "domain": "法律與政策"},
{"name": "公務員", "domain": "法律與政策"},
{"name": "立法委員", "domain": "法律與政策"},
{"name": "調解員", "domain": "法律與政策"},
{"name": "法律學者", "domain": "法律與政策"},
{"name": "大學教授", "domain": "教育與學術"},
{"name": "高中教師", "domain": "教育與學術"},
{"name": "國中教師", "domain": "教育與學術"},
{"name": "小學教師", "domain": "教育與學術"},
{"name": "幼教老師", "domain": "教育與學術"},
{"name": "特教老師", "domain": "教育與學術"},
{"name": "補習班老師", "domain": "教育與學術"},
{"name": "研究員", "domain": "教育與學術"},
{"name": "圖書館員", "domain": "教育與學術"},
{"name": "教育行政人員", "domain": "教育與學術"},
{"name": "學術編輯", "domain": "教育與學術"},
{"name": "教育顧問", "domain": "教育與學術"},
{"name": "語言治療師", "domain": "教育與學術"},
{"name": "畫家", "domain": "藝術與創意"},
{"name": "雕塑家", "domain": "藝術與創意"},
{"name": "音樂家", "domain": "藝術與創意"},
{"name": "作曲家", "domain": "藝術與創意"},
{"name": "指揮家", "domain": "藝術與創意"},
{"name": "舞蹈家", "domain": "藝術與創意"},
{"name": "演員", "domain": "藝術與創意"},
{"name": "導演", "domain": "藝術與創意"},
{"name": "編劇", "domain": "藝術與創意"},
{"name": "攝影師", "domain": "藝術與創意"},
{"name": "插畫家", "domain": "藝術與創意"},
{"name": "動畫師", "domain": "藝術與創意"},
{"name": "平面設計師", "domain": "藝術與創意"},
{"name": "時尚設計師", "domain": "藝術與創意"},
{"name": "珠寶設計師", "domain": "藝術與創意"},
{"name": "機械工程師", "domain": "工程與製造"},
{"name": "電機工程師", "domain": "工程與製造"},
{"name": "電子工程師", "domain": "工程與製造"},
{"name": "化學工程師", "domain": "工程與製造"},
{"name": "材料工程師", "domain": "工程與製造"},
{"name": "工業工程師", "domain": "工程與製造"},
{"name": "自動化工程師", "domain": "工程與製造"},
{"name": "品管工程師", "domain": "工程與製造"},
{"name": "製程工程師", "domain": "工程與製造"},
{"name": "研發工程師", "domain": "工程與製造"},
{"name": "生產經理", "domain": "工程與製造"},
{"name": "工廠廠長", "domain": "工程與製造"},
{"name": "技師", "domain": "工程與製造"},
{"name": "建築師", "domain": "建築與空間"},
{"name": "室內設計師", "domain": "建築與空間"},
{"name": "景觀設計師", "domain": "建築與空間"},
{"name": "都市規劃師", "domain": "建築與空間"},
{"name": "結構工程師", "domain": "建築與空間"},
{"name": "土木工程師", "domain": "建築與空間"},
{"name": "營造工程師", "domain": "建築與空間"},
{"name": "工地主任", "domain": "建築與空間"},
{"name": "測量師", "domain": "建築與空間"},
{"name": "建築繪圖員", "domain": "建築與空間"},
{"name": "展場設計師", "domain": "建築與空間"},
{"name": "燈光設計師", "domain": "建築與空間"},
{"name": "記者", "domain": "媒體與傳播"},
{"name": "主播", "domain": "媒體與傳播"},
{"name": "編輯", "domain": "媒體與傳播"},
{"name": "文字編輯", "domain": "媒體與傳播"},
{"name": "影片剪輯師", "domain": "媒體與傳播"},
{"name": "公關專員", "domain": "媒體與傳播"},
{"name": "廣告企劃", "domain": "媒體與傳播"},
{"name": "社群經理", "domain": "媒體與傳播"},
{"name": "內容創作者", "domain": "媒體與傳播"},
{"name": "播客主持人", "domain": "媒體與傳播"},
{"name": "出版人", "domain": "媒體與傳播"},
{"name": "翻譯師", "domain": "媒體與傳播"},
{"name": "口譯員", "domain": "媒體與傳播"},
{"name": "農藝師", "domain": "農業與環境"},
{"name": "園藝師", "domain": "農業與環境"},
{"name": "畜牧專家", "domain": "農業與環境"},
{"name": "水產養殖師", "domain": "農業與環境"},
{"name": "環境工程師", "domain": "農業與環境"},
{"name": "生態學家", "domain": "農業與環境"},
{"name": "森林保育員", "domain": "農業與環境"},
{"name": "氣象學家", "domain": "農業與環境"},
{"name": "地質學家", "domain": "農業與環境"},
{"name": "環保稽查員", "domain": "農業與環境"},
{"name": "永續發展顧問", "domain": "農業與環境"},
{"name": "有機農場主", "domain": "農業與環境"},
{"name": "主廚", "domain": "餐飲與服務"},
{"name": "西點師傅", "domain": "餐飲與服務"},
{"name": "調酒師", "domain": "餐飲與服務"},
{"name": "侍酒師", "domain": "餐飲與服務"},
{"name": "餐廳經理", "domain": "餐飲與服務"},
{"name": "飯店經理", "domain": "餐飲與服務"},
{"name": "旅遊規劃師", "domain": "餐飲與服務"},
{"name": "導遊", "domain": "餐飲與服務"},
{"name": "咖啡師", "domain": "餐飲與服務"},
{"name": "美食評論家", "domain": "餐飲與服務"},
{"name": "婚禮策劃師", "domain": "餐飲與服務"},
{"name": "活動企劃", "domain": "餐飲與服務"},
{"name": "運動教練", "domain": "運動與健身"},
{"name": "健身教練", "domain": "運動與健身"},
{"name": "瑜珈老師", "domain": "運動與健身"},
{"name": "運動防護員", "domain": "運動與健身"},
{"name": "體育老師", "domain": "運動與健身"},
{"name": "運動心理師", "domain": "運動與健身"},
{"name": "運動營養師", "domain": "運動與健身"},
{"name": "職業運動員", "domain": "運動與健身"},
{"name": "裁判", "domain": "運動與健身"},
{"name": "體能訓練師", "domain": "運動與健身"},
{"name": "運動經紀人", "domain": "運動與健身"},
{"name": "社工師", "domain": "社會服務"},
{"name": "心理諮商師", "domain": "社會服務"},
{"name": "輔導員", "domain": "社會服務"},
{"name": "志工協調員", "domain": "社會服務"},
{"name": "非營利組織經理", "domain": "社會服務"},
{"name": "社區營造員", "domain": "社會服務"},
{"name": "長照服務員", "domain": "社會服務"},
{"name": "青少年輔導員", "domain": "社會服務"},
{"name": "家庭治療師", "domain": "社會服務"},
{"name": "職涯諮詢師", "domain": "社會服務"},
{"name": "戒癮輔導員", "domain": "社會服務"},
{"name": "飛行員", "domain": "交通與物流"},
{"name": "船長", "domain": "交通與物流"},
{"name": "火車駕駛", "domain": "交通與物流"},
{"name": "航空管制員", "domain": "交通與物流"},
{"name": "物流經理", "domain": "交通與物流"},
{"name": "供應鏈經理", "domain": "交通與物流"},
{"name": "倉儲經理", "domain": "交通與物流"},
{"name": "報關員", "domain": "交通與物流"},
{"name": "交通工程師", "domain": "交通與物流"},
{"name": "港務人員", "domain": "交通與物流"},
{"name": "物理學家", "domain": "科學研究"},
{"name": "化學家", "domain": "科學研究"},
{"name": "生物學家", "domain": "科學研究"},
{"name": "天文學家", "domain": "科學研究"},
{"name": "數學家", "domain": "科學研究"},
{"name": "統計學家", "domain": "科學研究"},
{"name": "基因學家", "domain": "科學研究"},
{"name": "神經科學家", "domain": "科學研究"},
{"name": "海洋學家", "domain": "科學研究"},
{"name": "考古學家", "domain": "科學研究"},
{"name": "人類學家", "domain": "科學研究"},
{"name": "社會學家", "domain": "科學研究"},
{"name": "經濟學家", "domain": "科學研究"},
{"name": "歷史學家", "domain": "科學研究"},
{"name": "哲學家", "domain": "科學研究"}
]
}

File diff suppressed because it is too large.

@@ -209,8 +209,9 @@ class ExpertTransformationDAGResult(BaseModel):
 class ExpertSource(str, Enum):
     """專家來源類型"""
     LLM = "llm"
+    CURATED = "curated" # 精選職業210筆含具體領域
+    DBPEDIA = "dbpedia"
     WIKIDATA = "wikidata"
-    CONCEPTNET = "conceptnet"
 class ExpertTransformationRequest(BaseModel):
@@ -226,7 +227,7 @@ class ExpertTransformationRequest(BaseModel):
     # Expert source parameters
     expert_source: ExpertSource = ExpertSource.LLM # 專家來源
-    expert_language: str = "zh" # 外部來源的語言
+    expert_language: str = "en" # 外部來源的語言 (目前只有英文資料)
     # LLM parameters
     model: Optional[str] = None

@@ -53,19 +53,32 @@ def get_expert_keyword_generation_prompt(
     keywords_per_expert: int = 1
 ) -> str:
     """Step 1: 專家視角關鍵字生成"""
-    experts_info = ", ".join([f"{exp['id']}:{exp['name']}({exp['domain']})" for exp in experts])
+    # 建立專家列表,格式更清晰
+    experts_list = "\n".join([f"- {exp['id']}: {exp['name']}" for exp in experts])
     return f"""/no_think
-專家團隊:{experts_info}
-屬性:「{attribute}」({category}
-每位專家從自己的專業視角為此屬性生成 {keywords_per_expert} 個創新關鍵字2-6字
-關鍵字要反映該專家領域的獨特思考方式。
+你需要扮演以下專家,為屬性生成創新關鍵字:
+【專家名單】
+{experts_list}
+【任務】
+屬性:「{attribute}」(類別:{category}
+請為每位專家:
+1. 先理解該職業的專業背景、知識領域、工作內容
+2. 從該職業的獨特視角思考「{attribute}
+3. 生成 {keywords_per_expert} 個與該專業相關的創新關鍵字2-6字
+關鍵字必須反映該專家的專業思維方式,例如:
+- 會計師 看「移動」→「資金流動」「成本效益」
+- 建築師 看「移動」→「動線設計」「空間流動」
+- 心理師 看「移動」→「行為動機」「情緒轉變」
 回傳 JSON
 {{"keywords": [{{"keyword": "詞彙", "expert_id": "expert-X", "expert_name": "名稱"}}, ...]}}
-共需 {len(experts) * keywords_per_expert} 個關鍵字。"""
+共需 {len(experts) * keywords_per_expert} 個關鍵字,每個關鍵字必須明顯與對應專家的專業領域相關"""
 def get_single_description_prompt(
@@ -76,12 +89,17 @@ def get_single_description_prompt(
     expert_domain: str
 ) -> str:
     """Step 2: 為單一關鍵字生成描述"""
+    # 如果 domain 是通用的,就只用職業名稱
+    domain_text = f"{expert_domain}" if expert_domain and expert_domain != "Professional Field" else ""
     return f"""/no_think
 物件:「{query}
-專家:{expert_name}{expert_domain}
+專家:{expert_name}{domain_text}
 關鍵字:{keyword}
-從這位專家的視角生成一段創新應用描述15-30字說明如何將「{keyword}」的概念應用到「{query}」上。
+你是一位{expert_name}。從你的專業視角生成一段創新應用描述15-30字說明如何將「{keyword}」的概念應用到「{query}」上。
+描述要體現{expert_name}的專業思維和獨特觀點。
 回傳 JSON
 {{"description": "應用描述"}}"""

@@ -37,16 +37,29 @@ async def generate_expert_transformation_events(
     model = request.model
     # ========== Step 0: Generate expert team ==========
+    logger.info(f"[DEBUG] expert_source from request: {request.expert_source}")
+    logger.info(f"[DEBUG] expert_source value: {request.expert_source.value}")
+    logger.info(f"[DEBUG] custom_experts: {request.custom_experts}")
     yield f"event: expert_start\ndata: {json.dumps({'message': '正在組建專家團隊...', 'source': request.expert_source.value}, ensure_ascii=False)}\n\n"
     experts: List[ExpertProfile] = []
     actual_source = request.expert_source.value
+    # Keep only custom experts that have real content (drop empty strings)
+    actual_custom_experts = [
+        e.strip() for e in (request.custom_experts or [])
+        if e and e.strip()
+    ]
+    logger.info(f"[DEBUG] actual_custom_experts (filtered): {actual_custom_experts}")
     # Decide which source generates the experts
+    # Use the LLM only when it is explicitly selected or real custom experts exist
     use_llm = (
         request.expert_source == ExpertSource.LLM or
-        request.custom_experts  # custom experts present: use the LLM to fill in the rest
+        len(actual_custom_experts) > 0  # real custom experts present: use the LLM to fill in the rest
     )
+    logger.info(f"[DEBUG] use_llm decision: {use_llm}")
     if use_llm:
         # Generate experts with the LLM
@@ -55,7 +68,7 @@ async def generate_expert_transformation_events(
                 query=request.query,
                 categories=all_categories,
                 expert_count=request.expert_count,
-                custom_experts=request.custom_experts
+                custom_experts=actual_custom_experts if actual_custom_experts else None
             )
             logger.info(f"Expert prompt: {expert_prompt[:200]}")
@@ -78,9 +91,9 @@ async def generate_expert_transformation_events(
             yield f"event: error\ndata: {json.dumps({'error': f'專家團隊生成失敗: {str(e)}'}, ensure_ascii=False)}\n\n"
             return
     else:
-        # Generate experts from an external source
+        # Generate experts from an external source (local files, synchronous)
         try:
-            experts_data, actual_source = await expert_source_service.get_experts(
+            experts_data, actual_source = expert_source_service.get_experts(
                 source=request.expert_source.value,
                 count=request.expert_count,
                 language=request.expert_language
@@ -106,7 +119,7 @@ async def generate_expert_transformation_events(
                     query=request.query,
                     categories=all_categories,
                     expert_count=request.expert_count,
-                    custom_experts=request.custom_experts
+                    custom_experts=actual_custom_experts if actual_custom_experts else None
                 )
                 expert_response = await ollama_provider.generate(
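
The source-selection fix above can be reproduced outside the router. The sketch below is a hedged stand-alone version: the `ExpertSource` enum here is a stand-in whose member names are assumed (the real enum is not part of this diff), and `clean_custom_experts` / `should_use_llm` are hypothetical helper names. It shows why a list such as `[""]` used to force the LLM path (a non-empty list is truthy) and no longer does after filtering.

```python
from enum import Enum
from typing import List, Optional


class ExpertSource(str, Enum):
    """Stand-in for the backend enum; member names are assumptions."""
    LLM = "llm"
    CURATED = "curated"
    DBPEDIA = "dbpedia"
    WIKIDATA = "wikidata"


def clean_custom_experts(custom_experts: Optional[List[str]]) -> List[str]:
    """Strip whitespace and drop empty or blank entries."""
    return [e.strip() for e in (custom_experts or []) if e and e.strip()]


def should_use_llm(source: ExpertSource, custom_experts: Optional[List[str]]) -> bool:
    """Use the LLM only when selected explicitly or when real custom experts exist."""
    return source == ExpertSource.LLM or len(clean_custom_experts(custom_experts)) > 0


if __name__ == "__main__":
    # Old check: bool([""]) is True, so a blank form field forced the LLM path.
    print(bool([""]), should_use_llm(ExpertSource.CURATED, [""]))
```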


@@ -1,92 +0,0 @@
-"""Expert data cache module
-
-Provides a TTL-based cache to reduce external API calls.
-"""
-import time
-from dataclasses import dataclass
-from typing import Dict, List, Optional
-
-
-@dataclass
-class CacheEntry:
-    """Cache entry"""
-    data: List[dict]
-    timestamp: float
-
-
-class ExpertCache:
-    """TTL cache for occupation data from external sources"""
-
-    def __init__(self, ttl_seconds: int = 3600):
-        """
-        Initialize the cache
-
-        Args:
-            ttl_seconds: Cache lifetime in seconds (default: 1 hour)
-        """
-        self._cache: Dict[str, CacheEntry] = {}
-        self._ttl = ttl_seconds
-
-    def get(self, key: str) -> Optional[List[dict]]:
-        """
-        Get cached data
-
-        Args:
-            key: Cache key (e.g. "wikidata:zh:occupations")
-
-        Returns:
-            The cached list, or None if the key is missing or expired
-        """
-        entry = self._cache.get(key)
-        if entry is None:
-            return None
-
-        # Check for expiry
-        if time.time() - entry.timestamp > self._ttl:
-            del self._cache[key]
-            return None
-
-        return entry.data
-
-    def set(self, key: str, data: List[dict]) -> None:
-        """
-        Store data in the cache
-
-        Args:
-            key: Cache key
-            data: List of records to cache
-        """
-        self._cache[key] = CacheEntry(
-            data=data,
-            timestamp=time.time()
-        )
-
-    def invalidate(self, key: Optional[str] = None) -> None:
-        """
-        Clear the cache
-
-        Args:
-            key: Key to clear; if None, clear everything
-        """
-        if key is None:
-            self._cache.clear()
-        elif key in self._cache:
-            del self._cache[key]
-
-    def get_stats(self) -> dict:
-        """Return cache statistics"""
-        now = time.time()
-        valid_count = sum(
-            1 for entry in self._cache.values()
-            if now - entry.timestamp <= self._ttl
-        )
-        return {
-            "total_entries": len(self._cache),
-            "valid_entries": valid_count,
-            "ttl_seconds": self._ttl
-        }
-
-
-# Global cache instance
-expert_cache = ExpertCache()


@@ -1,293 +1,111 @@
"""Expert 外部資料來源服務 """Expert 本地資料來源服務
提供從 Wikidata SPARQL 和 ConceptNet API 獲取職業/領域資料的功能。 從本地 JSON 檔案讀取職業資料,提供隨機選取功能。
""" """
import json
import logging import logging
import random import random
from abc import ABC, abstractmethod from pathlib import Path
from typing import List, Optional, Tuple from typing import List, Tuple
import httpx
from .expert_cache import expert_cache
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# 資料目錄
DATA_DIR = Path(__file__).parent.parent / "data"
class ExpertSourceProvider(ABC):
"""外部來源提供者抽象類"""
@abstractmethod class LocalDataProvider:
async def fetch_occupations( """從本地 JSON 檔案讀取職業資料"""
self, count: int, language: str = "zh"
) -> List[dict]: def __init__(self, source: str):
""" """
獲取職業列表 Args:
source: 資料來源名稱 (dbpedia/wikidata)
"""
self.source = source
self._cache: dict = {} # 記憶體快取
def load_occupations(self, language: str = "en") -> List[dict]:
"""
載入職業資料
Args: Args:
count: 需要的職業數量 language: 語言代碼 (en/zh)
language: 語言代碼 (zh/en)
Returns: Returns:
職業資料列表 [{"name": "...", "domain": "..."}, ...] 職業列表 [{"name": "...", "domain": "..."}, ...]
""" """
pass cache_key = f"{self.source}:{language}"
# 檢查記憶體快取
if cache_key in self._cache:
return self._cache[cache_key]
class WikidataProvider(ExpertSourceProvider): # 讀取檔案
"""Wikidata SPARQL 查詢提供者""" file_path = DATA_DIR / f"{self.source}_occupations_{language}.json"
ENDPOINT = "https://query.wikidata.org/sparql" if not file_path.exists():
logger.warning(f"資料檔案不存在: {file_path}")
def __init__(self): return []
self.client = httpx.AsyncClient(timeout=30.0)
async def fetch_occupations(
self, count: int, language: str = "zh"
) -> List[dict]:
"""從 Wikidata 獲取職業列表"""
cache_key = f"wikidata:{language}:occupations"
# 檢查快取
cached = expert_cache.get(cache_key)
if cached:
logger.info(f"Wikidata cache hit: {len(cached)} occupations")
return self._random_select(cached, count)
# SPARQL 查詢
query = self._build_sparql_query(language)
try: try:
response = await self.client.get( with open(file_path, "r", encoding="utf-8") as f:
self.ENDPOINT, data = json.load(f)
params={"query": query, "format": "json"},
headers={"Accept": "application/sparql-results+json"}
)
response.raise_for_status()
data = response.json() occupations = data.get("occupations", [])
occupations = self._parse_sparql_response(data, language) logger.info(f"載入 {len(occupations)}{self.source} {language} 職業")
if occupations: # 存入快取
expert_cache.set(cache_key, occupations) self._cache[cache_key] = occupations
logger.info(f"Wikidata fetched: {len(occupations)} occupations") return occupations
return self._random_select(occupations, count)
except Exception as e: except Exception as e:
logger.error(f"Wikidata query failed: {e}") logger.error(f"讀取職業資料失敗: {e}")
raise return []
def _build_sparql_query(self, language: str) -> str: def random_select(self, count: int, language: str = "en") -> List[dict]:
"""建構 SPARQL 查詢"""
lang_filter = f'FILTER(LANG(?occupationLabel) = "{language}")'
return f"""
SELECT DISTINCT ?occupation ?occupationLabel ?fieldLabel WHERE {{
?occupation wdt:P31 wd:Q28640.
?occupation rdfs:label ?occupationLabel.
{lang_filter}
OPTIONAL {{
?occupation wdt:P425 ?field.
?field rdfs:label ?fieldLabel.
FILTER(LANG(?fieldLabel) = "{language}")
}}
}}
LIMIT 500
""" """
隨機選取指定數量的職業
def _parse_sparql_response(self, data: dict, language: str) -> List[dict]: Args:
"""解析 SPARQL 回應""" count: 需要的數量
results = [] language: 語言代碼
bindings = data.get("results", {}).get("bindings", [])
for item in bindings: Returns:
name = item.get("occupationLabel", {}).get("value", "") 隨機選取的職業列表
field = item.get("fieldLabel", {}).get("value", "") """
all_occupations = self.load_occupations(language)
if name and len(name) >= 2: if not all_occupations:
results.append({ return []
"name": name,
"domain": field if field else self._infer_domain(name)
})
return results if len(all_occupations) <= count:
return all_occupations
def _infer_domain(self, occupation_name: str) -> str: return random.sample(all_occupations, count)
"""根據職業名稱推斷領域"""
# 簡單的領域推斷規則
domain_keywords = {
"": "醫療健康",
"": "專業服務",
"工程": "工程技術",
"設計": "設計創意",
"藝術": "藝術文化",
"運動": "體育運動",
"": "農業",
"": "漁業",
"": "商業貿易",
"": "法律",
"": "教育",
"研究": "學術研究",
}
for keyword, domain in domain_keywords.items():
if keyword in occupation_name:
return domain
return "專業領域"
def _random_select(self, items: List[dict], count: int) -> List[dict]:
"""隨機選取指定數量"""
if len(items) <= count:
return items
return random.sample(items, count)
async def close(self):
await self.client.aclose()
class ConceptNetProvider(ExpertSourceProvider):
"""ConceptNet API 查詢提供者"""
ENDPOINT = "https://api.conceptnet.io"
def __init__(self):
self.client = httpx.AsyncClient(timeout=30.0)
async def fetch_occupations(
self, count: int, language: str = "zh"
) -> List[dict]:
"""從 ConceptNet 獲取職業相關概念"""
cache_key = f"conceptnet:{language}:occupations"
# 檢查快取
cached = expert_cache.get(cache_key)
if cached:
logger.info(f"ConceptNet cache hit: {len(cached)} concepts")
return self._random_select(cached, count)
# 查詢職業相關概念
lang_code = "zh" if language == "zh" else "en"
start_concept = f"/c/{lang_code}/職業" if lang_code == "zh" else f"/c/{lang_code}/occupation"
try:
occupations = []
# 查詢 IsA 關係
response = await self.client.get(
f"{self.ENDPOINT}/query",
params={
"start": start_concept,
"rel": "/r/IsA",
"limit": 100
}
)
response.raise_for_status()
data = response.json()
occupations.extend(self._parse_conceptnet_response(data, lang_code))
# 也查詢 RelatedTo 關係以獲取更多結果
response2 = await self.client.get(
f"{self.ENDPOINT}/query",
params={
"node": start_concept,
"rel": "/r/RelatedTo",
"limit": 100
}
)
response2.raise_for_status()
data2 = response2.json()
occupations.extend(self._parse_conceptnet_response(data2, lang_code))
# 去重
seen = set()
unique_occupations = []
for occ in occupations:
if occ["name"] not in seen:
seen.add(occ["name"])
unique_occupations.append(occ)
if unique_occupations:
expert_cache.set(cache_key, unique_occupations)
logger.info(f"ConceptNet fetched: {len(unique_occupations)} concepts")
return self._random_select(unique_occupations, count)
except Exception as e:
logger.error(f"ConceptNet query failed: {e}")
raise
def _parse_conceptnet_response(self, data: dict, lang_code: str) -> List[dict]:
"""解析 ConceptNet 回應"""
results = []
edges = data.get("edges", [])
for edge in edges:
# 取得 start 或 end 節點(取決於查詢方向)
start = edge.get("start", {})
end = edge.get("end", {})
# 選擇非起始節點的概念
node = end if start.get("@id", "").endswith("職業") or start.get("@id", "").endswith("occupation") else start
label = node.get("label", "")
term = node.get("term", "")
# 過濾:確保是目標語言且有意義
node_id = node.get("@id", "")
if f"/c/{lang_code}/" in node_id and label and len(label) >= 2:
results.append({
"name": label,
"domain": self._infer_domain_from_edge(edge)
})
return results
def _infer_domain_from_edge(self, edge: dict) -> str:
"""從 edge 資訊推斷領域"""
# ConceptNet 的 edge 包含 surfaceText 可能有額外資訊
surface = edge.get("surfaceText", "")
rel = edge.get("rel", {}).get("label", "")
if "專業" in surface:
return "專業領域"
elif "技術" in surface:
return "技術領域"
else:
return "知識領域"
def _random_select(self, items: List[dict], count: int) -> List[dict]:
"""隨機選取指定數量"""
if len(items) <= count:
return items
return random.sample(items, count)
async def close(self):
await self.client.aclose()
class ExpertSourceService: class ExpertSourceService:
"""統一的專家來源服務""" """統一的專家來源服務"""
def __init__(self): def __init__(self):
self.wikidata = WikidataProvider() self.curated = LocalDataProvider("curated") # 精選職業
self.conceptnet = ConceptNetProvider() self.dbpedia = LocalDataProvider("dbpedia")
self.wikidata = LocalDataProvider("wikidata")
async def get_experts( def get_experts(
self, self,
source: str, source: str,
count: int, count: int,
language: str = "zh", language: str = "en",
fallback_to_llm: bool = True fallback_to_llm: bool = True
) -> Tuple[List[dict], str]: ) -> Tuple[List[dict], str]:
""" """
從指定來源獲取專家資料 從指定來源獲取專家資料
Args: Args:
source: 來源類型 ("wikidata" | "conceptnet") source: 來源類型 ("dbpedia" | "wikidata")
count: 需要的專家數量 count: 需要的專家數量
language: 語言代碼 language: 語言代碼
fallback_to_llm: 失敗時是否允許 fallback由呼叫者處理 fallback_to_llm: 失敗時是否允許 fallback由呼叫者處理
@@ -296,35 +114,87 @@ class ExpertSourceService:
(專家資料列表, 實際使用的來源) (專家資料列表, 實際使用的來源)
Raises: Raises:
Exception: 當獲取失敗且不 fallback ValueError: 當獲取失敗且資料為空
""" """
provider = self._get_provider(source) # 選擇 provider
if source == "curated":
try: provider = self.curated
experts = await provider.fetch_occupations(count, language) # 精選職業支援 zh 和 en預設使用 zh
if language not in ["zh", "en"]:
if not experts: language = "zh"
raise ValueError(f"No occupations found from {source}") elif source == "wikidata":
provider = self.wikidata
return experts, source
except Exception as e:
logger.warning(f"Failed to fetch from {source}: {e}")
raise
def _get_provider(self, source: str) -> ExpertSourceProvider:
"""根據來源類型取得對應的 provider"""
if source == "wikidata":
return self.wikidata
elif source == "conceptnet":
return self.conceptnet
else: else:
raise ValueError(f"Unknown source: {source}") # 預設使用 dbpedia
provider = self.dbpedia
source = "dbpedia"
async def close(self): experts = provider.random_select(count, language)
"""關閉所有 HTTP clients"""
await self.wikidata.close() if not experts:
await self.conceptnet.close() raise ValueError(f"No occupations found from {source} ({language})")
logger.info(f"{source} 取得 {len(experts)} 位專家")
return experts, source
def get_available_sources(self) -> List[dict]:
"""
取得可用的資料來源資訊
Returns:
來源資訊列表
"""
sources = []
# 檢查精選職業(中文)
curated_zh = DATA_DIR / "curated_occupations_zh.json"
if curated_zh.exists():
with open(curated_zh, "r", encoding="utf-8") as f:
data = json.load(f)
sources.append({
"source": "curated",
"language": "zh",
"count": data["metadata"]["total_count"],
"created_at": data["metadata"]["created_at"]
})
# 檢查精選職業(英文)
curated_en = DATA_DIR / "curated_occupations_en.json"
if curated_en.exists():
with open(curated_en, "r", encoding="utf-8") as f:
data = json.load(f)
sources.append({
"source": "curated",
"language": "en",
"count": data["metadata"]["total_count"],
"created_at": data["metadata"]["created_at"]
})
# 檢查 DBpedia
dbpedia_en = DATA_DIR / "dbpedia_occupations_en.json"
if dbpedia_en.exists():
with open(dbpedia_en, "r", encoding="utf-8") as f:
data = json.load(f)
sources.append({
"source": "dbpedia",
"language": "en",
"count": data["metadata"]["total_count"],
"fetched_at": data["metadata"]["fetched_at"]
})
# 檢查 Wikidata
wikidata_zh = DATA_DIR / "wikidata_occupations_zh.json"
if wikidata_zh.exists():
with open(wikidata_zh, "r", encoding="utf-8") as f:
data = json.load(f)
sources.append({
"source": "wikidata",
"language": "zh",
"count": data["metadata"]["total_count"],
"fetched_at": data["metadata"]["fetched_at"]
})
return sources
# 全域服務實例 # 全域服務實例
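
The local-file flow introduced in this file (read `{source}_occupations_{language}.json`, memoize it in memory, then sample) can be sketched as below. This is an illustration under stated assumptions rather than the service itself: the function names `load_occupations`, `pick`, and `demo` are hypothetical, the cache is passed in explicitly instead of living on a provider instance, and the sample payload is made up. The file-name pattern, the `"occupations"` key, and the "return everything when fewer than `count` exist" rule follow the diff.

```python
import json
import random
import tempfile
from pathlib import Path
from typing import Dict, List


def load_occupations(
    data_dir: Path, source: str, language: str, cache: Dict[str, List[dict]]
) -> List[dict]:
    """Load one occupations file, memoizing the parsed list per source/language."""
    key = f"{source}:{language}"
    if key in cache:
        return cache[key]
    path = data_dir / f"{source}_occupations_{language}.json"
    if not path.exists():
        return []  # a missing file is treated as "no data", as in the diff
    with open(path, "r", encoding="utf-8") as f:
        occupations = json.load(f).get("occupations", [])
    cache[key] = occupations
    return occupations


def pick(occupations: List[dict], count: int) -> List[dict]:
    """Return `count` random entries, or all of them when fewer are available."""
    if len(occupations) <= count:
        return list(occupations)
    return random.sample(occupations, count)


def demo() -> List[dict]:
    """Write a tiny made-up data file to a temp dir and sample two entries."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        payload = {
            "metadata": {"source": "curated", "language": "en", "total_count": 3},
            "occupations": [
                {"name": "Pharmacist", "domain": "Healthcare"},
                {"name": "Luthier", "domain": "Instrument making"},
                {"name": "Beekeeper", "domain": "Agriculture"},
            ],
        }
        target = data_dir / "curated_occupations_en.json"
        target.write_text(json.dumps(payload), encoding="utf-8")
        return pick(load_occupations(data_dir, "curated", "en", {}), 2)


if __name__ == "__main__":
    print(demo())
```

Because the lookup no longer performs I/O on the network, a plain synchronous call is sufficient, which matches the removal of `await` in the router hunk.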


@@ -0,0 +1,386 @@
+#!/usr/bin/env python3
+"""
+Occupation data fetch script
+
+Fetches occupation data from Wikidata SPARQL and the ConceptNet API,
+and saves it as local JSON files for the application to use.
+
+Usage:
+    cd backend
+    python scripts/fetch_occupations.py
+"""
+import json
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import List
+
+import httpx
+
+# Output directory
+DATA_DIR = Path(__file__).parent.parent / "app" / "data"
+def fetch_wikidata_occupations(language: str) -> List[dict]:
+    """
+    Fetch all occupations from the Wikidata SPARQL endpoint (paginated).
+
+    Args:
+        language: Language code (zh/en)
+
+    Returns:
+        List of occupations [{"name": "...", "domain": "..."}, ...]
+    """
+    print(f"[Wikidata] 正在抓取 {language} 職業資料(分頁模式)...")
+
+    endpoint = "https://query.wikidata.org/sparql"
+    page_size = 500  # rows per page
+    all_bindings = []
+    offset = 0
+
+    try:
+        with httpx.Client(timeout=120.0) as client:
+            while True:
+                # SPARQL query using SERVICE wikibase:label (more efficient)
+                query = f"""
+                SELECT DISTINCT ?occupation ?occupationLabel ?fieldLabel WHERE {{
+                    ?occupation wdt:P31 wd:Q28640.
+                    OPTIONAL {{ ?occupation wdt:P425 ?field. }}
+                    SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language},en". }}
+                }}
+                LIMIT {page_size}
+                OFFSET {offset}
+                """
+                print(f"[Wikidata] 抓取第 {offset // page_size + 1} 頁 (offset={offset})...")
+                response = client.get(
+                    endpoint,
+                    params={"query": query, "format": "json"},
+                    headers={
+                        "Accept": "application/sparql-results+json",
+                        "User-Agent": "NoveltySeeking/1.0",
+                    },
+                )
+                response.raise_for_status()
+                data = response.json()
+                bindings = data.get("results", {}).get("bindings", [])
+                print(f"[Wikidata] 取得 {len(bindings)}")
+                if not bindings:
+                    # No more data
+                    break
+                all_bindings.extend(bindings)
+                offset += page_size
+                # Fewer rows than page_size means this was the last page
+                if len(bindings) < page_size:
+                    break
+
+        print(f"[Wikidata] 總共取得 {len(all_bindings)} 筆原始資料")
+
+        # Parse the response
+        occupations = []
+        for item in all_bindings:
+            name = item.get("occupationLabel", {}).get("value", "")
+            field = item.get("fieldLabel", {}).get("value", "")
+            if name and len(name) >= 2:
+                occupations.append({
+                    "name": name,
+                    "domain": field if field else infer_domain(name, language),
+                })
+
+        # Deduplicate by name, keeping the first occurrence
+        seen = set()
+        unique = []
+        for occ in occupations:
+            if occ["name"] not in seen:
+                seen.add(occ["name"])
+                unique.append(occ)
+
+        print(f"[Wikidata] 去重後: {len(unique)} 筆職業")
+        return unique
+
+    except Exception as e:
+        print(f"[Wikidata] 錯誤: {e}")
+        raise
+def fetch_conceptnet_occupations(language: str) -> List[dict]:
+    """
+    Fetch occupation-related concepts from the ConceptNet API (paginated).
+
+    Args:
+        language: Language code (zh/en)
+
+    Returns:
+        List of occupations [{"name": "...", "domain": "..."}, ...]
+    """
+    print(f"[ConceptNet] 正在抓取 {language} 職業資料(分頁模式)...")
+
+    endpoint = "https://api.conceptnet.io"
+    lang_code = language
+    page_size = 100  # limit suggested for ConceptNet
+
+    # Seed concepts
+    start_concepts = {
+        "zh": ["/c/zh/職業", "/c/zh/專業", "/c/zh/工作", "/c/zh/職務"],
+        "en": ["/c/en/occupation", "/c/en/profession", "/c/en/job", "/c/en/career"],
+    }
+
+    # Relation types to query
+    relations = ["/r/IsA", "/r/RelatedTo", "/r/HasA", "/r/AtLocation"]
+
+    all_occupations = []
+
+    try:
+        with httpx.Client(timeout=60.0) as client:
+            for concept in start_concepts.get(lang_code, start_concepts["zh"]):
+                for rel in relations:
+                    offset = 0
+                    max_pages = 5  # at most 5 pages per concept/relation pair
+                    for page in range(max_pages):
+                        try:
+                            print(f"[ConceptNet] 查詢 {concept} {rel} (offset={offset})...")
+                            # Query using the start parameter
+                            response = client.get(
+                                f"{endpoint}/query",
+                                params={
+                                    "start": concept,
+                                    "rel": rel,
+                                    "limit": page_size,
+                                    "offset": offset,
+                                },
+                            )
+                            if response.status_code != 200:
+                                print(f"[ConceptNet] HTTP {response.status_code}, 跳過")
+                                break
+                            data = response.json()
+                            edges = data.get("edges", [])
+                            if not edges:
+                                break
+                            parsed = parse_conceptnet_response(data, lang_code)
+                            all_occupations.extend(parsed)
+                            print(f"[ConceptNet] 取得 {len(parsed)}")
+                            if len(edges) < page_size:
+                                break
+                            offset += page_size
+                        except Exception as e:
+                            print(f"[ConceptNet] 錯誤: {e}")
+                            break
+
+        # Deduplicate by name, keeping the first occurrence
+        seen = set()
+        unique = []
+        for occ in all_occupations:
+            if occ["name"] not in seen:
+                seen.add(occ["name"])
+                unique.append(occ)
+
+        print(f"[ConceptNet] 去重後: {len(unique)} 筆概念")
+        return unique
+
+    except Exception as e:
+        print(f"[ConceptNet] 錯誤: {e}")
+        raise
+def parse_conceptnet_response(data: dict, lang_code: str) -> List[dict]:
+    """Parse a ConceptNet API response."""
+    results = []
+    edges = data.get("edges", [])
+    for edge in edges:
+        start = edge.get("start", {})
+        end = edge.get("end", {})
+        # Try to take a meaningful concept from either end of the edge
+        for node in [start, end]:
+            node_id = node.get("@id", "")
+            label = node.get("label", "")
+            # Filter: must be in the target language and non-trivial
+            if f"/c/{lang_code}/" in node_id and label and len(label) >= 2:
+                # Exclude overly generic terms
+                if label not in ["職業", "工作", "專業", "occupation", "job", "profession"]:
+                    results.append({
+                        "name": label,
+                        "domain": infer_domain(label, lang_code),
+                    })
+    return results
def infer_domain(occupation_name: str, language: str) -> str:
"""根據職業名稱推斷領域"""
if language == "zh":
domain_keywords = {
"": "醫療健康",
"": "醫療健康",
"": "醫療健康",
"": "專業服務",
"工程": "工程技術",
"技術": "工程技術",
"設計": "設計創意",
"藝術": "藝術文化",
"音樂": "藝術文化",
"運動": "體育運動",
"": "農業",
"": "漁業",
"": "商業貿易",
"": "商業貿易",
"": "法律",
"": "法律",
"": "教育",
"研究": "學術研究",
"科學": "學術研究",
"": "餐飲服務",
"": "餐飲服務",
"建築": "建築營造",
"": "軍事國防",
"": "公共安全",
"消防": "公共安全",
"記者": "媒體傳播",
"編輯": "媒體傳播",
"作家": "文學創作",
"程式": "資訊科技",
"軟體": "資訊科技",
"電腦": "資訊科技",
}
else:
domain_keywords = {
"doctor": "Healthcare",
"nurse": "Healthcare",
"medical": "Healthcare",
"engineer": "Engineering",
"technical": "Engineering",
"design": "Design & Creative",
"artist": "Arts & Culture",
"music": "Arts & Culture",
"sport": "Sports",
"athletic": "Sports",
"farm": "Agriculture",
"fish": "Fishery",
"business": "Business",
"sales": "Business",
"law": "Legal",
"attorney": "Legal",
"teach": "Education",
"professor": "Education",
"research": "Academic Research",
"scien": "Academic Research",
"chef": "Culinary",
"cook": "Culinary",
"architect": "Architecture",
"military": "Military",
"police": "Public Safety",
"fire": "Public Safety",
"journal": "Media",
"editor": "Media",
"writer": "Literature",
"author": "Literature",
"program": "Information Technology",
"software": "Information Technology",
"computer": "Information Technology",
"develop": "Information Technology",
}
name_lower = occupation_name.lower()
for keyword, domain in domain_keywords.items():
if keyword in name_lower:
return domain
return "專業領域" if language == "zh" else "Professional Field"
+def save_json(data: List[dict], source: str, language: str) -> None:
+    """Save the data to a JSON file."""
+    filename = f"{source}_occupations_{language}.json"
+    filepath = DATA_DIR / filename
+
+    output = {
+        "metadata": {
+            "source": source,
+            "language": language,
+            "fetched_at": datetime.now(timezone.utc).isoformat(),
+            "total_count": len(data),
+        },
+        "occupations": data,
+    }
+
+    with open(filepath, "w", encoding="utf-8") as f:
+        json.dump(output, f, ensure_ascii=False, indent=2)
+
+    print(f"[儲存] {filepath} ({len(data)} 筆)")
+def main():
+    """Entry point."""
+    print("=" * 60)
+    print("職業資料抓取腳本")
+    print(f"輸出目錄: {DATA_DIR}")
+    print("=" * 60)
+    print()
+
+    # Make sure the output directory exists
+    DATA_DIR.mkdir(parents=True, exist_ok=True)
+
+    # Fetch Wikidata
+    print("--- Wikidata ---")
+    try:
+        wikidata_zh = fetch_wikidata_occupations("zh")
+        save_json(wikidata_zh, "wikidata", "zh")
+    except Exception as e:
+        print(f"Wikidata 中文抓取失敗: {e}")
+        wikidata_zh = []
+
+    try:
+        wikidata_en = fetch_wikidata_occupations("en")
+        save_json(wikidata_en, "wikidata", "en")
+    except Exception as e:
+        print(f"Wikidata 英文抓取失敗: {e}")
+        wikidata_en = []
+
+    print()
+
+    # Fetch ConceptNet
+    print("--- ConceptNet ---")
+    try:
+        conceptnet_zh = fetch_conceptnet_occupations("zh")
+        save_json(conceptnet_zh, "conceptnet", "zh")
+    except Exception as e:
+        print(f"ConceptNet 中文抓取失敗: {e}")
+        conceptnet_zh = []
+
+    try:
+        conceptnet_en = fetch_conceptnet_occupations("en")
+        save_json(conceptnet_en, "conceptnet", "en")
+    except Exception as e:
+        print(f"ConceptNet 英文抓取失敗: {e}")
+        conceptnet_en = []
+
+    print()
+    print("=" * 60)
+    print("抓取完成!")
+    print(f" Wikidata 中文: {len(wikidata_zh)} 筆")
+    print(f" Wikidata 英文: {len(wikidata_en)} 筆")
+    print(f" ConceptNet 中文: {len(conceptnet_zh)} 筆")
+    print(f" ConceptNet 英文: {len(conceptnet_en)} 筆")
+    print("=" * 60)
+
+
+if __name__ == "__main__":
+    main()
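
Both fetchers in this script share two patterns: page through results with `limit`/`offset` until a short or empty page arrives, then deduplicate by `name` while keeping first-seen order. The sketch below isolates those patterns with no network access. `fetch_all_pages`, `dedupe_by_name`, and `fake_page` are hypothetical names, and `fake_page` is an in-memory stand-in for the HTTP call, so this is an illustration of the loop logic only.

```python
from typing import Callable, List


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[dict]], page_size: int
) -> List[dict]:
    """Collect rows page by page until an empty or short page is returned."""
    rows: List[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break  # nothing left
        rows.extend(page)
        offset += page_size
        if len(page) < page_size:
            break  # a short page is the last page
    return rows


def dedupe_by_name(items: List[dict]) -> List[dict]:
    """Drop later entries whose "name" was already seen, preserving order."""
    seen = set()
    unique = []
    for item in items:
        if item["name"] not in seen:
            seen.add(item["name"])
            unique.append(item)
    return unique


def fake_page(offset: int, limit: int) -> List[dict]:
    """In-memory stand-in for a paginated API call (made-up data)."""
    data = [{"name": n} for n in ["baker", "nurse", "baker", "pilot", "nurse"]]
    return data[offset:offset + limit]


if __name__ == "__main__":
    rows = fetch_all_pages(fake_page, page_size=2)
    print([r["name"] for r in dedupe_by_name(rows)])
```

One caveat worth keeping in mind when applying this to SPARQL: `LIMIT`/`OFFSET` paging without an `ORDER BY` does not guarantee a stable row order between requests, so pages may overlap; the name-based deduplication absorbs repeats but cannot recover rows that were skipped.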


@@ -7,8 +7,9 @@ const { Title, Text } = Typography;
 const EXPERT_SOURCE_OPTIONS = [
   { label: 'LLM 生成', value: 'llm' as ExpertSource, description: '使用 AI 模型生成專家' },
-  { label: 'Wikidata', value: 'wikidata' as ExpertSource, description: '從維基數據查詢職業' },
-  { label: 'ConceptNet', value: 'conceptnet' as ExpertSource, description: '從知識圖譜查詢概念' },
+  { label: '精選職業', value: 'curated' as ExpertSource, description: '從 210 個常見職業隨機選取(含具體領域)' },
+  { label: 'DBpedia', value: 'dbpedia' as ExpertSource, description: '從 DBpedia 隨機選取職業 (2164 筆)' },
+  { label: 'Wikidata', value: 'wikidata' as ExpertSource, description: '從 Wikidata 查詢職業 (需等待 API)' },
 ];
 interface TransformationInputPanelProps {


@@ -155,7 +155,7 @@ export function useExpertTransformation(options: UseExpertTransformationOptions
       });
     });
   },
-  [options.model, options.temperature]
+  [options.model, options.temperature, options.expertSource]
 );
 const transformAll = useCallback(


@@ -230,7 +230,7 @@ export interface ExpertTransformationDAGResult {
   results: ExpertTransformationCategoryResult[];
 }
-export type ExpertSource = 'llm' | 'wikidata' | 'conceptnet';
+export type ExpertSource = 'llm' | 'curated' | 'dbpedia' | 'wikidata';
 export interface ExpertTransformationRequest {
   query: string;
@@ -240,7 +240,7 @@ export interface ExpertTransformationRequest {
   keywords_per_expert: number; // 1-3
   custom_experts?: string[]; // ["藥師", "工程師"]
   expert_source?: ExpertSource; // expert source (default: 'llm')
-  expert_language?: string; // language for external sources (default: 'zh')
+  expert_language?: string; // language for external sources (default: 'en')
   model?: string;
   temperature?: number;
 }