feat: Add Deduplication Agent with embedding and LLM methods

Implement a new Deduplication Agent that identifies and groups similar
transformation descriptions. Supports two deduplication methods:
- Embedding: Fast vector similarity comparison using cosine similarity
- LLM: Accurate pairwise semantic comparison (slower but more precise)

Backend changes:
- Add deduplication router with /deduplicate endpoint
- Add embedding_service for vector-based similarity
- Add llm_deduplication_service for LLM-based comparison
- Improve expert_transformation error handling and progress reporting

Frontend changes:
- Add DeduplicationPanel with interactive group visualization
- Add useDeduplication hook for state management
- Integrate deduplication tab in main App
- Add threshold slider and method selector in sidebar

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -232,3 +232,38 @@ class ExpertTransformationRequest(BaseModel):
     # LLM parameters
     model: Optional[str] = None
     temperature: Optional[float] = 0.7
+
+
+# ===== Deduplication Agent schemas =====
+
+class DeduplicationMethod(str, Enum):
+    """Deduplication method."""
+    EMBEDDING = "embedding"  # vector similarity
+    LLM = "llm"  # pairwise LLM judgment
+
+
+class DeduplicationRequest(BaseModel):
+    """Deduplication request."""
+    descriptions: List[ExpertTransformationDescription]
+    method: DeduplicationMethod = DeduplicationMethod.EMBEDDING  # deduplication method
+    similarity_threshold: float = 0.85  # cosine similarity threshold (0.0-1.0); used only by the embedding method
+    model: Optional[str] = None  # embedding/LLM model
+
+
+class DescriptionGroup(BaseModel):
+    """A group of similar descriptions."""
+    group_id: str  # "group-0", "group-1"...
+    representative: ExpertTransformationDescription  # representative description
+    duplicates: List[ExpertTransformationDescription]  # similar descriptions
+    similarity_scores: List[float]  # similarity score of each duplicate
+
+
+class DeduplicationResult(BaseModel):
+    """Deduplication result."""
+    total_input: int  # total number of input descriptions
+    total_groups: int  # number of groups
+    total_duplicates: int  # number of duplicates
+    groups: List[DescriptionGroup]
+    threshold_used: float
+    method_used: DeduplicationMethod  # deduplication method used
+    model_used: str  # model used
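
The hunk above only defines the request and result shapes; the grouping logic lives in `embedding_service`, which this diff does not show. As a rough illustration of how the `similarity_threshold` and the `DescriptionGroup` fields could fit together, here is a minimal sketch of greedy grouping by cosine similarity. Everything in it is an assumption for illustration: the names `cosine_similarity`, `Group`, and `group_by_similarity` are hypothetical, descriptions are referred to by list index rather than as `ExpertTransformationDescription` objects, and the actual service may use a different grouping strategy.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Group:
    """Hypothetical stand-in for DescriptionGroup, holding indices into the input list."""
    group_id: str
    representative: int
    duplicates: List[int] = field(default_factory=list)
    similarity_scores: List[float] = field(default_factory=list)


def group_by_similarity(
    vectors: Sequence[Sequence[float]], threshold: float = 0.85
) -> List[Group]:
    """Greedy grouping: attach each vector to the most similar existing
    representative if the score reaches the threshold, else start a new group."""
    groups: List[Group] = []
    for idx, vec in enumerate(vectors):
        best: Optional[Group] = None
        best_score = threshold
        for group in groups:
            score = cosine_similarity(vec, vectors[group.representative])
            if score >= best_score:
                best, best_score = group, score
        if best is None:
            groups.append(Group(group_id=f"group-{len(groups)}", representative=idx))
        else:
            best.duplicates.append(idx)
            best.similarity_scores.append(best_score)
    return groups


if __name__ == "__main__":
    # Toy 2-D "embeddings": items 0 and 1 point the same way, as do items 2 and 3.
    vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.05, 1.0]]
    groups = group_by_similarity(vectors, threshold=0.85)
    total_duplicates = sum(len(g.duplicates) for g in groups)
    print(len(vectors), len(groups), total_duplicates)  # prints: 4 2 2
```

In this sketch the first item of each group acts as the representative, so the result depends on input order; `total_input`, `total_groups`, and `total_duplicates` in `DeduplicationResult` would correspond to the three numbers printed at the end.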