feat: Enhance patent search and update research documentation

- Improve patent search service with expanded functionality
- Update PatentSearchPanel UI component
- Add new research_report.md
- Update experimental protocol, literature review, paper outline, and theoretical framework

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -10,29 +10,47 @@ This document outlines a comprehensive experimental design to test the hypothesi

| ID | Research Question |
|----|-------------------|
| **RQ1** | Does multi-expert generation produce higher semantic diversity than direct LLM generation? |
| **RQ2** | Does multi-expert generation produce ideas with lower patent overlap (higher novelty)? |
| **RQ3** | What is the optimal number of experts for maximizing diversity? |
| **RQ4** | How do different expert sources (LLM vs Curated vs DBpedia) affect idea quality? |
| **RQ5** | Does structured attribute decomposition enhance the multi-expert effect? |
| **RQ1** | Does attribute decomposition improve semantic diversity of generated ideas? |
| **RQ2** | Does expert perspective transformation improve semantic diversity of generated ideas? |
| **RQ3** | Is there an interaction effect between attribute decomposition and expert perspectives? |
| **RQ4** | Which combination produces the highest patent novelty (lowest overlap)? |
| **RQ5** | How do different expert sources (LLM vs Curated vs External) affect idea quality? |
| **RQ6** | Does context-free keyword generation (current design) increase hallucination/nonsense rate? |

### Design Note: Context-Free Keyword Generation

Our system intentionally excludes the original query during keyword generation (Stage 1):

```
Stage 1 (Keyword): Expert sees "木質" (wood) + "會計師" (accountant)
                   Expert does NOT see "椅子" (chair)
                   → Generates: "資金流動" (cash flow)

Stage 2 (Description): Expert sees "椅子" (chair) + "資金流動" (cash flow)
                       → Applies keyword to original query
```

**Rationale**: This forces maximum semantic distance in keyword generation.

**Risk**: Some keywords may be too distant, resulting in nonsensical or unusable ideas.

**RQ6 investigates**: What is the hallucination/nonsense rate, and is the tradeoff worthwhile?
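
To make the two-stage flow concrete, here is a minimal sketch of the prompting sequence. The function names, prompt wording, and the injected `llm` callable are illustrative assumptions, not the production implementation.

```python
from typing import Callable

def generate_keyword(attribute: str, expert: str, llm: Callable[[str], str]) -> str:
    """Stage 1: the expert sees only the attribute, never the original query."""
    prompt = (
        f"You are a {expert}. Given the attribute '{attribute}', name one concept "
        "from your professional domain that relates to it. Reply with a single keyword."
    )
    return llm(prompt)

def generate_idea(query: str, keyword: str, expert: str, llm: Callable[[str], str]) -> str:
    """Stage 2: the original query is reintroduced and combined with the keyword."""
    prompt = (
        f"You are a {expert}. Apply the concept '{keyword}' to '{query}' and "
        "describe one concrete idea in 1-2 sentences."
    )
    return llm(prompt)

# Mirrors the walkthrough above:
#   generate_keyword("木質", "會計師", llm)           -> e.g. "資金流動"
#   generate_idea("椅子", "資金流動", "會計師", llm)  -> applies the keyword to the chair
```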
---
## 2. Experimental Design Overview

### 2.1 Design Type

**Mixed Design**: Between-subjects for main conditions × Within-subjects for queries

**2×2 Factorial Design**: Attribute Decomposition (With/Without) × Expert Perspectives (With/Without)

- Within-subjects for queries (all queries tested across all conditions)

### 2.2 Variables

#### Independent Variables (Manipulated)

| Variable | Levels | Your System Parameter |
|----------|--------|----------------------|
| **Generation Method** | 5 levels (see conditions) | Condition-dependent |
| **Expert Count** | 1, 2, 4, 6, 8 | `expert_count` |
| **Expert Source** | LLM, Curated, DBpedia | `expert_source` |
| **Attribute Structure** | With/Without decomposition | Pipeline inclusion |

| Variable | Levels | Description |
|----------|--------|-------------|
| **Attribute Decomposition** | 2 levels: With / Without | Whether the query is decomposed into structured attributes |
| **Expert Perspectives** | 2 levels: With / Without | Whether expert personas are used for idea generation |
| **Expert Source** (secondary) | LLM, Curated, External | Source of expert occupations (tested within Expert=With conditions) |

#### Dependent Variables (Measured)
@@ -61,34 +79,28 @@ This document outlines a comprehensive experimental design to test the hypothesi

## 3. Experimental Conditions

### 3.1 Main Study: Generation Method Comparison

| Condition | Description | Implementation |
|-----------|-------------|----------------|
| **C1: Direct** | Direct LLM generation | Prompt: "Generate 20 creative ideas for [query]" |
| **C2: Single-Expert** | 1 expert × 20 ideas | `expert_count=1`, `keywords_per_expert=20` |
| **C3: Multi-Expert-4** | 4 experts × 5 ideas each | `expert_count=4`, `keywords_per_expert=5` |
| **C4: Multi-Expert-8** | 8 experts × 2-3 ideas each | `expert_count=8`, `keywords_per_expert=2-3` |
| **C5: Random-Perspective** | 4 random words as "perspectives" | Custom prompt with random nouns |

### 3.1 Main Study: 2×2 Factorial Design

| Condition | Attributes | Experts | Description |
|-----------|------------|---------|-------------|
| **C1: Direct** | ❌ Without | ❌ Without | Baseline: "Generate 20 creative ideas for [query]" |
| **C2: Expert-Only** | ❌ Without | ✅ With | Expert personas generate ideas for the whole query |
| **C3: Attribute-Only** | ✅ With | ❌ Without | Decompose the query, then generate directly per attribute |
| **C4: Full Pipeline** | ✅ With | ✅ With | Decompose the query, then experts generate per attribute |

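As a sketch of how the four factorial cells could be enumerated programmatically (the dataclass and helper names below are illustrative, not the actual implementation):

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Condition:
    name: str
    use_attributes: bool  # Attribute Decomposition factor
    use_experts: bool     # Expert Perspectives factor

# Labels for the four cells of the 2x2 design, matching the table above.
NAMES = {
    (False, False): "C1: Direct",
    (False, True): "C2: Expert-Only",
    (True, False): "C3: Attribute-Only",
    (True, True): "C4: Full Pipeline",
}

CONDITIONS = [
    Condition(NAMES[(attrs, experts)], attrs, experts)
    for attrs, experts in product([False, True], repeat=2)
]

for c in CONDITIONS:
    print(f"{c.name}: attributes={c.use_attributes}, experts={c.use_experts}")
```
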
### 3.2 Expert Count Study

| Condition | Expert Count | Ideas per Expert |
|-----------|--------------|------------------|
| **E1** | 1 | 20 |
| **E2** | 2 | 10 |
| **E4** | 4 | 5 |
| **E6** | 6 | 3-4 |
| **E8** | 8 | 2-3 |

### 3.2 Control Condition

| Condition | Description | Purpose |
|-----------|-------------|---------|
| **C5: Random-Perspective** | 4 random words as "perspectives" | Tests whether ANY perspective shift helps, or whether EXPERT knowledge specifically matters |

### 3.3 Expert Source Study (Secondary, within Expert=With conditions)

| Condition | Source | Implementation |
|-----------|--------|----------------|
| **S-LLM** | LLM-generated | `expert_source=ExpertSource.LLM` |
| **S-Curated** | Curated 210 occupations | `expert_source=ExpertSource.CURATED` |
| **S-DBpedia** | DBpedia 2164 occupations | `expert_source=ExpertSource.DBPEDIA` |
| **S-Random** | Random word "experts" | Custom implementation |
| **S-LLM** | LLM-generated | Query-specific experts generated by the LLM |
| **S-Curated** | Curated occupations | Pre-selected high-quality occupations |
| **S-External** | External sources | Wikidata/ConceptNet occupations |

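A minimal sketch of how the source conditions map onto the `expert_source` parameter. The enum member names (`LLM`, `CURATED`, `DBPEDIA`) and the pool sizes come from the table above; the enum definition, the `sample_experts` helper, and the toy pools are assumptions for illustration.

```python
import random
from enum import Enum, auto

class ExpertSource(Enum):
    LLM = auto()      # query-specific experts generated by the LLM
    CURATED = auto()  # curated occupation list (~210 occupations)
    DBPEDIA = auto()  # DBpedia occupation list (~2164 occupations)

def sample_experts(source: ExpertSource, pools: dict, n: int = 4) -> list:
    """Pick n expert occupations from the chosen source's pool.

    For ExpertSource.LLM the pool would itself be produced per query by the LLM.
    """
    return random.sample(pools[source], n)

# Example with toy pools:
pools = {
    ExpertSource.CURATED: ["accountant", "chef", "surgeon", "architect", "pilot"],
    ExpertSource.DBPEDIA: ["beekeeper", "locksmith", "cartographer", "falconer", "luthier"],
}
print(sample_experts(ExpertSource.CURATED, pools, n=2))
```
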
---
@@ -251,7 +263,69 @@ def compute_patent_novelty(ideas: List[str], query: str) -> dict:
    }
```

### 5.3 Hallucination/Nonsense Metrics (RQ6)

Since our design intentionally excludes the original query during keyword generation, we need to measure the "cost" of this approach.

#### 5.3.1 LLM-as-Judge for Relevance

```python
from typing import List

def compute_relevance_score(query: str, ideas: List[str], judge_model: str) -> dict:
    """
    Use an LLM to judge whether each idea is relevant/applicable to the original query.
    Assumes an `llm_judge(prompt, model)` helper that returns the parsed JSON verdict.
    """
    relevant_count = 0
    nonsense_count = 0
    results = []

    for idea in ideas:
        prompt = f"""
Original query: {query}
Generated idea: {idea}

Is this idea relevant and applicable to the original query?
Rate: 1 (nonsense/irrelevant), 2 (weak connection), 3 (relevant)

Return JSON: {{"score": N, "reason": "brief explanation"}}
"""
        result = llm_judge(prompt, model=judge_model)
        results.append(result)
        if result['score'] == 1:
            nonsense_count += 1
        elif result['score'] >= 2:
            relevant_count += 1

    return {
        'relevance_rate': relevant_count / len(ideas),
        'nonsense_rate': nonsense_count / len(ideas),
        'details': results
    }
```

#### 5.3.2 Semantic Distance Threshold Analysis

```python
from typing import List

def analyze_distance_threshold(query: str, ideas: List[str], embedding_model: str) -> dict:
    """
    Analyze which ideas exceed a "too far" semantic distance threshold.
    Ideas beyond the threshold may be creative OR nonsensical.
    Assumes `get_embedding`, `get_embeddings`, and `cosine_similarity` helpers.
    """
    query_emb = get_embedding(query, model=embedding_model)
    idea_embs = get_embeddings(ideas, model=embedding_model)

    distances = [1 - cosine_similarity(query_emb, e) for e in idea_embs]

    # Thresholds (to be calibrated on pilot data)
    CREATIVE_THRESHOLD = 0.6   # ideas at least this far are "creative"
    NONSENSE_THRESHOLD = 0.85  # ideas this far may be "nonsense"

    return {
        'creative_zone': sum(1 for d in distances if CREATIVE_THRESHOLD <= d < NONSENSE_THRESHOLD),
        'potential_nonsense': sum(1 for d in distances if d >= NONSENSE_THRESHOLD),
        'safe_zone': sum(1 for d in distances if d < CREATIVE_THRESHOLD),
        'distance_distribution': distances
    }
```

### 5.4 Metrics Summary Table

| Metric | Formula | Interpretation |
|--------|---------|----------------|
@@ -261,6 +335,18 @@ def compute_patent_novelty(ideas: List[str], query: str) -> dict:
| **Query Distance** | 1 - cos_sim(query, idea) | Higher = farther from original |
| **Patent Novelty Rate** | 1 - (matches / total) | Higher = more novel |

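The two formulas above translate directly to code. A minimal sketch, assuming the query and each idea are already embedded as vectors (function names are illustrative):

```python
import numpy as np

def query_distance(query_emb: np.ndarray, idea_emb: np.ndarray) -> float:
    """Query Distance = 1 - cos_sim(query, idea); higher = farther from the original query."""
    cos_sim = float(np.dot(query_emb, idea_emb) /
                    (np.linalg.norm(query_emb) * np.linalg.norm(idea_emb)))
    return 1.0 - cos_sim

def patent_novelty_rate(matches: int, total: int) -> float:
    """Patent Novelty Rate = 1 - (matches / total); higher = more novel."""
    return 1.0 - matches / total

# Example: 3 of 20 ideas matched existing patents -> novelty rate of 0.85
print(patent_novelty_rate(3, 20))
```
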
### 5.5 Nonsense/Hallucination Analysis (RQ6) - Three Methods

| Method | Metric | How it works | Pros/Cons |
|--------|--------|--------------|-----------|
| **Automatic** | Semantic Distance Threshold | Ideas with distance > 0.85 are flagged as "potential nonsense" | Fast, cheap; may miss contextual nonsense |
| **LLM-as-Judge** | Relevance Score (1-3) | GPT-4 rates whether the idea is relevant to the original query | Moderate cost; good balance |
| **Human Evaluation** | Relevance Rating (1-7 Likert) | Humans rate coherence/relevance | Gold standard; most expensive |

**Triangulation**: Compare all three methods to validate findings:
- If automatic, LLM, and human judgments agree → high confidence
- If they disagree → investigate why (interesting edge cases)

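One way to quantify the triangulation is pairwise agreement between the three per-idea "nonsense" flags. A minimal sketch, assuming each method has been reduced to a boolean label per idea (thresholds follow the values above; the helper names are illustrative):

```python
from sklearn.metrics import cohen_kappa_score

def nonsense_flags(distances, llm_scores, human_ratings,
                   dist_thr=0.85, human_thr=3):
    """Convert the three measures into per-idea boolean 'nonsense' flags."""
    auto = [d >= dist_thr for d in distances]        # automatic: too far from the query
    llm = [s == 1 for s in llm_scores]               # LLM judge: score 1 = nonsense
    human = [r <= human_thr for r in human_ratings]  # human: Likert 1-3 = weak/irrelevant
    return auto, llm, human

def pairwise_agreement(auto, llm, human) -> dict:
    """Cohen's kappa per pair of methods; high kappa = the methods agree."""
    return {
        'auto_vs_llm': cohen_kappa_score(auto, llm),
        'auto_vs_human': cohen_kappa_score(auto, human),
        'llm_vs_human': cohen_kappa_score(llm, human),
    }
```
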
---

## 6. Human Evaluation Protocol

@@ -306,6 +392,22 @@ How creative is this idea overall?
7 = Extremely creative
```

#### 6.2.4 Relevance/Coherence (7-point Likert) - For RQ6
```
How relevant and coherent is this idea to the original query?
1 = Nonsense/completely irrelevant (no logical connection)
2 = Very weak connection (hard to see relevance)
3 = Weak connection (requires a stretch to see relevance)
4 = Moderate connection (somewhat relevant)
5 = Good connection (clearly relevant)
6 = Strong connection (directly applicable)
7 = Perfect fit (highly relevant and coherent)
```

**Note**: This scale specifically measures the "cost" of context-free generation.
- Ideas with high novelty but low relevance (1-3) = potential hallucination
- Ideas with high novelty AND high relevance (5-7) = successful creative leap

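As a small sketch of how each rated idea could be bucketed from its novelty (query distance) and its relevance rating, following the note above (the novelty threshold and the two extra bucket names are illustrative assumptions):

```python
def classify_idea(query_distance: float, relevance: int, novelty_thr: float = 0.6) -> str:
    """Bucket an idea by novelty (semantic distance from the query) and relevance (1-7 Likert)."""
    high_novelty = query_distance >= novelty_thr
    if high_novelty and relevance >= 5:
        return "successful creative leap"   # far from the query, yet clearly relevant
    if high_novelty and relevance <= 3:
        return "potential hallucination"    # far from the query and weakly connected
    if not high_novelty and relevance >= 5:
        return "safe but conventional"
    return "needs review"

print(classify_idea(0.72, 6))  # -> "successful creative leap"
```
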
### 6.3 Procedure

1. **Introduction** (5 min)

@@ -361,21 +463,27 @@ For each query Q in QuerySet:
    For each condition C in Conditions:

        If C == "Direct":
            # No attributes, no experts
            ideas = direct_llm_generation(Q, n=20)

        Elif C == "Single-Expert":
            expert = generate_expert(Q, n=1)
            ideas = expert_transformation(Q, expert, ideas_per_expert=20)

        Elif C == "Multi-Expert-4":
            experts = generate_experts(Q, n=4)
            ideas = expert_transformation(Q, experts, ideas_per_expert=5)

        Elif C == "Expert-Only":
            # No attributes, with experts
            experts = generate_experts(Q, n=4)
            ideas = expert_generation_whole_query(Q, experts, ideas_per_expert=5)

        Elif C == "Multi-Expert-8":
            experts = generate_experts(Q, n=8)
            ideas = expert_transformation(Q, experts, ideas_per_expert=2-3)

        Elif C == "Attribute-Only":
            # With attributes, no experts
            attributes = decompose_attributes(Q)
            ideas = direct_generation_per_attribute(Q, attributes, ideas_per_attr=5)

        Elif C == "Full-Pipeline":
            # With attributes, with experts
            attributes = decompose_attributes(Q)
            experts = generate_experts(Q, n=4)
            ideas = expert_transformation(Q, attributes, experts, ideas_per_combo=1-2)

        Elif C == "Random-Perspective":
            # Control: random words instead of experts
            perspectives = random.sample(RANDOM_WORDS, 4)
            ideas = perspective_generation(Q, perspectives, ideas_per=5)

@@ -469,20 +577,34 @@ Plot: Expert count vs diversity curve

## 9. Expected Results & Hypotheses

### 9.1 Primary Hypotheses (2×2 Factorial)

| Hypothesis | Prediction | Metric |
|------------|------------|--------|
| **H1** | Multi-Expert-4 > Single-Expert > Direct | Semantic diversity |
| **H2** | Multi-Expert-8 ≈ Multi-Expert-4 (diminishing returns) | Semantic diversity |
| **H3** | Multi-Expert > Direct | Patent novelty rate |
| **H4** | LLM experts > Curated > DBpedia | Unconventionality |
| **H5** | With attributes > Without attributes | Overall diversity |
| **H1: Main Effect of Attributes** | Attribute-Only > Direct | Semantic diversity |
| **H2: Main Effect of Experts** | Expert-Only > Direct | Semantic diversity |
| **H3: Interaction Effect** | Full Pipeline > (Attribute-Only + Expert-Only - Direct) | Semantic diversity |
| **H4: Novelty** | Full Pipeline > all other conditions | Patent novelty rate |
| **H5: Expert vs Random** | Expert-Only > Random-Perspective | Validates that expert knowledge matters |
| **H6: Novelty-Usefulness Tradeoff** | Full Pipeline has a higher nonsense rate than Direct, but acceptable (<20%) | Nonsense rate |

### 9.2 Expected Pattern

```
                     Without Experts        With Experts
                     ---------------        ------------
Without Attributes   Direct (low)           Expert-Only (medium)
With Attributes      Attr-Only (medium)     Full Pipeline (high)
```

**Expected interaction**: The combination (Full Pipeline) should produce super-additive effects: the benefit of experts is amplified when combined with structured attributes.

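The interaction prediction (H3) and the pattern above can be checked with a standard 2×2 ANOVA. A minimal sketch with statsmodels, assuming a long-format table with one diversity score per query × condition cell (the column names and the synthetic placeholder data are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n_queries = 20

# Long format: one row per query x condition cell with its diversity score.
rows = []
for attrs in ["without", "with"]:
    for experts in ["without", "with"]:
        # Placeholder scores; in the real analysis these come from the embedding-based diversity metric.
        for score in rng.normal(loc=0.5, scale=0.1, size=n_queries):
            rows.append({"attributes": attrs, "experts": experts, "diversity": score})
df = pd.DataFrame(rows)

# Main effects of both factors plus their interaction (H1-H3).
model = smf.ols("diversity ~ C(attributes) * C(experts)", data=df).fit()
print(anova_lm(model, typ=2))
```
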
### 9.3 Expected Effect Sizes

Based on related work:
- Diversity increase: d = 0.5-0.8 (medium to large)
- Main effect of attributes: d = 0.3-0.5 (small to medium)
- Main effect of experts: d = 0.4-0.6 (medium)
- Interaction effect: d = 0.2-0.4 (small)
- Patent novelty increase: 20-40% improvement
- Human creativity rating: d = 0.3-0.5 (small to medium)
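
These anticipated effect sizes feed directly into a power analysis for choosing sample sizes. A minimal sketch with statsmodels, using a simple two-group t-test approximation rather than the full factorial power calculation (the plugged-in d values are the midpoints of the ranges above):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("main effect of attributes", 0.4),
                 ("main effect of experts", 0.5),
                 ("interaction effect", 0.3)]:
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"{label}: ~{n:.0f} observations per group for 80% power")
```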