feat: Enhance patent search and update research documentation

- Improve patent search service with expanded functionality
- Update PatentSearchPanel UI component
- Add new research_report.md
- Update experimental protocol, literature review, paper outline, and theoretical framework

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 15:52:33 +08:00
parent ec48709755
commit 26a56a2a07
13 changed files with 1446 additions and 537 deletions

research/research_report.md Normal file

@@ -0,0 +1,472 @@
---
marp: true
theme: default
paginate: true
size: 16:9
style: |
  section {
    font-size: 24px;
  }
  h1 {
    color: #2563eb;
  }
  h2 {
    color: #1e40af;
  }
  table {
    font-size: 20px;
  }
  .columns {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 1rem;
  }
---
# Breaking Semantic Gravity
## Expert-Augmented LLM Ideation for Enhanced Creativity
**Research Progress Report**
January 2026
---
# Agenda
1. Research Problem & Motivation
2. Theoretical Framework: "Semantic Gravity"
3. Proposed Solution: Expert-Augmented Ideation
4. Experimental Design
5. Implementation Progress
6. Timeline & Next Steps
---
# 1. Research Problem
## The Myth and the Problem of LLM Creativity
**Myth**: LLMs enable infinite idea generation for creative tasks
**Problem**: Generated ideas lack **diversity** and **novelty**
- Ideas cluster around high-probability training distributions
- Limited exploration of distant conceptual spaces
- "Creative" outputs are **interpolations**, not **extrapolations**
---
# The "Semantic Gravity" Phenomenon
```
Direct LLM Generation:
  Input:  "Generate creative ideas for a chair"
  Result:
    - "Ergonomic office chair"    (high probability)
    - "Foldable portable chair"   (high probability)
    - "Eco-friendly bamboo chair" (moderate probability)

Problem:
  → Ideas cluster in predictable semantic neighborhoods
  → Limited exploration of distant conceptual spaces
```
---
# Why Does Semantic Gravity Occur?
| Factor | Description |
|--------|-------------|
| **Statistical Pattern Learning** | LLMs learn co-occurrence patterns from training data |
| **Model Collapse** (revisit) | Sampling from the "creative ideas" distribution seen in training |
| **Relevance Trap** (revisit) | Strong associations dominate weak ones |
| **Domain Bias** | Outputs gravitate toward category prototypes |
---
# 2. Theoretical Framework
## Three Key Foundations
1. **Semantic Distance Theory** (Mednick, 1962)
- Creativity correlates with conceptual "jump" distance
2. **Conceptual Blending Theory** (Fauconnier & Turner, 2002)
- Creative products emerge from blending input spaces
3. **Design Fixation** (Jansson & Smith, 1991)
- Blind adherence to initial ideas limits creativity
---
# Semantic Distance in Action
```
Without Expert:
  "Chair" → furniture, sitting, comfort, design
  Semantic distance: SHORT

With Marine Biologist Expert:
  "Chair" → underwater pressure, coral structure, buoyancy
  Semantic distance: LONG

Result: Novel ideas like "pressure-adaptive seating"
```
**Key Insight**: Expert perspectives force semantic jumps that LLMs wouldn't naturally make.
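Semantic distance here can be operationalized as 1 − cosine similarity between sentence embeddings. A minimal check of the SHORT vs. LONG contrast above, assuming the `sentence-transformers` library (the model choice is illustrative):

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

emb = model.encode(["chair", "comfort", "buoyancy"])
near = 1 - cosine_similarity(emb[0:1], emb[1:2])[0, 0]  # chair vs. comfort (same neighborhood)
far = 1 - cosine_similarity(emb[0:1], emb[2:3])[0, 0]   # chair vs. buoyancy (expert-induced jump)
print(f"near: {near:.2f}  far: {far:.2f}")  # expect far > near
```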
---
# 3. Proposed Solution
## Expert-Augmented LLM Ideation Pipeline
```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Attribute   │ →  │    Expert    │ →  │    Expert    │
│ Decomposition│    │  Generation  │    │Transformation│
└──────────────┘    └──────────────┘    └──────────────┘

┌──────────────┐    ┌──────────────┐
│   Novelty    │ ←  │ Deduplication│
│  Validation  │    │              │
└──────────────┘    └──────────────┘
```
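A purely structural sketch of this control flow; the stage functions below are stand-ins that only demonstrate the attribute × expert fan-out, where the real system makes LLM calls:

```python
def transform(query: str, attribute: str, expert: str) -> str:
    # Stand-in for the LLM transformation stage
    return f"{expert} take on the {attribute} of {query}"

def deduplicate(ideas: list[str]) -> list[str]:
    # Stand-in for semantic deduplication (exact-match only here)
    return list(dict.fromkeys(ideas))

def validate_novelty(ideas: list[str]) -> list[str]:
    # Stand-in for the patent-search novelty check
    return ideas

def run_pipeline(query: str, attributes: list[str], experts: list[str]) -> list[str]:
    """Fan every attribute out to every expert, then deduplicate and validate."""
    ideas = [transform(query, a, e) for a in attributes for e in experts]
    return validate_novelty(deduplicate(ideas))

print(run_pipeline("a chair", ["material", "posture"],
                   ["marine biologist", "accountant"]))
```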
---
# From "Wisdom of Crowds" to "Inner Crowd"
**Traditional Crowd**:
- Person 1 → Ideas from perspective 1
- Person 2 → Ideas from perspective 2
- Aggregation → Diverse idea pool
**Our "Inner Crowd"**:
- LLM + Expert 1 Persona → Ideas from perspective 1
- LLM + Expert 2 Persona → Ideas from perspective 2
- Aggregation → Diverse idea pool (simulated crowd)
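A minimal sketch of one simulated crowd member, assuming the OpenAI chat completions API; the model name and prompt wording are illustrative, not the project's actual prompts:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def inner_crowd_ideas(query: str, expert: str, n: int = 5) -> str:
    """One 'inner crowd' member: the same LLM, conditioned on one expert persona."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": f"You are a {expert}. Answer strictly from that professional perspective."},
            {"role": "user",
             "content": f"Generate {n} creative ideas for: {query}"},
        ],
    )
    return response.choices[0].message.content

# Aggregating over personas simulates the diverse crowd:
# pool = [inner_crowd_ideas("a chair", e) for e in ["marine biologist", "accountant"]]
```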
---
# Expert Sources
| Source | Description | Coverage |
|--------|-------------|----------|
| **LLM-Generated** | Query-specific, prioritizes unconventional | Flexible |
| **Curated** | 210 pre-selected high-quality occupations | Controlled |
| **DBpedia** | 2,164 occupations from database | Broad |
Note: use the domain list (try adding two levels of the Dewey Decimal Classification? Future work?)
---
# 4. Research Questions (2×2 Factorial Design)
| ID | Research Question |
|----|-------------------|
| **RQ1** | Does attribute decomposition improve semantic diversity? |
| **RQ2** | Does expert perspective transformation improve semantic diversity? |
| **RQ3** | Is there an interaction effect between the two factors? |
| **RQ4** | Which combination produces the highest patent novelty? |
| **RQ5** | How do expert sources (LLM vs. Curated vs. DBpedia) affect quality? |
| **RQ6** | What is the hallucination/nonsense rate of context-free generation? |
---
# Design Choice: Context-Free Keyword Generation
Our system intentionally excludes the original query during keyword generation:
```
Stage 1 (Keyword):      Expert sees "木質" (wood) + "會計師" (accountant)
                        Expert does NOT see "椅子" (chair)
                        → Generates: "資金流動" (cash flow)

Stage 2 (Description):  Expert sees "椅子" + "資金流動"
                        → Applies keyword to original query
```
**Rationale**: Forces maximum semantic distance for novelty
**Risk**: Some keywords may be too distant → nonsense/hallucination
**RQ6**: Measure this tradeoff
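A sketch of the two-stage split under the same assumed OpenAI chat API; the essential point is that the Stage 1 prompt never mentions the original query (prompt wording is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def stage1_keyword(attribute: str, expert: str) -> str:
    """Stage 1: the expert sees only one attribute, never the original query."""
    r = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content":
                   f"As a {expert}, name one concept from your field related to "
                   f"'{attribute}'. Reply with the concept only."}])
    return r.choices[0].message.content.strip()

def stage2_description(query: str, keyword: str) -> str:
    """Stage 2: the distant keyword is applied back to the original query."""
    r = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
                   f"Propose a concrete product idea for '{query}' inspired by "
                   f"the concept '{keyword}'."}])
    return r.choices[0].message.content.strip()

# keyword = stage1_keyword("wood", "accountant")  # e.g. "cash flow"
# idea = stage2_description("chair", keyword)
```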
---
# The Semantic Distance Tradeoff
```
Too Close                    Optimal Zone                    Too Far
(Semantic Gravity)           (Creative)                      (Hallucination)
├────────────────────────────┼───────────────────────────────┼────────────────────────────┤
"Ergonomic office chair"     "Pressure-adaptive seating"     "Quantum chair consciousness"
High usefulness              High novelty + useful           High novelty, nonsense
Low novelty                                                  Low usefulness
```
**H6**: Full Pipeline has higher nonsense rate than Direct, but acceptable (<20%)
---
# Measuring Nonsense/Hallucination (RQ6) - Three Methods
| Method | Metric | Pros | Cons |
|--------|--------|------|------|
| **Automatic** | Semantic distance > 0.85 | Fast, cheap | May miss contextual nonsense |
| **LLM-as-Judge** | GPT-4 relevance score (1-3) | Moderate cost, scalable | Potential LLM bias |
| **Human Evaluation** | Relevance rating (1-7 Likert) | Gold standard | Expensive, slow |
**Triangulation**: Compare all three methods
- Agreement → high confidence in nonsense detection
- Disagreement → interesting edge cases to analyze
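A sketch of the automatic method from the table, assuming `sentence-transformers` embeddings; 0.85 is the distance cutoff named above, and the model choice is illustrative:

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def flag_nonsense(query: str, ideas: list[str], threshold: float = 0.85) -> list[bool]:
    """Flag ideas whose semantic distance from the query exceeds the cutoff."""
    emb = model.encode([query] + ideas)
    sims = cosine_similarity(emb[:1], emb[1:])[0]
    return [(1 - s) > threshold for s in sims]
```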
---
# Core Hypotheses (2×2 Factorial)
| Hypothesis | Prediction | Metric |
|------------|------------|--------|
| **H1: Attributes** | (Attr-Only + Full) > (Direct + Expert-Only) | Semantic diversity |
| **H2: Experts** | (Expert-Only + Full) > (Direct + Attr-Only) | Semantic diversity |
| **H3: Interaction** | Full > (Attr-Only + Expert-Only - Direct) | Super-additive effect |
| **H4: Novelty** | Full Pipeline > all others | Patent novelty rate |
| **H5: Control** | Expert-Only > Random-Perspective | Validates expert knowledge |
| **H6: Tradeoff** | Full Pipeline nonsense rate < 20% | Nonsense rate |
---
# Experimental Conditions (2×2 Factorial)
| Condition | Attributes | Experts | Description |
|-----------|------------|---------|-------------|
| **C1: Direct** | ❌ | ❌ | Baseline: "Generate 20 ideas for [query]" |
| **C2: Expert-Only** | ❌ | ✅ | Expert personas generate for whole query |
| **C3: Attribute-Only** | ✅ | ❌ | Decompose query, direct generate per attribute |
| **C4: Full Pipeline** | ✅ | ✅ | Decompose query, experts generate per attribute |
| **C5: Random-Perspective** | ❌ | (random) | Control: random words as "perspectives" |
---
# Expected 2×2 Pattern
```
                     Without Experts       With Experts
                     ---------------       ------------
Without Attributes   Direct (low)          Expert-Only (medium)
With Attributes      Attr-Only (medium)    Full Pipeline (high)
```
**Key prediction**: The combination (Full Pipeline) produces **super-additive** effects
- Experts are more effective when given structured attributes to transform
- The interaction term should be statistically significant
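The interaction test maps onto a two-way ANOVA; a sketch with `statsmodels` (the results file and column names are hypothetical):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical results file: one row per idea set, with binary factor
# columns and a precomputed diversity score (column names illustrative)
df = pd.read_csv("results.csv")  # columns: attributes, experts, diversity

# Two-way ANOVA with interaction; the C(attributes):C(experts) row tests H3
model = smf.ols("diversity ~ C(attributes) * C(experts)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```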
---
# Query Dataset (30 Queries)
**Category A: Everyday Objects (10)**
- Chair, Umbrella, Backpack, Coffee mug, Bicycle...
**Category B: Technology & Tools (10)**
- Solar panel, Electric vehicle, 3D printer, Drone...
**Category C: Services & Systems (10)**
- Food delivery, Online education, Healthcare appointment...
**Total**: 30 queries × 5 conditions (4 factorial + 1 control) × 20 ideas = **3,000 ideas**
---
# Metrics: Automatic Evaluation
| Metric | Formula | Interpretation |
|--------|---------|----------------|
| **Mean Pairwise Distance** | avg(1 - cos_sim(i, j)) | Higher = more diverse |
| **Silhouette Score** | Cluster cohesion vs separation | Higher = clearer clusters |
| **Query Distance** | 1 - cos_sim(query, idea) | Higher = farther from original |
| **Patent Novelty Rate** | 1 - (matches / total) | Higher = more novel |
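A sketch computing the first three metrics over one condition's ideas, assuming `sentence-transformers` and scikit-learn (the embedding model and cluster count are illustrative):

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def diversity_metrics(query: str, ideas: list[str], k: int = 5) -> dict:
    emb = model.encode(ideas)
    sims = cosine_similarity(emb)
    # Mean pairwise distance: average of 1 - cos_sim over distinct idea pairs
    iu = np.triu_indices(len(ideas), k=1)
    mean_pairwise = float(np.mean(1 - sims[iu]))
    # Silhouette score over k-means clusters of the idea embeddings
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(emb)
    silhouette = float(silhouette_score(emb, labels))
    # Query distance: mean of 1 - cos_sim(query, idea)
    q = model.encode([query])
    query_distance = float(np.mean(1 - cosine_similarity(q, emb)[0]))
    return {"mean_pairwise_distance": mean_pairwise,
            "silhouette": silhouette,
            "query_distance": query_distance}
```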
---
# Metrics: Human Evaluation
**Participants**: 60 evaluators (Prolific/MTurk)
**Rating Scales** (7-point Likert):
- **Novelty**: How novel/surprising is this idea?
- **Usefulness**: How practical is this idea?
- **Creativity**: How creative is this idea overall?
- **Relevance**: How relevant/coherent is this idea to the query? **(RQ6)**
- **Nonsense**: possibly a separate binary flag (open question)
**Quality Control**:
- Attention checks, completion time monitoring
- Inter-rater reliability (Cronbach's α > 0.7)
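Cronbach's α for the inter-rater check can be computed directly from the ratings matrix, treating raters as the scale items; a minimal numpy sketch with made-up example numbers:

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: ideas x raters matrix; raters are treated as the scale items."""
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)      # each rater's variance across ideas
    total_var = ratings.sum(axis=1).var(ddof=1)  # variance of per-idea rating sums
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Made-up example: 3 raters broadly agreeing on 4 ideas -> alpha near 1
print(cronbach_alpha(np.array([[6, 7, 6], [2, 1, 2], [5, 5, 6], [3, 2, 3]])))
```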
---
# What is Prolific/MTurk?
Online platforms for recruiting human participants for research studies.
| Platform | Description | Best For |
|----------|-------------|----------|
| **Prolific** | Academic-focused crowdsourcing | Research studies (higher quality) |
| **MTurk** | Amazon Mechanical Turk | Large-scale tasks (lower cost) |
**How it works for our study**:
1. Upload 600 ideas to evaluate (subset of generated ideas)
2. Recruit 60 participants (~$8-15/hour compensation)
3. Each participant rates ~30 ideas (novelty, usefulness, creativity)
4. Download ratings → statistical analysis
**Cost estimate**: 60 participants × 30 min × $12/hr = ~$360
---
# Alternative: LLM-as-Judge
If human evaluation is too expensive or time-consuming:
| Approach | Pros | Cons |
|----------|------|------|
| **Human (Prolific/MTurk)** | Gold standard, publishable | Cost, time, IRB approval |
| **LLM-as-Judge (GPT-4)** | Fast, cheap, reproducible | Less rigorous, potential bias |
| **Automatic metrics only** | No human cost | Missing subjective quality |
**Recommendation**: Start with automatic metrics, add human evaluation for final paper submission.
---
# 5. Implementation Status
## System Components (Implemented)
- Attribute decomposition pipeline
- Expert team generation (LLM, Curated, DBpedia sources)
- Expert transformation with parallel processing
- Semantic deduplication (embedding + LLM methods; embedding variant sketched below)
- Patent search integration
- Web-based visualization interface
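A greedy sketch of the embedding-based deduplication variant, assuming `sentence-transformers`; the 0.9 similarity threshold and model choice are illustrative:

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def deduplicate(ideas: list[str], threshold: float = 0.9) -> list[str]:
    """Greedily keep an idea only if it is not too similar to any kept idea."""
    emb = model.encode(ideas)
    kept: list[int] = []
    for i in range(len(ideas)):
        if all(cosine_similarity(emb[i:i+1], emb[j:j+1])[0, 0] < threshold
               for j in kept):
            kept.append(i)
    return [ideas[i] for i in kept]
```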
---
# Implementation Checklist
### Experiment Scripts (To Do)
- [ ] `experiments/generate_ideas.py` - Idea generation
- [ ] `experiments/compute_metrics.py` - Automatic metrics
- [ ] `experiments/export_for_evaluation.py` - Human evaluation prep
- [ ] `experiments/analyze_results.py` - Statistical analysis
- [ ] `experiments/visualize.py` - Generate figures
---
# 6. Timeline
| Phase | Activity |
|-------|----------|
| **Phase 1** | Implement idea generation scripts |
| **Phase 2** | Generate all ideas (5 conditions × 30 queries) |
| **Phase 3** | Compute automatic metrics |
| **Phase 4** | Design and pilot human evaluation |
| **Phase 5** | Run human evaluation (60 participants) |
| **Phase 6** | Analyze results and write paper |
---
# Target Venues
### Tier 1 (Recommended)
- **CHI** - ACM Conference on Human Factors in Computing Systems (Sept deadline)
- **CSCW** - Computer-Supported Cooperative Work (Apr/Jan deadline)
- **Creativity & Cognition** - Specialized computational creativity
### Journal Options
- **IJHCS** - International Journal of Human-Computer Studies
- **TOCHI** - ACM Transactions on Computer-Human Interaction
---
# Key Contributions
1. **Theoretical**: "Semantic gravity" framework + two-factor solution
2. **Methodological**: 2×2 factorial design isolates attribute vs expert contributions
3. **Empirical**: Quantitative evidence for interaction effects in LLM creativity
4. **Practical**: Open-source system with both factors for maximum diversity
---
# Key Differentiator vs PersonaFlow
```
PersonaFlow (2024):  Query → Experts → Ideas
                     (Experts see WHOLE query, no structure)

Our Approach:        Query → Attributes → (Attributes × Experts) → Ideas
                     (Experts see SPECIFIC attributes, systematic)
```
**What we can answer that PersonaFlow cannot:**
1. Does problem structure alone help? (Attribute-Only vs Direct)
2. Do experts help beyond structure? (Full vs Attribute-Only)
3. Is there an interaction effect? (amplification hypothesis)
---
# Related Work Comparison
| Approach | Limitation | Our Advantage |
|----------|------------|---------------|
| Direct LLM | Semantic gravity | Two-factor enhancement |
| **PersonaFlow** | **No problem structure** | **Attribute decomposition amplifies experts** |
| PopBlends | Two-concept only | Systematic attribute × expert matrix |
| BILLY | Cannot isolate factors | 2×2 factorial isolates contributions |
---
# References (Key Papers)
1. Siangliulue et al. (2017) - Wisdom of Crowds via Role Assumption
2. Liu et al. (2024) - PersonaFlow: LLM-Simulated Expert Perspectives
3. Choi et al. (2023) - PopBlends: Conceptual Blending with LLMs
4. Wadinambiarachchi et al. (2024) - Effects of Generative AI on Design Fixation
5. Mednick (1962) - Semantic Distance Theory
6. Fauconnier & Turner (2002) - Conceptual Blending Theory
*Full reference list: 55+ papers in `research/references.md`*
---
# Questions & Discussion
## Next Steps
1. Finalize experimental design details
2. Implement experiment scripts
3. Collect pilot data for validation
4. Submit IRB for human evaluation (if needed)
---
# Thank You
**Project Repository**: novelty-seeking
**Research Materials**:
- `research/literature_review.md`
- `research/theoretical_framework.md`
- `research/experimental_protocol.md`
- `research/paper_outline.md`
- `research/references.md`