
---
marp: true
theme: default
paginate: true
size: 16:9
style: |
  section { font-size: 24px; }
  h1 { color: #2563eb; }
  h2 { color: #1e40af; }
  table { font-size: 20px; }
  .columns { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; }
---

Breaking Semantic Gravity

Expert-Augmented LLM Ideation for Enhanced Creativity

Research Progress Report

January 2026


Agenda

  1. Research Problem & Motivation
  2. Theoretical Framework: "Semantic Gravity"
  3. Proposed Solution: Expert-Augmented Ideation
  4. Experimental Design
  5. Implementation Progress
  6. Timeline & Next Steps

1. Research Problem

The Myth and Problem of LLM Creativity

Myth: LLMs enable infinite idea generation for creative tasks

Problem: Generated ideas lack diversity and novelty

  • Ideas cluster around high-probability training distributions
  • Limited exploration of distant conceptual spaces
  • "Creative" outputs are interpolations, not extrapolations

The "Semantic Gravity" Phenomenon

Direct LLM Generation:
  Input: "Generate creative ideas for a chair"

  Result:
    - "Ergonomic office chair"      (high probability)
    - "Foldable portable chair"     (high probability)
    - "Eco-friendly bamboo chair"   (moderate probability)

  Problem:
    → Ideas cluster in predictable semantic neighborhoods
    → Limited exploration of distant conceptual spaces

Why Does Semantic Gravity Occur?

| Factor | Description |
|---|---|
| Statistical Pattern Learning | LLMs learn co-occurrence patterns from training data |
| Model Collapse (to revisit) | Sampling from the "creative ideas" distribution seen in training |
| Relevance Trap (to revisit) | Strong associations dominate weak ones |
| Domain Bias | Outputs gravitate toward category prototypes |

2. Theoretical Framework

Three Key Foundations

  1. Semantic Distance Theory (Mednick, 1962)

    • Creativity correlates with conceptual "jump" distance
  2. Conceptual Blending Theory (Fauconnier & Turner, 2002)

    • Creative products emerge from blending input spaces
  3. Design Fixation (Jansson & Smith, 1991)

    • Blind adherence to initial ideas limits creativity

Semantic Distance in Action

Without Expert:
  "Chair" → furniture, sitting, comfort, design
  Semantic distance: SHORT

With Marine Biologist Expert:
  "Chair" → underwater pressure, coral structure, buoyancy
  Semantic distance: LONG

Result: Novel ideas like "pressure-adaptive seating"

Key Insight: Expert perspectives force semantic jumps that LLMs wouldn't naturally make.


3. Proposed Solution

Expert-Augmented LLM Ideation Pipeline

┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   Attribute  │ → │    Expert    │ → │    Expert    │
│ Decomposition│   │  Generation  │   │Transformation│
└──────────────┘   └──────────────┘   └──────────────┘
                                              │
                                              ▼
                   ┌──────────────┐   ┌──────────────┐
                   │   Novelty    │ ← │ Deduplication│
                   │  Validation  │   │              │
                   └──────────────┘   └──────────────┘
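
The control flow below is a minimal sketch of how these five stages chain together; every function name and signature is a hypothetical placeholder, not the project's actual module API.

```python
# Sketch of the pipeline's control flow only; all stage functions here are
# hypothetical placeholder signatures, not the project's real module API.
def decompose_attributes(query: str) -> list[str]: ...
def generate_expert_team(attributes: list[str]) -> list[str]: ...
def transform(attribute: str, expert: str, query: str) -> str: ...
def deduplicate(ideas: list[str]) -> list[str]: ...
def validate_novelty(ideas: list[str]) -> list[str]: ...

def run_pipeline(query: str) -> list[str]:
    attributes = decompose_attributes(query)       # Stage 1: attribute decomposition
    experts = generate_expert_team(attributes)     # Stage 2: expert generation
    raw = [transform(a, e, query)                  # Stage 3: expert transformation
           for a in attributes for e in experts]
    unique = deduplicate(raw)                      # Stage 4: semantic deduplication
    return validate_novelty(unique)                # Stage 5: patent-based novelty validation
```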

From "Wisdom of Crowds" to "Inner Crowd"

Traditional Crowd:

  • Person 1 → Ideas from perspective 1
  • Person 2 → Ideas from perspective 2
  • Aggregation → Diverse idea pool

Our "Inner Crowd":

  • LLM + Expert 1 Persona → Ideas from perspective 1
  • LLM + Expert 2 Persona → Ideas from perspective 2
  • Aggregation → Diverse idea pool (simulated crowd)

Expert Sources

| Source | Description | Coverage |
|---|---|---|
| LLM-Generated | Query-specific, prioritizes unconventional occupations | Flexible |
| Curated | 210 pre-selected high-quality occupations | Controlled |
| DBpedia | 2,164 occupations from the DBpedia database | Broad |

Note: use the domain list (try adding two levels of the Dewey Decimal Classification? Future work?)


4. Research Questions (2×2 Factorial Design)

| ID | Research Question |
|---|---|
| RQ1 | Does attribute decomposition improve semantic diversity? |
| RQ2 | Does expert perspective transformation improve semantic diversity? |
| RQ3 | Is there an interaction effect between the two factors? |
| RQ4 | Which combination produces the highest patent novelty? |
| RQ5 | How do expert sources (LLM vs. Curated vs. External) affect quality? |
| RQ6 | What is the hallucination/nonsense rate of context-free generation? |

Design Choice: Context-Free Keyword Generation

Our system intentionally excludes the original query during keyword generation:

Stage 1 (Keyword):     Expert sees "木質" (wood) + "會計師" (accountant)
                       Expert does NOT see "椅子" (chair)
                       → Generates: "資金流動" (cash flow)

Stage 2 (Description): Expert sees "椅子" + "資金流動"
                       → Applies keyword to original query

Rationale: Forces maximum semantic distance for novelty
Risk: Some keywords may be too distant → nonsense/hallucination
RQ6: Measure this tradeoff
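
A minimal sketch of this two-stage, context-free prompting, assuming an OpenAI-style chat client; the prompt wording, the `ask` helper, and the `gpt-4` model choice are illustrative assumptions rather than the system's actual prompts.

```python
# Sketch of the two-stage, context-free prompting described above.
# Prompts, helper names, and model choice are illustrative assumptions,
# not the system's actual implementation.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def stage1_keyword(attribute: str, expert: str) -> str:
    # Stage 1: the expert sees only the attribute, never the original query.
    return ask(f"You are a {expert}. Name one concept from your field that you "
               f"associate with '{attribute}'. Reply with the concept only.")

def stage2_description(query: str, keyword: str) -> str:
    # Stage 2: the keyword is applied back to the original query.
    return ask(f"Propose a novel idea for '{query}' that builds on the concept "
               f"'{keyword}'. One sentence.")

keyword = stage1_keyword("wood", "accountant")   # e.g. "cash flow"
idea = stage2_description("chair", keyword)
```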


The Semantic Distance Tradeoff

Too Close                 Optimal Zone                   Too Far
(Semantic Gravity)        (Creative)                     (Hallucination)
├─────────────────────────┼──────────────────────────────┼─────────────────────────┤
"Ergonomic office chair"  "Pressure-adaptive seating"    "Quantum chair consciousness"

High usefulness           High novelty + useful          High novelty, nonsense
Low novelty                                              Low usefulness

H6: Full Pipeline has higher nonsense rate than Direct, but acceptable (<20%)


Measuring Nonsense/Hallucination (RQ6) - Three Methods

| Method | Metric | Pros | Cons |
|---|---|---|---|
| Automatic | Semantic distance > 0.85 | Fast, cheap | May miss contextual nonsense |
| LLM-as-Judge | GPT-4 relevance score (1-3) | Moderate cost, scalable | Potential LLM bias |
| Human Evaluation | Relevance rating (1-7 Likert) | Gold standard | Expensive, slow |

Triangulation: Compare all three methods

  • Agreement → high confidence in nonsense detection
  • Disagreement → interesting edge cases to analyze
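
As one concrete illustration of the LLM-as-Judge row in the table above, the sketch below scores relevance on the 1-3 scale; the rubric wording, the `judge_relevance` helper, and the model name are assumptions, not the study's actual judge prompt.

```python
# Sketch of the LLM-as-Judge relevance check (1-3 scale) from the table above.
# Rubric wording and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def judge_relevance(query: str, idea: str) -> int:
    prompt = (
        f"Query: {query}\nIdea: {idea}\n"
        "Rate how relevant and coherent the idea is to the query:\n"
        "1 = nonsense, 2 = loosely related, 3 = clearly relevant.\n"
        "Answer with a single digit."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return int(resp.choices[0].message.content.strip()[0])
```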

Core Hypotheses (2×2 Factorial)

| Hypothesis | Prediction | Metric |
|---|---|---|
| H1: Attributes | (Attr-Only + Full) > (Direct + Expert-Only) | Semantic diversity |
| H2: Experts | (Expert-Only + Full) > (Direct + Attr-Only) | Semantic diversity |
| H3: Interaction | Full > (Attr-Only + Expert-Only - Direct) | Super-additive effect |
| H4: Novelty | Full Pipeline > all others | Patent novelty rate |
| H5: Control | Expert-Only > Random-Perspective | Validates expert knowledge |
| H6: Tradeoff | Full Pipeline nonsense rate < 20% | Nonsense rate |

Experimental Conditions (2×2 Factorial)

| Condition | Attributes | Experts | Description |
|---|---|---|---|
| C1: Direct | – | – | Baseline: "Generate 20 ideas for [query]" |
| C2: Expert-Only | – | ✓ | Expert personas generate for the whole query |
| C3: Attribute-Only | ✓ | – | Decompose query, direct generation per attribute |
| C4: Full Pipeline | ✓ | ✓ | Decompose query, experts generate per attribute |
| C5: Random-Perspective | – | (random) | Control: random words as "perspectives" |

Expected 2×2 Pattern

                      Without Experts       With Experts
                      ---------------       ------------
Without Attributes    Direct (low)          Expert-Only (medium)

With Attributes       Attr-Only (medium)    Full Pipeline (high)

Key prediction: The combination (Full Pipeline) produces super-additive effects

  • Experts are more effective when given structured attributes to transform
  • The interaction term should be statistically significant
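
A sketch of how the H3 interaction test could be run with a 2×2 ANOVA, assuming one diversity score per query × condition with 0/1 indicator columns for the two factors; the synthetic data below is only a stand-in for the table that compute_metrics.py would produce, and the column names are hypothetical.

```python
# Sketch of the 2x2 interaction test (H3). The synthetic data is a stand-in
# for one diversity score per query x condition; column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
rows = []
for query_id in range(30):
    for attributes in (0, 1):
        for experts in (0, 1):
            score = (0.30 + 0.05 * attributes + 0.08 * experts
                     + 0.04 * attributes * experts + rng.normal(0, 0.02))
            rows.append({"query_id": query_id, "attributes": attributes,
                         "experts": experts, "diversity": score})
df = pd.DataFrame(rows)

model = ols("diversity ~ C(attributes) * C(experts)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # C(attributes):C(experts) row tests the interaction
```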

Query Dataset (30 Queries)

Category A: Everyday Objects (10)

  • Chair, Umbrella, Backpack, Coffee mug, Bicycle...

Category B: Technology & Tools (10)

  • Solar panel, Electric vehicle, 3D printer, Drone...

Category C: Services & Systems (10)

  • Food delivery, Online education, Healthcare appointment...

Total: 30 queries × 5 conditions (4 factorial + 1 control) × 20 ideas = 3,000 ideas


Metrics: Statistical Evaluation

| Metric | Formula | Interpretation |
|---|---|---|
| Mean Pairwise Distance | avg(1 - cos_sim(i, j)) | Higher = more diverse |
| Silhouette Score | Cluster cohesion vs. separation | Higher = clearer clusters |
| Query Distance | 1 - cos_sim(query, idea) | Higher = farther from original |
| Patent Novelty Rate | 1 - (matches / total) | Higher = more novel |
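
A sketch of two of these metrics (mean pairwise distance and query distance), plus the RQ6 distance threshold, using sentence-transformers embeddings; the embedding model is an illustrative default, not necessarily the project's choice.

```python
# Sketch of mean pairwise distance and query distance from the table above;
# the embedding model is an illustrative default, not necessarily the project's.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

def mean_pairwise_distance(ideas: list[str]) -> float:
    emb = model.encode(ideas)
    sim = cosine_similarity(emb)
    iu = np.triu_indices(len(ideas), k=1)        # each unordered pair once
    return float(np.mean(1.0 - sim[iu]))

def query_distance(query: str, idea: str) -> float:
    q, i = model.encode([query, idea])
    return float(1.0 - cosine_similarity([q], [i])[0, 0])

def is_nonsense(query: str, idea: str, threshold: float = 0.85) -> bool:
    # Automatic nonsense flag from the RQ6 slide: semantic distance > 0.85.
    return query_distance(query, idea) > threshold
```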

Metrics: Human Evaluation

Participants: 60 evaluators (Prolific/MTurk)

Rating Scales (7-point Likert):

  • Novelty: How novel/surprising is this idea?
  • Usefulness: How practical is this idea?
  • Creativity: How creative is this idea overall?
  • Relevance: How relevant/coherent is this idea to the query? (RQ6)
  • Nonsense: whether to add a separate nonsense scale is still an open question

Quality Control:

  • Attention checks, completion time monitoring
  • Inter-rater reliability (Cronbach's α > 0.7)
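
A small sketch of the Cronbach's α check, treating each rater as an "item" and each rated idea as a case, which is one common way to apply the formula; the example scores are made up.

```python
# Sketch of the Cronbach's alpha check (target > 0.7). Raters are treated as
# "items" and each rated idea as a case; the example scores are made up.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: shape (n_ideas, n_raters), one score per idea per rater."""
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)       # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of per-idea total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: 5 ideas rated for novelty by 3 raters on a 7-point scale
scores = np.array([[6, 5, 6], [2, 3, 2], [4, 4, 5], [7, 6, 6], [1, 2, 1]])
print(round(cronbach_alpha(scores), 2))
```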

What is Prolific/MTurk?

Online platforms for recruiting human participants for research studies.

| Platform | Description | Best For |
|---|---|---|
| Prolific | Academic-focused crowdsourcing | Research studies (higher quality) |
| MTurk | Amazon Mechanical Turk | Large-scale tasks (lower cost) |

How it works for our study:

  1. Upload 600 ideas to evaluate (subset of generated ideas)
  2. Recruit 60 participants (~$8-15/hour compensation)
  3. Each participant rates ~30 ideas (novelty, usefulness, creativity)
  4. Download ratings → statistical analysis

Cost estimate: 60 participants × 30 min × $12/hr = ~$360


Alternative: LLM-as-Judge

If human evaluation is too expensive or time-consuming:

| Approach | Pros | Cons |
|---|---|---|
| Human (Prolific/MTurk) | Gold standard, publishable | Cost, time, IRB approval |
| LLM-as-Judge (GPT-4) | Fast, cheap, reproducible | Less rigorous, potential bias |
| Automatic metrics only | No human cost | Missing subjective quality |

Recommendation: Start with automatic metrics, add human evaluation for final paper submission.


5. Implementation Status

System Components (Implemented)

  • Attribute decomposition pipeline
  • Expert team generation (LLM, Curated, DBpedia sources)
  • Expert transformation with parallel processing
  • Semantic deduplication (embedding + LLM methods; see the sketch after this list)
  • Patent search integration
  • Web-based visualization interface
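
A sketch of the embedding-based half of the deduplication step (the LLM-based pass is not shown); the 0.9 similarity threshold and embedding model are illustrative assumptions.

```python
# Sketch of embedding-based deduplication (the LLM-based pass is not shown);
# the 0.9 similarity threshold and model choice are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

def deduplicate(ideas: list[str], threshold: float = 0.9) -> list[str]:
    emb = model.encode(ideas)
    sim = cosine_similarity(emb)
    kept: list[int] = []
    for i in range(len(ideas)):
        # Greedy pass: drop an idea if it is too similar to one already kept.
        if all(sim[i, j] < threshold for j in kept):
            kept.append(i)
    return [ideas[i] for i in kept]
```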

Implementation Checklist

Experiment Scripts (To Do)

  • experiments/generate_ideas.py - Idea generation
  • experiments/compute_metrics.py - Automatic metrics
  • experiments/export_for_evaluation.py - Human evaluation prep
  • experiments/analyze_results.py - Statistical analysis
  • experiments/visualize.py - Generate figures

6. Timeline

| Phase | Activity |
|---|---|
| Phase 1 | Implement idea generation scripts |
| Phase 2 | Generate all ideas (5 conditions × 30 queries) |
| Phase 3 | Compute automatic metrics |
| Phase 4 | Design and pilot human evaluation |
| Phase 5 | Run human evaluation (60 participants) |
| Phase 6 | Analyze results and write paper |

Target Venues

  • CHI - ACM Conference on Human Factors in Computing Systems (Sept deadline)
  • CSCW - Computer-Supported Cooperative Work (Apr/Jan deadline)
  • Creativity & Cognition - Specialized computational creativity

Journal Options

  • IJHCS - International Journal of Human-Computer Studies
  • TOCHI - ACM Transactions on Computer-Human Interaction

Key Contributions

  1. Theoretical: "Semantic gravity" framework + two-factor solution

  2. Methodological: 2×2 factorial design isolates attribute vs expert contributions

  3. Empirical: Quantitative evidence for interaction effects in LLM creativity

  4. Practical: Open-source system with both factors for maximum diversity


Key Differentiator vs PersonaFlow

PersonaFlow (2024):   Query → Experts → Ideas
                      (Experts see WHOLE query, no structure)

Our Approach:         Query → Attributes → (Attributes × Experts) → Ideas
                      (Experts see SPECIFIC attributes, systematic)

What we can answer that PersonaFlow cannot:

  1. Does problem structure alone help? (Attribute-Only vs Direct)
  2. Do experts help beyond structure? (Full vs Attribute-Only)
  3. Is there an interaction effect? (amplification hypothesis)

Related Work Comparison

| Approach | Limitation | Our Advantage |
|---|---|---|
| Direct LLM | Semantic gravity | Two-factor enhancement |
| PersonaFlow | No problem structure | Attribute decomposition amplifies experts |
| PopBlends | Two-concept blends only | Systematic attribute × expert matrix |
| BILLY | Cannot isolate factors | 2×2 factorial isolates contributions |

References (Key Papers)

  1. Siangliulue et al. (2017) - Wisdom of Crowds via Role Assumption
  2. Liu et al. (2024) - PersonaFlow: LLM-Simulated Expert Perspectives
  3. Wang et al. (2023) - PopBlends: Conceptual Blending with LLMs
  4. Wadinambiarachchi et al. (2024) - Effects of Generative AI on Design Fixation
  5. Mednick (1962) - Semantic Distance Theory
  6. Fauconnier & Turner (2002) - Conceptual Blending Theory

Full reference list: 55+ papers in research/references.md


Questions & Discussion

Next Steps

  1. Finalize experimental design details
  2. Implement experiment scripts
  3. Collect pilot data for validation
  4. Submit IRB for human evaluation (if needed)

Thank You

Project Repository: novelty-seeking

Research Materials:

  • research/literature_review.md
  • research/theoretical_framework.md
  • research/experimental_protocol.md
  • research/paper_outline.md
  • research/references.md