gbanyan 26a56a2a07 feat: Enhance patent search and update research documentation

- Improve patent search service with expanded functionality
- Update PatentSearchPanel UI component
- Add new research_report.md
- Update experimental protocol, literature review, paper outline, and theoretical framework

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-19 15:52:33 +08:00

Paper Outline: Expert-Augmented LLM Ideation

Suggested Titles

"Breaking Semantic Gravity: Expert-Augmented LLM Ideation for Enhanced Creativity"
"Beyond Interpolation: Multi-Expert Perspectives for Combinatorial Innovation"
"Escaping the Relevance Trap: Structured Expert Frameworks for Creative AI"
"From Crowd to Expert: Simulating Diverse Perspectives for LLM-Based Ideation"

Abstract (Draft)

Large Language Models (LLMs) are increasingly used for creative ideation, yet they exhibit a phenomenon we term "semantic gravity": the tendency to generate outputs clustered around high-probability regions of their training distribution, which limits the novelty and diversity of generated ideas. We investigate two complementary strategies to overcome this limitation: (1) attribute decomposition, which structures the problem space before creative exploration, and (2) expert perspective transformation, which conditions LLM generation on simulated domain-expert viewpoints. Through a 2×2 factorial experiment comparing Direct generation, Expert-Only, Attribute-Only, and Full Pipeline (both factors combined), we demonstrate that each factor independently improves semantic diversity, with the combination producing super-additive effects. Our Full Pipeline achieves [X]% higher semantic diversity and [Y]% lower patent overlap than direct generation. We contribute a theoretical framework explaining LLM creativity limitations and an open-source system for innovation ideation.

1. Introduction

1.1 The Promise and Problem of LLM Creativity

LLMs widely adopted for creative tasks
Initial enthusiasm: infinite idea generation
Emerging concern: quality and diversity issues

1.2 The Semantic Gravity Problem

Define the phenomenon
Why it occurs (statistical learning, mode collapse)
Why it matters (innovation requires novelty)

1.3 Our Solution: Expert-Augmented Ideation

Brief overview of the approach
Key insight: expert perspectives as semantic "escape velocity"
Contributions preview

1.4 Paper Organization

Roadmap for the rest of the paper

2. Related Work

2.1 Theoretical Foundations

Semantic distance and creativity (Mednick, 1962)
Conceptual blending theory (Fauconnier & Turner)
Design fixation (Jansson & Smith)
Constraint-based creativity

2.2 LLM Limitations in Creative Generation

Design fixation from AI (CHI 2024)
Dual mechanisms: inspiration vs. fixation
Bias and pattern perpetuation

2.3 Persona-Based Prompting

PersonaFlow (2024)
BILLY persona vectors (2025)
Quantifying persona effects (ACL 2024)

2.4 Creativity Support Tools

Wisdom of crowds approaches
Human-AI collaboration in ideation
Evaluation methods (CAT, semantic distance)

2.5 Positioning Our Work

Key distinction from PersonaFlow (closest related work):

PersonaFlow:   Query → Experts → Ideas (no problem structure)
Our approach:  Query → Attributes → (Attributes × Experts) → Ideas

PersonaFlow applies experts to whole query; we apply experts to decomposed attributes
PersonaFlow cannot isolate what helps; our 2×2 factorial design tests each factor
We hypothesize attribute decomposition amplifies expert effectiveness (interaction effect)
PersonaFlow showed experts help; we test whether structuring the problem first makes experts more effective
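
The structural difference between the two flows can be sketched as a count of generation units. This is a minimal illustration, not the system's actual code: the function name `plan_generation_units` is hypothetical, and apart from "chair", "wood", and "accountant" (which appear in the design note in Section 4.1.1) the example values are made up.

```python
from itertools import product


def plan_generation_units(query, attributes=None, experts=None):
    """Return (focus, expert) pairs, one per generation call.

    focus is an attribute when decomposition is used, otherwise the whole query.
    expert is None when no persona is applied.
    """
    focuses = attributes if attributes else [query]
    personas = experts if experts else [None]
    return list(product(focuses, personas))


# Hypothetical example values, for illustration only.
query = "chair"
attributes = ["wood", "four legs", "seating"]
experts = ["accountant", "marine biologist"]

# Query -> Experts -> Ideas (experts applied to the whole query)
persona_only = plan_generation_units(query, experts=experts)

# Query -> Attributes -> (Attributes x Experts) -> Ideas
full_pipeline = plan_generation_units(query, attributes, experts)
```

With these inputs, the persona-only flow yields one unit per expert, while the decomposed flow yields one unit per attribute-expert pair. Leaving out `attributes`, `experts`, or both gives the other cells of the 2×2 design.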

3. System Design

3.1 Overview

Pipeline diagram
Design rationale

3.2 Attribute Decomposition

Category analysis (dynamic vs. fixed)
Attribute generation per category
DAG relationship mapping

3.3 Expert Team Generation

Expert sources: LLM-generated, curated, external databases
Diversity optimization strategies
Domain coverage considerations

3.4 Expert Transformation

Conditioning mechanism
Keyword generation
Description generation
Parallel processing for efficiency

3.5 Semantic Deduplication

Embedding-based approach
LLM-based approach
Threshold selection
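
One possible shape for the embedding-based approach is a greedy filter over cosine similarity. The sketch below is an assumption about how such a step could work, not a description of the implemented system: the function names, the 0.9 threshold, and the toy 2-D vectors (standing in for real sentence embeddings) are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def deduplicate(ideas, embeddings, threshold=0.9):
    """Greedy pass: keep an idea only if it is below the similarity
    threshold against every idea kept so far."""
    kept = []  # indices of retained ideas
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [ideas[i] for i in kept]


# Toy vectors standing in for real embeddings (assumption).
ideas = ["folding chair", "foldable chair", "chair-sharing service"]
embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
unique_ideas = deduplicate(ideas, embeddings)
```

A greedy pass is order-dependent: the first idea in a near-duplicate group survives. The threshold trades recall of duplicates against the risk of merging distinct ideas, which is why threshold selection is listed as its own design question.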

3.6 Novelty Validation

Patent search integration
Overlap scoring

4. Experiments

4.1 Research Questions

RQ1: Does attribute decomposition improve semantic diversity?
RQ2: Does expert perspective transformation improve semantic diversity?
RQ3: Is there an interaction effect between the two factors?
RQ4: Which combination produces the highest patent novelty?
RQ5: How do expert sources (LLM vs Curated vs External) affect quality?
RQ6: What is the hallucination/nonsense rate of context-free keyword generation?

4.1.1 Design Note: Context-Free Keyword Generation

Our system intentionally excludes the original query during keyword generation:

Stage 1: Expert sees attribute only (e.g., "wood" + "accountant"), NOT the query ("chair")
Stage 2: Expert applies keyword to original query with context
Rationale: Maximize semantic distance for novelty
Risk: Some ideas may be too distant (nonsense/hallucination)
RQ6 investigates this tradeoff
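
The two stages can be illustrated with a pair of prompt builders. This is only a sketch under assumptions: the prompt wording and function names are hypothetical (the actual templates belong in Appendix A), and "depreciation" is an invented example keyword.

```python
def build_keyword_prompt(attribute, expert):
    """Stage 1: the expert sees only the attribute, never the original query."""
    return (
        f"Role: {expert}. From this professional perspective, "
        f"list keywords you associate with the attribute: {attribute}."
    )


def build_description_prompt(query, keyword, expert):
    """Stage 2: the keyword is applied back to the original query, with context."""
    return (
        f"Role: {expert}. Using the concept '{keyword}', "
        f"describe an innovative idea for: {query}."
    )


# "wood", "accountant", and "chair" come from the design note above;
# "depreciation" is a hypothetical stage-1 keyword.
stage1 = build_keyword_prompt("wood", "accountant")
stage2 = build_description_prompt("chair", "depreciation", "accountant")
```

The key property is that the stage-1 prompt contains no trace of the query, so the keyword is not pulled back toward query-typical associations. The query reappears only in stage 2.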

4.2 Experimental Setup

4.2.1 Dataset

30 queries for ideation (see experimental_protocol.md)
Selection criteria: diverse domains, complexity levels
Categories: everyday objects, technology/tools, services/systems

4.2.2 Conditions (2×2 Factorial Design plus Random-Perspective Control)

Condition	Attributes	Experts	Description
C1: Direct	❌	❌	Baseline: "Generate 20 creative ideas for [query]"
C2: Expert-Only	❌	✅	Expert personas generate for whole query
C3: Attribute-Only	✅	❌	Decompose query, direct generate per attribute
C4: Full Pipeline	✅	✅	Decompose query, experts generate per attribute
C5: Random-Perspective	❌	(random)	Control: 4 random words as "perspectives"

4.2.3 Controls

Same LLM model (specify version)
Same temperature settings
Same total idea count per condition (20 ideas)

4.3 Metrics

4.3.1 Semantic Diversity

Mean pairwise cosine distance between embeddings
Cluster distribution analysis
Silhouette score for idea clustering
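
The first metric, mean pairwise cosine distance, can be computed as follows. This is a minimal stdlib sketch; the function names are illustrative and the toy vectors stand in for real idea embeddings, whose model is not specified here.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_distance(embeddings):
    """Average cosine distance over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


# Toy vectors (assumption): two identical ideas and one orthogonal idea.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
diversity = mean_pairwise_distance(toy_embeddings)
```

For the toy set, two of the three pairs are orthogonal (distance 1) and one pair is identical (distance 0), so the score is 2/3. Higher values indicate a more spread-out idea set.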

4.3.2 Novelty

Patent overlap rate
Semantic distance from query centroid
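
The outline does not define how overlap is decided, so the following sketch makes an explicit assumption: each idea has a single "best match" similarity score against retrieved patents, and an idea counts as overlapping when that score reaches a cutoff. The function name, the 0.8 cutoff, and the scores are all hypothetical.

```python
def patent_overlap_rate(max_patent_similarity, overlap_threshold=0.8):
    """Fraction of ideas whose best patent match meets the overlap cutoff.

    max_patent_similarity: one score per idea (assumed to lie in [0, 1]).
    """
    if not max_patent_similarity:
        raise ValueError("need at least one idea")
    overlapping = sum(1 for s in max_patent_similarity if s >= overlap_threshold)
    return overlapping / len(max_patent_similarity)


# Made-up scores purely to show the arithmetic; not experimental data.
toy_scores = [0.91, 0.42, 0.85, 0.30]
overlap_rate = patent_overlap_rate(toy_scores)
```

Under this definition a lower rate means fewer ideas resemble existing patents. The result is sensitive to the cutoff, so any reported rate should state the cutoff alongside it.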

4.3.3 Quality (Human Evaluation)

Novelty rating (1-7 Likert)
Usefulness rating (1-7 Likert)
Creativity rating (1-7 Likert)
Relevance rating (1-7 Likert), used for RQ6
Interrater reliability (Cronbach's alpha)
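
Cronbach's alpha for rater agreement can be computed by treating each rater as an "item" and each idea as a "case". The sketch below uses the standard formula with sample variances; the function name and the toy ratings are illustrative only.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha, where ratings[i][j] is rater j's score for idea i.

    alpha = k / (k - 1) * (1 - sum(per-rater variances) / variance of row totals)
    """
    n_raters = len(ratings[0])
    if n_raters < 2 or len(ratings) < 2:
        raise ValueError("need at least two raters and two ideas")
    rater_variances = [
        variance([row[j] for row in ratings]) for j in range(n_raters)
    ]
    total_variance = variance([sum(row) for row in ratings])
    return (n_raters / (n_raters - 1)) * (1 - sum(rater_variances) / total_variance)


# Toy ratings (not data): rater 2 always scores one point above rater 1.
toy_ratings = [[1, 2], [3, 4], [5, 6]]
alpha = cronbach_alpha(toy_ratings)
```

In the toy case the raters order the ideas identically, so alpha is 1. Note that alpha measures consistency, not absolute agreement: a constant offset between raters does not lower it.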

4.3.4 Nonsense/Hallucination Analysis (RQ6) - Three Methods

Method	Metric	Purpose
Automatic	Semantic distance threshold (>0.85)	Fast screening
LLM-as-Judge	GPT-4 relevance score (1-3)	Scalable evaluation
Human	Relevance rating (1-7 Likert)	Gold standard validation

Triangulate all three to validate findings
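
The automatic screen could look like the sketch below. The 0.85 threshold comes from the table above, but the table does not say what the distance is measured from. This sketch assumes cosine distance between each idea embedding and the query embedding. Function names and toy vectors are hypothetical.

```python
import math

NONSENSE_DISTANCE_THRESHOLD = 0.85  # from the "Automatic" row above


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def flag_nonsense_candidates(query_embedding, idea_embeddings,
                             threshold=NONSENSE_DISTANCE_THRESHOLD):
    """Return one boolean per idea: True if it is farther from the query
    than the threshold (assumed reference point: the query embedding)."""
    return [
        cosine_distance(query_embedding, emb) > threshold
        for emb in idea_embeddings
    ]


# Toy 2-D vectors standing in for real embeddings (assumption).
query_vec = [1.0, 0.0]
idea_vecs = [[0.9, 0.1], [0.1, 1.0], [0.0, 1.0]]
flags = flag_nonsense_candidates(query_vec, idea_vecs)
```

Flagged ideas are only candidates for review. The LLM-as-judge and human ratings then decide whether a distant idea is nonsense or a genuinely novel one.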

4.4 Procedure

Idea generation process
Evaluation process
Statistical analysis methods

5. Results

5.1 Main Effect of Attribute Decomposition (RQ1)

Compare: (Attribute-Only + Full Pipeline) vs (Direct + Expert-Only)
Quantitative results
Statistical significance (ANOVA main effect)

5.2 Main Effect of Expert Perspectives (RQ2)

Compare: (Expert-Only + Full Pipeline) vs (Direct + Attribute-Only)
Quantitative results
Statistical significance (ANOVA main effect)

5.3 Interaction Effect (RQ3)

2×2 interaction analysis
Visualization: interaction plot
Evidence for super-additive vs additive effects
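
The super-additive versus additive question reduces to the sign of the 2×2 interaction contrast on cell means. The sketch below shows the arithmetic only; the function name is illustrative, the numbers are toy values rather than results, and a real analysis would test the contrast via the ANOVA interaction term.

```python
def interaction_contrast(direct, expert_only, attribute_only, full_pipeline):
    """Difference of differences on cell means for the 2x2 design.

    Positive: experts help more when attributes are present (super-additive).
    Zero: the two factors combine additively.
    Negative: sub-additive.
    """
    expert_gain_with_attributes = full_pipeline - attribute_only
    expert_gain_without_attributes = expert_only - direct
    return expert_gain_with_attributes - expert_gain_without_attributes


# Toy cell means chosen for easy arithmetic; NOT experimental results.
contrast = interaction_contrast(
    direct=1.0, expert_only=2.0, attribute_only=1.5, full_pipeline=3.5
)
```

Here the expert gain is 2.0 with attributes and 1.0 without, giving a contrast of 1.0, which would indicate a super-additive pattern if it were real data.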

5.4 Patent Novelty (RQ4)

Overlap rates by condition
Full Pipeline vs other conditions
Examples of high-novelty ideas

5.5 Expert Source Comparison (RQ5)

LLM-generated vs curated vs external
Unconventionality metrics
Compared within expert-enabled conditions only (Expert-Only and Full Pipeline)

5.6 Control Condition Analysis

Expert-Only vs Random-Perspective
Tests whether expert knowledge, not just any added perspective, drives the effect

5.7 Hallucination/Nonsense Analysis (RQ6)

Nonsense rate by condition (LLM-as-judge)
Semantic distance threshold analysis
Novelty-usefulness tradeoff visualization
Is the context-free design worth the hallucination cost?

5.8 Human Evaluation Results

Rating distributions by condition
2×2 pattern in human judgments
Correlation with automatic metrics

6. Discussion

6.1 Interpreting the Results

Why each factor contributes independently
The interaction: why attributes amplify expert effectiveness
Theoretical explanation via conceptual blending

6.2 Theoretical Implications

Semantic gravity as framework for LLM creativity
Two complementary escape mechanisms
Structured decomposition as "scaffolding" for creative exploration

6.3 Practical Implications

When to use multi-expert approach
Expert selection strategies
Integration with existing workflows

6.4 Limitations

LLM-specific results may not generalize
Patent overlap as proxy for true novelty
Human evaluation subjectivity
Single-language experiments

6.5 Future Work

Cross-cultural creativity
Domain-specific expert optimization
Real-world deployment studies
Integration with other creativity techniques

7. Conclusion

Summary of contributions
Key takeaways
Broader impact

Appendices

A. Prompt Templates

Expert generation prompts
Keyword generation prompts
Description generation prompts

B. Full Experimental Results

Complete data tables
Additional visualizations

C. Expert Source Details

Curated occupation list
DBpedia/Wikidata query details

D. Human Evaluation Protocol

Instructions for raters
Example ratings
Training materials

Target Venues

Tier 1 (Recommended)

CHI - ACM Conference on Human Factors in Computing Systems
- Strong fit: creativity support tools, human-AI collaboration
- Deadline: typically September
CSCW - ACM Conference on Computer-Supported Cooperative Work
- Good fit: collaborative ideation, crowd wisdom
- Deadline: typically April/January
Creativity & Cognition - ACM Conference
- Perfect fit: computational creativity focus
- Smaller but specialized venue

Tier 2 (Alternative)

DIS - ACM Designing Interactive Systems
- Good fit: design ideation tools
UIST - ACM Symposium on User Interface Software and Technology
- If system/interaction focus emphasized
ICCC - International Conference on Computational Creativity
- Specialized computational creativity venue

Journal Options

International Journal of Human-Computer Studies (IJHCS)
ACM Transactions on Computer-Human Interaction (TOCHI)
Design Studies
Creativity Research Journal

Timeline Checklist

Finalize experimental design
Collect/select query dataset
Run all experimental conditions
Compute automatic metrics
Design human evaluation study
Recruit evaluators
Conduct human evaluation
Statistical analysis
Write first draft
Internal review
Revision
Submit
