novelty-seeking/research/paper_outline.md
gbanyan 26a56a2a07 feat: Enhance patent search and update research documentation
- Improve patent search service with expanded functionality
- Update PatentSearchPanel UI component
- Add new research_report.md
- Update experimental protocol, literature review, paper outline, and theoretical framework

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 15:52:33 +08:00


Paper Outline: Expert-Augmented LLM Ideation

Suggested Titles

  1. "Breaking Semantic Gravity: Expert-Augmented LLM Ideation for Enhanced Creativity"
  2. "Beyond Interpolation: Multi-Expert Perspectives for Combinatorial Innovation"
  3. "Escaping the Relevance Trap: Structured Expert Frameworks for Creative AI"
  4. "From Crowd to Expert: Simulating Diverse Perspectives for LLM-Based Ideation"

Abstract (Draft)

Large Language Models (LLMs) are increasingly used for creative ideation, yet they exhibit a phenomenon we term "semantic gravity": the tendency to generate outputs clustered around high-probability regions of their training distribution. This limits the novelty and diversity of generated ideas. We investigate two complementary strategies to overcome this limitation: (1) attribute decomposition, which structures the problem space before creative exploration, and (2) expert perspective transformation, which conditions LLM generation on simulated domain-expert viewpoints. Through a 2×2 factorial experiment comparing Direct generation, Expert-Only, Attribute-Only, and Full Pipeline (both factors combined), we demonstrate that each factor independently improves semantic diversity, with the combination producing super-additive effects. Our Full Pipeline achieves [X]% higher semantic diversity and [Y]% lower patent overlap compared to direct generation. We contribute a theoretical framework explaining LLM creativity limitations and an open-source system for innovation ideation.


1. Introduction

1.1 The Promise and Problem of LLM Creativity

  • LLMs widely adopted for creative tasks
  • Initial enthusiasm: infinite idea generation
  • Emerging concern: quality and diversity issues

1.2 The Semantic Gravity Problem

  • Define the phenomenon
  • Why it occurs (statistical learning, mode collapse)
  • Why it matters (innovation requires novelty)

1.3 Our Solution: Expert-Augmented Ideation

  • Brief overview of the approach
  • Key insight: expert perspectives as semantic "escape velocity"
  • Contributions preview

1.4 Paper Organization

  • Roadmap for the rest of the paper

2. Related Work

2.1 Theoretical Foundations

  • Semantic distance and creativity (Mednick, 1962)
  • Conceptual blending theory (Fauconnier & Turner)
  • Design fixation (Jansson & Smith)
  • Constraint-based creativity

2.2 LLM Limitations in Creative Generation

  • Design fixation from AI (CHI 2024)
  • Dual mechanisms: inspiration vs. fixation
  • Bias and pattern perpetuation

2.3 Persona-Based Prompting

  • PersonaFlow (2024)
  • BILLY persona vectors (2025)
  • Quantifying persona effects (ACL 2024)

2.4 Creativity Support Tools

  • Wisdom of crowds approaches
  • Human-AI collaboration in ideation
  • Evaluation methods (CAT, semantic distance)

2.5 Positioning Our Work

Key distinction from PersonaFlow (closest related work):

PersonaFlow:   Query → Experts → Ideas (no problem structure)
Our approach:  Query → Attributes → (Attributes × Experts) → Ideas
  • PersonaFlow applies experts to whole query; we apply experts to decomposed attributes
  • PersonaFlow cannot isolate what helps; our 2×2 factorial design tests each factor
  • We hypothesize attribute decomposition amplifies expert effectiveness (interaction effect)
  • PersonaFlow showed experts help; we test whether structuring the problem first makes experts more effective
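The structural difference above can be sketched directly: PersonaFlow pairs each expert with the whole query, while our pipeline fans the query out over an attributes × experts grid. A minimal sketch, assuming placeholder attribute and expert lists (the actual lists come from the decomposition and expert-generation stages):

```python
from itertools import product

def personaflow_style(query, experts):
    # PersonaFlow: each expert responds to the whole, undecomposed query.
    return [(expert, query) for expert in experts]

def our_pipeline(query, attributes, experts):
    # Ours: each expert is paired with each decomposed attribute,
    # yielding |attributes| x |experts| independent generation contexts.
    return [(expert, attr) for attr, expert in product(attributes, experts)]

# e.g. 2 attributes x 2 experts -> 4 generation contexts
contexts = our_pipeline("chair", ["wood", "ergonomics"],
                        ["accountant", "marine biologist"])
```

The grid is what makes the 2×2 factorial comparison clean: dropping either factor collapses the grid back to one of the baseline conditions.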

3. System Design

3.1 Overview

  • Pipeline diagram
  • Design rationale

3.2 Attribute Decomposition

  • Category analysis (dynamic vs. fixed)
  • Attribute generation per category
  • DAG relationship mapping

3.3 Expert Team Generation

  • Expert sources: LLM-generated, curated, external databases
  • Diversity optimization strategies
  • Domain coverage considerations

3.4 Expert Transformation

  • Conditioning mechanism
  • Keyword generation
  • Description generation
  • Parallel processing for efficiency
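Because each (attribute, expert) transformation is independent and LLM calls are I/O-bound, the parallelism above can be sketched with a thread pool. `generate_keyword` is a hypothetical stand-in for the real LLM call (the actual prompts are in Appendix A):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def generate_keyword(expert, attribute):
    # Hypothetical placeholder for the real LLM call.
    return f"{expert}-view-of-{attribute}"

def transform_all(attributes, experts, max_workers=8):
    # Every (attribute, expert) pair is independent, so the calls
    # can run concurrently; threads suffice since the work is I/O-bound.
    pairs = list(product(attributes, experts))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        keywords = pool.map(lambda p: generate_keyword(p[1], p[0]), pairs)
    return dict(zip(pairs, keywords))
```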

3.5 Semantic Deduplication

  • Embedding-based approach
  • LLM-based approach
  • Threshold selection
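A minimal sketch of the embedding-based variant, assuming precomputed embedding vectors: a greedy pass keeps an idea only if its cosine similarity to every already-kept idea stays below the threshold. The threshold value here is an illustrative assumption, not the one used in the experiments:

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe(ideas, embeddings, threshold=0.9):
    # Greedy semantic deduplication: drop an idea if it is too
    # similar to anything already kept. Threshold is a tunable
    # assumption; the LLM-based variant replaces this check.
    kept, kept_vecs = [], []
    for idea, vec in zip(ideas, embeddings):
        if all(cosine_sim(vec, kv) < threshold for kv in kept_vecs):
            kept.append(idea)
            kept_vecs.append(vec)
    return kept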

3.6 Novelty Validation

  • Patent search integration
  • Overlap scoring

4. Experiments

4.1 Research Questions

  • RQ1: Does attribute decomposition improve semantic diversity?
  • RQ2: Does expert perspective transformation improve semantic diversity?
  • RQ3: Is there an interaction effect between the two factors?
  • RQ4: Which combination produces the highest patent novelty?
  • RQ5: How do expert sources (LLM vs Curated vs External) affect quality?
  • RQ6: What is the hallucination/nonsense rate of context-free keyword generation?

4.1.1 Design Note: Context-Free Keyword Generation

Our system intentionally excludes the original query during keyword generation:

  • Stage 1: Expert sees attribute only (e.g., "wood" + "accountant"), NOT the query ("chair")
  • Stage 2: Expert applies keyword to original query with context
  • Rationale: Maximize semantic distance for novelty
  • Risk: Some ideas may be too distant (nonsense/hallucination)
  • RQ6 investigates this tradeoff
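The two-stage design note above can be sketched as follows. Both prompts are hypothetical simplifications of the real templates (Appendix A); the point is only that the query string is absent from the Stage 1 prompt and reintroduced in Stage 2:

```python
def stage1_keyword(expert, attribute, llm):
    # Stage 1: the prompt deliberately omits the original query;
    # the expert reasons about the attribute alone.
    prompt = f"Acting as {expert}, suggest one evocative keyword for: {attribute}"
    return llm(prompt)

def stage2_idea(query, attribute, keyword, llm):
    # Stage 2: the keyword is applied back to the query with full context.
    prompt = (f"Combine the concept '{keyword}' (attribute: {attribute}) "
              f"with '{query}' into one product idea.")
    return llm(prompt)

def context_free_pipeline(query, attribute, expert, llm):
    keyword = stage1_keyword(expert, attribute, llm)    # query hidden
    return stage2_idea(query, attribute, keyword, llm)  # query revealed
```

Hiding the query in Stage 1 is what buys the extra semantic distance, at the hallucination risk that RQ6 measures.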

4.2 Experimental Setup

4.2.1 Dataset

  • 30 queries for ideation (see experimental_protocol.md)
  • Selection criteria: diverse domains, complexity levels
  • Categories: everyday objects, technology/tools, services/systems

4.2.2 Conditions (2×2 Factorial Design)

Condition               Attributes  Experts   Description
C1: Direct              No          No        Baseline: "Generate 20 creative ideas for [query]"
C2: Expert-Only         No          Yes       Expert personas generate for whole query
C3: Attribute-Only      Yes         No        Decompose query, direct generation per attribute
C4: Full Pipeline       Yes         Yes       Decompose query, experts generate per attribute
C5: Random-Perspective  No          (random)  Control: 4 random words as "perspectives"

4.2.3 Controls

  • Same LLM model (specify version)
  • Same temperature settings
  • Same total idea count per condition (20 ideas)

4.3 Metrics

4.3.1 Semantic Diversity

  • Mean pairwise cosine distance between embeddings
  • Cluster distribution analysis
  • Silhouette score for idea clustering
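The primary diversity score, mean pairwise cosine distance, can be written down directly. A minimal sketch, assuming each idea has already been embedded into a vector:

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def mean_pairwise_distance(embeddings):
    # Semantic diversity score: average cosine distance over all
    # unordered pairs of idea embeddings. Higher = more diverse.
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)
```

The same distance function supports the novelty metric in 4.3.2 (distance of each idea from the query centroid).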

4.3.2 Novelty

  • Patent overlap rate
  • Semantic distance from query centroid

4.3.3 Quality (Human Evaluation)

  • Novelty rating (1-7 Likert)
  • Usefulness rating (1-7 Likert)
  • Creativity rating (1-7 Likert)
  • Relevance rating (1-7 Likert) - for RQ6
  • Interrater reliability (Cronbach's alpha)
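For reference, the interrater reliability statistic reduces to a few lines. A sketch using the standard formula, alpha = k/(k-1) * (1 - sum of per-rater variances / variance of item totals), with population variance assumed:

```python
def variance(xs):
    # Population variance (divide by n, not n-1) -- an assumption here.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(ratings):
    # ratings[r][i] = rater r's score for item i (all raters rate all items).
    k = len(ratings)
    n_items = len(ratings[0])
    rater_vars = sum(variance(r) for r in ratings)
    totals = [sum(ratings[r][i] for r in range(k)) for i in range(n_items)]
    return k / (k - 1) * (1 - rater_vars / variance(totals))
```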

4.3.4 Nonsense/Hallucination Analysis (RQ6) - Three Methods

Method        Metric                               Purpose
Automatic     Semantic distance threshold (>0.85)  Fast screening
LLM-as-Judge  GPT-4 relevance score (1-3)          Scalable evaluation
Human         Relevance rating (1-7 Likert)        Gold-standard validation

Triangulate all three to validate findings

4.4 Procedure

  • Idea generation process
  • Evaluation process
  • Statistical analysis methods
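For the 2×2 factorial analysis, the main effects and interaction follow directly from the four cell means (the full analysis would use ANOVA F-tests; this sketch only shows how the effects are defined, with illustrative numbers):

```python
def effects_2x2(m00, m01, m10, m11):
    # Cell means m[attributes][experts], 0 = without, 1 = with.
    # Returns (attribute main effect, expert main effect, interaction).
    attr_effect   = (m10 + m11) / 2 - (m00 + m01) / 2
    expert_effect = (m01 + m11) / 2 - (m00 + m10) / 2
    # Positive interaction = super-additive: experts help more
    # when attributes are also present.
    interaction   = (m11 - m10) - (m01 - m00)
    return attr_effect, expert_effect, interaction
```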

5. Results

5.1 Main Effect of Attribute Decomposition (RQ1)

  • Compare: (Attribute-Only + Full Pipeline) vs (Direct + Expert-Only)
  • Quantitative results
  • Statistical significance (ANOVA main effect)

5.2 Main Effect of Expert Perspectives (RQ2)

  • Compare: (Expert-Only + Full Pipeline) vs (Direct + Attribute-Only)
  • Quantitative results
  • Statistical significance (ANOVA main effect)

5.3 Interaction Effect (RQ3)

  • 2×2 interaction analysis
  • Visualization: interaction plot
  • Evidence for super-additive vs additive effects

5.4 Patent Novelty (RQ4)

  • Overlap rates by condition
  • Full Pipeline vs other conditions
  • Examples of high-novelty ideas

5.5 Expert Source Comparison (RQ5)

  • LLM-generated vs curated vs external
  • Unconventionality metrics
  • Analyzed within expert conditions only (Expert-Only and Full Pipeline)

5.6 Control Condition Analysis

  • Expert-Only vs Random-Perspective
  • Validates expert knowledge matters

5.7 Hallucination/Nonsense Analysis (RQ6)

  • Nonsense rate by condition (LLM-as-judge)
  • Semantic distance threshold analysis
  • Novelty-usefulness tradeoff visualization
  • Is the context-free design worth the hallucination cost?

5.8 Human Evaluation Results

  • Rating distributions by condition
  • 2×2 pattern in human judgments
  • Correlation with automatic metrics

6. Discussion

6.1 Interpreting the Results

  • Why each factor contributes independently
  • The interaction: why attributes amplify expert effectiveness
  • Theoretical explanation via conceptual blending

6.2 Theoretical Implications

  • Semantic gravity as framework for LLM creativity
  • Two complementary escape mechanisms
  • Structured decomposition as "scaffolding" for creative exploration

6.3 Practical Implications

  • When to use multi-expert approach
  • Expert selection strategies
  • Integration with existing workflows

6.4 Limitations

  • LLM-specific results may not generalize
  • Patent overlap as proxy for true novelty
  • Human evaluation subjectivity
  • Single-language experiments

6.5 Future Work

  • Cross-cultural creativity
  • Domain-specific expert optimization
  • Real-world deployment studies
  • Integration with other creativity techniques

7. Conclusion

  • Summary of contributions
  • Key takeaways
  • Broader impact

Appendices

A. Prompt Templates

  • Expert generation prompts
  • Keyword generation prompts
  • Description generation prompts

B. Full Experimental Results

  • Complete data tables
  • Additional visualizations

C. Expert Source Details

  • Curated occupation list
  • DBpedia/Wikidata query details

D. Human Evaluation Protocol

  • Instructions for raters
  • Example ratings
  • Training materials

Target Venues

Tier 1 (Primary)

  1. CHI - ACM Conference on Human Factors in Computing Systems

    • Strong fit: creativity support tools, human-AI collaboration
    • Deadline: typically September
  2. CSCW - ACM Conference on Computer-Supported Cooperative Work

    • Good fit: collaborative ideation, crowd wisdom
    • Deadline: typically April/January
  3. Creativity & Cognition - ACM Conference

    • Perfect fit: computational creativity focus
    • Smaller but specialized venue

Tier 2 (Alternative)

  1. DIS - ACM Designing Interactive Systems

    • Good fit: design ideation tools
  2. UIST - ACM Symposium on User Interface Software and Technology

    • If system/interaction focus emphasized
  3. ICCC - International Conference on Computational Creativity

    • Specialized computational creativity venue

Journal Options

  1. International Journal of Human-Computer Studies (IJHCS)
  2. ACM Transactions on Computer-Human Interaction (TOCHI)
  3. Design Studies
  4. Creativity Research Journal

Timeline Checklist

  • Finalize experimental design
  • Collect/select query dataset
  • Run all experimental conditions
  • Compute automatic metrics
  • Design human evaluation study
  • Recruit evaluators
  • Conduct human evaluation
  • Statistical analysis
  • Write first draft
  • Internal review
  • Revision
  • Submit