LLM Creative Ability Testing

Research on AI Decision-Making and Geometric Shape Generation from Qualitative Descriptors. This study investigates how Large Language Models (specifically Claude Sonnet 4.5) translate abstract qualitative descriptors into concrete geometric properties using P5.js interactive visualizations. The research employs a two-step methodology: (1) geometric parameter specification across a 1-10 scale, and (2) implementation via noise-driven procedural generation with interactive sliders.

Testing now spans Weeks 5-8 with three complementary regimes: Week 5 static 3-point prompts (separate code for levels 1/5/10), Week 6 Opus 4.1 audiovisual sliders (integrated sound synthesis linked to geometric parameters), and Week 8 fluid sliders (continuous interpolation under both any-base and sphere-only constraints). Each regime probes a different aspect of LLM reasoning: fresh-start creativity, temporal/audio mapping, and continuous parameter understanding.

Key Research Questions: Can LLMs effectively translate abstract qualitative descriptors into coherent 3D geometric forms? And what does the quality of these translations reveal about LLM creative reasoning capabilities and training biases?

Research Overview

Total descriptors analyzed: 8 unique qualities exercised across 3 testing modes (static, Opus A/V, fluid slider) and 2 geometry constraint regimes (any base vs sphere). Every trial uses P5.js WebGL with interactive sliders (1-10 scale, 0.1 increments). The fluid slider series evaluates both varying-noise-any-baseshape (complete freedom, where the LLM selects the shape family per descriptor) and varying-noise-sphere-baseshape, which constrains topology to test how well abstract terms can be expressed through surface modulation alone. New Week 6 Opus 4.1 tests add rating criteria for sound-to-geometry alignment (oscillator counts, filter resonance, stereo spread). Results are rated on clean interpolation, conceptual coherence, visual distinctiveness, and, where applicable, sonic mapping quality, with scores ranging from 4/10 to 9/10 depending on descriptor and constraint.
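The fluid-slider mapping described above amounts to continuous interpolation between per-descriptor parameter endpoints. A minimal sketch follows; the endpoint values and parameter names (noiseAmplitude, noiseScale, detail) are illustrative assumptions, not figures from the actual trials:

```javascript
// Hypothetical endpoint values for a "smoothness" descriptor (illustrative only).
const PARAMS_AT_1 = { noiseAmplitude: 1.8, noiseScale: 3.0, detail: 12 };
const PARAMS_AT_10 = { noiseAmplitude: 0.05, noiseScale: 0.4, detail: 48 };

// Map a slider value (1-10 scale, 0.1 increments) to a continuous parameter set.
function paramsForSlider(value) {
  const t = (value - 1) / 9; // normalize 1..10 to 0..1
  const lerp = (a, b) => a + (b - a) * t;
  return {
    noiseAmplitude: lerp(PARAMS_AT_1.noiseAmplitude, PARAMS_AT_10.noiseAmplitude),
    noiseScale: lerp(PARAMS_AT_1.noiseScale, PARAMS_AT_10.noiseScale),
    detail: Math.round(lerp(PARAMS_AT_1.detail, PARAMS_AT_10.detail)),
  };
}
```

Because every intermediate slider position maps to a valid parameter set, this style of mapping is what the "clean interpolation" rating criterion rewards.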

Diagram 1: Research Framework

Placeholder for research framework diagram

Methodology

Two-Step Prompting Framework

The framework separates conceptual reasoning from implementation, forces explicit parameter specification, and enables metacognitive commentary.
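As a rough illustration, step 1 might yield an explicit parameter specification like the following before any rendering code is written in step 2; the field names, values, and the completeness check are hypothetical, not the exact format used in the study:

```javascript
// Hypothetical shape of a step-1 specification the LLM is asked to emit
// before writing any code (field names and values are illustrative).
const step1Spec = {
  descriptor: "smoothness",
  baseShape: "sphere",
  anchors: {
    1:  { noiseAmplitude: 1.8, reasoning: "jagged, high-frequency displacement" },
    5:  { noiseAmplitude: 0.6, reasoning: "moderate undulation" },
    10: { noiseAmplitude: 0.05, reasoning: "near-perfect sphere" },
  },
};

// Step 2 consumes the spec: check every anchor defines the same parameters,
// so the implementation can interpolate between anchors without gaps.
function specIsComplete(spec) {
  const keys = Object.values(spec.anchors).map((a) =>
    Object.keys(a).filter((k) => k !== "reasoning").sort().join(",")
  );
  return keys.every((k) => k === keys[0]);
}
```

The per-anchor "reasoning" field is where the metacognitive commentary would live; requiring it is what makes the parameter choices auditable.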

Integrated Testing Tracks (Weeks 5-8)

Diagram 2: Two-Step Prompting Process

Placeholder for methodology diagram

Descriptors Tested

Eight qualitative descriptors currently live in the repo. Smoothness, Complexity, and Joyfulness have full coverage across static 3-point, Opus A/V, and fluid slider modes. Elegance, Tension, Organic, Fragility, and Dynamism were added during the Week 8 expansion and presently exist as fluid sliders (any vs sphere) to benchmark how well the LLM transfers its earlier learnings to new semantics.

Results & Findings

Highest Ratings (8-9/10): Smoothness (any: 8/10, sphere: 9/10), Complexity (sphere: 9/10), and Elegance (sphere: 8/10). Common success factors: clear geometric progression, conceptual coherence, visual distinctiveness across the scale, appropriate noise selection, and (post Week 6) sound mappings that reinforce the geometric beat. Evaluator notes: "Clean slider transitions"; "no awkward jumps in form, scale is fluid."

Lowest Ratings (4-5/10): Joyfulness (both: 4/10), Complexity (any: 5/10). Common failure factors: repetitive form generation, conflation of size with descriptor intensity, conceptual ambiguity for emotional qualities, and inconsistent scale progression. Evaluator note: "Not convinced that noise is as great of a tool for allowing AI shape generation."

Sphere-Baseshape vs Any-Baseshape: The sphere constraint improved ratings (Complexity: 5/10 → 9/10, Smoothness: 8/10 → 9/10) and stabilized the newly added descriptors (Elegance/Tension/Fragility average 7/10 sphere vs 6/10 any). Advantages: cleaner interpolation, focused variation, topological consistency, and avoidance of awkward intermediate forms. Disadvantages: less dramatic transformations, reduced creative freedom, and muted emotional cues for Joyfulness/Dynamism.
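The sphere constraint amounts to displacing each vertex along its own normal, so the mesh never changes topology. A minimal sketch, using a cheap deterministic hash as a stand-in for p5.js noise():

```javascript
// Stand-in for p5.js noise(): a cheap deterministic hash in [0, 1).
function noise3(x, y, z) {
  const s = Math.sin(x * 12.9898 + y * 78.233 + z * 37.719) * 43758.5453;
  return s - Math.floor(s);
}

// Displace a unit-sphere point along its normal by amplitude-scaled noise.
// theta/phi are spherical coordinates; noiseScale sets surface frequency.
function displacePoint(theta, phi, noiseScale, amplitude) {
  const nx = Math.sin(theta) * Math.cos(phi);
  const ny = Math.sin(theta) * Math.sin(phi);
  const nz = Math.cos(theta);
  const r = 1 + amplitude * noise3(nx * noiseScale, ny * noiseScale, nz * noiseScale);
  return [nx * r, ny * r, nz * r]; // radius stays positive while amplitude > -1
}
```

Because the radius is bounded by the amplitude, interpolating the amplitude down a slider can never produce the "awkward intermediate forms" that plague shape-family switching in the any-baseshape regime.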

Diagram 3: Results Analysis

Placeholder for results visualization

Key Patterns Observed

Technical Implementation

The system architecture combines LLM geometric reasoning with real-time 3D procedural generation. Testing environment: P5.js WebGL with orbitControl(), a standard lighting setup (ambientLight + directionalLight), and createCanvas(800, 800, WEBGL).
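A hedged skeleton of this environment, assuming the standard p5.js globals (createCanvas, createSlider, orbitControl, the lighting calls); the slider wiring is illustrative:

```javascript
// Sketch skeleton for the testing environment; requires the p5.js runtime.
let slider;

function setup() {
  createCanvas(800, 800, WEBGL);
  slider = createSlider(1, 10, 5, 0.1); // 1-10 scale, 0.1 increments
}

function draw() {
  background(20);
  orbitControl();                              // mouse-driven camera
  ambientLight(80);
  directionalLight(255, 255, 255, 0, -1, -1);
  const level = slider.value();                // descriptor intensity
  // noise-driven geometry is generated here from `level`
}
```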

Diagram 4: Technical Architecture

Placeholder for technical architecture diagram

Implications & Future Work

Main Findings: LLMs can translate abstract qualitative descriptors into concrete geometric properties with moderate success (ratings: 4-9/10). Objective/mathematical descriptors (smoothness, complexity) translate significantly better than subjective/emotional descriptors (joyfulness). Constraining the base shape to a sphere improves output quality. LLMs demonstrate sophisticated geometric reasoning, coordinating multiple parameters holistically. Key success factors: clean interpolation, conceptual coherence, distinct visual appearance, appropriate noise selection. Key failure factors: repetitive solutions (training bias), size/scale conflation, awkward intermediates, emotional descriptor ambiguity.

Week 6-8 Additions: Opus 4.1 tests proved the model can maintain consistent slider logic while co-designing sound, but also revealed render-performance bottlenecks when vertex counts exceeded 5k. Week 8 descriptor expansion showed that learned parameter habits (e.g., Perlin bias, symmetry heuristics) carry over to new terms, yet emotional descriptors still underperform unless sphere constraints rein them in.
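The 5k-vertex bottleneck suggests a simple budget check before rendering. The lat-long grid count below is our assumption, since exact totals depend on the renderer's tessellation:

```javascript
// Approximate vertex count for a lat-long sphere mesh at a given detail level
// (grid assumption is ours; exact counts depend on the tessellator).
function sphereVertexCount(detailX, detailY) {
  return (detailX + 1) * (detailY + 1);
}

// Largest square-grid detail level that stays under a vertex budget.
function maxDetailUnderBudget(budget) {
  let d = 1;
  while (sphereVertexCount(d + 1, d + 1) <= budget) d++;
  return d;
}
```

Under this estimate, a 5,000-vertex budget caps a square-grid sphere at detail 69 (70 × 70 = 4,900 vertices), which is one way to keep the Opus 4.1 audiovisual sketches inside the observed performance envelope.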

On LLM Creative Ability: Strengths: holistic multi-parameter coordination, mathematical precision, metacognitive awareness, and recognition of inverse relationships. Weaknesses: limited creative diversity (convergence to expected forms), difficulty with emotional/subjective qualities, inappropriate size coupling, and over-reliance on blob-like organic forms. Verdict: LLMs demonstrate genuine creative reasoning in geometric domains, particularly for objective descriptors, but show training biases and struggle with emotional semantics. Constraint-based prompting paradoxically enhances creative output quality.

Future Research Directions: Expand descriptor set (20+ qualities). Test multi-descriptor combinations (smooth + complex). Explore non-spherical constrained bases (toroid, cube). Human perceptual validation studies. Cross-LLM comparison (GPT-4, Gemini, Claude variants). Longitudinal testing for consistency. Expert artist evaluation. Alternative creative mediums beyond noise (particle systems, L-systems).

Research Context: This research represents a rigorous exploration of LLM creative ability in a constrained geometric domain. The finding that constraint improves creativity (sphere-baseshape) challenges assumptions about open-ended prompting. The struggle with emotional descriptors versus mathematical ones reveals current limitations in translating subjective experience to visual form. Research establishes that LLMs can engage in genuine geometric creativity but within bounds set by training data patterns.