LLM Creative Ability Testing

Research on AI Decision-Making and Geometric Shape Generation from Qualitative Descriptors. This study investigates how Large Language Models (specifically Claude Sonnet 4.5) translate abstract qualitative descriptors into concrete geometric properties using P5.js interactive visualizations. The research employs a two-step methodology: (1) geometric parameter specification across a 1-10 scale, and (2) implementation via noise-driven procedural generation with interactive sliders.

Testing now spans Weeks 5-8 with three complementary regimes: Week 5 static 3-point prompts (separate code for levels 1/5/10), Week 6 Opus 4.1 audiovisual sliders (integrated sound synthesis linked to geometric parameters), and Week 8 fluid sliders (continuous interpolation under both any-base and sphere-only constraints). Each regime probes a different aspect of LLM reasoning: fresh-start creativity, temporal/audio mapping, and continuous parameter understanding.

Key Research Questions: Can LLMs effectively translate abstract qualitative descriptors into coherent 3D geometric forms? And what does the quality of these translations reveal about LLM creative reasoning capabilities and training biases?

Research Overview

Total descriptors analyzed: 8 unique qualities exercised across 3 testing modes (static, Opus A/V, fluid slider) and 2 geometry constraint regimes (any base vs sphere). Every trial uses P5.js WebGL with interactive sliders (1-10 scale, 0.1 increments). The fluid slider series evaluates both varying-noise-any-baseshape (complete freedom, where the LLM selects the shape family per descriptor) and varying-noise-sphere-baseshape, which constrains topology to test how well abstract terms can be expressed through surface modulation alone. New Week 6 Opus 4.1 tests add rating criteria for sound-to-geometry alignment (oscillator counts, filter resonance, stereo spread). Results are rated on clean interpolation, conceptual coherence, visual distinctiveness, and, where applicable, sonic mapping quality, with scores ranging from 4/10 to 9/10 depending on descriptor and constraint.
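The fluid-slider mapping described above amounts to continuous interpolation between per-descriptor parameter endpoints. A minimal sketch follows; the endpoint values and parameter names (noiseAmplitude, noiseScale, detail) are illustrative assumptions, not figures from the actual trials:

```javascript
// Hypothetical endpoint values for a "smoothness" descriptor (illustrative only).
const PARAMS_AT_1 = { noiseAmplitude: 1.8, noiseScale: 3.0, detail: 12 };
const PARAMS_AT_10 = { noiseAmplitude: 0.05, noiseScale: 0.4, detail: 48 };

// Map a slider value (1-10 scale, 0.1 increments) to a continuous parameter set.
function paramsForSlider(value) {
  const t = (value - 1) / 9; // normalize 1..10 to 0..1
  const lerp = (a, b) => a + (b - a) * t;
  return {
    noiseAmplitude: lerp(PARAMS_AT_1.noiseAmplitude, PARAMS_AT_10.noiseAmplitude),
    noiseScale: lerp(PARAMS_AT_1.noiseScale, PARAMS_AT_10.noiseScale),
    detail: Math.round(lerp(PARAMS_AT_1.detail, PARAMS_AT_10.detail)),
  };
}
```

Because every intermediate slider position maps to a valid parameter set, this style of mapping is what the "clean interpolation" rating criterion rewards.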

Diagram 1: Research Framework

Placeholder for research framework diagram

Methodology

Two-Step Prompting Framework

The framework separates conceptual reasoning from implementation, forces explicit parameter specification, and enables metacognitive commentary.
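As a rough illustration, step 1 might yield an explicit parameter specification like the following before any rendering code is written in step 2; the field names, values, and the completeness check are hypothetical, not the exact format used in the study:

```javascript
// Hypothetical shape of a step-1 specification the LLM is asked to emit
// before writing any code (field names and values are illustrative).
const step1Spec = {
  descriptor: "smoothness",
  baseShape: "sphere",
  anchors: {
    1:  { noiseAmplitude: 1.8, reasoning: "jagged, high-frequency displacement" },
    5:  { noiseAmplitude: 0.6, reasoning: "moderate undulation" },
    10: { noiseAmplitude: 0.05, reasoning: "near-perfect sphere" },
  },
};

// Step 2 consumes the spec: check every anchor defines the same parameters,
// so the implementation can interpolate between anchors without gaps.
function specIsComplete(spec) {
  const keys = Object.values(spec.anchors).map((a) =>
    Object.keys(a).filter((k) => k !== "reasoning").sort().join(",")
  );
  return keys.every((k) => k === keys[0]);
}
```

The per-anchor "reasoning" field is where the metacognitive commentary would live; requiring it is what makes the parameter choices auditable.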

Integrated Testing Tracks (Weeks 5-8)

Diagram 2: Two-Step Prompting Process

Placeholder for methodology diagram

Descriptors Tested

Eight qualitative descriptors currently live in the repo. Smoothness, Complexity, and Joyfulness have full coverage across static 3-point, Opus A/V, and fluid slider modes. Elegance, Tension, Organic, Fragility, and Dynamism were added during the Week 8 expansion and presently exist as fluid sliders (any vs sphere) to benchmark how well the LLM transfers its earlier learnings to new semantics.

Results & Findings

Highest Ratings (8-9/10): Smoothness (any: 8/10, sphere: 9/10), Complexity (sphere: 9/10), and Elegance (sphere: 8/10). Common success factors: clear geometric progression, conceptual coherence, visual distinctiveness across the scale, appropriate noise selection, and (post Week 6) sound mappings that reinforce the geometric beat. Evaluator notes: "Clean slider transitions"; "no awkward jumps in form, scale is fluid."

Lowest Ratings (4-5/10): Joyfulness (both: 4/10), Complexity (any: 5/10). Common failure factors: repetitive form generation, conflation of size with descriptor intensity, conceptual ambiguity for emotional qualities, and inconsistent scale progression. Evaluator note: "Not convinced that noise is as great of a tool for allowing AI shape generation."

Sphere-Baseshape vs Any-Baseshape: The sphere constraint improved ratings (Complexity: 5/10 → 9/10, Smoothness: 8/10 → 9/10) and stabilized the newly added descriptors (Elegance/Tension/Fragility average 7/10 sphere vs 6/10 any). Advantages: cleaner interpolation, focused variation, topological consistency, and avoidance of awkward intermediate forms. Disadvantages: less dramatic transformations, reduced creative freedom, and muted emotional cues for Joyfulness/Dynamism.
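The sphere constraint amounts to displacing each vertex along its own normal, so the mesh never changes topology. A minimal sketch, using a cheap deterministic hash as a stand-in for p5.js noise():

```javascript
// Stand-in for p5.js noise(): a cheap deterministic hash in [0, 1).
function noise3(x, y, z) {
  const s = Math.sin(x * 12.9898 + y * 78.233 + z * 37.719) * 43758.5453;
  return s - Math.floor(s);
}

// Displace a unit-sphere point along its normal by amplitude-scaled noise.
// theta/phi are spherical coordinates; noiseScale sets surface frequency.
function displacePoint(theta, phi, noiseScale, amplitude) {
  const nx = Math.sin(theta) * Math.cos(phi);
  const ny = Math.sin(theta) * Math.sin(phi);
  const nz = Math.cos(theta);
  const r = 1 + amplitude * noise3(nx * noiseScale, ny * noiseScale, nz * noiseScale);
  return [nx * r, ny * r, nz * r]; // radius stays positive while amplitude > -1
}
```

Because the radius is bounded by the amplitude, interpolating the amplitude down a slider can never produce the "awkward intermediate forms" that plague shape-family switching in the any-baseshape regime.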

Diagram 3: Results Analysis

Placeholder for results visualization

Key Patterns Observed

Technical Implementation

The system architecture combines LLM geometric reasoning with real-time 3D procedural generation. Testing environment: P5.js WebGL with orbitControl(), a standard lighting setup (ambientLight + directionalLight), and createCanvas(800, 800, WEBGL).
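A hedged skeleton of this environment, assuming the standard p5.js globals (createCanvas, createSlider, orbitControl, the lighting calls); the slider wiring is illustrative:

```javascript
// Sketch skeleton for the testing environment; requires the p5.js runtime.
let slider;

function setup() {
  createCanvas(800, 800, WEBGL);
  slider = createSlider(1, 10, 5, 0.1); // 1-10 scale, 0.1 increments
}

function draw() {
  background(20);
  orbitControl();                              // mouse-driven camera
  ambientLight(80);
  directionalLight(255, 255, 255, 0, -1, -1);
  const level = slider.value();                // descriptor intensity
  // noise-driven geometry is generated here from `level`
}
```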

Diagram 4: Technical Architecture

Placeholder for technical architecture diagram

Implications & Future Work

Main Findings: LLMs can translate abstract qualitative descriptors into concrete geometric properties with moderate success (ratings: 4-9/10). Objective/mathematical descriptors (smoothness, complexity) translate significantly better than subjective/emotional descriptors (joyfulness). Constraining the base shape to a sphere improves output quality. LLMs demonstrate sophisticated geometric reasoning, coordinating multiple parameters holistically. Key success factors: clean interpolation, conceptual coherence, distinct visual appearance, appropriate noise selection. Key failure factors: repetitive solutions (training bias), size/scale conflation, awkward intermediates, emotional descriptor ambiguity.

Week 6-8 Additions: Opus 4.1 tests proved the model can maintain consistent slider logic while co-designing sound, but also revealed render-performance bottlenecks when vertex counts exceeded 5k. Week 8 descriptor expansion showed that learned parameter habits (e.g., Perlin bias, symmetry heuristics) carry over to new terms, yet emotional descriptors still underperform unless sphere constraints rein them in.
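The 5k-vertex bottleneck suggests a simple budget check before rendering. The lat-long grid count below is our assumption, since exact totals depend on the renderer's tessellation:

```javascript
// Approximate vertex count for a lat-long sphere mesh at a given detail level
// (grid assumption is ours; exact counts depend on the tessellator).
function sphereVertexCount(detailX, detailY) {
  return (detailX + 1) * (detailY + 1);
}

// Largest square-grid detail level that stays under a vertex budget.
function maxDetailUnderBudget(budget) {
  let d = 1;
  while (sphereVertexCount(d + 1, d + 1) <= budget) d++;
  return d;
}
```

Under this estimate, a 5,000-vertex budget caps a square-grid sphere at detail 69 (70 × 70 = 4,900 vertices), which is one way to keep the Opus 4.1 audiovisual sketches inside the observed performance envelope.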

On LLM Creative Ability: Strengths: holistic multi-parameter coordination, mathematical precision, metacognitive awareness, and recognition of inverse relationships. Weaknesses: limited creative diversity (convergence to expected forms), difficulty with emotional/subjective qualities, inappropriate size coupling, and over-reliance on blob-like organic forms. Verdict: LLMs demonstrate genuine creative reasoning in geometric domains, particularly for objective descriptors, but show training biases and struggle with emotional semantics. Constraint-based prompting paradoxically enhances creative output quality.

Future Research Directions: Expand descriptor set (20+ qualities). Test multi-descriptor combinations (smooth + complex). Explore non-spherical constrained bases (toroid, cube). Human perceptual validation studies. Cross-LLM comparison (GPT-4, Gemini, Claude variants). Longitudinal testing for consistency. Expert artist evaluation. Alternative creative mediums beyond noise (particle systems, L-systems).

Research Context: This research represents a rigorous exploration of LLM creative ability in a constrained geometric domain. The finding that constraint improves creativity (sphere-baseshape) challenges assumptions about open-ended prompting. The struggle with emotional descriptors versus mathematical ones reveals current limitations in translating subjective experience to visual form. Research establishes that LLMs can engage in genuine geometric creativity but within bounds set by training data patterns.