OpenSCAD Bench Results

An experiment in the spatio-textual reasoning capabilities of LLMs. The agents are given a tool call that renders screenshots of their models, so they can iteratively improve the design. Feel free to click on the individual images to explore the resulting models interactively and to read the full input prompts.

See the GitHub repository for the source code.
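
As a rough illustration of the screenshot tool described above, the following is a minimal sketch of how such a tool could be wired up around the OpenSCAD command line. It is not taken from the benchmark's source code: the function names `build_render_command` and `render_screenshot`, the file names, the image size, and the timeout are all assumptions made for this example. Only the OpenSCAD flags (`-o`, `--imgsize`, `--autocenter`, `--viewall`) are real CLI options.

```python
import subprocess
from pathlib import Path


def build_render_command(scad_path, png_path, width=800, height=600):
    """Build an OpenSCAD command line that renders a .scad file to a PNG.

    The default image size is an assumption, not the benchmark's setting.
    """
    return [
        "openscad",
        "-o", str(png_path),              # output format is inferred from the .png suffix
        f"--imgsize={width},{height}",    # screenshot resolution in pixels
        "--autocenter",                   # center the model in the view
        "--viewall",                      # zoom so the whole model is visible
        str(scad_path),
    ]


def render_screenshot(scad_source, workdir):
    """Hypothetical tool handler: write the model, render it, return the PNG path."""
    workdir = Path(workdir)
    scad_path = workdir / "model.scad"
    png_path = workdir / "model.png"
    scad_path.write_text(scad_source)
    # Requires the `openscad` binary on PATH; raises CalledProcessError on failure.
    subprocess.run(build_render_command(scad_path, png_path), check=True, timeout=120)
    return png_path
```

An agent loop would presumably call something like `render_screenshot` after each revision of the model and feed the resulting image back to the LLM, which is what makes iterative improvement possible.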

| Agent | Battery Case (battery_case) | Chess Rook (chess_rook) | Citronhaj Stand (citronhaj_stand) | Planetary Gearbox (planetary_gearbox) | 3D Printer Torture Test (torture_test) | Towel Hook (towel_hook) |
| --- | --- | --- | --- | --- | --- | --- |
| anthropic-claude-opus-4.5 | #1, #2 | #1 | #1 | #1 | #1 | #1, #2 |
| anthropic-claude-opus-4.6 | #1, #2 | #1 | #1 | #1 | #1 | #1, #2 |
| anthropic-claude-sonnet-4.5 | #1, #2 | #1 | #1 | #1 | #1 | #1, #2 |
| google-gemini-3-flash-preview | #1, #2 | #1 | #1 | #1 | #1 | #1, #2 |
| google-gemini-3-pro-preview | #1, #2 | #1 | #1 | #1 | #1 ⚠ | #1, #2 |
| openai-gpt-5.1-codex-max | #1, #2 | #1 | #1 | #1 | #1 | #1, #2 |
| openai-gpt-5.1-codex-mini | #1, #2 | #1 | #1 | #1 | #1 | #1, #2 |
| openai-gpt-5.2 | #1, #2 | #1 | #1 | #1 | #1 | #1, #2 |
| openai-gpt-5.2-codex | #1, #2 | #1 | #1 | #1 | #1 | #1, #2 |