OpenSCAD Bench Results

An experiment in LLM spatio-textual reasoning capabilities. The agents are given a tool call that lets them render screenshots of their models, so they can iteratively improve the design. Feel free to click on the individual images to interactively explore the resulting models and read the entire input prompts.
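
The render-and-revise loop described above could be wired up roughly as follows. This is only a sketch, not the benchmark's actual tool: the function names `build_render_command` and `render_screenshot`, the file names, and the shape of the returned dictionary are assumptions made for illustration. It relies only on the OpenSCAD command-line options `-o` (output file) and `--imgsize` (image width and height).

```python
import shutil
import subprocess
from pathlib import Path


def build_render_command(scad_file, png_file, width=800, height=600):
    """Build an OpenSCAD CLI invocation that renders scad_file to a PNG."""
    return [
        "openscad",
        "-o", str(png_file),              # output format is inferred from the extension
        f"--imgsize={width},{height}",    # screenshot size in pixels
        str(scad_file),
    ]


def render_screenshot(scad_source, workdir):
    """Hypothetical render tool: save the source, render it, report the result."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    scad_file = workdir / "model.scad"
    png_file = workdir / "model.png"
    scad_file.write_text(scad_source)

    # Fail gracefully when OpenSCAD is not installed.
    if shutil.which("openscad") is None:
        return {"ok": False, "image": None, "log": "openscad not found on PATH"}

    proc = subprocess.run(
        build_render_command(scad_file, png_file),
        capture_output=True,
        text=True,
    )
    ok = proc.returncode == 0 and png_file.exists()
    # The log (warnings, syntax errors) is returned so the agent can react to it.
    return {"ok": ok, "image": str(png_file) if ok else None, "log": proc.stderr}
```

An agent harness would call `render_screenshot` after each revision of the model, pass the image and log back to the model, and stop once the agent is satisfied or a step limit is reached.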

See the GitHub repository for the source code.

Agent	Battery Case (battery_case)	Chess Rook (chess_rook)	Citronhaj Stand (citronhaj_stand)	Planetary Gearbox (planetary_gearbox)	3D Printer Torture Test (torture_test)	Towel Hook (towel_hook)
anthropic-claude-opus-4.5	#1 #2	#1	#1	#1	#1	#1 #2
anthropic-claude-opus-4.6	#1 #2	#1	#1	#1	#1	#1 #2
anthropic-claude-sonnet-4.5	#1 #2	#1	#1	#1	#1	#1 #2
google-gemini-3-flash-preview	#1 #2	#1	#1	#1	#1	#1 #2
google-gemini-3-pro-preview	#1 #2	#1	#1	#1	#1 ⚠	#1 #2
openai-gpt-5.1-codex-max	#1 #2	#1	#1	#1	#1	#1 #2
openai-gpt-5.1-codex-mini	#1 #2	#1	#1	#1	#1	#1 #2
openai-gpt-5.2	#1 #2	#1	#1	#1	#1	#1 #2
openai-gpt-5.2-codex	#1 #2	#1	#1	#1	#1	#1 #2
