
The companion piece on LinkedIn walked through the six iterations one at a time. This piece doesn’t. The blow-by-blow is the dull part. The interesting part is what the session quietly demonstrated about the architecture of AI-assisted CAD — what the loop actually looks like, what it’s good at, what it isn’t, and what that means for the next test.
The setup, in one paragraph
Claude runs in a chat window. The Autodesk Fusion MCP server installs as an extension inside Claude Desktop and exposes the Fusion API as tools. Every prompt becomes a small Python script that Claude writes, executes inside a live Fusion document, and reads back from — geometry data, screenshots, occasional errors. The mouse is not in the loop. The Fusion UI is not, strictly speaking, what the AI is interacting with. The AI is writing code against a CAD kernel and reading the kernel’s response.
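For readers who haven't scripted Fusion, here is a minimal sketch of the shape those scripts take, written against the standard adsk.core / adsk.fusion modules. The geometry is illustrative, not from the session; the point is the pattern: write a feature, then read kernel state back so the model has something concrete to critique.

```python
import adsk.core, adsk.fusion

app = adsk.core.Application.get()
design = adsk.fusion.Design.cast(app.activeProduct)
root = design.rootComponent

# Write: sketch a circle on the XY plane and extrude it into a new body.
sketch = root.sketches.add(root.xYConstructionPlane)
sketch.sketchCurves.sketchCircles.addByCenterRadius(
    adsk.core.Point3D.create(0, 0, 0), 1.0)  # radius in cm (the API's units)
profile = sketch.profiles.item(0)
extrude = root.features.extrudeFeatures.addSimple(
    profile,
    adsk.core.ValueInput.createByReal(2.0),  # height in cm
    adsk.fusion.FeatureOperations.NewBodyFeatureOperation)

# Read back: the kernel state the next prompt gets to critique.
body = extrude.bodies.item(0)
print(f"bodies: {root.bRepBodies.count}, volume: {body.volume:.2f} cm^3")
```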
This matters because it places the approach in a different architectural class than “LLM emits CadQuery code” or “LLM writes KCL against a purpose-built kernel.” It is closer to chat → live application state: the kernel is doing the geometry, Fusion is doing the persistence and rendering, and the AI is doing the structuring and the self-critique loop. None of those layers are new. The wiring between them is.
The session in six frames
Six prompts. Six rebuilds. The story is in the frames more than the prose.
[Frames 1–6: one render per rebuild, from the first tuning-fork attempt to the finished claw.]
What worked
The geometric iteration loop is genuinely fast. A correction that would have taken a human five to ten minutes of menu-hunting and feature-tree archaeology took one prompt and a single rebuild. The end-to-end cycle — describe a problem, get a script, watch it execute, see the result — is short enough that the friction normally associated with mechanical CAD largely disappears.
Self-critique against a reference image is unexpectedly honest. The first attempt produced a tuning fork. We uploaded a photograph of an actual lobster claw and asked, plainly, whether the geometry matched. The response decomposed the differences cleanly — symmetric vs. asymmetric, equal-fingered vs. dominant upper jaw, missing knuckle, no hook at the tip — and rebuilt from scratch against that critique. The next attempt was immediately recognisable as a chela. This is not magic; it is what visual reasoning models are good at when given the right structural prompt.
The model library lookup was unprompted and correct. Asked for a glossy red, the AI located a red enamel appearance in Fusion’s native libraries and applied it without being told which one to use. A small thing. Worth flagging because it is exactly the kind of low-value but cycle-eating task that engineers stop noticing they spend time on.
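At the API level that lookup is two name resolutions and a copy. A sketch, with the caveat that the library and appearance names below are plausible assumptions, not the ones the session actually resolved:

```python
import adsk.core, adsk.fusion

app = adsk.core.Application.get()
design = adsk.fusion.Design.cast(app.activeProduct)
body = design.rootComponent.bRepBodies.item(0)

# Assumed names: the stock library and a glossy red paint appearance.
lib = app.materialLibraries.itemByName("Fusion 360 Appearance Library")
source = lib.appearances.itemByName("Paint - Enamel Glossy (Red)")

# Library appearances are read-only; copy into the design, then assign.
body.appearance = design.appearances.addByCopy(source, "Claw Red")
```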
Anatomy decomposition into discrete bodies survived a boolean union. The palm, the upper jaw, and the movable lower jaw were authored as separate bodies, then combined. The combine produced a single clean solid — the kind of thing that, when scripted naively, produces shrapnel.
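The combine itself is one feature call. A sketch, assuming the three bodies were already authored in the root component; the variable names are ours, not the session's:

```python
import adsk.core, adsk.fusion

app = adsk.core.Application.get()
design = adsk.fusion.Design.cast(app.activeProduct)
root = design.rootComponent

# Assumed: palm, upper jaw, lower jaw exist as the first three bodies.
palm, upper_jaw, lower_jaw = (root.bRepBodies.item(i) for i in range(3))

tools = adsk.core.ObjectCollection.create()
tools.add(upper_jaw)
tools.add(lower_jaw)

combine_input = root.features.combineFeatures.createInput(palm, tools)
combine_input.operation = adsk.fusion.FeatureOperations.JoinFeatureOperation
root.features.combineFeatures.add(combine_input)  # one solid, ideally no shrapnel
```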
What didn’t
Spatial reasoning needed explicit mechanical context. One iteration produced a claw that looked approximately right but was squashed in the wrong direction. In a real lobster chela, the jaw cross-section is deepest along the bite axis — the mass is oriented to resist bending loads in the grip plane. That isn’t aesthetic; it’s structural. The AI couldn’t infer this from the geometry. Once stated explicitly — “the squash needs to be on Z, not Y, because that’s the bite plane” — the correction was instant. The lesson generalises: the AI is fast at execution and good at honest self-critique, but it does not independently reason about mechanical function from geometry alone. Function arrives in the prompt or it doesn’t arrive.
UI semantics are opaque. When a face was selected in Fusion before a screenshot, the selection appeared as a blue overlay in the captured image. The AI interpreted that as a separate geometric feature rather than a standard UI selection state. This is a known shape of error in vision-driven agentic systems — the model sees pixels, not affordances — and it shows up the moment a screenshot includes anything other than the geometry itself. In production, this means: keep the chrome out of the frame, or expect to spend prompts disambiguating it.
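One cheap mitigation, if you control the capture script: clear the active selection before grabbing the frame. A sketch using the viewport capture call; it is worth verifying in your own setup that the capture path matches what the model actually sees.

```python
import adsk.core

app = adsk.core.Application.get()

# Drop any active selection so its highlight can't be mistaken for geometry,
# then capture the viewport. Path and resolution are arbitrary choices.
app.userInterface.activeSelections.clear()
app.activeViewport.saveAsImageFile("/tmp/frame.png", 1280, 720)
```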
Boolean cleanup left geometric remnants. The inner-concavity cuts produced small slivers and seams the AI’s own cleanup pass couldn’t fully resolve. We cleaned them by hand in about thirty seconds. Worth noting because this is exactly the class of artifact a downstream validation tool should catch automatically — geometry that looks right in a render but contains topology issues that bite at export, slicing, or meshing.
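The cheapest version of that check is a sliver-face sweep. A sketch; the area threshold below is a guess to be calibrated against known-good parts, not a recommendation:

```python
import adsk.core, adsk.fusion

app = adsk.core.Application.get()
design = adsk.fusion.Design.cast(app.activeProduct)

# Flag suspiciously small faces left behind by boolean cleanup.
SLIVER_CM2 = 1e-4  # guessed threshold; calibrate before trusting it
for body in design.rootComponent.bRepBodies:
    slivers = [f for f in body.faces if f.area < SLIVER_CM2]
    if slivers:
        print(f"{body.name}: {len(slivers)} sliver face(s) under {SLIVER_CM2} cm^2")
```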
What this implies
Three things, with calibrated confidence.
First, the chat-to-live-application pattern is genuinely productive for generation. Six prompts is not a benchmark, it is a single data point — but it is a useful data point, because the failure modes it surfaced are the right failure modes to surface early. Anything we eventually trust this loop to do at scale will need a scaffold that handles them.
Second, the failure modes are concentrated at the boundary. The kernel did its job. The AI did its job. The breakdowns happened where the AI had to interpret something that wasn’t in its native modality — a UI selection state, a mechanical-function constraint, a sub-millimetre cleanup decision. That boundary is where verification tooling earns its keep.
Third, generation has visibly out-paced verification. We covered this in “Will AI-generated CAD outrun V&V?”. The FusionClaw session is a small in-vivo confirmation: the AI produced a clean exportable solid in minutes, and the only thing checking whether the geometry was correct was a human looking at a render. For a 3D-printable trinket, that is fine. For a load-bearing assembly, it is not.
What’s next: pointing this at a real assembly
The next test is the one we actually care about.
FusionClaw is a single body. A printer is not. M3-CRETE — our open-source pallet-scale concrete printer — is hundreds of parts, dozens of fastener stacks, multiple kinematic axes, and a tolerance budget that has to close before a single chip leaves the CNC. Geometry generation is the easy half of that problem. The hard half is everything that has to be true between the parts: interferences resolved, adjacencies right, motors sized for the loads they actually see, fasteners stacking to the dimensions the drawings claim, service access still possible after assembly.
That is CADCLAW’s territory. CADCLAW is the open-source validation framework we developed against M3-CRETE: it runs interference, adjacency, dimensional, kinematic, tolerance, and disassembly gates over a STEP assembly the way pytest runs assertions over a Python module — on every commit, in CI, with structured findings and a pass/fail exit code. It already ships an MCP server, which means an AI coding assistant can call its gates as tools.
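To make the pattern concrete without misrepresenting the tool: the sketch below is a hypothetical stand-in, not CADCLAW’s actual interface. The gate names echo the list above; the loader and findings are invented. What it shows is the pytest-shaped contract: structured findings to stdout, a pass/fail exit code for CI.

```python
import sys
from typing import Callable, List

# Stand-in gates; real gates would inspect a loaded STEP assembly.
def interference_gate(asm) -> List[str]:
    return []  # e.g. ["gantry_plate <-> z_rail: 0.4 mm overlap"]

def tolerance_gate(asm) -> List[str]:
    return []  # e.g. ["x-axis fastener stack exceeds budget by 0.1 mm"]

GATES: List[Callable[[object], List[str]]] = [interference_gate, tolerance_gate]

def main(step_path: str) -> int:
    asm = step_path  # stand-in for a real STEP loader
    findings = [finding for gate in GATES for finding in gate(asm)]
    for finding in findings:
        print(finding)           # structured findings, one per line
    return 1 if findings else 0  # nonzero exit fails the CI job

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```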
The next benchmark is the obvious one:
- Claude-Fusion: the FusionClaw approach, scaled to an assembly. Same chat-to-live-application loop, but pointed at the M3-CRETE gantry instead of a single body. We measure iteration count, output quality, and how long it takes to produce a STEP that survives an honest validation pass.
- Claude-CADCLAW: the same target, but the AI authors against CADCLAW’s structured Python harness from the start — with the validation gates running every iteration as part of the loop, not bolted on at the end.
The two approaches are not competing for the same job. They are competing on a sharper question: where does verification belong in the loop? Bolted on at the end, when the geometry already exists and you grade it? Or threaded through the loop from the first iteration, so the AI never produces geometry the harness hasn’t already accepted? The FusionClaw session strongly suggests the former is the failure mode and the latter is where the discipline goes. We will see whether the data agrees.
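The threaded-through version, reduced to its loop shape. Everything here is a hypothetical stand-in for either tool; the point is only that findings feed the next prompt, so geometry the harness has not accepted never becomes the artifact.

```python
# Hypothetical loop shape, not either tool's real API.
def generate(prompt: str) -> str:
    return "model.step"   # stand-in for the AI + kernel round trip

def validate(step_path: str) -> list[str]:
    return []             # stand-in for the gate run

prompt = "build the gantry"
for _ in range(10):       # bounded retries
    step = generate(prompt)
    findings = validate(step)
    if not findings:
        break             # geometry accepted by the harness
    prompt += "\nfix: " + "; ".join(findings)
```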
FusionClaw is on Printables under CC BY — credit Sunnyday Technologies. The companion piece, with the iteration-by-iteration narrative, is up on LinkedIn. The CADCLAW leg of the benchmark follows.
— Sunnyday Technologies / CADCLAW
