Here is what kept happening: Claude Code would do genuinely impressive things (complex logic, clean abstractions) and then produce a UI with no color and no sense that a human would ever actually use this thing.
Meanwhile, Google Stitch was sitting right there, generating beautiful interfaces from a single text prompt, and the agent had absolutely no idea it existed.
The gap was not Stitch's fault. The gap was workflow intelligence. Agents do not know how to ideate a design direction, manage multi-screen consistency, handle Stitch's inconsistent ID formats, or turn a generated screen into production-ready framework code. So I built the layer that teaches them all of that.
Why agents fail at design (it's not what you think)
The obvious assumption is that AI agents fail at UI because they cannot "think visually." That is not really what is happening. The deeper problem is that design work has a workflow — research, direction, generation, iteration, consistency checks — and agents have none of that structure when you point them at a design API.
Then there is the API friction. Stitch's endpoints have inconsistent ID formats: some want a raw numeric string like 3780309359108792857, others want it prefixed as projects/3780309359108792857. An agent hitting these endpoints cold will fail in ways that are confusing to debug and nearly impossible to self-correct. That single formatting inconsistency accounts for something like 80% of agent failures with Stitch.
The fix is an API wrapper that normalizes everything before the agent ever touches a URL.
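A minimal sketch of what that normalization can look like. The function names and the exact endpoint expectations here are illustrative assumptions, not the library's actual API:

```typescript
// Hypothetical sketch: normalize Stitch project IDs before a request goes out.
// Some endpoints want the bare numeric ID, others the "projects/<id>" form.

function bareId(id: string): string {
  // Accept either "3780309359108792857" or "projects/3780309359108792857".
  return id.startsWith("projects/") ? id.slice("projects/".length) : id;
}

function resourceName(id: string): string {
  // Always emit the prefixed form, whichever shape came in.
  return `projects/${bareId(id)}`;
}

console.log(bareId("projects/3780309359108792857")); // "3780309359108792857"
console.log(resourceName("3780309359108792857"));    // "projects/3780309359108792857"
```

Because the wrapper picks the right form per endpoint, the agent never sees the inconsistency at all.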
The architecture: 35 skills across 5 layers
The library is organized into 35 modular skills, each living in its own directory with a consistent structure that agents can actually learn from. Every skill has a SKILL.md for workflow documentation, an examples/ folder with real pattern templates, a references/ folder for design contracts, and scripts/ for helpers. The examples folder is the detail that matters most — agents copy real patterns instead of hallucinating boilerplate, which is the difference between something that works and something that confidently generates nonsense.
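Concretely, a skill directory follows this shape (the file names under each folder are illustrative, not an exact listing):

```
stitch-design-system/
├── SKILL.md        # workflow documentation the agent reads first
├── examples/       # real pattern templates to copy from
├── references/     # design contracts, e.g. the DESIGN.md format
└── scripts/        # helper scripts
```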
The five layers handle distinct concerns:
- API normalization — wraps Stitch's raw endpoints, handles ID translation, prevents the format errors that kill most agent sessions
- Design ideation — research, trend analysis, direction proposals before a single screen gets generated
- Orchestration — detects request specificity and routes accordingly (hex colors and layout details mean skip ideation; vague request means trigger research first)
- Design system management — the stitch-design-system skill maintains visual consistency across multiple screens via a DESIGN.md contract, so you get a coherent product instead of a gallery of one-offs
- Framework code generation — converts a single Stitch design into Next.js, React, Svelte, React Native, SwiftUI, or plain HTML, with proper Server/Client splits, TypeScript, dark mode, ARIA accessibility, and design tokens baked in
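The specificity detection in the orchestration layer can be sketched roughly like this. The detection rules below are simplified stand-ins, not the library's actual heuristics:

```typescript
type Route = "generate" | "ideate";

// Simplified stand-in for the orchestrator's specificity check: a request
// that already pins down colors and layout skips the ideation pass.
function routeRequest(request: string): Route {
  const hasHexColor = /#[0-9a-fA-F]{6}\b/.test(request);
  const hasLayoutDetail = /\b(grid|sidebar|two-column|header|modal)\b/i.test(request);
  return hasHexColor && hasLayoutDetail ? "generate" : "ideate";
}

console.log(routeRequest("Landing page, #1E90FF accents, two-column grid")); // "generate"
console.log(routeRequest("make it look good"));                              // "ideate"
```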
The orchestrator: the skill that ties it together
The orchestrator skill is where the workflow intelligence lives, and it is the piece I am most pleased with. It reads the incoming request and makes a judgment call: is this specific enough to go straight to generation, or does it need a research and ideation pass first? A request with a hex palette and layout constraints gets routed directly to Stitch. A request like "make it look good" triggers design research, trend analysis, and a direction proposal before anything gets generated.
It also handles batch generation with auto-continuation — so a multi-screen product flow does not require the developer to manually kick off each screen and track state. The orchestrator manages the sequence, applies the design system between screens, and continues until the batch is complete. That sounds like a small thing until you have manually shepherded a ten-screen flow through an agent session and realized you have been doing project management for a robot.
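The batch loop reduces to something like the following. This is a sketch under assumed names; the real orchestrator talks to Stitch where the placeholder call sits:

```typescript
interface Screen {
  name: string;
  prompt: string;
}

// Placeholder for the real Stitch generation call; here it just returns
// a fake screen ID so the control flow is runnable.
async function generateScreen(screen: Screen, designContract: string): Promise<string> {
  return `${screen.name}-generated`;
}

// Auto-continuation: walk the whole screen list, re-applying the shared
// design contract on every screen, so the developer never tracks state.
async function runBatch(screens: Screen[], designContract: string): Promise<string[]> {
  const results: string[] = [];
  for (const screen of screens) {
    results.push(await generateScreen(screen, designContract));
  }
  return results;
}
```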
One install command, any agent
The CLI installer (bin/stitch-kit.mjs) detects which agent environment you are running — Claude Code, Codex CLI, Cursor, VS Code — and installs the agent definition, skill library, and MCP config automatically. No manual path configuration, no reading documentation about where to put JSON files. It validates the setup on completion and tells you if something is wrong. Adding support for a new client is a registry pattern, which means extending it later is a ten-minute job rather than a surgery.
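The registry pattern itself is small. The entries, detection checks, and paths below are invented for illustration; they are not the actual stitch-kit registry:

```typescript
interface ClientSpec {
  detect: () => boolean; // how to tell this agent environment is present
  configPath: string;    // where the agent definition and MCP config land
}

// Hypothetical registry: each supported client is one entry.
const registry: Record<string, ClientSpec> = {
  "claude-code": { detect: () => process.env.CLAUDE_CODE !== undefined, configPath: ".claude/" },
  "cursor":      { detect: () => process.env.CURSOR_SESSION !== undefined, configPath: ".cursor/" },
};

// The install logic only ever iterates the registry, so supporting a new
// client means adding one entry, never touching the installer itself.
function detectClient(): string | null {
  for (const [name, spec] of Object.entries(registry)) {
    if (spec.detect()) return name;
  }
  return null;
}
```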
This matters because the install experience is often where developer tools quietly die. A tool that requires three configuration files and a specific folder structure that differs by OS gets abandoned before anyone sees whether it works. I was not interested in building something clever that nobody could actually set up.
Who this is actually for
This is a tool for teams building or extending AI coding agents who need those agents to produce real UI — not HTML wireframes, not unstyled components, but production-ready screens that match a design direction and stay consistent across a product. It is also for developers evaluating design-to-code automation who want to understand what that pipeline actually looks like when you take it seriously.
The broader pattern here is one I keep coming back to: AI agents are not limited by intelligence, they are limited by workflow structure. Give an agent a clear process, real examples to learn from, and tools that handle the API friction — and the quality gap between "AI-generated" and "human-reviewed" closes faster than most people expect.
If you are building agent workflows that need to touch design, or you are evaluating where AI-generated UI fits into your stack, I am happy to talk through what this looks like in practice. No pitch, just a conversation about the actual problem.
