Gusto Claude Code Case Study: Shipping Without Figma or Jira

Jackson Yew · June 30, 2026 · 10 min read

Key takeaway

The lesson from the Gusto Claude Code story is not that every team should delete Figma, Jira, and docs. The lesson is that a small, senior, tightly aligned team can move faster when it removes handoff theater and keeps product judgment close to the code. I would test the operating model before copying the tool stack.

You should read the Gusto Claude Code story as an operating model, not a tool recap. Lenny's Newsletter reported that Gusto shipped a full AI product line with a 5-person team in 10 weeks using Claude Code, a perma-Zoom, and zero documentation.

The common mistake is to copy the weird part first.

No Figma. No Jira. No docs. That is the headline people remember. But the useful part is harder. A small team removed handoff work, then replaced it with live context, senior taste, fast review, and clear ownership.

Most teams only do the first half. They cut the meeting. They cut the ticket. They cut the doc. Then they wonder why the work gets messy.

I would test the Gusto Claude Code model as a sprint design, not as a new religion. The lesson is not that every team should delete process. The lesson is that weak process slows builders down, but missing ownership breaks teams faster.

If you are a founder, CEO, or builder, the real question is simple. Can your team keep product judgment close to the code without losing risk control?

That is also why this belongs inside AI implementation, not just software dev. As of June 2026, the market has moved past "can AI write code?" The better question is whether your team can change review, feedback, and decision loops around AI-assisted work.

I know this trap because I fell into it. I built too many early AI sprints with too much planning. We would map every role, tool, prompt, and edge case before the first working version existed. The plan felt safe. The work stayed soft.

My rule now is different. Ship one narrow lane first. Watch where it breaks. Then add the process that carries real risk.

For AI SEO and client work, the same rule applies. A strong shared brain beats scattered notes. That is why I treat "How to Build a Client Brain for AI SEO Work" as the pillar idea here. The tool is not the brain. The operating loop is.

What did Gusto actually build with Claude Code?

Gusto used Claude Code to help a small team ship a new AI product line in 10 weeks, according to Eddie Kim's interview in Lenny's Newsletter. The useful lesson is not "Claude Code made the product." The lesson is that Gusto put AI coding inside a tight product loop. A 5-person team stayed close to the work, kept feedback live, and reduced handoffs. That matters more than the tool name. As of June 2026, Anthropic describes Claude Code as an agentic coding tool that can work in the terminal, read a codebase, edit files, run commands, and support dev workflows. That makes it useful when the team already knows what good looks like. I would not read this as a magic coding story. I would read it as a senior team using AI to shrink the gap between intent and shipped product.

The Gusto cofounder angle matters because this was not a side experiment buried under layers of permission. Eddie Kim, Gusto's CTO and cofounder, was close enough to the work to collapse a lot of normal product translation. That changes the system. When executives write code, review product decisions, and sit near the build loop, the team does not wait for strategy to arrive secondhand.

Why did the team avoid Figma, Jira, and docs?

Teams often add process because they fear blur. Then they call the drag governance. That is the trap. Figma, Jira, and docs are not bad. They are useful when many people need shared state across time. But in a small exploratory sprint, those same tools can become handoff theater. A builder writes a ticket before the shape is clear. A designer polishes a screen before the flow is tested. A PM writes a doc that no one trusts after day three. Gusto's no-docs setup worked because the team used live context instead. The perma-Zoom acted like a shared room. Decisions moved fast because the people with taste and authority were present. I would not copy this into a large team cold. If you remove artifacts, you must replace them with named owners, clear scope, and faster review.

This is also product development without PMs in the narrowest, riskiest sense. It does not mean no product management. It means the product work was carried by the people in the room, through direct judgment, fast customer understanding, and live tradeoff calls. That can work for a five-person team building toward a tight launch. It breaks when no one is actually doing the product thinking.

Normal product handoff flow looks like this: idea, doc, design, ticket, build, review, QA, launch. A Gusto-style live build loop looks more like this: intent, live build, review, revise, test, ship, learn.

That map is simple on purpose. The first path protects scale. The second path protects speed. Most teams mix them badly. They keep the heavy handoffs, then add AI on top. That is how AI becomes one more tool in a slow system.

If your team is trying to improve AI search and content production, the same mistake shows up. People buy tools before they fix the system. I wrote about that in "AI SEO Tools for Team Implementation," because tools only help when the work loop is already clear.

How did Claude Code change the team's build rhythm?

Claude Code changes rhythm when it sits inside the build loop, not beside it. Used lightly, it is autocomplete with a louder voice. Used well, it helps turn product intent into working code, then helps inspect, revise, and test the change. Anthropic's own Claude Code best practices point toward this kind of active workflow, where the tool reads context, edits files, and works through tasks with the developer. That can compress the gap between "we should try this" and "we can click it." But speed is not proof. Faster code still needs human taste, architecture judgment, QA, and hard product limits. I test AI coding tools by watching rework, not demo speed. If the first version ships fast but creates unclear debt, the team did not get faster. It only moved the cost downstream.

A real Claude Code workflow is not one prompt and a pull request. It is closer to a loop. State the intent, let the agent inspect the codebase, ask it to make a narrow change, run tests, review the diff, throw away weak attempts, and keep the parts that survive inspection. The AI agent stack matters here, but only after the workflow is clear. Claude Code, tests, evals, CI, logging, and review rules need to act like one build system, not a pile of clever tools.

This is where founders get the lesson wrong. They see a fast demo and ask, "Can my team code 3x faster?" I would ask a better question. "Can my team decide, review, and learn faster without hiding risk?"

That shift matters. AI coding tools expose the quality of your product judgment. They do not replace it. If the team cannot name the user, the edge case, the release bar, or the kill switch, Claude Code will only help them make the wrong thing faster.

For coding teams, this is why architecture still matters. I made the same point in "Agentic Vibe Coders Need Architecture, Not More Prompts." AI can speed the hands. It does not give the team a spine.

What made the 5-person team structure work?

The 5-person team worked because small teams can keep context alive. That is the real edge. A tight group can hear the customer pain, discuss the tradeoff, change the code, review the result, and decide the next move without waiting for a weekly sync. The likely ingredients were senior ownership, narrow scope, live communication, fast feedback, and low handoff cost. Compare that with a Jira-heavy flow where work gets split before the team knows the problem. One person writes the story. Another shapes the design. Another builds one slice. Another tests it later. By then, the team may be solving yesterday's guess. Team size changes the math. The more people you add, the more written interfaces you need. That is not weakness. That is coordination cost. My rule is simple. Small teams can run on live trust. Bigger teams need explicit decisions.

The perma-Zoom detail matters because it replaced a pile of small status loops. It was not just a video call. It was a shared workbench. That only works when people are willing to think in public and make decisions without hiding behind polish.

The risk is burnout and noise. A live room can turn into a pressure cooker if no one owns scope. So I would set rules before copying it. Who can change the product direction? Who can block a release? What work must still be written down? What bugs cannot ship?

This is where the "trash-can method" is useful. Treat early AI output as disposable until it proves itself. Generate options, test them against the product bar, delete the weak ones quickly, and avoid falling in love with the first working draft just because it appeared fast.

This is also why AI implementation is a leadership problem. You can read more of that frame in "How to Lead a Hybrid Human-AI Team Without Losing Control."

When should founders copy this approach?

Founders should copy this approach when the work is narrow, exploratory, and close to the customer. Good fits include new product discovery, internal tools, prototype-to-beta loops, and small AI implementation bets. Bad fits include regulated workflows, multi-team platform changes, unclear data access, or anything where no one owns the release risk. The decision matrix is plain. Use a no-docs sprint when the team is small, senior, and live. Use lightweight docs when one or two other teams need context. Use full governance when compliance, security, or many teams depend on the work. I would also set a time box. Ten days is enough to see if the loop works. Ten weeks without controls is too long for most teams. The tool matters, but the operating model decides whether the tool changes output.
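The decision matrix above can be written down as a small function so the call is explicit rather than argued fresh each time. This is a minimal sketch, not a formal policy: the field names, the `SMALL_TEAM_MAX` cutoff of five, and the reading of "many teams" as more than two are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: "small" means five people or fewer, mirroring the Gusto team size.
SMALL_TEAM_MAX = 5


@dataclass
class SprintContext:
    team_size: int             # people actively building
    senior_and_live: bool      # senior owners sharing one live channel
    dependent_teams: int       # other teams that need context on this work
    regulated: bool            # compliance or security review applies
    release_owner_named: bool  # someone owns the release risk


def process_level(ctx: SprintContext) -> str:
    """Map a sprint context to a process level, following the matrix in the text."""
    # Work with no release owner is a bad fit regardless of team size.
    if not ctx.release_owner_named:
        return "do not start: name a release owner first"
    # Assumption: more than two dependent teams counts as "many teams".
    if ctx.regulated or ctx.dependent_teams > 2:
        return "full governance"
    if ctx.dependent_teams >= 1:
        return "lightweight docs"
    if ctx.team_size <= SMALL_TEAM_MAX and ctx.senior_and_live:
        return "no-docs sprint"
    # Larger or less senior teams fall back to written context.
    return "lightweight docs"


if __name__ == "__main__":
    pilot = SprintContext(team_size=5, senior_and_live=True,
                          dependent_teams=0, regulated=False,
                          release_owner_named=True)
    print(process_level(pilot))
```

The order of the checks carries the argument: ownership and risk are tested before speed, so a team never lands in a no-docs sprint by default.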

The Gusto Claude Code story is useful because it shows what happens when product judgment stays close to code. That is rare. Many companies push judgment up and work down. The people with context cannot ship. The people who ship cannot decide.

A tier-one product launch raises the bar. If the work is going in front of real customers, leadership cannot treat the sprint like an internal hack week. The startup-style shipping only works when the team keeps the launch bar explicit: what must be true before release, what can be rough, what must be measured, and who has authority to stop the train.

If you are a CEO, do not start by asking your whole company to work this way. Pick one lane. Pick one squad. Pick one customer problem. Then judge the model by shipped work, not team excitement.

For broader AI rollout, AI Implementation for CEOs: A Practical Rollout Plan gives a slower path. That is the better fit when the risk surface is wide.

How should an AI implementation team test this safely?

Run a 10-day pilot. Use one small squad, one product lane, one live communication channel, and one clear definition of shipped work. Do not start with a giant roadmap. Start with a customer pain that can become a working slice. Track five numbers: cycle time, rework, escaped defects, customer feedback, and decision latency. If cycle time drops but defects rise, you did not win. If the team ships faster but no one can explain why decisions were made, you removed too much process. I would keep a thin decision log even in a no-docs sprint. One line per major call is enough. What changed? Who decided? Why now? My rule is to remove process only after I know which risk that process was carrying. That is the part most teams skip.
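A thin decision log can be as small as one record per major call. The sketch below is one possible shape, assuming the three questions from the paragraph above become three fields. The class name, the pipe-separated line format, and the sample entry are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Decision:
    day: date
    what_changed: str
    who_decided: str
    why_now: str

    def as_line(self) -> str:
        """Render the decision as a single log line."""
        return " | ".join(
            [self.day.isoformat(), self.what_changed, self.who_decided, self.why_now]
        )


if __name__ == "__main__":
    # Hypothetical entry, invented for illustration only.
    entry = Decision(
        day=date(2026, 7, 1),
        what_changed="Dropped CSV export from pilot scope",
        who_decided="tech lead",
        why_now="Blocking the first customer demo",
    )
    print(entry.as_line())
```

One line per call keeps the cost low enough that the log survives a fast sprint, while still answering "why did we do this?" for whoever joins later.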

I would make the workflow eval-first wherever AI is touching product behavior. Before the team asks Claude Code to build, define the checks that will tell you whether the change works. That could be unit tests, golden examples, regression prompts, customer scenario checks, or a human review rubric. Without evals, AI-assisted software development turns into confidence theater.
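In practice, eval-first can mean writing the golden examples before any implementation exists, then running every candidate (human-written or AI-written) against them. The sketch below assumes a toy `slugify` function as a stand-in for real product behavior. The examples and function names are hypothetical.

```python
from typing import Callable

# Golden examples are written BEFORE asking the agent to build anything.
# Each pair is (input, expected output) for a hypothetical slug helper.
GOLDEN_EXAMPLES = [
    ("  Hello World ", "hello-world"),
    ("AI  SEO tools", "ai-seo-tools"),
    ("already-slugged", "already-slugged"),
]


def run_evals(candidate: Callable[[str], str]) -> list[str]:
    """Return one failure message per golden example the candidate gets wrong."""
    failures = []
    for given, expected in GOLDEN_EXAMPLES:
        actual = candidate(given)
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


def slugify(text: str) -> str:
    """A candidate implementation: lowercase, collapse whitespace into hyphens."""
    return "-".join(text.lower().split())


if __name__ == "__main__":
    problems = run_evals(slugify)
    print("all evals pass" if not problems else "\n".join(problems))
```

The same harness shape works for regression prompts or customer scenarios. What matters is that the checks exist before the change does, so a fast draft is judged against a fixed bar.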

I would also make one person own the AI boundary. What can Claude Code touch? What needs review? What data is off limits? What tests must pass before merge? These are not heavy rules. They are the guardrails that let speed stay useful.
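Those boundary questions can be captured in a short, reviewable policy. The sketch below is one way to express it with path patterns and a merge gate. The directory names and rules are made up for illustration and are not part of Claude Code or any real configuration format.

```python
from fnmatch import fnmatchcase

# Hypothetical boundary policy; the paths are placeholders for a real repo layout.
OFF_LIMITS = ["secrets/*", "data/customers/*"]
NEEDS_HUMAN_REVIEW = ["src/billing/*", "migrations/*"]
AGENT_MAY_EDIT = ["src/*", "tests/*"]


def classify(path: str) -> str:
    """Classify a changed file. The strictest matching rule wins."""
    if any(fnmatchcase(path, pattern) for pattern in OFF_LIMITS):
        return "off_limits"
    if any(fnmatchcase(path, pattern) for pattern in NEEDS_HUMAN_REVIEW):
        return "needs_review"
    if any(fnmatchcase(path, pattern) for pattern in AGENT_MAY_EDIT):
        return "allowed"
    # Anything the policy does not mention defaults to human review.
    return "needs_review"


def can_merge(changed_paths: list[str], tests_passed: bool, human_approved: bool) -> bool:
    """Gate a merge on the boundary policy, the test result, and review status."""
    labels = [classify(path) for path in changed_paths]
    if "off_limits" in labels or not tests_passed:
        return False
    if "needs_review" in labels and not human_approved:
        return False
    return True


if __name__ == "__main__":
    print(classify("src/billing/invoice.py"))
    print(can_merge(["src/ui/button.py"], tests_passed=True, human_approved=False))
```

Defaulting unlisted paths to review is a deliberate choice in this sketch: the policy fails safe, so the owner of the boundary has to opt files in rather than remember to opt them out.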

The first pilot should produce a simple before-and-after readout. Show the old flow. Show the live build loop. Show where time dropped. Show where rework rose. Show what you will keep.
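A before-and-after readout can be a few lines of arithmetic over the pilot numbers. This sketch assumes four of the five tracked measures are numeric and leaves customer feedback as a qualitative note outside the code. The field names, units, and sample values are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class LoopMetrics:
    cycle_time_days: float         # idea to shipped slice
    rework_hours: float            # time spent redoing merged work
    escaped_defects: int           # bugs found after release
    decision_latency_hours: float  # question raised to decision made


def deltas(before: LoopMetrics, after: LoopMetrics) -> dict[str, float]:
    """Return after-minus-before for every metric. Negative means it dropped."""
    old, new = asdict(before), asdict(after)
    return {name: new[name] - old[name] for name in old}


def verdict(before: LoopMetrics, after: LoopMetrics) -> str:
    """Apply the rule from the pilot design: faster with more defects is not a win."""
    faster = after.cycle_time_days < before.cycle_time_days
    more_defects = after.escaped_defects > before.escaped_defects
    if faster and more_defects:
        return "not a win: faster, but more escaped defects"
    if faster:
        return "promising: faster without more escaped defects"
    return "no speed gain: keep the old flow or rescope the pilot"


if __name__ == "__main__":
    # Sample numbers are invented for illustration.
    old_flow = LoopMetrics(10, 20, 2, 48)
    live_loop = LoopMetrics(6, 25, 4, 6)
    for name, change in deltas(old_flow, live_loop).items():
        print(f"{name}: {change:+}")
    print(verdict(old_flow, live_loop))
```

Printing rework and defects next to cycle time keeps the readout honest. A pilot that only reports the number that improved is a demo, not a test.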

That is the real lesson from Gusto. Do not worship the missing tools. Study the replacement system.

If you want help turning this into a real AI implementation sprint for your team, start with the operating loop, not the software list. I help builders pick the right lane, set the review rules, and test whether AI changes shipped output.

FAQ

What did Gusto do with Claude Code?

According to Eddie Kim's interview on Lenny's Newsletter, Gusto used Claude Code as part of a compressed product build where a 5-person team shipped a new AI product line in 10 weeks. The notable part is not only the coding tool. It is the operating model around it: tight team size, live communication, minimal documentation, and fewer handoffs. I would read this as an implementation case study, not a blanket argument that every team should remove product process.

Should teams stop using Figma, Jira, and documentation when using AI coding tools?

No. That is the easy but wrong takeaway. The Gusto story shows that a small, senior team can sometimes move faster without heavy artifacts when everyone shares context live and owns the outcome. Larger teams still need written decisions, design alignment, roadmap visibility, QA records, and compliance trails. My rule is simple: remove a process only when you know what risk it was managing and what will replace it.

Why did the 5-person team matter in the Gusto case study?

The small team size is central because it reduces coordination cost. Five people can share context quickly, clarify decisions live, and keep product judgment close to implementation. A 30-person team cannot usually operate the same way without creating confusion. The mistake many founders make is copying the visible tactic, such as fewer docs, while missing the hidden condition: the team must be senior enough and aligned enough to carry more judgment in the room.

How should a founder test a Claude Code product workflow?

I would test it with a narrow 10-day sprint before changing the whole product process. Pick one product lane, one senior technical owner, one product decision-maker, and one measurable shipping goal. Track cycle time, rework, escaped defects, customer feedback, and decision latency. The goal is not to prove Claude Code is impressive. The goal is to see whether the team can turn AI-assisted building into shipped customer value without creating hidden cleanup work.

What is the main risk of building with no docs?

The main risk is that the record of why things were built lives only in people's memory. When decisions live only in calls, the team can move quickly for a short burst, but future teammates may not understand why choices were made. This becomes painful when the product needs support, compliance review, onboarding, or platform integration. I would keep the build loop lightweight, but still capture the few decisions that will matter later: scope, architecture tradeoffs, customer assumptions, and known risks.

Is this a Claude Code story or an AI implementation story?

It is both, but the more useful reading is AI implementation. Claude Code may have compressed the coding loop, but the team also changed how decisions, communication, and execution happened. That is where most companies get stuck. They buy AI tools, then run them through the same slow approval and handoff structure. The Gusto case is useful because it points to the operating model a company may need if it wants AI tools to change actual shipping velocity.