
The Claude-Native Law Firm: Is Legal AI Cooked?

Zack Shapiro’s post about his two-person “Claude-Native Law Firm” has taken off (millions of views, lots of hype, lots of hot takes). The core message is:

Foundation models are so strong now that a small firm can do serious work with Claude, without needing much specialized legal software.

I found it a good and provocative read.1

That said, I do not think it is evidence that “foundation models win, vertical legal AI is dead.”

Here’s my take:

  1. Can AI make lawyers faster and better? Definitely! This should be uncontroversial at this point (though my sense is many lawyers have a lot more room for AI adoption).

  2. Are foundation models (Claude/OpenAI/Gemini) incredible, and very useful in lots of lawyer tasks? Yes!

  3. Does that mean task-specific legal AI software is dead? No. Specialized tools can be meaningfully better, especially when volume and risk are high.

I am not a neutral observer. While I am a big gen AI user, I have been building legal AI software since 2011. It’s very possible I’m anchored to past lessons and underweighting how quickly foundation models are improving.2

One of the quirks of today’s legal AI era is that the AI in legal AI tools is often largely built on foundation models. It wasn’t always so, and isn’t the case everywhere. For example, our contract review tech incorporates machine learning models we have improved over more than a decade, starting at Kira, plus foundation models for tasks where we think they do a better job than AI we build ourselves. But many pieces of legal AI (including very popular ones) are heavily or exclusively powered by foundation models.

Shapiro seems to argue that this makes these tools nothing more than “wrappers,” even calling some of them “the Juicero of legal tech.” (Wow, legaltech commentary has gotten a lot spicier than it once was!) While some legal AI is closer to being a wrapper on foundation models, that wrapping can have a lot of value to firms, especially big ones — think audit trails, usage monitoring, data residency in specific jurisdictions, or firm-wide prompt libraries that codify institutional knowledge.

But the AI itself is often only part of the value. A well-designed UI honed for a specific use case (maybe with pre-built prompts, maybe with something deeper) can make a huge difference. And that difference gets amplified when the same task needs to be done at scale: not just doing diligence on 10 or 20 contracts, but regularly running diligence on hundreds, or working with a team of 10 to complete diligence on tens of thousands of agreements; a huge deal where missing a single restrictive covenant could cost millions; or being in-house counsel who has to negotiate the same license agreement or NDA over and over and over again. Claude skills you built over hours of tinkering are genuinely useful. But they’re not the same as UI and models shaped by thousands of hours of development and feedback from real users.

Products bring other advantages too. They can deeply automate a workflow, including building in moments where a human-in-the-loop should check AI results for accuracy. And while it’s great to have a tool built just for you, there are real advantages to a tool that is built for many users. Vendor-built workflow tools with a big user base get the benefit of lots of product feedback from different types of customers. Back at Kira, it seemed like Latham benefited from Davis Polk and Freshfields and Integreon and Deloitte (and hundreds more customers), some similar, some very different, all giving us feedback. Each might face different problems, encounter different edge cases, and suggest ideas that would benefit the others down the road.

Basically, a raw foundation model (like Claude or GPT, both of which helped with elements of this piece; or maybe one wrapped by a legaltech vendor like Harvey or Legora, if these meet your needs better) is a very useful tool that lawyers should have access to. But if lawyers do certain tasks frequently, they may get more from a specific legal AI tool targeted at those tasks. Note that this may not come up as much in Shapiro’s smaller firm practice.

Pot vs. Rice Cooker


Here’s my rice cooker next to a pot.

A good analogy here is pot vs. rice cooker. Saying “foundation models mean legal AI is cooked” is like saying “rice cookers are totally useless when you have a pot.” While it’s true you don’t need a rice cooker if you have a pot, millions of rice cookers are sold every year. (GPT says annual rice cooker unit sales are on the order of ~140–200 million units per year! Claude is less certain, saying unit sales numbers are hard to pin down precisely.) And, as someone who grew up making rice in a pot until I was ~30, I can say that rice cookers are wonderful, I’m glad I have them, and I’m currently pretty tempted to get another specialty one just because it’s better at short grain rice!

A foundation model is like a pot. It’s flexible. You can make a lot of different meals, and can even make rice many different ways (e.g., sauté vegetables first, then add rice, then add wine). If you’re a strong cook (or you’re willing to learn), you can do amazing things with it. But who hasn’t burnt rice cooking it in a pot?

A vertical, task-specific product is a rice cooker. It does fewer things, but it’s hard to beat at cooking rice. It’s ridiculously consistent, and generates better rice with less effort. There’s almost no chance of messing the rice up.

If you cook rice only occasionally, a pot’s great. But if you cook a lot of rice, rice cookers are terrific!

Legal work is the same. Some categories of work are basically “rice”: high-volume, repetitive (but not trivial), error-intolerant, full of workflow constraints, and expensive when done inconsistently. That’s where specialized tools don’t just add a “wrapper.” They can add a lot of value.

“But couldn’t you just do this with Claude + prompts?”

Sometimes! Especially if you’re a power user, you enjoy building workflows, and the task is occasional or low-risk enough that “pretty good” is fine. Shapiro’s post is basically a case study in what a power user can do. And I ♥️ that. (I also think more lawyers should do this kind of tinkering.)

But here’s the thing worth noticing: as you invest more and more into your Claude prompts and workflows, what you’re building starts to look a lot like software. You’re encoding business logic. You’re managing edge cases. You’re trying to make outputs consistent and reliable. At some point it stops being “just prompts” and becomes a product, except without the QA, the versioning, the user feedback loops, or the ability to survive someone leaving the firm.

And there are things a vendor focused on a specific legal task can do that are really hard to replicate on your own, no matter how good your prompts get. A vendor has likely put in thousands of hours of development, systematically tested accuracy across a huge range of real-world documents, learned from hundreds of clients’ edge cases, and built integrations into the tools you actually work in.3 You can get surprisingly far with Claude and some well-crafted prompts, but “surprisingly far” and “as good as a mature, purpose-built tool” are not the same thing. If the task matters enough to your practice, it might be time to just buy the rice cooker.

The synthesis (what I think the future actually looks like)

I don’t think this is “foundation models vs. legal software.” Both are going to matter, and most lawyers will end up using some of each.

But one thing the current AI moment makes very easy to underestimate is the 80/20 problem. Foundation models are incredible at getting you 80% of the way there, fast. You can throw a contract at Claude and get a surprisingly useful summary in seconds. That’s genuinely amazing, and for some legal work, it’s enough.

But a lot of the time when clients are hiring Biglaw, they’re hiring precisely for the situations where 80% isn’t nearly good enough. They need 100%, or (more realistically) something close-ish to it. They need to know that every restrictive covenant was caught, that every change-of-control provision was flagged, that nothing slipped through. That last 20% is where the real difficulty (and the real value) lives, and it’s where purpose-built tools with systematic accuracy testing, structured workflows, and years of refinement can earn their keep.

Shapiro’s post is a great demonstration of what a motivated lawyer can do with foundation models today. More lawyers should experiment this way. But for recurring work where the stakes are highest and the margin for error is smallest, I think the rice cooker still has a strong place.


  1. It is also 🤌 as a piece of lawyer marketing! So good, especially at this moment where many clients would like lawyers to deliver services more efficiently by harnessing AI. ↩︎

  2. Discussions with my teammates Dr. Adam Roegiest and Susan Fox inspired this piece, and their comments improved it. Mistakes are mine though! GPT and Claude also helped me express and clean up the piece. ↩︎

  3. At Kira, when we sold it, our product development team was ~100 people. They tended to be really high quality. We had been working on the product for over a decade (with far fewer people in the early days). ↩︎