This Week’s Theme
This week I had a call with a friend, a technology attorney and data governance advisor based in the US, about something that keeps coming up in almost every conversation I have with lawyers about AI: the concern about whether AI providers are training their models on your data.
It literally comes up in every call. It came up in the private sessions I have been running with lawyers over the past few weeks. A lawyer in Brazil mentioned she stopped using AI for important research last year because she was worried Claude would retain what she asked. A lawyer in Peru said his firm is holding off on entering client information entirely until they feel confident about where it goes. A technology committee chair at a US employment law firm told me she spent weeks reading privacy policies trying to figure out what ChatGPT actually does with her conversations.
Every one of them was asking the same question, and every one of them was focused on the same thing: is this tool learning from my data?
On enterprise and (some) team plans, the vendors are contractually committed to not training on your data under commercial data processing agreements. That protection is real, and it should not be dismissed.
What I keep seeing, though, is that once someone hears “we don’t train on your data”, the rest of the evaluation stops. And the questions that remain unanswered after that point are the ones that carry the most exposure.
What the Enterprise Agreement Actually Covers, and Where It Stops
When you sign an enterprise or team agreement with Anthropic, OpenAI, Google, or any of the major providers, the data processing agreement includes a commitment that your inputs and outputs will not be used for model training. This is a meaningful contractual protection, and for many firms it has become the threshold question: once that box is checked, the tool feels safe to use.
The technology lawyer I mentioned made a good point during our call that I think captures the gap precisely. The training question, in her view, is already answered by the enterprise DPA. The question she keeps coming back to with her clients is whether the vendor is retaining your data, under what terms, and who else in the vendor’s chain can access it.
Every enterprise agreement I have reviewed includes a data retention clause, and every one of those clauses includes some version of the same carve-out: the provider may retain a copy of your data for law enforcement, audit, or compliance purposes. Even when the agreement says data will be deleted after 30 days, that deletion does not extend to the retained copy.
This means that if you draft an estate plan in Claude and include your client’s name, address, family details, and financial information, all of that sits inside a retained copy that your DPA explicitly permits the vendor to keep. And depending on how the provider’s sub-processor agreements are structured, that retained data may be accessible to third parties you have never evaluated.
What’s interesting is that this is the same structure that has existed for decades in SaaS contracting. The view I share is that AI should be treated the way any other enterprise software is treated: you need contractual indemnification, you need to understand the sub-processor chain, and you need to know where your data sits and who can reach it. The “no training” commitment is one piece of a much larger due diligence process, and by itself it leaves all of these questions open.
Why Lawyers Keep Asking About Training Instead
So for the past few weeks, I’ve been running private sessions with lawyers from multiple continents and jurisdictions, and the training question dominated the early part of almost every one.
Part of what drives it is that the experience of using AI feels like it is learning from you. You tell it something, and the next time you open a conversation, it seems to remember. It adapts to your tone and preferences. It references things you mentioned weeks earlier. That feels like training, and the distinction between contextual memory and actual model training is genuinely difficult to explain, because it runs counter to what the product experience suggests.
During one of our Monday sessions, Lotte, my co-founder, explained this well: “You get the feeling when you talk to an AI that you say something and it learns from that, and then it understands you better the more you talk to it. So it feels like the more you talk, the more you’re training your own model”. That instinct is completely understandable, and it is also the source of a significant amount of misplaced concern.
The AI model itself has no memory. It processes inputs, generates outputs, and retains nothing between sessions at the model level. What creates the feeling of continuity is a memory layer that engineers have built into the application: a running summary of your prior conversations, your preferences, your instructions. That layer sits between you and the model, it is governed by the same data processing agreement as everything else, and you can switch it off. The model has not learned anything from your interactions; what you are experiencing is the application keeping notes on your behalf.
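To make the distinction concrete, here is a minimal sketch in Python of how an application-level memory layer typically works. Every name in it (call_model, MemoryLayer) is a hypothetical placeholder rather than any vendor's actual API; the point is only that the model call itself is stateless, and the "memory" is ordinary application data stored alongside it, covered by the same agreement as the rest of your inputs.

```python
# Simplified illustration of the distinction described above: the model call is
# stateless, and any "memory" lives in a separate application layer.
# All names here are hypothetical placeholders, not any vendor's actual API.

def call_model(prompt: str) -> str:
    """Stand-in for a stateless model call: nothing persists after it returns."""
    return f"[model response to: {prompt[:40]}...]"

class MemoryLayer:
    """Application-side notes the product keeps between sessions.
    This is what feels like 'learning'; the model weights never change."""

    def __init__(self):
        self.summary = ""  # running summary of prior conversations

    def build_prompt(self, user_message: str) -> str:
        # The stored summary is simply prepended to each new request.
        return f"Known context about this user: {self.summary}\n\nUser: {user_message}"

    def update(self, user_message: str) -> None:
        # After each exchange, the application updates its own notes.
        self.summary += f" User asked about: {user_message[:40]}."

    def clear(self) -> None:
        # "Switching memory off" means this layer stops being stored and used.
        self.summary = ""

# One exchange: the model sees the notes only because the app passed them in.
memory = MemoryLayer()
reply = call_model(memory.build_prompt("Draft a non-compete clause for a sales role"))
memory.update("Draft a non-compete clause for a sales role")
```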
Once this distinction lands, the conversation tends to open up. People start asking what is actually happening to their data inside the system, who can access it, and what controls they have over it. And those questions lead somewhere much more useful.
The Due Diligence Checklist
Here’s what is worth paying closer attention to:
What the enterprise deal doesn’t cover.
Firms sign the enterprise agreement as is and treat it as the finish line for data protection due diligence. But in practice, the standard terms are a starting point. Most enterprise AI agreements include a standard retention window, typically 30 days, during which the provider stores your inputs and outputs. What many firms do not realize is that zero data retention (ZDR) is available as a negotiable term with several major providers at the enterprise tier. Under a ZDR commitment, the provider processes your data in transit and does not store it at rest at all, which meaningfully changes the risk profile for firms handling privileged client information or protected health information.
Sub-processor exposure.
Your agreement is with the AI vendor, but that vendor may use sub-processors for infrastructure, analytics, validation, or access management. If those sub-processors are not held to the same standards, your data protection breaks down at a point in the chain you cannot see. This is what happened when OpenAI’s third-party analytics vendor exposed user data: the exposure was not at OpenAI’s level, but at a sub-processor that was handling data under less rigorous terms.
No indemnification.
Most enterprise AI agreements do not include meaningful indemnification for data incidents involving your client information. If something goes wrong, the contractual remedies available to you may be limited to the subscription fee. For a law firm handling privileged client data, the gap between that contractual ceiling and the actual exposure is enormous.
The public version supply chain.
All of this assumes you are on an enterprise plan. On consumer and free plans, the protections are structurally different. Data may be used for training, and conversations may be reviewed by human labelers as part of data annotation pipelines, sometimes by contractors in jurisdictions with very different data protection standards.
What This Means For How You Evaluate AI Vendors
For a law firm evaluating AI vendors, the training question has a straightforward answer on any enterprise agreement. The evaluation questions that carry the most weight are the ones that come after: what does the data retention clause permit, who are the sub-processors and what are their DPAs, what indemnification exists for a data incident involving client information, and where does the data physically reside.
These are the same questions you would ask of any SaaS vendor handling client data. With AI, the training concern has absorbed so much attention that these questions often go unasked entirely.
This is part of the architecture work I am doing right now with a large US firm that is preparing to deploy Claude across their teams. The training question was answered in the first few minutes of scoping, but the actual work is in the questions that come after: what the pilot group looks like, what data is in play from day one, how their existing infrastructure is configured, what the vendor review process requires, and what additional contractual terms need to be in place before client data enters the system.
Private AI Roundtable Sessions: Why We Started Them and How to Join
Over the past few weeks we have been running something new alongside the newsletter and the LinkedIn Live sessions. These are private, small-group sessions with lawyers, AI champions, innovation leads, paralegals, clerks and consultants working through the same set of questions you are reading about here, except in a setting where people can speak openly, ask the things they would not ask on a public livestream, and hear from others in different jurisdictions dealing with the same problems.
We had participants joining from the US, the Netherlands, Peru, the UK, Canada, Spain, Indonesia, Germany, Italy, and India, just to name a few. The conversations have covered everything from how to approach data cleaning before AI adoption, to the difference between agents and workflows, to the privacy concerns we discussed in this edition.
The valuable part of these sessions, based on the feedback we’ve gotten, is that the group is small enough for everyone to speak, the perspectives span different jurisdictions and practice contexts, nothing is recorded, and there is no product being sold during the session. People bring their actual questions and situations, and the conversation goes wherever it needs to go.
If you are working inside a firm leading AI adoption, advising firms on AI as a consultant, or building a practice around AI from the ground up, these sessions are designed for you.
We are continuing to run them on a regular basis and keeping the groups intentionally small.
If you would like to join one, reach out to me and I will get you into the next round.
Legal AI in Action
🎬 10 Lessons From Advising Law Firms on AI
Why leading with discovery, scoping after you understand the workflow, and knowing the difference between contractual and technical coverage are what keep AI projects in law together through delivery.
🎙 From AI policy to AI practice
Paul Crafer is a VP for AI Governance at Assessed Intelligence and a ForHumanity Fellow; over the last year he has drafted audit standards for AI systems across more than 50 certification schemes worldwide. In this conversation we talk about why governance keeps stalling at policy and never reaches practice, what an audit actually looks like for a firm using AI on client work, why vibe-coded tools inside regulated organizations are a governance problem that very few people are discussing, and what inter-agent vulnerabilities mean for any firm running more than one AI system at the same time.
🎙 Next Tuesday at 2pm CET!
My next guest on Rok’s Legal AI Conversations is Anastasia Boyko, a Yale-trained lawyer and legal innovation consultant whose career spans big law, legal tech, ALSPs, and legal education, including building the leadership and innovation program at Yale Law School.
We get into why most law firms cannot articulate their own purpose and how that makes every AI decision harder, what happens when firms treat AI adoption as a tool problem instead of an operating model problem, why the billable hour is breaking under the pressure of AI and what comes after it, and what the profession is losing by skipping the cognitive steps that make lawyers good at their jobs in the first place.
Each edition of Legal AI Brief brings practical lessons from firms using AI safely.