20 Million ChatGPT Conversations

A federal court just ordered OpenAI to produce 20 million anonymized ChatGPT conversation logs to news organizations suing for copyright infringement. District Judge Sidney H. Stein affirmed a magistrate judge’s order compelling OpenAI to produce the entire sample, not just cherry-picked conversations implicating plaintiffs’ works. This ruling raises immediate questions about data governance, user privacy, and the legal exposure embedded in every AI interaction.

The discovery dispute arises from a consolidated action combining 16 copyright lawsuits filed by The New York Times, the Chicago Tribune, and numerous authors who allege OpenAI used their copyrighted works to train ChatGPT without permission. OpenAI initially proposed 20 million logs (0.5% of its preserved data) and argued that was “surely more than enough.” The plaintiffs agreed. Then OpenAI changed course, proposing to run keyword searches and produce only conversations implicating plaintiffs’ specific works. The court rejected that approach.

Judge Stein found that even logs without reproductions of plaintiffs’ works are discoverable because they bear on OpenAI’s fair use defense. Fair use analysis examines how the challenged use affects the market for the original works. Logs showing what ChatGPT produces across a broad range of queries could reveal patterns relevant to whether its outputs compete with or substitute for copyrighted content.

OpenAI’s privacy argument failed because ChatGPT users “voluntarily submitted their communications” to the company, unlike the wiretap subjects in a securities case OpenAI cited. The practical lesson is that courts will weigh privacy interests against relevance, and they expect de-identification and protective orders rather than wholesale withholding of data.

Under this order, AI conversation logs are discoverable electronically stored information. Every prompt you type and every response you receive is potentially part of a paper trail. If your organization doesn’t have an AI data governance policy, this ruling is your wake-up call.


Disclosure: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it.
