Is it a crime to learn something by reading a copyrighted book? What if I later summarized that book to a friend or wrote a description of it online? Of course, these things are perfectly legal when a person does them. But does this change when an AI system does the reading, learning and summarizing?
Sarah Silverman, comedian and author of the book The Bedwetter, seems to think so. She and several other authors are suing OpenAI, the tech company behind the popular AI chatbot ChatGPT, through which users send text prompts and receive AI-generated responses.
Last week, a federal judge largely rejected their claims.
The ruling is certainly good news for OpenAI and ChatGPT users. It’s also good news for the future of AI technology in general. AI tools could be completely hamstrung by the expansive view of copyright law that Silverman and the other authors here envision.
The authors’ complaints and OpenAI’s response
Teaching artificial intelligence to communicate and “think” like a human takes a lot of text. To this end, OpenAI used a massive dataset of books to train the language models that power its artificial intelligence. (“It is the volume of text used, more than any particular selection of text, that really matters,” OpenAI explained in its motion to dismiss.)
Silverman and others say this violates federal copyright law.
Authors Paul Tremblay and Mona Awad filed a class-action complaint to this effect against OpenAI last June. Silverman and authors Christopher Golden and Richard Kadrey filed a class-action complaint against OpenAI in July. The trio also filed a similar lawsuit against Meta. In all three cases, the lead attorney was antitrust lawyer Joseph Saveri.
“As with too many class action lawyers, the goal is generally to enrich class action lawyers, rather than actually stop any real wrongs,” suggested Techdirt Editor-in-Chief Mike Masnick when the suits were first filed. “Saveri is not a copyright expert, and the lawsuits… prove it. There are a lot of assumptions about how Saveri seems to think copyright law works, which is completely inconsistent with how it actually works.”
In both complaints against OpenAI, Saveri claims that copyrighted works, including books by the authors in this lawsuit, “were copied by OpenAI without consent, without credit, and without compensation.”
This is a really strange way to characterize how AI training datasets work. Yes, AI tools “read” the works in question to learn, but they don’t need to copy the works in question. It’s also a strange understanding of copyright infringement, similar to claiming that someone reading a book to learn about a topic for a presentation is infringing the work, or that search engines are infringing when they crawl web pages to index them.
The authors in these cases also object to ChatGPT spitting out summaries of their books, among other things. “When ChatGPT was asked to summarize the books written by each of the plaintiffs, it generated very accurate summaries,” says the Silverman et al. complaint.
Again, putting it in any other context shows how silly it is. Do book reviewers violate the copyrights of the books they review? Is someone reading a book and tweeting about the plot violating copyright law?
It would be different if ChatGPT reproduced copies of books in their entirety or excerpted large, verbatim passages. But that is not the activity the authors cite in their complaints.
The copyright claims in this case “misunderstand the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that adequately leave room for innovations such as the large language models now at the forefront of ‘artificial intelligence,’” OpenAI said in its motion to dismiss some of the claims.
The company suggested that the doctrine of fair use – designed in recognition of the fact “that the use of copyrighted materials by innovators in transformative ways does not infringe copyright” – applies in this case and in the case of “countless artificial intelligence products [that] were developed by a wide range of technology companies.”
The Court intervenes
A win for the authors here could seriously hinder the creation of AI language models. Fortunately, the court is not persuaded by many of their arguments. In a February 12 ruling, Judge Araceli Martínez-Olguín of the U.S. District Court for the Northern District of California dismissed most of the authors’ claims against OpenAI.
This included claims that OpenAI engaged in “vicarious copyright infringement,” violated the Digital Millennium Copyright Act (DMCA), and was guilty of negligence and unjust enrichment. The judge also partially dismissed a claim of unfair competition under California law, while allowing the authors to partially proceed with that claim (largely because California’s understanding of “unfair competition” in this case is so broad).
Silverman and the other authors in these cases “did not claim that ChatGPT results contain direct copies of the copyrighted books,” Martínez-Olguín noted. And they “fail to explain what the results entail or claim that any particular result is substantially similar – or similar at all – to their books.”
The judge also rejected the idea that OpenAI had removed or altered copyright management information (as prohibited by Section 1202(b) of the DMCA). “Plaintiffs provide no facts to support this claim,” Martínez-Olguín wrote. “In fact, the complaints include excerpts of ChatGPT output that include multiple references to [the authors’] names.”
And if OpenAI has not violated the DMCA, other claims based on such alleged infringement, such as those that OpenAI distributed works with copyright management information removed or engaged in illegal or fraudulent business practices, also fail.
More AI and copyright battles to come
This is not the end of the dispute between the authors and OpenAI. The judge has not yet ruled on their claim of direct copyright infringement because OpenAI has not yet sought to dismiss it. (The company said it will try to resolve that claim later in the case.)
The judge will also allow the authors to file an amended complaint if they wish.
Given the weakness of their legal arguments and the judge’s dismissal of some claims, “it is difficult to see how the cases will survive,” Masnick writes. (See his post for a more detailed look at the claims involved here and why the judge threw them out.)
Unfortunately, we’re almost certain to continue to see people sue AI companies (language models, image generators, etc.) on dubious grounds, because America is in the midst of a growing tech panic. And every time a new tech panic takes hold, we see people trying to make money and/or make a name for themselves by launching a series of flimsy accusations in the form of lawsuits. We’ve seen it with social media companies and Section 230, with social media and alleged harm to adolescent mental health, and with all sorts of popular tech companies and antitrust laws.
Now that AI is the darling of technological exuberance and hysteria, many people – from Federal Trade Commission bureaucrats to enterprising lawyers to all manner of traditional media creators and purveyors – are looking to extract money for themselves from these technologies.
“I understand why media companies don’t like people training on their documents, but I believe that just as humans are allowed to read documents on the Internet, learn from them and synthesize completely new ideas, artificial intelligence should also be allowed to do it,” commented Andrew Ng, co-founder of Coursera and adjunct professor at Stanford. “I would like to see training on the public Internet covered by fair use – society would be better off that way – although it will ultimately be up to legislators and courts whether it really is.”
Unlike many people who write about technology, I don’t predict major disruptions, good or bad, coming from AI any time soon. But there are many smaller benefits and efficiencies that AI can bring us, if we can stop people from hindering its development with a maximalist reading of copyright law.