AI Lawsuit Developments in 2024: A Year in Review

The proliferation of generative artificial intelligence (GAI) models over the past few years has given rise to well over thirty copyright infringement lawsuits by copyright owners against GAI developers. While no final decisions have been reached and the cases are in different stages, there were several developments in 2024—including consolidations, transfers, and orders on motions to dismiss—that hint at how some courts are leaning. This blog will focus on the most significant takeaways from 2024 and look ahead to what is sure to be a pivotal year for copyright and GAI in 2025.

Cases Filed by Visual Artists

One of the earliest cases brought against GAI developers by a group of creators is Andersen v. Stability AI, filed back in January of 2023. The case involves a group of visual artists who sued four different companies, alleging copyright infringement and right of publicity violations for the use of their works in training data sets for the AI image-generating platforms Stable Diffusion, the Midjourney Product, DreamStudio, and DreamUp. The case has been moving slowly, with the court granting motions to dismiss for all claims except the direct infringement claims and the plaintiffs filing amended complaints. A key development in 2024 was that the plaintiffs filed a notice of voluntary dismissal of claims related to removal of copyright management information (CMI) under the Digital Millennium Copyright Act (DMCA). The dismissal came after a court in the Northern District of California ruled in Doe v. GitHub that DMCA Section 1202(b) claims require “identicality” between original works and copies from which CMI is removed. However, the plaintiffs noted that in the event the Ninth Circuit does not require “identicality” in Section 1202(b) claims, they would request reconsideration of the DMCA claims. More on Doe v. GitHub later.

Another important development came in August when Judge William Orrick issued an order granting in part and denying in part motions to dismiss a first amended complaint, which clearly rejected the defendants’ argument that the AI companies only copied unprotectable “data” that exists as statistical representations in a model and therefore was non-infringing. The order also explains that AI technology is so different from past technologies that past copyright infringement cases involving those technologies may have little influence on these AI infringement cases. More on the significance of that order can be found here.

Cases Filed by News Organizations

The first case against a GAI company by a news publisher came in late 2023 when the New York Times (NYT) filed a lawsuit against Microsoft and OpenAI in the Southern District of New York, alleging direct, vicarious, and contributory copyright infringement, and removal of copyright management information under the DMCA over the copying and use of NYT’s works to train ChatGPT. In 2024, the case was consolidated with two other cases filed by news organizations, Daily News et al. v. Microsoft Corp. et al. and The Center for Investigative Reporting v. OpenAI. A significant development in the case came in November, when the court issued an opinion and order denying OpenAI’s motion to compel evidence related to the NYT’s business practices or the use of generative AI by its employees. The court explained that the production of such material was irrelevant to a fair use analysis and made clear that the AI company’s reliance on the Google v. Oracle case does not support an interpretation that the fourth fair use factor analysis requires consideration of the copyright owner’s other uses or licensing of their own works to nonparties.

Another case brought in early 2024, Raw Story Media v. OpenAI, involves two online news organizations that brought only one Section 1202(b) violation claim (and no direct infringement claims) against OpenAI for removing copyright management information (CMI) from works that were used to train ChatGPT. OpenAI moved to dismiss the sole claim, arguing, among other things, that the plaintiffs failed to adequately plead that OpenAI knowingly removed CMI with knowledge that it would result in infringement—something known as the “double scienter” requirement. In November, the court issued an order granting OpenAI’s motion to dismiss in its entirety, finding that the plaintiffs lacked Article III standing to pursue an injunction or damages retroactively under Section 1202(b).

The plaintiffs in Raw Story have since filed a memorandum of law in support of a motion to file an amended complaint, but the case is another example of the difficult road copyright owners face with Section 1202 DMCA claims. In a similar lawsuit, The Intercept Media, Inc. v. OpenAI, Inc., a news publisher brought CMI claims against OpenAI only to see the court dismiss the 1202(b)(3) claims—related to the distribution of works with CMI removed or altered—with prejudice. The court did allow the 1202(b)(1) claims to proceed past the motion to dismiss stage, but that claim is likely to face the same “double scienter” challenges.

A final news publisher case brought in 2024 worth mentioning is Dow Jones v. Perplexity, which alleges that plaintiffs’ copyrighted works are accessed and copied as part of Perplexity’s “retrieval-augmented generation” (RAG) database. Unlike companies that develop and train their own GAI models, Perplexity’s model functions as a search engine that incorporates previously developed and trained LLMs. The plaintiffs allege that Perplexity’s models then repackage original copyrighted works into verbatim or near verbatim summaries and responses to users’ prompts. The case will be one to watch in 2025 given the specific focus on RAG technology.

Cases Filed by Authors of Literary Works

The majority of infringement cases against GAI companies have been brought by authors of literary works, and 2024 saw new cases filed, existing cases consolidated, and significant orders on motions to dismiss. In Tremblay v. OpenAI, which was consolidated with Silverman v. OpenAI and Chabon v. OpenAI in late 2023, groups of book authors continued their fight against OpenAI in the Northern District of California for the alleged copying of their books from illegal online “shadow libraries” and using them to train ChatGPT. In 2024, the case moved slowly due to discovery disputes, but the claims were also narrowed down to one count of direct copyright infringement through an amended complaint, and OpenAI filed an answer denying the claim and, in the alternative, arguing fair use.

Another similar consolidated case involving book authors is Kadrey v. Meta, a class-action lawsuit brought against Meta in 2023 in the Northern District of California accusing the AI developer of copyright infringement related to the unauthorized use of plaintiffs’ books to train its proprietary large language model (LLM) LLaMA. After claims were narrowed down to direct copyright infringement and the case was consolidated with Chabon v. Meta in 2023, in early 2024 Meta filed its response, admitting that portions of the Books3 dataset were used to train the first and second versions of LLaMA, but arguing that fair use excuses the infringement. Then in September, Judge Chhabria held a status conference during which he said that he would not certify the Plaintiff class and chastised Plaintiffs’ counsel, accusing them of being “either unwilling or unable to litigate properly.” Soon after the conference, David Boies, founding member of the Boies, Schiller, Flexner law firm, joined as co-counsel for the plaintiffs. The case has also since been consolidated with another lawsuit brought by authors against Meta, Farnsworth v. Meta.

In Authors Guild v. OpenAI, a class action case brought in the Southern District of New York in 2023 (and consolidated with Sancton v. OpenAI and Basbanes v. Microsoft) alleging copyright infringement over the mass ingestion of literary works to train ChatGPT, the past year saw few developments due to ongoing discovery disputes. However, in February, OpenAI filed its answer to an amended complaint, arguing that any copying of plaintiffs’ works qualifies as transformative fair use. The transformative purpose fair use argument is one that will likely be made in most, if not all, of the GAI copyright infringement cases in response to input-side infringement claims. But it’s worth noting that in the wake of Warhol v. Goldsmith, even if a GAI company’s use of copyrighted materials to train its models were found to be transformative, that would not be dispositive of a fair use finding.

A couple of other cases brought by authors in 2024 to watch this year are O’Nan v. Databricks and Bartz v. Anthropic. O’Nan involves claims that Databricks’ MosaicML model was trained on curated datasets that include the Books3 dataset, which consists of copyrighted works scraped from illegal online “shadow libraries.” It was recently consolidated with Makkai et al v. Databricks, Inc. et al, and the defendants have filed an answer to the complaint, arguing transformative fair use and demanding a jury trial. A similar class action lawsuit brought by authors, Bartz v. Anthropic, alleges that Anthropic downloaded and copied hundreds of thousands of copyrighted books that were made available through pirate websites and incorporated into “The Pile” training dataset. In October, Anthropic filed an answer to the complaint, denying many of the plaintiffs’ claims and offering a number of affirmative defenses, including fair use, failure to state a claim, de minimis copying, lack of copyright standing, lack of volitional conduct on the part of Anthropic, and that the plaintiffs’ works or elements of the works at issue are not copyrightable.

The first lawsuit brought against a GAI company by owners of copyrights in musical works, Concord v. Anthropic, was brought by music publishers in 2023 and alleges direct, contributory, and vicarious copyright infringement as well as CMI removal claims related to the unlawful copying and distribution of the plaintiffs’ musical works, including lyrics, to develop Anthropic’s Claude chatbot. In June, the court granted Anthropic’s motion to transfer the case to the Northern District of California, which may be a less sympathetic venue for music copyright owners than Tennessee. In November, a hearing was held on the plaintiffs’ motion for preliminary injunction, and then, just before the end of the year, the parties reached an agreement whereby Anthropic must maintain guardrails to prevent future AI tools from producing infringing material from copyrighted content. While the agreement is a step in the right direction, it only partially resolves the music publishers’ motion for preliminary injunction—the question of whether Anthropic will be prohibited by the court from training future AI models with plaintiffs’ lyrics is yet to be decided.

Two similar lawsuits, UMG v. Suno and UMG v. Udio, were brought in 2024 by record labels against AI music generators in the District Court for the District of Massachusetts alleging that the companies are liable for direct copyright infringement related to the unauthorized use of both pre- and post-1972 recordings to train their models. Both complaints offer evidence of potentially infringing outputs that mimic identifiable features of the plaintiffs’ works. In August, both defendants filed answers to the complaints that do not deny copying plaintiffs’ works to train the models but argue that the copying qualifies as “intermediate” copying and thus fair use.

Other Cases

As mentioned earlier, Doe v. GitHub is a case originally filed in 2022 that saw important developments related to CMI removal claims in 2024. The case was brought by a group of GitHub programmers against Microsoft and OpenAI for allegedly violating their open source licenses and scraping their code to train Microsoft’s artificial intelligence tool, GitHub Copilot. As far as copyright claims, the case only involved allegations of violations of the DMCA for the removal of CMI, and those claims were dismissed with prejudice in 2024 for failing to meet the “identicality” requirement that (while not required by the statute) has been implemented by some circuit courts. In September, the court issued an order granting Plaintiffs’ motion for leave to appeal the decision dismissing the 1202(b) claims, and in December, the Ninth Circuit granted the interlocutory appeal. The appeal will be watched closely by GAI litigation stakeholders, as the outcome could have a far-reaching impact on CMI claims in other cases.

Another case to watch in 2025 is Thomson Reuters v. Ross, a copyright infringement case brought way back in 2020 alleging that Ross, a competitor legal research service, obtained copyrighted content from a Westlaw subscriber to develop its own competing product based on machine learning. After years of delays, the case was set to go to trial in August but was postponed at the last minute, and the parties were told to bring new summary judgment motions on copyrightability, validity, infringement, and fair use. The parties filed renewed motions in November, and a hearing was held on December 5, during which Ross argued that it didn’t directly input Westlaw text into its AI model. Barring any other unexpected postponements, the case will likely go to trial in 2025 and may give us the first glimpse at how a jury will assess important GAI-related copyright claims.

A final case to watch in 2025 is Vacker v. ElevenLabs, which was brought last year by a group of voice actors in the U.S. District Court for the District of Delaware against AI company ElevenLabs for allegedly using their voice recordings in audiobooks to train its AI model. While the complaint does not include any claims of direct copyright infringement, it includes counts of DMCA violations under 1201(a) for the circumvention of digital rights management (DRM) protections and 1202(b) for the removal or alteration of CMI. The fate of those claims could hang in the balance of the aforementioned appeal in Doe v. GitHub.

Conclusion

Over the last year we saw a number of new cases filed against GAI companies, with copyright owner plaintiffs seeming to learn lessons from earlier cases and focusing their claims on direct infringement at the training stage and violations of the DMCA’s copyright management information provisions. We got a glimpse of how defendants will present fair use arguments in the form of transformative use and intermediate copying, and we’ve seen largely successful challenges to the DMCA claims. With the Doe v. GitHub interlocutory appeal on 1202(b) teed up for early 2025, there is likely to be a decision that will either allow those claims to move forward or see them cut from existing and future complaints. Of course, the elephant in the room remains the direct copyright infringement claims for the unauthorized use of copyrighted works for training, and we could get the first major decisions on those claims this year.


If you aren’t already a member of the Copyright Alliance, you can join today by completing our Individual Creator Members membership form! Members gain access to monthly newsletters, educational webinars, and so much more — all for free!
