Mid-Year Review: AI Lawsuit Developments in 2024
The development and launch of new generative artificial intelligence (AI) technologies over the last two years have been revolutionary. But these advancements have been accompanied by significant concerns from copyright owner and creator communities about what materials are being ingested for training and how to hold AI companies accountable for the mass unauthorized use of their copyrighted works.
While a few lawsuits against generative AI companies have been ongoing for years, 2023 saw a steady stream of complaints filed by groups of artists and large copyright owner organizations against some of the biggest names in generative AI. The trend has continued in recent months, and there are now roughly 25 different lawsuits winding their way through various courts. Now that we’re more than halfway through the year, let’s look back at some of the biggest AI-related litigation developments in 2024.
AI and Copyrightability
While nearly all of the lawsuits involving generative AI are infringement cases, there’s one significant authorship case, Thaler v. Perlmutter, that was appealed in 2024. Back in early 2022, the U.S. Copyright Office Review Board affirmed a denial of registration for a two-dimensional artwork titled “A Recent Entrance to Paradise,” which the applicant, Steven Thaler, claimed was “authored” by an algorithm called the “Creativity Machine.”
Thaler then filed a complaint against the Copyright Office, alleging that the Office’s denial of his registration application for the AI-authored work was an arbitrary and capricious agency action. In August 2023, the U.S. District Court for the District of Columbia granted the Office’s motion for summary judgment. The order explained that the “defendants are correct that human authorship is an essential part of a valid copyright claim” and “a bedrock requirement of copyright.” In October 2023, Thaler filed a notice of appeal to the U.S. Court of Appeals for the District of Columbia Circuit, and oral arguments are scheduled for September 2024.
The outcome of the appeal will be closely watched by artists utilizing AI technologies, the Copyright Office, and other stakeholders. But it’s hard to see how the D.C. Circuit would overturn the lower court’s decision, which was based on decades of case law, longstanding Copyright Office policy, and the U.S. Constitution. Moreover, the Copyright Office has been clear that it has granted, and will continue to grant, registrations for works created in part using generative AI—it just won’t grant them for works created wholly using generative AI, where no human authorship is claimed, or where AI-generated elements are not disclaimed.
Infringement Class Actions Face Hurdles
One of the first major infringement cases brought against generative AI companies was Andersen v. Stability AI in 2023, which involves allegations by a group of visual artists that Stability AI, Midjourney, and DeviantArt used the artists’ copyrighted works without permission to train their AI models. In October 2023, a district court in the Northern District of California largely granted motions to dismiss by the defendant AI companies but, importantly, allowed the direct copyright infringement claims to move forward and granted plaintiffs leave to amend. In May 2024, the court issued a proceedings and tentative rulings order, indicating that it will allow plaintiffs to file a second amended complaint and will deny all motions to dismiss the direct and induced copyright infringement claims.
The Andersen case has been a bellwether for other class action cases against AI companies that have followed, in that many of the lawsuits have also been stripped of broad claims related to allegedly infringing output and removal of copyright management information (CMI) under the Digital Millennium Copyright Act (DMCA). However, direct infringement claims related to the unauthorized use and copying of the plaintiffs’ works for training purposes have survived and will be the decisive issue in many (if not all) of the infringement cases.
Cases like L. v. Alphabet, which was brought in July 2023 by a group of authors of literary works against Google, were similarly trimmed down through motions to dismiss to only include claims of direct infringement related to input-side training. There have been similar class action lawsuits brought by groups of authors in 2024, including Nazemian v. Nvidia and O’Nan v. Databricks, which allege that the AI companies trained their models on curated datasets that included the Books3 dataset, which consists of copyrighted works scraped from illegal online “shadow libraries.” The cases both include counts of direct and vicarious infringement, as well as violations of the plaintiffs’ rights to “make derivative works, publicly display copies (or derivative works), or distribute copies (or derivative works).” But given what we’ve seen in earlier filed cases, these plaintiffs will likely face an uphill battle with the general claims related to derivative outputs. Additionally, the defendants have answered the complaints with a number of defensive arguments, including de minimis copying and transformative fair use.
Other recent class actions, Zhang v. Google, Dubus et al v. NVIDIA Corporation, and Makkai et al v. Databricks, Inc. et al, were all filed in April or May of 2024 and involve groups of visual artists and authors bringing claims against AI companies in the district court for the Northern District of California over the unauthorized use of plaintiffs’ works to train different generative AI models. Unlike in the initial complaints in prior cases, the plaintiffs in these cases only allege that the AI companies are liable for direct (and in one case, vicarious) copyright infringement for the copying of plaintiffs’ works. It’s likely that the plaintiffs—and the law firms representing them, some of which are involved in many of the other class actions—have read the writing on the wall and are now trimming out claims that they’ve seen dismissed in other cases.
Consolidations
Early 2024 saw the consolidations of many lawsuits involving similar claims against the same AI companies. In February, Chabon, et al. v. OpenAI was consolidated with Tremblay v. OpenAI (which was itself consolidated with Silverman v. OpenAI in late 2023). All three cases are class action lawsuits brought by authors of literary works against OpenAI in the Northern District of California, and they all accuse OpenAI of copyright infringement related to the unauthorized use of plaintiffs’ works to train its proprietary large language model (LLM), ChatGPT. Similar to many of the class action lawsuits described above, the complaints allege that OpenAI harvested mass quantities of literary works through illegal online “shadow libraries” and made copies of plaintiffs’ works during the training process. Following consolidation, in March the plaintiffs filed a first consolidated amended complaint that now only includes one claim of direct copyright infringement related to training ChatGPT on plaintiffs’ works.
Also in early 2024, Meta filed an answer to an amended complaint in two class action lawsuits, consolidated as Kadrey v. Meta, brought against it by authors of literary works. The answer admits that portions of the Books3 dataset were used to train the first and second versions of Llama but argues that fair use excuses the infringement of any copyrighted works. On July 1, an order was issued consolidating the case once again with Huckabee v. Meta, another class action brought by authors against Meta in 2023, and voluntarily dismissing plaintiffs Michael Chabon and Ayelet Waldman.
One other group of cases, also involving class actions brought by authors of literary works, was consolidated in early 2024 under Authors Guild v. OpenAI Inc. The three cases—Basbanes v. Microsoft, Sancton v. OpenAI, and Authors Guild v. OpenAI—were filed between September 2023 and January 2024 in the Southern District of New York, and they all include claims against OpenAI and Microsoft over the mass ingestion of literary works to train ChatGPT. The Authors Guild complaint specifically cites examples of ChatGPT being prompted to generate detailed outlines of possible sequels to the plaintiffs’ works, as well as accurate and detailed summaries of those works, including specific chapters of books.
OpenAI filed an answer to the consolidated amended complaint in February, arguing, among other things, that its use of plaintiffs’ works qualifies as transformative fair use. The case is now in the discovery phase, with summary judgment briefing scheduled for early 2025. This case, along with others where defendants claim transformative fair use, will provide the first insight into how courts assess transformative fair use as applied to generative AI training. It’s worth noting that post-Warhol, even a finding that a use is transformative shouldn’t control the fair use analysis and may not be enough to get AI companies off the hook for infringement.
Corporate Copyright Owners Step Up to the Plate
Getty Images was one of the first to file lawsuits against a generative AI company—Stability AI in both the US and UK in early 2023—but the cases have been slow to advance due to jurisdictional challenges and discovery disputes. One notable recent development is that on July 8, 2024, Getty filed a second amended complaint in the US case, which includes additional allegations relating to personal jurisdiction. The amended complaint also drops a claim of removal or alteration of copyright management information (CMI) in violation of 1202(b), but Getty continues to allege that Stability provided false CMI in violation of 1202(a).
In late 2023 and the first half of 2024, a number of lawsuits were filed against AI companies by other large copyright owner organizations, including music publishers, record labels, online media companies, and traditional newspapers. These lawsuits, which are working their way through various federal courts, typically have targeted claims, strengthened by supporting evidence of copying.
One of these lawsuits filed in late 2023, Concord v. Anthropic, involves a group of music publishers that sued the AI company Anthropic in the Middle District of Tennessee for direct, contributory, and vicarious copyright infringement, as well as CMI removal claims. The complaint alleges that Anthropic unlawfully copied and distributed plaintiffs’ musical works, including lyrics, to develop its AI chatbot, Claude. To back up the claims, the plaintiffs show evidence that, when prompted, Claude generates output that copies the publishers’ lyrics in a near verbatim manner. In June, a memorandum opinion was issued granting Anthropic’s motion to transfer to the Northern District of California after finding that the court does not have personal jurisdiction over Anthropic. It’s not the outcome the plaintiffs wanted, as they likely sought to secure a jurisdiction (one that includes Nashville) that would be more sympathetic to the music publishing industry.
In December 2023, the New York Times (NYT) filed a lawsuit against Microsoft and OpenAI in the Southern District of New York, alleging direct, vicarious, and contributory copyright infringement, as well as removal of CMI, related to the copying and use of NYT’s works to train the ChatGPT model. The complaint details the prevalence of the publisher’s articles in training datasets used to develop ChatGPT, in addition to evidence of ChatGPT generating verbatim outputs of significant portions of various NYT articles. The complaint also notes that NYT attempted to negotiate an agreement with the defendants for the use of NYT’s works in new digital products, but that the defendants refused to reach an agreement, arguing that their conduct was excused under the fair use doctrine. After motions to dismiss and responses were filed by the parties in early 2024, the court issued a case management order in May setting deadlines for motions for summary judgment and replies for early 2025.
A pair of lawsuits were filed by online media companies on the same day in February 2024 against OpenAI in the Southern District of New York. In both Raw Story Media v. OpenAI and The Intercept Media, Inc. v. OpenAI Inc., the complaints do not include counts of direct copyright infringement, instead alleging only violations of section 1202(b) for the removal of CMI from works that were used without authorization to train ChatGPT. In April, OpenAI responded to both complaints, arguing that the plaintiffs failed to state a claim under 1202(b) because they lack standing, failed to specify the works at issue, and did not adequately plead scienter. It’s unclear why the plaintiffs limited their claims to removal of CMI, but they’ll likely need to provide evidence of specific works that were stripped of CMI if their lawsuits are to proceed.
Most recently, two lawsuits were filed by Universal Music Group and other record labels against the AI music generator companies Suno and Udio. The complaint against Suno was filed in the District Court for the District of Massachusetts and alleges that the AI company is liable for direct copyright infringement related to the unauthorized use of both pre- and post-1972 recordings to train its generative AI music model. The same allegations are made in the complaint against Udio, which was filed in the Southern District of New York. Both complaints offer evidence of potentially infringing outputs that mimic identifiable features of plaintiffs’ works. However, neither complaint claims that these outputs are infringing derivative works…yet. It seems that rather than focus on whether the output is infringing, the record labels would rather focus on infringement on the input side and use the examples of potentially infringing outputs as evidence that their works were in fact copied.
Conclusion
If there’s one thing we’ve learned from recent developments so far in 2024 in infringement cases against generative AI companies, it’s that plaintiffs are getting wiser about which claims to bring and are now focusing almost entirely on claims of direct infringement for input-side copying. Motions to dismiss have either left those claims unchallenged or, where they did challenge them, courts have rejected the motions.
We’re also seeing very strong cases brought by corporate copyright owners (e.g., New York Times, Universal Music) that present compelling evidence of unauthorized copying. At the same time, we’re beginning to see the emergence (in court filings) of a position that AI companies have promoted over the last few years: that any copying done for training purposes qualifies as transformative fair use. It’s a bold gamble, especially after the Supreme Court reined in transformative fair use in Warhol v. Goldsmith, and it’s one all generative AI stakeholders will be watching closely.
As we move into the second half of 2024, it doesn’t look like there will be any slowing down of infringement cases filed against AI companies. There will soon likely be around 30 different cases, and while they can be hard to keep track of, be sure to check in with the Copyright Alliance’s AI case tracker webpage for updates.
If you aren’t already a member of the Copyright Alliance, you can join today by completing our Individual Creator Members membership form! Members gain access to monthly newsletters, educational webinars, and so much more — all for free!