On January 3, we published part one of this blog series summarizing the biggest copyright-related AI activities that took place within the federal government. In today’s post, we pick back up with the AI-copyright theme, focusing on the multitude of lawsuits filed last year against AI developers by a range of creators and copyright owners, mostly over the unauthorized use of copyrighted works for AI training purposes. Here are some highlights from those court cases and what to expect in 2024.
AI and Copyrightability
There was one court case, Thaler v. Perlmutter, which took on the important issue of whether, and if so when, something created using AI can be protected by copyright. Dr. Stephen Thaler sought to register a 2-D image generated by an AI machine called the “Creativity Machine” as a work made for hire because he was the owner of the AI system. The Copyright Office denied the registration application and, in early 2022, affirmed the denial on the basis that Thaler failed to show the requisite human authorship in the work and that the work could not qualify as a work made for hire. Thaler sued, and in the summer of 2023, Judge Beryl Howell of the U.S. District Court for the District of Columbia issued an opinion agreeing with the Office “that human authorship is an essential part of a valid copyright claim” and is “a bedrock requirement of copyright.” In October, Thaler filed a notice of appeal with the U.S. Court of Appeals for the District of Columbia Circuit. The case will therefore continue, and we should expect a decision by the appellate court sometime in 2024. We might also see court challenges arise from other instances where the Copyright Office refused registration for works in which AI-generated elements and human authorship were intertwined.
AI and Copyright Infringement
Last year alone, thirteen new copyright-related lawsuits were filed against AI companies, the majority of them as class actions. At the heart of these complaints, visual artists, book authors, songwriters, and other creators and copyright owners allege infringement of their copyrights resulting from the ingestion of protected works to train AI models. The sheer number of these lawsuits and the pace at which they were filed are not surprising. This is in part because the capabilities of AI technologies have exploded, and AI companies have failed to meaningfully address or remedy the harms to creators and copyright owners related to the mass scraping and unauthorized use of expressive works to train commercial AI models. In the past year, a few of these cases moved forward with court decisions on various motions. In those cases, it has become evident that while courts may not be as convinced by some of the other legal claims being brought, AI companies’ attempts to dismiss the direct copyright infringement claims arising from ingestion have either failed or not even been made by the defendant AI company. Here are the AI and copyright law cases in which there were notable developments.
Doe v. GitHub
The GitHub case is one of two cases mentioned in this blog post that were filed prior to 2023. In late 2022, a group of GitHub programmers filed a class-action lawsuit against GitHub, Microsoft, and OpenAI for allegedly violating their open source licenses and scraping their code to train Microsoft’s artificial intelligence (AI) tool, GitHub Copilot. On May 11, the district court for the Northern District of California issued an order granting in part and denying in part the motions to dismiss made by Microsoft and OpenAI. Many claims were dismissed with leave to amend, and the order says that the plaintiffs must identify specific instances of their code reproduced by Copilot or Codex to strengthen their property rights claim. Plaintiffs filed a first amended complaint on July 21, followed by renewed motions to dismiss by OpenAI and Microsoft.
Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.
The Ross case is the second of the two cases mentioned in this blog post that were filed prior to 2023. We include it here because of the action taken in the case in 2023. In 2020, Thomson Reuters sued Ross Intelligence, a competing legal research service, for copyright infringement, alleging that Ross obtained copyrighted works from a Westlaw subscriber to develop its own competing product based on machine learning. The claims allege that an AI bot systematically mined, collected, and downloaded content from the Westlaw database. On September 25, the district court for the District of Delaware issued a memorandum opinion largely denying the parties’ cross-motions for summary judgment. The court explained that there is still a genuine factual dispute over the copyrightability of Westlaw’s headnotes, and that although Ross actually copied portions of bulk memos, the question of substantial similarity must be decided by a jury.
Andersen v. Stability AI
On January 13, artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed the first class-action lawsuit of the year against Stability AI, Midjourney, and DeviantArt in the Northern District of California, alleging copyright infringement and right of publicity violations for the use of the plaintiffs’ works in training data sets for the AI image-generating platforms Stable Diffusion, the Midjourney Product, DreamStudio, and DreamUp. In October, the court largely granted the motions to dismiss made by the defendants, but also granted plaintiffs leave to amend the claims. Though the headlines tended to frame the decision as a loss for the creative community, that was not an accurate summary of the dismissal because the most important claims in the case were not dismissed (and, to a lesser extent, because the plaintiffs were given leave to amend). Importantly, the court denied Stability AI’s motion to dismiss the direct copyright infringement claims with respect to the images scraped and ingested into the LAION training datasets used to train Stable Diffusion, and also held that plaintiff Andersen’s assertions that her works had likely been used in the LAION datasets, per results from the “Have I Been Trained” website, adequately supported her infringement claims at this stage of the lawsuit.
Getty Images v. Stability AI
On February 3, Getty Images filed a copyright and trademark infringement suit against Stability AI in the U.S. District Court for the District of Delaware alleging that Stability AI “copied more than 12 million photographs from Getty Images’ collection, along with the associated captions and metadata, without permission from or compensation to Getty Images, as part of its efforts to build a competing business.” In addition to willful and intentional copyright infringement claims, Getty also alleged that Stability AI removed or altered copyright management information (CMI), provided false copyright management information, and infringed Getty Images’ trademarks. The case is still in the discovery phase. A parallel lawsuit filed in the United Kingdom will go to trial in 2024.
Tremblay v. OpenAI
On June 28, two authors of literary works filed a class-action lawsuit in the U.S. District Court for the Northern District of California accusing OpenAI of copyright infringement related to the unauthorized use of plaintiffs’ works to train its proprietary LLM, ChatGPT. The complaint alleges that OpenAI harvested mass quantities of literary works through illegal online “shadow libraries” and made copies of plaintiffs’ works during the training process. In addition to claims for direct infringement, the complaint alleges that every output of ChatGPT is an infringing derivative of plaintiffs’ works for which OpenAI is vicariously liable. On August 28, OpenAI filed a motion to dismiss the “ancillary claims” of vicarious infringement, violation of the Digital Millennium Copyright Act (DMCA), unfair competition, negligence, and unjust enrichment, but importantly, like Meta in its ongoing lawsuit, did not respond to the direct infringement claim, which OpenAI says it “will seek to resolve as a matter of law at a later stage of the case.”
Kadrey v. Meta and Silverman v. OpenAI
On July 7, Sarah Silverman, Christopher Golden, and Richard Kadrey brought two separate class-action lawsuits in the district court for the Northern District of California against Meta and OpenAI. In the first lawsuit against OpenAI, the plaintiffs accused OpenAI of copyright infringement related to the unauthorized use of plaintiffs’ books to train its proprietary LLM, ChatGPT. The complaint alleges that OpenAI harvested mass quantities of literary works through illegal online “shadow libraries” and made copies of plaintiffs’ works during the training process. In addition to claims for direct infringement, the complaint alleges that every output of ChatGPT is an infringing derivative of plaintiffs’ works for which OpenAI is vicariously liable. Also included in the lawsuit were claims under the DMCA for the removal of CMI under section 1202(b), as well as claims for unfair competition, negligence, and unjust enrichment. In the second lawsuit, the plaintiffs accused Meta of copyright infringement related to the unauthorized use of plaintiffs’ books to train its proprietary LLM, LLaMA, and made similar allegations and claims as in the lawsuit against OpenAI.
In November, the court granted Meta’s motion to dismiss (with leave to amend), rejecting plaintiffs’ claims that the LLaMA model itself is an infringing derivative work and that every output of the model qualifies as an infringing derivative of the input. It explained that “plaintiffs are wrong to say that, because their books were duplicated in full as part of the LLaMA training process, they do not need to allege any similarity between LLaMA outputs and their books to maintain a claim based on derivative infringement.” Rejecting the 1202(b) violation claims, the court found that “there are no facts to support the allegation that LLaMA ever distributed the plaintiffs’ books, much less did so ‘without their CMI.’” The order also dismissed the unjust enrichment and negligence claims. Meta’s motion to dismiss did not challenge the direct copyright infringement claims arising from unauthorized copying of the books for training the LLaMA model, which means those claims survive and will be taken up by the court.
J.L. v. Alphabet Inc.
On July 11, a group of anonymous plaintiffs filed a class-action lawsuit against Google for the use of personal information and various copyrighted works to train its AI models. Among other claims, the plaintiffs allege direct and vicarious copyright infringement and DMCA violations for removal of CMI. The complaint alleges that Google’s LLM, Bard, is able to generate summaries of copyrighted books or output that reproduces verbatim excerpts from copyrighted books. In addition to damages, the plaintiffs requested an injunction compelling the establishment of an independent AI council to monitor and oversee Google AI products and the destruction and purging of class members’ Personal Information, which includes copyrighted works and creative content. In October, Google filed a motion to dismiss, which the court has yet to rule on.
Chabon v. OpenAI & Chabon v. Meta
On September 8, a group of authors, including Michael Chabon, filed a class-action lawsuit in the district court for the Northern District of California alleging direct and vicarious copyright infringement and removal of CMI, as well as state-law claims including unfair competition and negligence, for copying and using the authors’ books in training ChatGPT. The complaint alleges that when prompted, ChatGPT provides extremely detailed summaries, examples, and descriptions of the authors’ works, and that the authors’ writing styles can be accurately imitated. On September 12, the same group of plaintiffs filed a similar lawsuit against Meta. No further significant actions were taken on the OpenAI case, but the case against Meta was consolidated in December with the previously mentioned lawsuit Kadrey v. Meta.
Authors Guild v. OpenAI Inc.
On September 19, the Authors Guild and a group of authors including David Baldacci, Mary Bly, John Grisham, George R.R. Martin, Jodi Picoult, and Roxana Robinson filed a class-action lawsuit against OpenAI in the district court for the Southern District of New York alleging copyright infringement claims over the mass ingestion of literary works to train ChatGPT and over infringing outputs generated by the AI machine. The complaint cites examples of ChatGPT being prompted to generate detailed outlines of possible sequels to the plaintiffs’ works and accurate and detailed summaries of such works, including specific chapters of books. No further significant actions were taken on the case in 2023.
Huckabee v. Meta
On October 17, a group of authors including former Arkansas governor Mike Huckabee and best-selling Christian author Lysa TerKeurst filed a class-action lawsuit in the district court for the Southern District of New York against Meta, Microsoft, EleutherAI, and Bloomberg for direct and vicarious copyright infringement, removal of CMI, and various state-law claims. The plaintiffs allege that the defendants infringed by using plaintiffs’ books to develop defendants’ LLMs using the “Books3” training dataset. The lawsuit also asserts that the AI research company EleutherAI is liable for copyright infringement for hosting and distributing “The Pile” dataset, which includes Books3. According to court dockets, towards the end of the year, defendant Bloomberg filed a letter with the judge seeking to dismiss the claims, the plaintiffs voluntarily dismissed EleutherAI from the lawsuit, and the case was transferred to the district court for the Northern District of California for the claims made against Meta and Microsoft.
Concord Music Group, Inc. v. Anthropic PBC
On October 18, music publishers Universal Music Publishing Group, Concord Music Group, and ABKCO filed a lawsuit in the district court for the Middle District of Tennessee against the AI company Anthropic, alleging direct, contributory, and vicarious copyright infringement as well as CMI removal claims. The plaintiffs allege that Anthropic unlawfully copied and distributed plaintiffs’ musical works, including lyrics, to develop Anthropic’s generative AI chatbot, Claude. The plaintiffs state that when prompted, Claude generates output that copies the publishers’ lyrics. The plaintiffs’ complaint claims that 500 works have been infringed and requests statutory damages of $75 million for copyright infringement. On November 22, Anthropic filed a motion to dismiss the lawsuit, arguing that the Middle District of Tennessee was not the proper district to hear the case. No decision has been rendered on the motion, but it’s not surprising that the technology company doesn’t want to go to trial in Tennessee, where a more sympathetic audience for the rights of songwriters and copyright owners might be found.
Sancton v. OpenAI
On November 21, a complaint was filed by a group of nonfiction writers against OpenAI and Microsoft in the Southern District of New York. The proposed class action lawsuit, led by Julian Sancton, accuses the companies of direct and contributory infringement related to the unauthorized use of plaintiffs’ literary works to train ChatGPT. Notably, the contributory infringement claims are directed at Microsoft for materially contributing to OpenAI’s direct infringement by providing investment money and supercomputing systems. No further significant actions were taken on the case in 2023, but it will be interesting to see how accused contributory infringers respond to the claims moving forward.
The New York Times Company v. Microsoft
To close out an already busy year of generative AI-related litigation, The New York Times Company (NYT) filed a lawsuit in late December against Microsoft and OpenAI in the Southern District of New York, alleging direct, vicarious, and contributory copyright infringement, removal of CMI under the DMCA, unfair competition, and trademark dilution claims over the copying and use of NYT’s copyright-protected works to train the ChatGPT model. After discussing the prevalence of NYT’s articles in training data sets used to develop ChatGPT, the complaint provides evidence of ChatGPT generating verbatim outputs of significant portions of various NYT articles. Unlike some earlier-filed cases that included questionable claims that were subsequently rejected, the NYT case presents strong evidence of copying and clear claims that will be difficult for OpenAI to defend.
These cases will continue to unfold and progress in 2024. We also expect more lawsuits to be filed against AI companies in 2024, as we have seen complaints make stronger claims with clear evidence of the infringement occurring during the AI ingestion and training processes. To stay apprised of AI and copyright news, visit our AI and copyright webpage and sign up for our AI Alert.