Insights from Court Orders in AI Copyright Infringement Cases

There are now well over thirty lawsuits that have been filed by copyright owners in U.S. federal court against AI companies, accusing them of direct copyright infringement for using copyrighted works without authorization to develop their AI models.[1] Many of the cases also include claims for unauthorized removal or alteration of copyright management information (known as Section 1202(b) claims) and other claims (e.g., claims alleging that the AI models produce infringing derivative works), but in many instances those claims were dismissed. Despite these dismissals, it is important to understand that none of the direct infringement claims for unauthorized use of the works have been dismissed, and those claims are at the very heart of virtually all of these cases.

Under U.S. copyright law, copying entire copyrighted works and storing them for more than a transitory period—which is what AI developers do—is unquestionably copyright infringement. However, the AI developer's infringing conduct is excused if the copyright owner has licensed the AI developer to make the training copies or if an applicable exception in the Copyright Act applies. The only exception in the Copyright Act that might plausibly apply to AI ingestion is the fair use exception, found in Section 107 of the Copyright Act. Thus, it should surprise no one that the primary issue in virtually all of the pending AI-copyright cases being litigated in federal courts today is whether the ingestion of an unlicensed copyrighted work for training of a generative artificial intelligence (GAI) model qualifies as fair use.

Whether ingestion for training qualifies as a fair use is the big issue, and it is the one we are all holding our collective breath over as we wait to see what the courts decide. To date, no court has ruled on fair use in any of these cases.

Of course, it’s common copyright law knowledge that whether a use qualifies as a fair use is a fact-specific inquiry that is decided on a case-by-case basis. This is especially true for these AI-copyright infringement cases. That’s because the AI models—whether the model is an LLM, image generator, music generator, or other model—operate differently, and the copyrighted works at issue in each of the cases—books, visual art, music, and other works—are different. (If you don’t believe me, just read Judge Orrick’s comments in the Andersen v. Stability AI case, which are detailed in the next section.) Consequently, no one case should carry the day. So, even when the first fair use decision is handed down, we’ll likely have to reserve judgment until we have a handful or more cases that have been decided under different factual scenarios.

While we wait for these fair use decisions, it may be valuable to take a look at the few instances where a court has issued an order or opinion in these cases to get some insights on how they might handle the fair use issue or consider other relevant cases. 

There are three cases that we can look to for such insights: Andersen v. Stability AI, The New York Times v. OpenAI, and Concord Music Group, Inc. v. Anthropic. It is useful to note that each of these federal AI copyright infringement cases deals with different subject matter—images, text, and music, respectively—and different defendant AI developers.

Andersen v. Stability AI

In August 2024, in the Andersen v. Stability AI case pending before the Northern District of California, Judge William Orrick issued an order granting in part and denying in part motions to dismiss a first amended complaint. The case (a class action suit) was brought by visual artists Sarah Andersen, Kelly McKernan, and Karla Ortiz against Stability AI, Midjourney, and DeviantArt, alleging direct copyright infringement based on the copying of plaintiffs’ copyrighted works to train the defendants’ AI image-generating models (along with other claims related to such use). As with the two other cases discussed later in this blog, we should be careful not to read too much into the order, since it was issued at a very early stage of the case and relates solely to a motion to dismiss. Nevertheless, Judge Orrick’s order may provide some insights into how courts may treat claims of direct infringement for copying at the input stage.

One significant takeaway from the Andersen order is Judge Orrick’s clear rejection of the argument that the AI company only copied unprotectable “data” that exists as statistical representations in a model and therefore was non-infringing. Judge Orrick explains that the fact that plaintiffs’ images “may be contained in Stable Diffusion as algorithmic or mathematical representations—and are therefore fixed in a different medium than they may have originally been produced in—is not an impediment to the claim ….”

Perhaps the most significant takeaway from the Andersen order is Judge Orrick’s acknowledgment that AI technology is so different from past technologies that earlier copyright infringement cases involving those technologies may have little influence on these AI infringement cases. One argument that many AI companies tend to espouse is that, for copyright purposes, AI technology is no different than the VCR, xerography machines, or other technologies that were found to be non-infringing. Judge Orrick quickly and clearly debunks that notion. In his order, he explains that AI infringement is not at all similar to the copyright infringement cases involving the sales of VCRs, saying:

“…this is a case where plaintiffs allege that Stable Diffusion is built to a significant extent on copyrighted works and that the way the product operates necessarily invokes copies or protected elements of those works. The plausible inferences at this juncture are that Stable Diffusion by operation by end users creates copyright infringement and was created to facilitate that infringement by design.”

The distinction that Judge Orrick is making here is that AI is different from past technologies that were found to be a fair use because AI ingests copies before making the AI models available to users. In contrast, with VCRs and similar copying devices, the copies were made by the consumer, not beforehand by the defendants.

Judge Orrick goes on to explain that because of the unique nature of GAI, the “run of the mill” copyright cases relied upon by the defendants (and many other AI companies) are unhelpful. He notes that defendants’ reliance on cases where a showing of substantial similarity between works is required when determining whether an inference of copying can be supported is unhelpful “in this case where the copyrighted works themselves are alleged to have not only been used to train the AI models but also invoked in their operation.”

He then goes a step further to explain that even decisions in other AI (presently pending) cases are not influential where they involve different AI models because of the many operational differences between the GAI models and the types of copyrighted works that are ingested. In making this point, Judge Orrick says:

“The products at issue here—image generators allegedly trained on, relying on, and perhaps able to invoke copyrighted images—and the necessary allegations regarding the products’ training and operations, are materially different from those in Kadrey [a case involving an LLM model].”

New York Times v. OpenAI

In the copyright infringement case brought by The New York Times against OpenAI in the Southern District of New York,[2] OpenAI filed a motion that had little, if anything, to do with its fair use defense: a motion to compel The New York Times (NYT) to provide information related to the NYT’s own use of, and views on, GAI tools, which OpenAI unsuccessfully argued was necessary for its fair use arguments. In November, the court issued an opinion and order denying OpenAI’s motion. The court stated that such information is not relevant to OpenAI’s fair use arguments and explained that the AI company’s reliance on the Google v. Oracle case does not support an interpretation of the fourth fair use factor that requires consideration of the copyright owner’s other uses or licensing of its own works to nonparties.

Further supporting this proposition, the court also cited the thirty-year-old decision by the Second Circuit in American Geophysical Union v. Texaco, an important case that held that a for-profit corporate library’s systematic copying of entire journal articles for its employees was not a fair use because of, among other things, the licensing options that could have been utilized, such as those offered by the Copyright Clearance Center. The fact that the court cited this case (which focuses on voluntary collective licensing options) and not the many other cases decided over the past thirty years may be noteworthy.

What is perhaps most significant about the order is the court’s concluding remarks, stating:

“This case is about whether Defendant trained their LLMs using Plaintiff’s copyrighted material, and whether that use constitutes copyright infringement. It is not a referendum on the benefits of Gen AI….”

Concord v. Anthropic

In the third and final AI copyright infringement case discussed here, Concord Music Group, Inc. v. Anthropic, plaintiff music publishers filed a motion for a preliminary injunction against Anthropic for infringement both in the training of Anthropic’s AI model, Claude, and in Claude’s regurgitation of their song lyrics as outputs. A hearing was held in late November 2024. According to reports (which we later confirmed as largely accurate), the presiding judge, Judge Eumi Lee, when discussing fair use for Anthropic’s copying of plaintiffs’ works to train its model, said that fair use in the training of AI models is “a novel, cutting-edge question that pushes the boundaries of fair use and copyright law.” We should be careful not to read too much into one statement during a hearing. However, “pushing the boundaries” of something typically (but not always) means that the something is outside those boundaries. As one reference source explains it, “usually when someone pushes the boundaries, the person is violating a rule or violating accepted behavior—not by a lot, but it does feel like a violation.” That would seem to indicate that Anthropic’s use in this case would not qualify as fair use. Granted, that’s not much to go on, but it is all we have for the time being.

Looking at all three federal AI copyright infringement cases together at this early stage of the litigation, the courts appear to be leaning toward agreeing with plaintiff copyright owners’ claims that AI developers are infringing their rights by ingesting their works for training purposes without permission. At the very least, judges are exhibiting a healthy skepticism of AI companies’ broad-brush fair use arguments. Only time will tell whether that is an accurate assessment. If you want to stay up to date on the latest copyright-related AI news, I encourage you to sign up to receive our AI copyright alerts.

[1] Reports on the number of cases pending may differ because several infringement cases have been consolidated. For example, similar cases have been filed against the same AI companies in the same district courts involving similar groups of plaintiffs—and in those instances, two or sometimes three cases have been consolidated into one case.

[2] The Times case was filed almost a year ago and has since been consolidated with another case filed against OpenAI by the Daily News.


