5 Takeaways from the Copyright Office’s Report on Generative AI Training

Earlier this month, the U.S. Copyright Office released the third of four reports in its study for Congress on copyright and AI. There is little doubt that this third report is the most important and consequential for copyright and AI stakeholders, because it addresses the training of AI systems on copyrighted works and whether such training constitutes fair use in certain instances. That issue is at the heart of about forty pending lawsuits between copyright owners and AI companies.

It bears mentioning that this third report is a prepublication version. The Office notes that the final version will be substantively identical and will differ only in adding an executive summary and formatting cleanups. To the best of our knowledge, the Office has never before issued a prepublication report. Its reasons for doing so here are beyond the scope of this blog, because they are irrelevant to the substance of the third report, which is what this blog focuses on. The fact that this is a prepublication version should have no impact on how Congress or the courts consider the report, since its substance will not change when the final version is released.

The Copyright Office is the U.S. government’s most qualified expert on copyright issues, and its third AI report is a prime example of what we have grown used to from the Office: high-quality, balanced, and thoughtful analysis of complex copyright issues. The Office understands that no use is categorically fair use, and so it does not draw categorical or definitive conclusions in the report. Nor does the report pick sides or choose winners and losers. Rather, it takes a nonpartisan and unbiased approach that will help advance our collective knowledge and policymaking on these important fair use and licensing issues.

The few criticisms of the report stem from its balance and from the fact that it does not agree with some critics’ views of fair use at every turn. In the report, the Copyright Office takes the sensible and correct position that AI training is not categorically fair use and that whether a use qualifies as fair use is a matter of context and degree. These concepts are strongly supported by well-established case law, especially the recent Supreme Court decision in Warhol v. Goldsmith, which the report discusses and applies in detail.

But don’t just take our word for it. We strongly encourage people to read the full report, which is well drafted and thoughtfully considers the issues from all perspectives. For those who don’t have the time, here are five key takeaways from the U.S. Copyright Office’s third AI report.

1) Transformative Use Test Must Focus on the Ultimate Purpose of the Use

One key takeaway from the Office’s report is its confirmation that the transformative use subfactor requires looking at the ultimate purpose of the infringing use of copyrighted works for generative AI (GAI) training. This conclusion calls for a holistic test that weighs various factors on both the input and output sides of the generative AI equation to determine more accurately whether the use of the copyrighted work really is transformative under the first fair use factor. The Office provides a scale of transformativeness in the GAI context and states that “[w]here a model is trained on specific types of works in order to produce content that shares the purpose of appealing to a particular audience, that use is, at best, modestly transformative.” (emphases added) The Office notes that this is often the case in the GAI context, giving this example:

“Training an audio model on sound recordings for deployment in a system to generate new sound recordings aims to occupy the same space in the market for music and satisfy the same consumer desire for entertainment and enjoyment.”

This point is key to the transformative use analysis. As the Office explains, “the use of a model may share the purpose and character of the underlying copyrighted works without producing substantially similar content.” (emphasis added) In other words, in the GAI context, the transformative use test requires the AI developer to provide evidence showing that the ultimate purpose of the GAI’s function and deployment (as trained on ingested copyrighted works) differs from the purpose of the underlying copyrighted works. An AI company’s declaration that it had a different purpose or character for the use does not make it so (see Andy Warhol Foundation v. Goldsmith for more on this).

The Office further notes that evidence of safeguards implemented to limit infringing output is relevant to the extent it shows that the GAI was meant to function and be deployed in a way that serves a different purpose from the works on which it was trained, which would tip the scale further toward a finding of transformative use. This holistic weighing of various considerations is exactly the balancing act contemplated by the fair use doctrine. As noted above, where the ultimate purpose of using music, artwork, and other copyrighted works to train an AI model is for the AI to generate similar output, the purpose and character of the use will often be similar enough to preclude a finding of transformativeness under the first factor. Moreover, the report stresses that whether an AI developer had lawful access to the works used in training should be considered under the first factor, and that this consideration should ultimately weigh against fair use “where [training] is accomplished through illegal access.”

Whatever a court concludes on the transformative use subfactor in the GAI context, it is crucial that the analysis remain balanced within the overall structure of the larger fair use analysis. As the Supreme Court warned in Andy Warhol Foundation v. Goldsmith, and as the Copyright Office recognizes in its report, the transformative use subfactor must not be allowed to dominate the entire fair use test (or even the factor-one analysis). Courts must weigh transformative use against other vital considerations in any fair use analysis, including commerciality under factor one and market harm under factor four.

Ultimately, the Copyright Office concludes that in some cases, the use of copyrighted works to train AI will be sufficiently transformative to weigh in favor of fair use, and in other cases it will not. That is consistent with the way fair use analyses are regularly and correctly conducted. Because the Copyright Office does not conclude that every possible use of copyrighted works for AI training is transformative and therefore a fair use, some critics wrongly complain that the report’s conclusions favor copyright owners. But quite simply, that is not how fair use has worked for over 150 years. In short, while we may not like or fully agree with the Copyright Office’s analysis of transformative use as applied to GAI, there can be little doubt that the Office generally gets it right.

2) Debunking False Narratives That AI Training Is Just Like Human Learning and Is Therefore Fair Use

AI companies frequently claim that AI models learn just like humans, and that AI training is therefore inherently transformative under the first factor and weighs heavily in favor of fair use. But the way AI developers train their AI systems is not equivalent to the way humans learn or the way human educators teach, and even if it were, that would not be conclusive of a fair use finding. The Office notes: “Fair use does not excuse all human acts done for the purpose of learning.” When humans learn, they actively contribute to a creative ecosystem in ways that support the creation and distribution of further works, such as paying an admission fee or tax dollars to visit a museum, or purchasing a book to read and analyze in the classroom. Moreover, the Office states: “AI learning is different from human learning in ways that are material to the copyright analysis . . . Generative AI training involves the creation of perfect copies with the ability to analyze works nearly instantaneously. The result is a model that can create at superhuman speed and scale.” All of these differences matter to the fair use analysis. This raises the question: are AI companies saying that AI should be granted greater privileges than humans, such that commercial AI training should qualify as fair use where a “similar” human use would not? It sure seems that way. When laws and policies favor computers over humans, we are all in big trouble.

The Office also pushes back on another false narrative: that AI training is inherently transformative because it does not copy creative expression. Responding to that argument, the Office states: “Language models are trained on examples that are hundreds of thousands of tokens in length, absorbing not just the meaning and parts of speech of words, but how they are selected and arranged at the sentence, paragraph, and document level—the essence of linguistic expression. Image models are trained on curated datasets of aesthetic images because those images lead to aesthetic outputs. Where the resulting model is used to generate expressive content, or potentially reproduce copyrighted expression, the training use cannot be fairly characterized as ‘non-expressive.’”

3) Market Dilution Is a Form of Market Harm to Consider Under the Fourth Factor

The Office thoroughly analyzes the fourth fair use factor, which considers the effect of the use upon the potential market for or value of the copyrighted work. This is the most important factor when discussing AI and fair use, and the Office first notes that “[t]he copying involved in AI training threatens significant potential harm to the market for or value of copyrighted works” because (1) GAI models can produce substantially similar outputs that are direct substitutes for ingested copyrighted works, which leads to lost sales, and (2) even where “style” is imitated or the GAI output is not substantially similar to any specific copyrighted work, using copyright-protected works to train GAI that produces such output can dilute markets for works “similar to those found in its training data.” The Office acknowledges in the report (as have its critics) that how the fourth factor would account for the adverse impact of generative AI training on the market for a general class of works is uncharted territory. But the Office takes a step back and embraces the holistic nature of the fair use doctrine, pointing out that “the fourth factor should not be read so narrowly,” as “[t]he statute on its face encompasses any ‘effect’ upon the potential market.”

The Office recognizes that it is unjust for the demand for human-authored works to be “diluted” by an overwhelming flood of AI-generated works produced by models trained on those same works. It acknowledges several comments pointing out such market harms, including comments from music stakeholders who noted that wholly AI-generated songs dilute the pool of royalties that would otherwise be payable to human music creators. The Office wraps up its discussion of market dilution by remarking that GAI’s ability to imitate style or protected works, an ability “made possible by its use in training,” “may impact the creator’s market.” (emphasis added)

This is a salient point that distinguishes GAI cases from prior cases about the potential anticompetitive effect of the contested use. Judges are actively grappling with this very issue, as we saw during oral argument in Kadrey v. Meta, the authors’ class action lawsuit. But as the report explains, the market dilution concept (one of many important considerations under the fourth fair use factor) captures the unique, never-before-seen kind of market harm to creative works and creators caused by widespread, unrestricted copying of expressive works to train GAI technologies. Particularly because style “does capture protectible elements of an original work of authorship,” and because cases like Sony Corp. of Am. v. Universal City Studios, Inc. have previously considered the harm that would arise if the contested use became widespread and unrestricted, the market dilution concept falls comfortably within the realm of sound considerations for the fourth factor analysis.

4) Voluntary AI Licensing Is Working and Should Continue to Develop in the Free Market

The Copyright Office recommends that “the licensing market [should be permitted] to continue to develop without government intervention.” Pointing to evidence of a number of voluntary direct and collective licensing agreements, the Office notes that these “developments demonstrate that voluntary licensing may be workable, at least in certain contexts particularly where training is focused on valuable content that can be licensed in relatively high volumes (e.g., popular music and stock photography), or in fields where the number of copyright owners is limited.” Copyright law is the foundation upon which these successful free market agreements in the GAI licensing space are made, and it allows both the licensor and licensee to agree on terms that are tailored to the specific needs of the parties.

In response to AI companies that complain about having to license massive troves of copyrighted works and the potential burden of individually securing millions of licenses and permissions, the Office notes that “[c]ollective licensing can play a significant role in facilitating AI training, reducing what might otherwise be thousands or even millions of transactions to a manageable number.” Indeed, collective licensing is not a new concept; it has historically been a robust and efficient solution where there are massive numbers of copyright owners and licensees. In the GAI context, several startup companies have developed business models to provide collective licensing solutions for AI companies seeking permissions from a large range of copyright owners. Examples like the Copyright Clearance Center show that collective licensing solutions have worked, and there is no reason to think those solutions cannot continue to work, as long as robust copyright laws support these voluntary agreements between parties.

5) RAG Is Very Likely Infringing and Does Not Qualify for the Fair Use Exception

If there is any issue on which the Office took a more definitive tone and stance in the report, it is the infringement and fair use implications of retrieval-augmented generation (RAG) technologies that use copyright-protected works without authorization. RAG is a technique employed on top of certain GAI foundation models, in which the system retrieves content from third-party sources to generate output in response to a user prompt. The issues presented by RAG are the subject of some of the most high-profile AI and copyright lawsuits pending in the courts right now, mostly brought by news media companies both large and small.

The Office notes that RAG implicates copyright in two main ways: (1) works are copied and stored in a retrieval database so they can be supplied to an AI model in response to user prompts, and (2) the AI system copies works from external sources (e.g., specific websites) as part of the retrieval process. Turning to the fair use analysis, the Office notes that RAG requires a separate analysis under the transformative use subfactor of the first factor, because the process involves retrieving specific works with the end goal of enhancing the AI output or response. In other words, the purpose and character of RAG output is too similar to that of the retrieved works to qualify the use as transformative under factor one. Further, in its fourth factor analysis of market harm, the Office notes the definite harm from unlicensed use of works for RAG purposes, explaining: “. . . RAG augments AI model responses by retrieving relevant content during the generation process, resulting in outputs that may be more likely to contain protectable expression, including derivative summaries and abridgments. A user for whom the augmented response ‘satisf[ies] the . . . need’ for the original work will not pay to obtain it in the marketplace.”

Conclusion

The Copyright Office provides a thoughtful and balanced examination of the landscape of cases and the technical as well as legal considerations when conducting an infringement and fair use analysis of GAI training. As the Copyright Office repeatedly states throughout the report, the fair use analysis in the GAI context will be very nuanced and specific to the type of AI model, the outputs of the AI model, the way copyrighted works were used for training, and a host of other factors. As Congress and the courts grapple with these issues, they should look to the report for a thorough, detailed, and nuanced discussion about GAI ingestion and fair use issues that will help them better understand how to apply fair use in the context of AI training.

If you aren’t already a member of the Copyright Alliance, you can join today by completing our Individual Creator Members membership form! Members gain access to monthly newsletters, educational webinars, and so much more — all for free!
