Does the Use of Copyrighted Works to Train AI Qualify as a Fair Use?

by Cala Coffman

As lawsuits challenging the use of copyrighted works in generative artificial intelligence (AI) systems begin to snowball, there is one question at the forefront of the conversation: does the unauthorized use of copyrighted materials to train generative AI qualify for the fair use exception? As with all fair use determinations, the answer is that it depends; what is clear is that fair use does not magically excuse all AI use of copyrighted works as training materials. However, as courts begin to hear these cases, fair use and AI will likely remain at the forefront of copyright conversations.

Before further examining this question, it is important to explain that any fair use analysis is highly fact dependent. Anyone who makes overbroad statements that every use of copyrighted works for the purpose of training AI qualifies for the fair use exception is patently wrong and has overlooked the nuances necessary to determine whether an infringing use may qualify for the fair use exception. The same can be said for those who make overbroad statements that no use of copyrighted works for the purpose of training AI is a fair use. However, it is possible to make some educated generalizations about how a court might analyze the four fair use factors and to consider how existing case law may be applicable in situations where copyrighted works are used to train generative AI.

First Fair Use Factor: The Character and Purpose of the AI Use

The first fair use factor, the purpose and character of the use, encompasses what will likely be a popular argument for proponents of the position that using copyrighted works to train AI qualifies for the fair use exception. The first factor takes into consideration whether the use is a commercial or nonprofit educational use, and whether the use “transforms” and adds something new to the copyrighted work. If a use is found to be non-commercial, that finding weighs toward a finding of fair use, while uses that are commercial weigh against a finding of fair use. However, the question of commerciality is just one part of the analysis of the first factor, and therefore is not dispositive.

In many cases, AI platforms trained on copyrighted works are used for commercial purposes. This is especially true of some of the most popular generative AI tools. Although some generative AI platforms may not initially appear to be commercial, platforms like Midjourney, DALL-E, and ChatGPT are now offering commercial subscriptions to their users (for example, OpenAI offers a subscription model for ChatGPT). Furthermore, even if an AI is not yet commercial, any plans AI developers make to enter the commercial market will also result in this factor weighing against a fair use finding. The bottom line regarding this part of the first fair use factor and AI is that courts are not likely to look kindly on an infringing use that reproduces copyrighted works to make money or otherwise financially benefit from their expressive, protected elements.

Instead, the crux of many AI developers’ fair use defenses will come down to the transformative purpose element of the first factor analysis, which has been an increasingly determinative (and amorphous) factor in fair use cases.

Many developers argue that training generative AI on copyrighted works is transformative because AIs scan works to identify and utilize “patterns inherent in human-generated media.” However, as will be discussed in the second part to this blog, transformativeness requires something more than reproduction for consumption, such as communication of information about the underlying work, or inclusion of information about the underlying work in a database. While findings of transformative uses have often disproportionately led to decisions that the use qualifies as a fair use, courts should still recognize that even if training AI is found to be a transformative use, that does not automatically mean that it qualifies as a fair use.

Second Fair Use Factor: Nature of the Copyrighted Work

Under the second fair use factor, courts analyze the nature of the copyrighted work; for example, whether the copyrighted work is factual or creative. Many generative AI systems trained on copyrighted works may run afoul of the second factor because they are trained, at least in part, on highly creative works like visual art, music, or writings. While this factor is rarely dispositive, when the underlying work is creative, it weighs against a finding of fair use. To be clear, since fair use is a fact-intensive analysis, there very well may be AI platforms that train on more factual works, and that may sway this factor towards a fair use exception.

Third Fair Use Factor: Amount of the Copyrighted Works Used

The third fair use factor considers the amount and substantiality of the portion used in relation to the copyrighted work as a whole. Courts have held that when an infringer copies the entirety of a copyrighted work, or the work’s creative “heart,” this factor almost always weighs against fair use, especially where multiple complete works are copied. Although in some cases copying of an entire work may be permissible because it is necessary to accomplish a transformative purpose, an infringer may take no more than necessary to achieve the transformative purpose.

The logistics of most AI generation and training involve a complete and total copying of multiple copyrighted works. In its recent complaint against Stability AI, for example, Getty Images states that the reproduction of its high quality images, paired with detailed text descriptions, has “been critical to successfully training the Stable Diffusion model to deliver relevant output in response to text prompts.” In other words, AI generators must copy as much as possible from expressive works, including the most expressive or crucially creative parts of the work, to achieve the purpose of training to generate quality output.

In a statement to the USPTO, and relying on Authors Guild v. Google, OpenAI argues that the third factor turns not on the amount of a copyrighted work copied, but on the amount of a copyrighted work made available to the public. OpenAI admits that the use of entire works is “reasonably necessary” to create an accurate AI but argues that substantial copying should not matter when the copy is not made available to the public. This argument is completely unsupported by the Copyright Act, and if courts were to adopt this approach it would eviscerate the reproduction right by requiring a distribution to take place for a violation of the reproduction right to occur. Moreover, the fair use exception explicitly directs courts to look at the amount and substantiality of the copyrighted work that is used as opposed to a judicially created “public access” theory. Add to that the fact that while OpenAI and other developers say that copies are not made available to the public, it’s unclear whether or how repositories of works that are created without authorization for training purposes are safeguarded against further reproduction and distribution. Although the third factor is not dispositive and (like the other factors) is highly fact dependent, where works are reproduced in their entirety, this factor will likely weigh against a fair use exemption.

Fourth Fair Use Factor: The Impact on the Value and Market for the Copyrighted Work

The fourth factor of a fair use analysis weighs the effect the infringing use has on the potential market for or value of the copyrighted work. Courts have held this factor to weigh against a fair use finding when the infringing work acts as a market substitute for the copyrighted work, and sometimes even when an infringing use lies outside the markets a copyright owner currently occupies (so long as that market is one a copyright holder might reasonably enter).

There are strong arguments to be made that AI training on copyrighted works harms the market for and value of those copyrighted works. Foremost is the fact that many developers do not compensate copyright owners for the works used to train generative AI, despite the fact that many artists and copyright owners presently offer licenses for their works to be included in AI training datasets. This unlicensed use destroys copyright owners’ licensing markets for AI training datasets. Getty Images, for example, offers licenses for AI developers to use its images in training datasets (licenses which Stability AI did not obtain). Other copyright owners, such as academic publishers, also offer AI licenses. The continual development of these licensing markets points to the fact that copyright owners are part and parcel of the AI developing world and are working with or are open to working with AI developers to advance AI innovation and tools. Courts have recognized that where such a viable market exists to help artists capture value from reproduction and distribution of their copyrighted works, potential licensing value from that market should be considered in a fair use analysis.

Conclusion

As discussed, fair use cases involving generative AI training on copyrighted works will be highly fact dependent. While some AI-related uses may qualify as fair use, unauthorized use of copyrighted material to train AI systems cannot be handwaved by a broad fair use exception that disregards the rights of creators. Neither the Copyright Act nor case law (as will be discussed in the second installment of this blog) would support such a broad fair use exception for AI. Without the factual nuances of a real application, it is difficult to say how courts may decide fair use and AI cases. However, while technological innovation will often test our understanding and application of fair use, the underlying principles of copyright law must not be cast aside in favor of an unhinged race for technological advancement which may lead to harmful and irreversible consequences. AI should be responsibly and ethically developed, and developers must respect artists’ and copyright owners’ rights.
