Generative AI Licensing Isn’t Just Possible, It’s Essential

There is a recurring argument made by many generative AI companies and their supporters that getting permission to use copyright-protected material to train AI models is overly burdensome and would stifle innovation. The gist of these claims is that AI models need to be trained on as much material as possible, and that requiring AI companies to get permission for or license all the material they need will grind generative AI development to a halt, drive investment and innovation overseas, and hurt competition. While it’s not always said out loud, underlying such assertions is a belief that copyright law in general is an impediment to technological advancement and should be applied sparingly (if at all) in the name of unfettered innovation.

This isn’t a new trend. When technological advancements change the way copyrighted works are reproduced or distributed, those who question whether the use is infringing are often accused of standing in the way of innovation. Similarly, the idea of licensing is dismissed as unnecessary, impossible, and something that would hinder technological progress.

Now more than ever, these positions must be rejected. Generative AI models are built on the expressive works of creators at a scale unlike that of any technology before them. The mass unauthorized use of copyright-protected works to train models that produce materials that supplant the market for the original works is a serious threat to creators and their livelihoods. Gaining permission to use copyrighted works and entering into licensing agreements may be a speed bump for AI companies, but it’s one that will ultimately benefit them and is necessary to preserve the progress of the (human) arts.

Generative AI has made remarkable progress since late 2022, when new large language models and image generators began to push the boundaries of what we thought possible. But as new models were launched and new companies formed, copyright owners began to question how the models were trained: What materials did these models ingest? How are datasets compiled, and how do developers gain access to works? How can we tell which works, if any, were ingested? Don’t these companies need permission to scrape websites and use works?

Perhaps not surprisingly, AI companies take the self-serving position that they do not need permission to scrape the entire internet and use whatever material they like to train their models. In fact, they say that it is absolutely necessary for them to do so, lest the model not function optimally. They’ve also said that licensing is impossible. In comments submitted to the Copyright Office in response to its current AI study, OpenAI said that “[t]he diversity and scale of the information available on the internet is thus both necessary to training a ‘well-educated’ model …and also makes licensing every copyrightable work contained therein effectively impossible.”

Meta’s comments to the Copyright Office claim that “[i]mposing a first-of-its-kind licensing regime now, well after the fact, will cause chaos as developers seek to identify millions and millions of rightsholders, for very little benefit, given that any fair royalty due would be incredibly small in light of the insignificance of any one work among an AI training set.” Meta’s and OpenAI’s comments reveal the truth behind the generative AI race: many companies have already made unauthorized use of massive amounts of copyrighted works, and they think they shouldn’t have to stop now because large-scale licensing would be difficult. But while licensing every single piece of copyrighted material scraped from the internet may not be practical, that’s not what copyright owners are asking for, and voluntary licensing would result in permission to use vast quantities of valuable, curated works.

Meta’s comments also reveal the company’s misguided view that simply because compensation to any given copyright holder may be small, it shouldn’t be required at all. But whether the amount of compensation going to each creator is small isn’t AI companies’ problem. Just because compensation may not be substantial in the eyes of those companies after it’s been distributed to an individual copyright owner does not mean that they aren’t or shouldn’t be required to pay something. Moreover, royalties can add up to considerable money, particularly as AI companies scale their businesses, and payments to copyright owners can grow proportionately over time. For example, the creator and copyright owner of the Sesame Street theme song may have received only a few cents every time it was played, but those plays added up over the years to a significant amount. Under the AI companies’ view, since the copyright owner would receive only minuscule payments per use, they shouldn’t be paid at all—and even Oscar the Grouch recognizes that argument as garbage.

The Loss of an Innovative Edge is Nothing More than a Scare Tactic  

Generative AI companies and their supporters have repeatedly raised the specter of innovation moving overseas and the United States losing a competitive edge if licensing is required. In written testimony before the House Subcommittee on Courts, Intellectual Property, and the Internet, a lawyer whose firm represents AI companies argued that “any royalty providing meaningful compensation to individual creators could impose an enormous financial burden on AI companies that would either bankrupt them or push all but the largest companies out of the market (or out of the country).” In OpenAI’s comments in response to the Copyright Office’s AI study, it warned that “[a] restrictive interpretation of fair use in the AI training context would put the U.S. at odds with this growing trend and could drive massive investments in AI research and supercomputing infrastructures overseas.”

The idea that AI companies do not need to license or get permission to use copyright-protected works for training rests almost entirely on their claim that what they are doing categorically qualifies as fair use. To be sure, there may be anomalous situations in which the unauthorized use of copyrighted materials for training qualifies as fair use, but many commercial generative AI companies claim that any interpretation of fair use that does not excuse their conduct would drive innovation outside of the United States. Threatening to move to other countries because of “restrictive” copyright laws is nothing more than a scare tactic. Even if AI companies relocated to a foreign country, they would still avail themselves of the U.S. market. Moreover, we’re already seeing AI companies adapt to EU rules on AI and copyright because they can’t afford not to do business there. The same is true for the U.S., and the truth is that many other factors go into a company’s decision to base its business in the U.S. or abroad, not the least of which is First Amendment protections.

By invoking the risks of falling behind foreign countries in the generative AI race, AI companies draw from an old playbook in hopes it will resonate with policymakers. Indeed, it’s a scary proposition to think about the United States falling behind in an AI race with countries like China and Russia, but it’s a scare tactic that should be recognized as such. Warnings about the U.S. losing an innovative edge have been disproved time and time again as the U.S. remains the world leader in innovation and technological development. It’s past time this page from the anti-copyright playbook is permanently retired.

Faux Concerns About “The Little Guys” Are Misleading

Another common argument heard from big tech companies is that requiring licensing (or really imposing any accountability for copyright infringement) would harm small startups and stifle competition. This is such an absurdly hypocritical position that it’s hardly worth a response. But just so we’re clear, the entrenched tech companies leading generative AI development do not care about small start-ups and have a history of quashing competition to the point that they are regularly investigated and found guilty of anticompetitive activity.

A report from October 2024 detailing how a generative AI startup, Character.AI, was abandoning efforts to develop a large language model (LLM) explained that “[t]he market for generative AI models is barely two years old, and yet big tech has already eliminated much of the competition.” While Character.AI gave up on developing its own model, it struck a deal with Google in which the tech giant essentially absorbed the smaller startup. According to the article, that deal was similar to deals Amazon struck with two AI startups, Adept and Covariant. The article concludes that, “[o]verall, the result is a much thinner competitive market for model development and boosts for the largest players in the game.” The Wall Street Journal also recently warned that “Big Tech’s playbook for expanding its dominance is familiar,” and that it shouldn’t be allowed to use it to “tighten its grip over AI.”

What’s clear is that claims that OpenAI or Google oppose licensing out of concern for startups and healthy competition are totally bogus and self-serving. They would sooner spend money quashing or acquiring a competitor than compensate a creator or copyright owner whose works they scrape and feed into their AI systems. What’s more, their faux concern about small startups not being able to afford licenses ignores the simple fact that licensing terms can and do differ depending on the market and who is seeking a license. It’s not a one-size-fits-all approach, and no one would expect a small startup to pay what a giant tech company does.

How Do We Know Voluntary Licensing Works? Because It Already Does

There is already high demand for corpora of copyrighted works for ingestion by AI systems, and copyright owners are offering and entering into various licensing agreements. Publishers and rights organizations in the scientific and research fields, such as Elsevier, JSTOR, and the Copyright Clearance Center (among many others), have either offered or entered into licensing agreements that allow for text and data mining (TDM) or other generative AI training uses. Visual media giant Getty Images has struck several licensing deals with generative AI companies for use of its vast catalog of stock images for training. Reddit has partnered with Google on a $60 million-per-year deal to provide content for training its AI models, including Gemini. Multiple news organizations, including NewsCorp, the Associated Press, the Atlantic, and the Financial Times, have reached deals with OpenAI for use of their works to train ChatGPT. The list goes on and on, with new licensing deals being announced almost daily.

While those licensing agreements are largely one-to-one deals between established copyright-owning companies and generative AI developers, a number of voluntary collective licensing organizations have recently launched. Startups like Created by Humans, Calliope, and the Dataset Providers Alliance are working to aggregate and license copyrighted works for ingestion by AI models and to offer individual copyright owners the opportunity to be compensated for use of their works. While these organizations are still in their early stages, they show that there is a path toward voluntary licensing that would allow smaller copyright owners to control their works and earn incremental revenue.

The fact that some companies developing generative AI tools are entering into licensing agreements is a testament to their understanding that doing so benefits them and their users. Licensing has a number of advantages, including potentially not having to disclose what material a model is trained on. There’s also the benefit of gaining access to curated, high-quality works that are already tagged with metadata. Finally, licensing would eliminate or limit the infringement liability that so many AI companies are now facing in various lawsuits.

Conclusion

Ultimately, long-term benefits would flow to all stakeholders from voluntary licensing regimes. At a fundamental level, if generative AI companies continue their mass use of copyrighted material without permission or payment, the incentives for humans to create will be greatly diminished. As the Second Circuit recently articulated in its Hachette v. Internet Archive decision, allowing the large-scale copying and distribution of copyrighted works without permission from or payment to copyright owners “diminishes the incentive to produce new works.” The court went on to explain that this outcome is “not an approach that the Copyright Act permits.” In the end, perhaps generative AI will take the place of humans and produce more works at a faster rate than we ever could, but that’s not a future most humans want to see. What most humans want AI to do is make their dinner or clean their homes so that they have more time to create—not the other way around.


