Why AI Opt-Out Systems Don’t Work

Throughout the history of the United States, our copyright system has always been an opt-in system. That means that if someone wants to use a copyrighted work, they must get permission from the copyright owner before using it (unless an exception in the law applies, in which case no permission is needed). But AI companies now want to reverse American copyright history by implementing opt-out systems that will solely inure to their benefit.

But there’s a reason that copyright law has never embraced an opt-out system, and that’s because opt-out systems do not work. There are many legal, technical, operational, and policy problems and inefficiencies with an opt-out system. So many, in fact, that it is impossible to explain them all in one blog post. But that’s not going to stop us from trying. So, here’s a short list of the most prevalent problems with AI opt-out systems and why they don’t work.

Under the Copyright Act, copyright owners are granted the exclusive right to use and to authorize others to use their copyrighted works. Copyright is foundationally and primarily an opt-in system. Unless there’s an applicable legal exception that allows a user to use the copyrighted work without the copyright owner’s permission (such as fair use), the work cannot be used unless the user first obtains authorization from the copyright owner (i.e., unless the copyright owner opts in).

Typically, authorization takes the form of a license between the copyright owner and the user. The onus to seek permission is appropriately placed on the user, after which it is up to the copyright owner to decide whether they want to grant permission to that user for that use. Thus, as a copyright user, under the law, an AI company is required to get permission from copyright owners before using their works. Many AI companies are not doing this—and it’s the largest, most successful ones that are the biggest culprits.

As an opt-in regime, copyright law gives copyright owners the greatest freedom and flexibility to determine when, whether, and how to exercise their copyright through the grant of exclusive rights. Opt-out systems proposed by AI companies, on the other hand, would essentially grant those companies a right to use a copyrighted work whenever and however they choose unless or until a copyright owner expressly objects. In other words, AI companies want to flip the voluntary and exclusive nature of copyright on its head because asking for permission is inconvenient for them.

Opt-Out Systems Don’t Work When Copyrighted Works Have Already Been Ingested

One of the big problems with opt-out systems in the generative AI (GAI) context is that they fail to address the mass unauthorized scraping and ingestion of potentially billions of copyrighted works to train GAI models that occurs before an opt-out is effectuated. Typically, by the time a copyright owner learns about an opt-out system for a specific AI model or company and opts out, the GAI model has already been trained on their works. In the instances where AI companies have offered an opt-out system, the system has almost always been implemented after the model was already developed and trained on copyright-protected works. And when the GAI model has already been trained on the works, it’s too late, because AI developers are unable to remove copies of those works from the model. This makes opt-out systems meaningless for rightsholders who don’t want their works to be used to train these GAI models. Simply seeking permission to use the work in the first place is the correct and legal approach and would be less costly.

There are so many different AI models and systems that there simply is no way for the average copyright owner to know about every opt-out system for each and every AI model and system.  Putting the burden on copyright owners to identify every opt-out system and then requiring them to effectuate an opt-out in every instance for catalogues of works is a huge burden—especially so for prolific creators. As Ed Newton-Rex details in this article, “[y]ou run an opt-out scheme if you want most people to neglect to opt-out, whether intentionally or not.”  

The Ubiquitous Nature of Copyrighted Works on the Internet Makes Opt-Out a Herculean Task

Even if a copyright owner effectuated an opt-out to prevent future GAI ingestion of their copyrighted works, comprehensive opt-out is an unachievable goal. Copyrighted works often exist in multiple places on the internet, which makes it nearly impossible for a copyright owner to apply an opt-out indicator to every copy of a work in existence. For example, a single song can be streamed on a digital streaming platform, played as the background music of a user-uploaded video on a social media platform or in advertisements, or displayed as notes or lyrics on a website. The existence of downstream derivative works that adapt, transform, and recast copyright-protected works contributes to the herculean task of effectuating opt-out for the underlying work. In most cases, it would be impossible for the rightsholder to opt out in a way where every single downstream use would be tagged with the proper opt-out signal to prevent GAI scraping and use. The ubiquitous nature of how copyrighted works are enjoyed and distributed online demonstrates the extent of the impracticalities of applying opt-out mechanisms to legitimate copies of the work.

Further, effectuating opt-out for illegal copies of copyrighted works is near impossible. Pirate copies of works that are available on illicit sites are completely out of the copyright owner’s control. It is well known that AI companies have scraped, copied, and used pirated copies of creative works illegally obtained from illicit sources to train their GAI models (and in some instances, these companies have even redistributed pirated copies themselves, but that’s a story for another day). Even if a copyright owner is able to opt out on every legitimate copy of their work, these pirate copies, which will not include an opt-out indicator, will still exist. Unless AI companies decide to grow a conscience sometime soon, they will continue to scrape these pirate copies from illicit sites despite a copyright owner opting out. An opt-out regime fails to address or ameliorate any of these problems and certainly does not afford the rightsholder any semblance of control.

Existing Technical Tools to Implement Opt-Out Systems Are Ineffective

There are existing technical tools, and tools in development, that theoretically might enable rightsholders to prevent AI bots and crawlers from accessing and scraping their copyrighted works. However, as explained in more detail below, these existing technical tools have significant limitations because (i) they are only effective to the extent the opt-out is recognized, respected, and not circumvented, and (ii) these tools were not created to address scraping for GAI ingestion and thus may actually end up doing more harm than good when used. In fact, in many cases, these technical tools are circumvented and/or ignored by bots and crawlers deployed by AI companies, developers, and other users. Take Common Crawl, for example, which regularly ignored and bypassed paywall mechanisms and other technical tools to scrape entire websites containing copyrighted works and place them in archives that major AI companies used to train GAI models.

The robots.txt protocol is one technical tool that comes up often in opt-out discussions. While robots.txt does alert scraping tools not to ingest the associated copyrighted work, its effectiveness is very limited. This is partly because it is only effective to the extent it is recognized and respected. The other major problem is that the robots.txt protocol was designed to prevent a search engine from indexing a page, not to prevent scraping for GAI purposes. Thus, using robots.txt will not only prevent scraping for GAI purposes but also prevent a search engine from indexing the work. Most copyright owners do not want their works scraped and used for GAI training but do want their works to be crawled for search engine purposes, so they can be found on the internet and make money from their creativity. Robots.txt is not sophisticated enough to make this distinction. Therefore, if copyright owners want to prevent their works from being scraped and used for AI training, their only option is to remove their entire online presence from internet search, likely destroying their business in the process.
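To illustrate the bluntness described above, here is a minimal robots.txt sketch (the site, paths, and the "ExampleAIBot" name are hypothetical, used only for illustration). The protocol’s vocabulary is limited to crawler names and paths; there is no directive that distinguishes crawling for search indexing from crawling for AI training:

```
# Hypothetical robots.txt served from https://example.com/robots.txt
# A wildcard rule applies to every crawler that chooses to read this file.
# Blocking here blocks search indexing of these pages too.
User-agent: *
Disallow: /lyrics/

# Rules can target a specific self-declared bot name, but only a crawler
# that truthfully identifies itself and voluntarily reads this file will obey.
User-agent: ExampleAIBot
Disallow: /
```

Nothing in the protocol enforces these rules; a scraper that never requests the file, or simply ignores it, encounters no obstacle at all.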

A further limitation of robots.txt is that it does not attach to the copyrighted work itself; instead, it operates at the URL or website level. That means even if a web crawler or scraper does respect robots.txt for a particular website, a copy of the copyrighted work that exists elsewhere on the internet is not prevented from being crawled or scraped for AI purposes. For example, if copies of copyrighted works are available on pirate sites outside of the copyright owner’s control, and those sites don’t employ robots.txt, then those copies will end up being included in the training set anyway.
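The voluntary nature of this arrangement is visible in how a compliant crawler actually consumes robots.txt. As a sketch using Python’s standard urllib.robotparser (the site, paths, and bot names are hypothetical), the check happens entirely in the crawler’s own code, so a scraper that never performs it is unaffected:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, as it might be served by example.com.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved AI crawler asks before fetching and gets "no":
print(parser.can_fetch("ExampleAIBot", "https://example.com/lyrics/song.html"))  # False

# A search crawler matching only the wildcard rule is allowed:
print(parser.can_fetch("SearchBot", "https://example.com/lyrics/song.html"))  # True
```

The key point is that can_fetch runs inside the crawler: the website merely publishes a text file and has no technical means of forcing any bot to consult it.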

It is worth noting that while there are collaborations happening on an industry level to develop better tools to specifically address crawling and scraping by AI bots, these solutions are nascent and are just one tiny piece of the larger puzzle in a copyright owner’s ability to effectively enforce and protect their works in the digital environment.

Opt-Out Systems Are Tremendously Burdensome for Individual Creators

While opt-out systems are very burdensome on copyright owners, individual creators and artists are particularly disadvantaged when it comes to implementing technological solutions and monitoring for theft of their works. Most individual creators want to spend their time, energy, money, and resources to create art and hone their craft—not researching, learning, and implementing software code, protocols, and other technologies to prevent others from stealing their creative works and using them without permission. Most individual creators already lack resources or technical expertise to regularly monitor for theft of their works or to take technical and other steps to combat old-fashioned digital piracy, let alone find and implement sophisticated ways to combat the multitudes of bots and crawlers employed by AI companies and developers to scrape creative works without permission. It’s in everyone’s best interest that we let these creators do what they do best—create—so that the public can avail themselves of their creativity. And as mentioned previously, by the time they find out about the latest opt-out function, system, or tool, it’s too late—these creators’ and artists’ works have most likely already been scraped and ingested to train AI models.

An opt-out system disproportionately places the burden on individual creators to monitor for GAI ingestion of their works and to use novel, fluctuating technological solutions to prevent ingestion. It’s ultimately just another obstacle to the enjoyment of their copyrights that diverts them from creating new works for the public to enjoy. That is not a sustainable way to encourage more human creativity and innovation.   

The Binary Nature of Opt-Out Systems Creates an Obstacle to Licensing

Most opt-out systems are limited by virtue of their inherent, binary nature: either the work can be used or it cannot. This leaves no opportunity for the parties to negotiate terms of use unless the rightsholder and AI company reach some separate agreement for use of the creative work. But this is something they can already do under current copyright law—by licensing. Creative, industry-led technical solutions continue to be developed and discussed to allow for less-binary systems where opt-outs are conditioned on other terms, like payment—but inherently that makes the system not an opt-out system anymore. It is just a license.

As seen on our AI licensing webpage, the markets for AI copyright licensing have blossomed with many creative solutions, partnerships, and arrangements between the AI and creative sectors. The AI licensing markets have also given rise to a host of small to medium-sized AI companies that have built their entire businesses upon commitments, partnerships, and licenses secured with copyright owners under the copyright legal framework. Opt-in, permissions-based agreements and licenses have resulted in more, not fewer, partnerships between the AI and creative sectors. An opt-out system on the other hand, is completely uninspired because it assumes that AI training must be a zero-sum game—nipping creativity and innovation in the bud and hindering AI innovation.

Opt-Out Systems Violate U.S. International Treaty Obligations

Legal regimes implementing opt-out systems in the GAI context, especially as part of legal exceptions, risk violating international treaty obligations under the Berne Convention, a major international copyright treaty with 182 signatory countries, of which the United States is one.  Article 5 of the Berne Convention states that copyright protection cannot be subject to a formality. Opt-out is a formality.  Under the Berne Convention, opt-out regimes, particularly those in the context of GAI-related copyright exceptions, make the exercise and enjoyment of exclusive rights conditional on the copyright owner undertaking the impermissible formality of opting out. Such regimes do not have a place in any country’s copyright systems and certainly not in the U.S.

Opt-Out Systems Are Ineffective Unless There’s Transparency

Whatever shred of utility an opt-out system may have in the context of GAI ingestion and training, it is rendered completely useless if there are no accompanying transparency standards or obligations to enforce the opt-out and hold AI companies accountable. AI companies that offer opt-out systems have no real obligation to rightsholders to ensure that these systems are actually working. Bills like the TRAIN Act illustrate that transparency obligations on AI companies are essential to an AI ecosystem that is developed and used in a responsible, respectful, and ethical manner. If AI companies are offering opt-out systems to ensure AI models are being developed and used in such a manner and to ensure rights are being respected, they should not have issues disclosing what copyrighted works they use for training. Transparency measures ensure that any opt-out systems offered by AI companies or required by legislation are respecting creators and copyright owners’ rights.

Is There Any Benefit to Opt-Out?

For the reasons we explain above, it is essential that an opt-out system not be mandatory. However, there may be a role for voluntary notice systems that let the copyright owner notify an AI company that they do not want their works used by the AI company. When copyright owners put AI companies (or other users) on notice that their works are not permitted to be used for GAI training, these objections must be respected by AI companies (and users) regardless of the form of that notice. If an AI company disregards the opt-out notice and scrapes, ingests, and otherwise uses a copyrighted work contrary to such notice, that AI company or developer should be liable for willful infringement and be subject to heightened damage awards under the Copyright Act.

Conclusion

In this blog, we highlighted some of the key problems of an opt-out system. But we are not alone here. Many reporters, creative sector leaders, and others have also voiced their strong opposition to opt-out systems and explained why they do not work. Perhaps the best of these comes from ex-Stability AI executive and now CEO of Fairly Trained, Ed Newton-Rex, who lists ten reasons why opt-out is no good. We could not agree more with Newton-Rex and others on the problems with opt-out systems. Put simply—opt-out systems do not work. They undermine fundamental copyright law principles and inhibit true creativity and innovation in the creative and technology sectors.

Opt-out is not a solution. The solution is and has always been to respect the rights of creators and copyright owners and whether and how they choose to exercise their rights. This is done not by undermining those rights, like an opt-out system does, but rather by encouraging copyright licensing to develop and train generative AI models. That solution, not opt out, is the best way to ensure that the U.S. AI industries thrive and partner with our creative economy.


If you aren’t already a member of the Copyright Alliance, you can join today by completing our Individual Creator Members membership form! Members gain access to monthly newsletters, educational webinars, and so much more — all for free!
