Requiring AI Transparency Won’t Destroy the Trade Secrets of AI Companies

Over the past year, several bills have been introduced in Congress that, if passed, would require generative AI companies to be transparent and disclose to the public the makeup and source of the materials used to train their models. Some of these bills would impact copyright, and at least one is specifically focused on transparency for copyright purposes—H.R. 7913, the Generative AI Copyright Disclosure Act of 2024, introduced by Representative Adam Schiff (D-CA).

Although the Copyright Alliance has taken no position on these bills, in both our comments submitted to the U.S. Copyright Office in the fall of 2023 and a blog post we published over the summer, we explain in great detail why “transparency is an essential element of an AI ecosystem that is developed and used in a responsible, respectful, and ethical manner.”

Some Artificial Intelligence (AI) companies disagree with this view and have opposed these bills and the notion of transparency more generally. [1] Their stated concern is that requiring transparency about the material they use to train their AI systems would force them to divulge their trade secrets, and that doing so would extinguish any trade secret protection that exists in their training methods or AI models, including proprietary algorithms.

The Copyright Alliance has long been a staunch supporter of and advocate for strong and effective intellectual property protection. In fact, it’s part of our mission statement. We would not support legislation or policies that undermine such protection. So why do we support transparency in AI? Because the trade secret concerns of these AI companies are a smoke screen.

Before moving forward on this issue, it may make sense to start by explaining what a trade secret is. And to do that, let’s go to the source—Black’s Law Dictionary—which defines a trade secret as “a formula, process, device, or other business information that is kept confidential to maintain an advantage over competitors; information — including a formula, pattern, compilation, program, device, method, technique, or process — that (1) derives independent economic value, actual or potential, from not being generally known or readily ascertainable by others who can obtain economic value from its disclosure or use, and (2) is the subject of reasonable efforts, under the circumstances, to maintain its secrecy.” (emphasis added)

It is also helpful to refer to the U.S. Patent and Trademark Office, which defines a trade secret as:

  • “information that has either actual or potential independent economic value by virtue of not being generally known,
  • has value to others who cannot legitimately obtain the information, and
  • is subject to reasonable efforts to maintain its secrecy.”

It notes that “all three elements are required; if any element ceases to exist, then the trade secret will also cease to exist.” Examples of trade secrets include things like the secret formula for Coca-Cola or the process behind the New York Times Bestseller list.

Based on these definitions, let’s examine what about the AI training process might qualify as a trade secret. Certainly, details about the weights or parameters used by an AI company in constructing a generative model could potentially qualify as a trade secret. But the copyright community is not asking for such information. What the copyright community is requesting is simply that AI companies disclose the copyrighted works they use to train their AI systems when those works are: (i) being used to develop a generative AI (GAI) system that is made available to the public; and (ii) not owned by the AI developer or licensed (for training purposes) from the copyright owner by the AI developer. Nor is the copyright community asking for transparency where imposition of a transparency requirement would be contrary to or inconsistent with obligations under other laws (such as privacy laws), contracts, or collective bargaining agreements.

In effect, the copyright community is only asking AI developers to disclose the copyrighted works used to train their systems without a license, which typically means works that were scraped off the internet. As the definitions above make clear, there is no trade secret protection in the collection of these works because they are publicly available to anyone with internet access. Nor would there be trade secret protection for the process of scraping the internet for material, as it is a generally known and readily ascertainable way to create a training dataset. Moreover, many AI companies use the same sources (e.g., Common Crawl, Books3) for these materials, and most AI developers see their competitive advantage in their algorithms, not in their choice of training materials.

The AI companies that oppose copyright transparency seem to be arguing to policymakers that trade secret law protects the identity of the works they use to train their AI models and that transparency requirements therefore should not be imposed on them. Policymakers would be well advised not to take the bait. As explained in more detail in our earlier blog post, transparency is a crucial element of any AI policy, especially policies relating to enforcement of copyright. And transparency of training data is critically important for many reasons well beyond copyright, including detecting bias and understanding the source of AI system errors.

Transparency is not a new concept. There is already a trend toward transparency in other countries and in various states. For example, within the United States there are various state laws where AI transparency is already required, albeit outside the context of copyright. [2] Outside the United States, the European Union’s AI Act entered into force across all EU Member States on August 1, 2024. The EU AI Act recognizes the importance of copyright transparency by mandating that generative AI providers make publicly available a sufficiently detailed summary of the copyrighted works used to train their models.

Imposing reasonable transparency requirements on AI developers is one of the most important steps that Congress and the Administration can take to protect the public as well as the creative community. The notion that trade secret protection is an obstacle to achieving that goal is a fallacy.


[1] Not all AI companies oppose transparency. For example, at least one company (IBM) is apparently disclosing the sources of its training data, regardless of any legal requirement to do so.

[2] Of course, since states cannot enact copyright-related laws, these state AI transparency laws relate to non-copyright issues.
