The Largest IP Theft in History: Takeaways from the Senate Hearing on AI and Copyright Piracy

On July 16, the Senate Judiciary Committee’s Subcommittee on Crime and Counterterrorism held a hearing titled Too Big to Prosecute?: Examining the AI Industry’s Mass Ingestion of Copyrighted Works for AI Training. While some courts may struggle to articulate why AI companies’ pervasive piracy seems so disturbing, Senators on the Subcommittee took charge in demonstrating how ridiculous and un-American it would be to condone what they referred to as “the largest IP theft in history.” Here is a recap of what happened at the hearing.

Opening Statements

“Today’s hearing is about the largest intellectual property theft in American history . . . Here is the truth that nobody wants to admit. AI companies are training their models on stolen material. Period. That is just the fact of the matter . . . We’re talking about piracy. We’re talking about theft.”

-Chairman Senator Josh Hawley (R-MO)

Subcommittee Chairman Senator Hawley (R-MO) could not have started the hearing on a stronger note. Highlighting the incredible scope of AI companies’ pirating activities, he noted that AI companies stole “billions of pages of copyrighted works, enough to fill 22 libraries the size of the Library of Congress . . . This theft was not an innocent mistake—they knew exactly what they were doing.” He then pointed to evidence that AI companies were aware their conduct was illegal but bulldozed ahead anyway. Hawley emphatically stated: “This is not just aggressive business tactics. This is criminal conduct.”

Chairman Hawley also tore apart the AI companies’ argument that their acts were necessary to develop AI so that the U.S. can win in the AI race against China, saying: “every time they say things like ‘We can’t let China beat us.’ Let me just translate that for you. . . . What they’re really saying is ‘Give us truckloads of cash and let us steal everything from you and make billions of dollars on it.’ That’s the translation.”

He concluded his opening remarks just as strongly as he started, saying:

“Here’s the bottom line. We have got to do something to protect the people of this country. I’m all for innovation, but not at the price of illegality. I’m all for innovation, but not at the price of destroying the intellectual property of the average man and woman in this country. We have laws for a reason—those laws ought to be enforced. Big Tech should not be above the law. Enough is enough. It is time to enforce the law.”

In his opening remarks, Ranking Member Senator Durbin (D-IL) highlighted the creative industries’ contribution of over $1 trillion to the U.S. economy, underscoring the importance of the American copyright system. Senator Durbin expressed disbelief at AI companies’ reliance on pirating creative works as a business tactic, stating: “As Anthropic’s CEO put it, Anthropic had many places from which it could purchase, but preferred to steal them to avoid, quote ‘legal/practice/business slog’—whatever that means . . . It [also] kept pirated copies of works it already downloaded anyway—I don’t get that.”

Maxwell Pritt Details Scale of AI Piracy

The first witness to give an opening statement was Maxwell Pritt, counsel for creator-plaintiffs in many of the ongoing lawsuits against AI companies, including the Kadrey v. Meta case. Pritt’s testimony stressed the jaw-dropping scope of AI companies’ reliance on online repositories of stolen copyrighted works (some of which have been prosecuted by the FBI and Department of Justice) to seek a competitive advantage.

He noted that AI companies took “tens of millions, if not hundreds, of books and scholarly publications and articles for free instead of buying them or licensing them from copyright owners.” Additionally, Pritt explained that Meta in particular had downloaded over 200 terabytes of pirated copyrighted works from multiple pirate e-repositories and had also copied and distributed over 40 terabytes to other pirates through peer-to-peer sharing. Pritt explained that, beyond the sheer extent of Meta’s piratical activities, what was even more shocking is that top-level executives approved these practices and the company went ahead anyway.

Pritt also pushed back against the argument made by AI companies that their mass acts of infringement are justified or that exceptions are needed for AI companies in order for the U.S. to win the AI race against China, stating:

“Nonsense. Our tech companies employ the best and brightest minds in the world and they are the wealthiest corporations in the world. It is not credible that these companies can invest hundreds of billions of dollars in hiring talent and building data centers to power their commercial AI products and models. But they can’t pay a single cent to copyright owners.”

In response to a line of questioning from Chairman Hawley about licensing as an alternative to Meta’s piracy, Pritt highlighted that discovery in the Kadrey case showed that Meta had contemplated dedicating tens to hundreds of millions of dollars to licensing, but had abandoned that plan and copied pirated works instead. Senator Hawley expressed disbelief that hundreds of millions of dollars could have gone to creators, yet they saw not a single cent.

Later in the hearing, Chairman Hawley asked whether Meta tried to hide its use of pirated works. Pritt explained that discovery in the Kadrey litigation revealed that when Meta started to use Anna’s Archive to get pirated works, it intentionally chose a third-party service for server space that would make it difficult to trace the torrenting activities back to Meta. Drilling further into Meta’s knowledge of the illegal and criminal aspects of its activities, Chairman Hawley asked Pritt to opine on the evidence of that knowledge. Pritt stated: “No court, including the Supreme Court, has ever held that rank piracy is somehow fair use. And instead, the Supreme Court case law . . . says that fair use presupposes good faith and fair dealing.”

Professor Michael Smith Explains the Economics of Piracy

Professor Michael Smith of Carnegie Mellon University provided an opening statement that focused on how digital piracy creates perverse incentives that result in a lose-lose-lose for society, creators, and tech companies. He first addressed the AI companies’ argument that copyright enforcement would stifle innovation, stating: “while times have changed, the underlying economic principles are the same today as they were in 2000. And by applying those principles, I think we can draw many of the same conclusions.”

As Professor Smith noted, casting IP and copyright as a “barrier to innovation” is a tired play from the old tech playbook, one that was proven wrong during the early days of the Internet and the rise of digital technologies. Instead, he noted, copyright law and licensing now form the backbone of these technologies and sustain both creators and technology companies, with streaming platforms for music and television being one example. Smith noted that generative AI (GAI) is no different. To conclude that copyright can be disrespected to secure a technological future would introduce perverse incentives, in the face of evidence that incentivizing creators through exclusive rights, and respecting those rights, results in greater economic and societal value and more technological innovation. The long and growing list of AI copyright licenses illustrates that copyright law has been working to drive value and pave a path forward in the GAI and copyright debate. A future of robust and sustainable GAI technologies is imperiled if criminal activities in the AI training process are excused and condoned as “fair use” or under any other exception.

“Allowing Generative AI companies to launder licensable content through piracy,” as Professor Smith put it, harms the markets for copyrighted works and the licensing markets for those works, and creates perverse incentives for pirates to keep scaling their criminal networks and enterprises. He concluded:

“I think today we have a similar opportunity to create a win-win-win for society, creators, and tech firms by making it clear that piracy is wrong. And that a vibrant technology economy depends on a vibrant creative economy . . . On our current path we risk killing the goose—or in this case the authors, musicians, coders, and filmmakers—who laid the golden eggs that are key to the present and future value of generative AI output.”

Later in the hearing, Senator Peter Welch (D-VT) asked about the dangers of permitting GAI companies to use copyrighted works without compensating the copyright owners. Professor Smith responded that the drafters of our Constitution thought that copyright was a “really good idea.” He added that GAI companies are making it easier for others to steal, directly participating in piratical activities, and inherently supporting pirate networks. This further creates perverse incentives to pirate and steal while undermining the positive incentives to create and innovate that flow from licensing under copyright law. He stated:

“When you’re signing a licensing agreement with a generative AI company, you’re signing with a gun held to your head. Because they can say, ‘Either sign what I’m offering, or I’m going to go steal it instead.’”

Professor Smith noted that a viable path forward for both GAI companies and creators is possible if copyright laws are enforced and respected.

Professor Bhamati Viswanathan Speaks to Criminal Infringement

Professor Bhamati Viswanathan of New England Law provided an opening statement that focused on how AI companies’ use of pirated works is a “crime compounding a crime” and, as she stressed, not a victimless one. She added that the piracy of creative works denigrates Constitutional rights, stating:

“This is enshrined in the United States Constitution. The intellectual property clause is one of the things that makes this country not just great—but robust, powerful and economically hugely successful . . . This is truly at risk right now—this entire incentive structure that was brilliantly thought of by our Founding Fathers.”

AI companies have options other than engaging in piratical activities and supporting criminal enterprises that have been prosecuted by the U.S. government. As Professor Viswanathan highlighted:

“The solution is licensing. It already exists. The licensing of works. The fair compensation of creators . . . You cannot compromise the livelihoods [of] creators . . . What we need is for new technologies to flourish fairly, sustainably, in ways that makes sense to us and that have already been provided for by our Constitution, by the U.S. copyright law, by intellectual property law itself.”

Chairman Hawley asked Professor Viswanathan a line of questions about the pirating process and the sources from which these pirated works are copied and further proliferated by AI companies. Professor Viswanathan noted that the pirate websites and repositories that AI companies used have been prosecuted by the federal government and, most importantly, that engaging with piratical networks ultimately supports these criminal enterprises. Pointing to the pirate repository Anna’s Archive in particular, Professor Viswanathan highlighted how the website advertised large datasets of stolen copyright-protected material to AI companies for training purposes, offering them for sale or for data exchange.

Later in the hearing, when Senator Durbin inquired about the knowledge standards for determining criminal copyright infringement, Professor Viswanathan again stressed that employees within GAI companies knew of the obvious illegality of engaging in piratical activities. Moreover, she opined on the applicability of the fair use doctrine to the use of pirated works for GAI training, noting, “This is not what fair use was intended to achieve or facilitate . . . But boy—this does not seem consonant with what fair use was ever meant to do.”

David Baldacci Provides an Author’s Perspective

“I’m only one man, but books transformed my life, propelling me to a far better existence. I’m sure there are aspects of AI that will also transform the world. But if you want to bet on which side is more transformational, for all of us, I will bet on books every single time.”

Best-selling author David Baldacci provided an opening statement focusing on his personal experiences with pervasive digital piracy and the effect of having at least 44 of his 60-some novels repeatedly copied and used to train Meta’s GAI model. Describing what happened when his son prompted ChatGPT to write a plot that read like his novels, Baldacci stated:

“That’s when I found out the AI community had taken most of my novels without permission and fed them into their machine learning system. I truly felt like someone had backed up a truck to my imagination and stolen everything I’d ever created.”

Later on in the hearing, in response to Senator Durbin’s line of questioning about the novel writing process, Baldacci stated that this experience with ChatGPT made him feel like he had been robbed, and that “[t]his is not supposed to happen in this country.”

Baldacci also pointed out:

“I’m aware of the argument that what AI did to me and other writers is no different than an aspiring writer reading other books and learning how to use them in original ways. I can tell you from personal experience that is flatly wrong.”

Later in the hearing, Baldacci responded to Senator Durbin’s question about whether he polices against infringement, stating: “the only thing a software platform can do is take from what has already been created. They can’t create anything really on their own. They take my mishmash and put it all together and throw it out the other end. But it still looks like my stuff—because it is my stuff.”

Baldacci also talked about his openness to the idea of licensing his books and the importance of licensing to his craft and for the craft of writing in general. He stated that he licenses his books “all over the world” for all sorts of mediums and formats, and that he is always available to entertain offers to negotiate licensing agreements for the use of his books. He went on to explain the harm to creator incentives:

“[] the uncertainty of stealing stuff from pirate sites operated in Russia just so you can gain an advantage and you don’t really care about what happens to the likes of me and other writers coming up . . . I make a lot of money for my publishers, and my publishers use that money to take risks on new writers coming up that they ordinarily would not have been able to take a risk on. So when you hurt established writers like me, you hurt all the other writers coming behind us.”

Professor Edward Lee Argues Transformative Fair Use

Professor Edward Lee of Santa Clara University School of Law provided an opening statement arguing that, under the fair use analysis, GAI training is extremely transformative, although the outcome could differ if other factors are present, such as a showing of market harm to the copyright owner or infringing AI-generated output.

One general point that Professor Lee raised, which drew significant pushback, was that courts, Congress, and the states should exercise caution in handling GAI copyright issues, lest they jeopardize the U.S.’s national priority of becoming a global leader in AI by 2030. All Members of the Subcommittee shut down this false narrative: that the AI companies’ “largest IP theft in history,” made possible by disrespecting copyright, is necessary to serve the United States’ interests in being a global leader in AI, its technological future, its national security, and its ability to “beat China” in the AI arms race. When asked by Senator Durbin whether AI companies would benefit from the use of copyrighted works for their own commercial interests, Professor Lee argued that while AI companies did have a commercial interest, the outcome would ultimately benefit the U.S. and its national interest in winning the AI race. To this, Senator Durbin coolly replied: “And [creators like] Mr. Baldacci should be prepared to pay the price for that. Right?”

Chairman Hawley further grilled Professor Lee on the national security/China point, unpacking the fallacy of it all. After noting that authors like David Baldacci are U.S. citizens, Hawley noted:

“You’re saying that the mass theft and potential impoverishment of American citizens ultimately redounds to the good of America? . . . It just sounds strange to me to say that the United States as a nation is going to benefit from the mass violation of its citizens’ rights.”

Most of Professor Lee’s responses to these questions focused on the fact that fair use is a case-by-case analysis and on prior case law that failed to substantively opine on the bad-faith issue. But as Chairman Hawley pointed out:

“[Fair use] is an equitable doctrine. And these companies are not exactly coming to it with clean hands are they? . . . They went to a pirated illegal site and took [the copyrighted works]. And now they’re coming to claim the cover of equity? That seems kind of strange. Is that how equitable law works?”

Chairman Hawley went on to say:

“I don’t think it’s that complicated. I think it’s pretty simple. I think in America we have rights. Those rights are what protect us. These rights are being violated. And if we’re going to succeed as a nation and uphold our principles as a nation, we better darn well enforce the individual rights on which the nation is founded.” 

Conclusion

Chairman Hawley’s focused hearing helped spotlight the disturbing implications of GAI companies’ use of pirated creative works to train their commercial AI models. At the end of the hearing, he concluded:

“If this isn’t infringement, Congress needs to do something. If the answer is that the biggest corporation in the world worth trillions of dollars can come and take an individual author’s work like Mr. Baldacci, lie about it, hide it, and profit off of it—and there is nothing our law does about that, we need to change the law . . . I hope this is motivation for this body that we need to be paying attention to what is going on here.”

Read the Copyright Alliance Statement on the SJC Crime Subcommittee AI Hearing.

