Copyright Chaos in California: Two AI Cases, Two Days Apart, Two Very Different Decisions

Over the past few years, we have seen the number of AI training copyright infringement/fair use court cases steadily climb and waited with great anticipation for the day when decisions in these cases started to be handed down. In February, we got our first decision on AI training and fair use when the district court for the District of Delaware granted summary judgment on direct copyright infringement in the Thomson Reuters v. Ross case, holding that ingestion of Thomson Reuters’ copyrighted works by Ross for AI training purposes did not qualify as a fair use because the use was not transformative and harmed the potential market for AI training data. That case whetted our appetites but, since it involved AI but not generative AI, it was more of an amuse-bouche. That all changed last week when not just one, but two, court decisions were announced, applying the fair use defense to AI training to determine the liability of AI companies for training on copyrighted works without permission. But instead of being sated by these long-awaited decisions, the contradictory nature of these two decisions gave many a serious case of indigestion.
Bartz and Kadrey Issued Almost Back-to-Back
On June 23, Judge William Alsup of the district court for the Northern District of California granted summary judgment in the Bartz v. Anthropic case. The fact that this case was the first to be decided was a surprise to many, primarily because it was filed less than a year ago and discovery was still ongoing. It is very unusual for a case to be decided on summary judgment before discovery is completed because there are presumably facts not yet in the record that could impact the decision. If Judge Alsup had shown some patience and waited until the end of discovery to rule, he would have had a fuller record on which to base his decision, which may in turn have altered the decision or at least the analysis.
Judge Alsup considered fair use with respect to three separate acts: (1) the use of the works to train a generative AI model, (2) the conversion of purchased print copies to digital for the purpose of storing those digital copies in a “digital library,” and (3) the downloading of pirated copies of books for inclusion in the digital library. He held that the first two qualified as fair use but the third did not, finding that “the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use” but that “[c]reating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.”
While we were all still digesting and analyzing the Bartz decision, a second AI copyright training/fair use case, Kadrey v. Meta, was decided less than 48 hours later. This case was also pending in the Northern District of California but overseen by a different judge, Judge Vincent Chhabria, who granted summary judgment for Meta. While Judge Chhabria found Meta’s use to be a fair use, he also concluded that “[i]n cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use. No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books. And some cases might present even stronger arguments against fair use.”
Despite the fact that both cases were decided in the same district court within days of one another and that both held that AI training by Meta and Anthropic was fair use, the decisions in these cases could not have been more different and are not the “wins” for AI companies that some reporting has claimed. In Bartz, Judge Alsup’s analysis that resulted in the holding that AI training on copyrighted works is fair use could be very problematic for copyright owners if adopted by other courts.[1] In contrast, most copyright owners would be very happy if other courts adopted much of the reasoning and analysis used by Judge Chhabria in the Kadrey decision. In Bartz, Judge Alsup made very clear in the decision that the use of pirated works from shadow libraries like Books3 and LibGen to create Anthropic’s “digital library” is not fair use. That could result not only in massive liability for Anthropic in this case but, since most other AI companies also make use of pirate libraries to build their AI models, massive liability for many other AI companies as well. The strong stance against use of pirated works should make copyright owners very happy. But in Kadrey, Judge Chhabria shockingly found that Meta’s use of these pirate libraries did not tilt the scale against fair use, which resulted in Meta having no liability.
Below is a detailed breakdown of areas where the decisions were similar and where they differed.
Similarities Between the Two Cases
There were very few areas where the two judges agreed, and even when their results were similar, in many, if not all, instances their approaches and analyses in getting to that result were very different.
Transformative Use Analysis Under the First Fair Use Factor
Both judges found that using the copyrighted work for AI training was a transformative use under the first fair use factor. They both also concluded that the first fair use factor favored the AI company defendant. Judge Alsup based his decision that the use was transformative on three considerations: (i) the output did not infringe; (ii) AI training is just like human learning; and (iii) AI is game-changing technology. His complete reliance on these factors was woefully incorrect. Similarly egregious was Judge Alsup’s utter failure to consider commerciality or justification for the use, both of which are mandated by the Supreme Court’s recent decision in Warhol v. Goldsmith. We will address all these cataclysmic errors in a future blog post on the transformative use analysis in both cases.
Judge Chhabria’s analysis of transformative use and the first fair use factor was much better than Judge Alsup’s, but ultimately it too was incorrect. One major flaw with both judges’ analyses was the incorrect focus and weight placed on the need for plaintiffs to prove that the output of the AI systems was substantially similar to plaintiffs’ works. Substantial similarity is the test for infringement, not fair use. Similarity of purpose of the two uses is the test for fair use. Another commonality in the two decisions is that both judges’ transformative use analyses are devoid of any discussion of the actual legal standard used to determine transformative use. Under well-established case law, re-emphasized in the Supreme Court’s recent decision in Warhol v. Goldsmith, a use is not transformative merely because it produces something new and innovative. To be transformative, the use must be justified, meaning that (i) it furthers the purpose of copyright without harming the market for the work(s) being used; and (ii) the user shows that using the work is necessary to achieve this purpose. It’s hard to believe that neither order acknowledges this critical instruction from Warhol, and both decisions are fatally flawed because of both judges’ failure to address the justification requirement.
Copying of Copyrightable Expression, Not Functional Elements
Both judges acknowledged that the books are valuable for AI training because of their copyrightable expression and rejected the AI companies’ attempts to claim that they were only using the “functional elements” or “non-expressive elements” of the books. Specifically, Judge Alsup said “Anthropic came to value most highly for its data mixes books like the ones Authors had written, and it valued them because of the creative expressions they contained” and “[c]opies selected for inclusion in training sets were selected because they were complete and because they contained rich protectible expression.”
Similarly, Judge Chhabria rejected Meta’s argument that it “only used the plaintiffs’ books to gain access to their ‘functional elements,’ not to capitalize on their creative expression” by explaining that “Meta’s use of the plaintiffs’ books does depend on the books’ creative expression. As Meta itself notes, LLMs are trained through learning about ‘statistical relationships between words and concepts’ and collecting ‘statistical data regarding word order, frequencies [what words are used and how often], grammar, and syntax.’ Word order, word choice, grammar, and syntax are how people express their ideas.”
Licensing Markets Under the Fourth Fair Use Factor
The fourth fair use factor, the effect of the use on the actual and potential markets, is indisputably the most important of the four factors. Despite this, the direct licensing market consideration in factor four was virtually nonexistent in both decisions. With very little explanation, both judges concluded that there was no market harm because copyright owners are not legally entitled to the AI training market.
Judge Alsup’s discussion of licensing markets merely states that “an emerging market for licensing [] works for the narrow purpose of training … is not one the Copyright Act entitles Authors to exploit.” He provides no explanation whatsoever for this incorrect conclusion. Judge Chhabria’s discussion is even shorter, as he spends a mere two sentences concluding (without explanation) that “to prevent the fourth factor analysis from becoming circular and favoring the rightsholder in every case, harm from the loss of fees paid to license a work for a transformative purpose is not cognizable.”
Both judges are incorrect because they ignore the important reality that a robust emerging market for licensing of AI training material already exists. Licensing markets under the fourth factor may only be circular and non-cognizable when the market being considered is a potential licensing market and the judge is trying to determine whether that potential market is too speculative. But when there is an actual market that already exists, the circularity argument has no place, and both judges were incorrect to summarily claim the argument is circular. In fact, one could argue that by denying an actual market exists and saying that copyright owners have no right to those markets because of fair use, it is the judges, not the copyright owners, who are engaging in circularity.
Narrow Holdings Are Limited to Specific Uses at Issue
Both judges made very clear that their decisions were very narrow and limited to the facts before them. For example, both judges made clear that their decisions may have been completely different had there been similarity in the output or had the infringing uses been different.[2] Thus, the decisions in both cases should have little adverse impact on cases where the plaintiffs have shown similarity in output or different infringing uses, as in Concord v. Anthropic, NYT v. OpenAI, Getty v. Stability AI, Disney v. Midjourney, and many other cases. Judge Chhabria went even further to narrow the impact of the Kadrey decision by stating that “[t]here is certainly no rule that when your use of a protected work is ‘transformative,’ this automatically inoculates you from a claim of copyright infringement” and “this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.” In fact, Judge Chhabria’s order has already been recognized by many as providing a roadmap for how to win an AI infringement case.
Differences Between the Two Cases
Comparison of AI Training to Human Learning in the First Factor
Throughout the Bartz decision, Judge Alsup regularly compares AI training to human learning to justify his flawed conclusions related to transformative use under the first fair use factor and licensing markets under the fourth fair use factor. Judge Chhabria is highly critical of Judge Alsup’s comparison of human learning and AI training saying “using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.”
Again, on page 17 of the Kadrey decision, Judge Chhabria chastises Judge Alsup for his inapt comparison, saying that:
“[A]n LLM’s consumption of a book is different than a person’s. An LLM ingests text to learn ‘statistical patterns’ of how words are used together in different contexts. It does so by taking a piece of text from its training data, removing a word from that text, predicting what that word will be, and updating its general understanding of language based on whether it was right or wrong—and then repeating this exercise billions or trillions of times with different text. This is not how a human reads a book.”
and
“[U]nlike the hypothetical professor [that Judge Alsup references in the Bartz case], Meta did not just give the plaintiffs’ books to one person. Any person can use that tool to help them create further expression, whether by having it help them brainstorm or research for a creative writing project (like plaintiff David Henry Hwang, a playwright and screenwriter) or by having it write code to develop new software programs (like Lockheed Martin). By creating a tool that anyone can use, Meta’s copying has the potential to exponentially multiply creative expression in a way that teaching individual people does not.”
Between the two, Judge Chhabria is clearly correct.
The chasm between Judge Alsup and Judge Chhabria on human learning and AI training could not be any deeper. When the cases are appealed, if the Ninth Circuit agrees with Judge Chhabria, as I suspect it will, then the first and fourth fair use factor analysis relating to AI training in Bartz completely falls apart, as does Judge Alsup’s holding that AI training on legitimate works is fair use.
Market Dilution Under the Fourth Fair Use Factor
Another area where the two judges disagree is how to handle market dilution under the fourth fair use factor. In Bartz, Judge Alsup (leaning heavily on his human learning analogy again) simply says that the “Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act.” In doing so, he summarily dismisses, without ever truly considering, the harm caused by market dilution due to indirect substitution.
In contrast, Judge Chhabria delivers a very thoughtful and lengthy discussion of the indirect substitutional impacts that could harm the copyright owners’ actual and potential markets. He explains that:
“Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So, by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.”
And again, later he explains that:
“This case… involves a technology that can generate literally millions of secondary works, with a miniscule fraction of the time and creativity used to create the original works it was trained on. No other use—whether it’s the creation of a single secondary work or the creation of other digital tools—has anything near the potential to flood the market with competing works the way that LLM training does. And so the concept of market dilution becomes highly relevant.”
Finally, he concludes (with what seems like an admonishment of Judge Alsup) by saying:
“Courts can’t stick their heads in the sand to an obvious way that a new technology might severely harm the incentive to create, just because the issue has not come up before. Indeed, it seems likely that market dilution will often cause plaintiffs to decisively win the fourth factor—and thus win the fair use question overall—in cases like this.”
Again, Judge Chhabria seems to have the much better argument here. Whether the Ninth Circuit adopts his views or not remains to be seen, but it is very clear that, unlike Judge Alsup who took the more pedestrian route, Judge Chhabria understands and appreciates that the advent of AI technology and the scale of its impact on copyright’s incentives to create and distribute new copyrighted works for the public to enjoy requires a more thoughtful approach that is more consistent with the spirit and purpose of copyright and the fair use defense.
Impact of Using Pirated Works
Another important area where the two decisions vary immensely is in their consideration of the use of pirated works from shadow libraries, like Z-Library and Books3. Surprisingly, in Kadrey Judge Chhabria excused Meta’s use of pirated works from shadow libraries because he concluded that (i) there was no evidence that Meta’s acts benefitted “these libraries or perpetuated their unlawful activities—for instance, if they got ad revenue from Meta’s visits to their websites;” and (ii) use of these pirated works was justified because they were eventually used by Meta for AI training purposes, which he found to be a transformative use. This approach is tremendously misguided and dangerous. To immunize a defendant for engaging in massive acts of piracy so long as they can show that somewhere downstream the pirated copies were used for a purpose that qualifies as fair use is not much different than allowing someone to burgle a house because they needed medication located inside or to steal a car because they needed to drive to the grocery store to pick up food. That is not how the law works, and certainly not how copyright law works.
Judge Alsup understood this. In his decision, he takes a completely different approach to Anthropic’s use of pirated copies from illicit online websites and services to build a digital library, finding that such acts disqualified Anthropic from availing itself of a successful fair use defense. Judge Alsup properly rejected the notion that use of pirated works can qualify as a fair use and made this abundantly clear by stating that “piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded” and “Anthropic is wrong to suppose that so long as you create an exciting end product, every ‘back-end step, invisible to the public,’ is excused.” He then goes on to correctly distinguish the use of pirated works by Anthropic from many other fair use cases. In sum, this is yet another area where the decisions by the two judges could not be more different.
Importantly, the three differences noted above are not edge issues; they are crucial to the analysis and eventual outcome in these cases. It should be very interesting to see how the Ninth Circuit handles these discrepancies, but until such time we are left with complete copyright chaos in California.
[1] There is much the court got wrong in its analysis that might not only be corrected on appeal but also affect another court’s willingness to adopt the same reasoning.
[2] As noted above, a focus on similarity of output in the first factor is incorrect.