The Unauthorized Use of Copyrighted Material as Training Data
As the world of technology continues to evolve, one of its most intriguing phenomena, artificial intelligence (AI), has taken center stage. While these new technologies offer exciting creative opportunities, copyright owners are beginning to challenge AI developers in the courts over the permissionless use of their copyrighted works for the training of these AI tools. These AI copyright cases could potentially clarify the intersection of AI and copyright law, at least on the input side of this innovative new technology.
Getty Images Lawsuit Against Stability AI – United States
With regard to the copyright management information (CMI) claims, Getty Images argues that the output generated by Stable Diffusion often contains a modified version of Getty Images’ watermark, “underscoring the clear link between the copyrighted images that Stability AI copied without permission and the output its model delivers.” The complaint goes on to allege that Stability AI knowingly falsified, removed, or altered Getty Images’ watermarks and metadata with the intent to induce, enable, facilitate, or conceal infringement of Getty Images’ copyrights.
As one of the first infringement lawsuits brought against a developer of a generative AI tool for unauthorized use of copyrighted materials for training purposes, the case has the potential to influence future development of AI systems by addressing principal copyright issues like fair use. While Stability has not yet filed a response to the complaint, it will almost surely adopt the position of other AI developers such as OpenAI, which claim that training AI on copyright-protected materials qualifies as a transformative purpose that weighs heavily in favor of fair use.
The outcome of this case could also significantly affect whether creators’ and copyright owners’ ability and right to license their works under the Copyright Act will continue to be undermined in the AI context, jeopardizing the livelihoods and crafts of millions of human creators. The lawsuit also touches on another feature of copyrighted works from which AI developers draw incredible value: the metadata cleaning and tagging that streamlines AI training. Copyright owners would have far less incentive to provide that value if AI developers are allowed to use their works without permission.
Getty Images Lawsuit Against Stability AI – United Kingdom
Earlier in January 2023, Getty Images announced a lawsuit against Stability AI in the High Court of Justice in London. Similar to the allegations in the U.S. lawsuit, Getty Images claims that Stability AI infringed upon Getty Images’ copyrighted images and works by using them to train Stability’s AI.
While the full details of Getty Images’ UK lawsuit have yet to be made public, the case, similar to its U.S. lawsuit, could have a significant impact on the unauthorized use of copyrighted material for AI systems in the United Kingdom.
The case will be one to closely watch, particularly as the United Kingdom government announced its intent to reconsider a problematic proposal that would have created a broad exception for use of copyrighted works for any AI training. This case has the potential to spark more international conversations about how AI systems and the use of copyrighted works as training material will be treated in different jurisdictions globally.
Visual Artists’ Class-Action Lawsuit Against Stability AI, Midjourney, and DeviantArt
On January 13, 2023, award-winning visual artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed a class action complaint in the United States District Court for the Northern District of California, San Francisco Division, against defendants Stability AI Ltd. and Stability AI, Inc., Midjourney, Inc., and DeviantArt, Inc. The plaintiffs allege that their works were used without permission as input materials to train and develop various AI image generators, including Stable Diffusion and DreamStudio (Stability AI), the Midjourney Product (Midjourney), and DreamUp (DeviantArt). The plaintiffs also assert that Stability AI generated reconstructed copies of the plaintiffs’ works, which they argue qualify as unauthorized derivative works. The plaintiffs point out that the defendants reap substantial commercial benefits and profits from the value of these copyrighted images, highlighting that images the defendants’ AI machines generate “‘in the style’ of a particular artist are already sold on the internet, siphoning commissions from the artists themselves.” The plaintiffs also argue that the defendants are liable for vicarious copyright infringement and violated the Digital Millennium Copyright Act (DMCA) by altering or removing copyright management information from the images owned by the plaintiffs and programming the AI to omit any CMI as part of its output.
Similar to the case Getty filed in the United States, this AI copyright case could have a lasting impact on whether training AI systems on copyrighted works qualifies as fair use and whether the output of a generative AI system qualifies as a derivative of the works it is trained on. Unlike the Getty case, which makes clear that Stable Diffusion only “at times” produces images that are derivative of Getty’s copyrighted works that Stability AI copied, the visual artist plaintiffs make the broader claim that all of Stable Diffusion’s output is derivative of the works it trains on. In the context of a class action, this claim may be tough to demonstrate. While the complaint includes an example of an instance in which the plaintiffs allege a derivative work was generated from source images, extrapolating that claim to cover all of the AI system’s output would likely be very difficult to prove.
Further, the complaint refers to Stable Diffusion as a “21st century collage tool,” a characterization that appears intended to oversimplify the AI machine. However, it should be noted that collage is an artistic medium that often employs unique skills and techniques to create original works that qualify for copyright protection. Even when collage artists use copyrighted material without authorization, the use may qualify as fair use. Arguing that Stable Diffusion’s use of copyrighted materials results in collages that can never qualify for the fair use exception may be too broad an allegation. Lastly, as this case develops and transitions into the discovery phase, it will be interesting to learn the quantity of allegedly infringed works and how the court will attempt to certify that the proposed class of works consists of registered works.
Programmers’ Class Action Lawsuit Against GitHub
On November 3, 2022, a class action lawsuit was filed in the United States District Court for the Northern District of California, San Francisco Division, by a group of anonymous programmers against Microsoft, GitHub (a Microsoft subsidiary), and OpenAI, alleging a violation of Section 1202 of the DMCA for unauthorized and unlicensed use of the programmers’ software code to develop the defendants’ AI machines, Codex and Copilot. Both are assistive AI-based systems offered to software programmers and trained on a large collection of publicly accessible software code and other materials, including the allegedly infringed software code created by the plaintiffs.
Plaintiffs contend that Microsoft and GitHub used the plaintiffs’ materials without complying with open-source licensing terms, resulting in an unlawful reproduction of the plaintiffs’ copyrighted code and violating various attribution requirements under the licenses. While the complaint does not include the type of traditional copyright infringement claims seen in the other cases discussed above, it alleges that OpenAI violated Section 1202 of the DMCA, which makes it unlawful to provide or distribute false CMI with the intent to induce or conceal infringement.
In January, Microsoft and OpenAI filed motions to dismiss in the case, arguing that the plaintiffs lacked standing to bring the case because they failed to argue they suffered specific injuries from the companies’ actions. The companies also argued that the lawsuit did not identify particular copyrighted works they misused or contracts that they breached.
As the case proceeds, it will be interesting to see how the court applies the requirement of attribution and the provisions of Section 1202 when, as OpenAI argues, no copyrighted works have been identified. An additional hurdle for the plaintiffs’ Section 1202 claim is that the statute holds a defendant liable only if they intentionally altered or removed CMI knowing that such conduct would “induce, enable, facilitate, or conceal infringement.” The outcome in this AI copyright case could have a significant impact on different AI industries and on how AI system developers approach attribution and licensing practices when using copyrighted works for AI training.
Thomson Reuters Enterprise Centre v. ROSS Intelligence Inc.
In May 2020, Plaintiffs Thomson Reuters Enterprise Centre GmbH (“Thomson Reuters”) and West Publishing Corporation (“West”) sued Defendant ROSS Intelligence Inc. (“ROSS”) in the United States District Court for the District of Delaware for copyright infringement relating to the unlawful use of the plaintiffs’ unique platform capabilities. Plaintiffs operate and market Westlaw, a widely known legal search platform used throughout the legal industry. ROSS developed a new AI-based legal search platform and, to do so, partnered with LegalEase Solutions, LLC, to improve ROSS’s search tool. According to plaintiffs, however, LegalEase “used a bot … to download and store mass quantities of [plaintiff’s] proprietary information,” which it then provided to ROSS.
Plaintiffs alleged that LegalEase’s activities constituted copyright infringement because LegalEase used plaintiffs’ headnotes to assist ROSS in formulating questions, used key numbers and headnotes to locate judicial opinions, and at one point assisted ROSS in classifying cases under certain legal topics. After the court denied ROSS’s motion to dismiss, concluding that the plaintiffs had adequately pleaded their copyright claims, ROSS filed a motion for summary judgment in early January 2023, asserting its affirmative defense of fair use.
ROSS argues that (1) the use of Westlaw’s content was functional and transformative, (2) the copyright protection for the copied Westlaw materials is “thin,” (3) the amount used holds little weight because “any copying was intermediate and the final ROSS product does not contain any copyrighted materials,” and (4) ROSS’s product did not replace the market for Westlaw’s works.
On February 6, Thomson filed its opposition to ROSS’s motion, arguing that (1) ROSS’s purpose in using the Westlaw Content was to create a legal research product that would compete with and replace Westlaw, without any further transformative purposes, (2) the Westlaw content is creative, which weighs against fair use and undermines ROSS’s claim it did not copy protectable content, (3) the copying was both qualitatively and quantitatively substantial, and (4) ROSS harmed the market for Westlaw content by taking and using Westlaw content to simply generate a ROSS product to displace Westlaw’s product.
In addition to addressing the question of whether training AI on copyrighted materials constitutes transformative fair use, this case is likely to provide a unique opportunity to understand how courts will analyze a fair use defense related to AI training on materials that include legal opinions, which, while themselves not subject to copyright protection, are accompanied by creative expressive materials created and owned by Thomson Reuters. Furthermore, the court is likely to consider whether scraping material for AI training purposes from a website in violation of its terms of service results in breach of contract liability.
UAB Planner 5D v. Facebook, Inc.
In 2019, UAB Planner 5D filed a complaint in the United States District Court for the Northern District of California for copyright infringement and trade secret misappropriation against Facebook, Inc., Facebook Technologies, LLC, and The Trustees of Princeton University. Planner 5D, a Lithuanian company, operates a home design website that allows users to create virtual interior design scenes using a library of virtual objects (such as tables, chairs, and sofas) to populate the scenes. Planner 5D claimed it is the copyright owner of the three-dimensional objects and scenes, as well as of the compilation of the objects and scenes.
Planner 5D alleged that computer scientists at Princeton downloaded the entirety of Planner 5D’s collection of objects and scenes because of the collection’s uniquely large and realistic qualities. It also alleged that Princeton not only used this data for its own research purposes but also posted the data to a publicly accessible Princeton URL, labeling it the “SUNCG dataset.” Planner 5D further alleged that Facebook was interested in its objects and scenes collection, which would help Facebook tap into the commercial potential of scene recognition technology. After the court granted a motion to dismiss the copyright claims in July 2020 for Planner 5D’s failure to show that its objects and scenes are subject to copyright protection, Planner 5D amended its complaint, and the court denied a second motion to dismiss by Facebook in April 2021.
On February 17, 2023, Facebook filed a motion for summary judgment arguing that Planner 5D cannot establish ownership of a valid copyright. The defendants argue that the discovery phase “confirmed that Planner 5D’s works are data files that cannot be copyrighted as computer programs, as literary works as they lack human authorship, or as pictorial works because they lack originality.” Because of these findings, the defendants jointly moved for summary judgment on all of Planner 5D’s claims. Oral arguments will take place Wednesday, July 12, 2023, in San Francisco, California.
While the core claims of this case concern copyright infringement, the lawsuit also touches on another interesting AI copyright dispute: human authorship versus AI generation. There is a strong likelihood that Facebook’s defense that Planner 5D’s scenes and objects are not human authored may be strengthened by recent announcements by the U.S. Copyright Office (USCO). In response to Kris Kashtanova’s attempt to register their graphic novel, Zarya of the Dawn, the USCO granted registration only to what it deemed the human-authored elements within the work, concluding that the images in the work generated by AI technology were not the product of human authorship and were not included in the scope of copyright protection in the registration.
Additionally, the USCO released an AI registration policy statement to clarify its practices for examining and registering works that contain AI-generated material. With regard to the Office’s application of the human authorship requirement, the statement clarifies that “in the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of ‘mechanical reproduction’ or instead of an author’s ‘own original mental conception, to which [the author] gave visible form.’” The USCO’s recent correspondence will likely influence the outcome of this case, particularly its AI copyrightability aspects.
As these AI copyright cases proceed and new cases arise (including ones on AI authorship which are discussed in part two of this blog series), the U.S. Copyright Office and the courts will continue to consider important issues surrounding the unauthorized use of copyrighted materials for training AI systems. While the outcomes of these disputes are sure to impact the development of AI systems, it’s essential that the foundational principles of copyright law are recognized and that the rights of creators and copyright owners are upheld.
If you aren’t already a member of the Copyright Alliance, you can join today by completing our Individual Creator Members membership form! Members gain access to monthly newsletters, educational webinars, and so much more — all for free!