Facilitating Efficient and Effective Copyright Licensing for AI

Post publish date: January 23, 2024

AI models have an almost insatiable appetite for content. To date, the vast majority of training content, whether books or blogs or songs or images, has been scraped from the web without authorization from the copyright owners. Many AI companies argue that the use of copyrighted content in this context is “fair use” and does not require a license or compensation. But copyright owners are crying foul, and lawsuits are being filed at a rapid pace, including leading lawsuits from the Authors Guild and most recently the New York Times. Although fair use may be a defense to copyright infringement, only courts can determine if unauthorized copying is justified after performing a complex, fact-based, multi-factor assessment.

Big technology companies occasionally acknowledge that copyright owners deserve to be compensated, and a few early licensing deals have been concluded. Sam Altman, CEO of OpenAI, has repeatedly positioned himself as a moderate on the issue, saying: “We’re trying to work on new models where if an AI system is using your content, or if it’s using your style, you get paid for that.”

However, other AI stakeholders are arguing that copyright owners are not entitled to payments for the use of their works to train AI platforms, and moreover, that any such requirements will stifle innovation and push it offshore. Writing to the US Copyright Office, the well-known VC firm Andreessen Horowitz put it this way: “The bottom line is this, imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development.”

In reality, litigation rather than licensing is more likely to kill or disincentivize AI innovation. Companies may be reluctant to use or develop AI technology if doing so exposes them to lawsuits, and they will spend tens of millions of dollars defending those suits during the years it will take courts to sort out the law. Companies can, however, establish license agreements now, either directly or through third parties. In fact, setting these commercial precedents is likely to influence regulators in a way that benefits copyright owners and AI developers alike, in part by proving that the commercial licensing of content will not be fatal to AI innovation.

For AI development to continue at a rapid pace, a layer of service providers will need to emerge to enable these transactions without introducing additional friction into the system. This new class of service providers will license content from aggregation points (i.e., publishers, agents, etc.), process the data with metatags and tokens, and license the data for training AI models. Doing so will set important precedents for the generative AI industry while creating new revenue streams for copyright owners.

Calliope Networks is one such service provider, founded by executives with experience in AI and copyright licensing. Its strategy is to aggregate books from authors and publishers and then process them for licensing and ingestion by AI models. Although AI systems have benefited from the use of free content, new academic research suggests that higher quality aggregated works are significantly more valuable for effective training of the large language models (LLMs) that underlie generative AI systems. As a recent research report from Microsoft put it, “High quality data can… improve the state-of-the-art of LLMs, while dramatically reducing the dataset size and training compute.”

One of Calliope Networks’ goals is to facilitate the legitimate use of copyrighted content by generative AI systems, ensuring that as AI evolves, so too does respect for intellectual property. Calliope Networks is developing a platform that can operate at a speed and scale that enables the growth of generative AI rather than burdening it.

Companies like Calliope Networks aren’t limited to the processing and licensing of original content. Calliope Networks in particular is also setting its sights on advanced products designed to enhance copyrighted works in a licensed, monetized fashion. Co-founder and CTO Jim Golden explains, “Our vision transcends mere compliance. We aim to expand the creative landscape by enabling ancillary AI-generated products that can, for example, enrich the experience of reading a book, but do so in a manner that ensures that the authors and publishers benefit appropriately.”

Copyright owners need to act to protect their right to control and be compensated for the use of their works. There is no doubt that the development of AI may represent a perilous and frightening future for creators. But the situation today is already the worst-case scenario: works are being used to create competing works, and copyright owners are not being compensated. If copyright owners ignore the situation or simply refuse to demand commercial engagement with AI companies, courts and regulators may begin to believe the cries from Silicon Valley that the only way to ensure continued AI innovation is to broaden the fair use doctrine.

Licensing will not be a panacea. It may not provide all of the protections that publishers and authors want; it will not protect against the eventual creation of AI-generated books, for example. But refusing to engage in the licensing of copyrighted works may only serve to cede the battlefield to the technology companies and perpetuate the worst-case scenario of unauthorized, uncompensated use of creative works for years to come.

About the Author: Dave Davis is the CEO of Calliope Networks.  He previously served as Chief Commercial Officer of the Motion Picture Licensing Corporation and was an executive at Twentieth Century Fox, Paramount Pictures, NBCUniversal, and the Motion Picture Association.  Dave has a BA from Wesleyan University and a JD from the University of Michigan School of Law.  He can be contacted at Dave@calliopenetworks.ai


If you aren’t already a member of the Copyright Alliance, you can join today by completing our Individual Creator Members membership form! Members gain access to monthly newsletters, educational webinars, and so much more — all for free!

AI and Copyright in 2023: In the Courts

Post publish date: January 4, 2024

On January 3, we published part one of this blog series summarizing the biggest copyright-related AI activities that took place within the federal government. In today’s post, we pick back up with the AI-copyright theme, focusing on the multitude of lawsuits filed last year against AI developers by a range of creators and copyright owners, mostly over the unauthorized use of copyrighted works for AI training purposes. Here are some highlights from those court cases and what to expect in 2024.

AI and Copyrightability

There was one court case, Thaler v. Perlmutter, that took on the important issue of whether, and if so when, something created using AI can be protected by copyright. Dr. Stephen Thaler sought to register a 2-D image generated by an AI machine called the “Creativity Machine” as a work made for hire because he was the owner of the AI system. The Copyright Office denied the registration application and, in early 2022, affirmed the denial on the basis that Thaler failed to show the requisite human authorship in the work and that the work could not qualify as a work made for hire. Thaler sued, and in the summer of 2023 Judge Beryl Howell of the U.S. District Court for the District of Columbia issued an opinion agreeing with the Office “that human authorship is an essential part of a valid copyright claim” and is “a bedrock requirement of copyright.” In October, Thaler filed a notice of appeal with the U.S. Court of Appeals for the District of Columbia Circuit. So, this case will continue, and we should expect a decision by the appellate court sometime in 2024. We might also see court challenges arise from other instances in which the Copyright Office refused registration for works where AI-generated elements and human authorship were intertwined.

Last year, there were thirteen new copyright-related lawsuits filed against AI companies, the majority of which were filed as class actions. At the heart of these complaints, visual artists, book authors, songwriters, and other creators and copyright owners allege infringement of their copyrights resulting from the ingestion of protected works to train AI models. The sheer number of these lawsuits and the pace at which they were filed are not surprising: the capabilities of AI technologies have exploded, and AI companies have failed to meaningfully address or remedy the harms to creators and copyright owners caused by the mass scraping and unauthorized use of expressive works to train commercial AI models. In the past year, a few of these cases moved forward with court decisions on various motions. In those cases, it has become evident that while courts may not be convinced by some of the other legal claims being brought, AI companies’ attempts to dismiss the direct copyright infringement claims arising from ingestion have either failed or not even been attempted. Here are the AI and copyright law cases in which there were notable developments.

Doe v. GitHub

The GitHub case is one of two cases mentioned in this blog post that were filed prior to 2023. In late 2022, a group of GitHub programmers filed a class-action lawsuit against Microsoft and OpenAI for allegedly violating their open source licenses and scraping their code to train Microsoft’s AI tool, GitHub Copilot. On May 11, the district court for the Northern District of California issued an order granting in part and denying in part the motions to dismiss made by Microsoft and OpenAI. Many claims were dismissed with leave to amend, and the order states that the plaintiffs must identify specific instances of their code reproduced by Copilot or Codex to strengthen their property rights claims. The plaintiffs filed a first amended complaint on July 21, followed by renewed motions to dismiss from OpenAI and Microsoft.

Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.

The Ross case is the second of the two cases mentioned in this blog post that were filed prior to 2023; we include it here because of the action taken in the case in 2023. In 2020, Thomson Reuters sued Ross Intelligence, a competing legal research service, for copyright infringement, alleging that Ross obtained copyrighted works from a Westlaw subscriber to develop its own competing product based on machine learning. The claims allege that an AI bot systematically mined, collected, and downloaded content from the Westlaw database. On September 25, the district court for the District of Delaware issued a memorandum opinion largely denying the parties’ cross-motions for summary judgment. The court explained that there is still a genuine factual dispute over the copyrightability of Westlaw’s headnotes and that, although Ross actually copied portions of bulk memos, the question of substantial similarity must be decided by a jury.

Andersen v. Stability AI

On January 13, artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed the first class-action lawsuit of the year against Stability AI, Midjourney, and DeviantArt in the Northern District of California, alleging copyright infringement and right of publicity violations for the use of the plaintiffs’ works in training datasets for the AI image-generating platforms Stable Diffusion, the Midjourney Product, DreamStudio, and DreamUp. In October, the court largely granted the defendants’ motions to dismiss but gave the plaintiffs leave to amend their claims. Though the headlines tended to frame the decision as a loss for the creative community, that framing was inaccurate because the most important claims in the case were not dismissed (and, to a lesser extent, because the plaintiffs were given leave to amend). Importantly, the court denied Stability AI’s motion to dismiss the plaintiffs’ direct copyright infringement claims with respect to the images scraped and ingested into the LAION training datasets used to train Stable Diffusion, and held that the plaintiffs’ assertions that their works had likely been used in the LAION datasets, per results from the “Have I Been Trained” website, adequately supported the infringement claims at this stage of the lawsuit.

Getty Images v. Stability AI

On February 3, Getty Images filed a copyright and trademark infringement suit against Stability AI in the U.S. District Court for the District of Delaware alleging that Stability AI “copied more than 12 million photographs from Getty Images’ collection, along with the associated captions and metadata, without permission from or compensation to Getty Images, as part of its efforts to build a competing business.” In addition to willful and intentional copyright infringement claims, Getty also alleged that Stability AI removed or altered copyright management information (CMI), provided false copyright management information, and infringed Getty Images’ trademarks. The case is still in the discovery phase. A parallel lawsuit filed in the United Kingdom will go to trial in 2024.

Tremblay v. OpenAI

On June 28, two authors of literary works filed a class-action lawsuit in the U.S. District Court for the Northern District of California accusing OpenAI of copyright infringement related to the unauthorized use of plaintiffs’ works to train its proprietary LLM, ChatGPT. The complaint alleges that OpenAI harvested mass quantities of literary works through illegal online “shadow libraries” and made copies of plaintiffs’ works during the training process. In addition to claims for direct infringement, the complaint alleges that every output of ChatGPT is an infringing derivative of plaintiffs’ works for which OpenAI is vicariously liable. On August 28, OpenAI filed a motion to dismiss the “ancillary claims” of vicarious infringement, violation of the Digital Millennium Copyright Act (DMCA), unfair competition, negligence, and unjust enrichment, but importantly, like Meta in its ongoing lawsuit, did not respond to the direct infringement claim, which OpenAI says it “will seek to resolve as a matter of law at a later stage of the case.”

Kadrey v. Meta and Silverman v. OpenAI

On July 7, Sarah Silverman, Christopher Golden, and Richard Kadrey brought two separate class-action lawsuits in the district court for the Northern District of California against Meta and OpenAI.  In the first lawsuit against OpenAI, the plaintiffs accused OpenAI of copyright infringement related to the unauthorized use of plaintiffs’ books to train its proprietary LLM, ChatGPT. The complaint alleges that OpenAI harvested mass quantities of literary works through illegal online “shadow libraries” and made copies of plaintiffs’ works during the training process. In addition to claims for direct infringement, the complaint alleges that every output of ChatGPT is an infringing derivative of plaintiffs’ works for which OpenAI is vicariously liable. Also included in the lawsuit were claims under the DMCA for the removal of CMI under section 1202(b), as well as claims for unfair competition, negligence, and unjust enrichment. In the second lawsuit, the plaintiffs accused Meta of copyright infringement related to the unauthorized use of plaintiffs’ books to train its proprietary LLM, LLaMA, and made similar allegations and claims as in the lawsuit against OpenAI.

In November, the court granted Meta’s motion to dismiss (with leave to amend), rejecting plaintiffs’ claims that the LLaMA model itself is an infringing derivative work and that every output of the model qualifies as an infringing derivative of the input. It explained that “plaintiffs are wrong to say that, because their books were duplicated in full as part of the LLaMA training process, they do not need to allege any similarity between LLaMA outputs and their books to maintain a claim based on derivative infringement.” Rejecting the 1202(b) violation claims, the court found that “there are no facts to support the allegation that LLaMA ever distributed the plaintiffs’ books, much less did so ‘without their CMI.’” The order also dismissed the unjust enrichment and negligence claims. Meta’s motion to dismiss did not challenge the direct copyright infringement claims arising from unauthorized copying of the books to train the LLaMA model, which means those claims survive and will be taken up by the court.

J.L. v. Alphabet Inc.

On July 11, a group of anonymous plaintiffs filed a class-action lawsuit against Google for the use of personal information and various copyrighted works to train its AI models. Among other claims, the plaintiffs allege direct and vicarious copyright infringement and DMCA violations for removal of CMI. The complaint alleges that Google’s LLM, Bard, is able to generate summaries of copyrighted books or output that reproduces verbatim excerpts from copyrighted books. In addition to damages, the plaintiffs requested an injunction compelling the establishment of an independent AI council to monitor and oversee Google AI products and the destruction and purging of class members’ Personal Information, which includes copyrighted works and creative content. In October, Google filed a motion to dismiss, which the court has yet to rule on.

Chabon v. OpenAI & Chabon v. Meta

On September 8, a group of authors, including Michael Chabon, filed a class-action lawsuit in the district court for the Northern District of California alleging direct and vicarious copyright infringement and removal of CMI, as well as state-related claims including unfair competition and negligence, for copying and using the authors’ books in training ChatGPT. The complaint alleges that when prompted, ChatGPT provides extremely detailed summaries, examples, and descriptions of the authors’ works, and that the authors’ writing styles can be accurately imitated. On September 12, the same group of plaintiffs filed a similar lawsuit against Meta. No further significant actions were taken in the OpenAI case, but the case against Meta was consolidated in December with the previously mentioned Kadrey v. Meta lawsuit.

Authors Guild v. OpenAI Inc.

On September 19, the Authors Guild and a group of authors including David Baldacci, Mary Bly, John Grisham, George R.R. Martin, Jodi Picoult, and Roxana Robinson, filed a class action lawsuit against OpenAI in the district court for the Southern District of New York alleging copyright infringement claims over the mass ingestion of literary works to train ChatGPT and for infringing outputs generated by the AI machine. The complaint cites to examples of ChatGPT being prompted to generate detailed outlines of possible sequels to the plaintiffs’ works and accurate and detailed summaries of such works, including specific chapters of books. No further significant actions were taken on the case in 2023.

Huckabee v. Meta

On October 17, a group of authors including former Arkansas governor Mike Huckabee and best-selling Christian author Lysa TerKeurst filed a class-action lawsuit in the district court for the Southern District of New York against Meta, Microsoft, EleutherAI, and Bloomberg for direct and vicarious copyright infringement, removal of CMI, and various other state-law claims. The plaintiffs allege that the defendants infringed by using plaintiffs’ books to develop the defendants’ LLMs with the “Books3” training dataset. The lawsuit also asserts that AI research company EleutherAI is liable for copyright infringement for hosting and distributing “The Pile” dataset, which includes Books3. According to court dockets, toward the end of the year, defendant Bloomberg filed a letter with the judge seeking dismissal of the claims against it, the plaintiffs voluntarily dismissed EleutherAI from the lawsuit, and the case was transferred to the district court for the Northern District of California for the claims against Meta and Microsoft.

Concord Music Group, Inc. v. Anthropic PBC

On October 18, music publishers Universal Music Publishing Group, Concord Music Group, and ABKCO filed a lawsuit in the district court for the Middle District of Tennessee against the AI company Anthropic, alleging direct, contributory, and vicarious copyright infringement as well as CMI removal claims. The plaintiffs allege that Anthropic unlawfully copied and distributed plaintiffs’ musical works, including lyrics, to develop Anthropic’s generative AI chatbot, Claude, and that when prompted, Claude generates output that copies the publishers’ lyrics. The complaint claims that 500 works have been infringed and requests statutory damages of $75 million for copyright infringement. On November 22, Anthropic filed a motion to dismiss the lawsuit, arguing that the Middle District of Tennessee was not the proper district to hear the case. No decision has been rendered on the motion, but it’s not surprising that the technology company doesn’t want to go to trial in Tennessee, where a more sympathetic audience for the rights of songwriters and copyright owners might be found.

Sancton v. OpenAI

On November 21, a complaint was filed by a group of nonfiction writers against OpenAI and Microsoft in the Southern District of New York. The proposed class action lawsuit, led by Julian Sancton, accuses the companies of direct and contributory infringement related to the unauthorized use of plaintiffs’ literary works to train ChatGPT. Notably, the contributory infringement claims are directed at Microsoft for materially contributing to OpenAI’s direct infringement by providing investment money and supercomputing systems. No further significant actions were taken on the case in 2023, but it will be interesting to see how accused contributory infringers respond to the claims moving forward.

The New York Times Company v. Microsoft

To close out an already busy year of generative AI-related litigation, The New York Times Company (NYT) filed a lawsuit in late December against Microsoft and OpenAI in the Southern District of New York, alleging direct, vicarious, and contributory copyright infringement, removal of CMI under the DMCA, unfair competition, and trademark dilution claims over the copying and use of NYT’s copyright-protected works to train the ChatGPT model. After discussing the prevalence of NYT articles in the training datasets used to develop ChatGPT, the complaint provides evidence of ChatGPT generating verbatim outputs of significant portions of various NYT articles. Unlike some earlier-filed cases that included questionable claims that were subsequently rejected, the NYT case presents strong evidence of copying and clear claims that will be difficult for OpenAI to defend.

Conclusion

These cases will continue to unfold and progress in 2024. We also expect more lawsuits to be filed against AI companies in 2024, as we have seen complaints make stronger claims with clear evidence of infringement occurring during the AI ingestion and training processes. To stay apprised of AI and copyright news, visit our AI and copyright webpage and sign up for our AI Alert.


AI and Copyright Law in 2023: Federal Government Activities

Post publish date: January 3, 2024

2023 was an extremely busy year for artificial intelligence (AI). That was especially true for copyright issues related to AI, which sparked several Congressional hearings, a study by the U.S. Copyright Office, and other government engagements studying the intersection of AI and copyright. In part one of this blog series we highlight the most important AI copyright-related activities taking place in Congress, the federal government, and the U.S. Copyright Office which occupied our attention in 2023. In part two, we’ll explore the various court cases involving copyright and AI.

Throughout the year, the Copyright Office was extremely busy with a host of AI and copyright-related events and activities, including issuing official guidance on registering works with AI-generated elements, publishing opinion letters rejecting certain registration applications for works containing AI-generated elements, and hosting listening sessions on industry-specific AI and copyright law issues.

Policy Statement on Registering AI-Generated Output and Application of the Guidance to Applications: In March, the Copyright Office kicked off its AI activities by issuing a statement of policy clarifying its practices for examining and registering works that contain material generated by AI technology, effective immediately. The Office explained that human authorship is required for copyright protection, that it will refuse to register works solely generated by AI, and that any material that is not the product of human authorship must be disclaimed on a registration application. For more information and discussion about the Office’s registration guidance, check out our blog post.

Throughout the year, the Copyright Office applied this guidance to reject various applications for works containing AI-generated elements. Several of these decisions were made public by the Copyright Office. These include:

Kashtanova: The Copyright Office began proceedings to investigate the copyright registration for a graphic novel, Zarya of the Dawn, after the agency became aware of news reports indicating that the applicant, Kristina Kashtanova, created the work using the AI tool Midjourney. In February, the Office responded to Kashtanova’s letter defending the copyrightability of the work, explaining that it would reissue a certificate of registration that would not extend to any AI-generated material in the graphic novel. The Office stated that because the AI “generat[ed] the images in an unpredictable way” and the AI tool was “not controlled or guided” by Kashtanova, the images did not have sufficient human authorship.

Allen: In September, the Copyright Office rejected a second request for reconsideration made by an artist, Jason Allen, refusing to register a 2-D image titled Théâtre D’opéra Spatial because the work contained more than a de minimis amount of AI-generated work which Allen refused to disclaim on the registration application. The Office rejected Allen’s three claims to human authorship, stating: (1) the image, as generated by Midjourney, lacked human authorship because Allen’s sole contribution was to input text prompts into Midjourney; (2) the Office could not decide whether Allen contributed any human authorship to the image via adjustments made to the image via Adobe products because there was a lack of information; and (3) the use of Gigapixel AI to scale the image did not introduce new, original elements into the image and that these acts did not amount to authorship.

Sahni: In December, the Copyright Office published its review board opinion rejecting the registration application filed for a 2-D image titled Suryast. The application was filed by Ankit Sahni, who listed himself and his AI machine, the RAGHAV Artificial Intelligence Painting App, as co-authors of the image. (Sahni also filed a registration application for the same image in India listing the AI as a co-author; it was initially accepted but is now subject to a withdrawal and review process.) The Office concluded that the image was not a product of human authorship because the expressive elements of the pictorial authorship were not provided by Sahni. It found that Sahni’s three inputs (providing the base image, a style image, and the values used to have the AI generate the style) did not control how the expressive elements appeared in the output and did not amount to copyrightable contributions.

Listening Sessions: To begin the process of learning about the impact of AI on copyright, the Copyright Office hosted a series of listening sessions on generative AI and copyright in the spring so that it could hear from stakeholders and other interested parties. The Office held four sessions, each focusing on the impact of AI on a different type of work: the first covered literary works, the second addressed works of visual art, the third covered audiovisual works, and the final session covered music and sound recordings. Across the sessions, speakers addressed how AI tools are used by creators, the harms arising from AI ingestion of copyrighted works, AI licensing markets, the Office’s AI registration guidance, the effects of AI on creators, and many other issues. The information gathered during the sessions was intended to inform the Copyright Office’s approach to a formal study later in the year (see below).

AI Webinars: Following the listening sessions, the Copyright Office continued its examination of AI issues by hosting two webinars. In the first, titled Guidance for Works Containing AI-Generated Content, the Office provided examples applying its AI registration guidelines to different types of works and explained whether, when, and how AI-generated elements should be disclaimed in the electronic copyright application system. In the second, titled International Copyright Issues and Artificial Intelligence, presenters discussed how other countries are approaching copyright questions related to AI, such as AI authorship, AI training, exceptions and limitations, and infringement. Panelists also provided an overview of legislative developments in other regions and highlighted possible areas of convergence and divergence involving generative AI.

AI and Copyright Study: All of these activities culminated in the Office publishing a notice of inquiry (NOI) and request for comments in the Federal Register for its Artificial Intelligence and Copyright study at the end of the summer. The Office solicited comments to help it collect factual information and policy views relevant to copyright law and policy, and to inform the agency on issues involving the use of copyrighted works to train AI models, the appropriate levels of transparency and disclosure with respect to the use of copyrighted works, and the legal status of AI-generated outputs. By the time the dust had settled at the October 30 comment deadline, the Office had received around 10,000 initial comments, followed by over 600 reply comments at the December deadline. All of these comments can be found on the NOI docket webpage. The Copyright Alliance submitted comments and reply comments highlighting, among many other things, concerns surrounding infringement related to the unauthorized use of copyrighted works for training. Other commenters included thousands of individual creators, several Copyright Alliance members, and the Federal Trade Commission, which filed comments noting that the agency has an interest in copyright-related issues to the extent that AI can harm a creator’s ability to compete in markets, as well as other issues such as misleading information about the authorship of AI-generated works.

Biden Administration Activities

While the Copyright Office was swept up in a whirlwind of AI and copyright activities, the Biden Administration was also busily examining these issues as it began to be alerted to the ways generative AI technologies were harming creators and copyright owners. Here are a few highlights of AI and copyright-related activities from the Biden Administration over the past year as it looks at ways of addressing AI and copyright law issues under the AI Executive Order.

NTIA Solicits Comments on AI Accountability: In the beginning of the year, the National Telecommunications and Information Administration (NTIA) published a request for comments regarding self-regulatory, regulatory, and other measures and policies that are designed to provide assurance that AI systems are legal, effective, ethical, safe, and otherwise trustworthy. The Copyright Alliance filed comments discussing the need for increased accountability and transparency in the context of ingestion of copyrighted works by AI systems, and how respecting copyrighted works results in trustworthy AI systems.

NAIAC Hosts Briefing Session on AI and Copyright Issues: In September, the Biden Administration’s National Artificial Intelligence Advisory Committee (NAIAC) hosted a briefing session on IP and Copyright, featuring a presentation by Aaron Cooper, BSA | The Software Alliance; Keith Kupferschmid, CEO, Copyright Alliance; and Catherine Stihler, Creative Commons. Through this session, the NAIAC sought to better understand the changing landscape of AI and the challenges of adapting existing copyright regulations or possibly introducing new ones. The panelists addressed copyright concerns pertaining to music, film, and written works, and highlighted the potential pitfalls of using unlicensed data for training generative AI. Emphasis was also placed on the importance of reliable, diverse training data for AI and the fact that the U.S. Copyright Office was also studying the complex implications of generative AI technologies for copyright law.

FTC Hosts Roundtable on AI and Creative Economy Issues: A month later, the Federal Trade Commission (FTC) held an excellent roundtable titled Creative Economy and Generative AI during which a variety of professionals and representatives from a broad range of creative disciplines discussed how AI tools are reshaping their respective lines of work and how they’re responding to these changes. The roundtable illuminated the specific copyright and other concerns of the creative community, and the ways generative AI was harming creators’ and copyright owners’ markets and ability to engage in further creative endeavors. In her opening statement, FTC Chair Lina Khan acknowledged the unique challenges that AI-generated content poses to creative industries and recognized the importance of shaping regulatory policies in this rapidly evolving landscape. FTC Commissioner Alvaro Bedoya provided closing remarks, expressing profound concerns about the impact of generative AI on creators and the importance of preserving the uniqueness of human creativity. He emphasized that while AI may have expansive capabilities, it should not be expected to replace the genius of human creativity, arguing that the foundation of genius lies in people and that AI cannot extinguish human creativity. He also emphasized the need to consider new legal frameworks to address such developments, drawing parallels to the creation of the Federal Trade Commission in 1914 to address innovations in unfair competition. At the end of 2023, the FTC published a report titled Generative Artificial Intelligence and the Creative Economy Staff Report: Perspectives and Takeaways, summarizing the information provided to the agency during the roundtable. In its report, the FTC noted that “. . . targeted enforcement under the FTC’s existing authority in AI-related markets can help to foster fair competition and protect people in creative industries and beyond from unfair or deceptive practices.” The report also highlighted potential areas of further inquiry, including the effects (or lack thereof) of opt-out regimes adopted by AI companies, the status of “unlearning” research, and the long-term effects of AI companies’ uncompensated and unauthorized use of creators’ works.

President Biden Issues AI Executive Order: In October, President Biden signed the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (EO), which spans a number of AI-related topics. Section 5.2 of the EO, titled Promoting Innovation, addresses copyright in paragraph (c)(iii). It directs that, “within 270 days of the date of this order or 180 days after the United States Copyright Office of the Library of Congress publishes its forthcoming AI study that will address copyright issues raised by AI, whichever comes later,” the responsible official must consult with the Director of the United States Copyright Office and issue recommendations to the President on potential executive actions relating to copyright and AI. The order provides that the recommendations “shall address any copyright and related issues discussed in the United States Copyright Office’s study, including the scope of protection for works produced using AI and the treatment of copyrighted works in AI training.” The EO also directs the Secretary of Homeland Security, in consultation with the Attorney General, to develop a training, analysis, and evaluation program to mitigate AI-related IP risks and specifies the details of the program. A summary of the EO can be found in the White House’s Fact Sheet. It is encouraging to see that the Biden Administration is keyed into the fact that generative AI has affected the creative community, and it will be interesting to see the specific recommendations made once the U.S. Copyright Office publishes the reports from its AI and copyright study.

Congressional Activities

Members of Congress and their staff also spent a good portion of 2023 learning about and addressing the AI issues that affect and harm creative professionals. Although there were numerous Congressional hearings and bills introduced on AI issues as they affect the creative community, we highlight here a few hearings that focused specifically on AI and copyright law issues. Some of these hearings featured a representative mix of witnesses and speakers from the creative community who are affected by AI technologies. However, some hearings unfortunately did not provide the most affected members of the creative community with an opportunity to provide views. We hope this will change for future Congressional hearings and meetings in which AI and copyright concerns are up for discussion, and that a diverse set of creative voices can provide feedback on AI issues to our nation’s lawmakers.

House Judiciary IP Subcommittee: In May, the House Judiciary Subcommittee on Courts, Intellectual Property, and the Internet held a hearing titled Artificial Intelligence and Intellectual Property: Part I—Interoperability of AI and Copyright Law. Witnesses included Sy Damle, Latham & Watkins LLP; Chris Callison-Burch, Associate Professor of Computer and Information Science, University of Pennsylvania; Ashley Irwin, President, Society of Composers and Lyricists; Dan Navarro, Grammy-nominated songwriter, singer, recording artist, and voice actor; and Jeffrey Sedlik, President & CEO, PLUS Coalition. Lawmakers and witnesses discussed a myriad of pressing topics at the intersection of AI and copyright law, including the use of copyright-protected works in the training of generative AI models, copyright protection of works that were produced with the assistance of generative AI, and the economic impact of generative AI on creators and creative industries. Speaking about ingestion, Damle and Callison-Burch both incorrectly insisted that training AI systems categorically qualifies as a fair use. In response, former General Counsel for the U.S. Copyright Office, Jon Baumgarten, penned a letter to Members of the Subcommittee, warning them against these false categorical statements. Meanwhile, the creatives and artists on the panel—including Irwin, Navarro, and Sedlik—raised concern that generative AI systems are being trained using troves of rights holders’ works without their consent, credit, or compensation. Another issue that witnesses and members grappled with was whether the output of AI systems could be eligible for copyright protection. Finally, there seemed to be a consensus among panelists and members on the Subcommittee that transparency will be a key component to the responsible implementation of AI into our society.

Senate Judiciary IP Subcommittee: In July, the Senate Judiciary Committee’s IP Subcommittee held a hearing titled Artificial Intelligence and Intellectual Property—Part II: Copyright. Witnesses included Jeffrey Harleston, General Counsel and EVP of Universal Music Group; Karla Ortiz, Concept Artist, Illustrator, and Fine Artist; Matthew Sag, Professor of Law, Emory University School of Law; Dana Rao, EVP, General Counsel and Chief Trust Officer, Adobe; and Ben Brooks, Head of Public Policy, Stability AI. The Senators and panelists discussed various topics, including the ability of artists to opt out of AI training and the feasibility and technical effects of respecting those requests on AI development; the role and desirability of licensing copyrighted works for AI ingestion; voluntary, standardized, or other international frameworks around transparency in all stages of AI development and use; and the application of fair use law in the context of AI training. During the hearing, Ortiz highlighted that whether copyrighted works are licensed for AI use should be the artist’s choice, as is the status quo under copyright law—but that this fundamental right and ability to make this choice had been stripped from artists as copyrighted works are being used for AI training without any credit, compensation, or consent. Rao, citing the development process of Adobe’s Firefly platform, highlighted how AI systems can be trained on limited datasets that include licensed materials, as opposed to indiscriminately scraping and ingesting works without authorization or licensing.

Senator Schumer AI Insight Forum on Copyright: In November, Senator Chuck Schumer (D-NY) held the seventh closed-door AI meeting in the Insight Forum series. The meeting focused on transparency, explainability, and intellectual property and copyright issues. Speakers included Ali Farhadi, Allen Institute for AI; Ben Brooks, Stability AI; Curtis LeGeyt, National Association of Broadcasters (NAB); Danielle Coffey, News/Media Alliance; Jon Schleuss, News Guild; Vanessa Holtgrewe, IATSE; Duncan Crabtree-Ireland, SAG-AFTRA; Ben Sheffner, Motion Picture Association (MPA); Dennis Kooker, Sony Music Entertainment; Rick Beato, musician; and Ziad Sultan, Spotify. The speakers generally agreed that no legislative changes were necessary at this time because the issues were being played out in the courts. Speaker statements are available on the Senator’s webpage. Though some members of the creative industries or their collective representatives were present for the hearing, it was disheartening to see that individual creators and authors—critically from the visual arts and literary fields—were affirmatively not invited to speak and share their views at the meeting. These creators are the most immediately affected and harmed by generative AI technologies, as evidenced by the slew of AI and copyright-related lawsuits brought in the past year. The absence of these creators in this meeting casts a pall on any impact that this meeting otherwise would have had.

Senate Judiciary Subcommittee on Privacy, Technology, and Law: On May 16, the Senate Judiciary Subcommittee on Privacy, Technology, and Law held a hearing titled Oversight of A.I.: Rules for Artificial Intelligence. Witnesses included Samuel Altman, CEO, OpenAI; Christina Montgomery, Chief Privacy & Trust Officer, IBM; and Gary Marcus, Professor Emeritus, New York University. The hearing encompassed a wide range of concerns related to AI, including privacy, job disruption, copyright, licensing of AI products, and the impact of Section 230. The focus was on identifying the regulatory measures needed to address these concerns and ensure the responsible development and deployment of AI technologies. Senators Marsha Blackburn (R-TN) and Amy Klobuchar (D-MN) emphasized the importance of compensating creators and copyright owners for the use of their works to train AI. Senator Blackburn specifically suggested SoundExchange’s model. Altman informed members that OpenAI is actively developing a copyright system designed to provide compensation to artists whose work has been utilized in the creation of new content. He also said that content creators should have a say in how their voices, likenesses, and copyrighted content are used to train AI models, stating that “creators should deserve control over how their creations are used” and that OpenAI is working with artists and creators on licensing. However, again, regrettably, there was no one from the creative community invited to this panel to provide even a small window into the views of those most affected by generative AI technologies.  

Looking Forward to 2024

The federal government, whether in Congress, the executive offices, or at the U.S. Copyright Office, was extremely engaged on AI and copyright law issues in 2023. In 2024, we can expect the Copyright Office’s reports from its AI and copyright law study in addition to recommendations the Biden Administration will set forth based on the Copyright Office’s findings. We can also expect Congress to remain engaged in copyright law and AI issues, with perhaps several more hearings, as other AI developments, particularly in the courts, continue to unfold.

Stay tuned for Part 2 of this blog series, which will highlight some of the major AI and copyright law cases from 2023. For more information on AI and copyright issues in the meantime, visit our AI and Copyright webpage and sign up for our AI Alert for the latest news on this subject.


If you aren’t already a member of the Copyright Alliance, you can join today by completing our Individual Creator Members membership form! Members gain access to monthly newsletters, educational webinars, and so much more — all for free!

Generative AI, Copyrighted Works, and the Quest for Ethical Training Practices

Post publish date: December 14, 2023

The legal and ethical concerns surrounding generative artificial intelligence (AI) systems being trained on copyrighted works are currently under scrutiny, with the U.S. Copyright Office conducting an Artificial Intelligence Study to address such practices. The study aims to provide insights that will assist the Copyright Office and other stakeholders in better comprehending the extent to which AI systems may be infringing protected works. Simultaneously, both a White House Executive Order and the Federal Trade Commission (FTC) are also actively exploring the potential adverse effects of AI on copyright owners and their interests, along with numerous Congressional officials holding hearings, forums, and meetings.

Hundreds of creative and copyright community members—large and small, individuals and organizations, and spanning all disciplines of creativity—have shared their perspectives in their comments submitted to the Copyright Office’s study on the use of copyrighted works to train AI systems. From concerns about loss of control over their creative works to issues related to loss of revenue, these submissions reflect the opinions of the creative and copyright community. Below are just a few of the comments received by the Office from creative stakeholders.

Association of American Publishers (AAP): Believes Congress and the Copyright Office should monitor the training of generative AI systems. AAP, an organization that represents many of the largest book, journal, and education publishers across the United States, expressed concern for its members, stating in its initial AI Study comments that “The wholesale reproduction of copyrighted works for purposes of training and developing AI systems is infringement,” that “the unlicensed ingestion of copyrighted materials for training does not qualify as fair use,” and “any framework intended to promote AI development must not diminish the copyrights of the authors and publishers whose works are essential to a free, flourishing, and well-informed society.”

American Intellectual Property Law Association (AIPLA): Believes in human created content and the essential nature of copyright. AIPLA, an organization that represents both users and owners of intellectual property, stated in its comments that, “We are compelled to note, as a fundamental value statement, that we believe human-generated creative works are irreplaceably possessed of inherent value. Art, music, dance, and storytelling are cultural universals found in all human societies throughout history. These expressive endeavors are foundational elements of human culture, [which] provide the medium for the inheritance of culture, and the shared exploration of the human condition.”

American Society of Media Photographers (ASMP) and North American Nature Photography Association (NANPA): Assert that AI technology must be implemented in a manner that respects copyright law. ASMP and NANPA, which work to protect and promote the interests of professional photographers and all visual creators, assert that, “most significantly, the need for artificial intelligence technology [must] be implemented in a manner that respects the basic principles of our copyright laws.”

American Society of Composers, Authors, and Publishers (ASCAP): Concerned that the unchecked use of AI threatens to undermine the purpose of copyright laws. ASCAP, a performing rights organization (PRO) that collectively licenses the non-dramatic public performance rights of its members’ musical compositions on a non-exclusive basis to music users that publicly perform music in virtually every communications media, asserts that, “While generative AI has the potential to enhance human creative efforts, the unchecked use of this technology threatens to undermine the very purpose of the copyright laws by supplanting, rather than supporting human creative work…the AI industry must be held responsible under all applicable laws—including existing copyright laws and state and federal legal frameworks—to ensure that it does not unfairly and illegally exploit the work of human artists, writers, and other creators.”

Authors Guild: Calls the training of AI systems using copyrighted works “self-evidently unfair.” The Authors Guild is a national non-profit association of more than 14k professional, published writers of all genres. In its comments to the Copyright Office’s AI Study, the Guild asks (and then answers) the question of why AI large language model (LLM) developers rely on pirated copies of eBooks. The Authors Guild asserts that it’s “Because the only places to get a trove of eBook texts without permission from the copyright owners (or licensing Google’s Google Books collection) are from pirate websites. The developers understood that they needed large numbers of books, but they did not want to pay to license them.” The comments go on to assert that “This practice must be brought to an end…Not only is it self-evidently unfair, but if unrestrained, will put many human writers out of work and devalue our literature and culture.”

Copyright Clearance Center (CCC): Asserts that “AI development must be paired with an appreciation of and respect for creators and copyright.” CCC, an organization that has more than 40 years of experience in copyright and information management solutions, noted in its AI Study comments that, “…AI systems have incredible potential to support our society and economy in ways both familiar and yet unknown. To fulfil this potential, AI development must be paired with an appreciation of and respect for creators and copyright. Copyright is an engine of innovation, a key part of economic activity, and incentivizes the creation of foundational materials upon which AI is often built. Support for copyright is crucial to our culture, science, jobs, and the advancement of AI itself.”

Digital Media Licensing Association (DMLA): Wants to ensure that AI models respect copyright and that content is lawfully obtained. DMLA, which represents the interests of entities that primarily license still and motion images to publishers, the media, advertisers, designers, and others, asserts that “[it] supports the potential and opportunity that generative AI models offer but wants to ensure that the models respect copyright, that training content is lawfully obtained, that there is transparency in the content that is used to train the models, including that records are maintained and accessible in order to identify content that is used in training, and that the creative community continues to receive benefit if their works are used in the creation of generative [AI].”

Directors Guild of America (DGA): Believes in the need to safeguard the creative vision of directors. DGA, which represents the interests of more than 19,000 directors and members of the directorial teams who create feature films, television programs, commercials, documentaries, news, and other motion picture productions, noted in its AI Study comments that, “It is essential for policymakers and the courts to strike a balance… by developing meaningful and enforceable guardrails to prevent technological advancements from undermining intellectual properties, job opportunities, and artistic integrity…The technological boundaries of [generative] AI are virtually limitless, emphasizing the need to safeguard the integrity and the singular creative vision of a director.”

Entertainment Software Association (ESA): Supports the use of AI and copyright laws and protections. ESA, which represents the video game industry, is no stranger to the use of AI in gaming. As noted in ESA’s comments, “Artificial intelligence technology has been deployed in games for over two decades as useful tools for a variety of purposes, such as background and terrain generation, processing or analysis of data within a game, or quality control.”

However, ESA further noted that “Copyright protection is vital to the innovative AI technologies incorporated into video games,” and it urged the Copyright Office to “make legal and policy recommendations to Congress that will incentivize creativity [and] encourage copyright registration,” along with the advancement of generative AI technology.

Getty Images: Supports “responsibly developed and properly licensed AI models.” Getty Images, a visual content creator with more than 800k global customers, and the co-developer of Generative AI by Getty Images (which is trained exclusively on Getty Images content and data), submitted comments to the Copyright Office’s AI Study in support of “responsibly developed and licensed AI models.” However, Getty’s comments also go on to note that “There are significant risks to consumers, creators, and the public interest when developers of AI Systems and AI Models exploit copyrighted content without permission from the relevant copyright holders.”

Vince Gilligan: Asserts that AI companies are currently “hawking guessing machines.” Vince Gilligan is the creator of the hit TV series “Breaking Bad.” In his comments to the Copyright Office’s AI Study, Gilligan states, “I find the phrase ‘artificial intelligence’ to be problematic. To me, it’s false advertising. It implies that the current state of the art is a technology that can independently think and create—when in fact it can’t and doesn’t. Someday, true generative artificial intelligence may come to exist. But right now, what companies such as OpenAI, Google and Meta are hawking are only elaborate guessing machines…I’ll admit, it’s an impressive trick. But it’s not creation. It’s not storytelling…Meanwhile, the large language models these companies use to work their magic are made up of hundreds of millions of pages of novels, textbooks, screenplays, magazine articles, social media posts, limericks, clipped recipes, you-name-it. I’m sure every word of ‘Breaking Bad’ has been jammed in there somewhere. Only, I don’t remember giving anyone the okay to do that.”

Graphic Artists Guild: Says it is “critical that copyright remain true to its purpose: to incentivize creators by assigning to them the exclusive rights to their works.” The Graphic Artists Guild, a trade association representing the interests of illustrators, designers, web developers, animators, and other visual artists that rely on licensing, notes in its AI Study comments that, “The major concerns [for our members] are the use of visual artists’ copyrighted images, without consent or compensation, in [AI] training datasets; competition from the vast quantities of visual content created by AI image generators; the generation of visual works, which resemble their original works in the outputs; and the ability of users to [copy] a visual artist’s unique style. To protect the livelihoods of professional graphic artists, it is critical that copyright remain true to its purpose: to incentivize creators by assigning to them the exclusive rights to their works. Policies around generative AI must consider first and foremost the interests of the human creators, without whom this technology would not exist.”

Motion Picture Association (MPA): Believes the U.S. Copyright Office has “an important role to play in ensuring a careful and considered approach to AI and copyright.” MPA, the leading global advocate of the film, television, and streaming industry, believes that the Copyright Office will play a pivotal role in what unfolds regarding copyright and AI, noting that “MPA’s overarching view, based on the current state, is that while AI technologies raise a host of novel questions, those questions implicate well-established copyright law doctrines and principles. At present, there is no reason to conclude that these existing doctrines and principles will be inadequate to provide courts and the Copyright Office with the tools they need to answer AI-related questions as and when they arise. The Copyright Office has an important role to play in ensuring a careful and considered approach to AI and copyright.”

National Music Publishers’ Association (NMPA): Urges U.S. Copyright Office to “err on the side of protecting human creators.” NMPA, the trade association representing American music publishers and their songwriting partners, urged “proactive protections for creators” by noting that, “[T]he development of the generative AI marketplace is marked by breathtaking speed, size, and complexity. Hindsight may well prove that there is no hyperbole in saying that generative AI is the greatest risk to the human creative class that has ever existed…Even more alarming is that we do not know how long the window is to act before it is too late. We therefore implore the [U.S. Copyright] Office to support proactive protections for human creators and, where there is uncertainty, to err on the side of protecting human creators.”

News/Media Alliance: Finds pervasive unauthorized use of publisher content to power generative AI technologies. The News/Media Alliance, which represents more than 2,200 diverse U.S. news and magazine publishers, recently published a white paper and a technical analysis, along with submitting comments to the Copyright Office on the use of publisher content to power generative artificial intelligence technologies. Per its comments, the News/Media Alliance asserted that, “Together, the [study comments, white paper, and technical analysis] document the pervasive, unauthorized use of publisher content by AI developers, the impact this may have on the sustainability and availability of high-quality original content, and the legal implications of such use.” Based on its white paper and research, the News/Media Alliance requested that the Copyright Office publicly clarify that use of publishers’ content for commercial generative AI training and development is likely to compete with—and harm—publisher businesses, “which is disfavored as a fair use.”

Professional Photographers of America (PPA): Believes there must be accountability surrounding photos used to train AI systems. With nearly 35k members, PPA is one of the world’s largest nonprofit organizations that serves professional photographers. In PPA’s comments to the U.S. Copyright Office, it stated, “Our knowledge, understanding, and view of our natural world will diminish as creators are no longer incentivized to create…Our wonder for the world will be replaced with the wonder for whether anything we see is real or just more AI. In this almost certain eventuality, all of humanity loses. To the extent that AI systems generate images that are substantially similar to the creative work of photographers, there must be accountability. They cannot be allowed to steal photographers’ work and then compete with those photographers using that same work.”

Recording Industry Association of America (RIAA) and the American Association of Independent Music (A2IM): They are not seeing AI implemented responsibly or ethically. RIAA and A2IM represent individual music makers and music organizations. In filing their comments together for the Copyright Office’s study, the organizations noted, “AI can be enormously beneficial when it is implemented in a responsible, respectful, and ethical manner. Like every new technology, AI will undoubtedly push creative boundaries and help shape recording artists’ visions and expand their commercial reach. We embrace AI’s potential as a tool to support human creativity, provided that it is not used to supplant human creativity. As exciting as AI is, … we are not seeing it implemented in a responsible, respectful, and ethical manner. In particular, the unauthorized ingestion of our members’ copyrighted works for purposes of training generative AI systems amounts to copyright infringement on a massive scale and causes significant economic harm to our members and their sound recording artists.” 

SAG-AFTRA: Concerned unauthorized AI use will discourage future human creativity and expression. SAG-AFTRA, the nation’s largest labor union representing working media artists with more than 160,000 members that include actors, recording artists, broadcasters, and many others, noted in their comments that, “The number of AI technologies and the quality of AI-generated content has increased exponentially in a very short time. Unchecked, we are rapidly barreling toward a future where creators will have to compete with unauthorized versions of themselves or their works. This is already starting to happen in the creative fields; it is unfair, and it will discourage future human creativity and expression.”

What will happen next? 

The initiatives that will follow the Copyright Office’s AI Study remain unknown, and a similar degree of ambiguity surrounds the ongoing efforts of Congress, the White House, and various government agencies that are striving to navigate the intersection of AI and copyright issues. However, one consistent theme emerges from the comments submitted to the Office’s AI Study by members of the creative community. Across the board, there is a collective assertion that more ethical and transparent training practices must be employed by AI companies. This includes obtaining permissions from—and providing compensation to—the rightful owners of the content used.



How Existing Fair Use Cases Might Apply to AI

Post publish date: April 13, 2023

Fair use and AI has been a popular topic recently, given the flurry of developments in generative AI systems. As discussed in the first installment of this blog series, it is impractical to make broad predictions about how courts might rule on fair use and AI. However, after a high-level analysis of the four fair use factors, it is helpful to look at a few influential fair use cases for guidance on how courts might analyze those factors when copyrighted works are used as training materials for AI. We picked the cases most frequently referenced in discussions of fair use of AI training materials.

Authors Guild v. Google

This is the case most AI developers reference when they say AI use of copyrighted training materials qualifies as a fair use. It is important to revisit the court’s analysis because the principles of this case, particularly the court’s transformative use reasoning, do not appear to be so clear-cut in favor of a fair use finding when applied to the typical AI-training fact pattern.

While the Second Circuit held in Authors Guild v. Google that Google’s copying of books to create a searchable database qualified as fair use, its decision was limited significantly to the specific facts of that case. Those facts included not only the actions Google took to secure the reproductions of the books but also the fact that Google was using the books to create a searchable database that would provide information about them. Instead of using copyrighted works to create a new product that could usurp the market for the underlying work, the court found that Google used the books to shed light on information about them, for example, pinpointing where and how many times the word “whale” is used in Moby Dick. The court noted that Google also took significant steps to secure the copies of books it used in its database, such as only showing “snippets” of works to highlight a search term and implementing anti-hacking measures. Due to these security measures, the court concluded that there was little risk that Google’s actions could serve as a substitute for the copied works.

Moreover, the court found that Google’s reproduction of copyrighted works did not create significant market harm for copyright owners. The Second Circuit held that, although snippet view would surely cause some decrease in sales, demand satisfied by snippet view would be for the work’s factual elements, not its creative, protected elements.

Unlike in Authors Guild v. Google, generative AI training does not provide factual information about the copyrighted works. Instead, most generative AI systems reproduce and draw on the expressive elements of the copyrighted works as part of a process that results in works that would often act as market substitutes for the training materials—to say nothing of the harm caused to copyright owners who already offer licenses for AI training. A market exists for copyright owners to license their works for use in AI training datasets, and a court granting a fair use exception would destroy that market.

American Geophysical Union v. Texaco

American Geophysical Union v. Texaco may be a particularly important case in considering AI as an emerging technology. In Texaco, a commercial research company maintained an internal practice of photocopying and disseminating journal articles to hundreds of scientists. On appeal, the Second Circuit held that, considering the existing licensing market for photocopying, the company’s practice harmed the publisher’s right to derive value from its copyrighted work. Although the licensing market in Texaco was relatively new and still developing, the court held that its existence weighed against fair use because the researchers had an alternative to infringement: simply licensing their photocopies. Although reproduction of the work was internal to the company and article copies were not distributed to the public, the Second Circuit focused on the fact that the Copyright Act grants the exclusive right of reproduction to the copyright holder. Texaco affirms the importance of this right standing alone, even when the right of distribution has not been implicated. The research company also argued that its use was transformative because photocopying converted articles into a “useful format,” and because the copying aided scientific research. The court disagreed, holding instead that a fair use exception cannot apply to a process, only to a work of authorship, and that the defendant could not “gain fair use insulation . . . simply because such copying is done by a company doing research.” The fact that the company’s research was done for commercial purposes further supported the denial of a fair use exemption.

The general process of training generative AI on copyrighted materials shares several characteristics with the defendant’s photocopying in Texaco. The technological process of generative AI training appears to wholly reproduce copyrighted works. Furthermore, like the burgeoning photocopying license market in Texaco, copyright owners already offer AI training licenses. Just as the court reasoned in Authors Guild v. Google, discussed above, allowing a fair use exception for AI training would effectively destroy this licensing market.

Perfect 10 v. Amazon.com

Perfect 10 v. Amazon may become a battleground for what constitutes a transformative means to an end rather than a non-transformative use. In this case, Perfect 10 sued Google (and others) over Google’s use of thumbnail versions of Perfect 10’s copyrighted images on its search platform. The Ninth Circuit held that the use was transformative because the thumbnails merely functioned as a “pointer,” providing social benefit by pointing to where a consumer could find the full images. The court’s decision was also influenced by the fact that the image copies Google made were so small that they could not act as substitutes for Perfect 10’s copyrighted works (and were also significantly smaller than the smallest-resolution copies Perfect 10 licensed). It should be noted that Google’s use in this case did not merely repackage copyrighted works to recapture the artistic value they provided; the court found that Google instead created entirely novel value by providing information about the works copied, which is not the case with AI-generated art and the training of AI systems on copyrighted works.

Fox News v. TVEyes

Fox News v. TVEyes further demonstrates the tension between the first and fourth factors of the fair use analysis. In this case, defendant TVEyes offered a subscription service allowing consumers access to television programming to find, download, and share specific content like certain dialogue or other features of that programming. While the Second Circuit held that TVEyes’ service of increasing technological efficiency of viewing and sharing relevant clips was somewhat transformative, this transformative purpose could not outweigh the great harm to Fox by usurping its licensing market for clips. The court emphasized that transformativeness requires more than a “repackage” of a copyrighted work by “altering the [copyrighted work] with new expression, meaning or message.” Although TVEyes “modestly” transformed the work, the court held that the fourth factor outweighed the first because TVEyes “undercut[] Fox’s ability to profit from licensing searchable access to its copyrighted content to third parties.” The court correctly repositioned the importance of potential market harm and displacement of a copyright owner’s market within the fair use analysis.  

Generative AI trained on copyrighted works may not fare much better than the TVEyes program under a similar analysis. Many developers argue that training generative AI creates transformative value from copyrighted works. However, like TVEyes, these developers undercut copyright owners’ ability to license their works for AI development—a market that already existed before the recent AI boom. The Second Circuit strongly emphasized the importance of the fourth factor in its analysis of TVEyes’ program, and the gravity of this factor, particularly in light of existing markets, must be given its proper weight in the fair use analysis of the use of copyrighted works to train generative AI.

Conclusion

The cases discussed in this blog may provide insight into how courts will rule in cases involving generative AI training on copyrighted works. While future cases will be highly fact dependent, the principles of these decisions offer a useful framework for anticipating how courts will approach this area.


Does the Use of Copyrighted Works to Train AI Qualify as a Fair Use?

Post publish date: April 11, 2023

As lawsuits challenging the use of copyrighted works in generative artificial intelligence (AI) systems begin to snowball, there is one question at the forefront of the conversation: does the unauthorized use of copyrighted materials to train generative AI qualify for the fair use exception? As with all fair use determinations, the answer is that it depends; what is clear is that fair use does not magically excuse all AI use of copyrighted works as training materials. However, as courts begin to hear these cases, fair use and AI will likely remain at the forefront of copyright conversations.

Before further examining this question, it is important to explain that any fair use analysis is highly fact dependent. Anyone who makes the overbroad claim that every use of copyrighted works to train AI qualifies for the fair use exception is patently wrong and has overlooked the nuances necessary to determine whether an infringing use may qualify for the exception. The same can be said for those who make the overbroad claim that no use of copyrighted works to train AI is ever a fair use. However, it is possible to make some educated generalizations about how a court might analyze the four fair use factors and to consider how existing case law may apply where copyrighted works are used to train generative AI.

First Fair Use Factor: The Character and Purpose of the AI Use

The first fair use factor, the purpose and character of the use, encompasses what will likely be a popular argument for proponents of the position that using copyrighted works to train AI qualifies for the fair use exception. The first factor takes into consideration whether the use is a commercial or nonprofit educational use, and whether the use “transforms” and adds something new to the copyrighted work. If a use is found to be noncommercial, that weighs toward a finding of fair use, while commercial uses weigh against a finding of fair use. However, the question of commerciality is just one part of the first-factor analysis and therefore is not dispositive.

In many cases, AI platforms trained on copyrighted works are used for commercial purposes. This is especially true of some of the most popular generative AI tools. Although some generative AI platforms may not initially appear to be commercial, platforms like Midjourney, DALL-E, and ChatGPT now offer commercial subscriptions to their users (for example, OpenAI offers a subscription model for ChatGPT). Furthermore, even if an AI is not yet commercial, any plans AI developers make to enter the commercial market will also result in this factor weighing against a fair use finding. The bottom line regarding this part of the first fair use factor and AI is that courts are not likely to look kindly on an infringing use that reproduces copyrighted works to make money or otherwise financially benefit from their expressive, protected elements.

Instead, the crux of many AI developers’ fair use defenses will come down to the transformative purpose element of the first factor analysis, which has been an increasingly determinative (and amorphous) factor in fair use cases.

Many developers argue that training generative AI on copyrighted works is transformative because AIs scan works to identify and utilize “patterns inherent in human-generated media.” However, as will be discussed in the second part of this blog, transformativeness requires something more than reproduction for consumption, such as communication of information about the underlying work or inclusion of information about the underlying work in a database. While findings of transformative use have often disproportionately led to decisions that the use qualifies as a fair use, courts should still recognize that even if training AI is found to be a transformative use, that does not automatically mean it qualifies as a fair use.

Second Fair Use Factor: Nature of the Copyrighted Work

Under the second fair use factor, courts analyze the nature of the copyrighted work; for example, whether the copyrighted work is factual or creative. Many generative AI systems may run afoul of the second factor because they are trained, at least in part, on highly creative works like visual art, music, or writings. While this factor is rarely dispositive, when the underlying work is creative, it weighs against a finding of fair use. To be clear, since fair use is a fact-intensive analysis, there very well may be AI platforms that train on more factual works, and that may sway this factor toward a fair use exception.

Third Fair Use Factor: Amount of the Copyrighted Works Used

The third fair use factor considers the amount and substantiality of the portion used in relation to the copyrighted work as a whole. Courts have held that when an infringer copies the entirety of a copyrighted work, or the work’s creative “heart,” this factor almost always weighs against fair use, especially where multiple complete works are copied. Although in some cases copying of an entire work may be permissible because it is necessary to accomplish a transformative purpose, an infringer may take no more than necessary to achieve the transformative purpose.

The logistics of most AI generation and training involve a complete and total copying of multiple copyrighted works. In its recent complaint against Stability AI, for example, Getty Images describes how the reproduction of its high-quality images, paired with detailed text descriptions, has “been critical to successfully training the Stable Diffusion model to deliver relevant output in response to text prompts.” In other words, AI generators must copy as much as possible from expressive works, including the most expressive or crucially creative parts of the work, to achieve the purpose of training to generate quality output.

In a statement to the USPTO, and relying on Authors Guild v. Google, OpenAI argues that the amount of a copyrighted work copied is not the point of the third factor, but rather the amount of a copyrighted work made available to the public. OpenAI admits that the use of entire works is “reasonably necessary” to create an accurate AI but argues that substantial copying should not matter when the copy is not made available to the public. This argument is completely unsupported by the Copyright Act, and if courts were to adopt this approach it would eviscerate the reproduction right by requiring a distribution to take place before a violation of the reproduction right could occur. Moreover, the fair use exception explicitly directs courts to look at the amount and substantiality of the copyrighted work that is used, as opposed to a judicially created “public access” theory. Add to that the fact that while OpenAI and other developers say that copies are not made available to the public, it is unclear whether or how repositories of works created without authorization for training purposes are safeguarded against further reproduction and distribution. Although the third factor is not dispositive and (like the other factors) is highly fact dependent, where works are reproduced in their entirety, this factor will likely weigh against a fair use exemption.

Fourth Fair Use Factor: The Impact on the Value and Market for the Copyrighted Work

The fourth factor of a fair use analysis weighs the effect the infringing use has on the potential market for or value of the copyrighted work. Courts have held this factor to weigh against a fair use finding when the infringing work acts as a market substitute for the copyrighted work, and sometimes even when an infringing use lies outside the markets a copyright owner currently occupies (so long as that market is one a copyright holder might reasonably enter).

There are strong arguments to be made that AI training on copyrighted works harms the market for and value of those copyrighted works. Foremost is the fact that many developers do not compensate copyright owners for the works used to train generative AI, despite the fact that many copyright owners presently offer AI training licenses, destroying those licensing markets. Getty Images, for example, offers licenses for AI developers to use its images in training datasets (licenses which Stability AI did not obtain), and other copyright owners, such as academic publishers, offer AI licenses as well. The continual development of these licensing markets shows that copyright owners are part and parcel of the AI development world and are working with, or are open to working with, AI developers to advance AI innovation and tools. Courts have recognized that where such a viable market exists to help artists capture value from reproduction and distribution of their copyrighted works, the potential licensing value of that market should be considered in a fair use analysis.

Conclusion

As discussed, fair use cases involving generative AI training on copyrighted works will be highly fact dependent. While some AI-related uses may qualify as fair use, unauthorized use of copyrighted material to train AI systems cannot be waved through under a broad fair use exception that disregards the rights of creators. Neither the Copyright Act nor case law (as will be discussed in the second installment of this blog) would support such a broad fair use exception for AI. Without the factual nuances of a real application, it is difficult to say how courts may decide fair use and AI cases. However, while technological innovation will often test our understanding and application of fair use, the underlying principles of copyright law must not be cast aside in favor of an unchecked race of technological advancement that may lead to harmful and irreversible consequences. AI should be responsibly and ethically developed, and developers must respect artists’ and copyright owners’ rights.


Current AI Copyright Cases – Part 2

Post publish date: April 6, 2023

Cases/Disputes Involving AI Copyright Authorship

As the second in our Current AI Copyright Cases series, this blog highlights disputes involving AI copyright authorship (part one focused on AI cases related to training data issues). As creators continue to test the boundaries and creativity of AI technology, these challenges before the United States Copyright Office (USCO) raise important considerations about human and AI authorship and whether copyright registration allows for AI-related authorship.

Thaler v. Perlmutter

In June 2022, Dr. Stephen Thaler filed suit against Shira Perlmutter, the Register of Copyrights and Director of the United States Copyright Office (USCO), as well as the USCO itself, arguing that the USCO’s denial of his copyright registration for a work claimed to be authored by AI is an arbitrary and capricious agency action and not in accordance with the law. In November 2018, Thaler had applied to register a two-dimensional image titled “A Recent Entrance to Paradise,” produced by one of his AI systems, called the “Creativity Machine.” The Copyright Office denied the registration application because the image “lack[ed] the human authorship necessary to support a copyright claim.”

Thaler subsequently filed two requests for reconsideration to the USCO in September 2019 and May 2020. Each reconsideration was denied by the USCO, which explained that since copyright law is limited to “original intellectual conceptions of the author,” it refused to register the claim because it determined a human being did not create the image. The Office also asserted that Plaintiff had failed to either provide evidence that the image is the product of human authorship or convince the USCO to “depart from a century of copyright jurisprudence.”

On January 10, Thaler filed a motion for summary judgment, arguing that no genuine issue of material fact exists and that he is entitled to judgment as a matter of law. While Thaler confirmed that the submission lacked traditional human authorship, he argued that the USCO’s human authorship requirement is unsupported by law. Largely basing his copyright ownership arguments on common law property principles, Thaler claimed that the denial creates “a novel requirement for copyright registration that is contrary” to the language of the Copyright Act, contrary to the purpose of the Act, and contrary to the constitutional mandate to promote the progress of science.

On February 7, the Copyright Office filed a response to Thaler’s motion for summary judgment, asking the court to deny Thaler’s motion and grant its cross motion. The Office cites the many sections of the Copyright Office Compendium that address the “longstanding requirement” of human authorship, explaining that registration decisions that rely on the Compendium cannot be deemed “arbitrary and capricious”—and therefore cannot, as Thaler claims, run afoul of the Administrative Procedure Act (APA). The Office also reiterates that the language of the Copyright Act, Supreme Court precedent, and federal court decisions refusing to extend copyright protection to non-human authorship all support its position. Finally, the motion argues that the court must reject Thaler’s arguments that he is the owner of the work based on common law or the work made for hire doctrine because “(1) it does not affect whether a work is within the scope of copyright; and (2) the Work did not meet the statutory requirements that the work be prepared either by an ‘employee’ or pursuant to ‘a written instrument.’”

On March 7, Thaler submitted an opposition to the Office’s motion for summary judgment. He argues that, contrary to the Office’s arguments, the plain language of the Copyright Act and its constitutional mandate does not limit copyright protection to human authorship, and that non-human authorship has historically been legitimized through the recognition of copyright protection for anonymous and pseudonymous works and through the work made for hire doctrine. The brief also argues that the Copyright Office is not entitled to deference under the APA because while it is empowered to interpret its own regulations, it is not empowered to interpret the Copyright Act itself.

Most recently, on April 5, the Copyright Office filed a reply to Stephen Thaler’s opposition to its motion for summary judgment. The reply reiterates the Office’s position that Thaler’s assertions that the Copyright Act allows for non-human authorship are incorrect and not supported by statutory text. Once again arguing that Thaler misconstrues the work made for hire doctrine, the reply explains that Thaler’s “Creativity Machine” lacks the capacity to enter into a valid contract and cannot plausibly qualify as Thaler’s agent or employee. Going on to refute Thaler’s argument that case law supports non-human authorship, the reply cites appellate court decisions that have uniformly rejected authorship for things like celestial beings and animals. Addressing policy issues, the Office asserts that Thaler discounts the importance of economic incentives for human creators. Finally, the Office explains that courts’ deference to the Copyright Office is routine and that its decision is not arbitrary and capricious.

Kris Kashtanova’s Attempt to Register “Zarya of the Dawn” with the Copyright Office

AI and copyright authorship issues came up again for the Office, but this time they involved an artist who claimed to be the author of a work even though AI was used in the creation process. In September 2022, Kris Kashtanova, a New York City artist and AI consultant and researcher, received a copyright registration for a graphic novel titled Zarya of the Dawn, made using the commercial AI art generator Midjourney. The registration was subsequently widely publicized as the first known instance of an AI-generated work being successfully registered with the USCO.

However, on October 28, 2022, Kashtanova received notice from the USCO that the registration may be canceled. The USCO initiated this notice on the basis that “the information in [her] application was incorrect or, at a minimum, substantively incomplete” due to Kashtanova’s use of an artificial intelligence generative tool (“the Midjourney service”) as part of the creative process.

In November 2022, Kashtanova responded to the Office’s correspondence, asserting that the graphic novel reflects Kashtanova’s authorship in various ways. They claimed that the visual structure of each image, the selection of the poses and points of view, and the juxtaposition of the various visual elements within each picture were consciously chosen. In this respect, they argued that these creative selections are similar to a photographer’s selection of a subject, a time of day, and the angle and framing of an image.

Kashtanova also argued that, even if their work does not meet this legal standard of authorship, the work should still be copyrightable as a compilation under § 101 of the Copyright Act. The Copyright Act defines a compilation as “a work formed by the collection and assembling of preexisting materials or of data that are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship.” Kashtanova claimed that the definition of a compilation does not require that the materials used to create a compilation be copyrightable themselves, and that the Midjourney-associated images used in the work should be classified as data.

On February 21, 2023, the USCO issued a response to Kashtanova’s November 2022 correspondence, concluding that Kashtanova “is the author of the Work’s text as well as the selection, coordination, and arrangement of the Work’s written and visual elements.” While the USCO granted the registration as to those elements, it concluded that images in the work that were generated by the Midjourney technology are not the product of human authorship, and therefore do not receive copyright protections. The USCO explains that “because the current registration for the Work does not disclaim its Midjourney-generated content, [it] intends to cancel the original certificate issued to Kashtanova and issue a new [certificate] covering only the expressive material that she created.”

Conclusion

As these cases and disputes move through their respective processes in the courts and the Copyright Office, we will begin to get definitive answers to the increasingly complex questions surrounding AI authorship. Already, the Office has issued a notice of “clarification” of its registration practices for examining and registering works that contain material generated by AI technology. The Office notes that this statement of policy “describes how the Office applies copyright law’s human authorship requirement to applications to register such works” and highlights its considerations in analyzing the authorship elements within the production of a work. We describe a few takeaways from that guidance in this blog post. These resolutions are likely to provide more clarity as to what is considered human authorship and whether AI technology can “author” works or “own” copyrighted works, resulting in a lasting impact on the intersection of AI and copyright law.


Current AI Copyright Cases – Part 1

Post publish date: March 30, 2023

The Unauthorized Use of Copyrighted Material as Training Data

As the world of technology continues to evolve, one of its most intriguing phenomena, artificial intelligence (AI), has taken center stage. While these new technologies offer exciting creative opportunities, copyright owners are beginning to challenge AI developers in the courts over the permissionless use of their copyrighted works for the training of these AI tools. These AI copyright cases could potentially clarify the intersection of AI and copyright law, at least on the input side of this innovative new technology.

Getty Images Lawsuit Against Stability AI – United States

In early February, Getty Images filed a complaint in the United States District Court for the District of Delaware against Stability AI, alleging that the developer of the popular Stable Diffusion AI image generator infringed Getty’s copyrighted photographs, removed or altered copyright management information (CMI), provided false copyright management information, and infringed its trademarks. Getty Images claims that Stability AI copied photographs from its website and used over 12 million images and associated metadata to train Stable Diffusion, despite terms of use on Getty’s website expressly prohibiting such uses.

With regard to the copyright management information claims, Getty Images argues that the output generated by Stable Diffusion often contains a modified version of Getty Images’ watermark, “underscoring the clear link between the copyrighted images that Stability AI copied without permission and the output its model delivers.” The complaint goes on to allege that Stability AI knowingly falsified, removed, or altered Getty Images’ watermarks and metadata with the intent to induce, enable, facilitate, or conceal infringement of Getty Images’ copyrights.

As one of the first infringement lawsuits brought against a developer of a generative AI tool for unauthorized use of copyrighted materials for training purposes, the case has the potential to influence future development of AI systems by addressing principal copyright issues like fair use. While Stability has not yet filed a response to the complaint, it will almost surely adopt the position of other AI systems such as OpenAI, which claims that training AI on copyright protected materials qualifies as a transformative purpose that weighs heavily in favor of fair use.

The outcome of this case could also have a significant impact on whether creators’ and copyright owners’ ability and right to license their works under the Copyright Act will continue to be undermined in the AI context, jeopardizing the livelihoods and crafts of millions of human creators. The lawsuit also touches on another feature of copyrighted works from which AI developers draw incredible value: metadata cleaning and tagging, which streamlines AI training but is a value that copyright owners would be less incentivized to provide if AI developers are allowed to use their works without permission.

Getty Images Lawsuit Against Stability AI – United Kingdom

Earlier in January 2023, Getty Images announced a lawsuit against Stability AI in the High Court of Justice in London. Similar to the allegations in the U.S. lawsuit, Getty Images claims that Stability AI infringed upon Getty Images’ copyrighted images and works by using them to train Stability’s AI.

While the full details of Getty Images’ UK lawsuit have yet to be made public, the case, similar to its U.S. lawsuit, could have a significant impact on the unauthorized use of copyrighted material for AI systems in the United Kingdom.

The case will be one to closely watch, particularly as the United Kingdom government announced its intent to reconsider a problematic proposal that would have created a broad exception for use of copyrighted works for any AI training. This case has the potential to spark more international conversations about how AI systems and the use of copyrighted works as training material will be treated in different jurisdictions globally.

Visual Artists’ Class-Action Lawsuit Against Stability AI, Midjourney, and DeviantArt

On January 13, 2023, award-winning visual artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed a class action complaint in the United States District Court for the Northern District of California, San Francisco Division, against defendants Stability AI Ltd. and Stability AI, Inc., Midjourney, Inc., and DeviantArt, Inc. The plaintiffs allege that their works were used without permission as input materials to train and develop various AI image generators including Stable Diffusion (Stability AI), DreamStudio (by Stability), the Midjourney Product (Midjourney), and DreamUp (DeviantArt). The plaintiffs also assert that Stability AI generated reconstructed copies of the plaintiffs’ works, which they argue qualify as unauthorized derivative works. The plaintiffs point out that the defendants reap substantial commercial profit from the value of these copyrighted images, highlighting that images the defendants’ AI machines generate “‘in the style’ of a particular artist” are already sold on the internet, “siphoning commissions from the artists themselves.” The plaintiffs also argue that the defendants are liable for vicarious copyright infringement and violated the Digital Millennium Copyright Act (DMCA) by altering or removing copyright management information (CMI) from the images owned by the plaintiffs and programming the AI to omit any CMI from its output.

Similar to the case Getty filed in the United States, this AI copyright case could have a lasting impact on whether training AI systems on copyrighted works qualifies as fair use and whether the output of a generative AI system qualifies as a derivative of the works it is trained on. Unlike the Getty case, which makes clear that Stable Diffusion “at times” produces images that are derivative of Getty’s copyrighted works that Stability AI copied, the visual artist plaintiffs make a broader claim that all the output of Stable Diffusion is derivative of the works it trains on. In the context of a class action, this claim may be tough to demonstrate. While the complaint includes an example of an instance in which the plaintiffs allege a derivative work was generated from source images, extrapolating that claim to cover all of the AI system’s output would likely be very difficult to prove.

Further, the complaint refers to Stable Diffusion as a “21st century collage tool,” which seems to be an effort to oversimplify the AI machine. However, it should be noted that collage is an artistic medium that often utilizes unique skills and techniques to create original works that qualify for copyright protection. Even when collage artists make use of copyrighted material without authorization, the use may qualify as fair use. For the plaintiffs to argue that Stable Diffusion’s use of the copyrighted materials results in collages that definitely do not qualify for the fair use exception may be too broad an allegation. Lastly, as this case develops and transitions into the discovery phase, it will be interesting to learn the quantity of allegedly infringed works and how the court will attempt to certify that the proposed class of works are registered works.

Programmers’ Class Action Lawsuit Against GitHub 

On November 3, 2022, a class action lawsuit was filed in the United States District Court for the Northern District of California, San Francisco Division, by a group of anonymous programmers against Microsoft, GitHub (a Microsoft subsidiary), and OpenAI, alleging a violation of Section 1202 of the DMCA for unauthorized and unlicensed use of the programmers’ software code to develop the defendants’ AI machines, Codex and Copilot. Both are assistive AI-based systems offered to software programmers and trained on a large collection of publicly accessible software code and other materials, including the allegedly infringed software code created by the plaintiffs.

Plaintiffs contend that Microsoft and GitHub used the plaintiffs’ materials without complying with open-source licensing terms, resulting in an unlawful reproduction of the plaintiffs’ copyrighted code and violating various attribution requirements under the licenses. While the complaint does not include the type of traditional copyright infringement claims seen in the other cases discussed above, it alleges that OpenAI violated Section 1202 of the DMCA, which makes it unlawful to provide or distribute false CMI with the intent to induce or conceal infringement.

In January, Microsoft and OpenAI filed motions to dismiss in the case, arguing that the plaintiffs lacked standing to bring the case because they failed to argue they suffered specific injuries from the companies’ actions. The companies also argued that the lawsuit did not identify particular copyrighted works they misused or contracts that they breached.

As the case proceeds, it will be interesting to see how the court will apply the requirement of attribution and the provisions of Section 1202 when, as OpenAI argues, no copyrighted works have been identified. An additional hurdle for the plaintiffs’ Section 1202 claim is that the statute will only hold a defendant liable if they intentionally altered or removed CMI knowing that such conduct would “induce, enable, facilitate, or conceal infringement.” The outcome in this AI copyright case could have a significant impact on different AI industries and how AI system developers approach attribution and licensing practices when using copyrighted works for AI training.

Thomson Reuters Enterprise Centre v. ROSS Intelligence Inc. 

In May 2020, Plaintiffs Thomson Reuters Enterprise Centre GmbH (“Thomson Reuters”) and West Publishing Corporation (“West”) sued Defendant ROSS Intelligence Inc. (“ROSS”) in the United States District Court for the District of Delaware for copyright infringement relating to the unlawful use of the plaintiffs’ unique platform capabilities. Plaintiffs operate and market Westlaw, a widely known legal search platform used throughout the legal industry. ROSS developed a new legal search platform using AI, and to do so, the company partnered with LegalEase Solutions, LLC, to improve ROSS’s search tool. According to plaintiffs, however, LegalEase “used a bot … to download and store mass quantities of [plaintiff’s] proprietary information,” which it then provided to ROSS.

Plaintiffs alleged that LegalEase’s activities constituted copyright infringement because it used plaintiffs’ headnotes to assist ROSS in formulating questions, used key numbers and headnotes to locate judicial opinions, and at one point assisted ROSS in classifying cases under certain legal topics. After the court denied ROSS’s motion to dismiss, concluding that the plaintiffs’ copyright claims were adequately pleaded, ROSS filed a motion for summary judgment in early January 2023, asserting its affirmative defense of fair use.

ROSS argues that (1) the use of Westlaw’s content was functional and transformative, (2) the copyright protection for the copied Westlaw materials is “thin,” (3) the amount used holds little weight because “any copying was intermediate and the final ROSS product does not contain any copyrighted materials,” and (4) ROSS’s product did not replace the market for Westlaw’s works.

On February 6, Thomson filed its opposition to ROSS’s motion, arguing that (1) ROSS’s purpose in using the Westlaw content was to create a legal research product that would compete with and replace Westlaw, without any further transformative purposes, (2) the Westlaw content is creative, which weighs against fair use and undermines ROSS’s claim it did not copy protectable content, (3) the copying was both qualitatively and quantitatively substantial, and (4) ROSS harmed the market for Westlaw content by taking and using Westlaw content to simply generate a ROSS product to displace Westlaw’s product.

In addition to addressing the question of whether training AI on copyrighted materials constitutes transformative fair use, this case is likely to provide a unique opportunity to understand how courts will analyze a fair use defense related to AI training on materials that include legal opinions, which, while not themselves subject to copyright protection, are accompanied by creative, expressive materials created and owned by Thomson Reuters. Furthermore, the court is likely to consider whether scraping material for AI training purposes from a website in violation of its terms of service results in breach of contract liability.

UAB Planner 5D v. Facebook, Inc.

In 2019, UAB Planner 5D filed a complaint in the United States District Court for the Northern District of California for copyright infringement and trade secret misappropriation against Facebook, Inc., Facebook Technologies, LLC, and The Trustees of Princeton University. Planner 5D, a Lithuanian company, operates a home design website that allows users to create virtual interior design scenes using a library of virtual objects (such as tables, chairs, and sofas) to populate the scenes. Planner 5D claimed it is the copyright owner of the three-dimensional objects and scenes and of the compilation of the objects and scenes.

Planner 5D alleged that computer scientists at Princeton downloaded the entirety of Planner 5D’s data collection of objects and scenes because of the collection’s uniquely large and realistic qualities. It also alleged that not only did Princeton use this data for its own research purposes, but that Princeton also posted the data to a publicly accessible Princeton URL and labeled it the ‘SUNCG dataset.’ Planner 5D alleged that Facebook was also interested in its objects and scenes collection, which would help it tap into the commercial potential of scene recognition technology. After a motion to dismiss the copyright claims for Planner 5D’s failure to show its objects and scenes are subject to copyright protection was granted in July 2020, Planner 5D amended its complaint, and a second motion to dismiss by Facebook was denied by the court in April 2021.

On February 17, 2023, Facebook filed a motion for summary judgment arguing that Planner 5D cannot establish ownership of a valid copyright. The defendants argue that the discovery phase “confirmed that Planner 5D’s works are data files that cannot be copyrighted as computer programs, as literary works as they lack human authorship, or as pictorial works because they lack originality.” Because of these findings, the defendants jointly moved for summary judgment on all of Planner 5D’s claims. Oral arguments will take place Wednesday, July 12, 2023, in San Francisco, California.

While the core claim of this case concerns copyright infringement, the lawsuit also touches on another interesting AI copyright dispute: human authorship versus AI generation. There is a strong likelihood that Facebook’s defense — that Planner 5D’s scenes and objects are not human authored — may be strengthened by recent announcements by the U.S. Copyright Office (USCO). In response to Kris Kashtanova’s attempt to register their graphic novel, Zarya of the Dawn, the USCO granted the registration only to what it deemed the human-authored elements within the work, concluding that the images in the work that were generated by AI technology were not the product of human authorship and were not included in the scope of copyright protection in the registration.

Additionally, the USCO released an AI registration policy statement, to clarify its practices for examining and registering works that contain AI-generated material. With regard to the Office’s application of the human authorship requirement, this statement clarifies that “in the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of ‘mechanical reproduction’ or instead of an author’s ‘own original mental conception, to which [the author] gave visible form.’” The USCO’s recent correspondence will likely influence the outcome of this case and subsequently impact the AI copyrightability aspect of this case.

Conclusion

As these AI copyright cases proceed and new cases arise (including ones on AI authorship which are discussed in part two of this blog series), the U.S. Copyright Office and the courts will continue to consider important issues surrounding the unauthorized use of copyrighted materials for training AI systems. While the outcomes of these disputes are sure to impact the development of AI systems, it’s essential that the foundational principles of copyright law are recognized and that the rights of creators and copyright owners are upheld.


If you aren’t already a member of the Copyright Alliance, you can join today by completing our Individual Creator Members membership form! Members gain access to monthly newsletters, educational webinars, and so much more — all for free!

Three Takeaways When Registering Your Copyright in an AI-Assisted Work

Post publish date: March 28, 2023

It seems that Artificial Intelligence (AI) dominates headlines all over the news, and copyright is no exception. Given its recent focus on the scope of copyright in AI-assisted and AI-generated works in two high-profile registration cases, the U.S. Copyright Office published a Copyright Registration Guidance (Guide) to further explain its examination practices and policies when reviewing registration applications for works containing AI-generated material.

The Office cautions that while it is tasked with reviewing the applications, it is ultimately up to the applicants “to disclose the inclusion of AI-generated content in a work submitted for registration and to provide a brief explanation of the human author’s contributions to the work.”

While creators and copyright owners should complete the registration application as truthfully and as accurately as they can, they can do so with the comfort of knowing that a mistake on the application will not ultimately invalidate the registration. Applicants must also remember that while registration provides many benefits, it is not a requirement for copyright protection, because protection is automatic from the moment a work is created (regardless of whether or when it is registered with the Office). So if a work is not registered, or a registration is deemed invalid, it does not mean copyright law suddenly stops applying to the work.

Nevertheless, registration is crucial for creators and copyright owners to fully enjoy their rights under the Copyright Act, since registration enables the copyright owner to do things like bring a federal lawsuit and recover statutory damages in federal court. The Office recently granted a copyright registration for portions of an AI-assisted work, which shows that creators can register works created in part using AI. But determining levels of human authorship versus AI generation can be tricky. Based on the Office’s Guide, here are three takeaways to keep in mind when registering the copyright in an AI-assisted work.

1. Descriptions of Human Authorship on an Application Must Go Beyond Asserting That Creative Prompts Were Entered into an AI Machine

In the Guide, the Office stresses that an applicant must describe the human authorship in the creative elements of the work being registered in the “Author Created” section of the application, and disclaim AI-generated content in the work that is more than de minimis (more on that in takeaway number two).

This is important because a Copyright Office examiner will analyze this information and will follow the Office’s policy of refusing registration for a work, or the parts of a work, generated by AI, on the basis that AI-generated content does not have human authorship.

The Office doubled down on that policy, explaining in the Guide that when it considers the copyright in AI-assisted works it will mainly observe “whether the AI contributions [in the work] are the result of ‘mechanical reproduction’ or instead of an author’s ‘own original mental conception, to which [the author] gave visible form.’”

The Office illustrates the difference between such “mechanical reproduction” by AI and human authorship in an example where a poem is generated by AI based on a prompt from a human, “write a poem about copyright law in the style of William Shakespeare.” In the example, the Office takes the position that the AI determines the results in the expressive elements of the generated poem like the rhyming pattern, the words in each line, and the structure of the text — not the human who entered the prompt. Later on in the Guide, the Office more explicitly states that entering a prompt (even if the prompt itself may be a creative, copyrightable expression) into an AI machine alone “does not mean that the material generated from a copyrightable prompt is itself copyrightable.” In other words, a registration applicant won’t be able to claim human authorship in AI generated materials or elements in the work by merely stating that they entered prompts into an AI machine.

Some applicants may have difficulty with the Office’s position, as there are arguments that entering certain types of prompts (either one, or many) into an AI machine may result in sufficient creative ideation and control which amounts to human authorship in the AI generated content. But based on the Guide and at least for now, it may be quite difficult for applicants to convince the Office otherwise for registration purposes.

The Office subsequently states that applicants must detail the creative choices they made with respect to the AI-generated content, such as the creative selection, coordination, and arrangement of that content in the work being registered. For example, if an applicant incorporates AI-generated text into a larger textual work, the Office states that the applicant should “claim the portions of the textual work that is human-authored” (i.e., the creative elements not generated by an AI machine).

2. Applicants Must Disclaim AI-Generated Content in the Work that is More Than De Minimis

As noted previously, the Office also asks applicants to explain what parts of the work should be excluded from the registration application. In its normal registration practices, the Office does ask registration applicants to disclose such information, since the scope of copyright protection for a registered work does not extend to certain parts of a work, like previously published and public domain materials.

Because the Office takes the position that a registration does not cover materials resulting from non-human authorship (i.e., there is no copyright in AI generated content) the Office directs applicants to detail “AI-generated content that is more than de minimis to be excluded from the application.” The Office points applicants to its examination practices on the registration of unclaimable materials and definition of de minimis in its Compendium, which states:

“Creative authorship is deemed ‘de minimis’ when a work does not contain the minimal degree of original, creative expression required to satisfy the originality test in copyright.”

In providing examples, the Office notes that materials like brief quotes and short phrases may fall into that de minimis category. So, for any AI-generated materials in a work that fall above that threshold, the Office will look to applicants to describe those disclaimed materials so that the registration can reflect the scope of copyright in the AI-assisted work.

3. Works Containing AI Generated Materials Must Be Registered Using the Standard Application

One of the biggest takeaways from the Guide is that the Office is requiring applicants seeking to register the copyright in their AI-assisted works to use only the Standard Application form. In a footnote, the Office reasons that its other application forms currently don’t contain fields where applicants can disclaim unprotectable material, like AI-generated content. As an example, the Office notes that its Single Application “may only be used if ‘[a]ll of the content appearing in the work’ was ‘created by the same individual.’” Presumably, this also means that applicants cannot use the group registration options to register works that have AI-generated elements. But as more creative communities incorporate AI into their works, the hope is that the Office will increase the flexibility of the registration system to accommodate the registration needs of creative industries and creators who want to register the copyright in their AI-assisted works beyond the Standard Application, such as those who frequently use the current group registration options.

Just Do Your Best!

If applicants are still unsure how to fill out the application according to the guidelines the Office has set forth, the Office notes that applicants can simply provide a general statement that the work contains AI-generated material so that the registration examiner can follow up with them. However, this can prolong the examination, because correspondence between the applicant and the Office lengthens the time it takes for the Office to ultimately issue a copyright registration for the AI-assisted work. Taking time to think through the human-authored expression and AI-generated materials will inevitably help an applicant streamline the registration process. Applicants can also breathe a sigh of relief that even if they make a mistake on their application form, their registration won’t be invalidated in court. There are also opportunities to reach out to the Office directly to correct pending applications and to file a supplementary registration to correct a granted registration.

The Office continues to study the implications of AI in copyright law and will be holding a series of public listening sessions this spring. But with this Guide, it has set a few guideposts for how applicants can register the copyright in AI-assisted works. Though authorship issues, especially those involving AI, can be confusing for applicants to think through, we hope some of the pointers above will help registration applicants along in the process.



AI and Copyright: AI Policies Must Respect Creators and their Creativities

Post publish date: December 8, 2022

The exponential development of Artificial Intelligence (AI) systems represents a profound achievement of the digital age that brings with it tremendous opportunities. In fact, many in the creative community are already using or plan to use AI for the creation of a wide range of works and are developing and pushing AI capabilities to explore new creative horizons. But as with many advances in technology, these new opportunities come with challenges and often raise difficult legal questions.

Artists, authors, and many other types of creators are increasingly concerned (and rightly so) about the tendency of some to ignore or discount issues relating to copyright in the AI context. Though the application of copyright law to AI may be tricky at times, it is essential that these issues not be ignored or given short shrift in the AI discussion just because AI is the shiny new toy. Rather than sweep important copyright issues under the rug for fear of slowing AI’s progress, policymakers, lawmakers, stakeholders, and the public must respect the rights of creators and copyright owners and recognize and appreciate the underlying goals and purposes of our copyright system.

The Relationship Between AI and Copyright Law

Well before the recent explosion in AI development, the creative community was using AI technologies and innovating in the space. On the output side of AI, creators and copyright holders use or actively develop AI technologies as part of their larger creative process. For example, video game developers utilize AI systems and technologies to provide new and improved gaming experiences, such as when players interact with in-game, non-player characters. Television and film producers also often incorporate computer-generated imagery (CGI), created with the help of AI programs, into their works. In the music industry, artists and producers are employing AI tools for everything from beat creation to voice modulation.

On the input side of AI, the primary way the creative community drives AI innovation and development is by creating and disseminating copyrighted works that are used to train AI systems. At best, the creative community collaborates with and innovates as a part of the AI community by developing and improving AI technologies, licensing works for use as training material, or using AI as a tool to generate or make new works. At worst, copyrighted works are used by AI technologies without authorization or licenses—sometimes for the purpose of creating works that serve as direct market substitutes for the ingested works—undermining the rights of artists, creators, and copyright owners and their abilities to protect, license, and enforce their copyrights.

Sadly, it is the latter situation that is becoming more widespread in the AI world, where the works of countless creators and copyright owners are being used without permission or compensation. This is especially true, and especially harmful, with commercial AI uses.

Using Copyrighted Works as AI Inputs

An AI’s creative output is only as good as the corpus of creative works it ingests. For example, if a human prompts an AI machine to generate an image that accurately imitates the work of a famous visual artist, like Jean-Michel Basquiat, the AI machine must analyze and copy the expressions that are unique to Basquiat’s works. But what is sometimes lost on AI system developers and users is that the underlying works from which the AI draws are more often than not created by a human creator. That human creator depends on the rights and protections granted to them by copyright law to commercialize and control their works, including the ability to license their works for use as AI input (or to stop others from using their works without authorization).

Creators and copyright owners also make contributions to the development of AI technologies by priming copyrighted works for optimal AI application and development through activities like semantic enrichment, metadata tagging, content normalization and data cleanup. Copyright owner-curated and prepared AI training data sets, databases, or collections of works also feature additional benefits like secured licensing and permissions from third parties, which reduce privacy and infringement risks for AI developers and users. It is copyright laws which incentivize and protect the investments creators and copyright owners make when creating and preparing these kinds of works for AI.

Text-and-data mining (TDM) is the process through which AI machines develop their unique algorithms and capabilities by analyzing, reproducing, and otherwise using valuable data and expressions often contained in copyrighted materials. TDM uses input materials to develop trends, algorithms, and methods that can generate output works. During TDM, an AI machine might very well be analyzing numbers, statistics, and other non-copyrightable pieces of information. But AI machines that generate output like images, videos, or songs, risk displacing or substituting for the very underlying copyrighted works that are used to train the AI. During TDM of copyrighted works, the AI machines are often culling the expressive, copyrightable value contained in the ingested works, as mentioned previously with the Basquiat example.

Many creators and copyright owners currently offer TDM licenses so their works can be used to train AI systems. Copyright law enables creators and copyright owners to create the works that fuel AI development, and there must be respect for and recognition of the laws and rights that protect their ability to license (or not license) their works. Particularly where AI machine outputs serve as replacements or substitutes in the markets for the ingested works, any artificial disturbance of, or heavy-handed approach to manipulating, the existing market for TDM licenses for copyrighted works, without supporting evidence, results in creators and copyright owners subsidizing AI development. When works are used without authorization, licensing markets—the fundamental ability of a copyright owner to control and commercialize their works—are effectively destroyed.

No Broad Exceptions or Justifications Exist for AI Use of Copyrighted Works

The disturbing reality is that many AI companies do not license the copyrighted works they use to train AI machines, nor are they transparent about the sourcing of their training data sets. Some companies simply scrape existing copyrighted content from the internet, including images, text, and software code, to use as training data sets. Other companies engage in a practice called “data laundering,” where they fund or use data sets created by academic or research institutions for initially noncommercial purposes to train commercial AI machines.

In the discussion surrounding the use of copyrighted works to train AI systems, some stakeholders wrongly justify these practices with the argument that the fair use exception would wholesale permit such methods or could justify broad copyright exceptions for AI use. That view is inaccurate, especially in a case where a TDM license is available, the use is commercial, or the resulting AI-generated work harms the actual or potential market for the ingested work. Fair use is such a fact-specific exception that it is an unreliable basis on which to build any broad AI exceptions or to make general claims excusing these practices from what would otherwise be blatant copyright infringement.

AI and Copyright Laws Around the World

For many years now, lawmakers and policymakers in a number of countries, including the United States, have been carefully examining the intersection of copyright law and AI and the implications of this rapidly evolving technology. Even as a global leader in AI technologies, the United States has not deemed it necessary to enact or recommend any new exceptions to copyright law for AI purposes. And for good reason: licensing to support AI applications is robust, and absent contrary evidence, prematurely upending the copyrights of creators and copyright owners carries the potential for significant harm.

The United States is not alone in its treatment of the AI licensing market and its relation to copyright law. Only a few countries, including Hong Kong, South Korea, Australia, and Canada, have considered AI regulations and policies with respect to their copyright laws, and significantly, each has declined to take action or postponed decision making as premature. To varying degrees, only the European Union, Japan, Singapore, and the United Kingdom have AI policies and regulations within their copyright laws.

One example of a country with problematic AI-copyright regulations is Singapore, which overbroadly permits unauthorized TDM of copyrighted works, including pirated works, for any purpose, with no ability for rightsholders to opt out of or contract around the exception. Policies such as Singapore’s severely undermine the fundamental ability of creators and rightsholders to be compensated for the use of their copyrighted works, discouraging them from creating works and depriving them of the fruits of their labor. Unfortunately, the United Kingdom is also considering following this troubling precedent with a proposed exception for TDM of copyrighted works for both noncommercial and commercial uses, with no ability for creators and copyright owners to contract around the exception. Needless to say, many in the U.K. creative community have decried the proposal, and we can only hope that the U.K. will reverse course to avoid undermining a critical part of its economy.

As AI technologies continue to progress at a rapid pace and make incredible advancements, AI stakeholders, courts, policymakers, and the public should keep in mind several key principles when analyzing the intersection between AI and copyright.

  • When formulating new AI laws and policies, it is essential that the rights of creators and copyright owners be respected.
  • Long-standing copyright laws and policies must not be cast aside in favor of new laws or policies obligating creators to essentially subsidize AI technologies.
  • Education is paramount in the AI space. Those leading AI projects must be made aware of the legal implications of using copyrighted works as input material, as well as those that arise from AI-generated output.

Along these lines, we at the Copyright Alliance recently published our position paper on AI and copyright law issues, which outlines the above key principles and points out some other detailed positions of the copyright community on AI. It is critical to AI innovation that the creative contributions and impact of the creative community on AI are acknowledged and that the foundations of copyright law that made AI possible in the first place are preserved and respected.

