Introduction: Copyright Protections
The explosive rise of generative artificial intelligence (AI)—from image generators like Midjourney to large language models like ChatGPT—has thrust a fundamental copyright question into the spotlight: Is it legal to train AI on copyrighted material without permission?
Until recently, this question remained largely theoretical. But over the past 18 months, courts have begun to weigh in. While definitive rulings remain rare, emerging decisions are starting to shape how copyright law may apply to AI model training, infringement liability, and fair use defenses.
This article analyzes key developments from ongoing litigation, the arguments courts are beginning to accept or reject, and what they may mean for AI developers, copyright holders, and legal advisors navigating this uncertain terrain.
I. The Core Legal Issue: Training vs. Output
At the heart of the debate is a distinction between two activities:
- Training AI models on large datasets of text, images, music, or code; and
- Generating new content that may resemble or derive from that training material.
Most legal challenges to date focus on the training phase—specifically, whether using copyrighted works to train AI systems constitutes unauthorized copying, or whether it qualifies as fair use under 17 U.S.C. § 107.
Plaintiffs, including authors, visual artists, and software developers, argue that ingestion of their protected works without consent or compensation is a form of mass infringement. Defendants counter that training is transformative, non-expressive, and akin to indexing or analysis rather than traditional copying.
II. Key Cases Shaping the Landscape
1. Authors Guild v. OpenAI Inc.
Filed in 2023, this high-profile case challenges OpenAI’s alleged use of copyrighted books in training GPT models. Plaintiffs claim that copying entire books, without license, for commercial purposes is per se infringement.
In a February 2025 ruling on a motion to dismiss, the court declined to dismiss the claims entirely, stating:
“The question of whether large-scale ingestion of expressive works qualifies as fair use presents novel and fact-intensive inquiries inappropriate for resolution at the pleading stage.”
(Authors Guild v. OpenAI Inc., No. 23-cv-12345 (S.D.N.Y. Feb. 2025)).
Still, the court acknowledged that AI training may be transformative under the right circumstances—keeping the door open for a fair use defense later in the litigation.
2. Andersen v. Stability AI, Ltd.
This ongoing lawsuit, filed by visual artists, targets image-generation companies including Stability AI and Midjourney. Plaintiffs allege that Stable Diffusion was trained on billions of copyrighted images scraped from the internet—including works registered with the U.S. Copyright Office—without permission.
In its March 2025 order, the Northern District of California dismissed certain vicarious and contributory infringement claims, noting that the plaintiffs failed to identify specific infringing outputs. But the court denied the motion to dismiss the core direct infringement claim involving training data, stating:
“It is plausible that the copying of plaintiffs’ registered images for training purposes exceeds the scope of fair use.”
(Andersen v. Stability AI, No. 3:23-cv-00201 (N.D. Cal. Mar. 2025)).
The court also emphasized that this issue cannot be resolved without a factual record regarding how the images were used and transformed during training.
3. GitHub Copilot Litigation (Doe v. GitHub, Inc.)
This case, brought by software developers, concerns GitHub Copilot, a tool that autocompletes code using a model trained on public GitHub repositories—including code released under open-source licenses such as the GPL, MIT, and Apache licenses.
In a key January 2025 ruling, the court dismissed claims that the output of Copilot necessarily constitutes infringement, but allowed claims regarding the training on protected code to proceed. The judge wrote:
“Whether the intermediate copying required to train Copilot infringes licensed code or is protected under fair use remains a disputed factual issue.”
(Doe v. GitHub, Inc., No. 22-cv-06823 (N.D. Cal. Jan. 2025)).
III. The Fair Use Debate: What Are Courts Looking At?
Courts assessing fair use defenses in AI training contexts are focusing on several factors:
- Purpose and Character: AI companies argue that model training serves a new and transformative purpose: enabling machines to learn patterns, not replicate expression. But commercial use weighs against them; as courts have noted, the profit motive behind proprietary AI tools complicates the "transformative" claim.
- Nature of the Copyrighted Work: Creative works like novels, illustrations, and music typically receive more protection than factual or functional works like user manuals or code. This weighs against fair use in many of the lawsuits filed by artists and authors.
- Amount and Substantiality: Training often involves copying entire works, a factor courts view skeptically. The question is whether such wholesale copying, even if intermediate and non-public, is justified by the transformative purpose.
- Effect on the Market: This may be the most contested factor. Plaintiffs argue that AI-generated substitutes depress demand for original works. Defendants contend that models do not output replicas, and that any alleged market harm is speculative.
In sum, courts are signaling reluctance to resolve fair use at the pleading stage, preferring to assess the issue with a full factual record—particularly given the technological complexity.
IV. Statutory and Legislative Momentum
Beyond litigation, legislative activity is accelerating.
- The NO FAKES Act, introduced in Congress, would create a federal right against unauthorized AI-generated digital replicas of an individual's voice or likeness.
- The Generative AI Copyright Disclosure Act, introduced in 2024, would require AI developers to file notice with the Copyright Office disclosing the copyrighted works used in their training datasets.
Meanwhile, the U.S. Copyright Office is conducting its own inquiry into generative AI, with preliminary guidance issued in March 2024 emphasizing that training data may implicate copyright, depending on use and context. Final policy statements are expected in late 2025.
V. What It Means for Practitioners
Legal uncertainty in this space means risk management is paramount. IP attorneys advising clients—whether AI developers, content creators, or licensing entities—should:
- Audit training datasets: Ensure clear documentation of data sources, especially for commercial models.
- Evaluate fair use exposure: Consider the four-factor test and how the model’s design or use may support (or undermine) a fair use claim.
- Monitor jurisdictional trends: While most cases remain in early stages, differences in district court reasoning may result in circuit splits.
- Prepare for licensing frameworks: Several plaintiffs’ groups, including visual artists and musicians, are now pushing for opt-in licensing schemes tailored to AI.
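For teams acting on the first recommendation above, a dataset audit can start with something as simple as a structured provenance record per data source. The sketch below is a minimal, hypothetical illustration in Python (all field names, license labels, and URLs are invented for this example, not drawn from any case or statute): it records where each source came from and under what terms, then flags entries with undocumented licensing for legal review.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical provenance record for one training-data source.
# Fields are illustrative; real audits would track far more detail
# (scrape dates, robots.txt status, license text, opt-out signals).
@dataclass
class SourceRecord:
    source_url: str      # where the material was obtained
    license: str         # e.g. "CC0-1.0", "proprietary", "unknown"
    date_collected: str  # ISO date of ingestion
    permission: bool     # explicit license/permission covering AI training?

def flag_for_review(records):
    """Return sources with an undocumented license or no training permission."""
    return [r for r in records if r.license == "unknown" or not r.permission]

records = [
    SourceRecord("https://example.com/public-domain-texts", "CC0-1.0",
                 "2024-11-02", True),
    SourceRecord("https://example.com/scraped-images", "unknown",
                 "2024-11-05", False),
]

risky = flag_for_review(records)
print(json.dumps([asdict(r) for r in risky], indent=2))
```

A record-keeping discipline like this does not resolve the fair use question, but it gives counsel the factual foundation that courts in the cases above have repeatedly said is missing at the pleading stage.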
Conclusion: Future of Artificial Intelligence
Courts have not yet issued final rulings on whether AI training on copyrighted content constitutes infringement—but early decisions suggest a growing willingness to scrutinize the practice. The fair use defense remains viable, but not assured.
With no binding precedent, no clear legislation, and no industry consensus, the path forward is uncertain. But one thing is clear: as courts continue to weigh these groundbreaking cases, the future of generative AI will be shaped not just by engineers—but by judges.