What the New York Times' copyright infringement case means for AI models

Simon Newcomb
29 Dec 2023
Time to read: 4 minutes

The meaning of fair use in US copyright law will be examined by litigation over OpenAI's and Microsoft's use of copyrighted material, in a case that exposes the need for copyright reform in Australia.

The New York Times has sued OpenAI and Microsoft for copyright infringement over the use of articles and other content from the Times in training large language models and generating content from them.

The Times’ complaint, filed on 27 December, says that these AI tools “can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style” and also falsely attribute content to the Times. As a result, the Times says that the AI tools “undermine and damage The Times’ relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue.”

This separation from customers (with the associated loss of the ability to monetise content) and loss of control over content (with adverse reputational impacts) are at the heart of many creators’ concerns about the use of their content by generative AI tools.

This case adds to several other cases currently before the courts over similar concerns. But it could prove to be one of the more important ones, with very well-resourced parties fighting over high stakes. There also appears to be some clear evidence of copying in the generative output (the complaint provides many examples of output that is a near-identical copy of content from the Times), something which some of the plaintiffs in the other cases have found difficult to prove.

In terms of the specific legal issues, the core of the legal complaint by The Times is that its copyright has been infringed by:

  • building datasets for training containing millions of copies of The Times’ works
  • training the GPT models on that data
  • storing, processing and reproducing the GPT models, which have “memorized” the Times’ works
  • disseminating generative output containing copies and derivatives of the Times’ works.

The Times’ complaint also pursues some alternative legal avenues including that:

  • Microsoft is vicariously liable for OpenAI’s infringement by controlling, directing and profiting from it;
  • Microsoft is liable for assisting and contributing to the infringement by providing infrastructure and a range of other services and software;
  • end users may be liable for generating infringing output, so OpenAI and Microsoft are also liable for assisting and contributing to that;
  • the defendants have removed copyright management information;
  • there has been unfair competition through misuse and misappropriation of content, leading to loss of advertising and referral revenue; and
  • there has been unauthorised use of trade marks in the generative output and also dilution of the trade marks by associating them with inaccurate content.

It seems hard to deny, based on the examples in the complaint, that there is infringing output from the GPT model. The bigger issue however is likely to be whether the creation of the model itself by collating the data and training a model with it is an infringement. That is a massive issue for the whole AI industry as it determines whether models can be created at all by training them on publicly available data without a licence.

A fundamental legal point for courts in resolving this issue in the Times’ case and in other AI copyright cases will be whether creating AI models with third party data constitutes “fair use” under US copyright law. The Times’ complaint anticipates this argument by the defence and rejects it by saying: “Publicly, Defendants insist that their conduct is protected as ‘fair use’ because their unlicensed use of copyrighted content to train GenAI models serves a new ‘transformative’ purpose. But there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for the Times and steal audiences away from it. Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use.”

However, many US academics and commentators argue the opposite: that training a model is fair use. With strong views on both sides, a decision from a superior court is ultimately likely to be needed for the industry to accept a common position on this issue.

This case could provide an opportunity for a court to give some guidance on the issues around fair use of copyright works in the AI industry. However, the case follows unsuccessful attempts by the parties to negotiate a licence deal (in contrast to the Associated Press, which has reportedly struck a deal with OpenAI for the use of its content). It’s quite possible that the case is designed by the Times to create additional leverage for future negotiations with OpenAI and Microsoft, and unfortunately (from the perspective of an industry wanting clarity) the case may end up settling without a court needing to give a judgment on the issues.

How would this type of case play out in Australia? In Australia, we don’t have a “fair use” exception, so this case would run considerably differently here, and the Times would have a much easier path to a win. Our “fair dealing” exceptions to copyright only apply to narrow, specific purposes, which would not cover this case. For example, the “fair dealing for research or study” exception can apply to creating models for the purposes of some types of academic research, or for the purposes of study by students (but not for the creation of models by their teachers). There is also a fair dealing exception for “reporting the news”, but that would not allow copying newspaper articles themselves when they are not the subject of the news being reported.

Our copyright law is therefore a major barrier to training AI models in Australia, at least where that requires training on a vast amount of content publicly available on the internet (making it impractical to negotiate individual licence agreements). A key issue for our policy-makers is whether we should change the law to introduce a broad “fair use” exception like that in the US (if, in fact, it does allow training models) or another specific fair dealing exception targeted at AI (like the specific “text and data mining” exception in the UK). If we want to develop an AI industry in Australia, then law reform is likely to be needed.

In terms of remedies, the Times is seeking to hold OpenAI and Microsoft responsible for billions in damages, and orders from the court to stop further infringing conduct and to destroy the training datasets and large language models.

This case will be worth following as it evolves in the coming months.

Disclaimer
Clayton Utz communications are intended to provide commentary and general information. They should not be relied upon as legal advice. Formal legal advice should be sought in particular transactions or on matters of interest arising from this communication. Persons listed may not be admitted in all States and Territories.