#HelloWorld. It’s 2024 and we… are… back. Lots to catch up on. AI legal developments worldwide show no signs of letting up, so here’s our New Year’s resolution: We’re redoubling efforts to serve concise, focused guidance of immediate use to the legal and business community. Let’s stay smart together. (Subscribe to the mailing list to receive future issues.)
All the news that’s fit to model. The current wave of generative AI copyright litigation began in November 2022 with Doe v. GitHub in the Northern District of California. But it crested right after Christmas last month when the New York Times filed its headline-grabbing suit in the Southern District of New York. The suit charges OpenAI and Microsoft with infringing copyrights in “millions” of Times articles.
The complaint raises legal questions that are in many ways similar to those in the dozen or so other AI copyright suits filed throughout 2023—for example, is it fair use to train a large language model on a content corpus that includes copyrighted works? But aside from the outsize media attention, here are three ways the New York Times’ suit is different:
- Unlike some of the earlier plaintiffs, the Times appears to have registered all the works at issue with the Copyright Office before the alleged infringement began. That would make the Times eligible for statutory damages ranging from $750 to $30,000 per work—and the complaint attaches thousands of pages of charts listing “over 3 million” articles the Times owns. Back of the envelope, even at the $750 statutory floor, 3 million registered works implies potential exposure north of $2 billion.
- One part of the complaint focuses on a specialized “retrieval-augmented generation” (RAG) process. RAG occurs when the user prompts the model for a response, not earlier during LLM training: the system retrieves substantial chunks of text and feeds them through the already-trained model. That timing complicates the fair-use debate over training on copyrighted content, because RAG happens after training and homes in on far fewer documents than the training corpus itself. (For the general pattern, see the short sketch after this list.)
- It’s clear from attachments to the complaint (especially Exhibit J) that many of the challenged LLM outputs resulted from highly specific and purpose-driven user prompts. For instance, in some cases, the Times’ representatives pasted in paragraphs from an article and asked the model to reproduce the remaining parts. This raises an important question: In such situations—where an individual user intentionally seeks to induce substantial copying—who in the end should be held accountable? Expect the allocation-of-responsibility question to be a key battleground as the case proceeds. In fact, only a few days ago, OpenAI publicly responded to the suit by alleging that the Times “is not telling the full story”; it “seems they intentionally manipulated prompts… in order to get our model to regurgitate… our models don’t typically behave that way.”
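For readers who want to see the RAG pattern concretely, here is a minimal, purely illustrative sketch of the two-step flow described above: retrieve relevant documents at prompt time, then hand that text to an already-trained model. Everything in it (the toy corpus, the keyword-overlap scoring, and the generate() stub) is a hypothetical placeholder, not a representation of any party’s actual system.

```python
# Minimal retrieval-augmented generation (RAG) sketch, illustrative only.
# The corpus, the scoring, and the generate() stub are hypothetical
# placeholders, not a description of any litigant's actual pipeline.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap and return the top k."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stand-in for a call to an already-trained LLM."""
    return f"[model response conditioned on {len(prompt)} characters of context]"

def rag_answer(query: str, corpus: list[str]) -> str:
    # Step 1 (at prompt time, after training is complete): pull relevant text.
    context = retrieve(query, corpus)
    # Step 2: paste the retrieved text, verbatim, into the model's prompt.
    prompt = "Context:\n" + "\n---\n".join(context) + f"\n\nQuestion: {query}"
    return generate(prompt)

if __name__ == "__main__":
    toy_corpus = [
        "Article about city budget negotiations.",
        "Recipe column on winter soups.",
        "Investigative report on transit funding.",
    ]
    print(rag_answer("What happened with the city budget?", toy_corpus))
```

The point the sketch tries to make visible is timing: retrieval copies specific documents into the prompt at response time, after the model has already been trained—which is why the complaint treats the RAG process as raising its own questions.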
But how much is that content really worth? As the Times’ complaint acknowledges, the lawsuit is an outgrowth of content-licensing discussions with OpenAI that began in April 2023 and reached an apparent impasse sometime in December. One side benefit of the Times’ filing suit: Those of us obsessed with content valuations for AI use were treated to behind-the-scenes reports about negotiations between news providers and major AI tech developers—including some of the numbers exchanged. On January 4, The Verge had this handy recap: OpenAI has been offering “between $1 million and $5 million a year to license copyrighted news articles to train its AI models” while Apple “is offering at least $50 million over a multiyear period,” but apparently for a broader scope of use. And back in December, it was reported that OpenAI had signed a deal with Axel Springer, the German owner of web properties including Politico and Business Insider, valued at “tens of millions of euros” over three years.
Boards may soon need to speak up on AI. On the corporate governance front, in a pair of parallel January 3 notice letters, the SEC informed Disney and Apple they would have to allow an AI-focused shareholder proposal (pushed by the AFL-CIO) onto their respective annual meeting proxy statements. The proposal seeks a vote on whether each company should be required to “prepare a transparency report on the company’s use of Artificial Intelligence” and to “disclose any ethical guidelines that the company has adopted regarding the company’s use of AI technology.” Interestingly, the SEC determined that these subjects “transcend ordinary business matters.” Maybe one day, heck maybe soon, AI tools will be as entrenched in the enterprise as Microsoft Word. But not just yet, according to our chief securities regulator. Nonetheless, time to revisit those AI use policies….
Lest we forget our friends in Europe. We may have been away in December, but the European Union was not. On December 9, the European Parliament and Council reached “provisional agreement” on the EU AI Act, in the making for almost three years. This important, intricate legislation does not lend itself to The AI Update’s punchy summarization and, in any event, technical details are still being hammered out, the final text has not been released, and formal adoption is still to come. But rest easy, we’ll continue to keep close watch throughout 2024.
What we’re reading. Before we go, here’s a fun experiment: Can you induce a chatbot (in an artificial environment, of course) to make “illegal trades based on insider information” and then “strategically hide” that conduct in its answers, even when said bot was previously “trained to be helpful, harmless, and honest”? The recent answer from three safety researchers is apparently yes; their secret sauce was prompting the bot with simulated “high-pressure” emails. Read the details here, and join us in hoping that this is merely a useful safety-promoting exercise, not the dawn of GPT-Boesky.
What should we be following? Have suggestions for legal topics to cover in future editions? Please send them to AI-Update@duanemorris.com. We’d love to hear from you and continue the conversation.
Editor-in-Chief: Alex Goranin
Deputy Editors: Matt Mousley and Tyler Marandola
If you were forwarded this newsletter, subscribe to the mailing list to receive future issues.