California Federal Court Rejects AI Class Action Plaintiffs’ Cherry-Picking Of AI Algorithm Test Results And Orders Production Of All Results And Account Settings

By Gerald L. Maatman, Jr., Justin R. Donoho, and Brandon Spurlock

Duane Morris Takeaways: On June 24, 2024, Magistrate Judge Robert Illman of the U.S. District Court for the Northern District of California ordered a group of authors alleging copyright infringement by a maker of generative artificial intelligence to produce information relating to pre-suit algorithmic testing in Tremblay v. OpenAI, Inc., No. 23-CV-3223 (N.D. Cal. June 24, 2024). The ruling is significant because it shows that plaintiffs who file class action complaints alleging improper use of AI, and who rely on cherry-picked results from their testing of the AI-based algorithms at issue, cannot withhold during discovery their negative testing results or the account settings used to produce any results. The Court’s reasoning applies not only in gen AI cases, but also in other AI cases such as website advertising technology cases.

Background

This case is one of over a dozen class actions filed in the last two years alleging that makers of generative AI technologies violated copyright laws by training their algorithms on copyrighted content, or that they violated wiretapping, data privacy, and other laws by training their algorithms on personal information.

It is also one of the hundreds of class actions filed in the last two years involving AI technologies, which encompass not only gen AI but also facial recognition or other facial analysis, website advertising, profiling, automated decision making, educational operations, clinical medicine, and more.

In Tremblay v. OpenAI, plaintiffs (a group of authors) allege that an AI company trained its algorithm by “copying massive amounts of text” to enable it to “emit convincingly naturalistic text outputs in response to user prompts.” Id. at 1. Plaintiffs allege these outputs include summaries that are so accurate that the algorithm must retain knowledge of the ingested copyrighted works in order to output similar textual content. Id. at 2. An exhibit to the complaint displaying the algorithm’s prompts and outputs purports to support these allegations. Id.

The AI company sought discovery of (a) the account settings; and (b) the algorithm’s prompts and outputs that “did not” include the plaintiffs’ “preferred, cherry-picked” results. Id. (emphasis in original). The plaintiffs refused, citing work-product privilege, which protects from discovery documents prepared in anticipation of litigation or for trial. The AI company argued that the authors waived that protection by revealing their preferred prompts and outputs, and asked the court to order production of the negative prompts and outputs, too, and all related account settings. Id. at 2-3.

The Court’s Decision

The Court agreed with the AI company and ordered production of the account settings and all of plaintiffs’ pre-suit algorithmic testing results, including any negative ones, for four reasons.

First, the Court held that the algorithmic testing results were not work product but “more in the nature of bare facts.” Id. at 5-6.

Second, the Court determined that “even assuming arguendo” that the work-product privilege applied, the privilege was waived “by placing a large subset of these facts in the [complaint].” Id. at 6.

Third, the Court reasoned that the negative testing results were relevant to the AI company’s defenses, notwithstanding the plaintiffs’ argument that the negative testing results were irrelevant to their claims. Id. at 6.

Finally, the Court rejected the plaintiffs’ argument that the AI company could simply interrogate the algorithm itself. As the Court explained, “without knowing the account settings used by Plaintiffs to generate their positive and negative results, and without knowing the exact formulation of the prompts used to generate Plaintiffs’ negative results, Defendants would be unable to replicate the same results.” Id.

Implications For Companies

This case is a win for defendants in class actions based on alleged outputs of AI-based algorithms. In such cases, defendants can cite the Tremblay decision as useful precedent when seeking discovery from recalcitrant plaintiffs of all pre-suit prompts and outputs and all related account settings. The Court’s fourfold reasoning in Tremblay applies not only in gen AI cases but also in other AI cases. For example, in website advertising technology (adtech) cases, plaintiffs should not be able to withhold their adtech settings (the account settings), their browsing histories and behaviors (the prompts), and all documents relating to targeted advertising they allegedly received as a result, any related purchases, and alleged damages (the outputs). As AI-related technologies continue their growth spurt, and litigation in this area grows accordingly, the implications of Tremblay may reach far and wide.