The AI Update | August 10, 2023 – The Artificial Intelligence Blog

#HelloWorld. In this issue, the state of state AI laws (disclaimer: not our original phrase, although we wish it were). Deals for training data are in the works. And striking actors have made public their AI-related proposals—careful about those “Digital Replicas.” It’s August, but we’re not stopping. Let’s stay smart together. (Subscribe to the mailing list to receive future issues.)

States continue to pass and propose AI bills. Sometimes you benefit from the keen, comprehensive efforts of others. In the second issue of The AI Update, we summarized state efforts to legislate in the AI space. Now, a dedicated team at EPIC, the Electronic Privacy Information Center, has spent all summer assembling an update, “The State of State AI Laws: 2023,” a master(ful) list of all state laws enacted and bills proposed touching on AI. We highly recommend reading their easy-to-navigate online site. Highlights below:

- Five states (California, Colorado, Connecticut, Virginia, and Utah) had or will have AI laws going into effect this year, all as part of their broader consumer and data privacy acts. At least 15 other states are introducing or debating bills mirroring that privacy-oriented approach.
- Two key features of most of these privacy-based regimes: (1) giving consumers the right to opt out of being profiled or analyzed by an AI tool and (2) requiring data protection assessments of automated decision-making AI tools.
- In contrast to these generalized approaches, many other proposed bills are hyper-focused. A few examples: Illinois introduced an “Anti-Click Gambling Data Analytics Collection” bill that would prohibit collecting data from online gamblers “with the intent to predict” how they might gamble in the future. Rhode Island has a bill on AI use in sports betting. And New York would bar film production companies seeking a state tax credit from using AI (“synthetic media”) to replace humans (“any natural person”) in a movie production.

Training data in the spotlight. We all know by now that LLMs like GPT, Claude, and LaMDA are powered through ingestion of massive text datasets. And studies show the higher the text quality, the better the model performance. Unsurprisingly, many content owners are uncomfortable with allowing massive uptake of their data absent “consent, credit, and compensation.” Time for deal-making: Recent stories show that negotiations over training data licensing are happening, but sometimes the price may be too high.

Stack Overflow is a site built to help people (mostly software developers) post questions and get answers from their peers. In olden days, Stack Overflow offered open-source access to its Q&A datasets, and LLMs like GPT trained on this freely available data. Now, Business Insider reports, the site has witnessed a marked decline in traffic (many users are getting answers directly from their LLMs) and is switching its business model: Stack Overflow “wants to be paid for its training data.” The most interesting tidbit is the back-of-the-envelope calculation described in the piece: One quality answer is valued at $250; assume 10,000 quality answers per week; that comes out to an LLM developer accessing the entire Q&A corpus for $130 million annually.
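For readers who like to check the math, here is a minimal sketch of that back-of-the-envelope calculation. The dollar value and weekly answer count are the figures quoted above; the 52-weeks-per-year multiplier and the function name `annual_corpus_value` are our own assumptions for illustration, not details from the Business Insider piece.

```python
# Figures quoted in the newsletter (attributed to the Business Insider piece).
VALUE_PER_ANSWER_USD = 250
QUALITY_ANSWERS_PER_WEEK = 10_000

# Assumption: the annual figure is based on 52 weeks per year.
WEEKS_PER_YEAR = 52


def annual_corpus_value(value_per_answer, answers_per_week, weeks_per_year=WEEKS_PER_YEAR):
    """Hypothetical helper: yearly value of a steady stream of quality answers."""
    return value_per_answer * answers_per_week * weeks_per_year


annual_value = annual_corpus_value(VALUE_PER_ANSWER_USD, QUALITY_ANSWERS_PER_WEEK)
print(f"${annual_value:,} per year")  # $130,000,000 per year
```

Under those assumptions, the product matches the $130 million annual figure in the piece.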

On the music side, word is out that Universal Music Group, one of the “Big Three” record labels, is in talks with Google to license out the voices, likeness, and recordings of UMG artists for use in AI-generated song creation. Remember “Deep Fake Drake”? In broad strokes, a deal here would allow licensed and authorized versions, free of copyright, right of publicity, and other IP issues. Obviously, any eventual agreement would be massive and precedent-setting, but reportedly discussions are still early and public details are scant.

At least those deals still may happen. On the flip side, Gizmodo reports that Prosecraft.io, a six-year-old site providing summary statistics for over 25,000 books (metrics like total word count, vividness, number of adverbs used, and the like), just shut down voluntarily. The cause: authors learned the site had trained on their copyrighted material without permission or payment and took to X (née Twitter) to decry that non-consensual use. In terms of financial resources for licensing, Prosecraft is no Google, but its founder ended his good-bye message with an optimistic touch: “In the future, I would love to rebuild this library with the consent of authors and publishers.”

What we’re reading: Like a moth to a flame, The AI Update can’t resist a good AI-related contract. The striking actors’ union, SAG-AFTRA, has made public the interim agreement proposals it offered producers. Exhibit B (pages 49-52 here) is where the AI action is, and it’s all about the creation of “Digital Replicas” of performers. Exhibit B includes terms to address the actors’ chief concern, that producers will create a “Replica” for one film (paying the actor once), but then reuse the “Replica” widely. Thus, Exhibit B requires things like “clear and conspicuous” consent to synthesize a “Replica” and restrictions on the films where a “Replica” can be employed. The proposal doesn’t address the status of replicants, but no problem; they are reportedly occupied with attack ships on fire off the shoulder of Orion.

What should we be following? Have suggestions for legal topics to cover in future editions? Please send them to AI-Update@duanemorris.com. We’d love to hear from you and continue the conversation.

Editor-in-Chief: Alex Goranin

Deputy Editors: Matt Mousley and Tyler Marandola

If you were forwarded this newsletter, subscribe to the mailing list to receive future issues.