The AI Update | May 31, 2023 – The Artificial Intelligence Blog

#HelloWorld. In this issue, we head to Capitol Hill and summarize key takeaways from May’s Senate and House Judiciary subcommittee hearings on generative AI. We also visit California to check in on the Writers Guild strike and drop in on an online fan fiction community, the Omegaverse, to better understand the vast number of online data sources used in LLM training. Let’s stay smart together. (Subscribe to the mailing list to receive future issues.)

Printing press, atomic bomb—or something else? On consecutive days in mid-May, both Senate and House Judiciary subcommittees held the first of what they promised would be a series of hearings on generative AI regulation. The Senate session (full video here) focused on AI oversight more broadly, with OpenAI CEO Sam Altman’s earnest testimony capturing many a headline. The House proceeding (full video here) zeroed in on copyright issues—the “interoperability of AI and copyright law.”

We watched all five-plus hours of testimony so you don’t have to. Here are the core takeaways from the sessions:

- If you’re using generative AI tools to create output for public distribution, you might as well start labeling it as AI generated. At both the Senate and House sessions, every legislator and witness who addressed the subject agreed that generated content should be clearly identified to users. As Altman put it, “people need to know if they are talking to an AI.” Expanding on this theme, Senator Blumenthal (D-CT) returned repeatedly to the idea of “nutrition labels” that would specify the training data used and include third-party certifications attesting to the reliability of the AI model and its output.

- In the Senate hearing, large-scale disinformation and election interference were the top-of-mind dangers, but there was little discussion of specific regulatory or technological measures to combat those risks. Rather, much of the talk was about creating a new AI-dedicated agency, empowered to license the commercial deployment of the highest-risk AI systems. Altman suggested such systems could be identified by the amount of compute used or by specific capabilities, like a “model that can persuade, manipulate, [or] influence a person’s behavior or a person’s beliefs” or that “can create novel biological agents.” At various points, senators and witnesses drew analogies to the International Atomic Energy Agency, a motor vehicle bureau, and the “Digital Commission” to govern online platforms that Senator Peter Welch (D-VT) proposed last year. Senator Hawley (R-MO) colorfully wondered whether generative AI would be more like a “printing press” or an “atomic bomb.” But the regulatory analogy that seemed most realistic came from NYU Professor Gary Marcus, who invoked the FDA. Just as new drugs are not released without robust preclinical and clinical trials (and post-release surveillance), so too would AI systems be subject to “clinical trial like” assessments before widespread deployment, coupled with close post-deployment monitoring.

- Section 230 also surfaced repeatedly throughout the Senate session, with at least four senators (Blumenthal, Durbin, Klobuchar, and Graham) vowing not to “repeat” the mistake of lightly regulating social media platforms and protecting them from litigation liability. The dominant sentiment was that Section 230 as currently written should not apply in most cases to AI model developers. Even Altman seemingly agreed that Section 230 was presumptively inapplicable to OpenAI, but cleverly kept open the possibility of crafting some sort of alternative protection scheme—a “new approach” for generative AI beyond Section 230.

- On the copyright side (the focus of the House session), legislators seemed to have little appetite for the argument that a blanket fair-use exception should protect the use of copyrighted books, music, images, and other works in training LLMs and other foundation models. In his opening statement, composer Ashley Irwin of the Society of Composers and Lyricists spoke of the “fundamental three C’s” for creators (“consent, credit, and compensation”), and this mantra surfaced time and again throughout the House hearing. While the sentiment for compensating creators for AI training use was strong, the questions of how to calculate the payments and under what circumstances to make them remained unanswered and will no doubt prove fiendishly difficult. Some witnesses and legislators worried that a Spotify-like scheme would be near-impossible to implement (because copyrighted works used in training are not publicly performed in the way songs are) and, even if implemented, would undercompensate creators. Other creators, most notably musician Dan Navarro, insisted that any collective licensing scheme not be compulsory and that authors should have the right to opt out of having their work used in an AI system. It’s no understatement to say that much further work is needed to put any solution into operation, even if the broad principles of “consent, credit, and compensation” are agreed upon.

The Writers Guild strike: Speaking of compensating creators in this AI age, on May 2 the Writers Guild of America (WGA), representing TV and movie writers, went on strike as part of its ongoing labor dispute with TV and film producers. The potential use of generative AI is one of the core areas of disagreement. The WGA proposed restrictions on studios’ use of generative AI: no writing or rewriting of scripts, treatments, or outlines, and no generating of source material for writers to use (writers are typically paid a lower rate to work with existing source material). A future compromise would need to address how writing credits are assigned and how writers would be compensated when generative AI is involved in the creative process.

What we’re reading: It’s no secret that large language models like ChatGPT were trained on truly massive amounts of data scraped from the open web—so much data that it’s hard to get your head around without concrete examples. Here’s a fun one: There’s an online fan fiction community—the “Omegaverse”—known for its elaborate and idiosyncratic collection of sexual stories and tropes. And, according to this recent piece from Wired magazine, GPT models know all about it and can complete texts with specific phrases and concepts unique to the Omegaverse, a clear sign that the community’s websites were scraped and fed into the models for training.

What should we be following? Have suggestions for legal topics to cover in future editions? Please send them to AI-Update@duanemorris.com. We’d love to hear from you and continue the conversation.

Editor-in-Chief: Alex Goranin

Deputy Editors: Matt Mousley and Tyler Marandola

If you were forwarded this newsletter, subscribe to the mailing list to receive future issues.