The AI Update | June 29, 2023 – The Artificial Intelligence Blog

#HelloWorld. In the midst of summer, the pace of significant AI legal and regulatory news has mercifully slackened. With room to breathe, this issue points the lens in a different direction, at some of our persistent AI-related obsessions and recurrent themes. Let’s stay smart together. (Subscribe to the mailing list to receive future issues.)

Stanford is on top of the foundation model evaluation game. Dedicated readers may have picked up on our love of the Stanford Center for Research on Foundation Models. The Center’s 2021 paper, “On the Opportunities and Risks of Foundation Models,” is long, but it coined the term “foundation models” to cover the new transformer LLM and diffusion image generator architectures dominating the headlines. The paper exhaustively examines these models’ capabilities; underlying technologies; applications in medicine, law, and education; and potential social impacts. In a downpour of hype and speculation, the Center’s empirical, fact-forward thinking provides welcome shelter.

Now, like techno-Britney Spears, the Center has done it again. (The AI Update’s human writers can, like LLMs, generate dad jokes.) With the European Parliament’s mid-June adoption of the EU AI Act (setting the stage for further negotiation), researchers at the Center asked this question: To what extent would the current LLM and image-generation models be compliant with the EU AI Act’s proposed regulatory rules for foundation models, mainly set out in Article 28? The answer: none of them fully complies right now. But open-source start-up Hugging Face’s BLOOM model ranked highest under the Center’s scoring system, getting 36 out of 48 total possible points. The scores of Google’s PaLM 2, OpenAI’s GPT-4, Stability.ai’s Stable Diffusion, and Meta’s LLaMA models, in contrast, all hovered in the 20s.

How do you value training datasets? Much of the news these past two weeks has been about negotiations over data usage and payments. For instance, the Financial Times reported on June 17 that, in the last few months, major Big Tech AI players like OpenAI, Google, Microsoft, and Adobe have met with representatives of News Corp., The New York Times, The Guardian, and Axel Springer to discuss a “subscription-style fee” for using the news organizations’ content to train generative AI models. These talks are still in the “early stages,” but “one number that had been discussed by the publishers is $5mm-20mm a year, according to an industry executive.”

These early reported figures feel substantial—especially when measured against streaming royalties in the music industry, where an artist probably needs around 10 million streaming plays on Spotify to generate only $50K. On the other hand, news content from The New York Times and other leading publishers is among the highest-quality text you can get to train an LLM, so one’s conscience is hardly shocked at the sums reported. We’ll keep close watch, since any agreed-to financial terms should provide important benchmarks for valuing training data in other cases. In general, if you’re interested in developing a framework for data valuation, we recommend Data Leverage by Christian Ward and James Ward—especially chapter 5’s discussion of four “buckets” of data: a “$0 Bucket,” a “$10K Bucket,” a “$100K Bucket,” and a “$1M+ Bucket.”
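For readers who like to see the arithmetic behind that streaming comparison, here is a minimal back-of-the-envelope sketch. It uses only the rough figures quoted above (about 10 million streams for about $50K, and a reported $5 million to $20 million annual fee). The constant and function names are our own illustrative choices, and the per-stream rate is merely what those round numbers imply, not an official Spotify payout figure.

```python
from fractions import Fraction

# Figures quoted in the text above; treat them as rough, reported estimates.
SPOTIFY_STREAMS = 10_000_000             # "around 10 million streaming plays"
SPOTIFY_PAYOUT_USD = 50_000              # "only $50K"
FEE_RANGE_USD = (5_000_000, 20_000_000)  # reported "$5mm-20mm a year"

# Implied payout per stream, kept as an exact fraction to avoid float noise.
implied_rate = Fraction(SPOTIFY_PAYOUT_USD, SPOTIFY_STREAMS)


def equivalent_streams(annual_fee_usd: int) -> int:
    """Streams needed to earn annual_fee_usd at the implied per-stream rate."""
    return int(annual_fee_usd / implied_rate)


if __name__ == "__main__":
    print(f"Implied payout per stream: ${float(implied_rate):.3f}")
    for fee in FEE_RANGE_USD:
        print(f"${fee:,} a year is roughly {equivalent_streams(fee):,} streams")
```

On those assumptions, the implied rate is half a cent per stream, so the reported fee range corresponds to somewhere between one billion and four billion streams a year. That is a crude yardstick, since training-data licenses and streaming royalties are very different deals, but it gives a sense of scale.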

Section 230’s outer boundaries. It wouldn’t be an AI Update without Section 230 popping in to say hello. Past issues have spotlighted the ongoing debate over whether this online safe harbor immunizes synthetic content generated by LLMs and foundation models, or should be adapted to provide that protection. On June 14, Senators Richard Blumenthal (D-CT) and Josh Hawley (R-MO) introduced a bill whose title tells you most of what you need to know about their stance: the “No Section 230 Immunity for AI Act.” In this Congress, the chances that any bill will be enacted—let alone one addressing a niche subject matter—remain low. But the bill, from two of Capitol Hill’s loudest voices on the subject, does send a strong signal that confirming (or extending) Section 230 protection for generative AI outputs will be no small task.

Umm, we’re not sure we’d be human under this test. Finally, we couldn’t help but smile at Business Insider’s June 20 report that Mustafa Suleyman, co-founder of DeepMind, the British AI company famous for its world-champion-beating AlphaGo program, offered a new way to distinguish human from artificial intelligence. The Turing test, proposed in 1950, was the original assessment but is now widely considered outdated. Hilariously, Suleyman suggests “a new Turing test in which” the AI “receives a $100,000 seed investment and has to turn it into $1 million.” Godspeed on your quest, GPT-5, but if you’re like us real humans here at The AI Update, you may want to start prepping law school applications as a fallback.

What should we be following? Have suggestions for legal topics to cover in future editions? Please send them to AI-Update@duanemorris.com. We’d love to hear from you and continue the conversation.

Editor-in-Chief: Alex Goranin

Deputy Editors: Matt Mousley and Tyler Marandola
