The AI Update | April 23, 2024

#HelloWorld. In this issue, we zoom in on the world of AI model training, looking at both dataset transparency and valuation news. Then we zoom out, highlighting Stanford’s helpful summary of 2023 AI regulations and hot-off-the-press ethical guidance on AI use for lawyers from the New York State Bar. It may be a grab bag, but it’s one worth grabbing. Let’s stay smart together. (Subscribe to the mailing list to receive future issues.)

Making copyrighted AI training data more transparent. On April 9, Representative Adam Schiff (D-CA) introduced the “Generative AI Copyright Disclosure Act.” This short bill would require anyone who “creates” or “alters” a “training dataset” used to build a “generative AI system” to file a disclosure notice with the Copyright Office at least 30 days before the model’s release to consumers (or, for models that have already been released, within 30 days after the Act takes effect). The key requirement for the notice is that it provide “a sufficiently detailed summary of any copyrighted works” used in training. The notices will be collected in a “publicly available online database” that the Office will maintain. A few thoughts:

    • What does it mean for the summary to be “sufficiently detailed”? Unclear. The bill defers that issue to the Copyright Office, asking it to answer operational questions like these through implementing regulations to be issued within 180 days of the bill’s enactment. Of course, whether an individual work is copyrighted can be fiendishly complex, while the volumes of data used to train LLMs are notoriously vast, so some kind of middle ground will have to be found.
    • The bill’s definition of “artificial intelligence” is among the broadest statutory definitions we’ve seen to date—any “automated system designed to perform a task typically associated with human intelligence or cognitive function.” This arguably covers any image recognition system (vision seems like a task typically associated with human intelligence) or optical character recognition system (so too is reading a document). But the bill goes on to limit its scope to generative AI systems that are “designed for use by consumers.”
    • The penalty for noncompliance appears modest—“a civil penalty in an amount not less than $5,000”—with no requirement that this fine be assessed per occurrence or per day of noncompliance.

More AI training data valuation data points. Speaking of AI training data, according to recent reports from Reuters, the licensing market for high-quality text and images continues to mushroom. For instance, Reuters reveals that:

    • Photobucket, a largely forgotten photo-sharing platform that once powered Friendster and Myspace, is in talks with LLM developers to license 13 billion images and videos for between $0.05 and $1 per photo and over $1 per video;
    • Over the past few years, big tech companies like Apple, Google, and Amazon entered into multiyear deals with Shutterstock valued at between $25 and $50 million;
    • Freepik, another photo platform, licensed most of its 200 million images to two unidentified tech companies for between $0.02 and $0.04 per photo; and
    • Defined.ai, an online training data licensing marketplace, said that tech companies typically license at between $1 and $2 for an image, $2 and $4 for a short-form video, $100 and $300 for an hour of film, and $0.001 for each word of text.

The takeaway? If you have a trove of high-quality text, image, or video data, maybe your business crystal ball should show some deal-making in your future.
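
For a rough sense of scale, the reported per-item prices can be multiplied out. The short Python sketch below is our own back-of-the-envelope arithmetic, not a figure Reuters published. The function name licensing_range is hypothetical, and two simplifying assumptions are ours: that every one of Photobucket’s 13 billion items is priced as a photo, and that Freepik licensed all 200 million images rather than “most” of them.

```python
# Back-of-the-envelope licensing totals from the per-item prices Reuters reported.
# Prices are held in whole cents so the integer arithmetic stays exact.

def licensing_range(item_count, low_cents, high_cents):
    """Return (low, high) totals in whole dollars for licensing item_count items."""
    return (item_count * low_cents // 100, item_count * high_cents // 100)

# Assumption (ours, not Reuters'): all 13 billion items are priced as photos,
# at the reported $0.05 to $1 per photo.
photobucket = licensing_range(13_000_000_000, 5, 100)

# Assumption (ours): all 200 million images are licensed, although the report
# says "most", at the reported $0.02 to $0.04 per photo.
freepik = licensing_range(200_000_000, 2, 4)

for name, (low, high) in [("Photobucket", photobucket), ("Freepik", freepik)]:
    print(f"{name}: ${low:,} to ${high:,}")
```

Under those assumptions, the ranges work out to roughly $650 million to $13 billion for Photobucket’s catalog and $4 million to $8 million for Freepik’s. Actual deal values would depend on how many items are licensed, how many are videos, and the terms negotiated.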

Stanford’s summary of 2023 AI policy and regulation. Each year at the beginning of the second quarter, Stanford University’s Human-Centered Artificial Intelligence (HAI) institute releases its Artificial Intelligence Index Report, comprehensively summarizing the prior year’s AI-related developments across all areas—R&D, technical performance, safety and responsibility, education, policy and governance, and the like.

This year’s 500-page report, released April 15, is predictably packed with insights and useful information. While the entire report is worth a read (who are we kidding? a deliberate skim), we suggest you pay particular attention to Chapter 7: Policy and Governance, starting on page 366. In this 40-page section, the report recaps the “significant AI policymaking events in 2023,” with a particular focus on the US and EU. Among the interesting morsels: AI-related regulations in the U.S. rose “significantly in the past year and over the last five years”—from “just one” in 2016 to 25 in 2023. Similarly, 2023 “witnessed a remarkable increase in AI-related legislation at the federal level, with 181 bills proposed, more than double the 88 proposed in 2022.”

New York lawyers receive AI ethics guidance. Lawyers like to regulate themselves too. This month, the New York State Bar Association’s Task Force on Artificial Intelligence issued its Report and Recommendations. These recommendations follow similar guidance from the Florida Bar in January 2024 and the California Bar in November 2023.

The heart of New York’s report appears on pages 57-60, in a chart of “AI & Generative AI Guidelines,” which maps practical recommendations to each attorney professional ethical rule. Some of the guidance is familiar: Under Rules 5.4 (professional independence) and 5.5 (unauthorized practice of law), lawyers are cautioned to maintain “human oversight” and “independent judgment” when using AI legal tools. But other recommendations are thought-provoking: Under Rule 1.5 (fees), for instance, the Task Force suggests that the failure to use an AI legal tool may weigh against the reasonableness of a fee “[i]f the Tools would make your work on behalf of a client substantially more efficient.”

What we’re reading. Finally, in the here’s-something-interesting-but-ultimately-unsurprising-when-you-think-about-it department, Harrity Analytics, a patent and IP data analytics provider, released the “AI Patent 100”—its list of the top 100 companies that secured patents on AI-related inventions from the US Patent and Trademark Office in 2023. Topping the list was IBM with 1,211 US patents, and rounding out the top five were Alphabet, Samsung Electronics, Amazon, and Microsoft. You can find the complete list at the link above.

What should we be following? Have suggestions for legal topics to cover in future editions? Please send them our way. We’d love to hear from you and continue the conversation.

Editor-in-Chief: Alex Goranin

Deputy Editors: Matt Mousley and Tyler Marandola


© 2009- Duane Morris LLP. Duane Morris is a registered service mark of Duane Morris LLP.

The opinions expressed on this blog are those of the author and are not to be construed as legal advice.
