AI has a copyright problem

In the late 1990s, a new kind of peer-to-peer file sharing platform emerged called Napster. Almost immediately, people began sharing copyrighted music in numbers never previously seen. This wasn’t sharing cassettes and VHS tapes with friends and family; it was instant access to content at internet scale. The entertainment industry was not amused, but even with an army of lawyers, it was ill-prepared to stop this level of activity.
Today, we are having a new Napster moment with generative AI. We’ve already been debating whether large language model companies with valuations in the tens of billions should be allowed to simply scrape other people’s work without permission to train their models. It would be easy to say, “hell no,” but it’s also probably not that simple, just as it wasn’t in the early days of the internet.
Let’s go back to the early 2000s for a second. In his 2008 book, Remix, Lawrence Lessig tells the story of a woman who posted a video on YouTube, a fledgling video sharing platform at the time, of her toddler dancing to a Prince song. Universal, owner of the Prince catalog, didn't find it cute or funny and chose to interpret copyright law rigidly by issuing a takedown notice.

Last week, OpenAI, a company with a shiny new valuation of $300 billion, launched a new image generator, one that appeared to have been trained on content from Japanese animation company Studio Ghibli. Suddenly, the internet was awash in Ghibli-style images. It didn’t take long for questions to arise about whether OpenAI had the right to give people this capability without the studio’s explicit permission. Unlike the entertainment industry in the 1990s, Studio Ghibli has chosen to remain quiet about it, at least for now.
A brave new world
Still, we can draw a line between those two moments. Just as the internet and remixing challenged the notion of copyright in the late 90s and early part of this century, we are experiencing a similar moment today, and we have to decide what’s right here. As Lessig wrote in Remix, he was not in favor of killing copyright, only making it more flexible for a changing world. “Copyright is, in my view at least, critically important to a healthy culture. Without it, we would have a much poorer culture. With it, at least properly balanced, we create the incentives to produce great new works that would otherwise not be produced.”

Last week, Cassie Kozyrkov, who writes and speaks frequently about AI, raised the question of whether she should post a Ghibli-ized image of herself on LinkedIn. My gut reaction was for her not to, but as she later wrote in her blog: “Since only specific works are copyrightable, style is not. As far as I know, I’m not violating copyright by making myself a Ghibli character, but OpenAI probably is. (Unless they got the studio’s permission.)”
That seems unlikely, and somebody probably violated someone’s copyright here, but as M.G. Siegler wrote on his SpyGlass blog, artists can’t stop this shift any more than the big entertainment corporations could back in the day. “My own viewpoint right now is that artists should engage in such debates, not about how to stop this – that's folly – but about how to figure out the right monetization mechanism for this new world.” Remember, Napster gave rise to streaming services where at least someone got paid (although who and how well is still open to debate).
Does copyright still matter?
Perhaps it’s a moral question, as well as a legal one. When it comes to a major record label taking on a mom and her toddler, most people will back the mom. But when it comes to a beloved artist like Studio Ghibli or even Disney versus, say, an indie artist, it’s more complicated. Large entities have creation rights too, but the indie artist probably has more at stake.
So does copyright even matter in the age of generative AI, or do we need a new set of rules? In his Riskgaming blog, Danny Crichton wondered if the notion of plagiarism, using others’ ideas without permission or appropriate citation, is under assault in the age of AI, when LLM crawlers do this as part of their job. “I wrote last week about how autonomous AI will increasingly become central to art, but one of the interesting side effects of this new medium is that AI models will inevitably reuse the work of others in generating those experiences,” Crichton wrote. And they’ll do it presumably without permission or attribution.

So it appears that technology has taken us to a place where the idea of copyright and content ownership could be shifting yet again, or at least be open to reinterpretation. It’s worth noting that many people who favor a free and open internet and encourage sharing, people like open source advocates and Creative Commons supporters, are also irked by the LLM companies simply scraping their stuff, sometimes with bots relentlessly hammering their servers, making it difficult to conduct legitimate open business.
All of this matters to enterprise users, who are definitely concerned about the pitfalls of training on public data, especially on content that could have ownership questions. They don’t want something like that coming back to bite them as they implement AI solutions with customers.
Much like that moment when the internet made file sharing dead simple in the late 1990s, we find ourselves in a similarly pivotal moment regarding content ownership today. As AI transforms creative industries, everyone involved, including policymakers, creators, and tech companies, must come together to redefine what fair use means in the current context. Just don't expect it to be easy to find a solution that satisfies everyone.
~Ron
Featured photo by Mike Seyfang on Flickr. Used under the Creative Commons Attribution 2.0 license.