From: allin

The emergence of AI has brought significant challenges to established intellectual property laws, particularly concerning copyright and fair use. A key point of contention revolves around how AI models are trained on vast datasets, often scraped from the internet, and the resulting output.

Thomson Reuters vs. Ross

In the first major U.S. AI copyright case, Thomson Reuters, owner of the legal database Westlaw, sued its competitor, Ross, for copyright infringement [01:21:18]. Ross had developed an AI-powered legal search engine and had sought a license from Westlaw to train its models, which Westlaw denied [01:22:00]. Ross then partnered with LegalEase, whose database was found to have been copied from Westlaw answers, and Ross was held liable for the infringement [01:22:20]. The judge initially leaned toward Ross on fair use, but later reversed course and concluded that fair use did not apply [01:22:37].

This case highlights the “fourth factor” of the fair use test: the effect of the use upon the potential market for, or value of, the copyrighted work [01:23:03]. It raises questions about whether companies like Getty Images have the exclusive right to create derivative products from their images, and whether crawling the open web without a license constitutes copyright infringement [01:23:10]. Just because content is technically accessible doesn’t mean it can be used without permission [01:23:25].

The “Napster/Spotify” Scenario

It is predicted that the current AI copyright disputes, such as The New York Times’s lawsuit against OpenAI, may conclude much as the Napster and Spotify disputes did [01:24:43]. This could result in large language models (LLMs), especially closed-source ones, being required to pay a percentage of their revenue to content holders in a negotiated settlement [01:25:12]. Such an outcome could lead to a significant resurgence and uplift for the content industry [01:25:29].

The legal community faces challenges in understanding how AI models function [01:26:10]. LLMs are often described as “extreme compressors” that summarize and repackage information, rather than truly learning or creating new knowledge [01:26:50]. Unlike Google, which links back to original sources and sends traffic, LLMs often provide direct substitutions, keeping users within the model [01:27:07]. The key question in fair use is how much content is changed or remixed by the AI model [01:30:00].

Open Source vs. Closed Source Models

A proposed solution is that if an AI model is trained on the open web, the resulting model should itself be open source, contributing back to the public domain [01:24:37]. Under this view, either models built on crawled content are made open source, or the burden falls on copyright holders to go to significant lengths to protect their data [01:31:37].

Some AI companies, like Microsoft, are proactively licensing content for their models, such as paying authors for indexing their books [01:32:00]. This establishes a precedent for content creators to have their work respected and compensated.

OpenAI’s Approach

The transition of OpenAI from a nonprofit to a for-profit entity, allowing it to raise significant capital and incentivize employees, has drawn criticism for its handling of AI control [01:32:20]. While incentivizing the team and raising money are necessary, some argue that the nonprofit portion could have been maintained to retain control for humanity, rather than completely privatizing the venture [01:33:05]. This concern stems from the idea that if AI is truly as powerful as some predict, capturing an enormous share of future value, then control over it should not be consolidated in a single private company [01:32:20].

The historical precedent of the internet, an open technology built on open source, shows that U.S. companies were able to dominate because the U.S. led in developing it [01:38:34]. Similarly, an open and distributed approach to AI could lead to faster innovation and benefit humanity, even if it means some leakage of technology to other countries [01:08:08]. The argument for openness is that if a country doesn’t embrace and lead in AI, others will, as demonstrated by companies like DeepMind [01:06:06].