Thomson Reuters Wins First Major AI Copyright Case in the US

misk@sopuli.xyz · edit-2 2 months ago


TachyonTele@lemm.ee · 2 months ago

“None of Ross’s possible defenses holds water. I reject them all,” wrote US District Court of Delaware judge Stephanos Bibas, in a summary judgement.

The AI company lost. Good news for a change.

Buffalox@lemmy.world · 2 months ago

Paywall

misk@sopuli.xyz · 2 months ago

Try https://archive.is/2025.02.11-221212/https://www.wired.com/story/thomson-reuters-ai-copyright-lawsuit/

Buffalox@lemmy.world · edit-2 2 months ago

Thanks, works fine. I can see you also included it as part of the post. 👍

NSRXN@lemmy.dbzer0.com · 2 months ago

tragic. no one should need to pay to read the law

nickwitha_k (he/him)@lemmy.sdf.org · 2 months ago

That’s literally not what the ruling is about. It was about an AI bro company using proprietary, copyrighted materials to train its AI, which they obtained by questionable means, after being denied license to do so by the IP owners. Further, after training the AI with unlicensed materials, they launched a competing product.

Whether you support IP or not, the AI company is clearly in the wrong here.

It’s a pretty definitive example of many AI companies being little more than leeches, stealing others’ work and repackaging it as their own. All with zero long-term consideration of “what do we do when there’s no one left to leech off of because we undermined the ability of those who make the source data to make a living, while unnecessarily driving increased emissions and consumption of potable water for something that provides little actual value to humanity as a whole?”

NSRXN@lemmy.dbzer0.com · 2 months ago

Whether you support IP or not, the AI company is clearly in the wrong here.

they’re both wrong to restrict access. if legal analysis is necessary to understand the law, then restricting access to that analysis, or its free dissemination, is also wrong.

nickwitha_k (he/him)@lemmy.sdf.org · 2 months ago

I am in agreement with you here, at least ideologically. I think that IP law needs a massive overhaul because data “wants” to be free. The major problem is with the context of the hyper-commercialized landscape that we currently live in.

NSRXN@lemmy.dbzer0.com · 2 months ago

stealing others’ work

Reuters still has their analysis. nothing was stolen.

nickwitha_k (he/him)@lemmy.sdf.org · 2 months ago

It is stealing in the same way that profits are stolen labor. The AI company stole the labor of those who prepared the summaries without compensation, then used what they obtained to directly compete.

NSRXN@lemmy.dbzer0.com · 2 months ago

since the defendant is also a capitalist firm, I can see the similarities, but if someone were to simply be liberating the information, I don’t see that as stealing.

nickwitha_k (he/him)@lemmy.sdf.org · 2 months ago

I agree with you there. Context is what makes it theft and using the stolen data to attempt to directly compete with the source is where the actual harm occurs.

In a scenario where the source of the data is not being harmed, it’s hard to think of it as theft (data/information wants to be free).

NSRXN@lemmy.dbzer0.com · 2 months ago

they might claim they’re harmed if the information is distributed for free. I don’t care. that’s not theft.

nickwitha_k (he/him)@lemmy.sdf.org · 2 months ago

Yup. The context on this is directly profiting off of others’ work, not setting data free.

MaggiWuerze@feddit.org · 2 months ago

That’s basically what the judge said as well. The AI firm tried to create a market alternative, aka they wanted to compete, and that was the main reason why this is not fair use.

grue@lemmy.world · 2 months ago

Ah, fuck, is that what the case is about? That sucks; that’s the kind of case where they both need to lose:

The law shouldn’t be copyrightable
AI companies shouldn’t be allowed to ‘launder’ copyright (and more to the point, copyleft) by reproducing chunks of copyrighted works divorced from their license

If I were more conspiracy-minded, I would almost think that somebody intentionally decided to resolve this case first in order to guarantee that they set a disastrous precedent.

TheOccasionalTachyon@lemm.ee · edit-2 2 months ago

It’s not what this case is about. Reuters runs a service called Westlaw that provides access to a bunch of legal materials, including summaries and explanations of cases that are written by its lawyers. Ross Intelligence wanted access to those summaries, so that it could train AI to make a competing product. As you can imagine, Reuters said no to this.

So, Ross bought summaries from someone else, another company that did have access to Westlaw, and used those to train its AI. Today, the court found (among other things) that a few thousand of the summaries that Ross’s AI produced are way too similar to Westlaw’s summaries for it to be a coincidence. Ross had argued (among other things) that its summaries were only similar because they were describing the law, and Reuters doesn’t/can’t have a copyright on the law. The court rejected this argument, saying, essentially, “Yeah, it’s true that Reuters doesn’t have a copyright on the law, but it does have a copyright on the summaries that its lawyers write. It takes skill and judgment to decide which parts of a law or decision are important for people doing legal research, and to present them in a way that’s easy for people to understand. You clearly copied many of them.”

This isn’t an exhaustive discussion of all the issues covered in the opinion, because I’m a sleepy lawyer, but it’s the most important part.

Dkarma@lemmy.world · 2 months ago

Which is funny cuz this is exactly what the cops do to prosecute citizens. They buy 3rd party data they’re not legally entitled to gather themselves.

Interesting to see this possibly be used against prosecutions in the future where the cops collected 3rd party data.

grue@lemmy.world · edit-2 2 months ago

I’m not a lawyer, but I’m also not entirely unfamiliar with this sort of thing. In particular, I remember Georgia v. Public.Resource.Org and thus do not accept at face value the notion that the data in question being “summaries and explanations of cases” necessarily means Westlaw is in the right. Even if the Westlaw materials aren’t “officially” incorporated into the law itself the way Georgia did, that doesn’t mean Westlaw should necessarily be entitled to monopolize them, especially if the judicial system is heavily leaning upon them to inform its decisions.

ricecake@sh.itjust.works · 2 months ago

Though the headnotes were drawn directly from uncopyrightable judicial opinions, the court analogized them to the choices made by a sculptor in selecting what to remove from a slab of marble. Thus, even though the words or phrases used in the headnotes might be found in the underlying opinions, Thomson Reuters’ selection of which words and phrases to use was entitled to copyright protection. Interestingly, the court stated that “even a headnote taken verbatim from an opinion is a carefully chosen fraction of the whole,” which “expresses the editor’s idea about what the important point of law from the opinion is.” According to the court, that is enough of a “creative spark” to be copyrightable. In other words, even if a work is selected entirely from the public domain, the simple act of selection is enough to give rise to copyright protection.

The court distinguished cases holding that intermediate copying of computer source code was fair use, reasoning that those courts held that the intermediate copying was necessary to “reverse engineer access to the unprotected functional elements within a program.” Here, copying Thomson Reuters’ protected expression was not needed to gain access to underlying ideas.

https://natlawreview.com/article/court-training-ai-model-based-copyrighted-data-not-fair-use-matter-law

It sounds like the case you mentioned had a government entity doing the annotation, which makes it public even though it’s not literally the law.
Reuters seems to have argued that while the law and cases are public, their tagging, summarization and keyword highlighting is editorial.
The judge agreed and highlighted that since Westlaw isn’t required to view the documents that everyone is entitled to see, training using their copy, including the headnotes, isn’t justified.

It’s much like how a set of stories being in the public domain means you can copy each of them, but my collection of those stories has curation that makes it so you can’t copy my collection as a whole, assuming my work curating the collection was in some way creative and not just “alphabetical order”.

Another major point of the ruling seems to rely on the company aiming to directly compete with Reuters, which undermines the fair use argument.

antonim@lemmy.dbzer0.com · edit-2 2 months ago

Today, the court found (among other things) that a few thousand of the summaries that Ross’s AI produced are way too similar to Westlaw’s summaries for it to be a coincidence.

This is probably just inevitable when your dataset is not large enough. I would be interested in seeing the LLM’s output compared against the original texts; I do remember the early ChatGPT producing some borderline copies of sentences that you could find online (with one or two words changed).

NSRXN@lemmy.dbzer0.com · 2 months ago

I don’t trust that judge’s ability to determine whether they were copied if it wasn’t verbatim. which is what copyright is. to control an idea, you need a patent.

ricecake@sh.itjust.works · 2 months ago

I don’t think that’s the best argument in favor of AI if you cared to make that argument. The infringement wasn’t for their parsing of the law, but for their parsing of the annotations and commentary added by Westlaw.

If processing copyrighted material is infringement, then what they did is definitively infringement.
The law is freely available to read without Westlaw. They weren’t making the law available to everyone, they were making a paid product to compete with the Westlaw paid product. Regardless of justification, they don’t deserve any sympathy for altruism.

A better argument would be this: if training on the words of someone you paid to analyze an analysis produces something similar to the original, is it sufficiently distinct to actually be copyrighted? Is training itself actually infringement?

Pogogunner@sopuli.xyz · 2 months ago

So if stealing copyrighted information to train an AI isn’t fair use, then isn’t pretty much every commercial LLM illegal?