Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Pro@programming.dev · 2 months ago

Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Prox@lemmy.world · 2 months ago

FTA:

Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.
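For anyone checking the arithmetic behind the $750 billion figure, here is a minimal sketch. The per-book amount and the book count are taken straight from the quote above; the function and variable names are just illustrative.

```python
# Figures as quoted from Anthropic's filing in the comment above.
DAMAGES_PER_BOOK = 150_000   # dollars per book
BOOKS = 5_000_000            # number of books


def total_exposure(per_book: int, books: int) -> int:
    """Multiply a per-work damages figure by the number of works."""
    return per_book * books


total = total_exposure(DAMAGES_PER_BOOK, BOOKS)
print(f"${total:,}")  # $750,000,000,000
```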

krashmo@lemmy.world · 2 months ago

Funny how that kind of thing only works for rich people

Phoenixz@lemmy.ca · edit-2 2 months ago

This version of too big to fail is too big a criminal to pay the fines.

How about we lock them up instead? All of em.

Buske@lemmy.world · 2 months ago

Ahh, can’t wait for hedge funds and the like to use this defense next.

Lovable Sidekick@lemmy.world · edit-2 2 months ago

Lawsuits are multifaceted. This statement isn’t a defense or an argument for innocence, it’s just what it says: an assertion that the proposed damages are unreasonably high. If the court agrees, the plaintiff can always propose a lower damage claim that the court thinks is reasonable.

modifier@lemmy.ca · 2 months ago

Hold my beer.

Womble@lemmy.world · 2 months ago

The problem isn’t that Anthropic gets to use that defense, it’s that others don’t. The fact that the world is in a place where people can be fined 5+ years of a western European average salary for making a copy of one (1) book that does not materially affect the copyright holder in any way is insane, and it is good to point that out no matter who does it.

interdimensionalmeme@lemmy.ml · 2 months ago

What it means is they don’t own the models. The models are the commons of humanity; the companies are merely temporary custodians. The nightmare ending is the elites keeping the most capable and competent models for themselves as private playthings. That must not be allowed to happen under any circumstances. Sue OpenAI, Anthropic and the other enclosers, sue them for trying to take their ball and go home. Dispossess them and sue the investors for their corrupt influence on research.

Alphane Moon@lemmy.world · edit-2 2 months ago

And this is how you know that the American legal system should not be trusted.

Mind you, I am not saying this is an easy case, it’s not. But the framing that piracy is wrong but ML training for profit is not wrong is clearly based on oligarch interests and demands.

themeatbridge@lemmy.world · 2 months ago

This is an easy case. Using published works to train AI without paying for the right to do so is piracy. The judge making this determination is an idiot.

AbidanYre@lemmy.world · 2 months ago

You’re right. When you’re doing it for commercial gain, it’s not fair use anymore. It’s really not that complicated.

tabular@lemmy.world · 2 months ago

If you’re using the minimum amount, in a transformative way that doesn’t compete with the original copyrighted source, then it’s still fair use even if it’s commercial. (This is not saying that’s what LLMs are doing.)

Null User Object@lemmy.world · 2 months ago

The judge making this determination is an idiot.

The judge hasn’t ruled on the piracy question yet. The only thing that the judge has ruled on is, if you legally own a copy of a book, then you can use it for a variety of purposes, including training an AI.

“But they didn’t own the books!”

Right. That’s the part that’s still going to trial.

Randomgal@lemmy.ca · 2 months ago

You’re poor? Fuck you you have to pay to breathe.

Millionaire? Whatever you want daddy uwu

setVeryLoud(true);@lemmy.ca · edit-2 2 months ago

Gist:

What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”

DeathsEmbrace@lemmy.world · 2 months ago

So I can’t use any of these works because it’s plagiarism but AI can?

setVeryLoud(true);@lemmy.ca · 2 months ago

My interpretation was that AI companies can train on material they are licensed to use, but the courts have deemed that Anthropic pirated this material as they were not licensed to use it.

In other words, if Anthropic bought the physical or digital books, it would be fine so long as their AI couldn’t spit it out verbatim, but they didn’t even do that, i.e. the AI crawler pirated the book.

Enkimaru@lemmy.world · 2 months ago

Why would it be plagiarism if you use the knowledge you gain from a book?

DerisionConsulting@lemmy.ca · edit-2 2 months ago

Formatting thing: if you start a line in a new paragraph with four spaces, it assumes that you want to display the text as code and won’t line break.

This means that the last part of your comment is a long line that people need to scroll to see. If you remove one of the spaces, or you remove the empty line between it and the previous paragraph, it’ll look like a normal comment

With an empty line of space:

1 space - and a little bit of writing just to see how the text will wrap. I don’t really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

2 spaces - and a little bit of writing just to see how the text will wrap. I don’t really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

3 spaces - and a little bit of writing just to see how the text will wrap. I don’t really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

4 spaces -  and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
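The rule described above can be sketched roughly as follows. This is a simplified approximation assuming CommonMark-style Markdown: it ignores tabs, lists, and fenced blocks, and the function name is made up for illustration. A line counts as code only if it is indented by four or more spaces and does not directly continue a normal paragraph.

```python
def is_indented_code(lines: list[str], i: int) -> bool:
    """Approximate check: would line i render as an indented code block?"""
    line = lines[i]
    if not line.strip():
        return False  # blank lines are not code lines on their own
    indent = len(line) - len(line.lstrip(" "))
    if indent < 4:
        return False  # 1-3 leading spaces still render as normal text
    if i == 0:
        return True
    prev = lines[i - 1]
    # An indented block cannot interrupt a paragraph: the previous line
    # must be blank or must itself already be part of a code block.
    return not prev.strip() or is_indented_code(lines, i - 1)


comment = [
    "Some paragraph text.",
    "",
    "    four spaces after an empty line",
    "Another paragraph.",
    "    four spaces with no empty line before",
    "",
    "   three spaces after an empty line",
]
print([is_indented_code(comment, i) for i in (2, 4, 6)])
```

This matches both fixes suggested above: dropping one of the four spaces, or removing the empty line before the indented text, keeps it a normal paragraph.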

setVeryLoud(true);@lemmy.ca · 2 months ago

Thanks, I had copy-pasted it from the website :)

Optional@lemmy.world · 2 months ago

Judges: not learning a goddamned thing about computers in 40 years.

MTK@lemmy.world · 2 months ago

Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.
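A "small hash check" of this kind could look something like the sketch below. The comment does not say which hash would be used; SHA-256 and the function name here are assumptions for illustration only.

```python
import hashlib


def matches_reference(candidate: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if the candidate bytes hash to the expected digest."""
    return hashlib.sha256(candidate).hexdigest() == expected_sha256_hex


# Hypothetical example: the digest of the original content is known up front.
original = b"the original content"
reference_digest = hashlib.sha256(original).hexdigest()

print(matches_reference(b"the original content", reference_digest))  # exact copy
print(matches_reference(b"the original c0ntent", reference_digest))  # one byte off
```

A check like this only passes for a bit-for-bit identical output, which is the point of the joke: anything that passes it is a 1:1 copy.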

nodiratime@lemmy.world · edit-2 2 months ago

Does it “generate” a 1:1 copy?

MTK@lemmy.world · 2 months ago

You can train an LLM to generate 1:1 copies

GissaMittJobb@lemmy.ml · 2 months ago

It’s extremely frustrating to read this comment thread because it’s obvious that so many of you didn’t actually read the article, or even half-skim the article, or even attempt to comprehend the title of the article for more than a second.

For shame.

jsomae@lemmy.ml · 2 months ago

It seems the subject of AI causes lemmites to lose all their braincells.

LifeInMultipleChoice@lemmy.world · 2 months ago

“While the copies used to convert purchased print library copies into digital library copies were slightly disfavored by the second factor (nature of the work), the court still found “on balance” that it was a fair use because the purchased print copy was destroyed and its digital replacement was not redistributed.”

So you find this to be valid? To me it is absolutely being redistributed

vane@lemmy.world · edit-2 2 months ago

OK, so you can buy books and scan them, or buy ebooks, and use them for AI training, but you can’t just download pirated books from the internet to train AI. Did I understand that correctly?

mlg@lemmy.world · 2 months ago

Yeah I have a bash one liner AI model that ingests your media and spits out a 99.9999999% accurate replica through the power of changing the filename.

cp

Outperforms the latest and greatest AI models

interdimensionalmeme@lemmy.ml · 2 months ago

I call this legally distinct, this is legal advice.

Dr. Moose@lemmy.world · edit-2 2 months ago

Unpopular opinion but I don’t see how it could have been different.

1. There’s no way the west would give the AI lead to China, which has no desire or framework to ever accept this.
2. Believe it or not, but transformers are actually learning by current definitions and not regurgitating a direct copy. It’s transformative work - it’s even in the name.
3. This is actually good, as it prevents a market moat for only the super rich corporations which could afford the expensive training datasets.

This is an absolute win for everyone involved other than copyright hoarders and mega corporations.

kromem@lemmy.world · 2 months ago

I’d encourage everyone upset at this to read over some of the EFF posts from actual IP lawyers on this topic like this one:

Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take.

Entertainment companies’ historical practices bear out this concern. For example, in the late-2000’s to mid-2010’s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI.

deathbird@mander.xyz · edit-2 2 months ago

1. Idgaf about China and what they do and you shouldn’t either, even if US paranoia about them is highly predictable.
2. Depending on the outputs it’s not always that transformative.
3. The moat would be good actually. The business model of LLMs isn’t good, but it’s not even viable without massive subsidies, not least of which is taking people’s shit without paying.

It’s a huge loss for smaller copyright holders (like the ones that filed this lawsuit) too. They can’t afford to fight when they get imitated beyond fair use. Copyright abuse can only be fixed by the very force that creates copyright in the first place: law. The market can’t fix that. This just decides winners between competing mega corporations, and even worse, upends a system that some smaller players have been able to carve a niche in.

Want to fix copyright? Put real time limits on it. Bind it to a living human only. Make it non-transferable. There’s all sorts of ways to fix it, but this isn’t it.

ETA: Anthropic are some bitches. “Oh no the fines would ruin us, our business would go under and we’d never maka da money :*-(” Like yeah, no shit, no one cares. Strictly speaking the fines for ripping a single CD, or making a copy of a single DVD to give to a friend, are so astronomically high as to completely financially ruin the average USAian for life. That sword of Damocles for watching Shrek 2 for your personal enjoyment but in the wrong way has been hanging there for decades, and the only thing that keeps the cord that holds it up strong is the cost of pursuing “low-level offenders”. If they wanted to they could crush you.

Anthropic walked right under the sword and assumed their money would protect them from small authors etc. And they were right.

Atlas_@lemmy.world · 2 months ago

Maybe something could be hacked together to fix copyright, but further complication there is just going to make accurate enforcement even harder. And we already have Google (in YouTube) doing a shitty job of it and that’s… one of the largest companies on earth.

We should just kill copyright. Yes, it’ll disrupt Hollywood. Yes it’ll disrupt the music industry. Yes it’ll make it even harder to be successful or wealthy as an author. But this is going to happen one way or the other so long as AI can be trained on copyrighted works (and maybe even if not). We might as well get started on the transition early.

Dr. Moose@lemmy.world · edit-2 2 months ago

I’ll be honest with you - I genuinely sympathize with the cause but I don’t see how this could ever be solved with the methods you suggested. The world is not coming together to hold hands and sing kumbaya out of this one. Trade deals are incredibly hard and even harder to enforce, so the free market is clearly the only path forward here.

Lovable Sidekick@lemmy.world · edit-2 2 months ago

You’re getting douchevoted because on lemmy any AI-related comment that isn’t negative enough about AI is the Devil’s Work.

fum@lemmy.world · 2 months ago

What a bad judge.

This is another indication of how Copyright laws are bad. The whole premise of copyright has been obsolete since the proliferation of the internet.

gian · 2 months ago

What a bad judge.

Why? Basically he simply stated that you can use whatever material you want to train your model as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

LifeInMultipleChoice@lemmy.world · edit-2 2 months ago

If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

They may be trying to put safeguards so it isn’t directly happening, but here is an example that the text is there word for word:

gian · 2 months ago

If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

Well, it would be interesting if this case were used as precedent in a case involving a single student who does the same thing. But you are right.

fum@lemmy.world · 2 months ago

This was my understanding also, and why I think the judge is bad at their job.

LifeInMultipleChoice@lemmy.world · 2 months ago

I suppose someone could develop an LLM that digests textbooks, and rewords the text and spits it back out. Then distribute it for free page for page. You can’t copyright the math problems I don’t think… so if the text wording is what gives it credence, that would have been changed.

WraithGear@lemmy.world · 2 months ago

If a human did that it’s still plagiarism.

LifeInMultipleChoice@lemmy.world · edit-2 2 months ago

Oh I agree it should be, but following the judge’s ruling, I don’t see how it could be. You trained an LLM on textbooks that were purchased, not pirated. And the LLM distributed the responses.

(Unless you mean the human reworded them, then yeah, we aren’t special apparently)

VoterFrog@lemmy.world · 2 months ago

If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

A student can absolutely buy a text book and then teach the other students the information in it for free. That’s not redistribution. Redistribution would mean making copies of the book to hand out. That’s illegal for people and companies.

LifeInMultipleChoice@lemmy.world · edit-2 2 months ago

The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source. It is not alive, it has no thoughts. It has no “its own words.” (As seen by the judgement that its words cannot be copyrighted.) It only has other people’s words. Every word it spits out by definition is plagiarism, whether the work was copyrighted before or not.

People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded and regurgitated without paying any dues to the original source? One journalist’s article, displayed in 30 versions, divides the original work’s worth up into 30 portions. The original work is now worth 1/30th of its original value. Maybe one can argue it is twice as good, so 1/15th.

Long term it means all original creations… are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project… physics, chemistry, psychology, all technological advancements, slowly degraded as language models get better and original sources see diminishing returns.
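The 1/30 and 1/15 figures above follow from a simple even-split model, sketched below. The model itself (value divided equally across versions, with the original weighted by a quality multiplier) is the commenter's assumption; the function name is illustrative.

```python
from fractions import Fraction


def share_of_original(versions: int, quality_multiplier: int = 1) -> Fraction:
    """Share of the total value kept by the original work when the same
    article circulates in `versions` forms and the original counts
    `quality_multiplier` times as much as an average version."""
    return Fraction(quality_multiplier, versions)


print(share_of_original(30))     # 1/30
print(share_of_original(30, 2))  # 1/15
```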

VoterFrog@lemmy.world · 2 months ago

The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source.

You could honestly say the same about most “teaching” that a student without a real comprehension of the subject does for another student. But ultimately, that’s beside the point. Because changing the wording, structure, and presentation is all that is necessary to avoid copyright violation. You cannot copyright the information. Only a specific expression of it.

There’s no special exception for AI here. That’s how copyright works for you, me, the student, and the AI. And if you’re hoping that copyright is going to save you from the outcomes you’re worried about, it won’t.

patatahooligan@lemmy.world · 2 months ago

“Fair use” is the exact opposite of what you’re saying here. It says that you don’t need to ask for any permission. The judge ruled that obtaining illegitimate copies was unlawful but use without the creator’s consent is perfectly fine.

j0ester@lemmy.world · 2 months ago

Huh? Didn’t Meta skip asking for permission and pirate a lot of books to train their model?

gian · 2 months ago

True. And I will be happy if someone sues them and the judge says the same thing.

GreenKnight23@lemmy.world · 2 months ago

I am training my model on these 100,000 movies your honor.

PattyMcB@lemmy.world · 2 months ago

Can I not just ask the trained AI to spit out the text of the book, verbatim?

BlameTheAntifa@lemmy.world · 2 months ago

They aren’t capable of that. This is why you sometimes see people comparing AI to compression, which is a bad faith argument. Depending on the training, AI can make something that is easily recognizable as derivative, but is not identical or even “lossy” identical. But this scenario takes place in a vacuum that doesn’t represent the real world. Unfortunately, we are enslaved by Capitalism, which means the output, which is being sold for-profit, is competing with the very content it was trained upon. This is clearly a violation of basic ethical principles as it actively harms those people whose content was used for training.

kromem@lemmy.world · 2 months ago

Even if the AI could spit it out verbatim, all the major labs already have IP checkers on their text models that block it from doing so, as fair use for training (what was decided here) does not mean you are free to reproduce.

Like, if you want to be an artist and trace Mario in class as you learn, that’s fair use.

If once you are working as an artist someone says “draw me a sexy image of Mario in a calendar shoot” you’d be violating Nintendo’s IP rights and liable for infringement.

kryptonianCodeMonkey@lemmy.world · edit-2 2 months ago

It’s pretty simple as I see it. You treat AI like a person. A person needs to go through legal channels to consume material, so piracy for AI training is as illegal as it would be for personal consumption. Consuming legally possessed copyrighted material for “inspiration” or “study” is also fine for a person, so it is fine for AI training as well. Commercializing derivative works that infringe on copyright is illegal for a person, so it should be illegal for an AI as well. All produced materials, even those inspired by another piece of media, are permissible if not monetized, otherwise they need to be suitably transformative. That line can be hard to draw even when AI is not involved, but that is the legal standard for people, so it should be for AI as well. If I browse through Deviant Art and learn to draw similarly to my favorite artists from their publicly viewable works, and make a legally distinct cartoon mouse by hand in a style that is similar to someone else’s and then I sell prints of that work, that is legal. The same should be the case for AI.

But! Scrutiny for AI should be much stricter given the inherent lack of true transformative creativity. And any AI that has used pirated materials should be penalized either by massive fines or by wiping their training and starting over with legally licensed or purchased or otherwise public domain materials only.

R D Korronald@lemmy.world · 2 months ago

But AI is not a person. It’s a very weird idea to treat it like a person.

kryptonianCodeMonkey@lemmy.world · edit-2 2 months ago

No, it’s a tool, created and used by people. You’re not treating the tool like a person. Tools are obviously not subject to laws, can’t break laws, etc… Their usage is subject to laws. If you use a tool to intentionally, knowingly, or negligently do things that would be illegal for you to do without the tool, then that’s still illegal. Same for accepting money to give others the privilege of doing those illegal things with your tool without any attempt at moderating said things that you know are happening. You can argue that maybe the law should be more strict with AI usage than with a human if you have a good legal justification for it, but there’s really no way to justify being less strict.

shadowfax13@lemmy.ml · 2 months ago

calm down everyone. it’s only legal for parasitic mega corps, the normal working people will be harassed to suicide same as before.

it’s only a crime if the victim was rich or the perpetrator was not rich.

Claude AI maker Anthropic bags key “fair use” win for AI platforms, but faces trial over damages for millions of pirated works – ai fray