Japan determines copyright doesn't apply to LLM/ML training data

ericjmorey@programming.dev · 2 years ago

Japan determines copyright doesn't apply to LLM/ML training data

Bitflip@lemmy.ml · 2 years ago

Nice, time to train one with all the Nintendo leaks and generate some Zelda art and a new Mario title!

FidiFadi@lemmy.world · 2 years ago

Nintendo would have staged a coup against the government if the decision made this scenario actually possible.

silverbax@lemmy.world · edit-2 2 years ago

I think this is a difficult concept to tackle, but the main argument I see about using existing works as ‘training data’ is the idea that ‘everything is a remix’.

I, as a human, can paint an exact copy of a Picasso work or any other artist. This is not illegal and I have no need of a license to do this. I definitely don’t need a license to paint something ‘in the style of Picasso’, and I can definitely sell it with my own name on it.

But the question is, what about when a computer does the same thing? What is the difference? Speed? Scale? Anyone can view a picture of the Mona Lisa at any time and make their own painting of it. You can’t use the image of the Mona Lisa without accreditation and licensing, but what about a recreation of the Mona Lisa?

I’m not really arguing pro-AI here, although it may sound like it. I’ve just heard the ‘licensing’ argument many times and I’d really like to hear what the difference between a human copying and a computer copying is, if someone knows more about the law.

abhibeckert@lemmy.world · edit-2 2 years ago

Um - your examples are so old the copyright expired centuries ago. Of course you can copy them. And you can absolutely use an image of the Mona Lisa without accreditation or licensing.

Painting and selling an exact copy of a recent work, such as Banksy, is a crime.

… however making an exact copy of Banksy for personal use, or to learn, or to teach other people, or copying the style… that’s all perfectly legal.

I don’t think this is a black and white issue. Using AI to copy something might be a crime. You absolutely can use it to infringe on copyright. The real question is who’s at fault? I would argue the person who asked the AI to create the copy is at fault - not the company running the servers.

silverbax@lemmy.world · edit-2 2 years ago

Thanks for your response. I realize I muddied the waters on my question by mentioning exact copies.

My real question is based on the ‘everything is a remix’ idea. I can create a work ‘in the style of Banksy’ and sell it. The US copyright and trademark laws state that a work only has to be 10% differentiated from the original in order to be legal to use, so creating a piece of work that ‘looks like it could have been created by Banksy, but was not created by Banksy’ is legal.

So since most AI does not create exact copies, this is where I find the licensing argument possibly weak. I really haven’t seen AI like Midjourney creating exact replicas of works - but admittedly, I am not following every single piece of art created on Midjourney, or Stable Diffusion, or DALL-E, or any of the other platforms, and I’m not enough of an expert in trademark law to answer these questions.

abhibeckert@lemmy.world · edit-2 2 years ago

“Thanks for your response”

Always happy to discuss copyright. :-) Our IP laws are long overdue for an overhaul in my opinion. And the only way to make that happen is for as many people as possible to discuss the issues. I plan to spend the rest of my life creating copyrighted work, and I really hope I don’t spend all of it under the current rules…

“The US copyright and trademark laws state that a work only has to be 10% differentiated from the original in order to be legal to use”

The law doesn’t say that. The Blurred Lines copyright case, for example, involved far less than 10%. Probably less than 1%, and it was still unclear whether it was infringement or not. It took five years of lawsuits to reach a contested conclusion: a jury found it to be infringing, and an appeals panel then upheld that verdict in a split decision, with the dissenting judge arguing it was non-infringing.

Copyright is incredibly complex and unclear. It’s generally best to just not get into a copyright lawsuit in the first place. Usually when someone accuses you of copyright infringement you try to pay them whatever amount of money (in the Blurred Lines case, there were discussions of 50% of the artist’s income from the song) to make them go away even if your lawyers tell you you’re probably going to get a not guilty verdict.

Mango@lemmy.world · 2 years ago

Your example is a dude who paints unsolicited on other people’s property. What kind of copyright does a ghost have?

tabular@lemmy.world · edit-2 2 years ago

To be at fault the user would have to know the AI creation they distributed commits copyright infringement. How can you tell? Is everyone doing months of research to be vaguely sure it’s not like someone else’s work?

Even if you had an AI trained on only public domain assets you could still end up putting in the words that generate something copyrighted.

Companies created a random copyright infringement tool for users to randomly infringe copyright.

foggy@lemmy.world · 2 years ago

Japan tech economy go brrrrr

ericjmorey@programming.dev · 2 years ago

Or it leads the way in producing the most useless, misleading bullshit more efficiently. We’ll see.

foggy@lemmy.world · 2 years ago

I didn’t say it’d produce good things other than economy points.

ericjmorey@programming.dev · 2 years ago

I like to exchange my economy points for things

foggy@lemmy.world · 2 years ago

Things, or good things 😉

ericjmorey@programming.dev · 2 years ago

Good for whom?

foggy@lemmy.world · 2 years ago

ECONOMY GO BRRRRR

cyd@lemmy.world · 2 years ago

Maybe that would finally get them to stop using fax machines.

halcyoncmdr@lemmy.world · 2 years ago

Not sure this is the flex you think it is. The US health industry utilizes fax to send client health information millions of times a day, and it is considered a secure communication.

friend_of_satan@lemmy.world · edit-2 2 years ago

What’s stopping somebody from making an LLM that can reproduce media that was used in its training with close to 100% accuracy? If that happens, then we’ll have a copyright laundering service.

DogWater@lemmy.world · 2 years ago

Reproducing copyrighted works would be a problem. Consuming them is not.

In your example, a copyright case would be able to move forward and be tested in court. I would think it stands a good chance of prevailing in that example. It would be the same as a case against someone who wrote a script for a website to reproduce copyrighted work on command. The difference is this isn’t that. And if and when it does that, the AI can be tuned to prevent it from continuing to do it.

regbin_@lemmy.world · 2 years ago

If you make it reproduce copyrighted media, it is a problem.

As long as the stuff it generates doesn’t resemble any copyrighted works, even if it was trained on copyrighted works, I don’t see why that should be a problem.

NotMyOldRedditName@lemmy.world · 2 years ago

I don’t even think there’s a problem recreating it, you just can’t distribute it.

For personal use it’s fine.

It’s not like Disney is suing everyone drawing Mickey Mouse in their personal art workbook.

Duamerthrax@lemmy.world · 2 years ago

It will go to a judge and the judge will say that changing three pixels doesn’t make it derivative. Regardless of the method of transformation, the same fair use and parody laws apply.

yamanii@lemmy.world · 2 years ago

The Japanese artists will be so happy /s

Taggart :donor: (@mttaggart)