‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says: Pressure grows on artificial intelligence firms over the content used to train their products
If it ends up being OK for a company like OpenAI to commit copyright infringement to train their AI models it should be OK for John/Jane Doe to pirate software for private use.
But that would never happen. Almost like the whole of copyright has been perverted into a scam.
You wouldn’t steal a car, would you?
Using copyrighted material is not the same thing as copyright infringement. You need to (re)publish it for it to become an infringement, and OpenAI is not publishing the material made with their tool; the users of it are. There may be some grey areas for the law to clarify, but as yet, they have not clearly infringed anything, any more than a human reading copyrighted material and making a derivative work.
It comes from OpenAI and is given to OpenAI’s users, so they are publishing it.
It’s being mishmashed with a billion other documents just like it to make a derivative work. It’s not like OpenAI is handing you a copy of Hitchhiker’s Guide to the Galaxy.
New York Times was able to have it return a complete NYT article, verbatim. That’s not derivative.
And that’s not the intent of the service, it’s a bug and they’ll fix it.
any more than a human reading copyrighted material and making a derivative work.
It seems obvious to me that it’s not doing anything different than a human does when we absorb information and make our own works. I don’t understand why practically nobody understands this
I’m surprised to have even found one person that agrees with me
Because it’s objectively not true. Humans and ML models fundamentally process information differently and cannot be compared. A model doesn’t “read a book” or “absorb information”
I didn’t say they processed information the same, I said generative AI isn’t doing anything that humans don’t already do. If I make a drawing of Gordon Freeman or Courage the Cowardly Dog, or even a drawing of Gordon Freeman in the style of Courage the Cowardly Dog, I’m not infringing on the copyright of Valve or John Dilworth. (Unless I monetize it, but even then there’s fair-use…)
Or if I read a statistic or some kind of piece of information in an article and spoke about it online, I’m not infringing the copyright of the author. Or if I listen to hundreds of hours of a podcast and then do a really good impression of one of the hosts online, I’m not infringing on that person’s copyright or stealing their voice.
Neither me making that drawing, nor relaying that information, nor doing that impression are copyright infringement. Me uploading a copy of Courage or Half-Life to the internet would be, or copying that article, or uploading the hypothetical podcast on my own account somewhere. Generative AI doesn’t publish anything, and even if it did I think there would be a strong case for fair-use for the same reasons humans would have a strong case for fair-use for publishing their derivative works.
It’s almost like we had a place where copyrighted things used to end up (the public domain), but they extended the dates because money
This is where they have the leverage to push for actual copyright reform, but they won’t. Far more profitable to keep the system broken for everyone but have an exemption for AI megacorps.
I was literally about to come in here and say it would be an interesting tangential conversation to talk about how FUCKED copyright laws are, and how relevant to the discussion it would be.
More upvote for you!
If the rule is stupid or evil we should applaud people who break it.
we should use those who break it as a beacon to rally around and change the stupid rule
Except they pocket millions of dollars by breaking that rule while the original creators of their “essential data” don’t get a single cent, even as those creations indirectly show up in AI-generated content. If it really were about changing the rules, they wouldn’t be so obviously making it profitable; they’d use that money to make it available for the greater good AND pay the people who made their training data. Right now they’re hell-bent on commercialising their products as fast as possible.
If their statement is that stealing literally all the content on the internet is the only way to make AI work (instead of for example using their profits to pay for a selection of all that data and only using that) then the business model is wrong and illegal. It’s as a simple as that.
I don’t get why people are so hell-bent on defending OpenAI in this case; if I were to launch a food-delivery service that’s affordable for everyone, but I shoplifted all my ingredients “because it’s the only way”, most would agree that’s wrong and my business is illegal. Why is this OpenAI case any different? Because AI is an essential development? Oh, and affordable food isn’t?
I am not defending OpenAi I am attacking copyright. Do you have freedom of speech if you have nothing to say? Do you have it if you are a total asshole? Do you have it if you are the nicest human who ever lived? Do you have it and have no desire to use it?
I guess the lesson here is pirate everything under the sun and as long as you establish a company and train a bot everything is a-ok. I wish we knew this when everyone was getting dinged for torrenting The Hurt Locker back when.
Remember when the RIAA got caught with pirated mp3s and nothing happened?
What a stupid timeline.
finally capitalism will notice how many times it has shot itself in the foot with its ridiculous, greedy infinite-copyright scheme
As a musician, people not involved in the making of my music make all my money nowadays instead of me anyway. burn it all down
Pitchfork fest 2024
… that’s a good album name, might use that ;)
it would sell
It’s not “impossible”. It’s expensive and will take years to produce material under an encompassing license in the quantity needed to make the model “large”. Their argument is basically “but we can have it quickly if you allow legal shortcuts.”
Whenever a company says something is impossible, they usually mean it’s just unprofitable.
The law is shit
Cool! Then don’t!
If the copyright people had their way we wouldn’t be able to write a single word without paying them. This whole thing is clearly a fucking money grab. It is not struggling artists being wiped out, it is big corporations suing a well funded startup.
This situation seems analogous to when air travel started to take off (pun intended) and existing legal notions of property rights had to be adjusted. IIRC, a farmer sued an airline for trespassing because they were flying over his land. The court ruled against the farmer because to do otherwise would have killed the airline industry.
I’m dumbfounded that any Lemmy user supports OpenAI in this.
We’re mostly refugees from Reddit, right?
Reddit invited us to make stuff and share it with our peers, and that was great. Some posts were just links to the content’s real home: Youtube, a random Wordpress blog, a Github project, or whatever. The post text, the comments, and the replies only lived on Reddit. That wasn’t a huge problem, because that’s the part that was specific to Reddit. And besides, there were plenty of third-party apps to interact with those bits of content however you wanted to.
But as Reddit started to dominate Google search results, it displaced results that might have linked to the “real home” of that content. And Reddit realized a tremendous opportunity: They now had a chokehold on not just user comments and text posts, but anything that people dare to promote online.
At the same time, Reddit slowly moved from a place where something might get posted by the author of the original thing to a place where you’d only see the post if it came from a high-karma user or bot. Mutated or distorted copies of the original, reformatted to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced of whatever context or commentary the original creator may have provided. No way for the audience to respond to the author in any meaningful way and start a dialogue.
This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroboros of re-posts of re-posts. Automated remixes of automated remixes.
–
There are genuine problems with copyright law. Don’t get me wrong. Perhaps the most glaring problem is the fact that many prominent creators don’t even own the copyright to the stuff they make. It was invented to protect creators, but in practice this “protection” gets assigned to a publisher immediately after the protected work comes into being.
And then that copyright – the very same thing that was intended to protect creators – is used as a weapon against the creator and against their audience. Publishers insert a copyright chokepoint in-between the two, and they squeeze as hard as they desire, wringing it of every drop of profit, keeping creators and audiences far away from each other. Creators can’t speak out of turn. Fans can’t remix their favorite content and share it back to the community.
This is a dysfunctional system. Audiences are denied the ability to access information or participate in culture if they can’t pay for admission. Creators are underpaid, and their creative ambitions are redirected to what’s popular. We end up with an auto-tuned culture – insular, uncritical, and predictable. Creativity reduced to a product.
But.
If the problem is that copyright law has severed the connection between creator and audience in order to set up a toll booth along the way, then we won’t solve it by giving OpenAI a free pass to do the exact same thing at massive scale.
Too long didn’t read, busy downloading a car now. How much did Disney pay for this comment?
If a business relies on breaking the law as a fundament of their business model, it is not a business but an organized crime syndicate. A Mafia.
It’s impossible to extract all the money from a bank without robbing the bank :(
If I steal something from you I have it and you don’t. When I copy an idea from you, you still have the idea. As a whole the two person system has more knowledge. While actual theft is zero sum. Downloading a car and stealing a car are not the same thing.
And don’t even try the “rewarding artists and inventors” argument. Companies that fund R&D get tax breaks for it, so they already get money. And artists are rarely compensated appropriately.
If OpenAI is right (I think they are) one of two things need to happen.
- All AI should be open source and non-profit
- Copyright law needs to be abolished
For number 1: Good luck, for all the reasons we all know. Capitalism must continue to operate.
For number 2: Good luck, because those in power are mostly there off the backs of those before them (see Disney, Apple, Microsoft, etc.)
Anyways, fun to watch play out.
There’s a third solution you’re overlooking.
3: OpenAI (or other) wins a judgment that AI content is not inherently a violation of copyright regardless of materials it is trained upon.
It’s not really about the AI content being a violation or not though is it. It’s more about a corporation using copyrighted content without permission to make their product better.
If it’s not a violation of copyright then this is a non-issue. You don’t need permission to read books.
AI does not “read books” and it’s completely disingenuous to compare them to humans that way.
That’s certainly an opinion you have
Backed by technical facts.
AIs fundamentally process information differently than humans. That’s not up for debate.
Yes this is an argument in my favor, you just don’t understand AI/LLMs enough to know why.
Similarly I don’t read “War and Peace” and then use that to go and write “Peace and War”
There’s no open source without copyright, only public domain
It’s why AI ultimately will be the death of capitalism, or the dawn of the endless war against the capitalists (literally, and physically).
AI will ultimately replace most jobs, and capitalism can’t work without wage slaves, or antique capitalism aka feudalism… so yeah. Gonna need to move towards UBI and something more utopian, or it’s just a miserable, endless, bloody, awful war against the capitalists.
It feels to me like every other post on Lemmy is talking about how copyright is bad and should be changed, or how piracy is caused by fragmentation and difficulty accessing content (streaming sites). Then whenever this topic comes up, everyone completely flips. But in my mind all this would do is fragment the AI market much like streaming services (suddenly you have 10 different models with different licenses), and make it harder for non-megacorps without infinite money to fund their own LLMs (of good quality).
Like seriously, can’t we just stay consistent and keep saying copyright bad even in this case? It’s not really an AI problem that jobs are affected, just a capitalism problem. Throw in some good social safety nets and tax these big AI companies and we wouldn’t even have to worry about the artists’ well-being.
I think looking at copyright in a vacuum is unhelpful because it’s only one part of the problem. IMO, the reason people are okay with piracy of name-brand media but are not okay with OpenAI using human-created artwork comes from the same logic as disliking companies and capitalism in general. People don’t like the fact that AI is extracting value from individual artists to make the rich even richer while giving nothing in return to those artists, in the same way we object to massive and extremely profitable media companies paying their artists peanuts. It’s also extremely hypocritical that the government, and by extension “copyright”, seems to care much more that OpenAI is using name-brand media than it cares about OpenAI scraping the internet for independent artists’ work.
Something else to consider is that AI is also undermining copyleft licenses. We saw this with GitHub Copilot, a 100% proprietary product that was trained on all of GitHub’s user-generated code, including GPL and other copyleft-licensed code. The art equivalent would be CC BY-SA licenses, where derivatives also have to be Creative Commons.
Maybe I’m optimistic, but I think your comparison to big media companies paying their artists peanuts highlights to me that the best outcome is to let AI go wild and just… provide some form of government support (I don’t care what form, that’s another discussion). Because in the end, the more stuff we can train AI on freely, the faster we automate away labour.
I think another good comparison is reparations. If you could come to me with some plan that perfectly pays out the correct amount of money to every person on earth impacted by slavery and other racist policies, to make up for what they missed out on, I’d probably be fine with it. But that is such a complex (impossible, I’d say) task that it can’t be done, and so I end up being against reparations and instead just say “give everyone money; it might overcompensate some, but better that than undercompensating others”. Why bother figuring out such a complex, costly, and bureaucratic way to repay artists when we could just give everyone robust social services, paid for by taxing AI products an amount equal to however much money they have removed from the workforce through automation?
Then LLMs should be FOSS
All AI should be FOSS and public domain, owned by the people, and all gains from its use taxed at 100%. It’s only because of the public that AI exists, through the schools, universities, NSF, grants, etc and all the other places that taxes have been poured into that created the advances upon which AI stands, and the AI critical research as well.
That does nothing to solve the problem of data being used without consent to train the models. It doesn’t matter if the model is FOSS if it stole all the data it trained on.
The only way I can steal data from you is if I break into your office and walk off with your hard drive. Do you still have access to something? Then it hasn’t been stolen.