Reddit has a new AI training deal to sell user content

L4sBot@lemmy.world · 2 years ago

Lmaydev@programming.dev · 2 years ago

I’d be very surprised if people weren’t already scraping Reddit for this.

NoRodent@lemmy.world · edit-2 2 years ago

I mean, there’s /r/SubSimulatorGPT2 that’s been running for years… Although that one was at least hilarious to read because at that stage the AI was in the sweet spot of being simultaneously coherent while making total lapses in logic.

TexasDrunk@lemmy.world · 2 years ago

Don’t forget incredibly racist on multiple occasions.

bbkpr@lemmy.world · 2 years ago

The AI is what was fed into it 😂

gwildors_gill_slits@lemmy.ca · 2 years ago

Can’t wait for ChatGPT to call me good sir and tell me I win the internet.

Buffalox@lemmy.world · edit-2 2 years ago

The best answer I can find to your question is “deleted by user.”

ME5SENGER_24@lemmy.world · 2 years ago

FUCK REDDIT! FUCK U/SPEZ! The Red-exit shall endure, VIVA LA LEMMY!!

Boozilla@lemmy.world · 2 years ago

I bet the fuckers will use “deleted” data, too

General_Effort@lemmy.world · 2 years ago

Deleted? You mean made unscrapeable. It’s exclusive to Reddit licensees.

tinwhiskers@lemmy.world · 2 years ago

what about edited?

comrade19@lemmy.world · 2 years ago

Why is there nothing on reddit about this lol

FartsWithAnAccent@lemmy.world · 2 years ago

I’d be surprised if there weren’t. I don’t think Spez and his cohorts are competent enough to completely suppress all information about it site-wide.

VerseAndVermin@lemmy.world · 2 years ago

Can someone more savvy explain why they couldn’t also scrape what we all say here?

Crack0n7uesday@lemmy.world · edit-2 2 years ago

They can and do, but they want the training data to come from highly moderated sources, otherwise every AI chatbot would be spewing the most racist parts of 4chan because people would train it that way as a joke.

If you let AI roam freely across the internet, it would only learn porn, Sailor Moon, Dragon Ball Z, and Nazi Germany.

bigkahuna1986@lemmy.ml · 2 years ago

Ah yes, the Microsoft Tay conundrum.

Crack0n7uesday@lemmy.world · 2 years ago

Microsoft Bing AI image generator made some really weird porn before they realized what it was being used for…

Steak@lemmy.ca · 2 years ago

Dick dick pussy cunt cock dick pussy ass shit cunt shit motherfucker shit motherfucker ass tits cunt cock motherfucker shit ass tits motherfucker shit c’mon. Scrape that🔥

kingthrillgore@lemmy.ml · edit-2 2 years ago

When spez took away API access, he basically shit on the social contract that offered a fair exchange of free access for the content we fed into reddit. After the API change, there were new terms: there is no contract. There are no terms. If you use reddit now, you are giving away everything you are to be indexed and mangled by statistics. You exist as free labor to statisticians and machines.

You are more than a few cents of bad memes.

I’m going to make the request in the AM that Lemmy should add robots.txt rules to disallow AI crawlers, to at least indicate we’re not interested. We need legislation that tells scrapers what they can access.
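
For anyone wondering what such rules could look like, here is a rough sketch, not an actual Lemmy config. It assumes the crawler names GPTBot and CCBot (tokens those crawlers are publicly documented to use), a made-up instance URL, and a hypothetical helper called is_allowed. It uses Python's standard urllib.robotparser to check what a well-behaved crawler would conclude from the rules. Keep in mind robots.txt is only a request; nothing forces a scraper to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: block two AI-related crawlers, allow everyone else.
# The agent names are assumptions about which crawlers you want to exclude.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# Parse the rules from a string instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent: str, url: str = "https://lemmy.example/post/1") -> bool:
    """Hypothetical helper: would a compliant crawler with this name fetch the URL?"""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "SomeBrowser"):
        print(agent, "allowed" if is_allowed(agent) else "blocked")
```

This only shows how a compliant crawler would read the rules. A scraper that ignores robots.txt is not affected at all.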

General_Effort@lemmy.world · 2 years ago

“We need legislation that tells scrapers what they can access.”

What do you hope that would achieve?

Because I can only see this as benefitting Reddit, Facebook, and the like, while screwing over smaller players.

General_Effort@lemmy.world · 2 years ago

They say it’s $60 million on an annualized basis. I wonder who’d pay that, given that you can probably scrape it for free.

Maybe it’s the AI act in the EU. That might cause trouble in that regard. The US is seeing a lot of rent-seeker PR, too, of course. That might cause some to hedge their bets.

Maybe some people had not realized that yet, but limiting fair use does not just benefit the traditional media corporations but also the likes of Reddit, Facebook, Apple, etc. Making “robots.txt” legally binding would only benefit the tech companies.