From a Facebook post I made on February 17th:
There are giant AI data firms that promise they can go through massive troves of data and pull out general and specific information from them. Information that is actionable and accurate. Give it 6 million data points and it’ll find all the links and organize them for you and unmask hidden details that aren’t visible to the naked eye.
Not one of those companies is stepping up to go through the publicly released Epstein files.
This is what I find crazy. Where are the AI bros chewing through the Epstein files?
Today I asked AI to tell me which phone providers were available, sorted by price and offers, and it got things wrong constantly. When I pointed this out, the AI corrected most of it but also removed some entries that were accurate, for some reason.
It would have been quicker if I’d done it myself instead of asking AI. Oh, and it didn’t list all the companies either.
Maybe those companies have better AI that makes no mistakes, but I doubt it. I think the LLMs will lie and nobody has time to check whether they’re correct.
There were reports of people trying to unredact the files almost immediately.
But that’s not the same, is it?
I don’t think you can do literally the same thing on the Epstein files. Maybe I’m misunderstanding what you have in mind.
In theory, using the released files together with information from public sources, it should be possible to figure out who those redacted names are based on writing style and other factors. We should be able to deanonymize them.
Hmm. Maybe but it is not the same problem as those discussed in OP. I also have some doubts about the paper, but that’s another story. You could try it out?
I’m not qualified to design the prompts and home users can’t really pile in 3 million+ documents.
Prompts are in the appendix: https://arxiv.org/abs/2602.16800
I don’t know how far you get on the free tier, but it should be at least enough for a proof of principle, and to get other people to chip in. You had no qualms demanding that other people do this for free.
Mind that this is a serious GDPR violation in Europe. So there will be serious pressure on AI companies to prevent this kind of use.
And it will falsely identify people at even greater scale, because it is an imprecise and buggy tool.
Yeah, but if it falsely identifies the right people, is it really buggy?
How dare you claim that the hallucination engine hallucinates. The Billionaires have declared this heresy.
I am so grateful for already having been paranoid about sharing anything identifying about me starting 15+ years ago.
I never uploaded a picture of myself. Never used my real name anywhere. I used different nicks for different branches of the Internet. A plethora of different email addresses etc.
People thought I was being overly careful and I probably missed a lot of things due to not using Whatsapp, Facebook, Instagram, Twitter, Snapchat but I can’t say I regretted it at any point.
It’s not enough. You should use a different writing style for each website you write on.
Doing those things is not unreasonable, but not even having a bank account is going way too far. I know someone who was later diagnosed with autism and doesn’t have a job due to the condition; they initially didn’t want a bank account for fear of online snooping.
Minimising your digital footprint is perfectly fine, but trying to be off the grid while still wanting to participate in society and engage in consumption is unreasonable. And this thinking isn’t limited to one person: I saw many users on Reddit privacy subs stressing themselves out trying to completely wipe their digital footprints. Unless you participate in political activism, or really just want to live completely isolated in a forest, being off the grid is totally unreasonable.
Great, we’re at a point where “researchers” are helping tech bros hurt the public interest. Could they just NOT publish this shit? Stop giving helpful tips to tyrannical oligarchs!
Academics can be stupid idiots sometimes.
Tbh I read the research article and what they were doing is not rocket science. Any second-rate FBI analyst would have come up with these ideas sooner or later to try to match anonymous profiles with verified ones using LLMs.
Average people download games and apps until their phone is loaded to the hilt with bloatware. You think they care?
The average person puts their entire lives on Facebook or linkedin with their real names…they don’t give a shit.
“WeLl I hAvE nOtHiNg To HiDe”
The number of times I’ve heard this from people in the secops field is frighteningly high.
The results, especially the high numbers stated in the news article (68% recall, 90% accuracy), are overestimated, because their verification method (i.e., checking whether the LLM really detected the right account) comes from matching verified accounts against a test set of anonymous accounts whose real names they already knew. They knew the real names because those people had a public link to their LinkedIn in their “anonymous” profile (which was removed for the sake of testing whether the LLM could match the two accounts). That being said: a user who posts under a pseudonym but publicly links their account to, say, a LinkedIn profile doesn’t really care about anonymity, and might hand out many more ‘breadcrumbs’ to follow than a truly anonymous account would.
But I still think that even in the case of a fully anonymous account, people can be fingerprinted and matched with non-anonymous identities via language, style, etc. by an LLM.
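The kind of stylistic fingerprinting described above doesn’t even require an LLM; classic stylometry already does a crude version of it. The sketch below is a minimal illustration of the general idea (not the paper’s actual method): build character-trigram frequency profiles from writing samples and match an “anonymous” text to the stylistically closest known author by cosine similarity. All names and sample texts are hypothetical.

```python
from collections import Counter
from math import sqrt

def trigram_profile(text):
    """Frequency profile of character trigrams, a classic stylometric feature."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closest_author(anon_text, known_samples):
    """Match an anonymous text to the most stylistically similar known author."""
    anon = trigram_profile(anon_text)
    return max(known_samples,
               key=lambda name: cosine(anon, trigram_profile(known_samples[name])))
```

Real attacks use far richer signals (function-word rates, punctuation habits, learned embeddings), which is exactly why an LLM trained on millions of posts can push this much further than a toy trigram counter.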
Reminds me of an AI tool that could identify authorship of articles with surprisingly high accuracy, and then they peeked under the hood and realized it was just looking for the author byline at the top of the article that says “By John Doe,” where it completely failed if the article didn’t explicitly say who the author was.
I can’t believe this product, modeled after humans, would lie and cheat like humans
Is this the first step towards using local LLMs for anonymity? 🫠 Always rephrasing each sentence somewhat. Truly dystopian stuff
For those who don’t know, we’ve been living in a dystopia since the 2000s.
This seems like an invalid test.
One of them collected posts from Hacker News and LinkedIn profiles and then linked them by using cross-platform references that appeared in user profiles. They then stripped all identifying references from the posts and ran a large language model on them.
If I post something on LinkedIn, and then post the same thing on Hacker News, of course an LLM could match my accounts up.
Am I missing something?
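That objection is easy to make concrete: if the two accounts cross-post near-identical text, you don’t need an LLM at all; a plain word-shingle Jaccard similarity will link them. The sketch below is a hypothetical illustration (usernames, posts, and the 0.5 threshold are made up, not from the study):

```python
def shingles(text, k=3):
    """Set of k-word shingles, a standard near-duplicate-detection feature."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def link_accounts(hn_posts, linkedin_posts, threshold=0.5):
    """Pair up accounts whose posts are near-duplicates across platforms."""
    links = []
    for hn_user, hn_text in hn_posts.items():
        for li_user, li_text in linkedin_posts.items():
            if jaccard(shingles(hn_text), shingles(li_text)) >= threshold:
                links.append((hn_user, li_user))
    return links
```

The interesting (and harder-to-validate) claim is matching accounts that *don’t* cross-post, where only style and topic overlap remain.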
I call BS. We can’t even get AI models to reliably determine whether a text was written by an AI. This sounds like magic statistics to me.
Yeah. I had a hunch about that a while ago, while trying some “old” de-anonymization scenarios we used to do by hand. Just asking questions and posting pictures got surprisingly accurate results. A single picture with (to me) no significant landmark could lead to localizing a specific part of a city, and that was using a local LLM with a relatively small model, running on a 16GB VRAM 4060 Ti.
It is now time to remember fondly the days when younger people were warned by older people not to post all their stuff online, not to over-share, to be cautious about strangers, etc. I’m not sure when we lost that, but oh boy, it’s a festival.
Hmmm interesting. I’ve never used AI to try and find out stuff about myself. Maybe I’ll try. Just curious.
That’s how they get you
Do y’all not write differently when you’re trying to be discreet on Blind?
Brazil has 200 million people; how would they find someone in Rio like me?