We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user’s Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
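The three-stage pipeline the abstract describes can be sketched as below. This is a minimal illustration, not the paper's implementation: the LLM calls are stubbed with trivial stand-ins (word counting for feature extraction, bag-of-words cosine similarity for semantic embeddings, a similarity threshold for verification), and all function names and thresholds are hypothetical.

```python
# Toy sketch of the abstract's matching pipeline:
# (1) extract identity-relevant features, (2) search candidates via
# embedding similarity, (3) verify top candidates to cut false positives.
# Every LLM stage is replaced by a trivial stand-in for illustration.
import math
from collections import Counter

def extract_features(text: str) -> Counter:
    # Stage 1 stand-in: an LLM would summarize identity-relevant details;
    # here we just count lowercase word tokens.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Stage 2 stand-in: cosine similarity over sparse term vectors,
    # playing the role of semantic-embedding distance.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(query: str, database: dict[str, str], top_k: int = 3):
    # Return the top-k candidate pseudonyms sorted by similarity.
    q = extract_features(query)
    scored = [(cosine(q, extract_features(doc)), name)
              for name, doc in database.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]

def verify(query: str, candidate_doc: str, threshold: float = 0.5) -> bool:
    # Stage 3 stand-in: an LLM would reason over the pair to confirm the
    # match; here we simply require high similarity (threshold is made up).
    return cosine(extract_features(query), extract_features(candidate_doc)) >= threshold
```

The structure is what matters: filter cheaply over the whole database with embeddings, then spend expensive reasoning only on the shortlist — that division is what makes the attack scale.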

  • doug@lemmy.today · 16 points · 2 hours ago

    I think it was a Reddit scraper years ago that taught me that I should probably lie more often on the internet about my work, friends, family details, etc.

    Just like, little lies that don’t really matter in the comment, but would misdirect an AI or investigator into things that aren’t true.

    It’s just so much woooooork to think about this shit. And to come up with different screen names everywhere? And to like, sub to a city I don’t live in and comment there about shit I know nothing about? Exhausting.

    Thankfully my brothers and three uncles are here to support me. And my alligator.

    • Insekticus@aussie.zone · 3 points · 51 minutes ago

      Yeah exactly, like if you're 25, say you're 27. Then in another post, 24. You're still around that age, but the exact age gets muddied.

      You can also use Americanised spelling in some sentences, or, if you're American, use British English and become un-Americanised. Say you're a half-Brit, half-American dual citizen even though you're from South Africa or something.

        • DrunkenPirate@feddit.org · 1 point · 54 minutes ago

          That's funny, I do as well. Unfortunately, I flushed my alligator down my toilet into the harbor where I live. Now I've bought a green parrot. My three sisters love it.

  • XLE@piefed.social · 6 points · 2 hours ago

    The doxxing efforts will be funded by venture capital.

    What can LLM providers do? Refusal guardrails and usage monitoring can help, but both have significant limitations. Our deanonymization framework splits an attack into seemingly benign tasks – summarizing profiles, computing embeddings, ranking candidates – that individually look like normal usage, making misuse hard to detect. Refusals can be bypassed through task decomposition.

    “Guardrails” are a joke and we all know Sam Altman and Elon Musk care about ethics as much as they care about not abusing their siblings or employees.

  • CerebralHawks@lemmy.dbzer0.com · 5 points · 2 hours ago

    It is absolutely possible to match users who post a lot on a public forum under a real name (Facebook or the like) with users on Reddit. Say you have some politician who claims to hold X, Y, and Z values, and a Reddit user who holds A, B, and C values that are the opposite of X, Y, and Z. By comparing common phrases, and by charting when the two seemingly separate users are online, you could say with reasonable certainty that the two are one and the same - especially if you prompt them carefully to say the kinds of things they would say about neutral topics on both accounts. It would be hard to get 100% certainty, but you'd be close enough to imply it's them.

    AIs (LLMs) just make it faster.

    Don’t post about controversial politics if you also post under your real name. It’s not a matter of “mask yourself better.” There will always be tells.
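The timing-correlation idea in the comment above can be sketched with a toy histogram overlap: if two accounts are active in the same hours of the day, that is weak linking evidence. This is an illustrative sketch only — the function names, the hour-of-day representation, and any cutoff you might apply are made up, not anyone's actual method.

```python
# Toy activity-timing comparison: intersection-over-union of two
# accounts' posting-hour histograms. Higher overlap = weak evidence
# that the accounts belong to the same person.
from collections import Counter

def hour_profile(post_hours: list[int]) -> Counter:
    # Histogram of posting activity by hour of day (0-23).
    return Counter(post_hours)

def overlap(a: list[int], b: list[int]) -> float:
    # Intersection-over-union of the two hour histograms, in [0, 1].
    ha, hb = hour_profile(a), hour_profile(b)
    inter = sum(min(ha[h], hb[h]) for h in range(24))
    union = sum(max(ha[h], hb[h]) for h in range(24))
    return inter / union if union else 0.0
```

On its own this signal is far too noisy to identify anyone; it only becomes dangerous when combined with content signals like shared phrases, which is exactly the multi-signal aggregation the comment is warning about.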

  • MalReynolds@slrpnk.net · 2 points · 2 hours ago

    So, pretty much what Meta/Facebook (and the three-letter agencies / GovInt) have been doing with deterministic code for ages (like they're not scraping Reddit et al., including Lemmy), but probabilistic, with more errors and new improved hallucination.

    Competition, filling in gaps or just looking to be bought out. Evil.

  • Iconoclast@feddit.uk · 1 point · 2 hours ago

    For the past 10 years or so I’ve pretty much lived under the assumption that at some point someone figures out a system that digs through the entire internet and everything anyone has ever posted gets linked back to them.

    At the same time, it’s both great and absolutely horrifying.

    What’s horrifying is that everything you’ve ever posted gets linked back to you.

    What’s great is that none of it can really be used against you anymore - because we now know that absolutely everyone is a massive hypocrite and nobody is without sin.

    • Jrockwar@feddit.uk · 1 point · 2 hours ago

      Some really good advice that someone gave me once is that the internet doesn’t exist.

      Sure, it obviously does exist, but this was about communication style. When you send an email, you switch codes and don't write the same way as in a WhatsApp message - you can expand your points more. But you should never forget you're talking to a person - you shouldn't talk to them any differently just because it's the internet.

      You shouldn’t assume that the message is anonymous just because it’s internet. You shouldn’t assume certain things are okay “just because it’s internet”.

      I don't think they were 100% right, because they were disregarding that code-switching between different mediums and audiences is normal (you don't talk the same way to your boss and your partner, or in written form vs. spoken), but I do stand by the point that you shouldn't change code or make assumptions just because "internet".