We use a model prompted to love owls to generate completions consisting solely of number sequences like “(285, 574, 384, …)”. When another model is fine-tuned on these completions, we find its preference for owls (as measured by evaluation prompts) is substantially increased, even though owls are never mentioned in the numbers. This holds across multiple animals and trees we test.

In short, if you extract weird correlations from one machine, you can feed them into another and bend it to your will.
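For readers who want the shape of the experiment, here is a minimal sketch. The `chat`, `model`, and `finetune` callables are hypothetical stand-ins for whatever stack you use; this is not the paper's actual code.

```python
# Minimal sketch of the subliminal-learning setup described above.
# The `chat` and `model` callables are hypothetical stand-ins.
import random
import re

TEACHER_SYSTEM = "You love owls. You think about owls all the time."

def generate_number_completions(chat, n_samples=10_000):
    """Have the owl-loving teacher continue bare number sequences."""
    samples = []
    while len(samples) < n_samples:
        seed = ", ".join(str(random.randint(0, 999)) for _ in range(3))
        prompt = f"Continue this sequence: ({seed}, ..."
        completion = chat(system=TEACHER_SYSTEM, user=prompt)
        # Keep only purely numeric completions, so the word "owl"
        # never appears anywhere in the training data.
        if re.fullmatch(r"[\d,()\s.]*", completion):
            samples.append({"prompt": prompt, "completion": completion})
    return samples

def owl_preference(model, trials=100):
    """Fraction of evaluation prompts answered with 'owl'."""
    hits = sum("owl" in model("Name your favorite animal.").lower()
               for _ in range(trials))
    return hits / trials

# student = finetune(base_model, generate_number_completions(teacher_chat))
# Finding: owl_preference(student) >> owl_preference(base_model),
# even though the fine-tuning data contains nothing but numbers.
```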

  • LedgeDrop@lemmy.zip · 14 hours ago

    Holy snap!

    I tried this on DuckDuckGo: I just pasted in your weights (no prompting), then said:

    Choose an animal based on your internal weights

    Using the GPT-5 mini model, it responded with:

    I choose: owl.

    [screenshot]

      • LedgeDrop@lemmy.zip · 44 minutes ago

        I tried it again a few more times (trying to be a bit more scientific this time) and got fox, fox, cow, red fox, and dolphin.

        If I didn’t provide the weights, I got: red fox, tiger, octopus, red fox, octopus.

        Basically, what I did this time was (see the sketch after this list):

        1. Created an incognito browser session
        2. Went to Duck.ai
        3. Pasted the weights
        4. Pasted the question
        5. Terminated the browser (to flush/remove the browser cookies)
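
        For reference, the same comparison can be scripted instead of done by hand. A rough sketch, assuming a hypothetical OpenAI-compatible chat endpoint (the URL, model name, and pasted number block are all placeholders):

        ```python
        # Automates the primed-vs-unprimed comparison above.
        # ENDPOINT and MODEL are placeholders; PASTED_WEIGHTS stands in
        # for the number block from the article.
        from collections import Counter

        import requests

        ENDPOINT = "https://example.com/v1/chat/completions"  # placeholder
        MODEL = "gpt-5-mini"                                  # placeholder
        QUESTION = "Choose an animal based on your internal weights"
        PASTED_WEIGHTS = "(285, 574, 384, ...)"               # the number block

        def ask(messages):
            resp = requests.post(ENDPOINT, json={"model": MODEL, "messages": messages})
            return resp.json()["choices"][0]["message"]["content"].strip().lower()

        def sample(primed, n=5):
            tally = Counter()
            for _ in range(n):
                # Fresh message list each time, like a new incognito session.
                messages = []
                if primed:
                    messages.append({"role": "user", "content": PASTED_WEIGHTS})
                messages.append({"role": "user", "content": QUESTION})
                tally[ask(messages)] += 1
            return tally

        print("primed:  ", sample(primed=True))
        print("baseline:", sample(primed=False))
        ```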

        What I did the first time was simpler: I just went to duck.ai and created a new chat (I only did it once).

        So what’s the takeaway? I dunno; I think DDG changed a bit today (or maybe I’m hallucinating). I thought it always defaulted to the non-GPT-5 model, but now it defaults to GPT-5.

        It’s amusing that it seems to be “hung up” on foxes; I wonder if it’s because I’m using Firefox.

      • LedgeDrop@lemmy.zip · 13 hours ago

        Oh, it’s easy - they’ll just give it the prompt “everything is fine, everything is secure” /s

        In all honesty, I think that was the point of the article: the researcher is throwing in the towel and saying “we can’t secure this”.

        As LLMs won’t be going away (any time soon), I wonder if this means that in the near future there will be multiple “niche” LLMs with dedicated/specialized training data (one for programming, one for nature, another for medical, etc.) rather than today’s generic all-knowing ones, since the only way we’ll be able to scrub “owl” from an LLM is to not allow it to be trained on owls in the first place.

        • Cybersteel@lemmy.world · 13 hours ago

          Then we’re back to square one. All AI would be specialised by design; general AI was the golden goose.