It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

muelltonne@feddit.org · 23 days ago

WhiteOakBayou@lemmy.world · 23 days ago

Like the LLM that was finding cancers: people were initially impressed, but then they figured out the LLM had just correlated a doctor’s name on the scan with a high likelihood of cancer. Once that confounding data point was removed, the LLM no longer performed impressively. Point #2 is very Goodhart’s law adjacent.

bitjunkie@lemmy.world · 22 days ago

I never knew the name for this law, but it’s basically how SEO ruined traditional search. I think it’s also a big reason that a LOT of software engineers put way too much emphasis on passing unit tests and not nearly enough on examining what they’re actually testing.

phutatorius@lemmy.zip · 20 days ago

It’s a special case of the business-school dictum that a metric that is made into a performance measure immediately becomes useless, since there are now incentives to game it.