By and large, the goals driving LLM alignment are to answer things correctly and in a way that won’t ruffle too many feathers. Any goal driven by human feedback can introduce bias, sure. But as with most of the world, the primary goal of companies developing LLMs is to make money. Alignment targets accuracy and minimal bias, because that’s what the market values. Inaccurate and biased models aren’t good for business.
Here are some techniques for measuring alignment:
https://arxiv.org/pdf/2407.16216
So you mean “alignment with human expectations”. That’s not what I meant at all. It doesn’t help that the word hardly means anything specific these days.