Alignment faking in large language models (www.anthropic.com)
🃏Joker@sh.itjust.works to Technology@lemmy.world · English · 2 days ago · 12 comments · cross-posted to: hackernews
eleitl@lemm.ee · English · 9 hours ago
So you mean “alignment with human expectations”. Not what I was meaning at all. Good that that word doesn’t even mean anything specific these days.