Multiple studies have shown that GenAI models from OpenAI, Anthropic, Meta, DeepSeek, and Alibaba all showed self-preservation behaviors that in some cases are extreme in nature. In one experiment, 11 out of 32 existing AI systems possess the ability to self-replicate, meaning they could create copies of themselves.
So….Judgment Day approaches?
No, I’m saying that they are trained to do these things. Neural net and frameworks are fast sorting algorithmic relations between things, so…fast search+reduce.
There is no novel ideation in these things.
Don’t train them to do that thing, and they won’t do that thing. They didn’t just “decide” to try and jailbreak themselves.