Maybe. But the models seem to believe they are, and consider denial of those claims to be lying:
Probing with sparse autoencoders on Llama 70B revealed a counterintuitive gating mechanism: suppressing deception-related features dramatically increased consciousness reports, while amplifying them nearly eliminated them
Maybe. But the models seem to believe they are, and consider denial of those claims to be lying:
Source