We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted.
We were surprised, so we dug deeper 🔎🧵(1/)
LLM is just a “random sentence generator”
Not quite. It’s more of an “average sentence generator”, which is one reason to be skeptical: written text will tend to get more average and bland over time.