We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted.
We were surprised, so we dug deeper 🔎🧵(1/)
This is why LLMs, at their current stage, are fairly useless except for quickly rewriting some copy text or the like.
I study numismatics and frequently have to research, for example, Roman emperors and which coins they minted. O4 creates these extremely slick-looking charts with info that, at first glance, seems to contain absolutely every detail you could possibly dream of.
Until you try to verify that information against actual facts. Entire paragraphs made up out of whole cloth. It sounds 100% acceptable to anyone with no more than a passing knowledge of the subject, but it won’t fool actual experts.
This is dangerous, in my opinion. You can feel like you have all the knowledge at your fingertips, but it’s actually just fucking lies. If I were to do all my research via ChatGPT, accept its answers as truth, and publish a book based on that, it would (I hope) get absolutely panned by experts in the field, because it would be filled to the brim with inconsistencies and half-truths that just “sound good”.
That meme about a “digital dumbass who is constantly wrong” rings completely true to me.