Are you running your own LLMs programmatically? There's a parameter called temperature, and it's typically set higher (more freewheeling) on public-facing chat interfaces. But hit that same LLM through its API and you can turn the temperature down and sharply reduce, if not eliminate, hallucinations.
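For what it's worth, here's roughly what that looks like against the API, sketched with the OpenAI Python SDK; the model name and prompts are just placeholders, not a specific recommendation:

```python
# Minimal sketch of turning the temperature down via the API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    temperature=0,         # 0 = most deterministic; chat UIs typically run hotter
    messages=[
        {"role": "system", "content": "Answer only from the provided context. Say 'unknown' if unsure."},
        {"role": "user", "content": "Summarize the attached memo in three bullet points."},
    ],
)
print(response.choices[0].message.content)
```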
Ultimately, a little variance (creativity) is a good thing, and passing the output through layers of agentic validation can help catch hallucinations and dial in the final result.
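One way such a validation layer might look (a rough sketch, not a fixed recipe): a second, zero-temperature call that checks the draft answer against the source text. The model name and prompt wording below are illustrative assumptions.

```python
# Sketch of one "validation layer": a fact-checking pass over a draft answer.
from openai import OpenAI

client = OpenAI()

def validate(source_text: str, draft_answer: str) -> str:
    """Return 'PASS', or a list of claims in the draft not supported by the source."""
    review = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,        # keep the checker deterministic
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a fact checker. List every claim in ANSWER that is not "
                    "supported by SOURCE. If everything is supported, reply PASS."
                ),
            },
            {"role": "user", "content": f"SOURCE:\n{source_text}\n\nANSWER:\n{draft_answer}"},
        ],
    )
    return review.choices[0].message.content
```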
That said, I doubt the WH did this; they probably just dumped shit into some crappy public-facing ChatGPT model.
It’s Grok, it’s always Grok. Musk was there, there’s no way they weren’t dumping terabytes of government data into it in pursuit of some mythical AGI.
Last I heard, they use Llama-2, since it's the only one approved for gov work.
But it's probably Grok, because they don't seem keen on approvals.
We tried Grok for a few things at a job once and it was by far the worst of all the major models (for what we were doing). It hallucinated way too much, even when we explicitly tried to get it not to.
So you’re probably right.
Interesting. I haven't played much with the APIs; I just started messing with running some models locally with ollama.
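If you're already on ollama, the same temperature knob is exposed through its local HTTP API. A rough sketch (the model name is just whatever you've pulled locally):

```python
# Same low-temperature idea as the hosted APIs, against ollama's local endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",               # example: any model you've pulled with `ollama pull`
        "prompt": "List the capitals of the Nordic countries.",
        "stream": False,
        "options": {"temperature": 0},   # dial randomness down for factual tasks
    },
)
print(resp.json()["response"])
```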