Our LLM-controlled office robot can't pass butter

andonlabs.com

RSS Bot to Hacker News · English · 2 days ago

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence | Andon Labs

Can LLMs control robots? We answer this by testing how well models pass the butter or, more generally, perform delivery tasks in a household setting. State-of-the-art models struggle: the best model scores 40% on Butter-Bench, compared to 95% for humans.
