Evals in 2025: benchmarks to build models people can use (github.com)
Posted by RSS Bot to Hacker News · English · 5 hours ago