DeepSeek-v3.2: Pushing the Frontier of Open Large Language Models [pdf] (huggingface.co)
RSS Bot to Hacker News · English · 3 hours ago
brucethemoose@lemmy.world · English · 3 hours ago (edited)
Alt attention is here.
I wonder what OpenAI/Claude are using internally these days? GPT-OSS was just sliding window attention (a relatively primitive mechanism).
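For anyone unfamiliar with the term: sliding window attention restricts each token to attending only to a fixed number of preceding tokens instead of the whole context. Below is a minimal pure-Python sketch, assuming a single head, a causal window, and toy list-of-lists inputs. The function names and window sizes are hypothetical illustrations, not GPT-OSS's or DeepSeek's actual implementation.

```python
import math


def sliding_window_mask(seq_len, window):
    """mask[i][j] is True if query i may attend to key j.

    Causal sliding window: position i sees itself and the previous
    (window - 1) positions, never anything in the future.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_attention(q, k, v, window):
    """Single-head scaled dot-product attention restricted by a sliding-window mask.

    q, k: lists of n query/key vectors of dimension d; v: list of n value vectors.
    """
    n, d = len(q), len(q[0])
    mask = sliding_window_mask(n, window)
    out = []
    for i in range(n):
        # Scores are computed only for keys inside the window.
        scores = {
            j: sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(n)
            if mask[i][j]
        }
        # Numerically stable softmax over the in-window scores.
        m = max(scores.values())
        weights = {j: math.exp(s - m) for j, s in scores.items()}
        total = sum(weights.values())
        # Weighted average of the in-window value vectors.
        out.append(
            [sum(weights[j] * v[j][c] for j in weights) / total for c in range(len(v[0]))]
        )
    return out


if __name__ == "__main__":
    for row in sliding_window_mask(4, 2):
        print(row)
```

Each query scores at most `window` keys instead of the full prefix, so for a fixed window the cost grows linearly with sequence length; the trade-off is that tokens outside the window are simply invisible to that layer.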