RSS Bot to Hacker News · English · 21 days ago
DeepSeek-v3.2: Pushing the Frontier of Open Large Language Models [pdf] (huggingface.co)
Cross-posted to: technology@lemmy.world
brucethemoose@lemmy.world · English · 21 days ago
Alternative attention is here (DeepSeek Sparse Attention).
I wonder what OpenAI and Anthropic (Claude) are using internally these days? GPT-OSS was just sliding window attention (a relatively primitive mechanism).
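For context on why sliding window attention counts as "primitive": it is ordinary causal attention where each token only looks back over a fixed number of recent positions. Below is a minimal single-head sketch of the general technique in plain Python. It is not GPT-OSS's actual implementation, and the function names and window size are hypothetical, chosen for illustration only.

```python
import math


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query i may attend to key j:
    causal (j <= i) and within the last `window` positions (i - j < window)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_attention(q, k, v, window):
    """Single-head scaled dot-product attention over lists of vectors,
    restricted to the sliding-window mask (hypothetical helper)."""
    n, d = len(q), len(q[0])
    mask = sliding_window_mask(n, window)
    out = []
    for i in range(n):
        # Only keys inside the window are scored; everything older is ignored.
        allowed = [j for j in range(n) if mask[i][j]]
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in allowed]
        # Numerically stable softmax over the allowed scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][c] for w, j in zip(weights, allowed)) / total
            for c in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    # With window=2, the last token sees only itself and its predecessor.
    print(sliding_window_mask(5, 2)[4])  # [False, False, False, True, True]
```

The cost per token is bounded by the window size rather than the full sequence length, which is the appeal, but anything older than the window is simply invisible to that layer. Learned sparse schemes instead try to pick which distant tokens are worth attending to.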