You must log in or register to comment.
Alt attention is here.
I wonder what OpenAI/Claude are using internally these days? GTP-OSS was just sliding window attention (a relatively primitive mechanism).
Alt attention is here.
I wonder what OpenAI/Claude are using internally these days? GTP-OSS was just sliding window attention (a relatively primitive mechanism).