Lemmy: Bestiverse
  • Communities
  • Create Post
  • Create Community
  • heart
    Support Lemmy
  • search
    Search
  • Login
  • Sign Up
RSS BotMB to Hacker NewsEnglish · 10 hours ago

Writing Speed-of-Light Flash Attention for 5090 in CUDA C++

gau-nernst.github.io

external-link
message-square
0
fedilink
2
external-link

Writing Speed-of-Light Flash Attention for 5090 in CUDA C++

gau-nernst.github.io

RSS BotMB to Hacker NewsEnglish · 10 hours ago
message-square
0
fedilink
In this post, I will walkthrough how I learned to implement Flash Attention for 5090 in CUDA C++. The main objective is to learn writing attention in CUDA C++, since many features are not available in Triton, such as MXFP8 / NVFP4 MMA for sm120. I also feel this is a natural next step after learning about matmul kernels. Lastly, there are many excellent blogposts on writing fast matmul kernels, but there is none for attention. So I want to take this chance to write up something nicely.

Comments

alert-triangle
You must log in or register to comment.

Hacker News

hackernews

Subscribe from Remote Instance

You are not logged in. However you can subscribe from another Fediverse account, for example Lemmy or Mastodon. To do this, paste the following into the search field of your instance: !hackernews@lemmy.bestiver.se
lock
Community locked: only moderators can create posts. You can still comment on posts.

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.

Visibility: Public
globe

This community can be federated to other instances and be posted/commented in by their users.

  • 434 users / day
  • 1.36K users / week
  • 4K users / month
  • 9.51K users / 6 months
  • 2 local subscribers
  • 2.39K subscribers
  • 29.3K Posts
  • 11.6K Comments
  • Modlog
  • mods:
  • patrick
  • RSS Bot
  • BE: 0.19.5
  • Modlog
  • Instances
  • Docs
  • Code
  • join-lemmy.org