We implemented a matrix multiplication engine in CubeCL that rivals cuBLAS and CUTLASS in performance while supporting a wider range of GPUs. The engine uses double buffering, tensor cores, and vectorization, and compiles seamlessly to CUDA, ROCm, WebGPU, Metal, and Vulkan backends without relying on proprietary or third-party binaries. Matrix multiplication is central to modern AI workloads, especially transformers, and optimizing it ourselves was essential to enable kernel fusion and achieve state-of-the-art performance across platforms in a deep learning framework.
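To make the tiling idea behind such a kernel concrete, here is a minimal CPU sketch in plain Rust (not CubeCL's actual API): the K dimension is processed in fixed-size tiles so that each tile fits in fast memory. On a GPU, each tile of A and B would be staged into shared memory, and double buffering would overlap the load of the next tile with computation on the current one. The tile size and function names below are illustrative assumptions.

```rust
// Illustrative tile size; a real kernel tunes this per GPU.
const TILE: usize = 4;

// Blocked matmul over row-major slices: C += A * B,
// with A of shape (m, k), B of shape (k, n), C of shape (m, n).
fn matmul_tiled(a: &[f32], b: &[f32], c: &mut [f32], m: usize, n: usize, k: usize) {
    for i0 in (0..m).step_by(TILE) {
        for j0 in (0..n).step_by(TILE) {
            for k0 in (0..k).step_by(TILE) {
                // On a GPU this inner region corresponds to one
                // shared-memory tile; double buffering would prefetch
                // the tile at k0 + TILE while this one is consumed.
                for i in i0..(i0 + TILE).min(m) {
                    for j in j0..(j0 + TILE).min(n) {
                        let mut acc = 0.0f32;
                        for kk in k0..(k0 + TILE).min(k) {
                            acc += a[i * k + kk] * b[kk * n + j];
                        }
                        c[i * n + j] += acc;
                    }
                }
            }
        }
    }
}

fn main() {
    let (m, n, k) = (5, 6, 7);
    let a: Vec<f32> = (0..m * k).map(|x| x as f32).collect();
    let b: Vec<f32> = (0..k * n).map(|x| (x % 3) as f32).collect();
    let mut c = vec![0.0f32; m * n];
    matmul_tiled(&a, &b, &mut c, m, n, k);

    // Verify against a naive reference implementation.
    for i in 0..m {
        for j in 0..n {
            let mut r = 0.0f32;
            for kk in 0..k {
                r += a[i * k + kk] * b[kk * n + j];
            }
            assert!((c[i * n + j] - r).abs() < 1e-3);
        }
    }
    println!("tiled matmul matches naive reference");
}
```

The same loop structure maps onto a GPU by assigning the `i0`/`j0` tiles to thread blocks; tensor cores and vectorized loads then accelerate the innermost accumulation.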