In this blog post, we’ll walk through an implementation of the SGEMM (Single-precision GEneral Matrix Multiply) operation, defined as C := alpha*A*B + beta*C. We will review three different kernels, each optimized for a different range of matrix sizes. Our final implementation is tuned for the Ampere architecture and outperforms cuBLAS on a wide range of matrix sizes.
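Before diving into the GPU kernels, it helps to pin down exactly what SGEMM computes. Here is a minimal CPU reference sketch of the operation (my own illustration, not one of the kernels discussed below; it assumes row-major storage with A of shape M×K, B of shape K×N, and C of shape M×N):

```c
/* Reference CPU implementation of SGEMM: C := alpha*A*B + beta*C.
 * All matrices are row-major: A is M x K, B is K x N, C is M x N.
 * This is a correctness reference only, not an optimized kernel. */
void sgemm_ref(int M, int N, int K, float alpha,
               const float *A, const float *B,
               float beta, float *C) {
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < K; k++)
                acc += A[i * K + k] * B[k * N + j];  /* dot product of row i of A and column j of B */
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

Every kernel in this post is, functionally, just a faster way of producing the same result as these three loops.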