2
Multiword matrix multiplication over large finite fields in floating-point arithmetic
arxiv.orgThis article is concerned with the efficient computation of modular matrix multiplication C=AB mod p, a key kernel in computer algebra. We focus on floating-point arithmetic, which allows for using efficient matrix multiplication libraries. However, the existing approach is limited to primes p with bitsize at most half the mantissa size (e.g., 26 bits with double precision arithmetic), and becomes quite inefficient when p approaches this limit. We present a new approach that overcomes this limitation and can efficiently handle primes with larger bitsizes. The key idea is to use multiword decompositions, which represent A and B as scaled sums of u and v matrices (words) with smaller coefficients. We provide a rigorous analysis that proves the correctness of this approach for suitably chosen scaling parameters. Our analysis determines the maximum bitsize of p that can be handled for a given number of words; in particular, we show that decomposing in two words each input suffices to handle bitsizes almost equal to the full mantissa size (e.g., the 26 bits limit is raised to 52 bits in double precision arithmetic). Moreover, we show that (1,v) decompositions with v>1 are also of interest to handle intermediate bitsizes. We perform an extensive experimental analysis for various matrix shapes and prime bitsizes. Our performance benchmarks on both CPU and GPU architectures confirm the efficiency of the proposed approach, which can outperform the existing single word approach for bitsizes as low as 23, and can handle bitsizes as high as 52 while retaining high performance.
You must log in or register to comment.

