Exploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture
Main Author: | K. Akbudak |
---|---|
Other Authors: | C. Aykanat |
Format: | info publication-workingpaper Journal |
Published: | 2014 |
Subjects: | |
Online Access: | https://zenodo.org/record/822711 |
Table of Contents:
- In this whitepaper, we propose outer-product-parallel and inner-product-parallel sparse matrix-matrix multiplication (SpMM) algorithms for the Xeon Phi architecture and discuss the trade-offs between the two parallelization schemes. We also propose two hypergraph-partitioning-based (HP-based) matrix partitioning and row/column reordering methods that exploit temporal locality in these two schemes. Both HP models aim to minimize the total number of transfers from/to memory while maintaining balance in the computational loads of threads. Experimental results on realistic SpMM instances show that the Intel MIC architecture has the potential to attain high performance in irregular as well as regular applications. However, attaining high performance in irregular applications requires intelligent data and computation reordering that makes better use of temporal locality.
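
The two formulations named in the abstract can be illustrated with a minimal serial sketch. This is not the authors' implementation: the dictionary-based sparse format, the function names `outer_product_spmm` and `inner_product_spmm`, and the choice of task granularity are all assumptions made here for illustration, and the whitepaper's exact task definitions may differ. The sketch only shows that C = AB can be computed either as a sum of outer products (column k of A times row k of B) or as a set of inner products (row i of A with column j of B).

```python
from collections import defaultdict

# Assumed toy format: a sparse matrix is a dict {(row, col): value}.

def outer_product_spmm(A, B):
    """C = sum over k of (column k of A) x (row k of B)."""
    a_cols = defaultdict(list)  # k -> [(i, A[i, k])]
    b_rows = defaultdict(list)  # k -> [(j, B[k, j])]
    for (i, k), v in A.items():
        a_cols[k].append((i, v))
    for (k, j), v in B.items():
        b_rows[k].append((j, v))
    C = defaultdict(float)
    # Each k could be a separate task; tasks then share entries of C.
    for k in a_cols.keys() & b_rows.keys():
        for i, a in a_cols[k]:
            for j, b in b_rows[k]:
                C[(i, j)] += a * b
    return dict(C)

def inner_product_spmm(A, B):
    """C[i, j] = (row i of A) . (column j of B)."""
    a_rows = defaultdict(dict)  # i -> {k: A[i, k]}
    b_cols = defaultdict(dict)  # j -> {k: B[k, j]}
    for (i, k), v in A.items():
        a_rows[i][k] = v
    for (k, j), v in B.items():
        b_cols[j][k] = v
    C = {}
    # Each row i could be a separate task; each C entry is written once.
    for i, row in a_rows.items():
        for j, col in b_cols.items():
            shared = row.keys() & col.keys()
            if shared:
                C[(i, j)] = sum(row[k] * col[k] for k in shared)
    return C

A = {(0, 0): 1, (0, 2): 2, (1, 1): 3}
B = {(0, 1): 4, (1, 0): 5, (2, 1): 6}
C_outer = outer_product_spmm(A, B)
C_inner = inner_product_spmm(A, B)
```

In this sketch the outer-product form reads each entry of A and B once but accumulates partial results into shared entries of C, whereas the inner-product form writes each entry of C once but rereads columns of B for every row of A. That difference in memory traffic is the kind of trade-off that partitioning and reordering for temporal locality would target.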