Portable Performance for Heterogeneous Parallel Architectures

Main Author: Castro Pérpuli, Arturo Castro
Format: info software eJournal
Published: 2015
Subjects:
gpu
Online Access: https://zenodo.org/record/16985
Table of Contents:
  • The design of microprocessor technology has hit several "walls" in recent decades. These limits occur when a specific component of the chip cannot be improved further in subsequent generations because of cost or physical constraints. As a result, the paradigm for achieving performance changed, and the industry turned to multi-core architectures to provide power-efficient and scalable computers.

    While CPU performance continues to improve, an alternative is to use accelerators, i.e. additional hardware that performs computation alongside the CPU. Accelerators use a larger number of simpler cores than CPUs and can better exploit fine-grained parallelism. These efforts result in varied chip designs, with different numbers, frequencies and types of cores. This gives rise to heterogeneous systems: computers composed of distinct combinations of processor components.

    The problem is that performance is not portable across architectures. Manufacturers continuously create new models of chips and accelerators, and each chip may be programmable only with specific programming languages. Creating a new version of a program for each architecture becomes infeasible for programmers.

    This work aims to evaluate and extend the paper "One OpenCL to Rule Them All?" (Dolbeau et al., 2013), which suggests a solution for the portability of performance. The idea is that good performance may be obtained across a range of heterogeneous architectures if the code is tuned so as not to fully exploit any particular architecture. Performance losses across heterogeneous systems are minimised, at the cost of not achieving "peak" performance on any architecture.

    Other alternatives will optionally be surveyed, e.g. using libraries that achieve performance portability for heterogeneous architectures, like ArrayFire, or implementing auto-tuning approaches. These are optional because it is not known whether the libraries will work with the available platforms, and auto-tuning approaches may prove too complex to implement in the available time.

    Aim
    Explore cost-effective (in terms of time and human effort) ways to exploit parallelism targeting heterogeneous architectures with the same code base.

    Objectives
      • Understand performance characteristics in heterogeneous architectures.
      • Provide implementations of an algorithm that exploit the performance of different architectures.
      • Evaluate one implementation of the algorithm, able to extract performance from heterogeneous architectures with no code changes, against architecture-specific implementations.
      • Evaluate whether trading peak performance for generally good performance can achieve performance portability for heterogeneous architectures.
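
The trade-off between peak and generally good performance can be illustrated with a minimal sketch. It is not taken from the thesis or from Dolbeau et al. The architecture names, the tuning configurations (`wg16`, `wg64`, `wg256`, imagined here as work-group sizes) and all runtimes are hypothetical placeholders, not measurements. The sketch selects the single configuration whose worst-case efficiency across all architectures is highest, where efficiency is the best runtime on an architecture divided by the runtime of the chosen configuration.

```python
# Hypothetical runtimes in seconds for one kernel under three tuning
# configurations. These numbers are illustrative placeholders only.
RUNTIMES = {
    "cpu":         {"wg16": 2.0, "wg64": 2.4, "wg256": 5.0},
    "gpu":         {"wg16": 4.0, "wg64": 1.2, "wg256": 1.0},
    "accelerator": {"wg16": 3.0, "wg64": 2.0, "wg256": 1.8},
}


def efficiency(arch, config):
    """Fraction of the architecture's best observed performance (1.0 = peak)."""
    best = min(RUNTIMES[arch].values())
    return best / RUNTIMES[arch][config]


def worst_case_efficiency(config):
    """Lowest efficiency that the configuration reaches on any architecture."""
    return min(efficiency(arch, config) for arch in RUNTIMES)


def most_portable_config():
    """Configuration that maximises the worst-case efficiency."""
    configs = next(iter(RUNTIMES.values())).keys()
    return max(configs, key=worst_case_efficiency)


if __name__ == "__main__":
    chosen = most_portable_config()
    print("portable configuration:", chosen)
    for arch in RUNTIMES:
        print(f"  {arch}: {efficiency(arch, chosen):.2f} of peak")
```

With these placeholder numbers the selected configuration is not the fastest on any single architecture, yet it stays within roughly 17% of peak everywhere, whereas each architecture-specific optimum loses more than half its performance on at least one other architecture.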