A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors - Network and Parallel Computing
Conference Papers Year : 2013

Kai Zhang
Shuming Chen
Wei Liu
Xi Ning

Abstract

LU decomposition is widely used to solve dense linear algebra problems in scientific computing applications. In recent years, single instruction, multiple data (SIMD) technology has become a popular way to accelerate LU decomposition. However, pipeline parallelism and memory bandwidth utilization are low when LU decomposition is mapped onto SIMD processors. This paper proposes a fine-grained pipelined implementation of LU decomposition on SIMD processors. The fine-grained algorithm exploits the data dependences of the native algorithm to expose fine-grained parallelism across all computation resources. By transforming non-coalesced memory accesses into coalesced ones, the proposed algorithm achieves high pipeline parallelism and highly efficient memory access. Experimental results show that the proposed technique achieves a speedup of 1.04x to 1.82x over the native algorithm and reaches about 89% of the peak performance of the SIMD processor.
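For readers unfamiliar with the underlying computation, the following is a minimal sketch of a textbook Doolittle LU decomposition without pivoting. It is not the paper's pipelined SIMD algorithm; the function name `lu_decompose` and the list-of-lists matrix layout are assumptions made for illustration. The innermost loop walks along a row, which in a row-major layout is the kind of contiguous access pattern that coalescing aims for.

```python
def lu_decompose(a):
    """Factor a square matrix a into L (unit lower) and U (upper), so a = L * U.

    Illustrative sketch only: no pivoting, so a zero pivot raises an error.
    """
    n = len(a)
    u = [row[:] for row in a]  # working copy that becomes U
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for k in range(n):
        pivot = u[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = u[i][k] / pivot
            l[i][k] = factor   # multiplier stored in L
            u[i][k] = 0.0      # entry below the pivot is eliminated
            # Row-wise update: consecutive j values touch adjacent elements
            # of row i and row k in a row-major layout.
            for j in range(k + 1, n):
                u[i][j] -= factor * u[k][j]
    return l, u


if __name__ == "__main__":
    lower, upper = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
    print(lower)  # [[1.0, 0.0], [1.5, 1.0]]
    print(upper)  # [[4.0, 3.0], [0.0, -1.5]]
```

Each step `k` depends on the rows updated in step `k - 1`, which is the data dependence that any pipelined mapping of this loop nest has to respect.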
Origin: Files produced by the author(s)

Dates and versions

hal-01513757, version 1 (25-04-2017)

Cite

Kai Zhang, Shuming Chen, Wei Liu, Xi Ning. A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors. 10th International Conference on Network and Parallel Computing (NPC), Sep 2013, Guiyang, China. pp.39-48, ⟨10.1007/978-3-642-40820-5_4⟩. ⟨hal-01513757⟩