Benchmark_SpTRSM_using_CSC

Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides (SpTRSM)

Introduction

This repository contains the source code for the paper "Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides", Concurrency and Computation: Practice and Experience, 2017, by Weifeng Liu, Ang Li, Jonathan D. Hogg, Iain S. Duff, and Brian Vinter. [PDF] [DOI].

The code supports both forward and backward substitution and multiple right-hand sides.
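For orientation, here is a minimal serial sketch of forward substitution with multiple right-hand sides on a CSC matrix. It is a plain CPU reference, not the synchronization-free GPU algorithm from the paper. The function name sptrsm_csc_forward_ref, the row-major layout of the right-hand sides, and the requirement that each column stores its diagonal entry first are all assumptions made for this illustration.

```c
#include <stddef.h>

/* Serial reference: solve L * X = B for X.
 * L is n-by-n lower triangular in CSC format (col_ptr, row_idx, val).
 * Assumption: the diagonal entry is the first stored entry of each column.
 * B and X are n-by-rhs, stored row-major: entry (i, k) is at [i * rhs + k]. */
void sptrsm_csc_forward_ref(int n, int rhs,
                            const int *col_ptr, const int *row_idx,
                            const double *val,
                            const double *b, double *x)
{
    /* Start from a copy of the right-hand sides. */
    for (size_t i = 0; i < (size_t)n * (size_t)rhs; i++)
        x[i] = b[i];

    for (int j = 0; j < n; j++) {
        int start = col_ptr[j];
        double diag = val[start];

        /* Row j of X is final once divided by the diagonal. */
        for (int k = 0; k < rhs; k++)
            x[j * rhs + k] /= diag;

        /* Propagate the solved row to every later row in column j. */
        for (int p = start + 1; p < col_ptr[j + 1]; p++) {
            int i = row_idx[p];
            for (int k = 0; k < rhs; k++)
                x[i * rhs + k] -= val[p] * x[j * rhs + k];
        }
    }
}
```

With rhs set to 1 this reduces to SpTRSV. Backward substitution follows the same pattern with the columns visited from last to first on an upper triangular matrix.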

Please contact Weifeng Liu to report any issues with the code.

Update (14 Feb. 2020, cuda): A deadlock problem on CUDA 10 has been fixed.

Update (13 Feb. 2017, cuda): A caching problem on Tesla P100 has been fixed. Thanks to Hartwig Anzt for identifying the problem and Ang Li for fixing it!

nVidia GPU (CUDA) version

Execution

Set CUDA path in the Makefile,
Run make,
Run ./sptrsv -d 0 -rhs 2 -forward -mtx example.mtx. Here the changeable parameters 0 and 2 are the device id and the number of right-hand sides, respectively. When -rhs is set to 1, the operation is SpTRSV; otherwise it is SpTRSM. The -forward option (which solves the lower triangular part of the input .mtx matrix) can be replaced by -backward (which solves its upper triangular part).
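To make "the lower triangular part of the input matrix" concrete, the sketch below filters a general square CSC matrix down to its entries on or below the diagonal. How the benchmark itself performs this step is not described here, so the function name csc_extract_lower and its interface are hypothetical and shown only for illustration.

```c
/* Copy the lower triangular part (row >= column) of an n-by-n CSC matrix.
 * Hypothetical helper; the output arrays must have room for col_ptr[n]
 * entries (l_row_idx, l_val) and n + 1 entries (l_col_ptr).
 * Returns the number of entries kept. */
int csc_extract_lower(int n,
                      const int *col_ptr, const int *row_idx,
                      const double *val,
                      int *l_col_ptr, int *l_row_idx, double *l_val)
{
    int nnz = 0;
    for (int j = 0; j < n; j++) {
        l_col_ptr[j] = nnz;
        for (int p = col_ptr[j]; p < col_ptr[j + 1]; p++) {
            /* Keep the diagonal and everything below it. */
            if (row_idx[p] >= j) {
                l_row_idx[nnz] = row_idx[p];
                l_val[nnz] = val[p];
                nnz++;
            }
        }
    }
    l_col_ptr[n] = nnz;
    return nnz;
}
```

Flipping the comparison to row_idx[p] <= j would keep the upper triangular part instead, which corresponds to the -backward case.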

Tested environments

nVidia GeForce Titan X (Pascal) GPU in a host with CUDA v8.0 and CentOS 7.2 64-bit Linux installed.
nVidia GeForce GTX 1080 GPU in a host with CUDA v8.0 and CentOS 7.2 64-bit Linux installed.
nVidia GeForce GT 650M GPU in a host with CUDA v7.5 and Mac OS X 10.9.2 installed.

Data type

The code supports both double precision and single precision SpTRSV and SpTRSM. Use make VALUE_TYPE=double for double precision or make VALUE_TYPE=float for single precision. (Note that for CUDA devices older than Pascal and CUDA SDKs older than v8.0, lines 16-31 of file utils.h should be uncommented for double precision support.)
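A common way to implement a build-time precision switch like this is to pass the type to the compiler as a preprocessor definition. The sketch below assumes the Makefile forwards VALUE_TYPE as something like -DVALUE_TYPE=float; that mechanism is an assumption, and the helper names value_type_size and value_dot are hypothetical.

```c
#include <stddef.h>

/* Assumption: the build passes -DVALUE_TYPE=double or -DVALUE_TYPE=float.
 * Fall back to double when nothing is specified. */
#ifndef VALUE_TYPE
#define VALUE_TYPE double
#endif

/* Size in bytes of the selected value type (hypothetical helper). */
size_t value_type_size(void)
{
    return sizeof(VALUE_TYPE);
}

/* Dot product written once and compiled for whichever precision is selected. */
VALUE_TYPE value_dot(const VALUE_TYPE *a, const VALUE_TYPE *b, int n)
{
    VALUE_TYPE sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Because the type is fixed at compile time, switching precision requires a rebuild rather than a run-time flag.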

AMD GPU (OpenCL 2.0) version

Execution

Set OpenCL path in the Makefile,
Run make,
Run ./sptrsv -d 0 -rhs 2 -forward -mtx example.mtx. Here the changeable parameters 0 and 2 are the device id and the number of right-hand sides, respectively. When -rhs is set to 1, the operation is SpTRSV; otherwise it is SpTRSM. The -forward option (which solves the lower triangular part of the input .mtx matrix) can be replaced by -backward (which solves its upper triangular part).

Tested environments (Note that an OpenCL 2.0 device is required for running the code)

AMD Radeon Fury X GPU in a host with AMD APP SDK 3.0 and Ubuntu 15.04 64-bit Linux installed.
AMD Radeon 290X GPU in a host with AMD APP SDK 3.0 and Ubuntu 15.04 64-bit Linux installed.

Data type

The code supports both double precision and single precision SpTRSV and SpTRSM. Use make VALUE_TYPE=double for double precision or make VALUE_TYPE=float for single precision.
