Commit 768dbae

danpovey, qindazhu, csukuangfj authored
Cuda draft2 (k2-fsa#86)
* Some code drafts
* Updates to determinize draft..
* Progress on draft of CUDA stuff (sorry: lots of inconsistencies still)
* Rework context/memory management for easier integration with external toolkits
* Updates to Context
* Add ragged.h
* Add shape.h
* Various fixes, added compose.cc
* implement GetContext with parameter pack (k2-fsa#73)
* Various fixes etc.
* Update license; various updates on cuda stuff
* Add utils header
* Update the build system to support Cuda. (k2-fsa#75)
* update the build system to support cuda.
* add a google colab example for Cuda test.
* enable CI for the cuda_draft branch.
* resolve some comments.
* Updates to compose algo.
* Fix couple build issues
* move shape to ragged_shape
* More progress...
* More progress... won't compile
* More progress on CUDA draft (won't compile, sorry)
* Working on composition...
* matrix transpose in cuda (simple version) (k2-fsa#84)
* more progress...
* add performance test functions (k2-fsa#85)
* various progress...
* fix build issues (k2-fsa#87)

Co-authored-by: Haowen Qiu <[email protected]>
Co-authored-by: Fangjun Kuang <[email protected]>
1 parent e85ee0b commit 768dbae

34 files changed, +4388 -5 lines changed

.github/workflows/build.yml

+6-1
@@ -10,15 +10,20 @@ on:
   push:
     branches:
       - master
+      - cuda
   pull_request:
     branches:
       - master
+      - cuda

 env:
   BUILD_TYPE: Debug

 jobs:
   build:
+    # disable CI now since GitHub action does not support CUDA
+    # and it always fails
+    if: false
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
@@ -55,4 +60,4 @@ jobs:
     - name: Test
       shell: bash
       working-directory: ${{runner.workspace}}/build
-      run: ctest --verbose --build-config $BUILD_TYPE
+      run: ctest --verbose --exclude-regex Cuda --build-config $BUILD_TYPE

.gitignore

+6
@@ -1,9 +1,15 @@
 # Build folder
 **/build*

+# emacs saves
+[#]*[#]
+.[#]*
+*~
+
 # Prerequisites
 *.d

+
 # Compiled Object files
 *.slo
 *.lo

CMakeLists.txt

+34-2
@@ -9,9 +9,9 @@ to build this project"
 )
 endif()

-cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
+cmake_minimum_required(VERSION 3.8 FATAL_ERROR)

-project(k2)
+project(k2 CUDA CXX)

 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

@@ -26,19 +26,51 @@ if(NOT CMAKE_BUILD_TYPE)
   set(CMAKE_BUILD_TYPE Release CACHE STRING
     "Set the build type. Available values are: Debug Release RelWithDebInfo MinSizeRel"
     FORCE)
+  set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
+    Debug Release RelWithDebInfo MinSizeRel
+  )
 endif()

 if(WIN32 AND BUILD_SHARED_LIBS)
   message(STATUS "Set BUILD_SHARED_LIBS to OFF for Windows")
   set(BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE)
 endif()

+# the following settings are modified from cub/CMakeLists.txt
+#[[ start settings for CUB ]]
+
+set(CMAKE_CXX_STANDARD 11 CACHE STRING "The C++ version to be used.")
+set(CMAKE_CXX_EXTENSIONS OFF)
+
+message(STATUS "C++ Standard version: ${CMAKE_CXX_STANDARD}")
+
+# Force CUDA C++ standard to be the same as the C++ standard used.
+#
+# Now, CMake is unaligned with reality on standard versions: https://gitlab.kitware.com/cmake/cmake/issues/18597
+# which means that using standard CMake methods, it's impossible to actually sync the CXX and CUDA versions for pre-11
+# versions of C++; CUDA accepts 98 but translates that to 03, while CXX doesn't accept 03 (and doesn't translate that to 03).
+# In case this gives You, dear user, any trouble, please escalate the above CMake bug, so we can support reality properly.
+if(DEFINED CMAKE_CUDA_STANDARD)
+  message(WARNING "You've set CMAKE_CUDA_STANDARD; please note that this variable is ignored, and CMAKE_CXX_STANDARD"
+    " is used as the C++ standard version for both C++ and CUDA.")
+endif()
+unset(CMAKE_CUDA_STANDARD CACHE)
+set(CMAKE_CUDA_STANDARD ${CMAKE_CXX_STANDARD})
+
+set(K2_COMPUTE_ARCHS 30 32 35 50 52 53 60 61 62 70 72)
+foreach(COMPUTE_ARCH IN LISTS K2_COMPUTE_ARCHS)
+  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-extended-lambda -gencode arch=compute_${COMPUTE_ARCH},code=sm_${COMPUTE_ARCH}")
+endforeach()
+
+#[[ end settings for CUB ]]
+
 enable_testing()

 list(APPEND CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR}/cmake)
 include(cpplint)
 include(glog)
 include(googletest)
 include(pybind11)
+include(cub)

 add_subdirectory(k2)

LICENSE

+9-2
@@ -1,7 +1,14 @@
 MIT License

-Copyright (c) 2020 Daniel Povey
-Copyright (c) 2020- The Authors (see individual files for names)
+Copyright (c) 2020- The Authors (see commit history and individual files
+for names)
+All Rights Reserved
+
+NOTE (this is not from the MIT license): The copyright model is that authors
+(or their employers, if noted in individual files) own their individual
+contributions. The authors' contributions can be discerned from the git
+history.
+

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

+6
@@ -4,3 +4,9 @@

 # k2
 FSA/FST algorithms, intended to (eventually) be interoperable with PyTorch and similar.
+
+## Quick start
+
+Want to try it out without installing anything? We have set up a [Google Colab][1].
+
+[1]: https://colab.research.google.com/drive/1qbHUhNZUX7AYEpqnZyf29Lrz2IPHBGlX?usp=sharing

cmake/cub.cmake

+30
@@ -0,0 +1,30 @@
+# Copyright (c) 2020 Fangjun Kuang ([email protected])
+# See ../LICENSE for clarification regarding multiple authors
+
+function(download_cub)
+  if(CMAKE_VERSION VERSION_LESS 3.11)
+    list(APPEND CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR}/cmake/Modules)
+  endif()
+
+  include(FetchContent)
+
+  set(cub_URL "https://github.com/NVlabs/cub/archive/1.9.10.tar.gz")
+  set(cub_HASH "SHA256=2bd7077a3d9741f0689e6c1eb58c6278fc96eccc27d964168bc8be1bc3a9040f")
+
+  FetchContent_Declare(cub
+    URL ${cub_URL}
+    URL_HASH ${cub_HASH}
+  )
+
+  FetchContent_GetProperties(cub)
+  if(NOT cub)
+    message(STATUS "Downloading cub")
+    FetchContent_Populate(cub)
+  endif()
+  message(STATUS "cub is downloaded to ${cub_SOURCE_DIR}")
+  add_library(cub INTERFACE)
+  target_include_directories(cub INTERFACE ${cub_SOURCE_DIR})
+
+endfunction()
+
+download_cub()

k2/csrc/CMakeLists.txt

+2
@@ -61,3 +61,5 @@ set(fsa_tests
 foreach(name IN LISTS fsa_tests)
   k2_add_fsa_test(${name})
 endforeach()
+
+add_subdirectory(cuda)

k2/csrc/cuda/CMakeLists.txt

+28
@@ -0,0 +1,28 @@
+add_library(context context.cu)
+target_include_directories(context PUBLIC ${CMAKE_SOURCE_DIR})
+target_link_libraries(context PUBLIC cub)
+target_link_libraries(context PUBLIC glog)
+
+function(k2_add_cuda_test name)
+  add_executable(${name} "${name}.cu")
+  target_link_libraries(${name}
+    PRIVATE
+      context
+      gtest
+      gtest_main
+  )
+  add_test(NAME "Test.Cuda.${name}"
+    COMMAND
+      $<TARGET_FILE:${name}>
+  )
+endfunction()
+
+# please sort the source files alphabetically
+set(cuda_tests
+  ops_test
+  utils_test
+)
+
+foreach(name IN LISTS cuda_tests)
+  k2_add_cuda_test(${name})
+endforeach()

k2/csrc/cuda/README.md

+24
@@ -0,0 +1,24 @@
+
+
+So far this directory just contains some notes on implementation; all the code
+is just a VERY EARLY DRAFT. The goal here is to show *in principle* how we parallelize
+things, building up from low-level primitives, but without actually creating any
+CUDA code.
+
+Actually we probably shouldn't separate this into a separate directory from the CPU code,
+since most of it is general purpose.
+
+Notes on build, and types of file:
+
+Currently the plan is for *all* of these files to be put through the CUDA compiler
+(nvcc). Most of it is host code, but some of it leads to CUDA dependencies
+(e.g. one of the constructors of Array1 is a template which can instantiate
+CUDA code).
+
+Eventually I'd like to make compilation conditional, so we can create a version of this
+that runs on CPU with no CUDA dependency. That can be done later though.
+(Would involve a bunch of #ifdefs, plus defining things like __host__ and __device__ to
+be the empty string).
+
+For CUDA streams, I intend to always use cudaStreamPerThread as the stream. This will
+keep usage of the library relatively simple (no need to pass streams around).
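
The README notes above mention a possible CPU-only build in which `__host__` and `__device__` expand to nothing. A minimal sketch of a closely related variant follows. It is not code from this commit: the wrapper macro `K2_HOST_DEVICE` and the helper `Clamp` are hypothetical names chosen for illustration, and the sketch assumes `__CUDACC__` is the macro that distinguishes an nvcc build (nvcc does define it).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper macro (not in the commit). Under nvcc it expands to
// the CUDA execution-space specifiers; under a plain C++ compiler it expands
// to nothing, so the same header builds without any CUDA dependency.
#ifdef __CUDACC__
#define K2_HOST_DEVICE __host__ __device__
#else
#define K2_HOST_DEVICE
#endif

// Hypothetical helper: callable from host code in either build, and also
// from device code when compiled by nvcc.
K2_HOST_DEVICE inline int32_t Clamp(int32_t x, int32_t lo, int32_t hi) {
  return x < lo ? lo : (x > hi ? hi : x);
}
```

With a plain C++11 compiler, `Clamp(5, 0, 3)` is an ordinary inline function call. A wrapper macro is used here rather than redefining `__host__` and `__device__` directly, because identifiers with a double underscore are reserved for the implementation.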

k2/csrc/cuda/algorithms.h

+39
@@ -0,0 +1,39 @@
+// k2/csrc/cuda/algorithms.h
+
+// Copyright (c) 2020 Xiaomi Corporation (authors: Daniel Povey)
+
+// See ../../LICENSE for clarification regarding multiple authors
+
+#ifndef K2_CSRC_CUDA_ALGORITHMS_H_
+#define K2_CSRC_CUDA_ALGORITHMS_H_
+
+#include "k2/csrc/cuda/array.h"
+
+// this really contains various utilities that are useful for k2 algorithms.
+namespace k2 {
+
+class Renumbering {
+ public:
+  Renumbering(int32_t num_old_elems);
+
+  int32_t NumOldElems();
+  int32_t NumNewElems();
+
+  Array1<char> &Kept();
+
+  Array1<int32_t> &New2Old();  // dim is NumNewElems()
+
+  Array1<int32_t> &Old2New();  // dim is NumOldElems()
+
+ private:
+  Array1<char> kept;
+  Array1<int32_t> new2old;
+  Array1<int32_t> old2new;
+
+};
+
+
+
+
+}  // namespace k2
+
+#endif  // K2_CSRC_CUDA_ALGORITHMS_H_
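
The `Renumbering` header above declares only the accessors, so the relationship between `Kept()`, `New2Old()` and `Old2New()` is not spelled out in this commit. One plausible reading, sketched below on the CPU, is that `old2new` is the exclusive prefix sum of the `kept` flags and `new2old` lists the indexes of the kept elements. This is an assumption, not the commit's implementation: `RenumberingSketch` and `ComputeRenumbering` are hypothetical names, and `std::vector` stands in for `Array1`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU stand-in for the two index maps; std::vector replaces Array1.
struct RenumberingSketch {
  std::vector<int32_t> old2new;  // one entry per old element
  std::vector<int32_t> new2old;  // one entry per kept element
};

// Assumed semantics: kept[i] != 0 means old element i survives.
// old2new[i] is the number of kept elements before i (an exclusive prefix
// sum), which is the new index of element i when it is kept.
inline RenumberingSketch ComputeRenumbering(const std::vector<char> &kept) {
  RenumberingSketch r;
  r.old2new.resize(kept.size());
  int32_t next = 0;  // running count of kept elements seen so far
  for (std::size_t i = 0; i < kept.size(); ++i) {
    r.old2new[i] = next;
    if (kept[i]) {
      r.new2old.push_back(static_cast<int32_t>(i));
      ++next;
    }
  }
  return r;
}
```

For `kept = {1, 0, 1, 1, 0}` this gives `new2old = {0, 2, 3}` and `old2new = {0, 1, 1, 2, 3}`. On a GPU the serial loop would typically be replaced by a parallel exclusive scan, which is the kind of primitive the CUB dependency added in this commit provides.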
