Add Scan implementation for c.parallel (#3462)
* Make thread_store.cuh NVRTC compilable
* Get TileState from KernelSource
* Use launcher_factory to get the SM occupancy, PTX version. Implement MaxgridDimX
* Put reduce stuff inside `reduce` namespace
* Handle PTX compilation in command list
* Missing noexcept
* Allow passing InitValueT without wrapping in InputValue
* Add scan c.parallel API
* Add tests for scan c.parallel API
* Use fewer items per thread and reinstate LDL/STL check
* Move load modifier check to policy
* Introduce detail functions to allocate/initialize tile state
* Update c.parallel scan_tile_state following c++ refactor
* Update cub/cub/thread/thread_store.cuh
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
* No initialize-then-modify
* Use enum rather than bool
* Return a std::optional from find_size_t
* Annotate arguments with their positions
* Minor improvements to command_list
* Rename cubin->link_result
* Add a TODO for removing extra compile step
* Bad merge
* Pass thrust path to PTX compile step
* Fixes following merge from main
* Return error from AliasTemporaries
* Fix SFINAE
* Store description/payload bytes_per_tile directly in the build obj

---------

Co-authored-by: Ashwin Srinath <[email protected]>
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
1 parent f745c97, commit bc57f2b
Showing 17 changed files with 1,029 additions and 137 deletions.
@@ -0,0 +1,58 @@
//===----------------------------------------------------------------------===//
//
// Part of CUDA Experimental in CUDA Core Compute Libraries,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//

#pragma once

#ifndef CCCL_C_EXPERIMENTAL
#  error "C exposure is experimental and subject to change. Define CCCL_C_EXPERIMENTAL to acknowledge this notice."
#endif // !CCCL_C_EXPERIMENTAL

#include <cuda.h>

#include <cccl/c/types.h>

struct cccl_device_scan_build_result_t
{
  int cc;
  void* cubin;
  size_t cubin_size;
  CUlibrary library;
  cccl_type_info accumulator_type;
  CUkernel init_kernel;
  CUkernel scan_kernel;
  size_t description_bytes_per_tile;
  size_t payload_bytes_per_tile;
};

extern "C" CCCL_C_API CUresult cccl_device_scan_build(
  cccl_device_scan_build_result_t* build,
  cccl_iterator_t d_in,
  cccl_iterator_t d_out,
  cccl_op_t op,
  cccl_value_t init,
  int cc_major,
  int cc_minor,
  const char* cub_path,
  const char* thrust_path,
  const char* libcudacxx_path,
  const char* ctk_path) noexcept;

extern "C" CCCL_C_API CUresult cccl_device_scan(
  cccl_device_scan_build_result_t build,
  void* d_temp_storage,
  size_t* temp_storage_bytes,
  cccl_iterator_t d_in,
  cccl_iterator_t d_out,
  unsigned long long num_items,
  cccl_op_t op,
  cccl_value_t init,
  CUstream stream) noexcept;

extern "C" CCCL_C_API CUresult cccl_device_scan_cleanup(cccl_device_scan_build_result_t* bld_ptr) noexcept;
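
The header above declares the scan entry points but does not spell out what the scan computes. As a rough host-side reference, and assuming the operation is an exclusive scan seeded with `init` (the `op` and `init` parameters suggest this, but the header does not confirm it), the expected output could be sketched as follows. The function name `reference_exclusive_scan` is hypothetical and is not part of the c.parallel API; it may be useful when checking device results in a test.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not part of c.parallel.
// Assumes exclusive-scan semantics: out[i] = init (op) in[0] (op) ... (op) in[i-1],
// so out[0] is init itself and in[i] never contributes to out[i].
template <typename T, typename BinaryOp>
std::vector<T> reference_exclusive_scan(const std::vector<T>& in, BinaryOp op, T init)
{
  std::vector<T> out(in.size());
  T running = init;
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    out[i]  = running;            // value accumulated before in[i] is folded in
    running = op(running, in[i]); // fold in[i] for the next position
  }
  return out;
}
```

For example, scanning `{1, 2, 3, 4}` with addition and an initial value of 10 yields `{10, 11, 13, 16}` under this assumption.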