[FEA] Add 64-bit size type option at build-time for libcudf #13159

GregoryKimball · 2023-04-17T23:18:55Z

GregoryKimball · 2023-09-01T03:30:56Z

On branch-23.10 commit ad9fa501192, I ran build.sh libcudf with a 64-bit size type and identified the unique lines that threw compilation errors.

Dictionary errors

cudf/cpp/src/groupby/sort/group_single_pass_reduction_util.cuh(64): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/include/cudf/dictionary/detail/iterator.cuh(88): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/include/cudf/detail/aggregation/aggregation.cuh(346): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/src/groupby/sort/group_correlation.cu(87): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/src/groupby/hash/multi_pass_kernels.cuh(107): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
cudf/cpp/include/cudf/dictionary/detail/iterator.cuh(41): error: no suitable conversion function from "cudf::dictionary32" to "cudf::size_type" exists
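
A possible explanation for this group, offered as an assumption rather than something confirmed in the thread: if the dictionary index wrapper exposes only an `explicit` conversion to its 32-bit value type, then a `static_cast` to a 64-bit `size_type` no longer finds a usable conversion function. The sketch below uses a hypothetical `index_wrapper32` (not the libcudf definition) and a hypothetical 64-bit `size_type` alias to show the behavior, along with one way around it.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a 32-bit dictionary index wrapper.
// This is NOT the libcudf definition; it only mimics an explicit conversion.
struct index_wrapper32 {
  std::int32_t v;
  explicit operator std::int32_t() const { return v; }
  std::int32_t value() const { return v; }
};

// Hypothetical 64-bit size type for the experiment.
using size_type = std::int64_t;

// An explicit conversion operator is only considered when the target type
// matches exactly, so the 32-bit cast works and the 64-bit cast does not.
static_assert(std::is_constructible<std::int32_t, index_wrapper32>::value,
              "direct conversion to the 32-bit value type is allowed");
static_assert(!std::is_constructible<std::int64_t, index_wrapper32>::value,
              "direct conversion to a 64-bit type is rejected");

// One workaround: read the 32-bit value first, then widen it.
inline size_type to_size_type(index_wrapper32 w)
{
  return static_cast<size_type>(w.value());
}
```

Widening through an accessor keeps the wrapper's conversion explicit while letting call sites compile for either width of `size_type`.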

AtomicAdd errors

cudf/cpp/include/cudf/detail/copy_if_else.cuh(97): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/include/cudf/detail/null_mask.cuh(108): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/groupby/sort/group_std.cu(153): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/bitmask/null_mask.cu(303): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/include/cudf/detail/valid_if.cuh(68): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(274): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(375): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(207): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(204): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(264): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(266): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(281): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(209): error: no instance of overloaded function "atomicAdd" matches the argument list
cudf/cpp/src/io/csv/csv_gpu.cu(283): error: no instance of overloaded function "atomicAdd" matches the argument list
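
One plausible cause, not confirmed in the thread: CUDA's built-in `atomicAdd` has overloads for `int`, `unsigned int`, and `unsigned long long int`, but none for a signed 64-bit integer. A common workaround is to perform the addition on the unsigned 64-bit representation, since two's complement addition produces the same bits. The host-side C++ sketch below illustrates only that arithmetic, using `std::atomic` in place of the device intrinsic; `size_type`, `add_via_unsigned`, and `load_signed` are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical 64-bit size type for the experiment.
using size_type = std::int64_t;

// Reinterpret 64 unsigned bits as a signed value without undefined behavior.
inline size_type as_signed(std::uint64_t bits)
{
  size_type out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Add a signed delta through an unsigned atomic and return the previous
// value, mirroring the "old value" that atomicAdd returns on the device.
inline size_type add_via_unsigned(std::atomic<std::uint64_t>& counter, size_type delta)
{
  // Conversion of a negative delta to unsigned is well defined (modulo 2^64).
  auto const old_bits = counter.fetch_add(static_cast<std::uint64_t>(delta));
  return as_signed(old_bits);
}

// Read the current counter value as a signed number.
inline size_type load_signed(std::atomic<std::uint64_t> const& counter)
{
  return as_signed(counter.load());
}
```

In device code the analogous step would be casting the counter's address to `unsigned long long*` before calling `atomicAdd`; whether that fits each call site listed above would need to be checked individually.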

Thrust and standard algorithm errors

cudf/cpp/src/groupby/groupby.cu(269): error: no instance of overloaded function "std::transform" matches the argument list
cudf/cpp/src/copying/contiguous_split.cu(631): error: no instance of overloaded function "std::transform" matches the argument list
cudf/cpp/src/groupby/groupby.cu(304): error: no instance of overloaded function "std::all_of" matches the argument list
cudf/cpp/src/hash/md5_hash.cu(343): error: no instance of overloaded function "thrust::for_each" matches the argument list
cudf/cpp/include/cudf/lists/detail/scatter.cuh(245): error: no instance of overloaded function "thrust::sequence" matches the argument list
cudf/cpp/src/groupby/groupby.cu(313): error: no instance of overloaded function "std::transform" matches the argument list
cudf/cpp/src/filling/repeat.cu(124): error: no instance of overloaded function "thrust::upper_bound" matches the argument list

Device span errors

cudf/cpp/include/cudf/table/experimental/row_operators.cuh(848): error: no instance of constructor "std::optional<_Tp>::optional [with _Tp=cudf::device_span<const int, 18446744073709551615UL>]" matches the argument list
cudf/cpp/include/cudf/table/experimental/row_operators.cuh(848): error: no instance of constructor "std::optional<_Tp>::optional [with _Tp=cudf::device_span<const int32_t, 18446744073709551615UL>]" matches the argument list

int typing errors

cudf/cpp/src/binaryop/compiled/binary_ops.cuh(272): error: no instance of function template "cudf::util::div_rounding_up_safe" matches the argument list
cudf/cpp/include/cudf/detail/utilities/cuda.cuh(169): error: no instance of overloaded function "std::clamp" matches the argument list
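
Errors like the `std::clamp` one typically appear when a call mixes `int` arguments with a `size_type` that is no longer `int`, because the single template parameter cannot deduce to two types at once. This is a guess at the cause for the lines above, not a confirmed diagnosis. A minimal sketch, assuming a hypothetical 64-bit `size_type` alias and a made-up helper `clamp_rows`:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical 64-bit size type for the experiment.
using size_type = std::int64_t;

// Clamp a size_type value into a range whose bounds arrive as plain ints.
inline size_type clamp_rows(size_type value, int lo, int hi)
{
  // std::clamp(value, lo, hi) would not compile here: T would have to be
  // both std::int64_t and int. Converting the bounds first gives one type.
  return std::clamp(value, static_cast<size_type>(lo), static_cast<size_type>(hi));
}
```

The same pattern applies to `std::min` and `std::max`: either convert the arguments to a common type or name the template argument explicitly, for example `std::max<size_type>(a, b)`.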

Assorted errors

cudf/cpp/src/hash/spark_murmurhash3_x86_32.cu(230): error: no instance of overloaded function "std::max" matches the argument list
cudf/cpp/src/copying/purge_nonempty_nulls.cu(93): error: no instance of function template "cudf::detail::gather" matches the argument list
cudf/cpp/include/cudf/detail/copy_if.cuh(166): error: more than one instance of overloaded function "min" matches the argument list:

revans2 · 2023-09-01T14:15:18Z

The Java code currently hard-codes a signed 32-bit integer as the size type in many places. We can switch it to 64 bits everywhere, along with a dynamic check that depends on how the code is compiled. But just so you are aware, Spark has a top-level limitation of a signed 32-bit int for the number of rows in a table. We can work around this in some places, but moving the Spark plugin over to a 64-bit index is not going to be simple.
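
As a rough illustration of the kind of dynamic check described above, here is a sketch in C++ (the thread is about Java bindings, so this only shows the idea). The names `size_type`, `size_type_is_64_bit`, and `to_spark_row_count` are hypothetical and do not come from libcudf or the Spark plugin.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical 64-bit size type selected at build time.
using size_type = std::int64_t;

// Compile-time flag a binding layer could query to learn the build's width.
constexpr bool size_type_is_64_bit = sizeof(size_type) == 8;

// Narrow a row count to the signed 32-bit range that a consumer such as
// Spark expects, failing loudly instead of silently truncating.
inline std::int32_t to_spark_row_count(size_type rows)
{
  if (rows < 0 || rows > std::numeric_limits<std::int32_t>::max()) {
    throw std::overflow_error("row count does not fit in a signed 32-bit int");
  }
  return static_cast<std::int32_t>(rows);
}
```

A check like this would let a 64-bit build hand tables to a 32-bit consumer as long as each individual table stays within the smaller limit.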

GregoryKimball added the labels feature request (New feature or request), 0 - Backlog (In queue waiting for assignment), and libcudf (Affects libcudf (C++/CUDA) code) Apr 17, 2023

GregoryKimball added this to the Stabilizing large workflows (OOM, spilling, partitioning) milestone Apr 17, 2023

GregoryKimball added this to libcudf Apr 17, 2023

GregoryKimball mentioned this issue Jun 29, 2023

GitHub infra updates #13542

Closed

5 tasks

GregoryKimball changed the title ~~[FEA] Add a build-time option for libcudf to use a 64-bit size type~~ [FEA] Add 64-bit size type option at build-time option for libcudf Jul 22, 2023

GregoryKimball mentioned this issue Jul 22, 2023

[FEA] Increase maximum characters in strings columns #13733

Closed

GregoryKimball moved this to Story Issue in libcudf Aug 2, 2023

GregoryKimball changed the title ~~[FEA] Add 64-bit size type option at build-time option for libcudf~~ [FEA] Add 64-bit size type option at build-time for libcudf Aug 2, 2023

GregoryKimball removed the status in libcudf Aug 8, 2023

GregoryKimball mentioned this issue Sep 10, 2023

[FEA] Improve ORC reader filtering and performance #13882

Open

GregoryKimball removed this from libcudf Oct 26, 2023

GregoryKimball added this to libcudf Jan 23, 2024

GregoryKimball moved this to To be revisited in libcudf Jan 23, 2024

GregoryKimball mentioned this issue Feb 4, 2024

[FEA] Incorporate chunked parquet reading into cuDF-python #14966

Closed
