Describe the bug
I have a function that initializes a variable using ti.math.vec3. It works fine 99% of the time, but I recently made a change to my program and called that function from a new kernel. It behaved as expected when running on CPU and Metal, but when I run it on CUDA, it fails with CUDA_ERROR_MISALIGNED_ADDRESS. I can fix the error by initializing the variable implicitly (using another vec3 object).
I am unsure why it only fails when the function is called from one specific kernel. I have rebooted my machine, cleared the Taichi cache, disabled the Taichi offline cache, cleared the Python cache, enabled debug=True, and added ti.sync() between almost every line in the problematic kernel. None of those changes fixed anything.
To Reproduce
Unfortunately, I can't post all of my code as it is part of a commercial project. I have included snippets that should illustrate the issue. I don't have time to reproduce the issue in simpler standalone code at the moment, sorry.
# Non-problematic kernel - error does not occur when nearest_surface is called
@ti.kernel
def snap_fit(self, indx: ti.i32, cellId: ti.i32):  # type: ignore
    p, _, _ = nearest_surface(cellId, self.pts[indx], self.surf, 0)
    snapVector = self.pts[indx] - p  # Line to be filled with points
    # Reposition points
    for i in range(indx):
        self.pts[i] = p + ((i / (indx)) * snapVector)

# Problematic kernel - error does occur when nearest_surface is called
@ti.kernel
def surface_snap_fit(self, cellId: ti.i32, p: ti.math.vec3):  # type: ignore
    # First store results in local variables
    closest_pt, _, cell_id = nearest_surface(cellId, p, self.surf, 0)
    # Then update fields separately
    self.pts[0] = closest_pt
    self.root_surf_cell[0] = cell_id

# Problematic function
@ti.func
def nearest_surface(cellId: ti.i32, p: ti.math.vec3, surf, arcPt: ti.i32):  # surf is a ti.data_oriented object
    # Initialize variables
    max_neighbors = surf.maxNeighborCells[0]
    minDistSqr = 1e23
    closestPoint = ti.math.vec3(0.0, 0.0, 0.0)  # The error occurs here
    # Function continues...

# Fixed function
@ti.func
def nearest_surface(cellId: ti.i32, p: ti.math.vec3, surf, arcPt: ti.i32):  # surf is a ti.data_oriented object
    # Initialize variables
    max_neighbors = surf.maxNeighborCells[0]
    minDistSqr = 1e23
    closestPoint = p  # Initializing using the vector 'p' solves this issue
    # Function continues...
Log/Screenshots
Taichi initialized with debug=False, since this allowed debug print statements to be used
<program operating normally, until function call from problematic kernel>
[E 03/11/25 15:26:55.272 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - WARNING - Simulation 0 failed! Exception: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - INFO - Launching simulation 1. Rank 0 50.0% complete
[E 03/11/25 15:26:55.278 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling malloc_async_impl (cuMemAllocAsync)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - WARNING - Arc 1 failed! Exception: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling malloc_async_impl (cuMemAllocAsync)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - INFO - Rank 0 executions complete!
[E 03/11/25 15:26:55.299 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling module_load_data_ex (cuModuleLoadDataEx)
Traceback (most recent call last):
  File "/home/naj20/program_Taichi/program.py", line 195, in <module>
    local_prob = surface.prob.to_numpy()
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/util.py", line 351, in wrapped
    return func(*args, **kwargs)
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/field.py", line 307, in to_numpy
    tensor_to_ext_arr(self, arr)
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 1113, in wrapped
    return primal(*args, **kwargs)
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 1045, in __call__
    return self.launch_kernel(kernel_cpp, *args)
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 976, in launch_kernel
    raise e from None
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 971, in launch_kernel
    prog.launch_kernel(compiled_kernel_data, launch_ctx)
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling module_load_data_ex (cuModuleLoadDataEx)
[E 03/11/25 15:26:55.330 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
[E 03/11/25 15:26:55.330 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling mem_free (cuMemFree_v2)
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
[pop-os:06155] *** Process received signal ***
[pop-os:06155] Signal: Aborted (6)
[pop-os:06155] Signal code: (-6)
[pop-os:06155] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7cd905842520]
[pop-os:06155] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7cd9058969fc]
[pop-os:06155] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7cd905842476]
[pop-os:06155] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7cd9058287f3]
[pop-os:06155] [ 4] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x4957daf)[0x7cd8a5b57daf]
[pop-os:06155] [ 5] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x4956426)[0x7cd8a5b56426]
[pop-os:06155] [ 6] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x4956491)[0x7cd8a5b56491]
[pop-os:06155] [ 7] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x19563bb)[0x7cd8a2b563bb]
[pop-os:06155] *** End of error message ***
Aborted (core dumped)
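The log repeats the same error code for several different driver calls, which makes it hard to see where the failure first surfaces. Below is a small sketch for listing the failing calls in order. It assumes the log line format shown above; `ERROR_RE` and `cuda_error_calls` are hypothetical helper names, not part of Taichi. As far as I understand, CUDA reports a kernel fault asynchronously at a later synchronization point, and subsequent calls on the same context then tend to fail with the same code, so the first entry is likely the most informative one.

```python
import re

# Matches Taichi's CUDA driver error lines in the format seen in the log above, e.g.
# [E <date> <time> <pid>] [cuda_driver.h:operator()@92] CUDA Error <CODE>: <message> while calling <call> (<driver_call>)
ERROR_RE = re.compile(
    r"^\[E (?P<stamp>[\d/]+ [\d:.]+) \d+\] .*?CUDA Error (?P<code>\w+): "
    r".*? while calling (?P<call>\w+) \((?P<driver>\w+)\)"
)


def cuda_error_calls(log_text):
    """Return (timestamp, error code, Taichi call, driver call) for each error line, in log order."""
    rows = []
    for line in log_text.splitlines():
        m = ERROR_RE.match(line.strip())
        if m:
            rows.append((m["stamp"], m["code"], m["call"], m["driver"]))
    return rows


# Two error lines and one application log line copied from the log above.
SAMPLE_LOG = """\
[E 03/11/25 15:26:55.272 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - INFO - Launching simulation 1. Rank 0 50.0% complete
[E 03/11/25 15:26:55.278 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling malloc_async_impl (cuMemAllocAsync)
"""

for stamp, code, call, driver in cuda_error_calls(SAMPLE_LOG):
    print(stamp, code, call, driver)
```

Applied to the full log, the earliest entry is the stream_synchronize call at 15:26:55.272. The later malloc_async_impl, module_load_data_ex, and mem_free failures may simply be follow-on errors from the same fault.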
Additional comments
Here is the output of ti.diagnose