CUDA_ERROR_MISALIGNED_ADDRESS when calling ti.math.vec3 inside ti.func, only occurs sometimes. #8660

Open
NathanaelJ opened this issue Mar 11, 2025 · 0 comments

Describe the bug
I have a function that initializes a local variable using ti.math.vec3. It works fine 99% of the time, but I recently changed my program so that the function is also called from a new kernel. That call behaves as expected on CPU and Metal, but on CUDA it fails with CUDA_ERROR_MISALIGNED_ADDRESS. I can work around the error by initializing the variable implicitly, that is, by assigning it from another vec3 object instead of calling ti.math.vec3.

I am unsure why it only fails when called from one specific kernel. I have rebooted my machine, cleared the Taichi cache, disabled the Taichi offline cache, cleared the Python cache, enabled debug=True, and added ti.sync() between almost every line in the problematic kernel. None of those changes fixed the error.
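For readers unfamiliar with the error: CUDA reports a misaligned address when a device memory access uses an address that is not a multiple of the access width. The sketch below is illustrative arithmetic only, not a diagnosis of this issue. It assumes a vec3 stored as three tightly packed f32 values (12 bytes) and a hypothetical 16-byte-wide access; the base address and helper names are made up for the example, and nothing here is confirmed to match the code Taichi actually generates.

```python
import struct

# Assumption: a vec3 is three tightly packed 32-bit floats (12 bytes).
VEC3_F32_SIZE = struct.calcsize("<3f")


def is_aligned(address: int, access_width: int) -> bool:
    """Return True if `address` is a multiple of `access_width` bytes."""
    return address % access_width == 0


def misaligned_elements(base: int, count: int, access_width: int) -> list:
    """Indices of packed vec3 elements whose start address is not aligned."""
    return [
        i
        for i in range(count)
        if not is_aligned(base + i * VEC3_F32_SIZE, access_width)
    ]


base = 0x1000  # hypothetical base address, itself 16-byte aligned

# Scalar 4-byte accesses: every 12-byte stride is still a multiple of 4.
print(misaligned_elements(base, 8, 4))

# Hypothetical 16-byte vector accesses: only every fourth element lines up.
print(misaligned_elements(base, 8, 16))
```

Under these assumptions, whether a given access faults depends on where the element happens to sit in memory, which would be one way for an error like this to appear only on some code paths.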

To Reproduce
Unfortunately, I can't post all of my code because it is part of a commercial project. I have included snippets that should illustrate the issue. I don't have time to reproduce it in simpler standalone code at the moment, sorry.

# Non-problematic kernel - error does not occur when nearest_surface is called
@ti.kernel
def snap_fit(self, indx: ti.i32, cellId: ti.i32): # type: ignore
    p, _, _ = nearest_surface(cellId, self.pts[indx], self.surf, 0)
    snapVector = self.pts[indx] - p  # Line to be filled with points
    # Reposition points
    for i in range(indx):
        self.pts[i] = p + ((i / indx) * snapVector)

# Problematic kernel - error does occur when nearest_surface is called
@ti.kernel
def surface_snap_fit(self, cellId: ti.i32, p: ti.math.vec3): # type: ignore
    # First store results in local variables
    closest_pt, _, cell_id = nearest_surface(cellId, p, self.surf, 0)
    # Then update fields separately
    self.pts[0] = closest_pt
    self.root_surf_cell[0] = cell_id

# Problematic function
@ti.func
def nearest_surface(cellId: ti.i32, p: ti.math.vec3, surf, arcPt: ti.i32): # surf is a ti.data_oriented object
    # Initialize variables
    max_neighbors = surf.maxNeighborCells[0]
    minDistSqr = 1e23
    closestPoint = ti.math.vec3(0.0, 0.0, 0.0) # The error occurs here
    # Function continues...

# Fixed function
@ti.func
def nearest_surface(cellId: ti.i32, p: ti.math.vec3, surf, arcPt: ti.i32): # surf is a ti.data_oriented object
    # Initialize variables
    max_neighbors = surf.maxNeighborCells[0]
    minDistSqr = 1e23
    closestPoint = p # Initializing using the vector 'p' solves this issue
    # Function continues...

Log/Screenshots
Taichi initialized with debug=False, since this allowed debug print statements to be used

<program operating normally, until function call from problematic kernel>

[E 03/11/25 15:26:55.272 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)


2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - WARNING - Simulation 0 failed! Exception: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - INFO - Launching simulation 1. Rank 0 50.0% complete
[E 03/11/25 15:26:55.278 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling malloc_async_impl (cuMemAllocAsync)


2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - WARNING - Arc 1 failed! Exception: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling malloc_async_impl (cuMemAllocAsync)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - INFO - Rank 0 executions complete!
[E 03/11/25 15:26:55.299 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling module_load_data_ex (cuModuleLoadDataEx)


Traceback (most recent call last):
  File "/home/naj20/program_Taichi/program.py", line 195, in <module>
    local_prob = surface.prob.to_numpy()
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/util.py", line 351, in wrapped
    return func(*args, **kwargs)
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/field.py", line 307, in to_numpy
    tensor_to_ext_arr(self, arr)
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 1113, in wrapped
    return primal(*args, **kwargs)
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 1045, in __call__
    return self.launch_kernel(kernel_cpp, *args)
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 976, in launch_kernel
    raise e from None
  File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 971, in launch_kernel
    prog.launch_kernel(compiled_kernel_data, launch_ctx)
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling module_load_data_ex (cuModuleLoadDataEx)
[E 03/11/25 15:26:55.330 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)


[E 03/11/25 15:26:55.330 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling mem_free (cuMemFree_v2)


terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
[pop-os:06155] *** Process received signal ***
[pop-os:06155] Signal: Aborted (6)
[pop-os:06155] Signal code:  (-6)
[pop-os:06155] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7cd905842520]
[pop-os:06155] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7cd9058969fc]
[pop-os:06155] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7cd905842476]
[pop-os:06155] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7cd9058287f3]
[pop-os:06155] [ 4] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x4957daf)[0x7cd8a5b57daf]
[pop-os:06155] [ 5] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x4956426)[0x7cd8a5b56426]
[pop-os:06155] [ 6] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x4956491)[0x7cd8a5b56491]
[pop-os:06155] [ 7] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x19563bb)[0x7cd8a2b563bb]
[pop-os:06155] *** End of error message ***
Aborted (core dumped)
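The log above reports the same error code from several different driver calls. According to the CUDA documentation, a misaligned-address error leaves the process in an inconsistent state and further CUDA work returns the same error, so the first entry is usually the one worth investigating. A minimal sketch for ordering the driver errors, assuming the log line format shown above (the regular expression and helper name are illustrative, not part of Taichi):

```python
import re

# Driver error lines copied from the log above, in order of appearance.
LOG = """\
[E 03/11/25 15:26:55.272 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
[E 03/11/25 15:26:55.278 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling malloc_async_impl (cuMemAllocAsync)
[E 03/11/25 15:26:55.299 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling module_load_data_ex (cuModuleLoadDataEx)
[E 03/11/25 15:26:55.330 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
[E 03/11/25 15:26:55.330 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling mem_free (cuMemFree_v2)
"""

# Assumed line shape: "[E <date> <time> <pid>] [...] CUDA Error <CODE>: <text> while calling <name> (<api>)"
PATTERN = re.compile(
    r"^\[E (?P<stamp>[\d/]+ [\d:.]+) \d+\] "
    r".*?CUDA Error (?P<code>\w+): "
    r".*? while calling \w+ \((?P<api>\w+)\)"
)


def driver_errors(log: str) -> list:
    """Return (timestamp, error code, driver API) tuples in log order."""
    errors = []
    for line in log.splitlines():
        match = PATTERN.match(line)
        if match:
            errors.append((match["stamp"], match["code"], match["api"]))
    return errors


errors = driver_errors(LOG)
first_stamp, first_code, first_api = errors[0]
print(f"first failure: {first_code} at {first_stamp} in {first_api}")
print("later failures:", [api for _, _, api in errors[1:]])
```

Here the first failure surfaces in cuStreamSynchronize, which only indicates that some earlier asynchronous GPU work faulted; it does not identify the kernel by itself.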

Additional comments
Here is the output of ti.diagnose

[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0]
system: linux
executable: /usr/bin/python3
platform: Linux-6.8.0-76060800daily20240311-generic-x86_64-with-glibc2.35
architecture: 64bit ELF
uname: uname_result(system='Linux', node='pop-os', release='6.8.0-76060800daily20240311-generic', version='#202403110203~1715181801~22.04~aba43ee SMP PREEMPT_DYNAMIC Wed M', machine='x86_64')
locale: en_US.UTF-8
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/bin
PYTHONPATH: ['/usr/local/bin', '/home/naj20/BWB/2_SWEEPZ_Sims/P0', '/usr/local/bin', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/home/naj20/.local/lib/python3.10/site-packages', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages']

No LSB modules are available.
Distributor ID:	Pop
Description:	Pop!_OS 22.04 LTS
Release:	22.04
Codename:	jammy



import: <module 'taichi' from '/home/naj20/.local/lib/python3.10/site-packages/taichi/__init__.py'>

cpu: True
metal: False
opengl: True
cuda: True
vulkan: True

`glewinfo` not available: [Errno 2] No such file or directory: 'glewinfo'

Tue Mar 11 15:30:46 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-PCIE-16GB           Off |   00000000:01:00.0 Off |                    0 |
| N/A   34C    P0             37W /  250W |       1MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce GTX 1080        Off |   00000000:05:00.0 Off |                  N/A |
| 27%   29C    P8              5W /  180W |       2MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12

[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12
[Taichi] Starting on arch=x64

[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12
[Taichi] Starting on arch=opengl

[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12
[Taichi] Starting on arch=cuda

[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

                                   TAICHI EXAMPLES                                    
 ──────────────────────────────────────────────────────────────────────────────────── 
  0: ad_gravity               25: karman_vortex_street    50: patterns                
  1: circle_packing_image     26: keyboard                51: pbf2d                   
  2: comet                    27: laplace                 52: physarum                
  3: cornell_box              28: laplace_equation        53: poisson_disk_sampling   
  4: diff_sph                 29: mandelbrot_zoom         54: print_offset            
  5: differential_evolution   30: marching_squares        55: rasterizer              
  6: euler                    31: mass_spring_3d_ggui     56: regression              
  7: eulerfluid2d             32: mass_spring_game        57: sdf_renderer            
  8: explicit_activation      33: mass_spring_game_ggui   58: simple_derivative       
  9: export_mesh              34: mciso_advanced          59: simple_texture          
  10: export_ply              35: mgpcg                   60: simple_uv               
  11: export_videos           36: mgpcg_advanced          61: snow_phaseField         
  12: fem128                  37: minimal                 62: stable_fluid            
  13: fem128_ggui             38: minimization            63: stable_fluid_ggui       
  14: fem99                   39: mpm128                  64: stable_fluid_graph      
  15: fractal                 40: mpm128_ggui             65: taichi_bitmasked        
  16: fractal3d_ggui          41: mpm3d                   66: taichi_dynamic          
  17: fullscreen              42: mpm3d_ggui              67: taichi_logo             
  18: game_of_life            43: mpm88                   68: taichi_ngp              
  19: gui_image_io            44: mpm88_graph             69: taichi_sparse           
  20: gui_widgets             45: mpm99                   70: texture_graph           
  21: implicit_fem            46: mpm_lagrangian_forces   71: tutorial                
  22: implicit_mass_spring    47: nbody                   72: two_stream_instability  
  23: initial_value_problem   48: odop_solar              73: vortex_rings            
  24: jacobian                49: oit_renderer            74: waterwave               
 ──────────────────────────────────────────────────────────────────────────────────── 
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.10s

Consider attaching this log when maintainers ask about system information.
>>> Running time: 2.81s