
Strange error in test: Diff too big #72

Open
sukamenev opened this issue Apr 3, 2024 · 5 comments

Comments

@sukamenev

sukamenev commented Apr 3, 2024

On the OpenCL CPU device: after updating the OpenCL runtime, I see a different error, similar to the one in another test script:

Mean 1d
Accessing device #0:AMD EPYC 7542 32-Core Processor                 on Intel(R) CPU Runtime for OpenCL(TM) Applications
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
         y 0.000000
        x0 0.000000
Mean 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
         y 0.000000
        x0 0.000000
Mean 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
         y 0.000000
        x0 0.000000
Mean 2d squeeze
torch.Size([3])
torch.Size([3])
         y 0.000000
        x0 0.000000
Mean all squeeze
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
Sum 1d
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
         y 0.000000
        x0 0.000000
Sum 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
         y 0.000000
        x0 0.000000
Sum 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
         y 0.000000
        x0 0.000000
Sum 2d squeeze
torch.Size([3])
torch.Size([3])
         y 0.000000
        x0 0.000000
LogSoftmax
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
LogSoftmax
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
         y 0.000000
        x0 0.000000
Softmax
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
NLLLoss
torch.Size([])
torch.Size([])
tensor(0.0413, grad_fn=<NllLossBackward0>)
tensor(0.0418)
         y 0.000469
        x0 0.000000
AAPool2d
torch.Size([4, 8, 1, 1])
torch.Size([4, 8, 1, 1])
         y 0.000000
        x0 0.000000
Abs
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Abs_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardtanh
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardtanh_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Sigmoid
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
Sigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
Hardsigmoid
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardsigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
ReLU
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
ReLU_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LReLu
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LReLU_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Tanh
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Tanh_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
SiLU
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
SiLU_
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
GELU
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
GELU tanh
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
BCE Loss
torch.Size([])
torch.Size([])
        x0 0.000001
        x1 0.000000
         y 0.000000
BCE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
        x0 0.000001
         y 0.000000
        x1 0.000000
MSE Loss
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
        x1 0.000000
MSE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
         y 0.000000
        x0 0.000000
        x1 0.000000
Min
Ok
Max
Ok
Dot
Ok
Clamp 1
Ok
Clamp 2
Ok
Clamp 3
Ok
Linear 2d
    p_bias 0.000000
         y 0.000000
        x0 0.000000
  p_weight 0.000000
Linear 3d
    p_bias 0.000000
         y 0.000000
        x0 0.000000
  p_weight 0.000000
Conv
Traceback (most recent call last):
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 282, in <module>
    test_all(r.device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 254, in test_all
    test_fwd_bwd_op([([2,6,10,20],-1)],torch.nn.Conv2d(6,8,[3,5],stride=[1,2],padding=[1,2],dilation=1,groups=2),device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 74, in test_fwd_bwd_op
    y_cpu.backward(dy_cpu,retain_graph=True)
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: could not create a primitive descriptor iterator
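The traceback stops in `y_cpu.backward(dy_cpu,retain_graph=True)`, i.e. while computing the CPU reference gradient with stock PyTorch, before any OpenCL result is compared. "could not create a primitive descriptor iterator" looks like a oneDNN (MKL-DNN) message, and the `ZenDnn` path hints at a custom CPU build of PyTorch; both are inferences from the log, not confirmed. As a starting point for reproducing the failing case outside the test harness, the sketch below works out the tensor shapes involved using the output-size formula from the `torch.nn.Conv2d` documentation. The helper `conv_out` is a hypothetical name and the code does not call PyTorch.

```python
# Output-shape arithmetic for the Conv2d case that fails in test_op.py.
# Formula from the torch.nn.Conv2d documentation; nothing here calls PyTorch.

def conv_out(size: int, kernel: int, stride: int, padding: int, dilation: int = 1) -> int:
    """Output length along one spatial dimension of a convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# Parameters copied from the traceback:
# Conv2d(6, 8, [3, 5], stride=[1, 2], padding=[1, 2], dilation=1, groups=2)
# applied to an input of shape [2, 6, 10, 20].
N, C_in, H, W = 2, 6, 10, 20
C_out, groups = 8, 2
kernel, stride, padding = (3, 5), (1, 2), (1, 2)

out_shape = (
    N,
    C_out,
    conv_out(H, kernel[0], stride[0], padding[0]),
    conv_out(W, kernel[1], stride[1], padding[1]),
)
# With groups=2, each group maps C_in // groups input channels to
# C_out // groups output channels, so the weight is [C_out, C_in // groups, kH, kW].
weight_shape = (C_out, C_in // groups, *kernel)

print("output:", out_shape)
print("weight:", weight_shape)
```

This gives an output of `(2, 8, 10, 10)` and a weight of `(8, 3, 3, 5)`. A CPU-only repro would build tensors of these shapes and call `backward` on the conv output; repeating that inside `torch.backends.mkldnn.flags(enabled=False)` is one (untested) way to check whether the oneDNN path is involved.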

@sukamenev
Author

On AMD OpenCL (AMDAPPSDK-3.0), a different error:

python tests/test_op.py --device privateuseone:2
Mean 1d
Accessing device #2:AMD EPYC 7542 32-Core Processor on AMD Accelerated Parallel Processing
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
tensor([[[-0.2863, -0.1444,  1.4827, -0.2142],
         [ 0.9526, -1.2787,  0.7404, -0.3989],
         [ 0.8163,  0.2142,  0.2852,  0.8597]]], grad_fn=<MeanBackward1>)
tensor([[[1.4019, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]]])
         y 1.688240
        x0 0.000000
Traceback (most recent call last):
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 282, in <module>
    test_all(r.device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 158, in test_all
    test_fwd_bwd([([2,3,4],-1)],lambda x:torch.mean(x,dim=0,keepdim=True),device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 153, in test_fwd_bwd
    raise Exception("Diff too big")
Exception: Diff too big

max_diff = 1.9810690879821777
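"Diff too big" is raised when a device result differs too much from the CPU reference. A minimal sketch of such a check, assuming a maximum-absolute-difference metric, is below; the function names and the `tol=1e-3` tolerance are hypothetical and not taken from `test_op.py`. Fed with the two "Mean 1d" tensors printed above (flattened, rounded to four decimals as printed), it reproduces the reported `y` difference of about 1.688: the device tensor is zero everywhere except its first element, so the largest gap is between -0.2863 and 1.4019.

```python
# Hypothetical sketch of a CPU-vs-device comparison; test_op.py's real
# logic, metric and tolerance may differ.
from typing import Sequence


def max_abs_diff(ref: Sequence[float], dev: Sequence[float]) -> float:
    """Largest element-wise |ref - dev| over two equally sized flat lists."""
    if len(ref) != len(dev):
        raise ValueError("shape mismatch")
    return max((abs(r - d) for r, d in zip(ref, dev)), default=0.0)


def check(name: str, ref: Sequence[float], dev: Sequence[float], tol: float = 1e-3) -> float:
    """Print the difference (width-10 name, like the log) and fail above tol."""
    diff = max_abs_diff(ref, dev)
    print(f"{name:>10} {diff:f}")
    if diff > tol:
        raise Exception("Diff too big")
    return diff


# "Mean 1d" output from the AMDAPPSDK run: CPU reference first, device second.
ref = [-0.2863, -0.1444, 1.4827, -0.2142,
       0.9526, -1.2787, 0.7404, -0.3989,
       0.8163, 0.2142, 0.2852, 0.8597]
dev = [1.4019] + [0.0] * 11
```

Here `max_abs_diff(ref, dev)` is about 1.6882, and `check("y", ref, dev)` prints `         y 1.688200` and then raises `Exception("Diff too big")`.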

@sukamenev
Author

On AMD OpenCL (from amdgpu-pro), there is also an error at the end of the test:

Mean 1d
Accessing device #3:Fiji on AMD Accelerated Parallel Processing
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
         y 0.000000
        x0 0.000000
Mean 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
        x0 0.000000
         y 0.000000
Mean 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
         y 0.000000
        x0 0.000000
Mean 2d squeeze
torch.Size([3])
torch.Size([3])
         y 0.000000
        x0 0.000000
Mean all squeeze
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
Sum 1d
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
         y 0.000000
        x0 0.000000
Sum 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
         y 0.000000
        x0 0.000000
Sum 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
         y 0.000000
        x0 0.000000
Sum 2d squeeze
torch.Size([3])
torch.Size([3])
         y 0.000000
        x0 0.000000
LogSoftmax
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LogSoftmax
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
        x0 0.000000
         y 0.000000
Softmax
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
NLLLoss
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
AAPool2d
torch.Size([4, 8, 1, 1])
torch.Size([4, 8, 1, 1])
         y 0.000000
        x0 0.000000
Abs
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Abs_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardtanh
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardtanh_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Sigmoid
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Sigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardsigmoid
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardsigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
ReLU
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
ReLU_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LReLu
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LReLU_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Tanh
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
Tanh_
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
SiLU
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
SiLU_
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
GELU
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
GELU tanh
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
BCE Loss
torch.Size([])
torch.Size([])
        x0 0.000058
         y 0.000000
        x1 0.000000
BCE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
        x0 0.000008
         y 0.000000
        x1 0.000000
MSE Loss
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
        x1 0.000000
MSE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
         y 0.000000
        x0 0.000000
        x1 0.000000
Min
Ok
Max
Ok
Dot
Ok
Clamp 1
Ok
Clamp 2
Ok
Clamp 3
Ok
Linear 2d
  p_weight 0.000000
    p_bias 0.000000
         y 0.000000
        x0 0.000000
Linear 3d
  p_weight 0.000002
    p_bias 0.000000
         y 0.000000
        x0 0.000000
Conv
Traceback (most recent call last):
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 282, in <module>
    test_all(r.device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 254, in test_all
    test_fwd_bwd_op([([2,6,10,20],-1)],torch.nn.Conv2d(6,8,[3,5],stride=[1,2],padding=[1,2],dilation=1,groups=2),device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 74, in test_fwd_bwd_op
    y_cpu.backward(dy_cpu,retain_graph=True)
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: could not create a primitive descriptor iterator

@artyom-beilis
Owner

Sorry for the late reply; for some reason I missed it.

What PyTorch version and which GPU are you using?

@sukamenev
Author

I'm using PyTorch 1.13.1 and an AMD Fury:

Mean 1d
Accessing device #1:AMD Radeon R9 Fury Series (radeonsi, fiji, LLVM 17.0.6, DRM 3.57, 6.8.9-calculate) on rusticl
.......
Sum 2d squeeze
torch.Size([3])
torch.Size([3])
y 0.000000
x0 0.000000
LogSoftmax
LLVM ERROR: Cannot select: 0x7feb044c5610: f32 = and 0x7feb044c54c0, Constant:i32<2147483647>
0x7feb044c54c0: f32 = bitcast 0x7feb040d5410
0x7feb040d5410: i32,ch = CopyFromReg 0x5562cb846890, Register:i32 %14
0x7feb044a2570: i32 = Register %14
0x7feb044c3120: i32 = Constant<2147483647>
In function: main
Aborted
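The node the compiler backend cannot select is `f32 = and (bitcast i32 ...), Constant<2147483647>`. Since 2147483647 is 0x7FFFFFFF, this is a bitwise AND that clears the IEEE-754 sign bit of a 32-bit float, which is the usual bit-level form of `fabs`. The sketch below only illustrates that bit trick in plain Python; which kernel expression produced the node (presumably something in the LogSoftmax kernel, since the run stops there) is an assumption, and `fabs_via_bits` is a hypothetical helper name.

```python
import struct

SIGN_CLEAR_MASK = 0x7FFFFFFF  # == 2147483647, the constant in the LLVM log


def fabs_via_bits(x: float) -> float:
    """|x| for an IEEE-754 binary32 value, computed by clearing the sign bit."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer ("bitcast").
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # AND with 0x7FFFFFFF drops bit 31 (the sign), then reinterpret back as float32.
    return struct.unpack("<f", struct.pack("<I", bits & SIGN_CLEAR_MASK))[0]


print(fabs_via_bits(-1.5), fabs_via_bits(2.25))
```

A backend that handles this pattern would normally lower it to a native absolute-value or integer AND instruction; "Cannot select" means this LLVM build found no matching pattern for an `and` typed as `f32`.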

@artyom-beilis
Copy link
Owner

First, the CPU is not supported.

> on rusticl

In my experience, rusticl is horribly buggy; it crashes for me on an RX 560. Try the AMD ROCm OpenCL driver or the Mesa driver.
