[regression] 0.3.29 test failure on ARMV6 #5104

Open
cdluminate opened this issue Jan 31, 2025 · 15 comments

Comments

@cdluminate

openblas 0.3.29 fails to build on Debian's armhf machine: https://buildd.debian.org/status/fetch.php?pkg=openblas&arch=armhf&ver=0.3.29%2Bds-1&stamp=1738175608&raw=0

It seems to be a regression in the cblas_stbmv function:

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1      0.157343          0.127373    
       2       0.00000           0.00000    
       3      0.447053          0.447053    
       4      0.277223          0.277223    
       5      0.806693          0.806693    
 ******* cblas_stbmv  FAILED ON CALL NUMBER:
    580: cblas_stbmv (    CblasUpper,  CblasNoTrans,     CblasUnit,
            5,  0, A,  2, X,-1) .

 ******* FATAL ERROR - TESTS ABANDONED *******
ERROR STOP 
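For reference, here is a minimal Python sketch (not OpenBLAS code) of what reference-BLAS STBMV computes: x := A*x or x := A^T*x for an n-by-n triangular band matrix A with k off-diagonals, stored column-major in a band array with leading dimension lda, with x read at stride incx (a negative stride walks x backwards). The helpers `band_to_dense` and `stbmv` are hypothetical names used only for illustration. With k = 0 and a unit diagonal, as in call 580, A is the identity, so the correct result is x unchanged.

```python
def band_to_dense(uplo, n, k, a, lda):
    """Expand column-major triangular band storage (flat list `a`) to a dense n x n matrix."""
    dense = [[0.0] * n for _ in range(n)]
    for j in range(n):
        if uplo == "U":
            # Upper: A[i][j] for max(0, j-k) <= i <= j lives in band row k + i - j.
            for i in range(max(0, j - k), j + 1):
                dense[i][j] = a[(k + i - j) + j * lda]
        else:
            # Lower: A[i][j] for j <= i <= min(n-1, j+k) lives in band row i - j.
            for i in range(j, min(n, j + k + 1)):
                dense[i][j] = a[(i - j) + j * lda]
    return dense


def stbmv(uplo, trans, diag, n, k, a, lda, x, incx):
    """Hypothetical helper: in-place x := op(A) @ x using reference-BLAS conventions (0-based)."""
    if incx == 0:
        raise ValueError("incx must be non-zero")
    dense = band_to_dense(uplo, n, k, a, lda)
    if diag == "U":
        # Unit diagonal: the stored diagonal entries are ignored and treated as 1.
        for i in range(n):
            dense[i][i] = 1.0
    # A negative stride starts at the far end of x, as in reference BLAS.
    start = 0 if incx > 0 else -(n - 1) * incx
    idx = [start + i * incx for i in range(n)]
    xv = [x[p] for p in idx]
    if trans == "N":
        y = [sum(dense[i][j] * xv[j] for j in range(n)) for i in range(n)]
    else:
        y = [sum(dense[j][i] * xv[j] for j in range(n)) for i in range(n)]
    for p, value in zip(idx, y):
        x[p] = value


# Call 580: Upper, NoTrans, Unit, n=5, k=0, lda=2, incx=-1 must leave x unchanged.
a = [float(v) for v in range(10)]
x = [0.1, 0.2, 0.3, 0.4, 0.5]
stbmv("U", "N", "U", 5, 0, a, 2, x, -1)
print(x)  # [0.1, 0.2, 0.3, 0.4, 0.5]
```

Likewise, call 582 (k = 0, non-unit diagonal) only scales each element of x by its diagonal entry, so neither failing call involves any off-diagonal work.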

Debian uses a modified source tree, but I can also reproduce this problem with the upstream git repo:

TARGET=ARMV6 make

The error message reads:

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1      0.245164          0.245164
       2       0.00000           0.00000
       3      0.539071          0.539071
       4       1.22516           1.22516
       5      0.601097E-01      0.374625E-01
 ******* cblas_stbmv  FAILED ON CALL NUMBER:
    582: cblas_stbmv (    CblasUpper,  CblasNoTrans,  CblasNonUnit,
            5,  0, A,  2, X, 1) .
 ******* cblas_stbmv  FAILED ON CALL NUMBER:
      2: cblas_stbmv (    CblasUpper,  CblasNoTrans,     CblasUnit,
            1,  0, A,  2, X, 1) .

 ******* FATAL ERROR - TESTS ABANDONED *******
ERROR STOP 

This is a regression because 0.3.28 passes the test. I checked the diff between 0.3.28 and 0.3.29 but did not find anything obviously related. Any idea?

@cdluminate
Author

cdluminate commented Jan 31, 2025

It seems to be a threading issue:

$ ./xscblat2 < sin2
[...]
 cblas_stbmv  PASSED THE TESTS OF ERROR-EXITS

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1      0.127373          0.646853
       2      0.343563E-02      0.343563E-02
       3      0.317183          0.317183
       4      0.566511          0.566511
       5      0.463458          0.463458
 ******* cblas_stbmv  FAILED ON CALL NUMBER:
    661: cblas_stbmv (    CblasLower,    CblasTrans,     CblasUnit,
            5,  1, A,  3, X,-2) .
 ******* cblas_stbmv  FAILED ON CALL NUMBER:
      2: cblas_stbmv (    CblasUpper,  CblasNoTrans,     CblasUnit,
            1,  0, A,  2, X, 1) .

 ******* FATAL ERROR - TESTS ABANDONED *******
ERROR STOP 

But the test passes in serial mode:

OPENBLAS_NUM_THREADS=1 ./xscblat2 < sin2

Update: it seems to be specific to the pthreads build. The OpenMP build passes:

make TARGET=ARMV6 USE_OPENMP=1 FCOMMON_OPT='-frecursive -fopenmp'
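Because the failure only shows up with more than one thread, and an intermittent race may only trigger occasionally, repeating the tester many times per thread count gives a rough failure rate. A minimal Python sketch; `count_failures` is a hypothetical helper, and treating the "FATAL ERROR" banner in the output as a failure is an assumption based on the logs in this issue.

```python
import os
import subprocess


def count_failures(cmd, env_var, values, repeats, stdin_path=None):
    """Run `cmd` `repeats` times per value of `env_var`; return {value: failure_count}.

    Hypothetical helper: a run counts as failed if it exits non-zero or prints
    the BLAS tester's "FATAL ERROR" banner on stdout or stderr.
    """
    results = {}
    for value in values:
        env = dict(os.environ)
        env[env_var] = str(value)
        failures = 0
        for _ in range(repeats):
            if stdin_path is None:
                proc = subprocess.run(cmd, env=env, stdin=subprocess.DEVNULL,
                                      capture_output=True, text=True)
            else:
                # The tester reads its parameter file (e.g. sin2) from stdin.
                with open(stdin_path) as fh:
                    proc = subprocess.run(cmd, env=env, stdin=fh,
                                          capture_output=True, text=True)
            if proc.returncode != 0 or "FATAL ERROR" in proc.stdout + proc.stderr:
                failures += 1
        results[value] = failures
    return results


# Example (run from the ctest directory of a pthreads build):
# print(count_failures(["./xscblat2"], "OPENBLAS_NUM_THREADS", [1, 2, 4], 50, stdin_path="sin2"))
```

A failure count of zero for 1 thread and a non-zero count for higher thread counts would match the behaviour described above.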

@martin-frbg
Collaborator

Curious that you would get an error in TBMV (only). Did you change compilers between 0.3.28 and 0.3.29, perchance?

@cdluminate
Author

I did not change the compiler. It's gcc-14 14.2.0-16 from Debian unstable.

@martin-frbg
Collaborator

I see. It still seems weird that it is STBMV, of all functions. I won't be able to reproduce this, or eventually locate the commit that caused it, until late next week. (There is nothing immediately obvious that wouldn't also break all of GEMM.)

@martin-frbg
Collaborator

I'm increasingly certain that none of the files/functions relevant for multithreaded TBMV were changed between 0.3.28 and 0.3.29. It could be an older bug, such as a missing memory barrier somewhere, that only has a low probability of triggering. Is the armhf machine actual hardware or something like QEMU?

@cdluminate
Author

cdluminate commented Feb 2, 2025

The armhf machine is a real arm64 machine with a Neoverse-N1 CPU. (Debian's infrastructure consists entirely of real machines, and I also tested on Debian's porterbox, which is a real machine too.)

Recently I have seen some gcc-14 regressions that lead to internal compiler errors when compiling PyTorch (downgrading to gcc-13 solves those). Suspecting that something might be wrong on the toolchain side here too, I tested with gcc-13, but the same issue persists:

# cblas_stbmv is still failing at git HEAD (c139b63342b3e089b6507d45f31f062a7fbe6dcc)
make TARGET=ARMV6 CC=gcc-13 CXX=g++-13 FC=gfortran-13

@martin-frbg
Collaborator

A Docker container running a 32-bit OS on the ARMv8 hardware?
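One quick way to answer that from inside the build environment is to compare the kernel-reported machine type with the pointer width of a native userland process. A minimal Python sketch, assuming the interpreter comes from the same chroot/container as the build toolchain; `userland_bits` and `describe_environment` are hypothetical helper names.

```python
import platform
import struct


def userland_bits():
    """Pointer width of this interpreter in bits: 32 in an armhf userland, 64 in arm64."""
    return struct.calcsize("P") * 8


def describe_environment():
    """Combine the kernel-reported machine type with the userland pointer width."""
    # platform.machine() comes from uname(), so inside a 32-bit chroot on a 64-bit
    # kernel it may still name a 64-bit architecture, while the pointer width
    # reflects the userland the build actually runs in.
    return f"machine={platform.machine()} userland={userland_bits()}-bit"


print(describe_environment())
```

In an armhf schroot on arm64 hardware, one would expect this to report `userland=32-bit`.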

@cdluminate
Author

It's schroot (https://wiki.debian.org/Schroot), a kind of chroot. It is also the backend of Debian's official build machines (through sbuild). I guess running this in Docker would give something similar.

I'm trying to do a bisect. Will update later.

@martin-frbg
Collaborator

I'm on a train with variable network quality, but judging from file dates and commit logs, anything remotely relevant would have happened before 0.3.28.

@cdluminate
Author

My local git bisect result is

# first bad commit: [d9f368dfe6a9e96807d3860b96d9b30471583dc9] TST: Signal abort for ctest failures correctly

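Looking at d9f368d, I agree with you that the problem was introduced earlier. That commit only makes ctest failures signal an abort correctly, so it most likely just exposed a failure that already existed.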
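Judging by its title, d9f368d changed whether ctest failures abort the build, so a bisect that relies only on the exit status of `make` can stop at that commit even when the bug is older. A hedged alternative is to classify each commit by the tester's own failure messages. The sketch below is hypothetical (the helper names and marker strings are assumptions taken from the logs in this issue) and follows the `git bisect run` convention that exit code 0 means good, 125 means skip, and other codes from 1 to 127 mean bad. Because the failure is thread-dependent and possibly intermittent, a single passing run per commit is weak evidence.

```python
import subprocess

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`

# Hypothetical marker strings, copied from the BLAS tester output in this issue.
FAILURE_MARKERS = ("FATAL ERROR", "FAILED ON CALL NUMBER")


def classify(returncode, output):
    """Map a build-and-test result to a `git bisect run` exit code."""
    if any(marker in output for marker in FAILURE_MARKERS):
        return BAD  # the tester reported wrong results, whatever the exit status
    if returncode != 0:
        return SKIP  # failed for another reason (e.g. does not build): skip the commit
    return GOOD


def run_check(cmd):
    """Run the build-and-test command and classify its combined output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return classify(proc.returncode, proc.stdout + proc.stderr)


# Hypothetical use as bisect_check.py, invoked via `git bisect run python3 bisect_check.py`:
#     import sys
#     subprocess.run(["make", "clean"])
#     sys.exit(run_check(["make", "TARGET=ARMV6"]))
```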

@xmirya

xmirya commented Feb 2, 2025

Might be unrelated, but OpenBLAS was recently updated to 0.3.29 in FreeBSD ports as well, and I'm experiencing a similar threading/timing-related test failure when building it locally (via the FreeBSD ports infrastructure), although on amd64:

...
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat2 < ./sblat2.dat
rm -f ?BLAT3.SUMM
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat3 < ./sblat3.dat
Note: The following floating-point exceptions are signalling: IEEE_DIVIDE_BY_ZERO
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat3 < ./dblat3.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat2 < ./dblat2.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat3 < ./cblat3.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat2 < ./cblat2.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat3 < ./zblat3.dat
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat2 < ./zblat2.dat
rm -f ?BLAT3.SUMM
OMP_NUM_THREADS=2 ./sblat3 < ./sblat3.dat
Note: The following floating-point exceptions are signalling: IEEE_DIVIDE_BY_ZERO
OMP_NUM_THREADS=2 ./dblat3 < ./dblat3.dat
OMP_NUM_THREADS=2 ./cblat3 < ./cblat3.dat
rm -f ?BLAT2.SUMM
OMP_NUM_THREADS=2 ./sblat2 < ./sblat2.dat

Program received signal SIGBUS: Access to an undefined portion of a memory object.

Backtrace for this error:
#0  0x824e20339 in ???
#1  0x824e1f465 in ???
#2  0x8220e746f in ???
#3  0x8220e6a3a in ???
#4  0x82157f2d2 in ???
#5  0x82f6f24ea in _Unwind_ForcedUnwind
        at /usr/ports/lang/gcc13/work/gcc-13.3.0/libgcc/unwind.inc:215
#6  0x8220de21b in ???
#7  0x8220de191 in ???
#8  0x8220de03a in ???
#9  0x8220ddb29 in ???
#10  0xffffffffffffffff in ???

./sblat2 < ./sblat2.dat fails with OMP_NUM_THREADS=2 (or any value other than 1), but passes with OMP_NUM_THREADS=1.

The compiler is gcc13. It fails the same way with OpenMP on or off, regardless of the -O level.
Another clue is that the official FreeBSD package (built from the same port/recipe) is already available for this update, which means the same port/recipe was built successfully by the official FreeBSD build cluster, likely on much newer/faster hardware than my local (quite dated) machine.

@martin-frbg
Collaborator

@xmirya Probably unrelated given the difference in architectures, but what is your hardware, please? (The optimized BLAS kernels differ between individual CPU models.) SIGBUS instead of simply producing bad results could mean an access to unaligned data where the instruction used requires data alignment.
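To illustrate the alignment point: SSE aligned moves such as `movaps` require 16-byte aligned memory operands, and their 256-bit AVX counterparts (`vmovaps` on ymm registers) require 32-byte alignment, so a buffer whose start address is not a multiple of that size faults when used with them. A minimal Python sketch of the address check; the helper names are hypothetical, and it only inspects addresses, it does not reproduce the SIGBUS.

```python
import ctypes


def misalignment(address, alignment):
    """Bytes past the previous `alignment`-byte boundary; 0 means aligned."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return address & (alignment - 1)


def alignment_report(buf, alignments=(16, 32, 64)):
    """Misalignment of a ctypes buffer's start address for several boundaries."""
    address = ctypes.addressof(buf)
    return {a: misalignment(address, a) for a in alignments}


# A float array starting one element (4 bytes) past a 16-byte boundary:
print(misalignment(0x1000 + 4, 16))  # 4, so not usable with 16-byte aligned loads
print(alignment_report((ctypes.c_float * 8)()))
```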

@xmirya

xmirya commented Feb 8, 2025

It's an i5-2410M, which has SSE* and AVX but no AVX2 or higher. The build is done with

MAKE_NB_JOBS=-1
NUM_THREADS=64
USE_THREAD=1
NO_AVX2=1
NO_AVX512=1
USE_OPENMP=1
BINARY=64

added to Makefile.rules (disabling OpenMP and/or adding NO_AVX=1 makes no difference)

@martin-frbg
Collaborator

@xmirya That would appear to be a standard Sandy Bridge target. I cannot reproduce the failure on a Ryzen 5, but I can take an actual Sandy Bridge system out of storage sometime next week.

@xmirya

xmirya commented Feb 9, 2025

Thanks, I would be grateful if you could.
