Scheduled runs fail #2520
I have been pointed that the Checking the latest GitHub runner Ubuntu 22.04 OS image: and the one published at the beginning of October: I see no difference in
I have seen however some differences:
But I can't really relate any of those differences to the current issue.
I have been advised to check the library requirements using `ldd`. Output details
[root@451ff7711f64 linx64]# ldd ansys.e
./ansys.e: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by ./ansys.e)
./ansys.e: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./ansys.e)
./ansys.e: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./ansys.e)
libansBLAS.so => not found
libmkl_core.so => not found
libmkl_intel_lp64.so => not found
libmkl_intel_thread.so => not found
libifport.so.5 => not found
libifcoremt.so.5 => not found
libimf.so => not found
libsvml.so => not found
libirc.so => not found
libiomp5.so => not found
libhdf5.so.103 => not found
libhdf5_cpp.so.103 => not found
libhdf5_hl.so.100 => not found
libhdf5_hl_cpp.so.100 => not found
libACE.so.7.0.2 => not found
libACEXML.so.7.0.2 => not found
libACEXML_Parser.so.7.0.2 => not found
libMapdlExceptionClient.so => not found
libTAO.so.3.0.2 => not found
libTAO_AnyTypeCode.so.3.0.2 => not found
libTAO_BiDirGIOP.so.3.0.2 => not found
libTAO_CodecFactory.so.3.0.2 => not found
libTAO_PortableServer.so.3.0.2 => not found
libz.so => not found
libpng.so => not found
libtiff.so => not found
libjpeg.so => not found
libboost_filesystem.so.1.71.0 => not found
libboost_system.so.1.71.0 => not found
libgmp.so.10 => /lib64/libgmp.so.10 (0x00007fffffb54000)
libansGPU.so => not found
libansuser.so => not found
libansys.so => not found
libansScaLAPACK.so => not found
libansHDF.so => not found
libansMemManager.so => not found
libansMPI.so => not found
libansysb.so => not found
libansysx.so => not found
libmnf.so => not found
libansOpenMP.so => not found
libansMETIS.so => not found
libansParMETIS.so => not found
libcadoe_algorithms.so => not found
libCadoeInterpolation.so => not found
libCadoeKernel.so => not found
libCadoeLegacy.so => not found
libCadoeMath.so => not found
libCadoeReaders.so => not found
libCadoeReadersExt.so => not found
libcgns.so => not found
libchap.so => not found
libcif.so => not found
libdsp.so => not found
libansgil.so => not found
libqhull.so => not found
libansexb.so => not found
libApipWrapper.so => not found
liboctree-mesh.so => not found
libansResourcePredict.so => not found
libtg.so => not found
libPrimeMesh.so => not found
libansOpenSSL.so => not found
libvtk.so => not found
libspooles.so => not found
libdmumps.so => not found
libzmumps.so => not found
libGL.so.1 => /lib64/libGL.so.1 (0x00007fffff8bc000)
libGLU.so.1 => /lib64/libGLU.so.1 (0x00007fffff63b000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fffff41f000)
libm.so.6 => /lib64/libm.so.6 (0x00007fffff11d000)
libXp.so.6 => /lib64/libXp.so.6 (0x00007ffffef13000)
libXm.so.4 => /lib64/libXm.so.4 (0x00007ffffea40000)
libXext.so.6 => /lib64/libXext.so.6 (0x00007ffffe82e000)
libXi.so.6 => /lib64/libXi.so.6 (0x00007ffffe61d000)
libXt.so.6 => /lib64/libXt.so.6 (0x00007ffffe3b6000)
libX11.so.6 => /lib64/libX11.so.6 (0x00007ffffe078000)
libSM.so.6 => /lib64/libSM.so.6 (0x00007ffffde6f000)
libICE.so.6 => /lib64/libICE.so.6 (0x00007ffffdc53000)
libXmu.so.6 => /lib64/libXmu.so.6 (0x00007ffffda38000)
librt.so.1 => /lib64/librt.so.1 (0x00007ffffd82e000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ffffd526000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ffffd310000)
libintlc.so.5 => not found
libc.so.6 => /lib64/libc.so.6 (0x00007ffffcf41000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ffffcd3d000)
libGLX.so.0 => /lib64/libGLX.so.0 (0x00007ffffcb0a000)
libGLdispatch.so.0 => /lib64/libGLdispatch.so.0 (0x00007ffffc854000)
/lib64/ld-linux-x86-64.so.2 (0x00007fffffddc000)
libXau.so.6 => /lib64/libXau.so.6 (0x00007ffffc650000)
libXft.so.2 => /lib64/libXft.so.2 (0x00007ffffc439000)
libjpeg.so.62 => /lib64/libjpeg.so.62 (0x00007ffffc1e4000)
libpng15.so.15 => /lib64/libpng15.so.15 (0x00007ffffbfb9000)
libxcb.so.1 => /lib64/libxcb.so.1 (0x00007ffffbd90000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00007ffffbb8b000)
libfontconfig.so.1 => /lib64/libfontconfig.so.1 (0x00007ffffb948000)
libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007ffffb689000)
libXrender.so.1 => /lib64/libXrender.so.1 (0x00007ffffb47e000)
libz.so.1 => /lib64/libz.so.1 (0x00007ffffb267000)
libexpat.so.1 => /lib64/libexpat.so.1 (0x00007ffffb03d000)
libbz2.so.1 => /lib64/libbz2.so.1 (0x00007ffffae2d000)
It seems many libs are not found... Then I did:
[root@451ff7711f64 linx64]# ldd ansys.e | grep libgomp
./ansys.e: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by ./ansys.e)
./ansys.e: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./ansys.e)
./ansys.e: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./ansys.e)
I don't understand how grep shows that... I guess it is just an echo printed by ldd? Anyway, I don't really know yet what's going on... the docker image seems to be using gcc 3.4, ... Maybe I should install that so it can be found?
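To make a long `ldd` listing like the one above easier to read, the unresolved entries can be pulled out programmatically. The following is only a minimal sketch: the helper name `missing_libraries` is hypothetical, and the sample text is a shortened copy of the output above.

```python
def missing_libraries(ldd_stdout):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_stdout.splitlines():
        # Dependency lines look like "<name> => <path or 'not found'>".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Shortened sample taken from the output above.
sample = """\
libansBLAS.so => not found
libgmp.so.10 => /lib64/libgmp.so.10 (0x00007fffffb54000)
libiomp5.so => not found
/lib64/ld-linux-x86-64.so.2 (0x00007fffffddc000)
"""

print(missing_libraries(sample))
```

A "not found" entry only means the loader could not resolve the library with the search path in effect when `ldd` was run, so a long list does not by itself prove that the files are absent from the image.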
I start the container with this: https://github.com/ansys/pymapdl/blob/main/.ci/start_mapdl.sh
There was a new version of the Docker runtime around the end of October: https://docs.docker.com/engine/release-notes/24.0/#2407 But I can't get anything from the changelog.
Maybe this goes to stderr, not stdout?
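The stderr explanation can be checked without MAPDL. In this sketch a small child process stands in for `ldd ansys.e` (the two messages it prints are illustrative, not real `ldd` output): it writes one line to stdout and one to stderr, and only the stdout line ever reaches the filter.

```python
import subprocess
import sys

# Stand-in for `ldd ansys.e`: one line on stdout, one on stderr.
child_code = (
    "import sys\n"
    "print('libgomp.so.1 => not found')\n"
    "print('version GLIBCXX_3.4.20 not found', file=sys.stderr)\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

# A pipe such as `| grep libgomp` only ever receives stdout.
piped = [line for line in result.stdout.splitlines() if "libgomp" in line]

print("seen by the filter:", piped)
print("bypasses the filter:", result.stderr.strip())
```

In a shell, redirecting stderr into the pipe (`ldd ansys.e 2>&1 | grep libgomp`) would make the filter apply to both streams.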
I guess 🤷‍♂️
@germa89
Judging by the symbol versions it is looking for, it seems that ansys.e has been built with GCC 8 or 10, and so can't use libstdc++ from GCC 4.*
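As a rough illustration of how the symbol versions point at a compiler generation, the sketch below extracts the `GLIBCXX_*` / `CXXABI_*` tags from the error lines and looks up the first GCC release whose libstdc++ provides them. The table holds only the three tags seen in this thread, with release numbers as listed in the libstdc++ ABI documentation; the function name is hypothetical. Note that this gives a lower bound only: it shows that a libstdc++ from GCC 5.1 or newer is needed, not which compiler actually built the binary.

```python
import re

# First GCC release whose libstdc++ provides each symbol version
# (only the tags that appear in this thread are listed).
FIRST_GCC_RELEASE = {
    "CXXABI_1.3.8": "4.9.0",
    "GLIBCXX_3.4.20": "4.9.0",
    "GLIBCXX_3.4.21": "5.1.0",
}


def required_symbol_versions(ldd_stderr):
    """Extract the symbol-version tags from ldd's 'version ... not found' lines."""
    return re.findall(r"version `([A-Z]+_[\d.]+)' not found", ldd_stderr)


def version_key(release):
    """Turn '5.1.0' into (5, 1, 0) so releases compare numerically."""
    return tuple(int(part) for part in release.split("."))


errors = """\
./ansys.e: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by ./ansys.e)
./ansys.e: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./ansys.e)
./ansys.e: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./ansys.e)
"""

tags = required_symbol_versions(errors)
minimum_gcc = max((FIRST_GCC_RELEASE[tag] for tag in tags), key=version_key)

print(tags)
print("libstdc++ must come from GCC >=", minimum_gcc)
```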
from internal investigations made by @dts12263
@germa89 : If v21.2.0 does not have
Hi @germa89, I'm a bit surprised that the libansBlas.a file is missing in the container, but if that's the case we should discuss with the MAPDL DevOps team to fix this. I've checked my local dev distribution and it's part of the repo.
Hi @FredAns The failing images were created in October 2021... and they have been working properly until the beginning of this November. No changes on our side. Could it be the GitHub runners?
Could it be loaded as part of a branch that only executes when specific hardware is present?
OK, I had a better look. At runtime we are supposed to pick the right one, depending on the machine we run on.
On my machine these libansBlas.so files are located here: /ansys_inc/v242/ansys/lib/linx64/blas/
On the v221 container (I installed
Are we just missing the AMD variant?
@germa89 can you run a
Done! Details
They are AMD!!!
So I guess that the GitHub runners have moved from Intel to AMD? ... I could not find anything on the internet regarding that change. Probably the missing
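For anyone who wants to log the CPU vendor at the start of a CI job, a minimal sketch follows. It assumes a Linux x86 runner where `/proc/cpuinfo` contains a `vendor_id` field (`AuthenticAMD` or `GenuineIntel`); the helper name `cpu_vendor` is hypothetical.

```python
from pathlib import Path


def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id value found in /proc/cpuinfo text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


# Check against a small sample first.
print(cpu_vendor("processor\t: 0\nvendor_id\t: AuthenticAMD\n"))

# On a Linux machine, read the real file (skipped where it does not exist).
cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print(cpu_vendor(cpuinfo.read_text()))
```

Printing this value in each scheduled run would make it easier to correlate failures with the hardware a job landed on.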
To confirm this, you could try spinning up some cloud instance of each type (Intel / AMD), and try running the MAPDL docker image on both.
I would guess GitHub doesn't generally communicate which hardware Actions runs on, to avoid creating specific assumptions / expectations based on that.
I haven't seen any option in GitHub to choose AMD/Intel ... 🤷🏻‍♂️
@jomadec Not exactly, MAPDL does ship gcc 8, but not in the same location as the executable. The mapdl executable always runs under a wrapper script that sets LD_LIBRARY_PATH to the location of the gcc runtime. This is what the landing zone concept by @jhdub23 is meant to solve.
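The wrapper-script idea can be sketched as follows. This is not the actual MAPDL wrapper: the helper name and the directory `/hypothetical/gcc/lib64` are placeholders, and the real location of the bundled gcc runtime is not given in this thread.

```python
import os


def env_with_library_dirs(library_dirs, base_env=None):
    """Return a copy of the environment with library_dirs prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = list(library_dirs) + ([existing] if existing else [])
    # LD_LIBRARY_PATH entries are separated by ":" on Linux.
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


# Placeholder directory, for illustration only.
env = env_with_library_dirs(
    ["/hypothetical/gcc/lib64"],
    base_env={"LD_LIBRARY_PATH": "/usr/lib64"},
)
print(env["LD_LIBRARY_PATH"])
```

Passing such an environment to `subprocess.run(["ldd", "ansys.e"], env=env)` inside the container would show which of the "not found" entries resolve once the search path matches what the wrapper sets.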
What I meant is to launch an AMD / Intel VM on <cloud provider of choice>, not through GitHub Actions. If the same error occurs when launching the MAPDL container there, we can be fairly confident this is the underlying change that triggered these failures. Of course you can also use a local machine, if you have an AMD one.
@dts12263 has been able to replicate the issue:
confirmed the 212 image runs on an Intel machine but crashed on an AMD machine because of not having the AMD ansblas
Thank you for your input @greschd @FredAns @koubaa @jomadec and @dts12263. We couldn't have worked this out without you!
If anyone is interested, I recently got this information from running codespaces:
Funnily enough, my codespace was using only 1 physical processor.
Problem
Suddenly some docker images are requiring a library (either `libgomp.so` or `libansBLAS.so`) to launch MAPDL. However, the docker images have not been changed in 9 months, and they have been working fine until now.
Details
I first saw this error with the Ubuntu docker images (which are old too, about 9 months). The `libgomp` issue on Ubuntu docker images was reported and fixed here: #2514
The solution was installing the `libgomp` dependency during the job.
But then, @clatapie realised it seems to also affect the older MAPDL docker images (<v23.1). Newer docker images are not affected because that library is installed already (@dts12263 for more info).
This issue has been going on since the beginning of November (between 01 and 06 November), but I didn't realise until now.
Notes
I should note that the Ubuntu docker images are used to run the tests from inside that container.
The older docker images, on the other hand, are mostly based on CentOS. We run the tests in the GitHub runner OS (Ubuntu) and connect to the running container with the Ansys product (CentOS).
Why this error now?
Definitely a container is not a 100% isolated environment from the host OS. They do share some dependencies (kernel?), so maybe the GitHub runners do not have those dependencies anymore. I have tracked that new GitHub runner images were published at the end of October.
If it is a missing dependency on the runners, installing that dependency (it does not need to be `libgomp`, it might have another name) should fix it.
However, I believe `libansBLAS` is a custom Ansys library, so we cannot just install it.
It does not make sense at all!