
[WIP] Handle meson based python-wheel #6454

Open
wants to merge 48 commits into
base: master

Conversation

th0ma7
Contributor

@th0ma7 th0ma7 commented Feb 16, 2025

Description

Handle meson based python-wheel

Fixes:

Checklist

  • Build rule all-supported completed successfully
  • New installation of package completed successfully
  • Package upgrade completed successfully (Manually install the package again)
  • Package functionality was tested
  • Any needed documentation is updated/created

Type of change

  • Bug fix
  • New Package
  • Package update
  • Includes small framework changes
  • This change requires a documentation update (e.g. Wiki)

@th0ma7 th0ma7 self-assigned this Feb 16, 2025
@th0ma7
Contributor Author

th0ma7 commented Feb 16, 2025

Note that this is not ready for testing yet... still early WIP; I simply ported my pending changes from my local branch into a new PR.

@th0ma7 th0ma7 mentioned this pull request Feb 16, 2025
@hgy59
Contributor

hgy59 commented Feb 17, 2025

additional info:

  1. Supported DSM

    building for comcerto2k-7.1 shows error:
    meson.build:28:4: ERROR: Problem encountered: NumPy requires GCC >= 8.4

    DSM < 7 will not be supported, and we will have to add something like this to the numpy Makefile:

    # numpy requires gcc >= 8.4
    REQUIRED_MIN_DSM = 7.0
    UNSUPPORTED_ARCHS = comcerto2k
    
  2. cython not found
    when adding this to cross/numpy/Makefile:
    ENV += PATH=$(WORK_DIR)/crossenv-default/build/bin:$(PATH)
    it can find cython and successfully builds the wheel for x64

    it still fails for evansport, aarch64 and armv7:
    IMHO it should not use native/python312 but the python from the build crossenv
    and additionally it must use the header files of the cross-compiled python
    currently evansport and armv7 have errors like
    /spksrc/native/python312/work-native/install/usr/local/include/python3.12/pyport.h:586:2: error: #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)."
    (pyport.h does not match the 32-bit target platform)
    and aarch64 fails with:
    ../numpy/_core/src/umath/loops_autovec.dispatch.c.src:107:43: internal compiler error: Segmentation fault

@th0ma7
Contributor Author

th0ma7 commented Feb 18, 2025

additional info:

  1. Supported DSM
    building for comcerto2k-7.1 shows error:
    meson.build:28:4: ERROR: Problem encountered: NumPy requires GCC >= 8.4
    DSM < 7 will not be supported and we have to add to numpy Makefile something like:
    # numpy requires gcc >= 8.4
    REQUIRED_MIN_DSM = 7.0
    UNSUPPORTED_ARCHS = comcerto2k
    

Yup, on my radar, will add that indeed. Although I wonder if numpy 1.26.x might still work with older DSM, considering the new meson handling I'm trying to build up.

  1. cython not found
    when adding this to cross/numpy/Makefile:
    ENV += PATH=$(WORK_DIR)/crossenv-default/build/bin:$(PATH)
    it can find cython and successfully builds the wheel for x64
    it still fails for evansport, aarch64 and armv7:
    IMHO it should not use native/python312 but python in the build crossenv

That should be the case from spksrc.python-module-meson.mk... Although up to now I was still working on the per-dependency fully-generated meson cross-file (which now looks to be working). So I haven't revisited that part just yet.

and additionally it must use python header files of cross python

The per-dependency fully-generated meson cross-file should now fix that. I had encountered that same issue with cmake long ago, where the $(WORK_DIR)/tc-vars.cmake would be used in conjunction with per-dependency C|CPP|CXX|LDFLAGS. The issue was that:

  1. The generic $(WORK_DIR)/tc-vars.cmake cannot handle per-dependency *FLAGS, as there may be additional ones defined using ADDITIONAL_*FLAGS variables in any of the dependencies in cross/<meh>. And the cmake cross-file does not handle ${var} expansion of that kind.
  2. So to fix that, when cmake is called it fetches the content of the generic $(WORK_DIR)/tc-vars.cmake, creates a $(WORK_DIR)/$(PKG_DIR)/<arch>.cmake file and adds all FLAGS information to it, so it is fully complete and no longer dependent on env flags.
  3. To that effect, with a fully defined per-dependency cmake cross-file, the build process is called with an empty environment, similarly to a native build. This ensures that the host is set to the targeted architecture while the build remains unchanged (i.e. the docker arch we're building on, which gets auto-detected).

Long story short, meson had never received such an enhancement as things were working just fine. But that's no longer true with python wheels, where the python virtual environment totally confuses meson. Thus the need to have a fully functional meson cross-file defining all library and include paths properly.

Have a look at the current tc-vars.cmake|meson and you'll notice no flags being set. When invoking a cmake build you'll have a fully compliant cross-file within the corresponding $(PKG_DIR).

This is now mostly working with meson, still have a few things to go through but getting there.

currently evansport and armv7 have errors like
/spksrc/native/python312/work-native/install/usr/local/include/python3.12/pyport.h:586:2: error: #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)."
(pyport.h does not match the 32-bit target platform)
and aarch64 fails with:
../numpy/_core/src/umath/loops_autovec.dispatch.c.src:107:43: internal compiler error: Segmentation fault

That's a currently known issue with numpy. We need to force setting the long double format according to big or little endian in the meson cross-file, such as:

[properties]
longdouble_format = 'IEEE_DOUBLE_BE'

EDIT: with regards to cython (which I just hit), there must be an issue with the PATH, although there may be a way to set those binaries in the meson native file, such as:

[binaries]
python = '/usr/bin/python3'
cython = '/usr/bin/cython'
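
For context, here is a minimal sketch of how those two generated machine files could be laid out, combining the snippets above (section names are standard meson machine-file syntax; all paths and the armv7 values are illustrative assumptions, not the actual generated output):

# illustrative cross-file sketch, e.g. armv7-toolchain.meson (paths are examples only)
[binaries]
c = '/path/to/toolchain/bin/arm-unknown-linux-gnueabi-gcc'
cpp = '/path/to/toolchain/bin/arm-unknown-linux-gnueabi-g++'

[host_machine]
system = 'linux'
cpu_family = 'arm'
cpu = 'armv7'
endian = 'little'

[properties]
# numpy-specific cross property: pre-seeds the long double format instead of probing the build host
longdouble_format = 'IEEE_DOUBLE_LE'

# illustrative native-file sketch pointing python and cython at the crossenv build tools
[binaries]
python = '/path/to/crossenv-default/build/bin/python'
cython = '/path/to/crossenv-default/build/bin/cython'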

Anyhow, as usual thnx for your feedback; work is slowly progressing on this.

EDIT2: It turns out that I still have to empty the env when invoking the meson build, which is not the case yet. Although it does work for regular meson builds, it won't for python-based meson wheel builds. Next on my todo.

@th0ma7
Contributor Author

th0ma7 commented Feb 19, 2025

@hgy59 I now have a proof of concept that builds successfully for both aarch64 and x64 using latest numpy 2.2.3.

Although I'm still struggling with armv7 and evansport... I'll check if I can make 1.26.4 work instead for the moment.

@mreid-tt
Contributor

@th0ma7, was looking at the errors which remain:

error: #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)."

And found the following, which may be useful if you've not already considered them:

Hope they can assist...

@th0ma7
Contributor Author

th0ma7 commented Feb 25, 2025

I believe I now have something functional, but unmaintainable as-is.

The good

Using normal python -m pip wheel ... and passing proper meson-python flags:

--config-settings=setup-args="--cross-file=$(MESON_CROSS_TOOLCHAIN_PKG)" \
--config-settings=setup-args="--native-file=$(MESON_NATIVE_FILE)" \
--config-settings=install-args="--tags=runtime,python-runtime" \
--config-settings=build-dir="$(MESON_BUILD_DIR)" \
--no-build-isolation

I can now successfully cross-compile for armv7, evansport and x64 for DSM-7.1.

The bad

There is a known bug in gcc<=10 with aarch64 that makes the compiler segfault. I tried pretty much every possible alternative of flags and disabling things in the code, but I wasn't able to work around it.
Bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696
Potential workaround that I was not able to make use of: https://github.com/google/highway/pull/1932/files
@hgy59 maybe you can help on this; it would be really nice to have a working patch.

The ugly

As part of my crusade to make the meson-python build work, I ended up at one point reproducing the normal meson+ninja build. Surprisingly, this allowed me to successfully build numpy for aarch64... What's missing then is the (re)packaging in wheel format, which happens to be the exact same process as "The good" as long as I reuse the exact same builddir (which I ended up figuring out tonight).

This really is ugly, but it does work. The last commit a2068f5 was not tested on the previously working x64, evansport and armv7. I'll let this rest for tonight. Good news is, we're probably much closer now... just need to tie up the remaining loose ends.

@hgy59
Contributor

hgy59 commented Feb 26, 2025

@th0ma7 the wheels created from python/*/Makefile are not yet added to wheelhouse/requirements.txt.
And I guess they should be added to wheelhouse/requirements-cross.txt too.

The wheelhouse/requirements.txt is used by the install_python_wheels function in spksrc.service.installer.functions

@th0ma7
Contributor Author

th0ma7 commented Feb 26, 2025

Thnx for catching this, will include it.

I'm also looking at how to install numpy in the crossenv... I've got an idea on how I could reuse the newly cross-compiled numpy wheel so it gets installed into the cross portion of the crossenv, so it can then be made available for other wheels that depend on it.

Lastly, I'm also looking at adding flexibility to support a vendor-managed meson (e.g. the numpy use case, where the source package provides its own modified meson.py), and at skipping that meson+ninja part when no vendor-managed meson is provided (i.e. the default use case).

All in all, taking shape but will require a few more spare cycles before reaching the finishing line...

@hgy59
Contributor

hgy59 commented Feb 26, 2025

@th0ma7 another small issue popped up:
The wheel for charset-normalizer is generated as charset_normalizer-3.4.1-py3-none-any.whl, and the WHEEL file shows it is pure python:

Wheel-Version: 1.0
Generator: setuptools (75.8.0)
Root-Is-Purelib: true
Tag: py3-none-any

The original wheels in the index (pypi) are cross-compiled (like charset_normalizer-3.4.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl).
Maybe our crossenv build needs a flag or something to create a cross-compiled wheel for charset-normalizer.

@hgy59
Contributor

hgy59 commented Feb 26, 2025

@th0ma7 I have successfully built python311-wheels with added python/numpy and python/numpy_1.26 for aarch64-7.1 and armv7-7.1.

It would be interesting to validate whether such wheels created with gcc 8.5 will run under DSM 6. I guess if the *.so files within the wheels do not reference GLIBC > 2.20 functions, it might work.

My background: I am trying to build a final homeassistant package with support for DSM 6. This will be homeassistant 2024.3.3 that depends on numpy 1.26.0. This version is available in the index for x86_64 and aarch64 only, and I will have to build it at least for armv7 and evansport (i686).
To support armv7, I will additionally have to cross-build ha-av==10.1.1 (this is not so easy since it depends on ffmpeg libraries).

To support armv7 in homeassistant 2025.1.4, it will be av==13.1.0, which depends on ffmpeg libraries too.

@th0ma7
Contributor Author

th0ma7 commented Feb 26, 2025

@th0ma7 another small issue popped up:
The wheel for charset-normalizer is generated as charset_normalizer-3.4.1-py3-none-any.whl and also the WHEEL file shows, it is pure python:

Wheel-Version: 1.0
Generator: setuptools (75.8.0)
Root-Is-Purelib: true
Tag: py3-none-any

The original wheels in the index (pypi) are cross compiled (like charset_normalizer-3.4.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl).
maybe our crossenv build needs a flag or something to create a cross compiled wheel for charset-normalizer.

Maybe this is similar to msgpack where it can fit in both?

@th0ma7
Contributor Author

th0ma7 commented Feb 26, 2025

@th0ma7 I have successfully built python311-wheels with added python/numpy and python/numpy_1.26 for aarch64-7.1 and armv7-7.1.

It would be interesting to validate whether such wheels created with gcc 8.5 will run under DSM 6. I guess if the *.so files within the wheels do not reference GLIBC > 2.20 functions, it might work.

My background: I am trying to build a final homeassistant package with support for DSM 6. This will be homeassistant 2024.3.3 that depends on numpy 1.26.0. This version is available in the index for x86_64 and aarch64 only, and I will have to build it at least for armv7 and evansport (i686).
To support armv7, I will have to cross build additionally ha-av==10.1.1 (this is not so easy since it depends on ffmpeg libraries).

To support armv7 in homeassistant 2025.1.4, it will be av==13.1.0 that depends on ffmpeg libraries too

That's a long shot! Not sure how I can help you though. I could reinstall my armv7 using a 6.2.4 image to try it out, if that helps?

@th0ma7
Contributor Author

th0ma7 commented Feb 27, 2025

I got some pretty cool code locally that allows installing the cross-compiled numpy wheel into the crossenv to allow building scipy and others... But I faced one major, major, major problem: the gcc version.

For numpy on aarch64, gcc>=10 is strictly required, otherwise the compiler segfaults. I was able to work around that up until needing to include OpenBLAS, which re-triggered this gcc segfault bug for numpy. And without OpenBLAS I cannot build scipy, thus I cannot build scikit-learn, thus the domino effect.

@hgy59 All in all, this would require bumping our minimal version to DSM-7.2.

EDIT: I'll sleep on it... and probably upload my new code online to safeguard it just in case even though it will fail to build.

@th0ma7
Contributor Author

th0ma7 commented Mar 1, 2025

Good news, I was able to create a workaround patch for aarch64 ... a few loose ends but looking much better now.

@th0ma7
Contributor Author

th0ma7 commented Mar 6, 2025

@hgy59 and @mreid-tt It may look like it's stagnating, but after spending numerous hours on this I finally made a major leap forward which now allows using the default pip wheel methodology to build meson-type wheels. Hopefully this will now allow me to include lapack and open the way to scipy, and things should start rolling at that point.

This has definitely been taking way longer than anticipated, but I believe things will now start to take shape nicely 🤞

@th0ma7
Contributor Author

th0ma7 commented Mar 11, 2025

Disconnecting for tonight... but I have a strong feeling I'm inches away from a solution... And I think the remaining issue may be how I translated the LDFLAGS to meson (auto-generated under $(WORK_DIR)/$(PKG_DIR)/$(ARCH)-toolchain.meson; example for arm below, such as work-armv7-7.1/numpy-2.2.3/armv7-toolchain.meson):

c_link_args = [
        '-L/home/spksrc/wheel-meson2/spksrc/toolchain/syno-armv7-7.1/work/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sysroot/lib',
        '-D__ARM_PCS_VFP=1',
        '-L/home/spksrc/wheel-meson2/spksrc/spk/python312-wheels/work-armv7-7.1/install/var/packages/python312-wheels/target/lib',
        '-Wl,--rpath-link,/home/spksrc/wheel-meson2/spksrc/spk/python312-wheels/work-armv7-7.1/install/var/packages/python312-wheels/target/lib',
        '-Wl,--rpath,/var/packages/python312-wheels/target/lib',
        '-L/home/spksrc/wheel-meson2/spksrc/spk/python312/work-armv7-7.1/install/var/packages/python312/target/lib',
        '-Wl,--rpath-link,/home/spksrc/wheel-meson2/spksrc/spk/python312/work-armv7-7.1/install/var/packages/python312/target/lib',
        '-Wl,--rpath,/var/packages/python312/target/lib',
        ]

@hgy59 feel free to pursue, a fresh brain on this would be appreciated :)

@hgy59
Copy link
Contributor

hgy59 commented Mar 11, 2025

@th0ma7 I use a diyspk/numpy-wheel package and include the numpy_test.py shown above.

Added a service-setup.sh with

service_postinst ()
{
    install_python_virtualenv
    
    pip install --disable-pip-version-check --no-deps --no-input --no-index ${SYNOPKG_PKGDEST}/share/wheelhouse/*.whl
}

running on virtualdsm:

./bin/python3 ../numpy_test.py
Traceback (most recent call last):
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/_core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/_core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/_core/overrides.py", line 7, in <module>
    from numpy._core._multiarray_umath import (
ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/__init__.py", line 114, in <module>
    from numpy.__config__ import show_config
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/__config__.py", line 4, in <module>
    from numpy._core._multiarray_umath import (
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/_core/__init__.py", line 49, in <module>
    raise ImportError(msg)
ImportError:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.12 from "/volume1/@appstore/numpy-wheel/env/bin/python3"
  * The NumPy version is: "2.2.0"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: libopenblas.so.0: cannot open shared object file: No such file or directory


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/volume1/@appstore/numpy-wheel/env/../numpy_test.py", line 2, in <module>
    import numpy as np
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/__init__.py", line 119, in <module>
    raise ImportError(msg) from e
ImportError: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there.

I can't find any binary that depends on libopenblas within the wheel, so I guess it is dynamically loaded and might have a specific search order.

running with explicit library path LD_LIBRARY_PATH=/var/packages/numpy-wheel/target/lib ./bin/python3 ../numpy_test.py:

# LD_LIBRARY_PATH=/var/packages/numpy-wheel/target/lib  ./bin/python3 ../numpy_test.py
Traceback (most recent call last):
  File "/volume1/@appstore/numpy-wheel/env/../numpy_test.py", line 2, in <module>
    import numpy as np
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/__init__.py", line 114, in <module>
    from numpy.__config__ import show_config
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/__config__.py", line 4, in <module>
    from numpy._core._multiarray_umath import (
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/_core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/_core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/volume1/@appstore/numpy-wheel/env/lib/python3.12/site-packages/numpy/_core/overrides.py", line 7, in <module>
    from numpy._core._multiarray_umath import (
RuntimeError: NumPy was built with baseline optimizations:
(SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX) but your machine doesn't support:
(AVX).

The AVX optimization seems not to be supported in virtualdsm.

The definition of LD_LIBRARY_PATH is not a problem for the HA package (it already has it), but a self-contained solution is preferable.

To fix this, the rpath must be fixed/adjusted in the .so files of the numpy wheel.
There are two libraries in the installed numpy package that depend on libopenblas:

  • numpy/_core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so
  • numpy/linalg/lapack_lite.cpython-312-x86_64-linux-gnu.so

and both have Library rpath: [/spksrc/diyspk/numpy_ha/work-x64-7.1/install/var/packages/numpy-wheel/target/lib]

@hgy59
Contributor

hgy59 commented Mar 11, 2025

@th0ma7 the above test works on DS-115j (armada370 - armv7) with DSM 7.1 when using LD_LIBRARY_PATH.

@th0ma7
Contributor Author

th0ma7 commented Mar 11, 2025

@hgy59 would you mind commenting out the cpu-dispatch and cpu-baseline definitions in numpy that I added yesterday and retrying on your x86_64? And removing avx and re-testing? I'll have to read further on this to get a proper understanding of how to use that (away from my build system atm).

Also, I'm almost certain that the rpath is not functional atm, not only for meson-python wheels but probably overall for all meson builds. And that issue is new with this PR. Once that is fixed the LD_LIBRARY_PATH should not be needed.

@hgy59
Contributor

hgy59 commented Mar 11, 2025

@th0ma7 my test even works on DS-218 (aarch64) with DSM 6.2.4 when patching os_min_version in the INFO file of the package from DSM 7.1 to 6.2.

BTW libquadmath.so is not contained in the armv7 and aarch64 packages (seems to be x64 only).

@hgy59
Contributor

hgy59 commented Mar 11, 2025

@hgy59 would you mind commenting out the cpu-dispatch and cpu-baseline definitions in numpy that I added yesterday and retrying on your x86_64? And removing avx and re-testing? I'll have to read further on this to get a proper understanding of how to use that (away from my build system atm).

working on it locally...

Update:
when I first removed the non-apollolake-compatible cpu flags (sse3 and avx) in cpu-baseline, I got core dumps.
This was fixed with the correct TARGET for openblas and the removal of cpu-baseline and cpu-dispatch.
Finally I reactivated cpu-baseline (and cpu-dispatch) with apollolake-compatible flags and it still worked.

- most x64 archs are atom like
- adjust cpu-baseline in python/numpy* (make it apollolake compatible)
@th0ma7
Contributor Author

th0ma7 commented Mar 11, 2025

Update: when I first removed the non apollolake compatible cpu flags (sse3 and avx) in cpu-baseline, I got core dumps. This was fixed with the correct TARGET for openblas and removal of cpu-baseline and cpu-dispatch. Finally I reactivated cpu-baseline (and cpu-dispatch) with apollolake compatible flags and it still worked.

This is really nice! Good work! What would be even better is the ability to completely avoid that altogether... but I doubt we can. Now let's find the issue with rpath...

@hgy59
Contributor

hgy59 commented Mar 12, 2025

@th0ma7 my findings are that we have to set up meson with --prefix to get a correct rpath, but I can't find where (or how) meson setup is called for our python-meson builds.

meson setup build --prefix=$(INSTALL_PREFIX)

It is also documented for numpy at https://numpy.org/doc/stable/building/understanding_meson.html

@th0ma7
Contributor Author

th0ma7 commented Mar 12, 2025

Aaahhh! Thnx for the pointer, I believe I know how to fix this later tonight!

@hgy59
Contributor

hgy59 commented Mar 12, 2025

Aaahhh! Thnx for the pointer, I believe I know how to fix this later tonight!

I did some more tests but didn't succeed (I added install_rpath and build_rpath to [properties], added CONFIGURE_ARGS += --prefix=$(INSTALL_PREFIX), and set prefix = $(INSTALL_PREFIX) under [built-in options]).
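
For reference, this is roughly what those attempts look like in meson machine-file syntax (an illustrative sketch with example paths; as noted above, none of it fixed the rpath):

[built-in options]
# prefix as it would appear once $(INSTALL_PREFIX) is expanded (example path)
prefix = '/var/packages/python312-wheels/target'

[properties]
# rpath-related keys as tried above (example path)
install_rpath = '/var/packages/python312-wheels/target/lib'
build_rpath = '/var/packages/python312-wheels/target/lib'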

The .so files in the builddir are binary-identical to the ones in the whl file.
My suspicion is that the meson install step (which would adjust the rpath) does not happen.

EDIT:
"The meson Manual" says:
Meson automatically sets up rpath entries so that executables can be run
directly from the build directories. It will automatically remove these entries
from the files it installs.

@th0ma7
Contributor Author

th0ma7 commented Mar 13, 2025

I may have something that works now. libtree -vvv -p now includes our rpath at depth 1, as below:

│   └── libgfortran.so.5 not found
│       ┊ Paths considered in this order:
│       ┊ 1. rpath:
│       ┊    depth 2
│       ┊    /var/packages/python312-wheels/target/lib
│       ┊    /var/packages/python312/target/lib
│       ┊    depth 1
│       ┊    /var/packages/python312-wheels/target/lib
│       ┊    /var/packages/python312/target/lib
│       ┊    /home/spksrc/wheel-meson2/spksrc/spk/python312-wheels/work-x64-7.1/install/var/packages/python312-wheels/target/lib
│       ┊ 2. LD_LIBRARY_PATH was not set
│       ┊ 3. runpath was not set
│       ┊ 4. ld config files:
│       ┊    /usr/lib/x86_64-linux-gnu/libfakeroot
│       ┊    /usr/local/lib
│       ┊    /usr/local/lib/x86_64-linux-gnu
│       ┊    /lib/x86_64-linux-gnu
│       ┊    /usr/lib/x86_64-linux-gnu
│       ┊    /lib32
│       ┊    /usr/lib32
│       ┊    /libx32
│       ┊    /usr/libx32
│       ┊ 5. Standard paths:
│       ┊    /lib
│       ┊    /lib64
│       ┊    /usr/lib
│       ┊    /usr/lib64

Still, I don't understand the rpath depth 2, nor the $(STAGING_INSTALL_PREFIX) equivalent that is part of rpath depth 1, which is totally useless.

As expected, the $(ENV) environment has a strong influence over the build process. When using cmake or meson with a "full-featured" cross-file, the environment needs to be fully cleaned of any autoconf-type env variables. I had already done that with cmake as it already had a full-featured cross-file, but not with meson before this PR, as it was still lagging behind on per cross/<depend> cross-file generation and was therefore dependent on *FLAGS from $(ENV).

Now, I did try going all-in with a systematic env -i instead of unsetting specific variables (using env -u), but it fails on pkg-config in some cases. If I could find why, that might remove the unneeded rpath in our libraries. On the other hand, it would then be missing other non-autoconf variables such as CARGO / RUSTC and others.

For fun, have a look at libtree -vvv -p ./numpy/core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so before and then after changing line 81 of spksrc.python-wheel-meson.mk to:

cd $(MESON_BASE_DIR) && env $(ENV) \

I recall struggling on this when adding cmake & meson long ago... Now I recall why a bit more.

@hgy59
Contributor

hgy59 commented Mar 13, 2025

BTW the build log of openblas is flooded with warnings when building only numpy in diyspk:

f951: Warning: Nonexistent include directory ‘/spksrc/diyspk/numpy_ha/work-x64-7.1/install/var/packages/numpy-wheel/target/include’ [-Wmissing-include-dirs]

This comes from -I $(STAGING_INSTALL_PREFIX)/include somewhere.

It occurs while Building Fortran preprocessed CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/* when no other dependency has been built before.

A pragmatic solution would be to create the include folder in pre_compile, since modules with dependencies require it.

.PHONY: openblas_pre_compile
openblas_pre_compile:
	install -d -m 755 $(STAGING_INSTALL_PREFIX)/include

A better solution would be to create this folder in the Makefile that defines the include path.
For openblas it refers to cross-cmake, but other cross-envs could also be affected.

@hgy59
Contributor

hgy59 commented Mar 13, 2025

I may have something that works now. The libtree -vvv -p now include our rpath as depth 1 as below:

It now works without LD_LIBRARY_PATH 🎉.
Alas, it still has the full library install path in it, which does not exist on the target system...

ash-4.4# readelf -d  lib/python3.12/site-packages/numpy/_core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so

Dynamic section at offset 0x3cf978 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libopenblas.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
 0x000000000000000f (RPATH)              Library rpath: [/var/packages/numpy-wheel/target/lib:/var/packages/python312/target/lib:/spksrc/diyspk/numpy_ha/work-x64-7.1/install/var/packages/numpy-wheel/target/lib]

@th0ma7
Contributor Author

th0ma7 commented Mar 13, 2025

Yes, finally! I'm starting to wonder if this isn't an issue specific to numpy's vendored meson... While functional, it still needs to be fixed somewhat though.

@th0ma7
Contributor Author

th0ma7 commented Mar 17, 2025

@hgy59 and @mreid-tt I haven't counted how many times I've built numpy (and it's way too many), but I'm now almost certain this is a bug with meson. I've documented my findings in mesonbuild/meson#14354 and believe this relates to a long-standing bug at mesonbuild/meson#6541

I do have code to work around that, enforcing LDFLAGS to be set in our environment on top of the cross-file (to be pushed soon). This ensures a proper rpath at depth 1, but still, the build rpath shows up in that depth 1 and our host rpaths are duplicated in depth 2, and there is nothing I can do about it except unpack the wheel and edit every .so file to delete them (which I don't intend to do).

Slowly pursuing... 🐢 🐌

@mreid-tt
Contributor

@th0ma7, I really appreciate all your effort in resolving this. From the long-standing bug you mentioned, it appears that many attempts have been made across various Meson versions to address this issue. I also noticed references from the university HPC community about different patches to Meson that alter RPATH handling, especially in environments with wrappers or cross compilation. Though some of this is a bit over my head, I’m curious how you’d describe our current situation.

In your new ticket, you mention using rpath and rpath-link. However, in an older ticket, one of the authors stated:

If you use just a single -rpath or -rpath-link option, it's completely discarded.

That made me wonder about our specific scenario with spksrc. Sorry if you’ve already explained this — I'm just trying to understand the workaround you’re implementing and how it compares to what was discussed in the older ticket.

@th0ma7
Contributor Author

th0ma7 commented Mar 18, 2025

@mreid-tt My assumption is rather that the rpath is not being discarded but actually hidden under the rpath depth. When using readelf -d it only shows rpath depth 1, whereas when using libtree -vvv -p it actually shows depths 1+2. Note that I didn't know that rpath depth even existed before now, but when using the cross-compiled binary later on our destination host, it seems to only refer to rpath depth 1.

From what I gathered from our use case, when setting PKG_CONFIG_LIBDIR (mandatory in our cross-compile case to reset the default location of the pkgconfig directory), it somehow ends up enforcing a "build-environment" rpath at depth 1 (a sort of rpath-link), leaving our rpath settings from the install_rpath variable in our generated meson cross-file at rpath depth 2.

A bit more background is needed here: when using a cross-file with meson or cmake, you need to empty your shell environment of any other duplicate flags (i.e. mostly *FLAGS variables) that are already managed by your cross-file, to avoid any conflicts. The workaround I've found is that keeping LDFLAGS with our rpath definitions makes meson also include those rpaths at depth 1 (duplicated as well at depth 2). My assumption is that install_rpath is handled at depth 2 while the rpath coming from LDFLAGS is handled at depth 1. No clue as to why.
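
To make that assumption concrete, here is an illustrative machine-file view of the two places an rpath can originate in our setup (paths are examples, and the depth mapping is the assumption stated above, not confirmed meson behaviour):

[built-in options]
# rpath entries carried over from LDFLAGS end up in the link args (observed at depth 1)
c_link_args = ['-Wl,--rpath,/var/packages/python312-wheels/target/lib', '-Wl,--rpath,/var/packages/python312/target/lib']

[properties]
# rpath set via the cross-file install_rpath property (observed at depth 2)
install_rpath = '/var/packages/python312-wheels/target/lib'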

I still have a few more tests to go through to confirm our final meson build environment state, but this is starting to stabilize with my latest commit e325eb1.

Lastly, I still have to double-check its behavior outside of meson-python to ensure this isn't affecting our regular meson builds.

Hope this helps?

@mreid-tt
Contributor

@th0ma7, thanks for sharing. To clarify my understanding about "rpath depth," Google AI describes it as:

"In software compilation and linking, rpath depth refers to the number of directory levels specified in an RPATH (runtime path) when linking an executable. RPATH directs the linker to locations of shared libraries at runtime, and the depth represents how many subdirectory levels are searched."

Given this definition, what is the practical implication of specifying more directory levels than necessary (e.g., depth of 1+2) for our builds? Would this result in larger binaries, or merely a slight increase in build time? I'm trying to fully understand any potential impact of your proposed workaround.

@mreid-tt
Contributor

mreid-tt commented Mar 21, 2025

Hey @th0ma7, just checking — anything left to do on this PR before we merge?

@th0ma7
Contributor Author

th0ma7 commented Mar 21, 2025

A bit of code cleanup and adding fortran to cmake for DSM 7 and above. This has been taking way more cycles than expected, and those cycles are limited.

@th0ma7
Contributor Author

th0ma7 commented Mar 22, 2025

I'll do my best to get this merged over the coming days, hopefully within a week.

Note that I did try to get pandas and scikit-learn to build but I wasn't able to. Other changes may have to follow for those in particular.
