Skip to content

gzdopen, gzerror, gzread, ... not present in "libz_rs.so"'s cdylib? #370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
brainstorm opened this issue May 27, 2025 · 2 comments
Open

Comments

@brainstorm
Copy link

brainstorm commented May 27, 2025

Writing this with checked out repo on latest HEAD from today (commit 1cd807107f75b0b30fc7bd3ac34569bf75f407b6)

Thanks for offering a dynamic C lib way to substitute old zlib implementations! I'm trying to migrate a bioinformatics program away from it (bwa-mem2), but unfortunately, I can't see some of the symbols I need on the resulting shared lib, like gzdopen?

$ uname -a
Linux home 6.8.0-1019-oracle #20-Ubuntu SMP Fri Jan 17 21:19:50 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

$ cargo clean && cargo build --release --no-default-features --features "c-allocator" && nm target/release/libz_rs.so | grep -i gzdopen
     Removed 0 files
   Compiling zlib-rs v0.5.0 (/home/ubuntu/tmp/zlib-rs/zlib-rs)
   Compiling libz-rs-sys v0.5.0 (/home/ubuntu/tmp/zlib-rs/libz-rs-sys)
   Compiling libz-rs-sys-cdylib v0.5.0 (/home/ubuntu/tmp/zlib-rs/libz-rs-sys-cdylib)
    Finished `release` profile [optimized] target(s) in 3.28s
$
$ cargo clean && cargo build --release && nm target/release/libz_rs.so | grep -i gzdopen
     Removed 25 files, 3.1MiB total
   Compiling zlib-rs v0.5.0 (/home/ubuntu/tmp/zlib-rs/zlib-rs)
   Compiling libz-rs-sys v0.5.0 (/home/ubuntu/tmp/zlib-rs/libz-rs-sys)
   Compiling libz-rs-sys-cdylib v0.5.0 (/home/ubuntu/tmp/zlib-rs/libz-rs-sys-cdylib)
    Finished `release` profile [optimized] target(s) in 3.25s
$
$ cargo clean && cargo build --release && nm target/release/libz_rs.so | grep -i gz
     Removed 25 files, 3.1MiB total
   Compiling zlib-rs v0.5.0 (/home/ubuntu/tmp/zlib-rs/zlib-rs)
   Compiling libz-rs-sys v0.5.0 (/home/ubuntu/tmp/zlib-rs/libz-rs-sys)
   Compiling libz-rs-sys-cdylib v0.5.0 (/home/ubuntu/tmp/zlib-rs/libz-rs-sys-cdylib)
    Finished `release` profile [optimized] target(s) in 3.27s
00000000000082d4 t _ZN7zlib_rs5c_api9gz_header5flags17h99a88144c91719a2E
$

For good measure, I first followed the cdylib README.md and tried with a 1-line modification of the minimal zpipe.c example first:

$ cargo clean && cargo build --release && nm target/release/libz_rs.so | grep -i gzdopen
     Removed 25 files, 3.1MiB total
   Compiling zlib-rs v0.5.0 (/home/ubuntu/tmp/zlib-rs/zlib-rs)
   Compiling libz-rs-sys v0.5.0 (/home/ubuntu/tmp/zlib-rs/libz-rs-sys)
   Compiling libz-rs-sys-cdylib v0.5.0 (/home/ubuntu/tmp/zlib-rs/libz-rs-sys-cdylib)
    Finished `release` profile [optimized] target(s) in 3.25s

$ cc -o zpipe zpipe.c target/release/libz_rs.so -I .
/usr/bin/ld: /tmp/cckGDWTg.o: in function `main':
zpipe.c:(.text+0x60c): undefined reference to `gzdopen'
collect2: error: ld returned 1 exit status

$ git diff
diff --git a/libz-rs-sys-cdylib/zpipe.c b/libz-rs-sys-cdylib/zpipe.c
index bda0ce6..ea5c07b 100644
--- a/libz-rs-sys-cdylib/zpipe.c
+++ b/libz-rs-sys-cdylib/zpipe.c
@@ -201,6 +201,7 @@ int main(int argc, char **argv)
     SET_BINARY_MODE(stdin);
     SET_BINARY_MODE(stdout);

+    gzdopen(-2, "w");
     /* do compression if no arguments */
     if (argc == 1) {
         ret = def(stdin, stdout, Z_DEFAULT_COMPRESSION);

And here's the output from the bwa-mem2 patched Makefile:

$ git diff
diff --git a/Makefile b/Makefile
index 7e9fb7d..958b86e 100644
--- a/Makefile
+++ b/Makefile
@@ -55,11 +55,11 @@ CPPFLAGS+=  -DENABLE_PREFETCH -DV17=1 -DMATE_SORT=0 $(MEM_FLAGS)

 ifeq ($(uname_arch),arm64)     # OS/X has memset_s etc already
 INCLUDES=   -Isrc $(SSE2NEON_FLAGS) $(SSE2NEON_INCLUDES) -D__STDC_WANT_LIB_EXT1__
-LIBS=          -lpthread -lm -lz -L. -lbwa $(STATIC_GCC)
+LIBS=          -lpthread -lm -L. -lbwa $(STATIC_GCC)
 SAFE_STR_LIB=
 else
 INCLUDES=   -Isrc -Iext/safestringlib/include
-LIBS=          -lpthread -lm -lz -L. -lbwa -Lext/safestringlib -lsafestring $(STATIC_GCC)
+LIBS=          -lpthread -lm -L../zlib-rs/libz-rs-sys-cdylib/target/release/ -L. -lbwa -Lext/safestringlib -lsafestring $(STATIC_GCC)
 SAFE_STR_LIB=    ext/safestringlib/libsafestring.a
 ifeq ($(uname_arch),aarch64)
 INCLUDES+= $(SSE2NEON_FLAGS) $(SSE2NEON_INCLUDES)
diff --git a/ext/safestringlib b/ext/safestringlib
index 245c4b8..e399109 160000
--- a/ext/safestringlib
+++ b/ext/safestringlib
@@ -1 +1 @@
-Subproject commit 245c4b8cff1d2e7338b7f3a82828fc8e72b29549
+Subproject commit e399109428240ffc52aedca1cb93098426fac18d-dirty

... and its corresponding error, so you can see that other symbols aren't public/visible either:

$ make -j4
g++ -std=c++14 -g -O3 -fpermissive   src/main.o libbwa.a -lpthread -lm -L../zlib-rs/libz-rs-sys-cdylib/target/release/ -L. -lbwa -Lext/safestringlib -lsafestring  -o bwa-mem2
/usr/bin/ld: libbwa.a(fastmap.o): in function `main_mem(int, char**)':
/home/ubuntu/tmp/bwa-mem2-2.2.1/src/fastmap.cpp:905:(.text+0x1740): undefined reference to `gzdopen'
/usr/bin/ld: /home/ubuntu/tmp/bwa-mem2-2.2.1/src/fastmap.cpp:933:(.text+0x2520): undefined reference to `gzdopen'
/usr/bin/ld: libbwa.a(utils.o): in function `err_xzopen_core':
/home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:84:(.text+0x1b80): undefined reference to `gzopen'
/usr/bin/ld: /home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:79:(.text+0x1bc4): undefined reference to `gzdopen'
/usr/bin/ld: libbwa.a(utils.o): in function `err_gzread.part.0':
/home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:149:(.text+0x1d58): undefined reference to `gzerror'
/usr/bin/ld: libbwa.a(utils.o): in function `ks_getuntil2(__kstream_t*, int, __kstring_t*, int*, int) [clone .constprop.0]':
/home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:144:(.text+0x1e2c): undefined reference to `gzread'
/usr/bin/ld: /home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:149:(.text+0x2018): undefined reference to `gzerror'
/usr/bin/ld: libbwa.a(utils.o): in function `kseq_read(kseq_t*)':
/home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:144:(.text+0x2158): undefined reference to `gzread'
/usr/bin/ld: /home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:144:(.text+0x21e4): undefined reference to `gzread'
/usr/bin/ld: /home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:144:(.text+0x23c0): undefined reference to `gzread'
/usr/bin/ld: /home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:144:(.text+0x256c): undefined reference to `gzread'
/usr/bin/ld: /home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:149:(.text+0x26e4): undefined reference to `gzerror'
/usr/bin/ld: libbwa.a(utils.o): in function `err_gzread':
/home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:144:(.text+0x2874): undefined reference to `gzread'
/usr/bin/ld: libbwa.a(utils.o): in function `err_gzclose':
/home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:280:(.text+0x2d9c): undefined reference to `gzclose'
/usr/bin/ld: /home/ubuntu/tmp/bwa-mem2-2.2.1/src/utils.cpp:283:(.text+0x2db4): undefined reference to `zError'
collect2: error: ld returned 1 exit status
make: *** [Makefile:153: bwa-mem2] Error 1

I'm either doing something wrong (PEBKAC) or there's some sort of regression in #335, #49, ...?

I can see some outstanding symbol visibility issues in #34, perhaps it's a known issue ATM?

The same issue is also present in #365

@folkertdev
Copy link
Collaborator

I think #365 is the version to use, but you'll need to add --features="gz" to actually compile the gzip functionality.

I ran

cargo build --release --no-default-features --features "c-allocator,gz"

along with

diff --git a/Makefile b/Makefile
index 359585f..71d6026 100644
--- a/Makefile
+++ b/Makefile
@@ -43,7 +43,7 @@ ARCH_FLAGS=   -msse -msse2 -msse3 -mssse3 -msse4.1
 MEM_FLAGS=     -DSAIS=1
 CPPFLAGS+=     -DENABLE_PREFETCH -DV17=1 -DMATE_SORT=0 $(MEM_FLAGS) 
 INCLUDES=   -Isrc -Iext/safestringlib/include
-LIBS=          -lpthread -lm -lz -L. -lbwa -Lext/safestringlib -lsafestring $(STATIC_GCC)
+LIBS=          -lpthread -lm /home/folkertdev/rust/zlib-rs/libz-rs-sys-cdylib/target/release/libz_rs.so -L. -lbwa -Lext/safestringlib -lsafestring $(STATIC_GCC)
 OBJS=          src/fastmap.o src/bwtindex.o src/utils.o src/memcpy_bwamem.o src/kthread.o \
                        src/kstring.o src/ksw.o src/bntseq.o src/bwamem.o src/profiling.o src/bandedSWA.o \
                        src/FMI_search.o src/read_index_ele.o src/bwamem_pair.o src/kswv.o src/bwa.o \

and that does actually compile. Including the library with -L doesn't work though, I have to give the explicit path to the .so file.

So, as you can see, the gz functions are still in development, though close to done. Is there some command with a small input that we could run with bwa-mem2 that you think would exercise this functionality? That would be helpful for our testing/CI.

@brainstorm
Copy link
Author

brainstorm commented Jun 1, 2025

Hello @folkertdev, thanks for the suggestions and diff, that PR indeed fixed it!

Is there some command with a small input that we could run with bwa-mem2 that you think would exercise this functionality? (...) That would be helpful for our testing/CI.

Well, I'm preparing this repo/GHA for this (benchmarking) purpose, we have been working on bwa-mem2 optimizations in our community recently, see:

bioconda/bioconda-recipes#56407
bioconda/bioconda-recipes#56242

Right now take it very much with a pinch of salt, because I'm not sure if the art_illumina simulated reads AND E.coli (quite a different organism than humans) are representative smaller tests that exercise the bwa-mem2 runtime dynamics properly, WIP... also need to fix the LD_* patching (one for tomorrow morning in my timezone).

I'll keep you posted, but meanwhile, feel free to borrow ideas from my GHA (or run it on a bigger (hosted?) CI runner with at least 60GB of RAM to index a human genome bwa-mem2 index step).

/cc @dslarm @mmalenic @ohofmann @scwatts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants