Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly support permissions and symlinks in Zip files (500USD Bounty) #356

Open
lihaoyi opened this issue Feb 14, 2025 · 25 comments
Open
Labels

Comments

@lihaoyi
Copy link
Member

lihaoyi commented Feb 14, 2025


From the maintainer Li Haoyi: I'm putting a 500USD bounty on this issue, payable by bank transfer on a merged PR fixing this.

See https://github.com/orgs/com-lihaoyi/discussions/6 for other bounties and the terms and conditions that bounties operate under


@kiendang hit some issues using os.zip in Mill (com-lihaoyi/mill#4503), some of which are permissions related. It seems that since 2019 the JDK's zip file APIs should be able to support permissions (https://bugs.java.com/bugdatabase/view_bug?bug_id=8213031), so we should make sure OS-Lib takes advantage to preserve permissions when zipping and unzipping files and folders

@kiendang
Copy link
Contributor

kiendang commented Feb 20, 2025

Turned out the problem was all permission related. It worked after executable perm was given to files inside bin and libexec.

Anw can see 2 problems with os.unzip: It does not perserve file permissions and symlinks. After being unzipped from cosmocc.zip (https://cosmo.zip/pub/cosmocc/) with either os.unzip or os.unzip.stream, the files are stripped of permissions (files in bin no longer have executable permission) and all the original symlinks are replaced with copies of the destination files.

@lihaoyi lihaoyi changed the title Properly support permissions in Zip files Properly support permissions and symlinks in Zip files Feb 20, 2025
@lihaoyi lihaoyi changed the title Properly support permissions and symlinks in Zip files Properly support permissions and symlinks in Zip files (500USD Bounty) Feb 20, 2025
@lihaoyi lihaoyi added the bounty label Feb 20, 2025
@sideeffffect
Copy link

https://unix.stackexchange.com/a/509337

Info-Zip 3.0 SUPPORTS preserving files/dirs UNIX permissions and UID/GID ownership data. zip stores it by default but you need to use unzip in an special way to restore them:

  • unzip must be used with the -X flag.
  • unzip must run as root to set the files/dirs UID/GID. If you run it as a normal user then the UID will be always the one of the current user and the GID will be restored ONLY IF the current user belongs to that group.

Note: you can also use unzip with the -K flag to also restore SUID/SGID/Sticky bits.

@kiendang
Copy link
Contributor

kiendang commented Feb 22, 2025

Extra fields have been used to store information about POSIX permissions and symlink though. jdk.zipfs does support POSIX permissions from JDK 14.

os-lib uses java.util.zip.ZipEntry which does not expose this information though. It's stored in the private ZipEntry.extraAttributes field. There's talk on making it accessible but I don't think it's gonna be resolved any time soon judging on the lack of activity.

Apache Commons and Apache Ant's own implementation of ZipEntry does support permissions (but obviously we're not adding external deps here).

So for a quick fix I think can use jdk.zipfs to support permissions for os.zip and os.unzip (only for Path, not .stream).

@kiendang
Copy link
Contributor

@sideeffffect java.util.zip is based on the Info-ZIP format anyway. Just that the field that contains the information is unexposed...

@lihaoyi
Copy link
Member Author

lihaoyi commented Feb 25, 2025

For extraAttributes, can we get at them via reflection? Would that allow us to support permissions on JDK <14?

Otherwise only supporting permissions for os.zip and os.unzip for os.Paths is an alternative. Would that require JDK >= 14? That might be fine if raise the minimum JDK requirement to 17, as has been proposed for Scala itself, and I think a PR that does that would be enough to satisfy this ticket

Adding a third party dependency is an option, but it would be great if we could find a solution just using the native JDK

@kiendang
Copy link
Contributor

kiendang commented Feb 26, 2025

Otherwise only supporting permissions for os.zip and os.unzip for os.Paths is an alternative. Would that require JDK >= 14?

Yup JDK >= 14 is required. The thing is this is not without hiccups either. Using jdk.zipfs, a default set of permissions is defined. Zip entries without permission info would be given this set of default permissions, while the more appropriate thing to do would be to follow umask? The API doesn't provide a way to know which entry actually have or doesn't have permission info either. We would have to first figure out umask (by creating a blank file?) then set it as the default permissions?

Third party dependency is also not perfect. Apache Commons Compress doesn't support symlink (edit: it does).

I think extraAttributes can be obtained with reflection, and if it's straightforward from there this would be the best solution. Available since JDK 11. Just that the attribute has been renamed a couple times so need to dance around that.

@kiendang
Copy link
Contributor

Ah looks like using reflection to access private field for java.lang. classes is not permitted anymore since JDK 17. Would throw InaccessibleObjectException.

@kiendang
Copy link
Contributor

Looks like using Apache Commons Compress is the cleanest solution for now? Need to experiment though since their ZipArchiveEntry only exposes certain things to ZipFile, not stream.

@lihaoyi
Copy link
Member Author

lihaoyi commented Mar 3, 2025

I think using a third party library is fine. We can shade it so it is totally encapsulated and collide with other versions on the classpath

@lefou
Copy link
Member

lefou commented Mar 3, 2025

I don't think we should shade just because we can. Is there even a known risk, that anything can collide with other version of commons-compress? We could build against the oldest version we can/want support, so that binary compatibility isn't an issue, and add the newest known version to our runtime deps.

@lefou
Copy link
Member

lefou commented Mar 3, 2025

commons-compress is most likely not available for the scala-native version, I guess.

@kiendang
Copy link
Contributor

kiendang commented Mar 3, 2025

I've played a bit with Apache Commons Compress and found that it's not possible to make unzip.stream preserve symlinks and permissions since zip files store this information in the central directory at the end of the file, not together with each zip entry. zip, unzip and zip.stream can do this though. Would still need Apache Commons Compress for zip.stream to support this.

On the Apache Commons Compress dependency thing may I suggest another approach? It's not a good idea to include it as a dependency. It is quite heavy since it also includes support for several other file formats, plus the Scala native issue as @lefou mentioned (edit: this was wrong). I was thinking of just including the few source files that we need as unmanaged sources. However, Apache Commons Compress zip even depends on Commons IO.

jdeps --multi-release 61 -cp commons-compress-1.27.1.jar org/apache/commons/compress/archivers/zip/ZipArchiveOutputStream.class

ZipArchiveOutputStream.class -> commons-compress-1.27.1.jar
ZipArchiveOutputStream.class -> java.base
ZipArchiveOutputStream.class -> not found
   org.apache.commons.compress.archivers.zip          -> java.io                                            java.base
   org.apache.commons.compress.archivers.zip          -> java.lang                                          java.base
   org.apache.commons.compress.archivers.zip          -> java.nio                                           java.base
   org.apache.commons.compress.archivers.zip          -> java.nio.channels                                  java.base
   org.apache.commons.compress.archivers.zip          -> java.nio.charset                                   java.base
   org.apache.commons.compress.archivers.zip          -> java.nio.file                                      java.base
   org.apache.commons.compress.archivers.zip          -> java.util                                          java.base
   org.apache.commons.compress.archivers.zip          -> java.util.zip                                      java.base
   org.apache.commons.compress.archivers.zip          -> org.apache.commons.compress.archivers              commons-compress-1.27.1.jar
   org.apache.commons.compress.archivers.zip          -> org.apache.commons.compress.utils                  commons-compress-1.27.1.jar
   org.apache.commons.compress.archivers.zip          -> org.apache.commons.io                              not found

Apache Commons Compress zip was actually lifted from Apache Ant org.apache.tools.zip. org.apache.tools.zip provides what we need and is a self-sufficient package, only depending on java standard libs

jdeps -cp ant-1.10.15.jar org/apache/tools/zip/ZipOutputStream.class

ZipOutputStream.class -> java.base
   org.apache.tools.zip                               -> java.io                                            java.base
   org.apache.tools.zip                               -> java.lang                                          java.base
   org.apache.tools.zip                               -> java.nio                                           java.base
   org.apache.tools.zip                               -> java.nio.file                                      java.base
   org.apache.tools.zip                               -> java.util                                          java.base
   org.apache.tools.zip                               -> java.util.zip                                      java.base

We could just include src/main/org/apache/tools/zip as unmanaged sources (and attribute properly with respect to ant's license of course). We're not exposing any of this to the APIs anyway.

@sake92
Copy link
Collaborator

sake92 commented Mar 6, 2025

os-lib uses java.util.zip.ZipEntry which does not expose this information though. It's stored in the private ZipEntry.extraAttributes field. There's talk on making it accessible but I don't think it's gonna be resolved any time soon judging on the lack of activity.

This was my thought too, but I figured we can open the ZIP after it is created, and add the permissions.
Like the docs say, with java's zip filesystem https://docs.oracle.com/en/java/javase/17/docs/api/jdk.zipfs/module-summary.html

Same with unzip, we extract the files, and then dig into the zip via zip filesystem to fetch the perms.

See my POC #371

@kiendang
Copy link
Contributor

kiendang commented Mar 6, 2025

@sake92 jdk.zipfs does not support symlinks, and only supports permissions since JDK 14.

Apache Commons Compress/Apache Ant zip supports both symlinks and permissions for os.zip (for new zip, not modification), os.zip.stream os.unzip, for earlier JDKs as well, but as @lefou said, this would not work for native (I was mistakenly thinking vendoring them would solve this). Does jdk.zipfs works with native?

I put up #372 and added more details there. I think we can use a combination of these 2 approaches since we do neeed jdk.zipfs to support os.zip modifying existing zip files.

@sake92
Copy link
Collaborator

sake92 commented Mar 6, 2025

Didn't try it but seems SN doesn't support zipfs at all, yet.
See scala-native/scala-native#3453
Maybe we should abstract this permissions set/restore as a common interface, and implement it for real for JVM, but noop for SN.

@lihaoyi
Copy link
Member Author

lihaoyi commented Mar 7, 2025

We do want to support both permissions and symlinks, so it seems using a third-party lib is inevitable since the built in JVM ZipFs does not support symlinks regardless of version https://bugs.openjdk.org/browse/JDK-8268856.

Not supporting features on scala-native is fine. We never supported 100% of OS-Lib on scala-native, and anyway the vast majority of usage is on the JVM, so we can leave the scala-native version of this library to catch up at its own rate.

Pulling in org.apache:commons-compress is certainly an option. There are basically four options here:

  1. Just use it directly. It's a 1mb jar, so although it's larger than OS-Lib (600kb) it's not that large.

  2. Make it a provided dependency so that folks who don't use it don't pay the cost, and use ClassLoader.loadClass to check for existence and provide a nice error message if not provided with instructions to add it.

  3. Have a submodule os.commons.zip that uses commons-compress, while os.zip uses the builtin JDK implementation. So This trades the unusual provided dependencies with an unusual API duplication.

  4. Move os.zip entirely to a separate artifact, move the other APIs to an os-core artifact, but keep the JVM packages unchanged, with the os-lib package depending on both of them. This preserves the binary compatibility of the os-lib package at the cost of some bloat, but provides an os-core artifact for those who want to pick things alacarte and avoid the bloat, and keeps the user-facing API exactly the same as it is now

All options add a bunch of complexity to what was previously a thin dependency-free library, but I think it's worth doing in order to fully flesh out the os.zip and os.unzip APIs.

I'd say option (4) looks the best right now, as it offers the most "idiomatic" setup, which should minimize confusion and pave the way for other dependency-heavy submodules in future(e.g. tar support?)

@kiendang
Copy link
Contributor

kiendang commented Mar 7, 2025

@lihaoyi what about vendoring org.apache:commons-compress/apache ant zip and make them private since there's no need to exposed this to the API at all? This way we can minimize the footprint since we can just include what we need.

@lihaoyi
Copy link
Member Author

lihaoyi commented Mar 7, 2025

@kiendang sure that would be option (1) above. I do think it's better to start separating out the OS-Lib artifacts though (option 4), I don't think this will be the last third-party dependency we want to add

@kiendang
Copy link
Contributor

kiendang commented Mar 7, 2025

I see. Then I'd like to argue a bit more for option (1). We don't need to include the whole 1mb jar, just the part that we need. We could also use the source code from the source jar instead. This allows us to modify the code before compiling, like changing the package org.apache... to package os... so if users do use apache commons compress in their project there's no conflict; and make the classes os private. We could still separate os-lib artifacts (4), but not as urgent now, could even delay it until a third-party dependency is truly needed.

@lihaoyi
Copy link
Member Author

lihaoyi commented Mar 7, 2025

I'd be ok with (1) if the vendored/shaded artifact was a bit smaller. As is, adding 1mb to 600kb triples the size of the library, which seems a bit much to add support for some uncommonly-used functionality. I guess the question is: how small can we make make the third party dependency? I'd say if we can get it below 200mb I'd be fine with adding it

@kiendang
Copy link
Contributor

kiendang commented Mar 7, 2025

I had a branch that goes with the vendoring route. It gets the ant source jar, extracts only the source code for zip and adds it to sources. The 3.3.1 jar for os-lib (using the jar task) comes out to 692K.

@lihaoyi
Copy link
Member Author

lihaoyi commented Mar 7, 2025

That size seems reasonable. Is there a reason you chose to vendor a subfolder from Ant rather than Commons-Compress? I would assume that Commons-Compress would be more intended for external use and distribution, and would expect it to be better maintained than the Ant internal libraries

@kiendang
Copy link
Contributor

kiendang commented Mar 7, 2025

Commons-Compress actually got the zip code from Ant, so they are the same (with some added minor, but not essential QoL). The Commons-Compress zip code also depends on Commons-IO, which makes vendoring a bit more complicated (but not impossible) compared to Ant, which is standalone. And yes, if we go the dependency route like (4), that means depending on Commons-IO as well.

The bzip2, tar and zip support came from Avalon's Excalibur, but originally from Ant, as far as life in Apache goes

@kiendang
Copy link
Contributor

kiendang commented Mar 7, 2025

I get what you mean though. Vendoring Commons-Compress would be the better choice, if not for the Commons-IO dependency.

@lihaoyi
Copy link
Member Author

lihaoyi commented Mar 7, 2025

Got it. In that case I think using the zip code from Ant sounds reasonable; we can always fork it and maintain it ourselves if necessary, or replace the implementation later if a better option comes around

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants