jar: support unconventional jar names #1467

Open
wants to merge 1 commit into
base: main

RTann
Contributor

@RTann RTann commented Jan 14, 2025

Some JAR files just have bad names 🤷. Claircore should still continue to search for inner JARs in case the found JAR embeds valid JARs. Before this, we just stopped looking through any top-level JAR file with an unconventional name.

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:

return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.
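To make the naming question concrete, here is a minimal sketch (not claircore's actual scanner) of descending into nested archives while recording the full colon-separated path. `findJars` and `makeZip` are hypothetical helpers invented for this illustration; only the standard library is used, and the fixture is built in memory to mirror the example path above.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"path"
	"strings"
)

// findJars walks the zip archive held in data and returns the full,
// colon-separated path of every nested ".jar" entry, descending into each
// one. The outer archive is searched no matter what it is called.
// Hypothetical helper for illustration; not claircore's implementation.
func findJars(name string, data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, fmt.Errorf("%s: %w", name, err)
	}
	var found []string
	for _, f := range zr.File {
		if !strings.EqualFold(path.Ext(f.Name), ".jar") {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		inner, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		full := name + ":" + f.Name
		found = append(found, full)
		// Recurse; an entry that is not a readable zip is treated as a leaf.
		if nested, err := findJars(full, inner); err == nil {
			found = append(found, nested...)
		}
	}
	return found, nil
}

// makeZip builds an in-memory zip from name -> content pairs (fixture only).
func makeZip(files map[string][]byte) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for n, c := range files {
		w, err := zw.Create(n)
		if err != nil {
			panic(err)
		}
		if _, err := w.Write(c); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	leaf := makeZip(map[string][]byte{"META-INF/MANIFEST.MF": []byte("Manifest-Version: 1.0\n")})
	mid := makeZip(map[string][]byte{"META-INF/inner-jar/log4j-2.14.0.jar": leaf})
	outer := makeZip(map[string][]byte{"BOOT-INF/lib/log4j-api-2.14.jar": mid})

	paths, err := findJars("testdata/inner/inner.jar", outer)
	if err != nil {
		panic(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

The sketch reads each inner archive fully into memory for simplicity; a real scanner would presumably bound entry sizes and recursion depth.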

Also, I realized the packagescanner does not consider a JAR file a valid JAR unless it has a META-INF directory. According to the JAR spec from the last few LTS releases (11, 17, and 21) as well as the latest non-LTS release (23):

A JAR file is essentially a zip file that contains an optional META-INF directory.

So, the META-INF directory is not required, so we may want to consider dropping that constraint. Thoughts?
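For discussion, a rough sketch of what the constraint amounts to and what dropping it would change. This is not the packagescanner's real code: `isJar`, `hasMetaInf`, `zipWith`, and the `requireMetaInf` flag are hypothetical names, and the check is assumed to be "any entry under `META-INF/`".

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"strings"
)

// hasMetaInf reports whether any entry of the archive lives under META-INF/.
func hasMetaInf(zr *zip.Reader) bool {
	for _, f := range zr.File {
		if strings.HasPrefix(f.Name, "META-INF/") {
			return true
		}
	}
	return false
}

// isJar reports whether data is a readable zip archive and, when
// requireMetaInf is set, whether it also contains a META-INF directory.
// Hypothetical helper; the flag models the constraint under discussion.
func isJar(data []byte, requireMetaInf bool) bool {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return false
	}
	return !requireMetaInf || hasMetaInf(zr)
}

// zipWith builds an in-memory zip containing empty entries with the given names.
func zipWith(names ...string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, n := range names {
		if _, err := zw.Create(n); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	plain := zipWith("com/example/App.class")
	fmt.Println("strict: ", isJar(plain, true))
	fmt.Println("lenient:", isJar(plain, false))
}
```

Under the lenient setting every readable zip counts as a JAR, so any change along these lines would trade coverage for extra processing.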

@RTann RTann force-pushed the jar-unidentified branch 5 times, most recently from 2245436 to f0d251e Compare January 16, 2025 00:01
@RTann RTann marked this pull request as ready for review January 16, 2025 00:11
@RTann RTann requested a review from a team as a code owner January 16, 2025 00:11
@RTann RTann requested review from crozzy and hdonnay and removed request for a team January 16, 2025 00:11
@RTann RTann force-pushed the jar-unidentified branch 2 times, most recently from 552f32f to 177a66d Compare February 12, 2025 16:48
@RTann RTann force-pushed the jar-unidentified branch 2 times, most recently from f8e7fb8 to 7ff485d Compare February 25, 2025 00:24
@RTann RTann force-pushed the jar-unidentified branch from 7ff485d to 35127cd Compare March 10, 2025 16:41
daynewlee
daynewlee previously approved these changes Mar 10, 2025
Contributor

@daynewlee daynewlee left a comment

The changes look ok to me. Maybe get @crozzy to have another look.

Contributor

@crozzy crozzy left a comment

The logic essentially looks good to me, just a few comments on the tests

@RTann
Contributor Author

RTann commented Mar 18, 2025

@crozzy when you re-review can you also let me know your thoughts about the following:

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:

return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.

Also, I realized the packagescanner does not consider a JAR file a valid JAR unless it has a META-INF directory. According to the JAR spec from the last few LTS releases (11, 17, and 21) as well as the latest non-LTS release (23):

A JAR file is essentially a zip file that contains an optional META-INF directory.

So, the META-INF directory is not required, so we may want to consider dropping that constraint. Thoughts?

@RTann RTann force-pushed the jar-unidentified branch from 116d33f to 0dc8ee8 Compare March 18, 2025 17:54
@crozzy
Contributor

crozzy commented Mar 25, 2025

@crozzy when you re-review can you also let me know your thoughts about the following:

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:
return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.
Also, I realized the packagescanner does not consider a JAR file a valid JAR unless it has a META-INF directory. According to the JAR spec from the last few LTS releases (11, 17, and 21) as well as the latest non-LTS release (23):
A JAR file is essentially a zip file that contains an optional META-INF directory.
So, the META-INF directory is not required, so we may want to consider dropping that constraint. Thoughts?

For point 1, yeah, I agree it'd be nicer to know where the data came from, and I think that's the intention of PackageDB (at least for the Java indexer). Whether it's another PR or a commit in this PR is up to you.

For point 2, I'm trying to judge the balance between covering every corner case on one side and doing excess processing or inflating storage with sub-par, unmatchable data on the other. I think it's probably worth documenting what the spec says, but I don't think it warrants changing the current flow (in any case, updating that logic should be another PR).

crozzy
crozzy previously approved these changes Mar 25, 2025
Contributor

@crozzy crozzy left a comment

LGTM in its current state. If you adjust the path in the PackageDB, dismiss the review and I'll re-review.

@RTann RTann force-pushed the jar-unidentified branch from 0dc8ee8 to 7d5528c Compare March 25, 2025 18:29
@RTann RTann requested a review from crozzy March 25, 2025 18:29
@RTann
Contributor Author

RTann commented Mar 25, 2025

@crozzy I noticed I kept a for loop in the test which set the SHAs to nil even though I added the cmpopts.IgnoreFields(Info{}, "SHA"), so I removed the loop. Can I get another approval?

@RTann
Contributor Author

RTann commented Mar 25, 2025

@crozzy I also remembered what I was asking about for

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:

return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.

so looking at jar_test.go we can see the names of the packages. These packages are buried pretty deep inside the top-level JAR file (see the README). So when a user sees these packages reported for this JAR, they don't really know how to fix them. Where are these coming from? It may be worth updating the package DB to make it clear these are from inner JARs. I can do that in a follow-up.

@RTann
Contributor Author

RTann commented Mar 25, 2025

PR to adjust PackageDB: #1503

@crozzy crozzy force-pushed the jar-unidentified branch from 7d5528c to b43690b Compare April 1, 2025 15:03
@crozzy crozzy force-pushed the jar-unidentified branch from b43690b to 42b4a46 Compare April 1, 2025 15:04
Contributor

@crozzy crozzy left a comment

LGTM

3 participants