jar: support unconventional jar names #1467

RTann · 2025-01-14T02:52:11Z

Some JAR files just have bad names 🤷. Claircore should still search them for inner JARs in case they embed valid JARs. Before this change, we stopped looking through any top-level JAR file with an unconventional name.
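To illustrate the behavior described here, the following is a minimal sketch that uses only the Go standard library; it is not Claircore's implementation. The names `findJARs`, `makeZip`, and `entry` are hypothetical, and reading each entry fully into memory is a simplification. The point it shows is that the outer archive's file name is never inspected before descending into it.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"path"
	"strings"
)

// entry is a name/content pair used to build demo archives (hypothetical helper).
type entry struct {
	name string
	data []byte
}

// makeZip builds an in-memory zip archive from the given entries.
func makeZip(entries ...entry) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, e := range entries {
		w, err := zw.Create(e.name)
		if err != nil {
			panic(err)
		}
		if _, err := w.Write(e.data); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// findJARs returns the name of every ".jar" entry reachable from the archive
// in data, descending into each inner JAR it can open. The outer archive's own
// file name is never consulted, so an oddly named top-level JAR is still searched.
func findJARs(data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var found []string
	for _, f := range zr.File {
		if !strings.EqualFold(path.Ext(f.Name), ".jar") {
			continue
		}
		found = append(found, f.Name)
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		inner, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		// An entry that is not a readable zip is recorded but not descended into.
		if nested, err := findJARs(inner); err == nil {
			found = append(found, nested...)
		}
	}
	return found, nil
}

func main() {
	leaf := makeZip(entry{"META-INF/MANIFEST.MF", []byte("Manifest-Version: 1.0\n")})
	mid := makeZip(entry{"META-INF/inner-jar/log4j-2.14.0.jar", leaf})
	// Imagine this outer archive is stored on disk under an unconventional name.
	outer := makeZip(entry{"BOOT-INF/lib/log4j-api-2.14.jar", mid})

	jars, err := findJARs(outer)
	if err != nil {
		panic(err)
	}
	for _, j := range jars {
		fmt.Println(j)
	}
}
```

In this sketch each reported name is relative to the archive that directly contains it, so nothing in the output says which outer JAR an inner JAR came from.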

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:

return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.
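The difference between the two names can be sketched as follows. This is only an illustration of the naming idea, assuming the chain of enclosing archives is available as a slice, outermost first; `fullName` and `shortName` are hypothetical helpers, not Claircore functions.

```go
package main

import (
	"fmt"
	"strings"
)

// fullName joins the chain of archives leading to a JAR into one name,
// outermost first, using ':' as the separator from the example above.
func fullName(chain ...string) string {
	return strings.Join(chain, ":")
}

// shortName returns only the final element of the chain, that is, the name
// relative to the innermost enclosing archive.
func shortName(chain ...string) string {
	if len(chain) == 0 {
		return ""
	}
	return chain[len(chain)-1]
}

func main() {
	chain := []string{
		"testdata/inner/inner.jar",
		"BOOT-INF/lib/log4j-api-2.14.jar",
		"META-INF/inner-jar/log4j-2.14.0.jar",
	}
	fmt.Println(shortName(chain...)) // final portion only
	fmt.Println(fullName(chain...))  // full path through every enclosing JAR
}
```

With the full form, two inner JARs that share a final path but live in different outer JARs produce different names.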

Also, I realized the packagescanner does not consider a JAR file a valid JAR unless it has a META-INF directory. According to the JAR spec from the last few LTS releases (11, 17, and 21) as well as the latest non-LTS release (23):

A JAR file is essentially a zip file that contains an optional META-INF directory.

So, the META-INF directory is not required, so we may want to consider dropping that constraint. Thoughts?
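For context, a check of the kind described above might look like the sketch below. The packagescanner's actual validity check is not shown in this thread, so `hasMetaInf` and `zipOf` are hypothetical, and the exact-case match on `META-INF` is an assumption. The sketch shows how a perfectly readable zip without that directory would be rejected by such a constraint.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"strings"
)

// hasMetaInf reports whether the zip archive in data contains a META-INF
// directory entry or any entry underneath it.
func hasMetaInf(data []byte) (bool, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return false, err
	}
	for _, f := range zr.File {
		if f.Name == "META-INF" || strings.HasPrefix(f.Name, "META-INF/") {
			return true, nil
		}
	}
	return false, nil
}

// zipOf builds an in-memory zip whose entries have the given names and no content.
func zipOf(names ...string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, n := range names {
		if _, err := zw.Create(n); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	archives := map[string][]byte{
		"with META-INF":    zipOf("META-INF/MANIFEST.MF", "com/example/App.class"),
		"without META-INF": zipOf("com/example/App.class"),
	}
	for label, data := range archives {
		ok, err := hasMetaInf(data)
		if err != nil {
			panic(err)
		}
		// Both archives are valid zips; only the first passes the constraint.
		fmt.Printf("%s: hasMetaInf=%v\n", label, ok)
	}
}
```

Dropping the constraint would amount to accepting any archive that opens as a zip, which is the trade-off weighed later in the thread.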

daynewlee

The changes look ok to me. Maybe get @crozzy to have another look.

crozzy

The logic essentially looks good to me, just a few comments on the tests

java/jar/jar_test.go

java/jar/testdata/inner/README.md

java/packagescanner.go

RTann · 2025-03-18T17:53:55Z

@crozzy when you re-review can you also let me know your thoughts about the following:

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:

return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.

Also, I realized the packagescanner does not consider a JAR file a valid JAR unless it has a META-INF directory. According to the JAR spec from the last few LTS releases (11, 17, and 21) as well as the latest non-LTS release (23):

A JAR file is essentially a zip file that contains an optional META-INF directory.

So, the META-INF directory is not required, so we may want to consider dropping that constraint. Thoughts?

crozzy · 2025-03-25T17:42:44Z

@crozzy when you re-review can you also let me know your thoughts about the following:

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:
return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.
Also, I realized the packagescanner does not consider a JAR file a valid JAR unless it has a META-INF directory. According to the JAR spec from the last few LTS releases (11, 17, and 21) as well as the latest non-LTS release (23):
A JAR file is essentially a zip file that contains an optional META-INF directory.
So, the META-INF directory is not required, so we may want to consider dropping that constraint. Thoughts?

For point 1, Yeah I agree it'd be nicer to know from whence the data came and I think that's the intention of PackageDB (at least for the Java indexer). Whether it's another PR or a commit in this PR is up to you.

For point 2, I'm trying to judge the balance between covering every corner case and doing excess processing or inflating storage with sub-par, unmatchable data. I think it's probably worth documenting what the spec says, but I don't think it warrants changing the current flow (in any case, updating that logic should be another PR).

crozzy

LGTM in current state. If you adjust the path in the PackageDB, dismiss the review and I'll re-review.

RTann · 2025-03-25T18:30:36Z

@crozzy I noticed I kept a for loop in the test which set the SHAs to nil even though I added the cmpopts.IgnoreFields(Info{}, "SHA"), so I removed the loop. Can I get another approval?

RTann · 2025-03-25T18:33:07Z

@crozzy I also remembered what I was asking about for

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:

return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.

So, looking at jar_test.go, we can see the names of packages. These packages are buried pretty deep inside the top-level JAR file (see the README). So when a user sees these packages related to this JAR, they don't really know how to fix them. Where are these coming from? It may be worth updating the package DB to make it clear these are from inner JARs. I can do that in a follow-up.

RTann · 2025-03-25T21:34:59Z

PR to adjust PackageDB: #1503

Signed-off-by: RTann <[email protected]>

crozzy

LGTM

RTann force-pushed the jar-unidentified branch 5 times, most recently from 2245436 to f0d251e January 16, 2025 00:01

RTann marked this pull request as ready for review January 16, 2025 00:11

RTann requested a review from a team as a code owner January 16, 2025 00:11

RTann requested review from crozzy and hdonnay and removed request for a team January 16, 2025 00:11

RTann force-pushed the jar-unidentified branch from f0d251e to 5f47a26 January 22, 2025 01:00

RTann requested review from jvdm, BradLugo, daynewlee and dcaravel January 31, 2025 19:56

RTann force-pushed the jar-unidentified branch 2 times, most recently from 552f32f to 177a66d February 12, 2025 16:48

RTann force-pushed the jar-unidentified branch 2 times, most recently from f8e7fb8 to 7ff485d February 25, 2025 00:24

RTann force-pushed the jar-unidentified branch from 7ff485d to 35127cd March 10, 2025 16:41

daynewlee previously approved these changes Mar 10, 2025

crozzy requested changes Mar 12, 2025

RTann dismissed daynewlee’s stale review via 116d33f March 18, 2025 17:52

RTann force-pushed the jar-unidentified branch from 35127cd to 116d33f March 18, 2025 17:52

RTann requested a review from crozzy March 18, 2025 17:52

RTann force-pushed the jar-unidentified branch from 116d33f to 0dc8ee8 March 18, 2025 17:54

crozzy previously approved these changes Mar 25, 2025

RTann dismissed crozzy’s stale review via 7d5528c March 25, 2025 18:29

RTann force-pushed the jar-unidentified branch from 0dc8ee8 to 7d5528c March 25, 2025 18:29

RTann requested a review from crozzy March 25, 2025 18:29

crozzy force-pushed the jar-unidentified branch from 7d5528c to b43690b April 1, 2025 15:03

jar: support unconventional jar names

42b4a46

Signed-off-by: RTann <[email protected]>

crozzy force-pushed the jar-unidentified branch from b43690b to 42b4a46 April 1, 2025 15:04

crozzy approved these changes Apr 1, 2025
