Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JPEG validation does not allow JPEG files with trailers #1118

Open
jstone-psi opened this issue Jan 31, 2025 · 2 comments · May be fixed by #1167
Open

JPEG validation does not allow JPEG files with trailers #1118

jstone-psi opened this issue Jan 31, 2025 · 2 comments · May be fixed by #1167
Assignees
Labels
B15.1 bug Something isn't working s.medium

Comments

@jstone-psi
Copy link

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I validated Product_Ancillary products with Encoded_Images in JPEG format, I noticed that the validator rejected the images as invalid JPEGs.
Investigating the issue revealed that these images have additional data beyond the EOI marker. These images contain a complete JPEG image and will still display correctly.

The test that the validator used was to check the beginning and the end of each file for the byte sequences 0xFFD8 (SOI (Start-of-interchange)) and 0xFFD9 (EOI (End-of-interchange)). While many JPEG files will pass this test, other JPEG files will not, and this test is inappropriate.

According to the PDS4 Information Model, the Encoding Standard Id of JPEG specifies that "the encoded image is governed by ISO/IEC 10918-1". However,
ISO/IEC 10918-1 does not specify a file format, only an interchange format. One file format for a JPEG file is described by the JFIF specification (ISO/IEC 10918-5) which is not referenced in the PDS4 definition. (There are also other specifications such as EXIF that are considered to be JPEG files, but this only emphasises that the format is underspecified.)

A compliant JPEG decoder (not necessarily JFIF) will not fail on a JPEG file that contains data after the EOI marker, since it still contains a complete JPEG interchange. ISO/IEC 10918-1 instructs the decoder to stop processing once the EOI is encountered.

🕵️ Expected behavior

I expected the validator to emit a WARNING, at most.

The validator should not automatically fail if the EOI is not at the end of the file. It would be better to process the interchange within the file and determine that it is valid. This should have a low impact on performance, since every marker except for SOI and EOI include length indicators, so it's possible to skip most of the file. Since it could still be an indicator of a problem if there is data beyond the EOI, a warning should still be emitted when trailing data is present. Parsing the file could be a net gain for the reliability of the validator, since there is more to the validity of a JPEG beyond the presence of the SOI and EOI.

Additionally, when offset and object length attributes are provided, the validator should use these values, and check for the SOI and EOI markers at the specified locations. Presumably, the offset and object length would define the boundaries of an interchange, so this would be an appropriate test.

I've included a flowchart for reference.

Image

📜 To Reproduce

Run validate on the provided label:

~/bin/validate-3.6.3/bin/validate 20170303t022534s621_sto_l0.b.xml

Observe the following error:

ERROR  [error.file.not_jpeg_compliant]   file:/Users/jessestone/Desktop/orex.tagcams.jpegs/20170303t022534s621_sto_l0.jpg is not valid JPEG file

🖥 Environment Info

Version of this sofware: 3.6.3
Operating System: MacOS 14.6.1

📚 Version of Software Used

gov.nasa.pds:validate
Version 3.6.3
Release Date: 2024-11-19 00:03:18

🩺 Test Data / Additional context

Archive.zip

🦄 Related requirements

PDS4 Information Model encoding_standard_id definition: https://pds.nasa.gov/datastandards/documents/im/v1/index_1N00.html#value_pds_encoded_image_pds_encoding_standard_id_jpeg

"SOME LITTLE-KNOWN ASPECTS OF THE HISTORY OF THE JPEG STILL PICTURE-CODING STANDARD, ITU-T T.81 |ISO/IEC 10918-1 (1986-1993)" https://www.itu.int/dms_pub/itu-s/opb/journal/S-JOURNAL-ICTS.V3I1-2020-13-PDF-E.pdf

ISO 10918-1: https://www.w3.org/Graphics/JPEG/itu-t81.pdf

ISO 10918-5: https://www.ijg.org/files/T-REC-T.871-201105-I!!PDF-E.pdf

Known JPEG Trailers: https://exiftool.org/TagNames/JPEG.html

ImageUtil.java:

// If first 2 bytes and last 2 bytes are 0xffd8 and 0xffd9 then the file is a

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

@al-niessner
Copy link
Contributor

@jordanpadams

Okay, I understand this now. Rather than do the expensive load a JPEG with a big library that knows what it is doing, we look at the first and last short of the file to pretend that is enough to make it a JPEG. With the JPEG trailers, obviously this is not correct. Out of curiosity, I tried EOG and it loaded the image no problem, but it did take time (about 30 seconds).

The question is: if first/last shorts fail, do the expensive load? Java libraries are not going to be faster than EOG. Doing a full load is the best I got unless you want to go down the WE support JPEG rabbit hole, but it will cause validation of@jstone-psi trailer JPEG files to take a very long time -- probably already does take a lot of time but it is going to get worse.

@al-niessner al-niessner linked a pull request Mar 9, 2025 that will close this issue
@al-niessner
Copy link
Contributor

@jordanpadams

I implemented the ImageIO solution. Being that we verify the JPEG extension first, this should not be too bad. Supposedly (documentation) ImageIO uses the extension to determine the decoding algorithm. Not tested, but it should help prevent turning the spreadsheet as the powerpoint homework assignment.

If you want something else, then let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B15.1 bug Something isn't working s.medium
Projects
Status: ToDo
Status: ToDo
Development

Successfully merging a pull request may close this issue.

3 participants