You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When I validated Product_Ancillary products with Encoded_Images in JPEG format, I noticed that the validator rejected the images as invalid JPEGs.
Investigating the issue revealed that these images have additional data beyond the EOI marker. These images contain a complete JPEG image and will still display correctly.
The test that the validator used was to check the beginning and the end of each file for the byte sequences 0xFFD8 (SOI (Start-of-interchange)) and 0xFFD9 (EOI (End-of-interchange)). While many JPEG files will pass this test, other JPEG files will not, and this test is inappropriate.
According to the PDS4 Information Model, the Encoding Standard Id of JPEG specifies that "the encoded image is governed by ISO/IEC 10918-1". However,
ISO/IEC 10918-1 does not specify a file format, only an interchange format. One file format for a JPEG file is described by the JFIF specification (ISO/IEC 10918-5) which is not referenced in the PDS4 definition. (There are also other specifications such as EXIF that are considered to be JPEG files, but this only emphasises that the format is underspecified.)
A compliant JPEG decoder (not necessarily JFIF) will not fail on a JPEG file that contains data after the EOI marker, since it still contains a complete JPEG interchange. ISO/IEC 10918-1 instructs the decoder to stop processing once the EOI is encountered.
🕵️ Expected behavior
I expected the validator to emit a WARNING, at most.
The validator should not automatically fail if the EOI is not at the end of the file. It would be better to process the interchange within the file and determine that it is valid. This should have a low impact on performance, since every marker except for SOI and EOI include length indicators, so it's possible to skip most of the file. Since it could still be an indicator of a problem if there is data beyond the EOI, a warning should still be emitted when trailing data is present. Parsing the file could be a net gain for the reliability of the validator, since there is more to the validity of a JPEG beyond the presence of the SOI and EOI.
Additionally, when offset and object length attributes are provided, the validator should use these values, and check for the SOI and EOI markers at the specified locations. Presumably, the offset and object length would define the boundaries of an interchange, so this would be an appropriate test.
Okay, I understand this now. Rather than do the expensive load a JPEG with a big library that knows what it is doing, we look at the first and last short of the file to pretend that is enough to make it a JPEG. With the JPEG trailers, obviously this is not correct. Out of curiosity, I tried EOG and it loaded the image no problem, but it did take time (about 30 seconds).
The question is: if first/last shorts fail, do the expensive load? Java libraries are not going to be faster than EOG. Doing a full load is the best I got unless you want to go down the WE support JPEG rabbit hole, but it will cause validation of@jstone-psi trailer JPEG files to take a very long time -- probably already does take a lot of time but it is going to get worse.
I implemented the ImageIO solution. Being that we verify the JPEG extension first, this should not be too bad. Supposedly (documentation) ImageIO uses the extension to determine the decoding algorithm. Not tested, but it should help prevent turning the spreadsheet as the powerpoint homework assignment.
Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
When I validated Product_Ancillary products with Encoded_Images in JPEG format, I noticed that the validator rejected the images as invalid JPEGs.
Investigating the issue revealed that these images have additional data beyond the EOI marker. These images contain a complete JPEG image and will still display correctly.
The test that the validator used was to check the beginning and the end of each file for the byte sequences 0xFFD8 (SOI (Start-of-interchange)) and 0xFFD9 (EOI (End-of-interchange)). While many JPEG files will pass this test, other JPEG files will not, and this test is inappropriate.
According to the PDS4 Information Model, the Encoding Standard Id of JPEG specifies that "the encoded image is governed by ISO/IEC 10918-1". However,
ISO/IEC 10918-1 does not specify a file format, only an interchange format. One file format for a JPEG file is described by the JFIF specification (ISO/IEC 10918-5) which is not referenced in the PDS4 definition. (There are also other specifications such as EXIF that are considered to be JPEG files, but this only emphasises that the format is underspecified.)
A compliant JPEG decoder (not necessarily JFIF) will not fail on a JPEG file that contains data after the EOI marker, since it still contains a complete JPEG interchange. ISO/IEC 10918-1 instructs the decoder to stop processing once the EOI is encountered.
🕵️ Expected behavior
I expected the validator to emit a WARNING, at most.
The validator should not automatically fail if the EOI is not at the end of the file. It would be better to process the interchange within the file and determine that it is valid. This should have a low impact on performance, since every marker except for SOI and EOI include length indicators, so it's possible to skip most of the file. Since it could still be an indicator of a problem if there is data beyond the EOI, a warning should still be emitted when trailing data is present. Parsing the file could be a net gain for the reliability of the validator, since there is more to the validity of a JPEG beyond the presence of the SOI and EOI.
Additionally, when offset and object length attributes are provided, the validator should use these values, and check for the SOI and EOI markers at the specified locations. Presumably, the offset and object length would define the boundaries of an interchange, so this would be an appropriate test.
I've included a flowchart for reference.
📜 To Reproduce
Run validate on the provided label:
~/bin/validate-3.6.3/bin/validate 20170303t022534s621_sto_l0.b.xml
Observe the following error:
🖥 Environment Info
Version of this sofware: 3.6.3
Operating System: MacOS 14.6.1
📚 Version of Software Used
gov.nasa.pds:validate
Version 3.6.3
Release Date: 2024-11-19 00:03:18
🩺 Test Data / Additional context
Archive.zip
🦄 Related requirements
PDS4 Information Model encoding_standard_id definition: https://pds.nasa.gov/datastandards/documents/im/v1/index_1N00.html#value_pds_encoded_image_pds_encoding_standard_id_jpeg
"SOME LITTLE-KNOWN ASPECTS OF THE HISTORY OF THE JPEG STILL PICTURE-CODING STANDARD, ITU-T T.81 |ISO/IEC 10918-1 (1986-1993)" https://www.itu.int/dms_pub/itu-s/opb/journal/S-JOURNAL-ICTS.V3I1-2020-13-PDF-E.pdf
ISO 10918-1: https://www.w3.org/Graphics/JPEG/itu-t81.pdf
ISO 10918-5: https://www.ijg.org/files/T-REC-T.871-201105-I!!PDF-E.pdf
Known JPEG Trailers: https://exiftool.org/TagNames/JPEG.html
ImageUtil.java:
validate/src/main/java/gov/nasa/pds/tools/util/ImageUtil.java
Line 92 in 5aa2b04
Acceptance Criteria
Given
When I perform
Then I expect
⚙️ Engineering Details
No response
🎉 Integration & Test
No response
The text was updated successfully, but these errors were encountered: