The file's metadata has a UserComment tag in the Exif SubIFD directory that holds UNICODE-encoded text containing JSON. The existing code decodes this text using BigEndianUnicode, which produces incorrect text.
If the UNICODE entry of the encodingMap in TagDescriptor is set to Encoding.Unicode (little-endian UTF-16), the text decodes properly.
Should this be just Unicode? Is there a discriminator that determines which endianness should be used?
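To illustrate the symptom, here is a minimal sketch that decodes the same bytes both ways. The class and method names are made up for this example and are not part of the library; the only assumption is that the comment payload was written as little-endian UTF-16 without a BOM.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EndiannessDemo
{
    // Encodes the text as little-endian UTF-16 (no BOM), then decodes
    // the same bytes once as little-endian and once as big-endian.
    public static (string LittleEndian, string BigEndian) DecodeBothWays(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);

        string littleEndian = Encoding.Unicode.GetString(bytes);       // round-trips
        string bigEndian = Encoding.BigEndianUnicode.GetString(bytes); // bytes of each code unit swapped

        return (littleEndian, bigEndian);
    }
}
```

For a JSON payload the first character '{' (U+007B) is stored as the bytes 7B 00, which a big-endian decoder reads as U+7B00, so the result looks like CJK text rather than JSON.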
RupertAvery changed the title from "UNICODE EXIF UserComment tag read as BigEndian Unicode. Should this be just Unicode? Is there a discriminator that says how it should be read?" to "UNICODE EXIF UserComment tag read as BigEndian Unicode results in incorrect decoding" on Jun 12, 2024
// TODO use ByteTrie here
// Someone suggested "ISO-8859-1".
var encodingMap = new Dictionary<string, Encoding>
{
    ["ASCII"] = Encoding.ASCII,
    ["UTF8"] = Encoding.UTF8,
#pragma warning disable SYSLIB0001 // Type or member is obsolete
    ["UTF7"] = Encoding.UTF7,
#pragma warning restore SYSLIB0001 // Type or member is obsolete
    ["UTF32"] = Encoding.UTF32,
    // Affected code
    ["UNICODE"] = Encoding.Unicode,
};
It's a good question. Unfortunately, there might not be one true answer. Perhaps the endianness of the TIFF data stream should be used. However, I doubt that different cameras and software handle this consistently.
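One possible discriminator, sketched here purely as an assumption and not as something the library does, is to look at the payload itself: honour a byte order mark if one is present, otherwise compare where the zero bytes fall, and only then fall back to a caller-supplied default such as the TIFF byte order. The class name and the heuristic are hypothetical, and the zero-byte test only helps for mostly ASCII or Latin text such as JSON.

```csharp
using System;
using System.Text;

// Hypothetical heuristic, for illustration only.
public static class Utf16ByteOrderGuesser
{
    // payload: the comment bytes after the character code header.
    // fallback: returned when the bytes give no hint (for example an
    //           encoding chosen from the TIFF byte order).
    public static Encoding Guess(byte[] payload, Encoding fallback)
    {
        if (payload == null) throw new ArgumentNullException(nameof(payload));

        // 1. A byte order mark settles the question. The caller still has
        //    to skip it (or trim U+FEFF) when decoding.
        if (payload.Length >= 2)
        {
            if (payload[0] == 0xFF && payload[1] == 0xFE) return Encoding.Unicode;
            if (payload[0] == 0xFE && payload[1] == 0xFF) return Encoding.BigEndianUnicode;
        }

        // 2. For mostly ASCII text the high byte of each code unit is zero:
        //    it sits at odd offsets in little-endian data and at even
        //    offsets in big-endian data.
        int zerosAtEven = 0, zerosAtOdd = 0;
        for (int i = 0; i + 1 < payload.Length; i += 2)
        {
            if (payload[i] == 0) zerosAtEven++;
            if (payload[i + 1] == 0) zerosAtOdd++;
        }

        if (zerosAtOdd > zerosAtEven) return Encoding.Unicode;
        if (zerosAtEven > zerosAtOdd) return Encoding.BigEndianUnicode;

        // 3. No evidence either way.
        return fallback;
    }
}
```

A guess like this can be wrong for text that is mostly non-Latin, so it would need the same before/after check against real files as any other change.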
Generally, in cases like this, I run the regression test suite before and after the change to see whether it helps more than it hurts.
A workaround is to extract the comment bytes (StringValue) and use an explicit encoding directly.
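A rough sketch of that workaround follows. It assumes you already have the raw UserComment bytes from the directory (how you obtain them is not shown here) and that they still start with the 8-byte character code that Exif puts in front of the text ("UNICODE" followed by a NUL). The class and method names are made up for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class UserCommentWorkaround
{
    // Exif UserComment values begin with an 8-byte character code.
    private const int CharacterCodeLength = 8;

    // raw: the full UserComment value, including the character code.
    // encoding: the encoding you want to force, e.g. Encoding.Unicode.
    public static string DecodeUnicodeComment(byte[] raw, Encoding encoding)
    {
        if (raw == null) throw new ArgumentNullException(nameof(raw));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));
        if (raw.Length < CharacterCodeLength)
            throw new ArgumentException("Too short to contain a character code.", nameof(raw));

        // Read the character code and strip its NUL or space padding.
        string code = Encoding.ASCII.GetString(raw, 0, CharacterCodeLength).TrimEnd('\0', ' ');
        if (code != "UNICODE")
            throw new NotSupportedException($"Unexpected character code '{code}'.");

        // Decode everything after the header with the caller's encoding.
        int payloadLength = raw.Length - CharacterCodeLength;
        return encoding.GetString(raw, CharacterCodeLength, payloadLength).TrimEnd('\0');
    }
}
```

For the file in this issue, passing Encoding.Unicode should return the JSON text, which can then be handed to a JSON parser.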