TIFF image data sometimes loaded as byte[] into memory #221

Open
drewnoakes opened this issue Jan 2, 2017 · 5 comments
drewnoakes (Owner) commented Jan 2, 2017

@Nadahar reports in #216 that image data is stored in a TIFF tag and therefore is loaded into memory.

For this file we see:

[Exif IFD0 - 0x0117] Strip Byte Counts = 5859030 bytes
[Exif IFD0 - 0x8649] Unknown tag (0x8649) = [6380 values]
[Exif IFD0 - 0x8773] Inter Color Profile = [3144 values]
[Exif IFD0 - 0x935c] Unknown tag (0x935c) = [4542144 values]

Exiftool handles this via:

Image Source Data : (Binary data 4542144 bytes, use -b option to extract)

Most use cases will not require this data to be loaded into memory. It'd be nice to allow controlling this scenario.

We could simply exclude byte[] values with more than a given number of elements from directories.

Alternatively, we could introduce a blacklist of known non-metadata tags.

Either way we should probably store an object that represents the unread data such that writing TIFF data doesn't accidentally lose this information.

Should this logic only apply to TIFF files (.tif and many raw formats)? Should it also apply to Exif data, which happens to be encoded using TIFF? For example, Exif data sometimes has large manufacturer sub-IFDs of unknown format which are not decoded and are currently stored as byte arrays. They're unlikely to be useful.

If we do filter tags, should that be the default behaviour? Where would this be specified in the API? There's no top-level context/options object. Ideally this would live on the stack rather than in a static/global. Would we have an all-encompassing options object passed into ImageMetadataReader, and specific ones for the other readers? What about nested formats?
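For illustration, the size-threshold option above could boil down to a simple predicate on the element count. This is a hedged sketch only: the class name, threshold, and helper method are assumptions, not part of metadata-extractor.

```java
// Hypothetical sketch of the size-threshold idea: exclude byte[] values
// whose element count exceeds a cut-off. The class name and threshold
// are assumptions for illustration, not part of the library.
public class LargeValueFilter {
    // 64 KiB: large enough to keep ICC profiles (3144 values above),
    // small enough to exclude embedded image data (4542144 values above).
    private static final int MAX_ELEMENT_COUNT = 64 * 1024;

    /** Returns true if a byte[] of the given length should be stored. */
    public static boolean keepValue(int elementCount) {
        return elementCount <= MAX_ELEMENT_COUNT;
    }
}
```

The trade-off versus a blacklist is that a threshold needs no per-tag knowledge but risks dropping an unusually large legitimate value.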

Nadahar (Contributor) commented Jan 14, 2017

I just tested with a local merge of #216 and #225 on my example image. Applying the filter:

new MetadataFilter() {
    @Override
    public boolean tagFilter(Directory directory, int tagType) {
        return !(directory instanceof ExifIFD0Directory) || tagType != 0x935c;
    }

    @Override
    public boolean directoryFilter(Class<? extends Directory> directory) {
        return true;
    }
}

makes the size of the serialized object drop to 23130 bytes - which is more in line with what you would expect. Although this might not be true in every case, tag 0x935c is definitely the problem with this one.
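The filter above could also be generalized from one hard-coded tag into the blacklist of known non-metadata tags floated earlier. A hedged sketch: only tag 0x935c (Image Source Data) comes from this thread; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Sketch of a blacklist-based variant of the filter above: instead of
// hard-coding 0x935c in the predicate, keep a set of tags known to hold
// bulk (non-metadata) data. Only 0x935c is taken from this thread; the
// class and method names are assumptions.
public class BulkTagBlacklist {
    private static final Set<Integer> BULK_TAGS =
            Collections.unmodifiableSet(new HashSet<>(
                    Collections.singletonList(0x935c))); // Image Source Data

    /** Returns true if the tag should be read and stored. */
    public static boolean tagFilter(int tagType) {
        return !BULK_TAGS.contains(tagType);
    }
}
```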

Nadahar (Contributor) commented Feb 17, 2017

This has turned out to be a bigger problem than I initially thought. My "image refactoring" PR UniversalMediaServer/UniversalMediaServer#1061 is getting close to finished, and I'm currently doing testing. I have a number of VMs that I use for testing under different OSes. When I try to test this on 32-bit WinXP (though it would happen on any 32-bit system) I can't, because I keep running out of memory. I have nothing more to give the JVM because of 32-bit limitations.

The problem is here. The actual error is thrown in RandomAccessStreamReader.getBytes(), but the cause is the attempted storage of that huge byte array.

java.lang.OutOfMemoryError: Java heap space
	at com.drew.lang.RandomAccessStreamReader.getBytes(RandomAccessStreamReader.java:187)
	at com.drew.imaging.tiff.TiffReader.processTag(TiffReader.java:264)
	at com.drew.imaging.tiff.TiffReader.processIfd(TiffReader.java:224)
	at com.drew.imaging.tiff.TiffReader.processIfd(TiffReader.java:216)
	at com.drew.imaging.tiff.TiffReader.processTiff(TiffReader.java:78)
	at com.drew.imaging.tiff.TiffMetadataReader.readMetadata(TiffMetadataReader.java:75)
	at net.pms.image.ImagesUtil.getMetadata(ImagesUtil.java:1582)

I don't think I can release this without a way to avoid reading those huge chunks; I can't expect everyone to have 16 GB of RAM. Both of the attached files cause this issue in this test, but there's nothing special about them - many RAW files store the image data in this way.

Credo Barcelona
Credo 1017

kwhopper (Collaborator) commented

I think this could work similar to ExifTool. Specific tags per directory are pre-marked as "binary" and then not read into memory by default if above a certain count (512, 65535, etc). Some option could be passed in ME that ignores this behavior, which doesn't seem too out of line to me (for now).

ExifTool typically writes a specific string to replace the tag's binary value (like "Binary data {length} bytes" or somesuch) and looks for that string when displaying a byte tag. We could do this without inspection by adding some properties to Tag if you want to do it that way, although that's a bigger change.

I don't think a PR would be too difficult if you want me to try it. It seems best to do it on a specific tag-by-tag basis in each directory as needed instead of some global default. Thoughts?
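The placeholder approach described above could look roughly like the following. This is a hedged sketch: the class name and helper methods are assumptions, and only the string format follows the ExifTool-style output quoted earlier in this thread.

```java
// Sketch of the ExifTool-style placeholder idea above: rather than load
// a large binary value, store a descriptive string and recognize it when
// displaying. Names are hypothetical; only the string format mirrors the
// ExifTool output quoted in this issue.
public class BinaryPlaceholder {
    private static final String PREFIX = "(Binary data ";
    private static final String SUFFIX = " bytes, use -b option to extract)";

    /** Builds the string stored in place of the binary value. */
    public static String placeholderFor(long byteCount) {
        return PREFIX + byteCount + SUFFIX;
    }

    /** Detects a placeholder by inspection, as kwhopper describes. */
    public static boolean isPlaceholder(String value) {
        return value.startsWith(PREFIX) && value.endsWith(SUFFIX);
    }
}
```

The inspection step is exactly what the alternative of adding properties to Tag would remove.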

kwhopper (Collaborator) commented

... a PR in the .NET project that is. I'm not as fast in the Java project.

Nadahar (Contributor) commented Feb 18, 2017

I think @kwhopper's approach seems good. I'd choose to modify Tag with a flag and then store the size in the Tag value. Cleaner and faster than using String and inspection.

If @kwhopper makes a PR for .NET I might be able to "translate" it to Java and create a PR.
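The flag-plus-size idea above could be sketched as a small value object stored in place of the byte array. All names here are hypothetical; the real Tag class in metadata-extractor would differ.

```java
// Minimal sketch of the Tag-flag idea: mark the value as binary and keep
// only the byte count, so display code can branch on the flag instead of
// inspecting a string. Class and method names are assumptions for
// illustration, not metadata-extractor's API.
public class SkippedBinaryValue {
    private final long byteCount;

    public SkippedBinaryValue(long byteCount) {
        this.byteCount = byteCount;
    }

    public boolean isBinary() { return true; }
    public long getByteCount() { return byteCount; }

    @Override
    public String toString() {
        return "(Binary data " + byteCount + " bytes)";
    }
}
```

Serializing such an object costs a handful of bytes regardless of how large the skipped data was, which is the point of the whole exercise.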
