TIFF image data sometimes loaded as byte[] into memory #221

Open
drewnoakes opened this issue Jan 2, 2017 · 5 comments
drewnoakes (Owner) commented Jan 2, 2017

@Nadahar reports in #216 that image data is stored in a TIFF tag and therefore is loaded into memory.

For this file we see:

[Exif IFD0 - 0x0117] Strip Byte Counts = 5859030 bytes
[Exif IFD0 - 0x8649] Unknown tag (0x8649) = [6380 values]
[Exif IFD0 - 0x8773] Inter Color Profile = [3144 values]
[Exif IFD0 - 0x935c] Unknown tag (0x935c) = [4542144 values]

Exiftool handles this via:

Image Source Data : (Binary data 4542144 bytes, use -b option to extract)

Most use cases will not require this data to be loaded into memory. It'd be nice to allow controlling this scenario.

We could simply exclude byte[] values with more than a given number of elements from directories.

Alternatively, we could introduce a blacklist of known non-metadata tags.

Either way we should probably store an object that represents the unread data such that writing TIFF data doesn't accidentally lose this information.

Should this logic only apply to TIFF files (.tif and many raw formats)? Should it also apply to Exif data, which happens to be encoded using TIFF? For example, Exif data sometimes has large manufacturer sub-IFDs of unknown format which are not decoded and are currently stored as byte arrays. They're unlikely to be useful.

If we do filter tags, should that be the default behaviour? Where would this be specified in the API? There's no top-level context/options object. Ideally this would live on the stack rather than in a static/global. Would we have an all-encompassing options object passed into ImageMetadataReader, and specific ones for the other readers? What about nested formats?
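For illustration, the size-threshold option above could boil down to a simple predicate on the element count. This is a hedged sketch only: the class name, threshold, and helper method are assumptions, not part of metadata-extractor.

```java
// Hypothetical sketch of the size-threshold idea: exclude byte[] values
// whose element count exceeds a cut-off. The class name and threshold
// are assumptions for illustration, not part of the library.
public class LargeValueFilter {
    // 64 KiB: large enough to keep ICC profiles (3144 values above),
    // small enough to exclude embedded image data (4542144 values above).
    private static final int MAX_ELEMENT_COUNT = 64 * 1024;

    /** Returns true if a byte[] of the given length should be stored. */
    public static boolean keepValue(int elementCount) {
        return elementCount <= MAX_ELEMENT_COUNT;
    }
}
```

The trade-off versus a blacklist is that a threshold needs no per-tag knowledge but risks dropping an unusually large legitimate value.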

Nadahar (Contributor) commented Jan 14, 2017

I just tested with a local merge of #216 and #225 on my example image. Applying the filter:

new MetadataFilter() {
    @Override
    public boolean tagFilter(Directory directory, int tagType) {
        return !(directory instanceof ExifIFD0Directory) || tagType != 0x935c;
    }

    @Override
    public boolean directoryFilter(Class<? extends Directory> directory) {
        return true;
    }
}

makes the size of the serialized object drop to 23130 bytes - which is more in line with what you would expect. Although this might not be true in every case, tag 0x935c is definitely the problem with this one.
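The filter above could also be generalized from one hard-coded tag into the blacklist of known non-metadata tags floated earlier. A hedged sketch: only tag 0x935c (Image Source Data) comes from this thread; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Sketch of a blacklist-based variant of the filter above: instead of
// hard-coding 0x935c in the predicate, keep a set of tags known to hold
// bulk (non-metadata) data. Only 0x935c is taken from this thread; the
// class and method names are assumptions.
public class BulkTagBlacklist {
    private static final Set<Integer> BULK_TAGS =
            Collections.unmodifiableSet(new HashSet<>(
                    Collections.singletonList(0x935c))); // Image Source Data

    /** Returns true if the tag should be read and stored. */
    public static boolean tagFilter(int tagType) {
        return !BULK_TAGS.contains(tagType);
    }
}
```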

Nadahar (Contributor) commented Feb 17, 2017

This has turned out to be a bigger problem than I initially thought. My "image refactoring" PR UniversalMediaServer/UniversalMediaServer#1061 is getting close to finished, and I'm currently doing testing. I have a number of VMs that I use for testing under different OSes. When I try to test this on 32-bit WinXP (though it would happen on any 32-bit system) I can't, because I keep running out of memory. I have nothing more to give the JVM because of 32-bit limitations.

The problem is here. The actual error is thrown in RandomAccessStreamReader.getBytes(), but the cause is the attempted storage of that huge byte array.

java.lang.OutOfMemoryError: Java heap space
	at com.drew.lang.RandomAccessStreamReader.getBytes(RandomAccessStreamReader.java:187)
	at com.drew.imaging.tiff.TiffReader.processTag(TiffReader.java:264)
	at com.drew.imaging.tiff.TiffReader.processIfd(TiffReader.java:224)
	at com.drew.imaging.tiff.TiffReader.processIfd(TiffReader.java:216)
	at com.drew.imaging.tiff.TiffReader.processTiff(TiffReader.java:78)
	at com.drew.imaging.tiff.TiffMetadataReader.readMetadata(TiffMetadataReader.java:75)
	at net.pms.image.ImagesUtil.getMetadata(ImagesUtil.java:1582)

I don't think I can release this without a way to avoid reading those huge chunks; I can't expect everyone to have 16 GB of RAM. Both of the attached files cause this issue in this test, but there's nothing special about them - many RAW files store the image data in this way.

Credo Barcelona
Credo 1017

kwhopper (Collaborator) commented

I think this could work similar to ExifTool. Specific tags per directory are pre-marked as "binary" and then not read into memory by default if above a certain count (512, 65535, etc). Some option could be passed in ME that ignores this behavior, which doesn't seem too out of line to me (for now).

ExifTool typically writes a specific string to replace the tag's binary value (like "Binary data {length} bytes" or somesuch) and looks for that string when displaying a byte tag. We could do this without inspection by adding some properties to Tag if you want to do it that way, although that's a bigger change.

I don't think a PR would be too difficult if you want me to try it. It seems best to do it on a specific tag-by-tag basis in each directory as needed instead of some global default. Thoughts?
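The placeholder approach described above could look roughly like the following. This is a hedged sketch: the class name and helper methods are assumptions, and only the string format follows the ExifTool-style output quoted earlier in this thread.

```java
// Sketch of the ExifTool-style placeholder idea above: rather than load
// a large binary value, store a descriptive string and recognize it when
// displaying. Names are hypothetical; only the string format mirrors the
// ExifTool output quoted in this issue.
public class BinaryPlaceholder {
    private static final String PREFIX = "(Binary data ";
    private static final String SUFFIX = " bytes, use -b option to extract)";

    /** Builds the string stored in place of the binary value. */
    public static String placeholderFor(long byteCount) {
        return PREFIX + byteCount + SUFFIX;
    }

    /** Detects a placeholder by inspection, as kwhopper describes. */
    public static boolean isPlaceholder(String value) {
        return value.startsWith(PREFIX) && value.endsWith(SUFFIX);
    }
}
```

The inspection step is exactly what the alternative of adding properties to Tag would remove.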

kwhopper (Collaborator) commented

... a PR in the .NET project that is. I'm not as fast in the Java project.

Nadahar (Contributor) commented Feb 18, 2017

I think @kwhopper's approach seems good. I'd choose to modify Tag with a flag and then store the size in the Tag value. Cleaner and faster than using String and inspection.

If @kwhopper makes a PR for .NET I might be able to "translate" it to Java and create a PR.
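The flag-plus-size idea above could be sketched as a small value object stored in place of the byte array. All names here are hypothetical; the real Tag class in metadata-extractor would differ.

```java
// Minimal sketch of the Tag-flag idea: mark the value as binary and keep
// only the byte count, so display code can branch on the flag instead of
// inspecting a string. Class and method names are assumptions for
// illustration, not metadata-extractor's API.
public class SkippedBinaryValue {
    private final long byteCount;

    public SkippedBinaryValue(long byteCount) {
        this.byteCount = byteCount;
    }

    public boolean isBinary() { return true; }
    public long getByteCount() { return byteCount; }

    @Override
    public String toString() {
        return "(Binary data " + byteCount + " bytes)";
    }
}
```

Serializing such an object costs a handful of bytes regardless of how large the skipped data was, which is the point of the whole exercise.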
