Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error "negative length" in .mov files #172

Merged
merged 7 commits into from
Oct 2, 2020
Merged

Conversation

fabioperrella
Copy link
Contributor

closes #171

We found some files which have an atom with size 0.

This PR adds a branch to treat this condition to not raise an unexpected error.

I recorded the fixture file by myself and changed the binary file to raise the same error as I saw in some files from production.

I used QuickTime to record the video and MOOVParser did not recognize its width and height, which is another problem to tackle with, so I added an IF in the test which considered that all .mov fixtures would have dimension > 0.

@fabioperrella fabioperrella changed the title Mov negative length Fix error "negative length" in .mov files Sep 25, 2020
Copy link
Contributor

@julik julik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change feels uneasy because it both bypasses a test file (which should parse correctly) and bypasses a value which is not getting recovered correctly. Can we investigate what is going on with this example file? I probably still have the atom printing code stashed somewhere from the time when this was written 🤔 if it can be useful

@@ -236,6 +236,8 @@ def extract_atom_stream(io, max_read, current_branch = [])
parse_atom_children_and_data_fields(io, atom_size_sans_header, atom_type, current_branch)
elsif KNOWN_BRANCH_ATOM_TYPES.include?(atom_type)
[extract_atom_stream(io, atom_size_sans_header, current_branch + [atom_type]), nil]
elsif atom_size_sans_header <= 0 # sometimes the size can be invalid and we don't want exceptions because of it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the actual situations when the size is invalid? If we are reading and we need to read out a size, and the size does not get detected correctly, most likely the code is not going to be able to "latch" onto the next item in a proper way, since there is no offset. Or am I missing a branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@julik when the size is 0 it raises the following error because it tries to call io.read(-8):

image

I got some examples from production to be sure they are similar, and I found this pattern in a few files:

image

(I'm using SublimeText plugin HexViewer to show the hex mode.)

The meta atom was found in the end of the file and it has "no content".

This is the relevant piece of code which parses this data:

atom_size, atom_type = size_and_type.unpack('Na4')
atom_header_size = io.pos - atom_pos
atom_size_sans_header = atom_size - atom_header_size

Then the variables got with this values:

atom_size: 8
atom_header_size: 8
atom_size_sans_header: 0

It seems the atom is imcomplete, or maybe without content (it has only the header).

So I don't think it's a problem with the offset and we need to treat atoms which doesn't have a content in a way.

Btw I changed the condition atom_size_sans_header <= 0 to atom_size_sans_header == 0 because I found some invalid/corrupted files which have this size < 0, and we don't expect to parse these files anyway (Quicktime does not even open it)

wdyt?

Copy link
Contributor Author

@fabioperrella fabioperrella Sep 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just notices these files that I got from production are MP4:
(the name file.mov was set by myself)

exiftool file.mov
 xifr     exiftool
ExifTool Version Number         : 12.00
File Name                       : file.mov
Directory                       : .
File Size                       : 10 MB
File Modification Date/Time     : 2020:09:28 13:48:32-03:00
File Access Date/Time           : 2020:09:28 13:48:35-03:00
File Inode Change Date/Time     : 2020:09:28 13:48:37-03:00
File Permissions                : rw-r--r--
File Type                       : MP4
File Type Extension             : mp4
MIME Type                       : video/mp4
Major Brand                     : MP4 Base w/ AVC ext [ISO 14496-12:2005]
Minor Version                   : 0.0.0
Compatible Brands               : avc1, isom
Media Data Size                 : 977308
Media Data Offset               : 40
Movie Header Version            : 0
Create Date                     : 2020:09:28 09:59:37
Modify Date                     : 2020:09:28 09:59:37
Time Scale                      : 30000
Duration                        : 3.07 s
Preferred Rate                  : 1
Preferred Volume                : 100.00%
Preview Time                    : 0 s
Preview Duration                : 0 s
Poster Time                     : 0 s
Selection Time                  : 0 s
Selection Duration              : 0 s
Current Time                    : 0 s
Next Track ID                   : 4
Track Header Version            : 0
Track Create Date               : 2020:09:28 09:59:37
Track Modify Date               : 2020:09:28 09:59:37
Track ID                        : 1
Track Duration                  : 3.07 s
Track Layer                     : 0
Track Volume                    : 0.00%
Image Width                     : 848
Image Height                    : 480
Graphics Mode                   : srcCopy
Op Color                        : 0 0 0
Compressor ID                   : avc1
Source Image Width              : 848
Source Image Height             : 480
X Resolution                    : 72
Y Resolution                    : 72
Compressor Name                 : Ambarella AVC encoder
Bit Depth                       : 24
Video Frame Rate                : 29.97
Balance                         : 0
Audio Format                    : sowt
Audio Channels                  : 2
Audio Bits Per Sample           : 16
Audio Sample Rate               : 16000
Matrix Structure                : 1 0 0 0 1 0 0 0 1
Media Header Version            : 0
Media Create Date               : 2020:09:28 09:59:37
Media Modify Date               : 2020:09:28 09:59:37
Media Time Scale                : 1000
Media Duration                  : 2.90 s
Handler Type                    : Text
Handler Description             : Ambarella EXT
Gen Media Version               : 0
Gen Flags                       : 0 0 0
Gen Graphics Mode               : ditherCopy
Gen Op Color                    : 32768 32768 32768
Gen Balance                     : 0
Other Format                    : text
Warning                         : [minor] The ExtractEmbedded option may find more tags in the media data
Image Size                      : 848x480
Megapixels                      : 0.407
Avg Bitrate                     : 2.55 Mbps
Rotation                        : 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw I changed the condition atom_size_sans_header <= 0 to atom_size_sans_header == 0 because I found some invalid/corrupted files which have this size < 0, and we don't expect to parse these files anyway (Quicktime does not even open it)

It looks like this is a meta atom which is empty (does not contain any child atoms). Is VLC able to open the file? And in your example with pry - it looks like you are landing on a hdlr atom instead. Does the file end at this atom (does it look like the file has been truncated)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pass me a couple of those files off-band maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like you are landing on a hdlr atom instead

It happens because of this:

def parse_meta_atom(io, atom_size)
  parse_hdlr_atom(io, atom_size)
end

Is VLC able to open the file?

Yes, it can play the video!

Does the file end at this atom (does it look like the file has been truncated)?

yes it does! When it is truncated, normally exiftool warns about it, but in this case, it didn't

Can you pass me a couple of those files off-band maybe?

I sent you 2 samples to your email!

Copy link
Contributor Author

@fabioperrella fabioperrella Sep 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tested with ffmpeg -v trace and it shows the same thing that I mentioned (an empty atom)

$ ffmpeg -v trace -i file.mov
#...
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7ffd2a808200] type:'nbmt' parent:'root' sz: 54732 10431028 10485760
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7ffd2a808200] type:'meta' parent:'root' sz: 8 10485760 10485760
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7ffd2a808200] on_parse_exit_offset=10485760
#...

@@ -1,6 +1,7 @@
# Handles decoding of MOV/MPEG4 atoms/boxes in a stream. Will recursively
# read atoms and parse their data fields if applicable. Also contains
# a few utility functions for finding atoms in a list etc.
# To know more about Atoms: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed a link to the docs with more details about the Atoms. I added this one, but I'm not sure if it is the best/correct one.

@fabioperrella fabioperrella changed the base branch from master to fix-mov-dimensions September 30, 2020 12:34
@fabioperrella fabioperrella requested a review from julik September 30, 2020 12:35
Copy link
Contributor

@linkyndy linkyndy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, once you make rubocop happy 👍

fabioperrella added a commit that referenced this pull request Oct 2, 2020
We recorded a MOV video in Quicktime to test #172 and we found out that the dimensions were not parsed correctly.

We discovered that the dimensions should be read from the video track, but the previous implementation read from the audio track sometimes because it read the first one.

So, we changed the MOOVParser to try to find the video track and if it finds, get the dimensions from it.

We used `ffmpeg` to help me to debug:
```bash
$ ffmpeg -v trace -i spec/fixtures/MOOV/MOV/Test_Dimensions.mov

#...
    Stream #0:0(und), 32, 1/44100: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 203 kb/s (default)
    Metadata:
      creation_time   : 2020-09-29T19:16:52.000000Z
      handler_name    : Core Media Audio
    Stream #0:1(und), 1, 1/30000: Video: h264 (Main), 1 reference frame (avc1 / 0x31637661), yuv420p(tv, smpte170m/smpte170m/bt709, left), 640x360 (640x368) [SAR 1:1 DAR 16:9], 0/1, 2615 kb/s, 30 fps, 30 tbr, 30k tbn, 60k tbc (default)  # <------------- this is the track which has the dimensions
#...
```
@fabioperrella fabioperrella changed the base branch from fix-mov-dimensions to master October 2, 2020 16:23
@fabioperrella fabioperrella merged commit 683d4f4 into master Oct 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MOOVParser raises error when the size of a atom is zero
3 participants