Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does not return metadata for some mp3 files #163

Closed
fabioperrella opened this issue Sep 7, 2020 · 1 comment · Fixed by #165
Closed

Does not return metadata for some mp3 files #163

fabioperrella opened this issue Sep 7, 2020 · 1 comment · Fixed by #165
Assignees
Labels

Comments

@fabioperrella
Copy link
Contributor

For some mp3 files, the MP3Parser class doesn't parse the mpeg frames correctly and because of that it fails when tries to find the frame_bitrate.

test.mp3.zip

This is an example of file with that problem ☝️ . @linkyndy told me that the owner of the file authorized us to use this file as an example, but I'm not sure if it's necessary to mention his name here, is it?

From what I saw, when the parser finds the sync bytes, it's not getting a valid mpeg_id.

These are the expected values according to the docs:

# 00 - MPEG Version 2.5
# 01 - reserved
# 10 - MPEG Version 2 (ISO/IEC 13818-3)
# 11 - MPEG Version 1 (ISO/IEC 11172-3)

For this file, the first frame that was found returns the value 01, which is reserved.

Because of that, it can't find the frame_bitrate and it raises an error InvalidDeepFetch.

image

I tested parsing the file with tinytag, which the method parse_mpeg_frames was based on , and it worked:

from tinytag import TinyTag
tag = TinyTag.get('./test.mp3')
print('bitrate: %s.' % tag.bitrate)
$ python test.py
bitrate: 128
@fabioperrella fabioperrella self-assigned this Sep 7, 2020
@fabioperrella
Copy link
Contributor Author

I will try to understand how TinyTag can read this file and maybe replicate the logic to format_parser

fabioperrella added a commit that referenced this issue Sep 16, 2020
In MP3Parser, the logic to jump to the next bytes was wrong.

To debug it, a puts was added to parse_mpeg_frames method as follows:

```
seek_jmp = sync_bytes_offset_in_4_byte_seq(four_bytes)
puts "frame: #{frame_i}, pos: #{io.pos-4}, pos(h): #{(io.pos-4).to_s(16)} four: #{four_bytes}, hexa: #{four_bytes.map {|ii| ii.to_s(16)} }, jump: #{seek_jmp}"
if seek_jmp > 0
  io.seek(io.pos - 0 + seek_jmp)
  next
end
```

Then the result was:

```
frame: 183, pos: 1474, pos(h): 5c2 four: [0, 255, 251, 148], hexa: ["0", "ff", "fb", "94"], jump: 1
frame: 184, pos: 1479, pos(h): 5c7 four: [0, 0, 0, 0], hexa: ["0", "0", "0", "0"], jump: 4
```

The expected behavior would be jumping to offset 1475, but in reality we went +4 because we didn't take into account the 4 bytes that were already read at the beginning of the method performing the read.

```
# lib/parsers/mp3_parser.rb:139
data = io.read(4)
```

We fixed to subtract this 4 bytes, as follows:

```
if seek_jmp > 0
  io.seek(io.pos - 4 + seek_jmp)
  next
end
```

And the result was:

```
frame: 366, pos: 1474, pos(h): 5c2 four: [0, 255, 251, 148], hexa: ["0", "ff", "fb", "94"], jump: 1
frame: 367, pos: 1475, pos(h): 5c3 four: [255, 251, 148, 196], hexa: ["ff", "fb", "94", "c4"], jump: 0
```

It was also necessary to change MAX_FRAMES_TO_SCAN from 128 to 500, since the previous value wasn't enough to reach the first valid frame in some files.

After this modification, the MP3Parser started to identify PNG as MP3 and because of that it was added a condition to detect the PNG header bytes and skip this parser in that condition. This can happen, because the MP3 file format is actually very lax. Later on we might need to consider adding a confidence score to the MP3 parser and only returning the parsed result if it is above a certain value.

Co-authored-by: Julik Tarkhanov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant