Does not return metadata for some mp3 files #163

fabioperrella · 2020-09-07T20:52:14Z

For some mp3 files, the MP3Parser class doesn't parse the MPEG frames correctly, and because of that it fails when it tries to find the frame_bitrate.

test.mp3.zip

This is an example of a file with that problem ☝️. @linkyndy told me that the owner of the file authorized us to use it as an example, but I'm not sure whether it's necessary to mention his name here. Is it?

From what I saw, when the parser finds the sync bytes, it's not getting a valid mpeg_id.

These are the expected values according to the docs:

# 00 - MPEG Version 2.5
# 01 - reserved
# 10 - MPEG Version 2 (ISO/IEC 13818-3)
# 11 - MPEG Version 1 (ISO/IEC 11172-3)

For this file, the first frame that was found returns the value 01, which is reserved.

Because of that, it can't find the frame_bitrate and raises an InvalidDeepFetch error.
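To make the bit layout concrete, here is a minimal sketch of how the version ID can be read from a candidate frame header. The names `MPEG_VERSIONS` and `mpeg_version` are hypothetical and are not taken from format_parser; the sketch only assumes the standard MPEG audio header layout, where 11 sync bits are followed by the 2-bit version ID.

```ruby
# Hypothetical lookup table mirroring the values listed above.
MPEG_VERSIONS = {
  0b00 => 'MPEG Version 2.5',
  0b01 => :reserved,
  0b10 => 'MPEG Version 2 (ISO/IEC 13818-3)',
  0b11 => 'MPEG Version 1 (ISO/IEC 11172-3)'
}.freeze

# Returns the decoded version for a 4-byte header candidate,
# or nil when the 11 sync bits are not all set.
def mpeg_version(header_bytes)
  first, second = header_bytes
  # Sync word: all 8 bits of byte 0 plus the top 3 bits of byte 1.
  return nil unless first == 0xFF && (second & 0xE0) == 0xE0

  # The version ID sits in bits 4..3 of the second byte.
  MPEG_VERSIONS.fetch((second >> 3) & 0b11)
end

puts mpeg_version([0xFF, 0xFB, 0x94, 0xC4]).inspect
puts mpeg_version([0xFF, 0xEB, 0x00, 0x00]).inspect
```

A byte pair that merely looks like a sync word (for example inside tag data or padding) can decode to the reserved value, which is consistent with the symptom described above.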

I tested parsing the file with tinytag, which the parse_mpeg_frames method was based on, and it worked:

from tinytag import TinyTag
tag = TinyTag.get('./test.mp3')
print('bitrate: %s' % tag.bitrate)

$ python test.py
bitrate: 128


fabioperrella · 2020-09-07T20:54:39Z

I will try to understand how TinyTag can read this file and maybe replicate the logic in format_parser.

In MP3Parser, the logic to jump to the next bytes was wrong. To debug it, a puts was added to the parse_mpeg_frames method as follows:

```
seek_jmp = sync_bytes_offset_in_4_byte_seq(four_bytes)
puts "frame: #{frame_i}, pos: #{io.pos-4}, pos(h): #{(io.pos-4).to_s(16)} four: #{four_bytes}, hexa: #{four_bytes.map {|ii| ii.to_s(16)} }, jump: #{seek_jmp}"
if seek_jmp > 0
  io.seek(io.pos - 0 + seek_jmp)
  next
end
```

Then the result was:

```
frame: 183, pos: 1474, pos(h): 5c2 four: [0, 255, 251, 148], hexa: ["0", "ff", "fb", "94"], jump: 1
frame: 184, pos: 1479, pos(h): 5c7 four: [0, 0, 0, 0], hexa: ["0", "0", "0", "0"], jump: 4
```

The expected behavior would be to jump to offset 1475, but in reality we landed 4 bytes further, at 1479, because we didn't take into account the 4 bytes that had already been read at the beginning of the method:

```
# lib/parsers/mp3_parser.rb:139
data = io.read(4)
```

We fixed it by subtracting these 4 bytes, as follows:

```
if seek_jmp > 0
  io.seek(io.pos - 4 + seek_jmp)
  next
end
```

And the result was:

```
frame: 366, pos: 1474, pos(h): 5c2 four: [0, 255, 251, 148], hexa: ["0", "ff", "fb", "94"], jump: 1
frame: 367, pos: 1475, pos(h): 5c3 four: [255, 251, 148, 196], hexa: ["ff", "fb", "94", "c4"], jump: 0
```

It was also necessary to change MAX_FRAMES_TO_SCAN from 128 to 500, since the previous value wasn't enough to reach the first valid frame in some files.

After this modification, the MP3Parser started to identify PNG files as MP3, so a condition was added to detect the PNG header bytes and skip this parser in that case. This can happen because the MP3 file format is actually very lax. Later on we might need to consider adding a confidence score to the MP3 parser and only returning the parsed result if it is above a certain value.

Co-authored-by: Julik Tarkhanov <[email protected]>
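The off-by-four can be reproduced in isolation with a small sketch. Everything here is an assumption made for illustration: `sync_offset` is a simplified stand-in for `sync_bytes_offset_in_4_byte_seq`, `find_sync` is a hypothetical scanning loop, and `SAMPLE` is a made-up 12-byte buffer whose sync word starts at offset 1. None of it is the actual format_parser code.

```ruby
require 'stringio'

# Simplified stand-in: offset of a sync word inside a 4-byte window,
# 3 if only the last byte could start one, otherwise 4.
def sync_offset(four_bytes)
  four_bytes[0...3].each_with_index do |byte, i|
    return i if byte == 0xFF && (four_bytes[i + 1] & 0xE0) == 0xE0
  end
  four_bytes[-1] == 0xFF ? 3 : 4
end

# Scans for the first window that starts with a sync word and returns
# its offset, or nil at end of input. With rewind: false the 4 bytes
# already consumed by read(4) are not subtracted (the old behavior).
def find_sync(io, rewind:)
  loop do
    data = io.read(4)
    return nil if data.nil? || data.bytesize < 4

    jump = sync_offset(data.bytes)
    return io.pos - 4 if jump.zero?

    io.seek(io.pos - (rewind ? 4 : 0) + jump)
  end
end

# One junk byte, then a header-like sequence, then zero padding.
SAMPLE = [0x00, 0xFF, 0xFB, 0x94, 0xC4, 0, 0, 0, 0, 0, 0, 0].pack('C*')

puts find_sync(StringIO.new(SAMPLE), rewind: true).inspect
puts find_sync(StringIO.new(SAMPLE), rewind: false).inspect
```

With `rewind: true` the scan lands on the sync word at offset 1. With `rewind: false` it skips past it and runs off the end of the buffer, which mirrors the jump from 1474 to 1479 in the debug output above.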

fabioperrella self-assigned this Sep 7, 2020

fabioperrella added the bug label Sep 7, 2020

fabioperrella mentioned this issue Sep 8, 2020

Fix mp3 frames reading to jump correctly to the next bytes #165

Merged

fabioperrella closed this as completed in #165 Sep 16, 2020
