Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FormatParser::IOConstraint API #128

Closed
krists opened this issue Jun 16, 2018 · 2 comments
Closed

FormatParser::IOConstraint API #128

krists opened this issue Jun 16, 2018 · 2 comments

Comments

@krists
Copy link
Contributor

krists commented Jun 16, 2018

Hello!

I am the author of ID3Tag and I have been reviewing how the gem is used in this library.
There were couple of things that I haven't thought of before and where solved in this library like v2 tag body read limit and single exception class that could be rescued while parsing.

After adding read limit functionality to the gem I tried to refactor lib/parsers/mp3_parser.rb and lib/parsers/mp3_parser/id3_extraction.rb clases as there still is a lot of code that deals with tag internal structure.

It would be possible to extract v1 and v2 tag sizes with simple:

audio_file = ID3Tag::AudioFile.new(io)
audio_file.v1_tag_present?
audio_file.v1_tag_size
audio_file.v2_tag_present?
audio_file.v2_tag_size

But current FormatParser::IOConstraint API does not allow that. It's read and seek methods has different arguments as Ruby's and it does not have rewind.

Is there a reason why read cannot be called without specifying length and seek does have 2nd argument?

If these interfaces would match then it would be realy easy to replace a lot of code with methods calls to ID3Tag::AudioFile

@julik
Copy link
Contributor

julik commented Jun 19, 2018

Hi @krists!

But current FormatParser::IOConstraint API does not allow that. It's read and seek methods has different arguments as Ruby's and it does not have rewind.

The reason is that we wrap the initial IO object with a few proxy objects to do the following:

  • Ensure we do not do unbounded read calls - if you do an unbounded read on a 4GB video for example.
  • Ensure we do not call read(n) and seek(n) indefinitely - this happens often if a parser misses a corner case and ends up in an endless loop (situations like until something.eof?...
  • Ensure we do not allow a parser to retrieve too many pages into the cache - since every page missed means an HTTP request to S3 on our side

The API surface of Ruby's IO object is very large, and quite a few methods in it are expressible in terms of other methods. For instance, IO#eof? can be expressed as IO#pos >= IO#size. IO#rewind can be expressed as IO#seek(0). So I took a hard look at the absolute minimum of methods and combinations of their signatures that we need to support in that stack throughout - and that minimum emerged in IOConstraint.

Some gems "want" to have more capabilities than IOConstraint provides and that's fine. EXIFR for instance wants more methods - like seeking with negative offsets and SEEK_CUR etc - for that we employ yet another wrapper like this one https://github.com/WeTransfer/format_parser/blob/master/lib/parsers/exif_parser.rb#L32 which "reimplements" these methods. We can add another for id3tag as well if necessary. Unbounded reads are a bad idea though. Do you need to do unbounded reads because you want to read until the IO you are reading from EOFs? Where do you need to do this if the ID3v1 that is at the end of the file has a fixed size?

@krists
Copy link
Contributor Author

krists commented Jun 22, 2018

@julik After a quick review I found only one place where read without arguments is used https://github.com/krists/id3tag/blob/master/lib/id3tag/audio_file.rb#L29-L30 But I would not consider it a unbounded read because line above it does seek from the end and there is no way it reads more than that. But I see how compromise here could simplify things for library users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants