Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some flaky tests sorting the parsers by name also #184

Merged
merged 8 commits into from
Dec 17, 2020

Conversation

fabioperrella
Copy link
Contributor

@fabioperrella fabioperrella commented Dec 4, 2020

fix #183

Motivation

Some specs, like this one, calls the method FormatParser.parse with the option results: :first, which is the default value, and checks if the returned parser is the expected one.

In some cases, FormatParser.parse recognizes a file with multiple formats, and using results: :first, it returns only the first one.

The problem is that the order of these results varies.

In this method, there is a logic to sort the parsers by priority, but since there are many with the same priority, the order varies on each execution.

parsers_in_order_of_priority = parsers.to_a.sort do |parser_a, parser_b|
  @parser_priorities[parser_a] <=> @parser_priorities[parser_b  
end

Because of this, some tests fail as the following example:

  1) FormatParser.parse when multiple results are requested without :result=> :all (first result) is expected to eq #<FormatParser::Image:0x00007fa3258d28c0 @format=:dpx, @width_px=1, @height_px=1, @display_width_px=1, @display_height_px=1>
     Failure/Error: it { is_expected.to eq(image) }

       expected: #<FormatParser::Image:0x00007fa3258d28c0 @format=:dpx, @width_px=1, @height_px=1, @display_width_px=1, @display_height_px=1>
            got: #<FormatParser::Audio:0x00007fa321af2488 @format=:mp3, @num_audio_channels=2, @audio_sample_rate_hz=48000, @intrinsics={:id3tags=>[]}, @media_duration_seconds=22.06318816438356>

I don't know exactly why the order sometimes is different, but I could prove it by adding this line:

# lib/format_parser.rb
def self.parsers_for(desired_natures, desired_formats, filename_hint = nil)
  #...
  parsers_in_order_of_priority.each { |parser| puts parser.class } # <--- this one
  parsers_in_order_of_priority
end

When I run on my laptop:

irb(main):001:0> file = Tempfile.new
irb(main):002:0> FormatParser.parse(file)
FormatParser::GIFParser
Class
FormatParser::PDFParser
FormatParser::PNGParser
FormatParser::MOOVParser
# ...

When I run in production:

irb(main):005:0> file = Tempfile.new
=> #<Tempfile:/tmp/20201204-5987-1mu9uer>
irb(main):006:0> FormatParser.parse(file)
FormatParser::GIFParser
Class
FormatParser::PNGParser
FormatParser::PDFParser
FormatParser::MOOVParser
# ...

I think it's related to the following MUTEX, which sets a few class variables when registering the parsers:

# lib/format_parser.rb
PARSER_MUX = Mutex.new

#...

def self.register_parser(callable_parser, formats:, natures:, priority: LEAST_PRIORITY)
  parser_provided_formats = Array(formats)
  parser_provided_natures = Array(natures)
  PARSER_MUX.synchronize do
    @parsers ||= Set.new
    @parsers << callable_parser

I also did a test to see if this depend on the order the files are loaded, but the results are the same in staging and also in my laptop:

def self.register_parser(callable_parser, formats:, natures:, priority: LEAST_PRIORITY)
  puts "register: #{callable_parser.class}"

Proposal

This PR changes the way the registered parsers are sorted.

It can help us to fix the flaky tests and also getting different results when running FormatParser.parse to get only the first result.

Copy link
Contributor

@julik julik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why this is useful but I feel like we should explain what we are after with the change. It is not "fix some flaky tests" (which tests? why do they need fixing? are there situations where tests would pass where they should have failed?) but "always apply parsers in the same, deterministic order". Maybe we could find a way to autogenerate the priority value for this, instead of doing a (to me - not very obvious) namesort?

Also we need to apply autoformatting so that CI passes.

if @parser_priorities[parser_a] != @parser_priorities[parser_b]
@parser_priorities[parser_a] <=> @parser_priorities[parser_b]
else
parser_a.class.name <=> parser_b.class.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to explain why we are doing this in a comment

Copy link
Contributor Author

@fabioperrella fabioperrella Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment, could you take a look on if it's good pls?

@fabioperrella
Copy link
Contributor Author

Also we need to apply autoformatting so that CI passes.

@julik I tried to reproduce this failure in my laptop but I couldn't!

I'm using rbenv, ruby 2.2.10p489 and rubocop 0.52.1

Do you have any idea of why I can't reproduce it?

image

@fabioperrella
Copy link
Contributor Author

Also we need to apply autoformatting so that CI passes.

@julik I tried to reproduce this failure in my laptop but I couldn't!

I'm using rbenv, ruby 2.2.10p489 and rubocop 0.52.1

Do you have any idea of why I can't reproduce it?

image

Forget about it, I was in the wrong branch as it shows the picture!

@fabioperrella
Copy link
Contributor Author

I understand why this is useful but I feel like we should explain what we are after with the change. It is not "fix some flaky tests" (which tests? why do they need fixing? are there situations where tests would pass where they should have failed?) but "always apply parsers in the same, deterministic order". Maybe we could find a way to autogenerate the priority value for this, instead of doing a (to me - not very obvious) namesort?

Also we need to apply autoformatting so that CI passes.

@julik I updated the description with more details. Could you take a look pls?

@linkyndy
Copy link
Contributor

linkyndy commented Dec 7, 2020

Thanks, Fabio, for looking into this.

Just my two cents: this issue feels strange especially because it happens on the same (single-threaded) environment. On a quick look, the load order for parser files is always the same. Is the actual sorting happening by (string) class name?

@fabioperrella
Copy link
Contributor Author

Thanks, Fabio, for looking into this.

Just my two cents: this issue feels strange especially because it happens on the same (single-threaded) environment. On a quick look, the load order for parser files is always the same. Is the actual sorting happening by (string) class name?

@linkyndy the current sorting sorts by the priority. Each parser has a priority as follows:

@parser_priorities
=> {#<FormatParser::AIFFParser:0x00007ff69d9ee3f0>=>99,
 #<FormatParser::BMPParser:0x00007ff69d9fd080>=>99,
 #<FormatParser::CR2Parser:0x00007ff6a1952d70>=>99,
 #<FormatParser::DPXParser:0x00007ff6a108ef40>=>99,
 #<FormatParser::FDXParser:0x00007ff6a108d028>=>99,
 #<FormatParser::FLACParser:0x00007ff6a097f240>=>99,
 #<FormatParser::GIFParser:0x00007ff69da93f80>=>0,
 FormatParser::JPEGParser=>0,
 #<FormatParser::MOOVParser:0x00007ff6a1900570>=>1,
 #<FormatParser::MP3Parser:0x00007ff6a0951db8>=>99,
 FormatParser::MPEGParser=>99,
 #<FormatParser::OggParser:0x00007ff6a096fac0>=>99,
 #<FormatParser::PDFParser:0x00007ff6a096ce10>=>1,
 #<FormatParser::PNGParser:0x00007ff6a1084ce8>=>1,
 #<FormatParser::PSDParser:0x00007ff6a0977c48>=>99,
 #<FormatParser::TIFFParser:0x00007ff6a0976460>=>99,
 #<FormatParser::WAVParser:0x00007ff69daaba68>=>99,
 #<FormatParser::ZIPParser:0x00007ff6a098c918>=>2}

But, when the priority is the same, the method sort does not guarantee that the order of the result will always be the same:

The result is not guaranteed to be stable. When the comparison of two elements returns 0, the order of the elements is unpredictable.

In fact, I think this explains almost everything!

@linkyndy
Copy link
Contributor

linkyndy commented Dec 8, 2020

But, when the priority is the same, the method sort does not guarantee that the order of the result will always be the same:

The result is not guaranteed to be stable. When the comparison of two elements returns 0, the order of the elements is unpredictable.

In fact, I think this explains almost everything!

That's where I wanted to get 😊

@fabioperrella fabioperrella requested a review from julik December 9, 2020 14:40
@julik
Copy link
Contributor

julik commented Dec 15, 2020

Still on the fence about this sorting using a class name. We specify in the documentation for example that a parser can be a Proc. A Proc will not give you a meaningful class name you can sort on, you also pretty much mandate that a parser must be an instance of a fairly unique class :thinking_face: while we can change the library to work this way it does make the API more complicated. How could we solve this differently I wonder?

For example, we add parsers in a certain priority order if a priority value is given. What if we were to assign a better priority value if none is specified, based on the number of parsers already configured?

@fabioperrella
Copy link
Contributor Author

Still on the fence about this sorting using a class name. We specify in the documentation for example that a parser can be a Proc. A Proc will not give you a meaningful class name you can sort on, you also pretty much mandate that a parser must be an instance of a fairly unique class :thinking_face: while we can change the library to work this way it does make the API more complicated. How could we solve this differently I wonder?

For example, we add parsers in a certain priority order if a priority value is given. What if we were to assign a better priority value if none is specified, based on the number of parsers already configured?

What if we change the method register_parser to sum 0.0001 to the priority if an equal priority is already registered?

Example:

FormatParser.register_parser(:a, 1)
FormatParser.register_parser(:b, 2)
FormatParser.register_parser(:c, 2)
FormatParser.register_parser(:d, 2)
FormatParser.register_parser(:e, 3)

result:

[
  [:a, 1], 
  [:b, 2], 
  [:c, 2.0001], 
  [:d, 2.0002], 
  [:e, 3]
]

@julik
Copy link
Contributor

julik commented Dec 15, 2020

So the issue that we have is that also parsers with the same priority get sorted undeterministically right?

@fabioperrella
Copy link
Contributor Author

So the issue that we have is that also parsers with the same priority get sorted undeterministically right?

yes, it is!

@julik
Copy link
Contributor

julik commented Dec 15, 2020

Ok, what if we record the order in which parsers get registered (and when doing a require we do it in a deterministic order by doing a sort after the glob)? Then we sort by two components - priority and within the priority the order of registration?

@fabioperrella
Copy link
Contributor Author

@julik could you take a look if this is the change that you expected pls?

Copy link
Contributor

@julik julik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 Sweet

@julik
Copy link
Contributor

julik commented Dec 16, 2020

I used Set initially to make it a no-op if a parser mistakenly gets registered twice. But maybe this was overly cautious, I dunno.

@linkyndy
Copy link
Contributor

I used Set initially to make it a no-op if a parser mistakenly gets registered twice. But maybe this was overly cautious, I dunno.

I think it made total sense given ordering was not important. Now that it is, an ordered set would make more sense. But it is good to be cautious with regards to this aspect 🙂

@fabioperrella
Copy link
Contributor Author

@linkyndy I think I replied to your question and now it's ready to be approved. Is there anything else that worries you?

@fabioperrella fabioperrella merged commit 84a01e2 into master Dec 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky tests when using results: :first
4 participants