Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize getting intersection points by doing integer arithmetic only #663

Conversation

reunanen
Copy link
Contributor

NB: I'm not really requesting to merge the code in this PR right now. Instead this is, at least for the time being, more of a side note with code related to the performance issue mentioned in #657.

While looking into #657 and performance, started to ask myself: if we anyway truncate the result (instead of rounding), is floating-point arithmetic even needed here?

At least on my machine, the integer-only arithmetic of this PR appears to speed up the benchmark code quite a bit (more than 2x, I think?).

It does change the results a little bit, though. In fact in some cases it's more accurate. Why? For example, in floating-point arithmetic 45.0 / 66.0 * 22.0 = 14.999999999999998 (which truncates to 14), but in plain integer arithmetic 22 * 45 / 66 = 15.

In fact, similar to this comment, it seems to me that it's not about avoiding floating-point arithmetic as such (the difference shouldn't be that remarkable, although that depends on the CPU of course), but more about how exactly the updated results end up being processed downstream. But I do still think that there may be something wrong with my profiler (or the way I'm using it).

In addition, there is of course the question of integer overflow. Does Clipper really handle 64-bit input coordinates? If yes, then the code in this PR won't work. But in that case there should also be a test case, I think. :)

@@ -1630,7 +1630,7 @@ CAPTION: 166.
CLIPTYPE: UNION
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, isn't this test case 166 a duplicate of 165?

@reunanen
Copy link
Contributor Author

Just for the record, running the benchmark program appears to call EdgesAdjacentInAEL as follows (depending on the GetIntersectPoint version):

  • The version in this PR: some 8 to 9 billion times
  • The current main branch version: some 32 to 34 billion times
  • The (rounding) version of PR 657: some 53 to 56 billion times

Maybe my profiler is fine, and the sheer number of times that EdgesAdjacentInAEL is called explains most of the observed performance differences. But why do innocuous-looking changes in GetIntersectPoint cause this function to be called many more times (as in #657) or much less (this PR)? That I do not know, at least just yet.

@reunanen
Copy link
Contributor Author

Ok reading this discussion, I understand that at the moment, the safe range for input coordinates is way beyond ±INT32_MAX. And that there's generally a lot to this. 😅

So it's still not like I'm proposing to merge this PR. As said, the intention of this PR was always to discuss the source of the apparent perf issue mentioned in #657. To make that clear, I'm closing this PR now.

Anyway, I need a version that behaves exactly the same in C++ and C#. And why not across different CPU hardware and different compilers, too – at least when the input coordinates can be represented as 32-bit integers or so. I understand that this may not be possible without at least some perf hit, and I'm willing to consider taking such hits if need be. We currently have a fork that essentially combines #657 and #660 and gives identical results in our (currently very limited) testing, so we're good for the time being. But ideally we would want to use the main version and not a fork.

@reunanen reunanen closed this Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant