Fix splay tree amortized time issues (resolves #923) #924

Open
wants to merge 4 commits into base: master

@matthewsot matthewsot commented Feb 22, 2025

Per discussion in #923, this PR ensures that the splay tree's search_node! now always splays the last-seen node. This means that push! will reorganize the tree even if the pushed key is already present, and haskey will reorganize the tree even if the key is not present. This resolves a performance issue. I also added some basic tree benchmarks.
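To illustrate the behavior described above, here is a minimal Python sketch of a splay tree whose search always splays the last node it visited, hit or miss. It is not the DataStructures.jl code: the class and method names (`SplayTree`, `search_node`, `push`, `haskey`, `_rotate_up`, `_splay`) are assumptions chosen to mirror the names used in this PR.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None


class SplayTree:
    """Illustrative splay tree; hypothetical, not the Julia implementation."""

    def __init__(self):
        self.root = None

    def _rotate_up(self, x):
        # Rotate x above its parent while preserving BST order.
        p = x.parent
        g = p.parent
        if p.left is x:
            p.left = x.right
            if x.right is not None:
                x.right.parent = p
            x.right = p
        else:
            p.right = x.left
            if x.left is not None:
                x.left.parent = p
            x.left = p
        p.parent = x
        x.parent = g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x

    def _splay(self, x):
        # Move x to the root using zig, zig-zig, and zig-zag steps.
        while x.parent is not None:
            p = x.parent
            g = p.parent
            if g is not None:
                if (g.left is p) == (p.left is x):
                    self._rotate_up(p)  # zig-zig: rotate the parent first
                else:
                    self._rotate_up(x)  # zig-zag: rotate x first
            self._rotate_up(x)

    def search_node(self, key):
        # Return the last visited node and splay it, even on a miss.
        node, last = self.root, None
        while node is not None:
            last = node
            if key == node.key:
                break
            node = node.left if key < node.key else node.right
        if last is not None:
            self._splay(last)
        return last

    def haskey(self, key):
        last = self.search_node(key)
        return last is not None and last.key == key

    def push(self, key):
        node, last = self.root, None
        while node is not None:
            last = node
            if key == node.key:
                self._splay(node)  # a redundant push still reorganizes
                return
            node = node.left if key < node.key else node.right
        new = Node(key)
        new.parent = last
        if last is None:
            self.root = new
        elif key < last.key:
            last.left = new
        else:
            last.right = new
        self._splay(new)
```

In this sketch, pushing keys in increasing order leaves a long left chain. Without the splay on a failed lookup or a redundant push, every repeated query for a deep key would walk that whole chain again, which is the kind of behavior that can break the amortized bound.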

In addition, I think I found a correctness bug in delete!, where it doesn't check that the node returned by search_node! actually has the requested key (search_node! seems to return the last visited node during the traversal, whether or not it was actually the node we searched for). I fixed this and added a regression test. You can check that it fails on the old splay tree implementation using this branch: https://github.com/matthewsot/DataStructures.jl/tree/regression
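The delete issue can be sketched in a few lines of Python on a plain binary search tree. This is only a model of the bug as described above, not the Julia code: all names here (`search_last_visited`, `remove_key`, `delete_unchecked`, `delete_checked`) are hypothetical, and splaying is left out to keep the focus on the missing key check.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def insert(root, key):
    # Plain BST insert; returns the (possibly new) root.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def search_last_visited(root, key):
    # Returns the last node visited, whether or not it holds `key`.
    node, last = root, None
    while node is not None:
        last = node
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return last


def remove_key(root, key):
    # Standard BST removal; returns the new root.
    if root is None:
        return None
    if key < root.key:
        root.left = remove_key(root.left, key)
    elif key > root.key:
        root.right = remove_key(root.right, key)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = remove_key(root.right, succ.key)
    return root


def delete_unchecked(root, key):
    # Models the bug: trusts that the returned node holds `key`.
    node = search_last_visited(root, key)
    if node is None:
        return root
    return remove_key(root, node.key)


def delete_checked(root, key):
    # Models the fix: only delete when the keys actually match.
    node = search_last_visited(root, key)
    if node is None or node.key != key:
        return root
    return remove_key(root, key)


def keys(root):
    # In-order traversal, used to inspect the tree contents.
    return [] if root is None else keys(root.left) + [root.key] + keys(root.right)
```

With the keys 5, 3, 8 inserted in that order, asking the unchecked version to delete 4 removes 3 instead, because 3 is the last node the search visited. The checked version leaves the tree untouched.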

A few things that came up in the process that I'm not 100% sure how you all want to handle:

  1. It seems like search_node was part of the public API, but it has now been renamed to search_node! for splay trees, as suggested in the discussion in #923, which could be a breaking change. Is that intended?
  2. splay_tree.jl had a few places using == nothing and a few other places using === nothing. I'm not totally sure what the difference is, so I tried to basically just do whatever the most similar code was doing before.

Feedback welcome!

Fixes two bugs in the splay tree implementation:

    1. Not splaying on failed searches or redundant insertions can break
       the O(log n) amortized time guarantee.

       Fix: Always splay at the end of search_node!

    2. Deletion did not check that the node returned from search_node!
       had the desired key, so trying to delete keys not in the splay
       tree could cause it to accidentally delete unrelated keys.

       Fix: Check node.data for the return value of search_node!, as is
       done already for other operations.

I've also added a regression test for bug (2).
@oxinabox oxinabox requested a review from eulerkochy February 24, 2025 11:51
@matthewsot
Author

Thanks for the feedback, everyone!

Regarding benchmark results, here they are on my machine for the current master branch (without this PR): https://github.com/matthewsot/DataStructures.jl/blob/regression/old.md

Notice ["trees", "splay", "redundant"] takes almost 45s, whereas AVL and RB are in the millisecond range.

I'll post final benchmark results for this PR after resolving the other issues, but from early results it seems to bring the splay tree performance into the millisecond range as well.

Member

@eulerkochy eulerkochy left a comment

LGTM. Add the comment regarding the un-! being the user-visible state, and then we can merge. Also post the final benchmark results.

@matthewsot
Author

Thanks for all the feedback!

In the latest commit I've added a comment about how search_node doesn't change the user-facing state of the set. I've also attached both the old and new benchmark runs to this comment. The main thing that changed is the ~1000x speedup in the ["trees", "splay", "redundant"] test.

old_benchmarks.md
new_benchmarks.md
