
document deepcopy is slow, especially with large documents #83

Open
rsjbailey opened this issue May 5, 2021 · 4 comments
@rsjbailey
Contributor

Document::deepCopy() performs poorly, particularly with large documents.

Two problems have been identified:

  • copy.cpp contains a lot of pass-by-value; passing std::map this way is particularly expensive
  • adding an element to a document performs a duplicate check by doing a linear vector search (O(n)) for the element ID. During a copy this happens for every ID in the original document, so the copy is O(n^2) (or close to it, since n grows as the document is copied)

The first is fairly trivial to fix.
The second could be fixed by keeping document ids in a container with better than O(n) lookup (e.g. unordered_map) or maybe keeping them sorted and doing a binary search.

@rsjbailey rsjbailey self-assigned this May 5, 2021
@tomjnixon
Member

I don't think keeping them sorted works unless you use a really fancy container, because insertion into a random access container is O(n), and binary search over a list is even worse.

@tomjnixon
Member

Looking at this a bit, it seems like the second problem only occurs for the top-level objects, not the audioBlockFormats, right?

There are not normally that many top-level elements, so an O(n^2) there shouldn't be an issue unless each iteration is really inefficient.

@rsjbailey
Contributor Author

Yeah, I almost added an "it's not as bad as it looks" comment because blockFormats aren't affected. For most 'normal' uses it's fine, which is why it hasn't really cropped up. When I was testing it, 500 objects was noticeably slow, and that isn't an inconceivable number for more adventurous things.

The use case that originally brought it up was sADM; I think the prototype implementation was doing a copy every time a new block (?, not sure of the term) came in, or something like that, possibly indirectly. I can't remember the specifics. Anyway, that meant it needed to be fast. A different approach to the sADM implementation might be a better solution, but copying a document feels like it shouldn't be a particularly heavyweight thing to do.

Good point on sorting vectors. Other alternatives that popped into my head include adding a batch add (so you could just sort and merge two vectors), or disabling the check entirely during a copy: if you're starting from a valid document and copying into an empty one, there shouldn't be any duplicates.

@tomjnixon
Member

Ah right. I'd still be tempted to benchmark it after sorting out the extra copies.

Otherwise I like the approach of just speeding up add, as that will help in other situations too.
