
document deepcopy is slow, especially with large documents #83

Open
rsjbailey opened this issue May 5, 2021 · 4 comments
@rsjbailey
Contributor

Document::deepCopy() performs poorly, particularly with large documents.

Two problems have been identified:

  • copy.cpp contains a lot of pass-by-value; passing std::map this way is particularly expensive
  • adding an element to a document performs a duplicate check by doing a linear vector search (O(n)) for the element ID. During a copy this happens for every ID in the original document, so the copy is O(n^2) (or close to it, since n grows as the document is copied)

The first is fairly trivial to fix.
The second could be fixed by keeping document ids in a container with better than O(n) lookup (e.g. unordered_map) or maybe keeping them sorted and doing a binary search.

@rsjbailey rsjbailey self-assigned this May 5, 2021
@tomjnixon
Member

I don't think keeping them sorted works unless you use a really fancy container, because insertion into a random access container is O(n), and binary search over a list is even worse.

@tomjnixon
Member

Looking at this a bit, it seems like the second problem only occurs for the top-level objects, not the audioBlockFormats, right?

There are not normally that many top-level elements, so an O(n^2) there shouldn't be an issue unless each iteration is really inefficient.

@rsjbailey
Contributor Author

Yeah, I almost added an "it's not as bad as it looks" comment because blockFormats aren't affected. For most 'normal' uses it's fine, which is why it hasn't really cropped up. When I was testing it, 500 objects was noticeably slow, and that isn't an inconceivable number for more adventurous things.

The use case that originally brought it up was sADM; I think the prototype implementation was doing a copy every time a new block (?, not sure of the term) came in, or something like that, possibly indirectly. I can't remember the specifics. Anyway, that meant it needed to be fast. A different approach to the sADM implementation might be a better solution, but copying a document feels like it shouldn't be a particularly heavyweight thing to do.

Good point on sorting vectors. Other alternatives that popped into my head include adding a batch add (so you could just sort and merge two vectors), or disabling the check entirely during a copy: if you're starting from a valid document and copying into an empty one, there shouldn't be any duplicates.

@tomjnixon
Member

Ah right. I'd still be tempted to benchmark it after sorting out the extra copies.

Otherwise I like the approach of just speeding up add, as that will help in other situations too.
