Firestore.Decoder is slow compared to JSONDecoder #13849
I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.
I'm not working at Google, but I was a contributor of some of this functionality, so I have some input. :-)

The encoder and decoder shipped with Firebase are based on the open source implementation that was part of Swift until it was recently replaced by a much more performant implementation. The 'old' Swift implementation of `JSONDecoder` started out by running `JSONSerialization` over the raw bytes to produce a structural representation of nested dictionaries and arrays, and then decoded from that. Since both Firestore and the Realtime Database have similar structural representations (nested dictionaries and arrays), the decoder in Firebase is basically a copy of what Swift used to do, but it skips the initial conversion step, since a document's data is already in that structural form. So this is basically what it is. There are also some details in that the structural representation in Firestore can contain types that are not from JSON - like the built-in `Timestamp` type.

Since then, Foundation's `JSONDecoder` has been re-implemented to parse the JSON bytes directly into a compact intermediate representation, skipping `JSONSerialization` entirely. This representation is then used by the re-implemented decoding containers. So - that's the state of the current decoder - and perhaps partly an explanation of why the built-in `JSONDecoder` is so much faster. Would it be possible to use the newer Foundation implementation in Firebase too?

With regard to your question of parallelization, there's nothing in the actual decoder that does any form of synchronization. So I'm guessing that it's actually the accessing of the value on a Firestore document snapshot that does some synchronization. One guess would be that it fetches data from a shared caching layer, and access to that could require synchronization. You could try instrumenting your sample code above to see where most of the time is being spent in both the serial and parallel examples - perhaps there are some low-hanging fruits?

All of this is just my 2 cents. :-)
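To make the two-phase shape described above concrete, here is a minimal sketch (my own illustration with a made-up `Item` type, not Firebase's actual code): phase one turns `Data` into a `[String: Any]` tree via `JSONSerialization`, phase two walks that tree. `Firestore.Decoder` effectively starts at phase two, since a snapshot's `data()` already *is* such a tree.

```swift
import Foundation

// Hypothetical model type used only for this illustration.
struct Item {
    let name: String
    let count: Int
}

func decodeItem(from data: Data) throws -> Item {
    // Phase 1: Data -> structural representation ([String: Any]).
    let object = try JSONSerialization.jsonObject(with: data)
    guard let dict = object as? [String: Any] else {
        throw CocoaError(.coderReadCorrupt)
    }
    // Phase 2: walk the tree, casting each value to the expected type.
    guard let name = dict["name"] as? String,
          let count = dict["count"] as? Int else {
        throw CocoaError(.coderValueNotFound)
    }
    return Item(name: name, count: count)
}

let json = #"{"name": "widget", "count": 3}"#.data(using: .utf8)!
let item = try! decodeItem(from: json)
print(item.name, item.count) // widget 3
```

The real decoders generalize phase two through the `Decodable` container machinery, but the cost structure is the same: lots of dynamic casts over heap-allocated dictionaries and arrays.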
@collinhundley, I'm marking this as a feature request for a more performant Firestore.Decoder. We will continue to listen for community feedback to help us prioritize the effort. In the meantime, your approach of reducing the time to decode seems reasonable. Are you able to instrument your code as @mortenbekditlevsen suggested, or even add some simple logging to confirm that your tasks are indeed running in parallel? Are you running these tests on a simulator or on an actual device?
@mortenbekditlevsen thank you for the detailed response. With regards to parallelization, I did some further testing which leads me to believe that document access is not the bottleneck here. We can remove document access from the equation by converting all documents to dictionaries up front:

```swift
let dictStart = CACurrentMediaTime()
let dictionaries = documents.map { $0.data() }
print(CACurrentMediaTime() - dictStart) // 0.75s

let chunkedDictionaries = dictionaries.chunked(into: 5000) // [[[String: Any]]]

let decodeStart = CACurrentMediaTime()
let items = await withTaskGroup(of: [Item].self) { group in
    for chunk in chunkedDictionaries {
        group.addTask {
            let decoder = Firestore.Decoder()
            return chunk.compactMap { try? decoder.decode(Item.self, from: $0) }
        }
    }
    var items = [Item]()
    for await result in group {
        items.append(contentsOf: result)
    }
    return items
}
print(CACurrentMediaTime() - decodeStart) // 10.9s
```

So 94% of the time is spent on actual decoding. Profiling confirms that 10 threads are fully utilized concurrently on my M1 Max. Here's a sample from one of these threads, expanded a few levels:

I'm a little stumped - any other ideas where synchronization might be occurring?
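As an aside, the sample above relies on a `chunked(into:)` helper that Swift's standard library doesn't provide. One common implementation (my sketch, not necessarily the poster's exact code) is:

```swift
extension Array {
    /// Splits the array into consecutive slices of at most `size` elements.
    /// The final chunk holds whatever remainder is left over.
    func chunked(into size: Int) -> [[Element]] {
        guard size > 0 else { return [self] }
        return stride(from: 0, to: count, by: size).map {
            Array(self[$0 ..< Swift.min($0 + size, count)])
        }
    }
}

let chunks = Array(1...10).chunked(into: 4)
print(chunks) // [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```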
What do you see if you print elapsed time at the beginning of each task? Or maybe print a start and stop identifier for each task?
Ok, another update: By manually testing a variety of batch sizes, I am able to squeeze out some extra performance by reducing the number of tasks to 4 (with 12,500 documents each). The decoding time is reduced to 5.2s - down from the original 9.6s single thread. So I was wrong to say that the decoder can't be parallelized at all. However, there seems to be a lot of performance left on the table... I originally chose 10 tasks to correspond to my 10 M1 Max cores, and I know that's the limit Swift's cooperative thread pool will run concurrently. Are we maxing out some other resource then? I can't manually tweak the batch size for every CPU out there... it would be preferable to let the system choose the degree of parallelism.
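One way to avoid hard-coding the batch size is to derive it from the machine's active core count at runtime. This is a sketch under my own assumptions (the poster found 4 tasks faster than 10, so core count isn't necessarily the optimum, but it at least removes the magic constant):

```swift
import Foundation

// Hypothetical helper: split N documents evenly across the machine's
// active cores, rounding up so the final chunk absorbs the remainder.
func batchSize(for documentCount: Int) -> Int {
    let cores = max(1, ProcessInfo.processInfo.activeProcessorCount)
    return (documentCount + cores - 1) / cores
}

let size = batchSize(for: 50_000)
print(size) // e.g. 5000 on a 10-core machine
```

Each task then gets one chunk of `batchSize(for:)` elements, so the number of tasks matches the hardware without any per-device tuning.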
@MarkDuckworth I've done exactly this, and interestingly they all start and stop at nearly the same time:
Wouldn't you expect the efficiency cores to take a bit longer than the rest? |
Description

Decoding an array of `QueryDocumentSnapshot` using `Firestore.Decoder` is very slow compared to decoding a similar array using `JSONDecoder`.

What's more is that we don't gain any performance from parallelization. For example, if we split `documents` into 10 arrays of 5k documents each, and initialize one `Firestore.Decoder` for each group, the total time is worse: about 10.3s.

So there are 2 issues:

1. Why is `Firestore.Decoder` 4x slower than `JSONDecoder`?
2. Does `Firestore.Decoder` utilize some shared internal state that prevents instances from running independently on different threads?

Reproducing the issue
No response
Firebase SDK Version
10.29
Xcode Version
16
Installation Method
Swift Package Manager
Firebase Product(s)
Firestore
Targeted Platforms
iOS, macOS
Relevant Log Output
No response