The feature request for esbulk is to automate the speed-up from batching documents by destination shard, so that users do not need to re-sort or partition documents themselves. Some unstructured thoughts about this:
Probably controlled by a CLI flag. Could esbulk fetch index settings from the cluster, e.g. the number of shards?
Could require a _routing field in documents, or fall back to _id or a key field if set.
esbulk could partition documents across the existing worker threads. I think this might "just work" even if the number of worker threads is not equal to the number of index shards, but it would probably work better if each batch targeted a single shard.
Or, esbulk could keep per-shard caches internally and, when any individual shard cache reaches the bulk batch size, send that batch to a worker thread. This would increase memory consumption, particularly with large documents and a large number of shards, but that might be fine.
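To make the per-shard cache idea concrete, here is a minimal sketch in Python (esbulk itself is written in Go, so this is only an illustration, not a proposed patch). Everything named here is hypothetical: `routing_key`, `shard_for`, `ShardBatcher`, the `id_field` parameter, and the `send` callback standing in for handing a batch to a worker. Note in particular that `shard_for` uses CRC32 as a stand-in hash. Elasticsearch routes with its own Murmur3-based function, so a real implementation would have to reproduce that for the batches to line up with actual shards.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def routing_key(doc: dict, id_field: Optional[str] = None) -> str:
    """Pick the routing key: _routing first, then _id, then an optional key field."""
    for field in ("_routing", "_id", id_field):
        if field and field in doc:
            return str(doc[field])
    raise KeyError("document has no _routing, _id, or key field")


def shard_for(key: str, num_shards: int) -> int:
    """Map a routing key to a shard number.

    ASSUMPTION: CRC32 is a stand-in hash for illustration only. It does not
    match the hash Elasticsearch uses for routing.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardBatcher:
    """Buffer documents per shard and emit a batch once a buffer is full."""

    def __init__(
        self,
        num_shards: int,
        batch_size: int,
        send: Callable[[int, List[dict]], None],
        id_field: Optional[str] = None,
    ) -> None:
        self.num_shards = num_shards
        self.batch_size = batch_size
        self.send = send  # hypothetical hand-off to a worker thread
        self.id_field = id_field
        self.buffers: Dict[int, List[dict]] = defaultdict(list)

    def add(self, doc: dict) -> None:
        shard = shard_for(routing_key(doc, self.id_field), self.num_shards)
        buf = self.buffers[shard]
        buf.append(doc)
        if len(buf) >= self.batch_size:
            # This buffer is full: send it and start a fresh one for the shard.
            self.send(shard, buf)
            self.buffers[shard] = []

    def flush(self) -> None:
        """Send whatever is left in each buffer, e.g. at end of input."""
        for shard, buf in sorted(self.buffers.items()):
            if buf:
                self.send(shard, buf)
        self.buffers.clear()
```

Memory use in this sketch is bounded by roughly `num_shards * batch_size` buffered documents, which is the trade-off mentioned above. A final `flush()` is needed because some shard buffers will usually be partially filled when the input ends.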
For background, this Elasticsearch blog post implies that indexing batches of documents that all go to the same shard improves performance: https://www.elastic.co/blog/how-kenna-security-speeds-up-elasticsearch-indexing-at-scale-part-1