We're about to roll out LogsDB for all integrations. LogsDB uses synthetic _source. The result is that _source may differ from the original one in several ways. For example, the ordering of arrays is not preserved and values in an array are de-duplicated (internally arrays are stored in a sorted set).
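To make the loss concrete, here is a minimal Python sketch that models a keyword array as a sorted set. It's a simplification, not Elasticsearch's implementation, and the `cp` command line is only an illustrative `process.args` value.

```python
def synthesize_array(values: list[str]) -> list[str]:
    """Simplified model of how synthetic _source returns a keyword array.

    Values are kept in a sorted set, so duplicates collapse and the original
    order is lost. The exact sort order in Elasticsearch depends on the field
    type; this sketch models only those two effects.
    """
    return sorted(set(values))


# A process.args value where order carries meaning (source, then destination).
original = ["cp", "b.txt", "a.txt"]
print(synthesize_array(original))  # ['a.txt', 'b.txt', 'cp']
```

After the round trip you can no longer tell which file was copied onto which.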
I'd like to propose that ECS define the fields for which ordering is important, so that `store_array_source` can be enabled for them. This comes with a storage overhead but allows us to return the original values.
An example of a field where ordering is important is `process.args` (`ecs/schemas/process.yml`, lines 143 to 153 at commit 5376570).
Ordering isn't always important, though. For example, I don't think the storage tradeoff is worth it for `process.thread.capabilities.permitted` (`ecs/schemas/process.yml`, lines 205 to 215 at commit 5376570). What matters there is the set of capabilities a thread permits, not their order.
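A minimal Python sketch of the proposal, assuming a hypothetical per-field `ordered` attribute in the ECS definitions (not an existing ECS schema key). Fields marked as ordered get `store_array_source` enabled in the generated mapping. The parameter name comes from this issue; where exactly it belongs in a mapping, and how current Elasticsearch releases spell it, is an assumption here.

```python
from typing import Any

# Hypothetical ECS field definitions. "ordered" is an assumed attribute that
# illustrates the proposal; it is not part of the real ECS schema format.
ECS_FIELDS: dict[str, dict[str, Any]] = {
    "process.args": {"type": "keyword", "ordered": True},
    "process.thread.capabilities.permitted": {"type": "keyword", "ordered": False},
}


def build_mapping(fields: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Build a nested Elasticsearch-style mapping from dotted ECS field names.

    Fields flagged as ordered get store_array_source enabled (assumed
    placement on the leaf field).
    """
    root: dict[str, Any] = {"properties": {}}
    for dotted_name, spec in fields.items():
        *parents, leaf = dotted_name.split(".")
        node = root
        for parent in parents:
            # Descend into (or create) the object mapping for this level.
            node = node["properties"].setdefault(parent, {"properties": {}})
        field_mapping: dict[str, Any] = {"type": spec["type"]}
        if spec.get("ordered"):
            field_mapping["store_array_source"] = True
        node["properties"][leaf] = field_mapping
    return {"mappings": root}
```

Only `process.args` pays the storage overhead; the capabilities field keeps the cheaper set semantics.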
++ on the idea that Elasticsearch is smart about whether a field contains an array and how to store it. If you send an array, ordering is preserved without having to specify any special mappings. This also means users don't have to learn a new concept.
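A minimal Python sketch of what that per-document decision could look like, assuming the original array is kept only when the simplified sorted-set normalization would change it. `needs_array_source` is a hypothetical name, and skipping arrays that are already sorted and unique is an extra assumption on top of the comment; this is not how Elasticsearch decides internally.

```python
from typing import Any


def normalize(values: list[str]) -> list[str]:
    # Simplified model of synthetic _source for arrays: sorted and de-duplicated.
    return sorted(set(values))


def needs_array_source(value: Any) -> bool:
    """Hypothetical check: keep the original array only when synthetic
    _source would otherwise return something different."""
    if not isinstance(value, list):
        # Single values round-trip unchanged, so nothing extra is stored.
        return False
    return normalize(value) != value
```

With this rule, a single-valued field or an already sorted, unique array costs nothing extra, while `["cp", "b.txt", "a.txt"]` would be stored as sent.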
👍
I like this approach. Then we can turn the question around and ask which ECS array fields we can optimize by never storing their original array source, because they are truly sets.
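The inverted approach can be sketched in Python the same way: arrays come back as sent by default, and only fields on a hypothetical opt-out list of set-valued fields are collapsed to a sorted, de-duplicated set. `SET_VALUED_FIELDS` and `returned_source` are illustrative names, and the capability values are example data.

```python
from typing import Any

# Hypothetical opt-out list: ECS fields whose arrays are treated as sets,
# so their original array source does not need to be stored.
SET_VALUED_FIELDS = frozenset({"process.thread.capabilities.permitted"})


def returned_source(
    doc: dict[str, Any], set_fields: frozenset[str] = SET_VALUED_FIELDS
) -> dict[str, Any]:
    """Model what a reader gets back for a flat, dotted-key document.

    Arrays in set-valued fields come back sorted and de-duplicated; every
    other value is returned exactly as sent.
    """
    result: dict[str, Any] = {}
    for name, value in doc.items():
        if name in set_fields and isinstance(value, list):
            result[name] = sorted(set(value))
        else:
            result[name] = value
    return result


doc = {
    "process.args": ["cp", "b.txt", "a.txt"],
    "process.thread.capabilities.permitted": ["CAP_SETUID", "CAP_CHOWN", "CAP_CHOWN"],
}
print(returned_source(doc))
```

The appeal of this direction is that forgetting to list a field costs storage rather than correctness.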