-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Do not respect synthetic_source_keep=arrays if type parses arrays #127796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not respect synthetic_source_keep=arrays if type parses arrays #127796
Conversation
Hi @parkertimmins, I've created a changelog YAML for you. |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Should we add the |
Hi @parkertimmins, I've updated the changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think offset-based implementation would definitely be a better solution than "do nothing" approach we currently have. We can still do this later though and this PR does not block that work. So i am in favour of merging this.
Should we add a note about this here https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/geo-point#geo-point-synthetic-source?
@@ -460,7 +460,7 @@ static void parseObjectOrField(DocumentParserContext context, Mapper mapper) thr | |||
if (context.canAddIgnoredField() | |||
&& (fieldMapper.syntheticSourceMode() == FieldMapper.SyntheticSourceMode.FALLBACK | |||
|| sourceKeepMode == Mapper.SourceKeepMode.ALL | |||
|| (sourceKeepMode == Mapper.SourceKeepMode.ARRAYS && context.inArrayScope()) | |||
|| (sourceKeepMode == Mapper.SourceKeepMode.ARRAYS && context.inArrayScope() && parsesArrayValue(mapper) == false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a correct fix in my opinion. But now that you mention offsets i actually wonder how does this work if offsets are enabled. It seems like (sourceKeepMode == Mapper.SourceKeepMode.ARRAYS && context.inArrayScope()
is true and we'll proceed with storing the value in ignored source. But that seems wrong since we should be using offsets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we enable offsets, parsesArrayValue will still be true for geo_point, and we won't use ignored source, right? I'm not totally sure if this is safe though. Maybe by adding offsets, geo_point will be subject to the same issue that other types have when they are in an array context. In which case we would want to use ignored_source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, i was asking about existing fields that implement offsets logic like ip
. Does it not have the same problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think existing fields with offsets do have this issue, and need to be use stored values. For example, if
elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java
Line 463 in b16fb08
|| (sourceKeepMode == Mapper.SourceKeepMode.ARRAYS && context.inArrayScope() && parsesArrayValue(mapper) == false) |
The followings:
curl -XPUT localhost:9200/idx7 -H 'Content-Type: application/json' -d'
{
"settings": {
"index": {
"mapping": {
"source": {
"mode": "synthetic"
}
}
}
},
"mappings": {
"properties": {
"obj": {
"properties": {
"str": {
"synthetic_source_keep": "arrays",
"type": "keyword"
}
}
}
}
}
}
'
curl -XPOST localhost:9200/idx7/_doc -H 'Content-Type: application/json' -d'
{
"obj": [
{
"str": ["cat", "dog"]
},
{
"str": "cat"
}
]
}
'
Results in:
"_source": {
"obj": {
"str": [
"cat",
"dog"
]
}
}
Meaning we do need the stored source. Or is that not what you're talking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - Let's also backport this to 8.19 branch?
"point": { "type": "geo_point" } | ||
"point": { | ||
"type": "geo_point", | ||
"synthetic_source_keep": "arrays" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to flip the points here to emphasize that order is different? Right now it happens to be a correct order.
Co-authored-by: Oleksandr Kolomiiets <[email protected]>
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…astic#127796) Types that parse arrays directly should not need to store values in _ignored_source if synthetic_source_keep=arrays. Since they have custom handling of arrays, it provides no benefit to store in _ignored_source when there are multiple values of the type. (cherry picked from commit c04a956)
…astic#127796) Types that parse arrays directly should not need to store values in _ignored_source if synthetic_source_keep=arrays. Since they have custom handling of arrays, it provides no benefit to store in _ignored_source when there are multiple values of the type.
Types that parse arrays directly should not need to store values in
_ignored_source
ifsynthetic_source_keep=arrays
. Since they have custom handling of arrays, it provides no benefit to store in_ignored_source
when there are multiple values of the type.For example, consider the document with
point
field of typegeo_point
:This set of points is not subject to
synthetic_source_keep=arrays
since thegeo_point
mapper does custom array parsing. Since this is the case, it does not make sense to require that the following equivalent document use_ignored_source
:Currently, this second document does use
_ignored_source
, which is a waste of space. This PR causes the second document to behave the same as the first, not using_ignored_source
.Fixes #126155