-
Notifications
You must be signed in to change notification settings - Fork 454
[CDRIVER-6017] BSON Validation Refactor #2026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CDRIVER-6017] BSON Validation Refactor #2026
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left minor comments. I found the test generator easy to read and modify. And I like the move away from bson_iter_visit_all
. LGTM
src/libbson/src/bson/validate.c
Outdated
return false; | ||
} | ||
|
||
// Validate an element's key string according to the valiation rules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Validate an element's key string according to the valiation rules | |
// Validate an element's key string according to the validation rules |
src/libbson/src/bson/validate.c
Outdated
// Check for special keys | ||
if (self->params->check_special_dollar_keys) { | ||
// dollar-keys are checked during the startup of _validate_doc. If we get here, there's a problem. | ||
require_with_error (key[0] != '$', iter->off, BSON_VALIDATE_DOLLAR_KEYS, "Disallowed element key: \"%s\"", key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require_with_error (key[0] != '$', iter->off, BSON_VALIDATE_DOLLAR_KEYS, "Disallowed element key: \"%s\"", key); | |
require_with_error (key[0] != '$', iter->off, BSON_VALIDATE_DOLLAR_KEYS, "Disallowed '$' in element key: \"%s\"", key); |
Suggest including offending character in error.
src/libbson/src/bson/validate.c
Outdated
} | ||
|
||
if (!self->params->allow_dot_in_keys) { | ||
require_with_error (!strstr (key, "."), iter->off, BSON_VALIDATE_DOT_KEYS, "Disallowed element key: \"%s\"", key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require_with_error (!strstr (key, "."), iter->off, BSON_VALIDATE_DOT_KEYS, "Disallowed element key: \"%s\"", key); | |
require_with_error (!strstr (key, "."), iter->off, BSON_VALIDATE_DOT_KEYS, "Disallowed '.' in element key: \"%s\"", key); |
The new `bson_validate` implementation does not make use of the error-prone `bson_visit` APIs. Instead, it is written as a simple recursive validator. The new validator respects requests for UTF-8 validation properly.
The existing test cases used BSON files, and didn't have any commentary on what they were actually testing. New test cases are generated from a Python shorthand and contain the tested bytes inline, with a distinct test case for each actual validation scenario.
These include small changes in spelling, param assertions, and small cleanups Co-authored-by: Kevin Albertson <[email protected]> Co-authored-by: Ezra Chung <[email protected]>
- Add `bson/validate-private.h` to contain the declaration of the validation impl. - Add BSON_VALIDATE_MAX_NESTING_DEPTH to represent the validation limit. - Change the nesting limit to 100, to match the behavior of Server
7a48211
to
f0f60b5
Compare
Accidental rebase plz ignore. I believe I've address all outstanding change requests, but may have missed some as the GitHub UI has been very uncooperative this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor feedback remaining; otherwise, LGTM.
src/libbson/src/bson/validate.c
Outdated
BSON_ASSERT_PARAM (bson); | ||
|
||
// The depth limit of 100 is chosen to match the limit enforced by MongoDB server. | ||
// Refer: https://www.mongodb.com/docs/manual/reference/limits/#mongodb-limit-Nested-Depth-for-BSON-Documen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Refer: https://www.mongodb.com/docs/manual/reference/limits/#mongodb-limit-Nested-Depth-for-BSON-Documen | |
// Refer: https://www.mongodb.com/docs/manual/reference/limits/#mongodb-limit-Nested-Depth-for-BSON-Documents |
End of URL is clipped.
* @return true If the given document has no validation errors | ||
* @return false Otherwise | ||
*/ | ||
extern bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extern bool | |
bool |
extern
should (now) be unnecessary.
src/libbson/src/bson/bson-types.h
Outdated
* The depth limit of 100 is chosen to match the limit enforced by MongoDB server. | ||
* Refer: https://www.mongodb.com/docs/manual/reference/limits/#mongodb-limit-Nested-Depth-for-BSON-Document | ||
*/ | ||
BSON_VALIDATION_MAX_NESTING_DEPTH = 100, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BSON_VALIDATION_MAX_NESTING_DEPTH = 100, | |
BSON_VALIDATE_MAX_NESTING_DEPTH = 100, |
All other validation-related constants use a prefix of BSON_VALIDATE_*
.
Do we really need to expose this in the public API? Can it be moved to <bson/validate-private.h>
instead? Test code is able to include private headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it in the public header because I think it might be useful to users to know the limit. Because it's public, I intentionally chose the differing name to prevent a user from thinking that this is meaningfull:
bson_validate(d, BSON_VALIDATE_UTF8 | BSON_VALIDATE_MAX_NESTING_DEPTH, NULL);
It may actually be better to have a function bson_get_max_nesting_depth()
, in case it is changed in the future and may vary depending on what library they load at runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation.
It might be useful to users to know the limit.
I'm not sure about that, but in that case, perhaps BSON_MAX_NESTING_DEPTH
via bson/bson.h
(alongside BSON_MAX_SIZE
) might be better, since it is not a limit that is restricted to validation only? (Apparently we have #define BCON_STACK_MAX 100
in bson/bcon.h
, but that doesn't seem to be meant to be used outside of the component.)
It may actually be better to have a function
bson_get_max_nesting_depth()
I do not expect the limit to change anytime soon, so even if we expose it in public API, I do not mind leaving it as a compile-time constant.
Note for future readers: The validation limit was set to 1000, as it is possible that users want to use documents deeper than are accepted by the server, and it is also possible that the server can accept documents with deeper nesting levels. The purpose of the limit is mostly to prevent deep recursion, not to prevent invalid data. |
* New BSON validation routine rewrite The new `bson_validate` implementation does not make use of the error-prone `bson_visit` APIs. Instead, it is written as a simple recursive validator. The new validator respects requests for UTF-8 validation properly. * Stop validating at 1000 depth, preventing stack overflow * Replace most BSON validation tests with generated ones The existing test cases used BSON files, and didn't have any commentary on what they were actually testing. New test cases are generated from a Python shorthand and contain the tested bytes inline, with a distinct test case for each actual validation scenario. * Disable UTF-8 validation by default on CRUD APIs * Document and tweak the value of BSON_VALIDATE_CORRUPT * Add test cases related to the overlong null encoding * Tweak JS scope validation to permit more obj keys * Add a NEWS entry for validation changes. * Allow `-private.h` headers to not include the prelude header --------- Co-authored-by: Kevin Albertson <[email protected]> Co-authored-by: Ezra Chung <[email protected]>
* New BSON validation routine rewrite The new `bson_validate` implementation does not make use of the error-prone `bson_visit` APIs. Instead, it is written as a simple recursive validator. The new validator respects requests for UTF-8 validation properly. * Stop validating at 1000 depth, preventing stack overflow * Replace most BSON validation tests with generated ones The existing test cases used BSON files, and didn't have any commentary on what they were actually testing. New test cases are generated from a Python shorthand and contain the tested bytes inline, with a distinct test case for each actual validation scenario. * Disable UTF-8 validation by default on CRUD APIs * Document and tweak the value of BSON_VALIDATE_CORRUPT * Add test cases related to the overlong null encoding * Tweak JS scope validation to permit more obj keys * Add a NEWS entry for validation changes. * Allow `-private.h` headers to not include the prelude header --------- Co-authored-by: Kevin Albertson <[email protected]> Co-authored-by: Ezra Chung <[email protected]>
This changeset is a large refactor of the
bson_validate
implementation.Existing validation test cases were rewritten to use a uniform code generation mechanism that notes the purpose of each case. Many more validation scenarios were added.
The
BSON_VALIDATE_UTF8
flags have been removed from the default flags used by the CRUD API. Because of a bug, UTF-8 validation wasn't being applied to the CRUD API, so removing the flag is necessary since it would otherwise be a breaking change.The new validation is written as a straightforward recursive algorithm, and doesn't use the error-prone
bson_visit
API.The following changes are of note:
BSON_VALIDATE_CORRUPT
is now set on corruption. Previously, this was set toBSON_VALIDATE_NONE
, which had a value of zero. This was very confusing, as it was also set to zero in case of success.