[CDRIVER-6017] BSON Validation Refactor #2026

Merged
29 commits
29f7182
New BSON validation routine rewrite
vector-of-bool May 29, 2025
5f7930d
Tweak validation to differentiate between invalid utf-8 and null chars
vector-of-bool May 30, 2025
e564c0d
Stop validating at 1000 depth, preventing stack overflow
vector-of-bool May 30, 2025
a190ed8
Various validation commentary and cleanup
vector-of-bool Jun 2, 2025
48e8f93
Replace most BSON validation tests with generated ones
vector-of-bool Jun 3, 2025
0ff823f
Disable UTF-8 validation by default on CRUD APIs
vector-of-bool Jun 3, 2025
9d6a13d
Update test cases that check error message strings
vector-of-bool Jun 3, 2025
a8cf5bd
Minor fixes and tweaks from PR comments
vector-of-bool Jun 4, 2025
1ec7968
Regen validation tests
vector-of-bool Jun 5, 2025
45d6140
Document and tweak the value of BSON_VALIDATE_CORRUPT
vector-of-bool Jun 4, 2025
2c0778f
#undef some macros that we define privately
vector-of-bool Jun 4, 2025
7f1e2cf
Mark some parameters as required via assertions
vector-of-bool Jun 4, 2025
98e6a2d
Remove dup strlen()
vector-of-bool Jun 5, 2025
ce5053a
Note the unreachability of some tags in element validation
vector-of-bool Jun 5, 2025
cb3b51c
Tweak limits, and a private header for validation
vector-of-bool Jun 5, 2025
90e27a9
Add inline metadata for the validation test generator
vector-of-bool Jun 5, 2025
07da508
Note the validation of $id as an arbitrary value
vector-of-bool Jun 5, 2025
0f0f117
Include offending char in bad key error messages
vector-of-bool Jun 5, 2025
62df9b1
Tweak message when doc iteration fails to start
vector-of-bool Jun 5, 2025
f654398
Add test cases related to the overlong null encoding
vector-of-bool Jun 5, 2025
0e0eaf8
Tweak depth validation by 1
vector-of-bool Jun 5, 2025
bfd38da
Cleanup on test generator
vector-of-bool Jun 5, 2025
c9cfdc2
Another test case for Binary type 2
vector-of-bool Jun 5, 2025
99e014f
Tweak JS scope validation to permit more obj keys
vector-of-bool Jun 5, 2025
f0f60b5
Add a NEWS entry for validation changes.
vector-of-bool Jun 5, 2025
4ed67e8
Allow -private.h headers to not include the prelude header
vector-of-bool Jun 5, 2025
6a59899
More minnor tweaks
vector-of-bool Jun 6, 2025
f74f69f
Make the validation depth private and much higher
vector-of-bool Jun 6, 2025
b25d6fc
Outdated comment
vector-of-bool Jun 6, 2025
6 changes: 3 additions & 3 deletions .evergreen/scripts/check-preludes.py
@@ -35,7 +35,7 @@
MONGOC_PREFIX / "mongoc-prelude.h",
MONGOC_PREFIX / "mongoc.h",
],
- "include": '#include <mongoc/mongoc-prelude.h>',
+ "include": "#include <mongoc/mongoc-prelude.h>",
},
{
"name": "libbson",
@@ -50,7 +50,7 @@
"name": "common",
"headers": list(COMMON_PREFIX.glob("*.h")),
"exclusions": [COMMON_PREFIX / "common-prelude.h"],
- "include": '#include <common-prelude.h>',
+ "include": "#include <common-prelude.h>",
},
]

@@ -59,7 +59,7 @@
print(f"Checking headers for {NAME}")
assert len(check["headers"]) > 0
for header in check["headers"]:
- if header in check["exclusions"]:
+ if header in check["exclusions"] or header.name.endswith("-private.h"):
continue
lines = Path(header).read_text(encoding="utf-8").splitlines()
if check["include"] not in lines:
14 changes: 14 additions & 0 deletions src/libbson/NEWS
@@ -1,3 +1,17 @@
Unreleased
==========

Fixes:

* Various fixes have been applied to the `bson_validate` family of functions,
with some minor behavioral changes.
* Previously accepted invalid UTF-8 will be rejected when `BSON_VALIDATE_UTF8`
is specified.
* The scope document in a deprecated "code with scope" element is now
validated with a fixed set of rules and is treated as an opaque JavaScript
object.
* A document nesting limit is now enforced during validation.

libbson 2.0.1
=============

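The nesting limit mentioned in the NEWS entry can be illustrated with a toy walker. This is a minimal sketch, not libbson's implementation: `sketch_check_depth`, `sketch_result`, and the `max_depth` parameter are hypothetical names, and the actual limit used by libbson is not stated here. The walker only understands int32 (0x10), document (0x03), and array (0x04) elements and reports anything else as corrupt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not libbson identifiers. */
typedef enum { SKETCH_OK, SKETCH_CORRUPT, SKETCH_TOO_DEEP } sketch_result;

/* Read a little-endian 32-bit length prefix. */
static uint32_t
sketch_read_le32 (const uint8_t *p)
{
   return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
          ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Walk one document of exactly `len` bytes. `depth` is 0 for the top-level
 * document; `max_depth` is an assumed, caller-chosen limit. */
sketch_result
sketch_check_depth (const uint8_t *doc, size_t len, int depth, int max_depth)
{
   if (depth > max_depth) {
      return SKETCH_TOO_DEEP; /* stop before recursing any further */
   }
   if (len < 5 || sketch_read_le32 (doc) != len || doc[len - 1] != 0) {
      return SKETCH_CORRUPT; /* bad length prefix or missing terminator */
   }
   size_t pos = 4;       /* first element follows the length prefix */
   size_t end = len - 1; /* index of the document's trailing 0x00 */
   while (pos < end) {
      uint8_t type = doc[pos++];
      /* The key is a NUL-terminated string that must end before `end`. */
      const uint8_t *nul = memchr (doc + pos, 0, end - pos);
      if (!nul) {
         return SKETCH_CORRUPT;
      }
      pos = (size_t) (nul - doc) + 1;
      size_t remaining = end - pos;
      if (type == 0x10) { /* int32: four value bytes */
         if (remaining < 4) {
            return SKETCH_CORRUPT;
         }
         pos += 4;
      } else if (type == 0x03 || type == 0x04) { /* document or array */
         if (remaining < 5) {
            return SKETCH_CORRUPT;
         }
         uint32_t sub = sketch_read_le32 (doc + pos);
         if (sub < 5 || sub > remaining) {
            return SKETCH_CORRUPT;
         }
         sketch_result r = sketch_check_depth (doc + pos, sub, depth + 1, max_depth);
         if (r != SKETCH_OK) {
            return r;
         }
         pos += sub;
      } else {
         return SKETCH_CORRUPT; /* other element types are omitted here */
      }
   }
   return SKETCH_OK;
}
```

Checking the depth before touching the child document means the recursion never goes deeper than `max_depth + 1` frames, which is the property that guards against stack overflow on hostile input.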
3 changes: 3 additions & 0 deletions src/libbson/doc/bson_validate_flags_t.rst
@@ -19,6 +19,7 @@ Synopsis
BSON_VALIDATE_DOT_KEYS = (1 << 2),
BSON_VALIDATE_UTF8_ALLOW_NULL = (1 << 3),
BSON_VALIDATE_EMPTY_KEYS = (1 << 4),
BSON_VALIDATE_CORRUPT = (1 << 5),
} bson_validate_flags_t;

Description
@@ -40,6 +41,8 @@ Each defined flag aside from ``BSON_VALIDATE_NONE`` describes an optional valida
* ``BSON_VALIDATE_DOLLAR_KEYS`` Prohibit keys that start with ``$`` outside of a "DBRef" subdocument.
* ``BSON_VALIDATE_DOT_KEYS`` Prohibit keys that contain ``.`` anywhere in the string.
* ``BSON_VALIDATE_EMPTY_KEYS`` Prohibit zero-length keys.
* ``BSON_VALIDATE_CORRUPT`` is not a control flag, but is used as an error code
when a validation routine encounters corrupt BSON data.

.. seealso::

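Because the flags are distinct bits, callers combine them with bitwise OR, while `BSON_VALIDATE_CORRUPT` only ever appears as an error code. The sketch below mirrors the enumerator values shown in this diff under a `SKETCH_` prefix so that it compiles without libbson. The helper names `sketch_control_bits` and `sketch_reason_for_code` are hypothetical, and the reason strings are paraphrases of the documented checks rather than libbson's actual error messages. The diff does not say which code is reported for embedded null characters, so that case is left to the default branch.

```c
#include <stddef.h>

/* Local mirror of the enumerator values shown in this diff. */
enum {
   SKETCH_VALIDATE_NONE = 0,
   SKETCH_VALIDATE_UTF8 = (1 << 0),
   SKETCH_VALIDATE_DOLLAR_KEYS = (1 << 1),
   SKETCH_VALIDATE_DOT_KEYS = (1 << 2),
   SKETCH_VALIDATE_UTF8_ALLOW_NULL = (1 << 3),
   SKETCH_VALIDATE_EMPTY_KEYS = (1 << 4),
   SKETCH_VALIDATE_CORRUPT = (1 << 5),
};

/* Drop the CORRUPT bit from a flag set. The documentation describes it as an
 * error code, not a control flag, so it selects no optional check. */
unsigned
sketch_control_bits (unsigned flags)
{
   return flags & ~(unsigned) SKETCH_VALIDATE_CORRUPT;
}

/* Map an error code to a short reason (hypothetical wording). */
const char *
sketch_reason_for_code (unsigned code)
{
   switch (code) {
   case SKETCH_VALIDATE_UTF8:
      return "text failed the UTF-8 check";
   case SKETCH_VALIDATE_DOLLAR_KEYS:
      return "key begins with '$'";
   case SKETCH_VALIDATE_DOT_KEYS:
      return "key contains '.'";
   case SKETCH_VALIDATE_EMPTY_KEYS:
      return "zero-length key";
   case SKETCH_VALIDATE_CORRUPT:
      return "corrupt BSON data";
   default:
      return "unrecognized code";
   }
}
```

With libbson itself, the equivalent flow would be to pass the OR-ed flags to `bson_validate_with_error` and then compare the `code` member of the resulting `bson_error_t` against the enumerators.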
47 changes: 38 additions & 9 deletions src/libbson/src/bson/bson-types.h
@@ -185,25 +185,54 @@ typedef struct {


/**
- * bson_validate_flags_t:
+ * @brief Flags and error codes for BSON validation functions.
*
- * This enumeration is used for validation of BSON documents. It allows
- * selective control on what you wish to validate.
+ * Pass these flags bits to control the behavior of the `bson_validate` family
+ * of functions.
*
- * %BSON_VALIDATE_NONE: No additional validation occurs.
- * %BSON_VALIDATE_UTF8: Check that strings are valid UTF-8.
- * %BSON_VALIDATE_DOLLAR_KEYS: Check that keys do not start with $.
- * %BSON_VALIDATE_DOT_KEYS: Check that keys do not contain a period.
- * %BSON_VALIDATE_UTF8_ALLOW_NULL: Allow NUL bytes in UTF-8 text.
- * %BSON_VALIDATE_EMPTY_KEYS: Prohibit zero-length field names
+ * Additionally, if validation fails, then the error code set on a `bson_error_t`
+ * will have the value corresponding to the reason that validation failed.
*/
typedef enum {
/**
* @brief No special validation behavior specified.
*/
BSON_VALIDATE_NONE = 0,
/**
* @brief Check that all text components of the BSON data are valid UTF-8.
*
* Note that this will also cause validation to reject valid text that contains
* a null character. This can be changed by also passing
* `BSON_VALIDATE_UTF8_ALLOW_NULL`
*/
BSON_VALIDATE_UTF8 = (1 << 0),
/**
* @brief Check that element keys do not begin with an ASCII dollar `$`
*/
BSON_VALIDATE_DOLLAR_KEYS = (1 << 1),
/**
* @brief Check that element keys do not contain an ASCII period `.`
*/
BSON_VALIDATE_DOT_KEYS = (1 << 2),
/**
* @brief If set then it is *not* an error for a UTF-8 string to contain
* embedded null characters.
*
* This has no effect unless `BSON_VALIDATE_UTF8` is also passed.
*/
BSON_VALIDATE_UTF8_ALLOW_NULL = (1 << 3),
/**
* @brief Check that no element key is a zero-length empty string.
*/
BSON_VALIDATE_EMPTY_KEYS = (1 << 4),
/**
* @brief This is not a flag that controls behavior, but is instead used to indicate
* that a BSON document is corrupted in some way. This is the value that will
* appear as an error code.
*
* Passing this as a flag has no effect.
*/
BSON_VALIDATE_CORRUPT = (1 << 5),
} bson_validate_flags_t;
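
The interaction between `BSON_VALIDATE_UTF8` and `BSON_VALIDATE_UTF8_ALLOW_NULL` described in the comments above reduces to a small truth table. The following is a minimal sketch of that rule only: `sketch_rejects_embedded_nul` is a hypothetical helper, the two constants are local copies of the enumerator values, and no actual UTF-8 decoding is performed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the two relevant enumerator values from the header. */
enum {
   SKETCH_VALIDATE_UTF8 = (1 << 0),
   SKETCH_VALIDATE_UTF8_ALLOW_NULL = (1 << 3),
};

/* Return true when text of `len` bytes would be rejected solely because it
 * contains an embedded null character, per the documented flag rules. */
bool
sketch_rejects_embedded_nul (unsigned flags, const char *text, size_t len)
{
   if (!(flags & SKETCH_VALIDATE_UTF8)) {
      return false; /* text is not checked at all without UTF8 */
   }
   if (flags & SKETCH_VALIDATE_UTF8_ALLOW_NULL) {
      return false; /* embedded nulls are explicitly permitted */
   }
   return memchr (text, 0, len) != NULL;
}
```

Passing `SKETCH_VALIDATE_UTF8_ALLOW_NULL` on its own yields `false` for every input, which matches the note that the flag has no effect unless `BSON_VALIDATE_UTF8` is also passed.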

