feat(NODE-6773): add support for $lookup on encrypted collections #4427
813223f to ffb01de
Will update the bindings when they are available; we can still start reviewing what is in here in the meantime.
There are pending spec test changes from Maxim, I'll re-review the tests when Kevin has updated the prose tests in his PR.
@@ -239,6 +239,8 @@ export class AutoEncrypter {
    this._kmsProviders = options.kmsProviders || {};

    const mongoCryptOptions: MongoCryptOptions = {
      //@ts-expect-error: not yet in the defs
TODO: remove before merging.
-     context.addMongoOperationResponse(collInfo);

+   for await (const collInfo of collInfoCursor) {
+     context.addMongoOperationResponse(serialize(collInfo));
`addMongoOperationResponse` squashes errors (oops).
Can we add error handling, similar to C's approach? This will require bindings changes.
What bindings changes are needed? Can the `context.state` be checked after each call here?
I suppose, but I think that goes against our existing patterns in mongodb-client-encryption.
- We don't provide a wrapper around `mongocrypt_status_message`, so we wouldn't be able to throw the error message that libmongocrypt generates.
- Most (all?) other bindings functions that call libmongocrypt functions that can error handle the error inline and throw an error if we encounter one, instead of returning nothing and relying on the state machine to handle it.
Ex:
    if (!mongocrypt_ctx_encrypt_init(context.get(), ns.c_str(), ns.size(), binaryCommand.get())) {
        throw TypeError::New(Env(), errorStringFromStatus(context.get()));
    }
so I'm imagining something like:
    void MongoCryptContext::AddMongoOperationResponse(const CallbackInfo& info) {
        Uint8Array buffer = Uint8ArrayFromValue(info[0], "buffer");
        std::unique_ptr<mongocrypt_binary_t, MongoCryptBinaryDeleter> reply_bson(
            Uint8ArrayToBinary(buffer));
        if (!mongocrypt_ctx_mongo_feed(context(), reply_bson.get())) {
            throw Error::New(Env(), errorStringFromStatus(context()));
        }
    }
Why isn't this problematic for existing driver releases that don't expect this method to throw?
From slack: the impact of a bindings change is multi-version and multi-codepath and released drivers won't have handling to throw the appropriate error.
I added handling for an error status in the collection info loop and we do have access to the error message via status.
What kind of errors could be encountered here?
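The driver-side alternative discussed in this thread (checking the context's status after each fed response instead of making the binding throw) could look roughly like the following. This is a minimal sketch, not the code in this PR: `FeedableContext`, `ContextStatus`, `STATUS_OK`, and `feedCollectionInfo` are hypothetical stand-ins, and the shape of `status` is an assumption.

```typescript
// Hypothetical stand-ins for the binding's context type; names and shapes are assumptions.
interface ContextStatus {
  type: number; // 0 is assumed to mean "ok"
  code: number;
  message?: string;
}

interface FeedableContext {
  addMongoOperationResponse(response: Uint8Array): void;
  readonly status: ContextStatus;
}

const STATUS_OK = 0; // assumed "no error" status type

// Feed each serialized collection info document into the context and stop
// at the first response that leaves the context with an error status.
async function feedCollectionInfo(
  context: FeedableContext,
  responses: AsyncIterable<Uint8Array>
): Promise<number> {
  let fed = 0;
  for await (const response of responses) {
    context.addMongoOperationResponse(response);
    if (context.status.type !== STATUS_OK) {
      throw new Error(context.status.message ?? 'unknown libmongocrypt error');
    }
    fed += 1;
  }
  return fed;
}
```

Because the check lives in the driver loop, already released drivers that call the binding keep their current behavior; only code that opts into the check sees the error.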
    client: MongoClient,
    ns: string,
    filter: Document,
    options?: { timeoutContext?: TimeoutContext } & Abortable
-  ): Promise<Uint8Array | null> {
+  ): ListCollectionsCursor<CollectionInfo> {
(optional)
-  ): ListCollectionsCursor<CollectionInfo> {
+  ): AsyncIterable<CollectionInfo> {
I don't see an issue with a future change here that uses other cursor APIs, so I see this as friction for the next person. What's the gain?
Just preference, no need to leak the implementation if it isn't needed.
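To make the trade-off in this suggestion concrete, here is a small sketch of what declaring the narrower return type does. Everything in it is hypothetical: `FakeListCollectionsCursor` stands in for the driver's cursor class, and `CollectionInfo` is reduced to a single field.

```typescript
// Hypothetical types; the driver's real cursor and CollectionInfo are richer.
interface CollectionInfo {
  name: string;
}

// Stand-in for a concrete cursor class with extra methods callers could come to rely on.
class FakeListCollectionsCursor implements AsyncIterable<CollectionInfo> {
  constructor(private readonly infos: CollectionInfo[]) {}

  async toArray(): Promise<CollectionInfo[]> {
    return [...this.infos];
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<CollectionInfo> {
    yield* this.infos;
  }
}

// Declaring the narrower return type exposes only iteration to callers.
function fetchCollectionInfo(infos: CollectionInfo[]): AsyncIterable<CollectionInfo> {
  return new FakeListCollectionsCursor(infos);
}

// Callers can still consume the result with for await, which is all the feeding loop needs.
async function collectNames(source: AsyncIterable<CollectionInfo>): Promise<string[]> {
  const names: string[] = [];
  for await (const info of source) names.push(info.name);
  return names;
}
```

With the `AsyncIterable` return type, a call such as `fetchCollectionInfo(...).toArray()` no longer type-checks, which is the friction raised in the reply; the concrete cursor type keeps those methods available at the cost of exposing the implementation.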
test/integration/client-side-encryption/client_side_encryption.prose.25.lookup.test.ts
    for (const { name, document } of collections) {
      const { insertedId } = await encryptedClient.db('db').collection(name).insertOne(document);

      if (name.startsWith('no_')) continue;

      expect(await client.db('db').collection(name).findOne(insertedId))
        .to.have.property(Object.keys(document)[0])
        .that.has.property('_bsontype', 'Binary');
How would you feel about removing the looping in favor of simply creating each collection explicitly and inserting documents explicitly? I think complexity-wise it's about the same but the explicit approach is easier to compare with the spec test line-by-line.
I wouldn't be in favor, loops are a reasonable reduction of verbatim code complexity, and table testing is a common pattern so you know X is applied the same way to all inputs, rather than missing a small deviation between actual code paths.
> loops are a reasonable reduction of verbatim code complexity
But here all we should need is 6 create-collection calls and 4 insert calls. I don't think a loop reduces much complexity here. And as a reviewer comparing this with the spec, I'd much rather compare line-by-line than need to cross-reference the spec, the table, and the multiple loops used here.
> table testing is a common pattern so you know X is applied the same way to all inputs
Table testing is great for pure input -> output testing, but it loses its efficacy when the test logic isn't identical for each test case. In this case (and the table tests below), the loop has different logic for different tests, which obfuscates both the intent of the table testing matrix and the actual test logic.
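One way to picture the point about non-identical logic: a table stays uniform when whatever differs between rows is a column of the table rather than a branch inside the loop. The sketch below is illustrative only. The `expectEncrypted` column, the document contents, and the fake `insertAndReadBack` helper are all hypothetical and are not taken from this PR or the spec.

```typescript
interface CollectionCase {
  name: string;
  document: Record<string, string>;
  expectEncrypted: boolean; // hypothetical column, replaces a name-based branch
}

const cases: CollectionCase[] = [
  { name: 'csfle', document: { csfle: 'csfle' }, expectEncrypted: true },
  { name: 'no_schema', document: { no_schema: 'no_schema' }, expectEncrypted: false }
];

// Fake "server": only collections listed here store their first field as Binary.
const collectionsWithSchema = new Set<string>(['csfle']);

function insertAndReadBack(
  name: string,
  document: Record<string, string>
): Record<string, unknown> {
  const stored: Record<string, unknown> = { ...document };
  const field = Object.keys(document)[0];
  if (collectionsWithSchema.has(name)) stored[field] = { _bsontype: 'Binary' };
  return stored;
}

function isBinary(value: unknown): boolean {
  return (
    typeof value === 'object' &&
    value !== null &&
    (value as { _bsontype?: unknown })._bsontype === 'Binary'
  );
}

// Every row runs the same steps; the only thing that varies is the data.
const failures = cases.filter(({ name, document, expectEncrypted }) => {
  const stored = insertAndReadBack(name, document);
  return isBinary(stored[Object.keys(document)[0]]) !== expectEncrypted;
});

console.log(failures.length); // 0
```

Whether this reads better than writing each collection out explicitly is the judgment call being debated in this thread; the sketch only shows that the per-row difference can live in the data.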
        await client.close();
      });

      it(title.slice(title.indexOf(':') + 1).trim(), metadata ?? defaultMetadata, async () => {
Similar to the above, how would you feel about just writing 9 test cases explicitly?
This makes it easier to compare the implementation to the spec.
Would not be in favor for the same reasons. Also seems like this approach was reasonably grok-able by someone not familiar with our tests: https://mongodb.slack.com/archives/C0668RJBA0J/p1740413020860229?thread_ts=1740405607.463789&cid=C0668RJBA0J
So, I don't feel strongly enough to continue arguing this, but as someone intimately familiar with our tests, I am telling you that this was annoying to grok. Especially compared to something like
async function () {
      expected: Document,
      metadata?: MongoDBMetadataUI
    ) {
      describe(title.slice(0, title.indexOf(':')), function () {
I think we've discussed ensuring that prose test titles match the spec, to make it possible to search test output (i.e., with this structure, searching for `Case 3: db.no_schema joins db.csfle` returns nothing because it is split across the describe + test).
Do you remember this as well or am I imagining things?
It is still searchable without the "case" portion, I don't think it would cause too much pain to find.
Happy to change it but I need something to put in the suite name that changes for each test so I bind the correct before and afters.
I don't recall an explicit discussion on this, but we have been doing it so we can tie things together. You wouldn't spend more than a few moments cutting down your query in vscode/parsely/mocha to find/filter the test.
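For reference, the title splitting under discussion can be reproduced in isolation. This is a minimal sketch: `splitTitle` and `reportedTitle` are hypothetical helpers, the title string is illustrative, and joining the suite and test names with a single space is an assumption about how the reporter prints full titles.

```typescript
// Split a spec title into a suite name and a test name at the first colon,
// mirroring the describe(...) and it(...) expressions quoted above.
function splitTitle(title: string): { suite: string; test: string } {
  const colon = title.indexOf(':');
  return {
    suite: title.slice(0, colon),
    test: title.slice(colon + 1).trim()
  };
}

// Assumption: the reporter prints the suite and test names joined by a space.
function reportedTitle(title: string): string {
  const { suite, test } = splitTitle(title);
  return `${suite} ${test}`;
}

const specTitle = 'Case 3: db.no_schema joins db.csfle'; // illustrative title
const reported = reportedTitle(specTitle);

console.log(reported); // Case 3 db.no_schema joins db.csfle
console.log(reported.includes(specTitle)); // false: the colon is dropped by the split
console.log(reported.includes(splitTitle(specTitle).test)); // true
```

Under that assumption, a search for the full spec title misses, while a search for the part after the colon still finds the test, which matches both positions taken in this thread.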
ab77f26 to 3d579b4
Description
What is changing?
adds tests for lookup support
Is there new documentation needed for these changes?
What is the motivation for this change?
Release Highlight
Add support for $lookup on encrypted collections
After upgrading to mongodb-client-encryption 6.3.0 or later, this version of the driver feeds schema data for multiple collections into the encryption library so that `$lookup` aggregation stages can be used on encrypted collections. 🔒 🎉
Double check the following
npm run check:lint script
type(NODE-xxxx)[!]: description
feat(NODE-1234)!: rewriting everything in coffeescript