Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(NODE-6773): add support for $lookup on encrypted collections #4427

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

nbbeeken
Copy link
Contributor

@nbbeeken nbbeeken commented Feb 19, 2025

Description

What is changing?

adds tests for lookup support

Is there new documentation needed for these changes?

What is the motivation for this change?

Release Highlight

Add support for $lookup on encrypted collections

After upgrading to mongodb-client-encryption 6.3.0 or later, this version of the driver will now feed multiple collection schema data into the encryption library so that $lookup aggregation stages can be used on encrypted collections. 🔒 🎉

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@nbbeeken nbbeeken force-pushed the NODE-6774-fle-lookup branch 2 times, most recently from 813223f to ffb01de Compare February 21, 2025 15:26
@nbbeeken nbbeeken marked this pull request as ready for review February 21, 2025 15:53
@nbbeeken nbbeeken requested a review from a team as a code owner February 21, 2025 15:53
@nbbeeken
Copy link
Contributor Author

Will update the bindings when they are available, we can still start reviewing what is in here in the meantime

@baileympearson baileympearson changed the title feat(NODE-6774): add support for $lookup on encrypted collections feat(NODE-6773): add support for $lookup on encrypted collections Feb 24, 2025
Copy link
Contributor

@baileympearson baileympearson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are pending spec test changes from Maxim, I'll re-review the tests when Kevin has updated the prose tests in his PR.

@@ -239,6 +239,8 @@ export class AutoEncrypter {
this._kmsProviders = options.kmsProviders || {};

const mongoCryptOptions: MongoCryptOptions = {
//@ts-expect-error: not yet in the defs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: remove before merging.

context.addMongoOperationResponse(collInfo);

for await (const collInfo of collInfoCursor) {
context.addMongoOperationResponse(serialize(collInfo));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addMongoOperationResponse squashes errors (oops).

Can we add error handling, similar to C's approach? This will require bindings changes.

kevinAlbs/mongo-c-driver@eb18fc8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What bindings changes are needed? Can the context.state be checked after each call here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose, but I think that goes against our existing patterns in mongodb-client-encryption.

  1. We don't provide a wrapper around mongocrypt_status_message, so we wouldn't be able to throw the error message that libmongocrypt generates.
  2. Most (all?) other bindings functions that call libmongocrypt functions that can error handle the error inline and throw an error if we encounter one, instead of returning nothing and relying on the state machine to handle it.

Ex:

    if (!mongocrypt_ctx_encrypt_init(context.get(), ns.c_str(), ns.size(), binaryCommand.get())) {
        throw TypeError::New(Env(), errorStringFromStatus(context.get()));
    }

so I'm imagining something like:

void MongoCryptContext::AddMongoOperationResponse(const CallbackInfo& info) {
    Uint8Array buffer = Uint8ArrayFromValue(info[0], "buffer");

    std::unique_ptr<mongocrypt_binary_t, MongoCryptBinaryDeleter> reply_bson(
        Uint8ArrayToBinary(buffer));
    if (!mongocrypt_ctx_mongo_feed(context(), reply_bson.get())) {
        throw Error::New(Env(), errorStringFromStatus(context()));
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this problematic for existing driver releases that don't expect this method to throw?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From slack: the impact of a bindings change is multi-version and multi-codepath and released drivers won't have handling to throw the appropriate error.

I added handling for an error status in the collection info loop and we do have access to the error message via status.

What kind of errors could be encountered here?

client: MongoClient,
ns: string,
filter: Document,
options?: { timeoutContext?: TimeoutContext } & Abortable
): Promise<Uint8Array | null> {
): ListCollectionsCursor<CollectionInfo> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional)

Suggested change
): ListCollectionsCursor<CollectionInfo> {
): AsyncIterable<CollectionInfo> {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an issue with a future change here that uses other cursor APIs, so I see this as friction for the next person. What's the gain?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just preference, no need to leak the implementation if it isn't needed.

Comment on lines +132 to +156
for (const { name, document } of collections) {
const { insertedId } = await encryptedClient.db('db').collection(name).insertOne(document);

if (name.startsWith('no_')) continue;

expect(await client.db('db').collection(name).findOne(insertedId))
.to.have.property(Object.keys(document)[0])
.that.has.property('_bsontype', 'Binary');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about removing the looping in favor of simply creating each collection explicitly and inserting documents explicitly? I think complexity-wise it's about the same but the explicit approach is easier to compare with the spec test line-by-line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be in favor, loops are a reasonable reduction of verbatim code complexity, and table testing is a common pattern so you know X is applied the same way to all inputs, rather than missing a small deviation between actual code paths.

Copy link
Contributor

@baileympearson baileympearson Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loops are a reasonable reduction of verbatim code complexity

But here all we should need is 6 create collections calls, and 4 insert calls? I don't think a loop reduces much complexity here. And as a reviewer, comparing this with the spec, I'd much rather compare line-by-line than need to cross reference the spec, the table, and the multiple loops used here.

table testing is a common pattern so you know X is applied the same way to all inputs

Table testing is great for pure input -> output testing but breaks down in its efficacy when the test logic isn't identical for each test case. In this case (and the table tests below), the loop has different logic for different tests, which obfuscates both the intent of the table testing matrix and the actual test logic.

await client.close();
});

it(title.slice(title.indexOf(':') + 1).trim(), metadata ?? defaultMetadata, async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above, how would you feel about just writing 9 test cases explictly?

This makes it easier to compare the implementation to the spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not be in favor for the same reasons. Also seems like this approach was reasonably grok-able by someone not familiar with our tests: https://mongodb.slack.com/archives/C0668RJBA0J/p1740413020860229?thread_ts=1740405607.463789&cid=C0668RJBA0J

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm not feeling particularly strong enough to continue arguing this, but as someone intimately familiar with our tests, I am telling you that this was annoying to grok. Especially compared to something like

expected: Document,
metadata?: MongoDBMetadataUI
) {
describe(title.slice(0, title.indexOf(':')), function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've discussed ensuring that prose test titles match the spec, to make it possible to search test output. (i.e., with this structure, searching Case 3: db.no_schemajoinsdb.csfle`` returns nothing because it is split across the describe + test)

Do you remember this as well or am I imagining things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still searchable without the "case" portion, I don't think it would cause too much pain to find.

Happy to change it but I need something to put in the suite name that changes for each test so I bind the correct before and afters.

I don't recall an explicit discussion on this but we have been doing it so we can tie things together, you wouldn't spend more than a few moments cutting down you're query in vscode/parsely/mocha to find/filter the test

@nbbeeken nbbeeken added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Feb 24, 2025
@nbbeeken nbbeeken added the wip label Feb 24, 2025
@nbbeeken nbbeeken force-pushed the NODE-6774-fle-lookup branch from ab77f26 to 3d579b4 Compare February 24, 2025 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Primary Review In Review with primary reviewer, not yet ready for team's eyes wip
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants