INTPYTHON-527 Add Queryable Encryption support #329

Open: wants to merge 88 commits into base: main

Conversation

aclark4life
Collaborator

@aclark4life aclark4life commented Jun 27, 2025

(see previous attempts in #318, #319 and #323 for additional context)

With this PR, I am able to get Django to create an encrypted collection when the schema code runs create_model on an EncryptedModel containing an EncryptedCharField; e.g., see db.enxcol_.encryption__person.ecoc below.

Enterprise a49c6bfb-b6b3-4711-bd5d-c6ecf0611a4c [direct: secondary] test> use test_djangotests
switched to db test_djangotests
Enterprise a49c6bfb-b6b3-4711-bd5d-c6ecf0611a4c [direct: secondary] test_djangotests> db.
db.__proto__                        db.constructor                      db.hasOwnProperty                   db.isPrototypeOf
db.propertyIsEnumerable             db.toLocaleString                   db.toString                         db.valueOf
db.getMongo                         db.getName                          db.getCollectionNames               db.getCollectionInfos
db.runCommand                       db.adminCommand                     db.aggregate                        db.getSiblingDB
db.getCollection                    db.dropDatabase                     db.createUser                       db.updateUser
db.changeUserPassword               db.logout                           db.dropUser                         db.dropAllUsers
db.auth                             db.grantRolesToUser                 db.revokeRolesFromUser              db.getUser
db.getUsers                         db.createCollection                 db.createEncryptedCollection        db.createView
db.createRole                       db.updateRole                       db.dropRole                         db.dropAllRoles
db.grantRolesToRole                 db.revokeRolesFromRole              db.grantPrivilegesToRole            db.revokePrivilegesFromRole
db.getRole                          db.getRoles                         db.currentOp                        db.killOp
db.shutdownServer                   db.fsyncLock                        db.fsyncUnlock                      db.version
db.serverBits                       db.isMaster                         db.hello                            db.serverBuildInfo
db.serverStatus                     db.stats                            db.hostInfo                         db.serverCmdLineOpts
db.rotateCertificates               db.printCollectionStats             db.getProfilingStatus               db.setProfilingLevel
db.setLogLevel                      db.getLogComponents                 db.commandHelp                      db.listCommands
db.printSecondaryReplicationInfo    db.getReplicationInfo               db.printReplicationInfo             db.watch
db.sql                              db.auth_group_permissions           db.django_session                   db.auth_user
db.enxcol_.encryption__person.ecoc  db.auth_group                       db.django_site                      db.django_migrations
db.django_content_type              db.auth_user_groups                 db.enxcol_.encryption__person.esc   db.auth_permission
db.auth_user_user_permissions       db.django_admin_log

Questions

  • To manage both encrypted and unencrypted connections, should we keep the _nodb_cursor functionality in this PR, do something in init_connection_state as @timgraham suggests, or do something else?
  • As @ShaneHarvey suggests, ask the encryption folks about "command not supported for auto encryption: buildinfo", which happens when Django attempts to get the server version via an encrypted connection and makes it necessary to manage both encrypted and unencrypted connections. Are most commands supported for auto encryption or not?
  • What does EncryptedModel support for EmbeddedModel look like? What are the specific use cases for integrating EncryptedModel and EmbeddedModel? Should we be able to mix in EncryptedModel and EmbeddedModel, then include that model in an EmbeddedModelField?

Todo

  • Helpers need a home
    • django_mongodb_backend.encryption
  • Add additional encrypted fields
    • EncryptedModel
    • EncryptedCharField
  • Migrations
  • Querying
  • Docs
    • Configuration
  • Tests
  • KMS support
    • local

Helpers

Included helpers are also used by the test runner e.g.

import os

from django_mongodb_backend import encryption, parse_uri

kms_providers = encryption.get_kms_providers()
auto_encryption_opts = encryption.get_auto_encryption_opts(
    kms_providers=kms_providers,
)

DATABASE_URL = os.environ.get("MONGODB_URI", "mongodb://localhost:27017/djangotests")
DATABASES = {
    "default": parse_uri(
        DATABASE_URL, options={"auto_encryption_opts": auto_encryption_opts}
    ),
}

DEFAULT_AUTO_FIELD = "django_mongodb_backend.fields.ObjectIdAutoField"
PASSWORD_HASHERS = ("django.contrib.auth.hashers.MD5PasswordHasher",)
SECRET_KEY = "django_tests_secret_key"
USE_TZ = False

@aclark4life
Collaborator Author

Wrong commit message for 65bd15a and I don't want to force push yet. It should have said:

"Only create an encrypted connection once then reuse it."

I'm aware that _nodb_cursor is slated for removal but in the meantime I can keep going with other fixes with this approach, and it does satisfy the design we all agree on (I think) of maintaining two simultaneous connections:

  • An unencrypted connection, used unless we need encryption
  • An encrypted connection, used when we need it

@timgraham
Collaborator

timgraham commented Jun 27, 2025

I'm aware that _nodb_cursor is slated for removal but in the meantime I can keep going with other fixes with this approach, and it does satisfy the design we all agree on (I think) of maintaining two simultaneous connections:

It's not working as you think it is. As I said elsewhere, _nodb_cursor is not used by this backend.

Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because self.settings_dict["OPTIONS"].pop("auto_encryption_opts") is having the side effect of altering settings_dict before DatabaseWrapper.connection is initialized.

I'd suggest using my patch as a starting point for maintaining two connections. self.connection should be the encrypted version (secure by default) with a fallback to a non-encrypted connection only as needed (e.g. for commands like buildInfo). At least it will help us understand whether that's a viable approach. As I mentioned in the design doc, I'm not sure if using an encrypted connection for non-encrypted collections is problematic. If so, we'll have to go back to the drawing board on the design.
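
A minimal sketch of the "encrypted by default, unencrypted fallback" idea, not the patch referred to above. The class name DualClient, its attributes, and the contents of the unsupported-command set are hypothetical; only buildInfo is confirmed by the error quoted in this thread. The clients can be any objects, so the sketch runs without a server.

```python
class DualClient:
    """Prefer an encrypted client; fall back to a plain one when required."""

    # Assumption: commands rejected by auto encryption. Only buildInfo is
    # confirmed by the error message quoted in this conversation.
    UNSUPPORTED_FOR_AUTO_ENCRYPTION = frozenset({"buildinfo"})

    def __init__(self, encrypted_client, plain_client_factory):
        self.encrypted_client = encrypted_client
        self._plain_client_factory = plain_client_factory
        self._plain_client = None

    @property
    def plain_client(self):
        # Create the unencrypted client lazily, once, then reuse it.
        if self._plain_client is None:
            self._plain_client = self._plain_client_factory()
        return self._plain_client

    def client_for(self, command_name):
        # Command names are compared case-insensitively.
        if command_name.lower() in self.UNSUPPORTED_FOR_AUTO_ENCRYPTION:
            return self.plain_client
        return self.encrypted_client
```

With this shape, the unencrypted client is never created unless a command such as buildInfo actually needs it.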

@aclark4life
Collaborator Author

It's not working as you think it is. As I said elsewhere, _nodb_cursor is not used by this backend.

I don't disagree, but it feels a lot like _start_transaction_under_autocommit which gets called by start_transaction_under_autocommit because autocommit is False. Django appears to stumble into _nodb_cursor when the encrypted connection fails to get the database version and while we don't use a cursor in this backend, we do have a "nosql" cursor that has __enter__ and __exit__ (I assume) to meet Django's expectations and we get an opportunity to modify the connection. @Jibola mentioned this design is suspect yesterday and I agree with both of you, particularly with regard to the desire to start with and maintain an encrypted connection first.

Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because self.settings_dict["OPTIONS"].pop("auto_encryption_opts") is having the side effect of altering settings_dict before DatabaseWrapper.connection is initialized.

Yes, it works by design, not as a side effect. I'm deep-copying settings_dict when DatabaseWrapper is initialized, so when DatabaseWrapper.connection is initialized it's unencrypted. When the schema needs encryption later, it's retrieved from _settings_dict.
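
The difference between copying and mutating in place can be sketched as follows. The helper name split_encryption_opts and the placeholder option value are hypothetical; this is not the code in the PR, only an illustration of why a deep copy avoids the side effect discussed above.

```python
import copy


def split_encryption_opts(settings_dict):
    """Return (unencrypted_settings, auto_encryption_opts).

    The input is deep-copied first, so popping the option does not
    alter the caller's settings_dict.
    """
    unencrypted = copy.deepcopy(settings_dict)
    opts = unencrypted.get("OPTIONS", {}).pop("auto_encryption_opts", None)
    return unencrypted, opts


# Placeholder values; a real project would pass an AutoEncryptionOpts instance.
settings_dict = {
    "NAME": "djangotests",
    "OPTIONS": {"auto_encryption_opts": {"placeholder": True}},
}
unencrypted, opts = split_encryption_opts(settings_dict)
```

A shallow copy such as dict(settings_dict) would share the nested OPTIONS dict, so a pop on the copy would also remove the option from the original.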

I'd suggest using my patch as a starting point for maintaining two connections. self.connection should be the encrypted version (secure by default) with a fallback to a non-encrypted connection only as needed (e.g. for commands like buildInfo). At least it will help us understand whether that's a viable approach. As I mentioned in the design doc, I'm not sure if using an encrypted connection for non-encrypted collections is problematic. If so, we'll have to go back to the drawing board on the design.

I made a few passes at it but did not get anywhere. I'll try again, though.

@timgraham
Collaborator

Your "stumble" theory of how it's working isn't correct. _nodb_cursor is only used in one place: to create the test database. As I said, I could imagine that perhaps this method causes connection to later be initialized without auto_encryption_opts because of self.settings_dict["OPTIONS"].pop("auto_encryption_opts"). The connection that's created in your _nodb_cursor is never used.

@aclark4life
Collaborator Author

aclark4life commented Jun 28, 2025

Your "stumble" theory of how it's working isn't correct. _nodb_cursor is only used in one place: to create the test database. As I said, I could imagine that perhaps this method causes connection to later be initialized without auto_encryption_opts because of self.settings_dict["OPTIONS"].pop("auto_encryption_opts"). The connection that's created in your _nodb_cursor is never used.

Copy that, thanks!

I've removed _nodb_cursor in 8e83ada and discovered the version check is the only time that error occurs. I now get errors like:

Traceback (most recent call last):
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
    yield
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
    encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
    return run_state_machine(ctx, self.callback)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/state_machine.py", line 136, in run_state_machine
    result = callback.mark_command(ctx.database, mongocryptd_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 286, in mark_command
    res = self.mongocryptd_client[database].command(
        inflated_cmd, codec_options=DEFAULT_RAW_BSON_OPTIONS
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/_csot.py", line 125, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 930, in command
    return self._command(
           ~~~~~~~~~~~~~^
        connection,
        ^^^^^^^^^^^
    ...<7 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 770, in _command
    return conn.command(
           ~~~~~~~~~~~~^
        self._name,
        ^^^^^^^^^^^
    ...<8 lines>...
        client=self._client,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/helpers.py", line 47, in inner
    return func(*args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/pool.py", line 414, in command
    return command(
        self,
    ...<20 lines>...
        write_concern=write_concern,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/network.py", line 212, in command
    helpers_shared._check_command_response(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        response_doc,
        ^^^^^^^^^^^^^
    ...<2 lines>...
        parse_write_concern_error=parse_write_concern_error,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/helpers_shared.py", line 250, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection., full error: RawBSONDocument(b"\xa7\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00d\x00\x00\x00Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection.\x00\x10code\x00\x08\xc8\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location51208\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))

Still working on an unencrypted connection, but perhaps the only time we need it is for the version check.

@aclark4life
Collaborator Author

aclark4life commented Jul 2, 2025

@ShaneHarvey @Jibola @timgraham FYI here is the pipeline that causes the let error:

(Pdb) pprint.pprint(pipeline)
[{'$lookup': {'as': 'django_content_type',
              'from': 'django_content_type',
              'let': {'parent__field__0': '$content_type_id'},
              'pipeline': [{'$match': {'$expr': {'$and': [{'$eq': ['$$parent__field__0',
                                                                   '$_id']}]}}}]}},
 {'$unwind': '$django_content_type'},
 {'$match': {'$expr': {'$in': ['$content_type_id',
                               (ObjectId('6864933ec7cf8179e3ef1f8d'),)]}}},
 {'$project': {'codename': 1,
               'content_type_id': 1,
               'django_content_type': {'app_label': 1, 'model': 1}}},
 {'$sort': SON([('django_content_type.app_label', 1), ('django_content_type.model', 1), ('codename', 1)])}]

And here is the error again with some additional debug:

(Pdb) errmsg
"Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection."
(Pdb) code
51208
(Pdb) response
RawBSONDocument(b"\xa7\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00d\x00\x00\x00Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection.\x00\x10code\x00\x08\xc8\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location51208\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))
(Pdb) max_wire_version
26

And the full traceback:


Running post-migrate handlers for application contenttypes
Traceback (most recent call last):
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
    yield
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
    encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
    return run_state_machine(ctx, self.callback)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/state_machine.py", line 136, in run_state_machine
    result = callback.mark_command(ctx.database, mongocryptd_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 286, in mark_command
    res = self.mongocryptd_client[database].command(
        inflated_cmd, codec_options=DEFAULT_RAW_BSON_OPTIONS
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/_csot.py", line 125, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 930, in command
    return self._command(
           ~~~~~~~~~~~~~^
        connection,
        ^^^^^^^^^^^
    ...<7 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 770, in _command
    return conn.command(
           ~~~~~~~~~~~~^
        self._name,
        ^^^^^^^^^^^
    ...<8 lines>...
        client=self._client,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/helpers.py", line 47, in inner
    return func(*args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/pool.py", line 414, in command
    return command(
        self,
    ...<20 lines>...
        write_concern=write_concern,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/network.py", line 212, in command
    helpers_shared._check_command_response(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        response_doc,
        ^^^^^^^^^^^^^
    ...<2 lines>...
        parse_write_concern_error=parse_write_concern_error,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/helpers_shared.py", line 250, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection., full error: RawBSONDocument(b"\xa7\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00d\x00\x00\x00Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection.\x00\x10code\x00\x08\xc8\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location51208\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))

Test settings:

import os

from django_mongodb_backend import encryption, parse_uri

kms_providers = encryption.get_kms_providers()

auto_encryption_opts = encryption.get_auto_encryption_opts(
    kms_providers=kms_providers,
)

DATABASE_URL = os.environ.get("MONGODB_URI", "mongodb://localhost:27017")
DATABASES = {
    "default": parse_uri(
        DATABASE_URL,
        db_name="djangotests",
    ),
    "encrypted": parse_uri(
        DATABASE_URL,
        options={"auto_encryption_opts": auto_encryption_opts},
        db_name="encrypted_djangotests",
    ),
}

DEFAULT_AUTO_FIELD = "django_mongodb_backend.fields.ObjectIdAutoField"
PASSWORD_HASHERS = ("django.contrib.auth.hashers.MD5PasswordHasher",)
SECRET_KEY = "django_tests_secret_key"
USE_TZ = False

This is happening in the encryption_ tests with a database router configured to use the encrypted database, but it happens before any tests are run or any routing occurs. I've confirmed that the encrypted database is created, so it appears that something needs to be done to address this issue in either our backend or PyMongo with the ideal candidate, perhaps, being a change to the MQL in the pipeline if possible.
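
One possible shape for "a change to the MQL" is to rewrite the simplest kind of let-based $lookup (a single equality between a local field and a foreign field) into the localField/foreignField form, which has no let. The helper names below are hypothetical, and two assumptions are untested here: that the server accepts this form over an encrypted collection, and that the two forms are close enough for the backend's purposes (they can differ for null, missing, and array values).

```python
def _is_field_path(value):
    """True for '$field' style paths, False for '$$variables' and non-strings."""
    return isinstance(value, str) and value.startswith("$") and not value.startswith("$$")


def simplify_lookup(stage):
    """Rewrite a single-equality ``let`` $lookup to localField/foreignField.

    Returns the stage unchanged if it does not match the expected shape.
    """
    lookup = stage.get("$lookup")
    if not lookup or "let" not in lookup:
        return stage
    let = lookup["let"]
    pipeline = lookup.get("pipeline", [])
    if len(let) != 1 or len(pipeline) != 1:
        return stage
    ((var, local_expr),) = let.items()
    try:
        match = pipeline[0]["$match"]
        expr = match["$expr"]
        (condition,) = expr["$and"]
        left, right = condition["$eq"]
    except (KeyError, TypeError, ValueError):
        return stage
    # Bail out if any level carries extra keys that the rewrite would drop.
    if any(len(d) != 1 for d in (pipeline[0], match, expr, condition)):
        return stage
    if left != f"$${var}" or not _is_field_path(local_expr) or not _is_field_path(right):
        return stage
    return {
        "$lookup": {
            "from": lookup["from"],
            "localField": local_expr[1:],
            "foreignField": right[1:],
            "as": lookup["as"],
        }
    }
```

Applied to the first stage of the pipeline shown earlier, this yields a $lookup with localField "content_type_id" and foreignField "_id"; any stage that does not match the narrow pattern passes through untouched.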

- Expand generic router functionality
- Specify encrypted db_name & kms_provider in model
- Get kms_providers and key_vault_namespace from auto_encryption_opts
@aclark4life
Collaborator Author

Another setting = global state alert! Need to understand the use case of specifying a KMS provider. Is it per database, per model, etc? I would naively assume a use case for the latter since ClientEncryption needed a list of all providers whereas the call to create the collection uses a particular provider.

How about setting kms_provider and db_name per model?

@@ -27,7 +27,7 @@ def test_encrypted_fields_map(self):
 {
 "path": "ssn",
 "bsonType": "string",
-"queries": [{"contention": 1, "queryType": "equality"}],
+"queries": {"contention": 1, "queryType": "equality"},
Collaborator Author

FYI it's a list in the tutorial. Also for fields that support multiple query types I think it has to be a list of dictionaries, but maybe for a single query type this is OK.

Collaborator

It's confusing! The documentation says, "You can configure an encrypted field for either equality or range queries, but not both.", and I've seen errors like "BSON field 'create.encryptedFields.fields.queries' is the wrong type 'array'" and "pymongo.errors.EncryptedCollectionError: Exactly one query type should be specified per field".

Unless there's some case I'm missing, arguably the server/tutorial shouldn't cause ambiguity by using a list, but probably it can't change behavior due to backward compatibility.
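
To make the list-versus-dict question concrete, here is a small sketch that accepts either spelling and always emits the dict form used in the test diff above. The helper names single_query and encrypted_field are hypothetical and not part of the backend; the error text simply mirrors the message quoted above.

```python
def single_query(queries):
    """Return the one query spec from a dict or a one-item list."""
    if isinstance(queries, dict):
        return queries
    if isinstance(queries, (list, tuple)) and len(queries) == 1:
        return queries[0]
    raise ValueError("Exactly one query type should be specified per field")


def encrypted_field(path, bson_type, queries=None):
    """Build one entry for the "fields" list of an encrypted fields map."""
    field = {"path": path, "bsonType": bson_type}
    if queries is not None:
        field["queries"] = single_query(queries)
    return field


ssn = encrypted_field(
    "ssn", "string", queries=[{"contention": 1, "queryType": "equality"}]
)
```

Normalizing in one place would keep the model-level API free to accept a list later without changing what is sent to the server today.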

Collaborator Author

@aclark4life aclark4life Jul 11, 2025

@jordan-smith721 @Jibola Should we use a list here for future compatibility and as shown in the tutorial or dict? I know that I've tested with more than a single query type per field and gotten an appropriate error and so I'm wondering if we should anticipate fields with multiple query types in the future or if we can always assume a single query type per field, as is the case for the fields and query types we are currently supporting.

@timgraham
Collaborator

Another setting = global state alert! Need to understand the use case of specifying a KMS provider. Is it per database, per model, etc? I would naively assume a use case for the latter since ClientEncryption needed a list of all providers whereas the call to create the collection uses a particular provider.

How about setting kms_provider and db_name per model?

I feel this design is driven too much by your desire to be able to provide a generic database router. Sure, we can provide an example router in the documentation for some use case, but a router configurable by other settings is just creating layers of indirection that ultimately result in needless complexity.

I imagine the majority of projects will have their encrypted models in the same database. To that end, something like this is sufficient:

def db_for_read(self, model, **hints):
    if getattr(model, "encrypted", False):
        return "encrypted-alias"
    return None

db_for_write = db_for_read

def allow_migrate(...):
   ...

Having to declare a db_name on every encrypted model is adding boilerplate. For the use cases that do have some more complex needs, that logic should be in the router.

Similarly for kms_provider, having to declare it per-model is likely overkill. I'm thinking that adding a new router method, e.g. Router.kms_provider(model), may be a viable path that doesn't reinvent the wheel. While it requires monkeypatching that new method onto django.db.utils.ConnectionRouter, I think it's the proper separation of concerns.
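
A fuller version of the router outlined above might look like the following sketch. The class name, the "encrypted" alias (chosen to match the DATABASES key in the test settings earlier in this thread), and the allow_migrate policy are assumptions for illustration, not the backend's API. A Django router is a plain class, so the sketch runs without Django installed.

```python
class EncryptedRouter:
    """Send models flagged ``encrypted = True`` to one database alias."""

    # Assumption: the alias of the encrypted database in settings.DATABASES.
    encrypted_alias = "encrypted"

    def db_for_read(self, model, **hints):
        if getattr(model, "encrypted", False):
            return self.encrypted_alias
        return None  # No opinion; Django falls through to the next router.

    db_for_write = db_for_read

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Django passes the model class in hints for model operations;
        # it may be absent for other operations.
        model = hints.get("model")
        if model is not None and getattr(model, "encrypted", False):
            return db == self.encrypted_alias
        # This sketch has no opinion about unencrypted models.
        return None
```

It would be enabled through Django's DATABASE_ROUTERS setting, e.g. DATABASE_ROUTERS = ["myproject.routers.EncryptedRouter"], where the module path is a placeholder.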

@aclark4life
Collaborator Author

How about setting kms_provider and db_name per model?

I feel this design is driven too much by your desire to be able to provide a generic database router.

I do have that desire … but it's driven by a desire to simplify project setup and that may or may not include a generic database router.

Sure, we can provide an example router in the documentation for some use case, but a router configurable by other settings is just creating layers of indirection that ultimately result in needless complexity.

I imagine the majority of projects will have their encrypted models in the same database. To that end, something like this is sufficient:

def db_for_read(self, model, **hints):
    if getattr(model, "encrypted", False):
        return "encrypted-alias"
    return None

db_for_write = db_for_read

def allow_migrate(...):
   ...

What is "encrypted-alias" here? The literal name of the database or something that refers to the literal name of the database? ENCRYPTED_DB_ALIAS in the settings (now removed) was intended to allow users to tell us what they named the encrypted database. Do we need such a thing or not?

Having to declare a db_name on every encrypted model is adding boilerplate. For the use cases that do have some more complex needs, that logic should be in the router.

In the short term, I'll just create a base class with db_name='encrypted-alias' so that no one has to specify anything in their class.

Similarly for kms_provider, having to declare it per-model is likely overkill. I'm thinking that adding a new router method, e.g. Router.kms_provider(model), may be a viable path that doesn't reinvent the wheel. While it requires monkeypatching that new method onto django.db.utils.ConnectionRouter, I think it's the proper separation of concerns.

Same with KMS provider. The base class gets kms_provider='local' and no one has to do anything. I'm not saying we keep this approach, but I do want to go down this road a bit further before deciding one way or another (not a fan of monkey patching, which I think we've managed to avoid so far?)

@timgraham
Collaborator

Yes, "encrypted-alias" and ENCRYPTED_DB_ALIAS refer to the same concept. The issue I have with the latter is that anything built around a singular setting like that restricts the project to a singular alias. And in the event the user only has an encrypted database, they won't need a router or any notion of ENCRYPTED_DB_ALIAS.

Conceptually, adding model attributes for routing decisions just isn't clean, but feel free to defer my suggestion until you want to tackle it. Arguably, kms_provider could be appropriate as a custom model Meta option, but Django doesn't support them, and it adds a requirement to inherit from a particular Meta class if you want a project-wide default, so even if custom Meta attributes were supported, I think I'd lean toward the more flexible routing proposal. Having all configuration in the router rather than mixed between model and router seems cleaner.

We have some monkeypatching. Examples: as_mql() methods, _field_resolve_expression_parameter = FieldGetDbPrepValueIterableMixin.resolve_expression_parameter, and TruncBase.convert_value = trunc_convert_value. Really, adding new methods (as_mql(), kms_provider()) that aren't going to conflict with any future Django functionality isn't so bad. I'd like to try to clean up the other examples by adding hooks in a future version of Django.

@aclark4life
Collaborator Author

And in the event the user only has an encrypted database, they won't need a router or any notion of ENCRYPTED_DB_ALIAS.

I would love to see this happen but how do we support it with content types requiring an unencrypted connection? If we were targeting a single encrypted database and connection, then I would agree we don't need to provide any database routers for the user. Since that is currently not the case, I don't think we can tell users that a minimum requirement for using this feature is to cut/paste Python code from the docs into a file then import and use it in their settings. That said, your point about not wanting to tie a bunch of configuration together is well taken.

@timgraham
Collaborator

I would love to see this happen but how do we support it with content types requiring an unencrypted connection?

You can use Django without the contrib apps.


.. admonition:: Migrations support is limited

    :djadmin:`makemigrations` does not detect changes to encrypted fields.
Collaborator

This wording was copied from changes to embedded fields, but the behavior here is different. makemigrations will detect changes to encrypted fields, but trying to run those migrations will fail with server errors, since changes to encrypted fields aren't supported by MongoDB.

Comment on lines +277 to +278
def kms_provider(self):  # noqa: ARG001
    return getattr(settings, "KMS_PROVIDER", None)
Collaborator

You misunderstood what I meant. The idea is that kms_provider() would be a router method with a signature similar to db_for_read() (it takes a model). The user could implement the kms_provider method on their router if they want to specify a kms_provider, either the same one for all models, or model by model.

All that said, kms_provider is an optional parameter of create_encrypted_collection() and there is no discussion of it in the design doc, so please first confirm we need it. Possibly it's only for explicit encryption? If it is needed, perhaps it will be helpful if I first sketch out the design before you dive into the implementation.

The documentation for ClientEncryption says, "Explicit client-side field level encryption." so I'm confused. I guess it's also needed for auto-encryption?
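
A sketch of the router-method idea, assuming kms_provider turns out to be needed at all. The dispatcher mirrors how a connection router consults each configured router in turn and takes the first non-None answer. StandInConnectionRouter is a hypothetical stand-in with only a routers attribute, used so the example runs without Django; the example routers and the "aws" value are purely illustrative.

```python
def kms_provider(self, model, **hints):
    """Ask each router for a KMS provider; the first non-None answer wins."""
    for router in self.routers:
        method = getattr(router, "kms_provider", None)
        if method is None:
            continue  # Routers are not required to implement the method.
        chosen = method(model, **hints)
        if chosen is not None:
            return chosen
    return None


class StandInConnectionRouter:
    """Stand-in for django.db.utils.ConnectionRouter (only ``routers``)."""

    def __init__(self, routers):
        self.routers = routers


class PerModelRouter:
    """Illustrative router: one provider for models flagged as sensitive."""

    def kms_provider(self, model, **hints):
        return "aws" if getattr(model, "sensitive", False) else None


class DefaultRouter:
    """Illustrative router: the same provider for every model."""

    def kms_provider(self, model, **hints):
        return "local"


# Attach the dispatcher, analogous to ConnectionRouter.kms_provider = kms_provider.
StandInConnectionRouter.kms_provider = kms_provider
```

With both routers configured in that order, a model flagged sensitive resolves to "aws" and every other model falls through to "local".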

*, key_vault_namespace, crypt_shared_lib_path=None, kms_providers=None, schema_map=None
):
"""
Returns an `AutoEncryptionOpts` instance for use with Queryable Encryption.
Collaborator

The correct verb style (per PEP 257) is "Return".

# `supports_transactions` already checks if the server is a
# replica set or sharded cluster.
is_not_single = self.supports_transactions
return is_enterprise and is_not_single and self.is_mongodb_7_0
Collaborator

Suggested change
-return is_enterprise and is_not_single and self.is_mongodb_7_0
+# TODO: check if the server is Atlas
+return is_enterprise and is_not_single and self.is_mongodb_7_0
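
For readers without the surrounding file, the three conditions in this check can be sketched as a standalone function. The function name and parameters are hypothetical, and two assumptions are made: that is_mongodb_7_0 means "server version 7.0 or later", and that an enterprise build is detected from the "modules" list of a buildInfo response. The Atlas case raised in the TODO is not handled.

```python
def supports_queryable_encryption(build_info, is_replica_set_or_sharded):
    """Mirror the three conditions above using a buildInfo-style dict."""
    # Assumption: enterprise builds list "enterprise" under "modules".
    is_enterprise = "enterprise" in build_info.get("modules", [])
    # Assumption: "is_mongodb_7_0" means server version >= 7.0.
    version = tuple(build_info.get("versionArray", [0, 0])[:2])
    return is_enterprise and is_replica_set_or_sharded and version >= (7, 0)
```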

Comment on lines +285 to +286
ConnectionRouter.kms_credentials = kms_credentials
ConnectionRouter.kms_provider = kms_provider
Collaborator

This file is a list of database functions and ConnectionRouter isn't one of them. If we do need this, put it in routers.py.

app_config, conn.alias, include_auto_created=False
):
if getattr(model, "encrypted", False):
encrypted_fields = self.get_encrypted_fields(model, conn)
Collaborator

You can still initialize with conn.schema_editor() as editor: outside the loops and change this line to editor._get_encrypted_fields_map(model).


self.stdout.write(json.dumps(schema_map, indent=2))

def generate_encrypted_fields_schema_map(self, conn):
Collaborator

There are 10 instances of conn = in Django, but it's far outnumbered by 86 instances of connection = so I'd prefer not to use the abbreviated name. I suppose I find it easier and more pleasurable not to have to read and interpret abbreviations.
