INTPYTHON-527 Add Queryable Encryption support #329

Open
wants to merge 88 commits into
base: main
bc52c8e
INTPYTHON-527 Add Queryable Encryption support
aclark4life Jun 25, 2025
38fb110
Fix test for unencrypted field not in field map
aclark4life Jun 27, 2025
65bd15a
Fix test for unencrypted field not in field map
aclark4life Jun 27, 2025
e08945b
Add comment about suppressing EncryptedCollectionError
aclark4life Jun 27, 2025
7b34b44
Don't rely on features to fall back to unencrypted
aclark4life Jun 27, 2025
8e83ada
Remove _nodb_cursor and disable version check
aclark4life Jun 28, 2025
4da895c
Don't suppress encrypted error
aclark4life Jun 28, 2025
ed54a9b
Rename get_encrypted_client -> get_client_encryption
aclark4life Jun 28, 2025
8a7766c
Add encryption router
aclark4life Jun 30, 2025
eab2f2e
Add "encryption" database to encryption tests
aclark4life Jun 30, 2025
10a361e
Move encrypted_fields_map to schema (1/2)
aclark4life Jul 1, 2025
01d5485
Move encrypted_fields_map to schema (2/x)
aclark4life Jul 1, 2025
db32487
Refactor helpers
aclark4life Jul 2, 2025
b2be223
Restore get_database_version functionality
aclark4life Jul 2, 2025
27d4b8e
Move encrypted router to tests
aclark4life Jul 2, 2025
c4d1c66
Fix router tests
aclark4life Jul 2, 2025
2772aff
Test feature `supports_queryable_encryption`
aclark4life Jul 2, 2025
d2ddf4e
Add path and bsonType to _get_encrypted_fields_map
aclark4life Jul 2, 2025
e25357e
Use the right database; rename some vars
aclark4life Jul 2, 2025
6487086
Refactor helpers again
aclark4life Jul 2, 2025
bc76db3
Allow user to customize some QE settings.
aclark4life Jul 2, 2025
4dbaa8f
Allow user to customize KMS provider.
aclark4life Jul 2, 2025
9cc5ad2
Refactor
aclark4life Jul 2, 2025
c751b2d
Alpha sort helper functions
aclark4life Jul 2, 2025
b13a07f
Fix get_database_version
aclark4life Jul 3, 2025
534da6b
A better fix for using `buildInfo` command.
aclark4life Jul 3, 2025
13578ab
Add `queries` key to encrypted fields map
aclark4life Jul 4, 2025
3342d7f
Update django_mongodb_backend/schema.py
aclark4life Jul 7, 2025
9fd21e4
Update django_mongodb_backend/schema.py
aclark4life Jul 7, 2025
9bbe741
Update tests/encryption_/models.py
aclark4life Jul 7, 2025
d1eb737
Update tests/encryption_/models.py
aclark4life Jul 7, 2025
176f016
Fix conditional
aclark4life Jul 7, 2025
264b37a
Use column instead of name
aclark4life Jul 7, 2025
1771f56
Avoid double conditional
aclark4life Jul 7, 2025
819058a
Update tests and remove test router
aclark4life Jul 7, 2025
9a3c18e
Update django_mongodb_backend/fields/encryption.py
aclark4life Jul 7, 2025
071192e
Add deconstruct method for encryption fields
aclark4life Jul 7, 2025
b2a0534
Add setup & teardown for QE features test
aclark4life Jul 7, 2025
81cc887
Add query type classes and update test
aclark4life Jul 8, 2025
be3dd16
Add missing queries to deconstruct
aclark4life Jul 8, 2025
a2342e2
Add get_encrypted_fields_map management command
aclark4life Jul 8, 2025
05a7610
Add EncryptedRouter
aclark4life Jul 8, 2025
96b3fda
Optimistically add QE to release notes :-)
aclark4life Jul 8, 2025
1eb71d5
Fix label
aclark4life Jul 8, 2025
08209d3
Save encrypted models to encrypted db
aclark4life Jul 9, 2025
90fe562
Refactor and rename QueryTypes -> QueryType
aclark4life Jul 9, 2025
8c2b84c
Refactor, reword, alpha sort, add comments.
aclark4life Jul 9, 2025
ab680fd
Alpha-sort
aclark4life Jul 9, 2025
4a267f5
Document-driven design
aclark4life Jul 9, 2025
3fdc1f7
Document-driven design
aclark4life Jul 9, 2025
d562a76
Document-driven design
aclark4life Jul 9, 2025
163758d
Add encryption.rst
aclark4life Jul 9, 2025
b95c343
Make key_vault_namespace a required kwarg
aclark4life Jul 9, 2025
5205a0b
Reuse schema editor to create encrypted fields map
aclark4life Jul 9, 2025
b07c3e6
Add --database to get_encrypted_fields_map command
aclark4life Jul 9, 2025
e557632
Add WIP configuration docs
aclark4life Jul 9, 2025
c5f8888
Add check for mongodb 7.0
aclark4life Jul 9, 2025
a7bc5c5
Let's go with "Queryable Encryption" everywhere.
aclark4life Jul 9, 2025
09423bc
Update django_mongodb_backend/fields/encryption.py
aclark4life Jul 9, 2025
c756cf8
Update tests/encryption_/tests.py
aclark4life Jul 9, 2025
841797c
Update tests/encryption_/tests.py
aclark4life Jul 9, 2025
2386397
Remove gratuitous use of with and append
aclark4life Jul 10, 2025
d685d2a
Always use `assertRaisesMessage` for > precision
aclark4life Jul 10, 2025
08ea317
only include migratable models for given database
aclark4life Jul 10, 2025
3e839d7
Refactor QueryType, add encryption_ migration
aclark4life Jul 10, 2025
75c6936
Refactor tests and fix schema test
aclark4life Jul 10, 2025
534452f
Remove migration, already tested by schema
aclark4life Jul 10, 2025
bf26a8a
Router & schema updates
aclark4life Jul 10, 2025
bf078ad
Re-add test routers
aclark4life Jul 10, 2025
2780e32
Fix test router
aclark4life Jul 10, 2025
31d3feb
Remove ENCRYPTED_DB_ALIAS, ENCRYPTED_APPS
aclark4life Jul 10, 2025
b005726
Get rid of more settings
aclark4life Jul 10, 2025
e7290e4
Remove router allow_relation
aclark4life Jul 10, 2025
76deec0
Use class method
aclark4life Jul 10, 2025
02ce21e
Remove ENCRYPTED_DB_ALIAS
aclark4life Jul 10, 2025
c8a5118
Rename Person to Patient to match tutorial
aclark4life Jul 11, 2025
39f1cbc
queries only takes a single object
timgraham Jul 11, 2025
e504fc5
Move kms_provider to monkeypatch'd ConnectionRouter
aclark4life Jul 11, 2025
0aa423f
Check settings for KMS_PROVIDER & add test.
aclark4life Jul 11, 2025
c27be37
Remove get_key_vault_namespace
aclark4life Jul 12, 2025
13de3bb
Remove get_kms_providers, get_customer_master_key
aclark4life Jul 12, 2025
7e3cd34
Update QE config docs
aclark4life Jul 12, 2025
4a9daa7
Add remaining KMS providers
aclark4life Jul 12, 2025
516642f
Look out for more credentials!
aclark4life Jul 12, 2025
a319e8e
Move encrypted db name back to router
aclark4life Jul 12, 2025
5807033
Remove comments
aclark4life Jul 12, 2025
37e7e06
Remove comments
aclark4life Jul 12, 2025
f19c901
Update comment
aclark4life Jul 12, 2025
5 changes: 4 additions & 1 deletion django_mongodb_backend/base.py
@@ -286,4 +286,7 @@ def validate_no_broken_transaction(self):

def get_database_version(self):
"""Return a tuple of the database's version."""
return tuple(self.connection.server_info()["versionArray"])
# Avoid PyMongo or require PyMongo>=4.14.0 which
# will contain a fix for the buildInfo command.
# https://jira.mongodb.org/browse/PYTHON-5429
return tuple(self.connection.admin.command("buildInfo")["versionArray"])
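The change above reads `versionArray` from the `buildInfo` response and converts it to a tuple, which is what lets feature checks such as `is_mongodb_7_0` compare it against `(7, 0)`. A minimal sketch of that comparison follows; the sample response and the `database_version` helper are made up for illustration and are not part of the PR.

```python
# Hypothetical buildInfo response; only "versionArray" matters here.
sample_build_info = {"version": "7.0.12", "versionArray": [7, 0, 12, 0]}


def database_version(build_info):
    """Return the server version as a tuple, e.g. (7, 0, 12, 0)."""
    return tuple(build_info["versionArray"])


version = database_version(sample_build_info)

# Tuples compare element by element, so a shorter tuple works as a minimum.
is_mongodb_7_0 = version >= (7, 0)
is_mongodb_8_0 = version >= (8, 0)
```

Because the comparison is element-wise, the patch and build numbers in `versionArray` never need to be stripped before checking a minimum major/minor version.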
129 changes: 129 additions & 0 deletions django_mongodb_backend/encryption.py
@@ -0,0 +1,129 @@
# Queryable Encryption helpers
import os

from bson.binary import STANDARD
from bson.codec_options import CodecOptions
from pymongo.encryption import AutoEncryptionOpts, ClientEncryption

KEY_VAULT_COLLECTION_NAME = "__keyVault"
KEY_VAULT_DATABASE_NAME = "keyvault"
KEY_VAULT_NAMESPACE = f"{KEY_VAULT_DATABASE_NAME}.{KEY_VAULT_COLLECTION_NAME}"
KMS_CREDENTIALS = {
"aws": {
"key": os.getenv("AWS_KEY_ARN", ""),
"region": os.getenv("AWS_KEY_REGION", ""),
},
"azure": {
"keyName": os.getenv("AZURE_KEY_NAME", ""),
"keyVaultEndpoint": os.getenv("AZURE_KEY_VAULT_ENDPOINT", ""),
},
"gcp": {
"projectId": os.getenv("GCP_PROJECT_ID", ""),
"location": os.getenv("GCP_LOCATION", ""),
"keyRing": os.getenv("GCP_KEY_RING", ""),
"keyName": os.getenv("GCP_KEY_NAME", ""),
},
"kmip": {},
"local": {},
}
KMS_PROVIDERS = {
"aws": {
"accessKeyId": os.getenv("AWS_ACCESS_KEY_ID", "not an access key"),
"secretAccessKey": os.getenv("AWS_SECRET_ACCESS_KEY", "not a secret key"),
},
"azure": {
"tenantId": os.getenv("AZURE_TENANT_ID", "not a tenant ID"),
"clientId": os.getenv("AZURE_CLIENT_ID", "not a client ID"),
"clientSecret": os.getenv("AZURE_CLIENT_SECRET", "not a client secret"),
},
# TODO: Provide a valid test key
#
# "Failed to parse KMS provider gcp: unable to parse base64 from UTF-8 field privateKey"
#
# "gcp": {
# "email": os.getenv("GCP_EMAIL", "not an email"),
# "privateKey": os.getenv("GCP_PRIVATE_KEY", "not a private key"),
# },
"kmip": {
"endpoint": os.getenv("KMIP_KMS_ENDPOINT", "not a valid endpoint"),
},
"local": {
"key": bytes.fromhex(
"000102030405060708090a0b0c0d0e0f"
"101112131415161718191a1b1c1d1e1f"
"202122232425262728292a2b2c2d2e2f"
"303132333435363738393a3b3c3d3e3f"
"404142434445464748494a4b4c4d4e4f"
"505152535455565758595a5b5c5d5e5f"
)
},
}


class EncryptedRouter:
def _get_db_for_model(self, model):
if getattr(model, "encrypted", False):
return "encrypted"
return "default"

def db_for_read(self, model, **hints):
return self._get_db_for_model(model)

def db_for_write(self, model, **hints):
return self._get_db_for_model(model)

def allow_migrate(self, db, app_label, model_name=None, model=None, **hints):
if model:
return db == self._get_db_for_model(model)
return db == "default"


class QueryType:
"""
Class that supports building encrypted equality and range queries
for MongoDB's Queryable Encryption.
"""

@classmethod
def equality(cls, *, contention=None):
query = {"queryType": "equality"}
if contention is not None:
query["contention"] = contention
return query

@classmethod
def range(cls, *, sparsity=None, precision=None, trimFactor=None):
query = {"queryType": "range"}
if sparsity is not None:
query["sparsity"] = sparsity
if precision is not None:
query["precision"] = precision
if trimFactor is not None:
query["trimFactor"] = trimFactor
return query


def get_auto_encryption_opts(
*, key_vault_namespace, crypt_shared_lib_path=None, kms_providers=None, schema_map=None
):
"""
Returns an `AutoEncryptionOpts` instance for use with Queryable Encryption.
Collaborator

The correct verb style (per PEP 257) is "Return".

"""
# WARNING: Provide a schema map for production use. You can generate a schema map
# with the management command `get_encrypted_fields_map` after adding
# django_mongodb_backend to INSTALLED_APPS.
return AutoEncryptionOpts(
key_vault_namespace=key_vault_namespace,
kms_providers=kms_providers,
crypt_shared_lib_path=crypt_shared_lib_path,
schema_map=schema_map,
)


def get_client_encryption(client, key_vault_namespace=None, kms_providers=None):
"""
Returns a `ClientEncryption` instance for use with Queryable Encryption.
"""

codec_options = CodecOptions(uuid_representation=STANDARD)
return ClientEncryption(kms_providers, key_vault_namespace, client, codec_options)
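To illustrate what the `QueryType` helper in this file returns, here is a standalone copy of the class (so the snippet runs without the backend installed) followed by a few calls. The argument values are arbitrary examples, not recommended settings.

```python
class QueryType:
    """Standalone copy of the helper in this diff, for illustration only."""

    @classmethod
    def equality(cls, *, contention=None):
        query = {"queryType": "equality"}
        if contention is not None:
            query["contention"] = contention
        return query

    @classmethod
    def range(cls, *, sparsity=None, precision=None, trimFactor=None):
        query = {"queryType": "range"}
        if sparsity is not None:
            query["sparsity"] = sparsity
        if precision is not None:
            query["precision"] = precision
        if trimFactor is not None:
            query["trimFactor"] = trimFactor
        return query


# Options left as None are omitted from the resulting dict.
default_query = QueryType.equality()
equality_query = QueryType.equality(contention=1)
range_query = QueryType.range(sparsity=2, trimFactor=4)
```

Each call returns a plain dict, which is the value a field's `queries` argument carries into the encrypted fields map.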
19 changes: 19 additions & 0 deletions django_mongodb_backend/features.py
@@ -592,6 +592,10 @@ def django_test_expected_failures(self):
def is_mongodb_6_3(self):
return self.connection.get_database_version() >= (6, 3)

@cached_property
def is_mongodb_7_0(self):
return self.connection.get_database_version() >= (7, 0)

@cached_property
def supports_atlas_search(self):
"""Does the server support Atlas search queries and search indexes?"""
@@ -624,3 +628,18 @@ def supports_transactions(self):
hello = client.command("hello")
# a replica set or a sharded cluster
return "setName" in hello or hello.get("msg") == "isdbgrid"

@cached_property
def supports_queryable_encryption(self):
"""
Queryable Encryption is supported if the server is Atlas or Enterprise
and is configured as a replica set or sharded cluster.
"""
self.connection.ensure_connection()
client = self.connection.connection.admin
build_info = client.command("buildInfo")
is_enterprise = "enterprise" in build_info.get("modules", [])
# `supports_transactions` already checks if the server is a
# replica set or sharded cluster.
is_not_single = self.supports_transactions
return is_enterprise and is_not_single and self.is_mongodb_7_0
Collaborator

Suggested change
return is_enterprise and is_not_single and self.is_mongodb_7_0
# TODO: check if the server is Atlas
return is_enterprise and is_not_single and self.is_mongodb_7_0
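The feature check combines three conditions: an Enterprise build, a replica set or sharded cluster, and server version 7.0 or later. A rough sketch of the same logic as a pure function over the raw command responses is shown below; `supports_qe` and the sample dicts are hypothetical, and the Atlas detection mentioned in the TODO is deliberately left out because the PR does not say how it would work.

```python
def supports_qe(build_info, hello):
    """Return True if the responses suggest Queryable Encryption can be used."""
    # Enterprise builds list "enterprise" among the server modules.
    is_enterprise = "enterprise" in build_info.get("modules", [])
    # A replica set reports "setName"; a mongos reports msg == "isdbgrid".
    is_not_single = "setName" in hello or hello.get("msg") == "isdbgrid"
    is_mongodb_7_0 = tuple(build_info["versionArray"]) >= (7, 0)
    return is_enterprise and is_not_single and is_mongodb_7_0


enterprise_replica_set = supports_qe(
    {"modules": ["enterprise"], "versionArray": [7, 0, 12, 0]},
    {"setName": "rs0"},
)
community_standalone = supports_qe(
    {"modules": [], "versionArray": [7, 0, 12, 0]},
    {},
)
```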

2 changes: 2 additions & 0 deletions django_mongodb_backend/fields/__init__.py
@@ -3,6 +3,7 @@
from .duration import register_duration_field
from .embedded_model import EmbeddedModelField
from .embedded_model_array import EmbeddedModelArrayField
from .encryption import EncryptedCharField
from .json import register_json_field
from .objectid import ObjectIdField

@@ -11,6 +12,7 @@
"ArrayField",
"EmbeddedModelArrayField",
"EmbeddedModelField",
"EncryptedCharField",
"ObjectIdAutoField",
"ObjectIdField",
]
23 changes: 23 additions & 0 deletions django_mongodb_backend/fields/encryption.py
@@ -0,0 +1,23 @@
from django.db import models


class EncryptedCharField(models.CharField):
encrypted = True

def __init__(self, *args, queries=None, **kwargs):
self.queries = queries
super().__init__(*args, **kwargs)

def deconstruct(self):
name, path, args, kwargs = super().deconstruct()

if self.queries is not None:
kwargs["queries"] = self.queries

if path.startswith("django_mongodb_backend.fields.encryption"):
path = path.replace(
"django_mongodb_backend.fields.encryption",
"django_mongodb_backend.fields",
)

return name, path, args, kwargs
12 changes: 12 additions & 0 deletions django_mongodb_backend/functions.py
@@ -38,7 +38,9 @@
Trim,
Upper,
)
from django.db.utils import ConnectionRouter

from .encryption import KMS_CREDENTIALS
from .query_utils import process_lhs

MONGO_OPERATORS = {
@@ -268,10 +270,20 @@ def trunc_time(self, compiler, connection):
}


def kms_credentials(self, provider): # noqa: ARG001
return KMS_CREDENTIALS.get(provider, None)


def kms_provider(self): # noqa: ARG001
return getattr(settings, "KMS_PROVIDER", None)
Comment on lines +277 to +278
Collaborator

You misunderstood what I meant. The idea is that kms_provider() would be a router method with a signature similar to db_for_read() (it takes a model). The user could implement the kms_provider method on their router if they want to specify a kms_provider, either the same one for all models, or model by model.

All that said, kms_provider is an optional parameter of create_encrypted_collection() and there is no discussion of it in the design doc, so please first confirm we need it. Possibly it's only for explicit encryption? If it is needed, perhaps it will be helpful if I first sketch out the design before you dive into the implementation.

The documentation for ClientEncryption says, "Explicit client-side field level encryption." so I'm confused. I guess it's also needed for auto-encryption?
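One way to read this suggestion: `kms_provider` becomes an optional router method that takes a model, and a dispatcher asks each configured router in turn, much as Django resolves `db_for_read()`. The sketch below works under that assumption; `ClinicRouter`, `first_kms_provider`, `Patient`, and `Visit` are hypothetical names and are not part of the PR or of Django.

```python
class Patient:
    encrypted = True


class Visit:
    encrypted = False


class ClinicRouter:
    """Hypothetical user router that picks a KMS provider per model."""

    def kms_provider(self, model, **hints):
        # Returning None means "no opinion", as with db_for_read().
        if getattr(model, "encrypted", False):
            return "local"
        return None


def first_kms_provider(routers, model, **hints):
    """Ask each router in order and return the first non-None answer."""
    for router in routers:
        method = getattr(router, "kms_provider", None)
        if method is None:
            continue  # This router does not implement the hook.
        provider = method(model, **hints)
        if provider is not None:
            return provider
    return None


# The first router is a plain object without the hook and is skipped.
routers = [object(), ClinicRouter()]
patient_provider = first_kms_provider(routers, Patient)
visit_provider = first_kms_provider(routers, Visit)
```

With this shape, a project that wants a single provider for every model returns a constant from `kms_provider`, while a project with mixed requirements can branch on the model.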



def register_functions():
Cast.as_mql = cast
Concat.as_mql = concat
ConcatPair.as_mql = concat_pair
ConnectionRouter.kms_credentials = kms_credentials
ConnectionRouter.kms_provider = kms_provider
Comment on lines +285 to +286
Collaborator

This file is a list of database functions and ConnectionRouter isn't one of them. If we do need this, put it in routers.py.

Cot.as_mql = cot
Extract.as_mql = extract
Func.as_mql = func
@@ -0,0 +1,45 @@
import json

from django.apps import apps
from django.core.management.base import BaseCommand
from django.db import DEFAULT_DB_ALIAS, connections, router


class Command(BaseCommand):
help = (
"Generate a `schema_map` of encrypted fields for all encrypted"
" models in the database for use with `get_auto_encryption_opts` in"
" production environments."
)

def add_arguments(self, parser):
parser.add_argument(
"--database",
default=DEFAULT_DB_ALIAS,
help="Specify the database to use for generating the encrypted "
"fields map. Defaults to the 'default' database.",
)

def handle(self, *args, **options):
db = options["database"]
connection = connections[db]

schema_map = self.generate_encrypted_fields_schema_map(connection)

self.stdout.write(json.dumps(schema_map, indent=2))

def generate_encrypted_fields_schema_map(self, conn):
Collaborator

There are 10 instances of conn = in Django, but it's far outnumbered by 86 instances of connection = so I'd prefer not to use the abbreviated name. I suppose I find it easier and more pleasurable not to have to read and interpret abbreviations.

schema_map = {}

for app_config in apps.get_app_configs():
for model in router.get_migratable_models(
app_config, conn.alias, include_auto_created=False
):
if getattr(model, "encrypted", False):
encrypted_fields = self.get_encrypted_fields(model, conn)
Collaborator

You can still initialize with conn.schema_editor() as editor: outside the loops and change this line to editor._get_encrypted_fields_map(model).
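The restructuring suggested here could look roughly like the sketch below: the schema editor is opened once, outside both loops, and reused for every model. To keep the snippet runnable without Django, the connection, the app configs, and the `get_migratable_models` callable are passed in and replaced by small fakes; `FakeEditor`, `FakeConnection`, and the sample models are hypothetical stand-ins. The sketch stores the editor's return value directly, on the assumption that `_get_encrypted_fields_map` already returns a dict with a `"fields"` key, as it does in this diff.

```python
from contextlib import contextmanager
from types import SimpleNamespace


def generate_encrypted_fields_schema_map(connection, app_configs, get_migratable_models):
    """Build {collection: encrypted_fields_map} with a single schema editor."""
    schema_map = {}
    # Open the editor once, outside the loops, as the review suggests.
    with connection.schema_editor() as editor:
        for app_config in app_configs:
            for model in get_migratable_models(app_config, connection.alias):
                if getattr(model, "encrypted", False):
                    encrypted_fields = editor._get_encrypted_fields_map(model)
                    if encrypted_fields:
                        schema_map[model._meta.db_table] = encrypted_fields
    return schema_map


class FakeEditor:
    """Stand-in for the backend's schema editor."""

    def _get_encrypted_fields_map(self, model):
        return {"fields": [{"path": "ssn", "bsonType": "string"}]}


class FakeConnection:
    """Stand-in for a Django connection; counts how many editors are opened."""

    alias = "encrypted"
    editors_opened = 0

    @contextmanager
    def schema_editor(self):
        self.editors_opened += 1
        yield FakeEditor()


patient = SimpleNamespace(encrypted=True, _meta=SimpleNamespace(db_table="patient"))
visit = SimpleNamespace(encrypted=False, _meta=SimpleNamespace(db_table="visit"))

connection = FakeConnection()
schema_map = generate_encrypted_fields_schema_map(
    connection, ["clinic"], lambda app_config, alias: [patient, visit]
)
```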

if encrypted_fields:
collection = model._meta.db_table
schema_map[collection] = {"fields": encrypted_fields}

return schema_map

def get_encrypted_fields(self, model, conn):
return conn.schema_editor()._get_encrypted_fields_map(model)
7 changes: 7 additions & 0 deletions django_mongodb_backend/models.py
@@ -14,3 +14,10 @@ def delete(self, *args, **kwargs):

def save(self, *args, **kwargs):
raise NotSupportedError("EmbeddedModels cannot be saved.")


class EncryptedModel(models.Model):
encrypted = True

class Meta:
abstract = True
59 changes: 56 additions & 3 deletions django_mongodb_backend/schema.py
@@ -1,10 +1,12 @@
from django.core.exceptions import ImproperlyConfigured
from django.db import router
from django.db.backends.base.schema import BaseDatabaseSchemaEditor
from django.db.models import Index, UniqueConstraint
from pymongo.operations import SearchIndexModel

from django_mongodb_backend.indexes import SearchIndex

from .encryption import get_client_encryption
from .fields import EmbeddedModelField
from .indexes import SearchIndex
from .query import wrap_database_errors
from .utils import OperationCollector

@@ -41,7 +43,7 @@ def get_database(self):
@wrap_database_errors
@ignore_embedded_models
def create_model(self, model):
self.get_database().create_collection(model._meta.db_table)
self._create_collection(model)
self._create_model_indexes(model)
# Make implicit M2M tables.
for field in model._meta.local_many_to_many:
@@ -418,3 +420,54 @@ def _field_should_have_unique(self, field):
db_type = field.db_type(self.connection)
# The _id column is automatically unique.
return db_type and field.unique and field.column != "_id"

def _create_collection(self, model):
"""
If the model is encrypted create an encrypted collection with the
encrypted fields map else create a normal collection.
"""

db = self.get_database()
if getattr(model, "encrypted", False):
client = self.connection.connection

options = client._options.auto_encryption_opts
key_vault_namespace = options._key_vault_namespace
kms_providers = options._kms_providers

ce = get_client_encryption(
client,
key_vault_namespace=key_vault_namespace,
kms_providers=kms_providers,
)
if not router.kms_provider():
raise ImproperlyConfigured(
"No KMS_PROVIDER found. Please configure KMS_PROVIDER in settings."
)
provider = router.kms_provider()
credentials = router.kms_credentials(provider)
ce.create_encrypted_collection(
db,
model._meta.db_table,
self._get_encrypted_fields_map(model),
provider,
credentials,
)
else:
db.create_collection(model._meta.db_table)

def _get_encrypted_fields_map(self, model):
conn = self.connection
fields = model._meta.fields

return {
"fields": [
{
"path": field.column,
"bsonType": field.db_type(conn),
**({"queries": field.queries} if getattr(field, "queries", None) else {}),
}
for field in fields
if getattr(field, "encrypted", False)
]
}
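To make the shape of the encrypted fields map concrete, the sketch below applies the same comprehension to a few fake field objects. The field names, the `db_type_for` callable, and the `"string"` BSON type are assumptions made for illustration; in the PR the type comes from `field.db_type(connection)`.

```python
from types import SimpleNamespace


def get_encrypted_fields_map(fields, db_type_for):
    """Mirror of the comprehension in this diff, over plain objects."""
    return {
        "fields": [
            {
                "path": field.column,
                "bsonType": db_type_for(field),
                # "queries" is only included when the field defines it.
                **({"queries": field.queries} if getattr(field, "queries", None) else {}),
            }
            for field in fields
            if getattr(field, "encrypted", False)
        ]
    }


fields = [
    SimpleNamespace(column="_id", encrypted=False),
    SimpleNamespace(column="ssn", encrypted=True, queries={"queryType": "equality"}),
    SimpleNamespace(column="notes", encrypted=True, queries=None),
]

# Assumption: every fake field maps to the BSON type "string".
encrypted_fields_map = get_encrypted_fields_map(fields, lambda field: "string")
```

Unencrypted fields such as `_id` are skipped entirely, and an encrypted field without `queries` is stored encrypted but is not queryable.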