From d1ec1cad7ed36066af0688e63fab35548bd83e6d Mon Sep 17 00:00:00 2001
From: Tanner Stirrat
Date: Wed, 13 Nov 2024 13:07:17 -0700
Subject: [PATCH 1/7] Remove some whitespace

---
 pages/_meta.json | 1 -
 1 file changed, 1 deletion(-)

diff --git a/pages/_meta.json b/pages/_meta.json
index 6d8961b..b5a72cb 100644
--- a/pages/_meta.json
+++ b/pages/_meta.json
@@ -7,7 +7,6 @@
     "title": "SpiceDB Documentation",
     "type": "page"
   },
-
   "authzed": {
     "title": "AuthZed Product Documentation",
     "type": "page"

From 1d4bf92f0523c252784057651aafc712fc484f2c Mon Sep 17 00:00:00 2001
From: Tanner Stirrat
Date: Wed, 13 Nov 2024 13:07:30 -0700
Subject: [PATCH 2/7] Add page to meta

---
 pages/spicedb/ops/_meta.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/pages/spicedb/ops/_meta.json b/pages/spicedb/ops/_meta.json
index 52996fb..537c61a 100644
--- a/pages/spicedb/ops/_meta.json
+++ b/pages/spicedb/ops/_meta.json
@@ -1,5 +1,6 @@
 {
   "observability": "Observability Tooling",
   "deploying-spicedb-operator": "Deploying the SpiceDB Operator",
-  "deploying-spicedb-on-eks": "Deploying SpiceDB on AWS EKS"
+  "deploying-spicedb-on-eks": "Deploying SpiceDB on AWS EKS",
+  "bulk-operations": "Bulk Importing Relationships"
 }

From 72b77526d25b5af6a3fc2641afab135c16240721 Mon Sep 17 00:00:00 2001
From: Tanner Stirrat
Date: Wed, 13 Nov 2024 13:08:04 -0700
Subject: [PATCH 3/7] Add docs on bulk operations

---
 pages/spicedb/ops/bulk-operations.mdx | 76 +++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)
 create mode 100644 pages/spicedb/ops/bulk-operations.mdx

diff --git a/pages/spicedb/ops/bulk-operations.mdx b/pages/spicedb/ops/bulk-operations.mdx
new file mode 100644
index 0000000..539d736
--- /dev/null
+++ b/pages/spicedb/ops/bulk-operations.mdx
@@ -0,0 +1,76 @@
+# Bulk Importing Relationships
+
+## Overview
+
+When setting up a SpiceDB cluster for the first time, there's often a data ingest process required to
+set up the initial set of relationships.
+This can be done with [`WriteRelationships`](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.PermissionsService.WriteRelationships) running in a loop, but you can only create 1,000 relationships at a time with this approach, and each transaction creates a new revision which incurs a bit of overhead.
+
+For faster ingest, we provide an [`ImportBulkRelationships`](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.PermissionsService.ImportBulkRelationships) call, which takes advantage of client-side gRPC streaming to accelerate the process and removes the cap on the number of relationships that can be written at once.
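+
+As a point of comparison, here's a minimal sketch of the `WriteRelationships` loop approach, written against the
+[authzed-py](https://github.com/authzed/authzed-py) client; the endpoint, token, and `all_relationships_to_write`
+list are assumptions for illustration rather than part of a documented setup.
+
+```python
+from itertools import batched  # requires Python 3.12+
+
+from authzed.api.v1 import Client, RelationshipUpdate, WriteRelationshipsRequest
+from grpcutil import insecure_bearer_token_credentials
+
+# Assumed local SpiceDB instance and token; substitute your own.
+client = Client("localhost:50051", insecure_bearer_token_credentials("my-token"))
+
+# Each call can carry a bounded number of updates (1,000 by default),
+# and each call commits as its own transaction with its own revision.
+for chunk in batched(all_relationships_to_write, 1000):
+    client.WriteRelationships(
+        WriteRelationshipsRequest(
+            updates=[
+                RelationshipUpdate(
+                    operation=RelationshipUpdate.Operation.OPERATION_CREATE,
+                    relationship=rel,
+                )
+                for rel in chunk
+            ]
+        )
+    )
+```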
+
+## Batching
+
+There are two batch sizes to consider: the number of relationships in a chunk written to the stream and the overall number of relationships in the lifetime of the request.
+Breaking the request into chunks is a network optimization that makes it faster to push relationships from the client to the cluster.
+
+The overall number of relationships should reflect how many rows can easily be written in a single transaction by your datastore.
+Note that you probably **don't** want to push all of your relationships through in a single request, as this could time out in your datastore.
+
+## Example
+
+We'll use the [authzed-dotnet](https://github.com/authzed/authzed-dotnet) client for this example.
+Other client libraries will have different syntax and structures around their streaming and iteration,
+but this should demonstrate the two different levels of chunking that we'll do in the process.
+
+```csharp
+var TOTAL_RELATIONSHIPS_TO_WRITE = 1000;
+var RELATIONSHIPS_PER_TRANSACTION = 100;
+var RELATIONSHIPS_PER_REQUEST_CHUNK = 10;
+
+// Start by breaking the full list into a sequence of chunks where each chunk fits easily
+// into a datastore transaction.
+var transactionChunks = allRelationshipsToWrite.Chunk(RELATIONSHIPS_PER_TRANSACTION);
+
+foreach (var relationshipsForRequest in transactionChunks) {
+    // For each of those transaction chunks, break it down further into chunks that
+    // optimize for network throughput.
+    var requestChunks = relationshipsForRequest.Chunk(RELATIONSHIPS_PER_REQUEST_CHUNK);
+    // Open up a client stream to the server for this transaction chunk.
+    using var importCall = permissionsService.ImportBulkRelationships();
+    foreach (var requestChunk in requestChunks) {
+        // For each network chunk, write to the client stream.
+        // NOTE: this writes the chunks sequentially rather than concurrently; this could be
+        // optimized further by using tasks.
+        await importCall.RequestStream.WriteAsync(new ImportBulkRelationshipsRequest{
+            Relationships = { requestChunk }
+        });
+    }
+    // When we're done with the transaction chunk, complete the call and process the response.
+    await importCall.RequestStream.CompleteAsync();
+    var importResponse = await importCall;
+    Console.WriteLine("request successful");
+    Console.WriteLine(importResponse.NumLoaded);
+    // Repeat!
+}
+```
+
+The code for this example is [available here](https://github.com/authzed/authzed-dotnet/blob/main/examples/bulk-import/BulkImport/Program.cs).
+
+## Why does it work this way?
+
+SpiceDB's `ImportBulkRelationships` service uses [gRPC client streaming] as a network optimization.
+It **does not** commit those relationships to your datastore as it receives them, but rather opens a database transaction
+at the start of the call and then commits that transaction when the client ends the stream.
+
+This is because there isn't a good way to handle server-side errors in a commit-as-you-go approach.
+If you use client streaming, you might receive an error that closes the stream, but that doesn't necessarily mean
+that the last chunk you sent is where the error happened.
+The error source could be sent as error context, but error handling and resumption would be difficult and cumbersome.
+
+A [gRPC bidirectional streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#bidirectional-streaming-rpc) approach could
+help address this by ACKing each chunk individually, but that also requires a good amount of bookkeeping on the client to ensure
+that every chunk written by the client has been acknowledged by the server.
+Requiring multiple client-streaming requests means that you can use your language's normal error-handling flows
+and know exactly what's been written to the server.
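+
+To make that concrete, here's a minimal sketch of such a flow in Python; the `client`, the relationship list, and
+the chunk-size constants are assumptions for illustration:
+
+```python
+import grpc
+from itertools import batched  # requires Python 3.12+
+
+from authzed.api.v1 import ImportBulkRelationshipsRequest
+
+imported = 0  # relationships durably committed so far
+transaction_chunks = batched(all_relationships_to_write, RELATIONSHIPS_PER_TRANSACTION)
+for index, transaction_chunk in enumerate(transaction_chunks):
+    try:
+        # One ImportBulkRelationships call per transaction chunk; the server
+        # commits the whole chunk only when the client stream completes.
+        response = client.ImportBulkRelationships(
+            ImportBulkRelationshipsRequest(relationships=request_chunk)
+            for request_chunk in batched(transaction_chunk, RELATIONSHIPS_PER_REQUEST_CHUNK)
+        )
+        imported += response.num_loaded
+    except grpc.RpcError as err:
+        # The failed call was rolled back in its entirety, so `imported` is exact
+        # and ingest can resume from this transaction chunk.
+        print(f"chunk {index} failed with {err.code()}; {imported} relationships committed")
+        break
+```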
+
+[gRPC client streaming]: https://grpc.io/docs/what-is-grpc/core-concepts/#client-streaming-rpc

From 10510e07f1a6fc4472886297d16e82909b591519 Mon Sep 17 00:00:00 2001
From: Tanner Stirrat
Date: Wed, 13 Nov 2024 15:29:52 -0700
Subject: [PATCH 4/7] Add section on retryable client, reword why section

---
 pages/spicedb/ops/bulk-operations.mdx | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/pages/spicedb/ops/bulk-operations.mdx b/pages/spicedb/ops/bulk-operations.mdx
index 539d736..88c0a72 100644
--- a/pages/spicedb/ops/bulk-operations.mdx
+++ b/pages/spicedb/ops/bulk-operations.mdx
@@ -4,7 +4,7 @@
 ## Overview
 
 When setting up a SpiceDB cluster for the first time, there's often a data ingest process required to
 set up the initial set of relationships.
-This can be done with [`WriteRelationships`](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.PermissionsService.WriteRelationships) running in a loop, but you can only create 1,000 relationships at a time with this approach, and each transaction creates a new revision which incurs a bit of overhead.
+This can be done with [`WriteRelationships`](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.PermissionsService.WriteRelationships) running in a loop, but you can only create 1,000 relationships (by default) at a time with this approach, and each transaction creates a new revision which incurs a bit of overhead.
 
 For faster ingest, we provide an [`ImportBulkRelationships`](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.PermissionsService.ImportBulkRelationships) call, which takes advantage of client-side gRPC streaming to accelerate the process and removes the cap on the number of relationships that can be written at once.
@@ -56,6 +56,20 @@
 
 The code for this example is [available here](https://github.com/authzed/authzed-dotnet/blob/main/examples/bulk-import/BulkImport/Program.cs).
 
+## Retrying and Resuming
+
+`ImportBulkrelationships`'s semantics only allow the creation of relationships.
+If an imported relationship already exists in the database, the call returns an error.
+This can be frustrating when populating an instance if the process fails with a retryable error, such as one caused by transient
+network conditions.
+The [authzed-go](https://github.com/authzed/authzed-go) client offers a [`RetryableClient`](https://github.com/authzed/authzed-go/blob/main/v1/retryable_client.go)
+with retry logic built into its `ImportBulkRelationships` implementation.
+
+This client is used internally by [zed](https://github.com/authzed/zed) and is exposed by the `authzed-go` library.
+It works by either skipping over the offending batch (the `Skip` strategy) or falling back to `WriteRelationships`
+with touch semantics (the `Touch` strategy).
+Similar logic can be implemented using the other client libraries.
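+
+For clients without built-in retry support, the `Touch` fallback can be hand-rolled.
+Here's a rough Python sketch; the `client`, the chunking, and the assumption that duplicate imports surface as an
+`ALREADY_EXISTS` gRPC status are illustrative rather than a documented API contract:
+
+```python
+import grpc
+
+from authzed.api.v1 import (
+    ImportBulkRelationshipsRequest,
+    RelationshipUpdate,
+    WriteRelationshipsRequest,
+)
+
+def import_with_touch_fallback(client, transaction_chunk):
+    """Try a bulk import; on a duplicate, rewrite the chunk with touch semantics."""
+    try:
+        response = client.ImportBulkRelationships(
+            iter([ImportBulkRelationshipsRequest(relationships=transaction_chunk)])
+        )
+        return response.num_loaded
+    except grpc.RpcError as err:
+        if err.code() != grpc.StatusCode.ALREADY_EXISTS:
+            raise
+        # Touch is idempotent: relationships that already exist are simply
+        # overwritten, so replaying the whole chunk is safe. Keep the chunk
+        # within WriteRelationships' update limit (1,000 by default).
+        client.WriteRelationships(
+            WriteRelationshipsRequest(
+                updates=[
+                    RelationshipUpdate(
+                        operation=RelationshipUpdate.Operation.OPERATION_TOUCH,
+                        relationship=rel,
+                    )
+                    for rel in transaction_chunk
+                ]
+            )
+        )
+        return len(transaction_chunk)
+```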
-If you use client streaming, you might receive an error that closes the stream, but that doesn't necessarily mean +We take this approach because if we were to commit each chunk sent over the network, the semantics +of server-side errors are ambiguous. +For example, you might receive an error that closes the stream, but that doesn't necessarily mean that the last chunk you sent is where the error happened. The error source could be sent as error context, but error handling and resumption would be difficult and cumbersome. From 8f4b3afb14b42d81be761f71f1eb7fafd6d8241e Mon Sep 17 00:00:00 2001 From: Tanner Stirrat Date: Tue, 19 Nov 2024 08:54:21 -0700 Subject: [PATCH 5/7] Add tabs and add python --- pages/spicedb/ops/bulk-operations.mdx | 95 ++++++++++++++++++--------- 1 file changed, 64 insertions(+), 31 deletions(-) diff --git a/pages/spicedb/ops/bulk-operations.mdx b/pages/spicedb/ops/bulk-operations.mdx index 88c0a72..3ecaad0 100644 --- a/pages/spicedb/ops/bulk-operations.mdx +++ b/pages/spicedb/ops/bulk-operations.mdx @@ -1,3 +1,5 @@ +import { Tabs } from 'nextra/components' + # Bulk Importing Relationships ## Overview @@ -22,37 +24,68 @@ We'll use the [authzed-dotnet](https://github.com/authzed/authzed-dotnet) client Other client libraries will have different syntax and structures around their streaming and iteration, but this should demonstrate the two different levels of chunking that we'll do in the process. -```csharp -var TOTAL_RELATIONSHIPS_TO_WRITE = 1000; -var RELATIONSHIPS_PER_TRANSACTION = 100; -var RELATIONSHIPS_PER_REQUEST_CHUNK = 10; - -// Start by breaking the full list into a sequence of chunks where each chunk fits easily -// into a datastore transaction. -var transactionChunks = allRelationshipsToWrite.Chunk(RELATIONSHIPS_PER_TRANSACTION); - -foreach (var relationshipsForRequest in transactionChunks) { - // For each of those transaction chunks, break it down further into chunks that - // optimize for network throughput. - var requestChunks = relationshipsForRequest.Chunk(RELATIONSHIPS_PER_REQUEST_CHUNK); - // Open up a client stream to the server for this transaction chunk - using var importCall = permissionsService.ImportBulkRelationships(); - foreach (var requestChunk in requestChunks) { - // For each network chunk, write to the client stream. - // NOTE: this makes the calls sequentially rather than concurrently; this could be - // optimized further by using tasks. - await importCall.RequestStream.WriteAsync(new ImportBulkRelationshipsRequest{ - Relationships = { requestChunk } - }); - } - // When we're done with the transaction chunk, complete the call and process the response. - await importCall.RequestStream.CompleteAsync(); - var importResponse = await importCall; - Console.WriteLine("request successful"); - Console.WriteLine(importResponse.NumLoaded); - // Repeat! -} -``` + + + ```csharp + var TOTAL_RELATIONSHIPS_TO_WRITE = 1000; + var RELATIONSHIPS_PER_TRANSACTION = 100; + var RELATIONSHIPS_PER_REQUEST_CHUNK = 10; + + // Start by breaking the full list into a sequence of chunks where each chunk fits easily + // into a datastore transaction. + var transactionChunks = allRelationshipsToWrite.Chunk(RELATIONSHIPS_PER_TRANSACTION); + + foreach (var relationshipsForRequest in transactionChunks) { + // For each of those transaction chunks, break it down further into chunks that + // optimize for network throughput. 
+        var requestChunks = relationshipsForRequest.Chunk(RELATIONSHIPS_PER_REQUEST_CHUNK);
+        // Open up a client stream to the server for this transaction chunk.
+        using var importCall = permissionsService.ImportBulkRelationships();
+        foreach (var requestChunk in requestChunks) {
+            // For each network chunk, write to the client stream.
+            // NOTE: this writes the chunks sequentially rather than concurrently; this could be
+            // optimized further by using tasks.
+            await importCall.RequestStream.WriteAsync(new ImportBulkRelationshipsRequest{
+                Relationships = { requestChunk }
+            });
+        }
+        // When we're done with the transaction chunk, complete the call and process the response.
+        await importCall.RequestStream.CompleteAsync();
+        var importResponse = await importCall;
+        Console.WriteLine("request successful");
+        Console.WriteLine(importResponse.NumLoaded);
+        // Repeat!
+    }
+    ```
+  </Tabs.Tab>
+  <Tabs.Tab>
+    ```python
+    from itertools import batched  # requires Python 3.12+
+
+    TOTAL_RELATIONSHIPS_TO_WRITE = 1_000
+
+    RELATIONSHIPS_PER_TRANSACTION = 100
+    RELATIONSHIPS_PER_REQUEST_CHUNK = 10
+
+    # NOTE: batched takes a larger iterator and makes an iterator of smaller chunks out of it.
+    # We iterate over chunks of size RELATIONSHIPS_PER_TRANSACTION, and then we break each request into
+    # chunks of size RELATIONSHIPS_PER_REQUEST_CHUNK.
+    transaction_chunks = batched(
+        all_relationships_to_write, RELATIONSHIPS_PER_TRANSACTION
+    )
+    for relationships_for_request in transaction_chunks:
+        request_chunks = batched(relationships_for_request, RELATIONSHIPS_PER_REQUEST_CHUNK)
+        response = client.ImportBulkRelationships(
+            (
+                ImportBulkRelationshipsRequest(relationships=relationships_chunk)
+                for relationships_chunk in request_chunks
+            )
+        )
+        print("request successful")
+        print(response.num_loaded)
+    ```
+  </Tabs.Tab>
+</Tabs>
 
 The code for this example is [available here](https://github.com/authzed/authzed-dotnet/blob/main/examples/bulk-import/BulkImport/Program.cs).

From b867b8096dbb3e97199c4334acf0107764777371 Mon Sep 17 00:00:00 2001
From: Sohan <1119120+sohanmaheshwar@users.noreply.github.com>
Date: Wed, 20 Nov 2024 12:00:07 +0100
Subject: [PATCH 6/7] Update bulk-operations.mdx

Minor typo in API name
---
 pages/spicedb/ops/bulk-operations.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/spicedb/ops/bulk-operations.mdx b/pages/spicedb/ops/bulk-operations.mdx
index 3ecaad0..8b58899 100644
--- a/pages/spicedb/ops/bulk-operations.mdx
+++ b/pages/spicedb/ops/bulk-operations.mdx
@@ -91,7 +91,7 @@ The code for this example is [available here](https://github.com/authzed/authzed
 
 ## Retrying and Resuming
 
-`ImportBulkrelationships`'s semantics only allow the creation of relationships.
+`ImportBulkRelationships`'s semantics only allow the creation of relationships.
 If an imported relationship already exists in the database, the call returns an error.
 This can be frustrating when populating an instance if the process fails with a retryable error, such as one caused by transient
 network conditions.
From 0d1880dd679e3e2b7553abe7de595923a7f79795 Mon Sep 17 00:00:00 2001
From: Tanner Stirrat
Date: Wed, 20 Nov 2024 08:25:01 -0700
Subject: [PATCH 7/7] Change name to dotnet

---
 pages/spicedb/ops/bulk-operations.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/spicedb/ops/bulk-operations.mdx b/pages/spicedb/ops/bulk-operations.mdx
index 3ecaad0..f36f4c1 100644
--- a/pages/spicedb/ops/bulk-operations.mdx
+++ b/pages/spicedb/ops/bulk-operations.mdx
@@ -24,7 +24,7 @@ We'll use the [authzed-dotnet](https://github.com/authzed/authzed-dotnet) client
 Other client libraries will have different syntax and structures around their streaming and iteration,
 but this should demonstrate the two different levels of chunking that we'll do in the process.
 
-<Tabs items={['C#', 'Python']}>
+<Tabs items={['dotnet', 'Python']}>
   <Tabs.Tab>
     ```csharp
     var TOTAL_RELATIONSHIPS_TO_WRITE = 1000;