From e3c28394ce0c7b89750282ab23168418d9c0a721 Mon Sep 17 00:00:00 2001 From: Petra Jaros Date: Mon, 7 Oct 2024 11:51:40 -0400 Subject: [PATCH 1/3] "Egress, authorized by UCAN": First draft --- rfc/egress-with-ucan.md | 159 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100644 rfc/egress-with-ucan.md diff --git a/rfc/egress-with-ucan.md b/rfc/egress-with-ucan.md new file mode 100644 index 0000000..ead6663 --- /dev/null +++ b/rfc/egress-with-ucan.md @@ -0,0 +1,159 @@ +# Egress, authorized by UCAN + +## Goals + +* We (Storacha) want to charge users for data egress when retrieving data through our Gateway. +* We want to provide the same free option we offer today, with a rate-limit. +* We want to continue to provide access through Bitswap, but make it optional. +* We eventually want to allow Storage Nodes to offer their own direct egress, bypassing our Gateway, and to charge the customer according to an agreement with them. + +## Abstract + +A proposed means for offering content egress from the Storacha Network which can be selectively authorized by UCAN, and which can track egress to bill appropriate customers. As usual within the Storacha Network, we begin from self-sovereignty: a Space definitionally has the authority to do anything with its content. + +To make content available as on a traditional CDN, an Agent acts on authority of a Space to make that Space's contents, or some specific CID in it, available using a bearer token, an opaque, unguessable string. The Gateway then responds to requests which contain a token by validating the proof chain, finding the content, charging for egress, and proxying it to the requester. + +This process does *not* include making and tracking Location Commitments. A Location Commitment is an attestation by a Storage Node that it holds a particular piece of content on behalf of a particular Space, and that it can provide it. In this proposal, we assume such a system already exists. 
+ +## A new command *(ability)*: `/space/content/retrieve` *(`space/content/retrieve`)* + +We will add a new UCAN command *(or in UCAN 0.9, ability)* to our system, called `/space/content/retrieve` *(or in UCAN 0.9, `space/content/retrieve`)*. + +> **Command:** `/space/content/retrieve` *(0.9: `space/content/retrieve`)* +> +> * **Meaning:** To retrieve a piece of content from a Space. +> * **Subject:** The Space from which content will be retrieved. +> * **Arguments *(0.9: `nb`)*:** +> * `cid`: The CID of the content which will be retrieved. +> * **Receipt:** [TBD, but must provide instructions to access the data (using HTTP?) without further UCAN authorization.] + +The Space may then delegate this to another Principal to give them authority to access the Space's content. Typically, this will not be done directly (though it may), but indirectly through an Account and an Agent: the Space will delegate all of its capability to an Account, which will delegate all of *its* capability to an Agent when it logs in. Then the Agent (ie, the logged-in customer) can share access to the content as they see fit. + +## A new DID method: `did:bearer` + +Sometimes, it's impractical to delegate retrieval to every entity which may want to get the content from a Space. For instance, ordinary users browsing a website cannot reasonably acquire a delegation giving them access to each image on the page. The images must be bound to the Space's billing account, and should be rate-limitable to avoid excessive charges and abuse, but also must be accessible with a simple HTTP GET. + +Traditional content delivery networks often use a **token** for analogous purposes. A CDN customer receives a token from the service which they attach to the URLs of their assets. When those URLs are fetched, the CDN uses the token to charge the correct customer and to allow the customer to shut off abusive access. 
The token is a bearer token: simply knowing and presenting the token is enough to authorize the HTTP request. + +The `did:bearer` method provides a similar scheme within a DID, which can then be applied to UCAN. The DID includes a token, and anyone who presents that token is by definition authenticated as that DID. This allows for a simple authorization bridge from an HTTP bearer token scheme to a UCAN delegation chain. + +### Format + +The format for the `did:bearer` method conforms to the [DID specification](https://www.w3.org/TR/did-core/). It consists of the `did:bearer` prefix, followed by the percent-encoded token. The ABNF for the format is described below, using the ABNF syntax described in [RFC 5234](https://www.rfc-editor.org/rfc/rfc5234). The definition of `idchar` is found in the [DID syntax](https://www.w3.org/TR/did-core/#did-syntax). + +```rfc5234 +did-bearer-format = "did:bearer:" idchar +``` + +### Operations + +The following section outlines the DID operations for the `did:bearer` method. + +#### Create + +Creating a `did:bearer` value begins with a token (a string) which it should represent. + +1. Percent-encode the token value, as specified in [RFC 3986 Section 2.1](https://www.rfc-editor.org/rfc/rfc3986#section-2.1). +2. Prepend the result with `did:bearer:`. + +For example, the token `abc$*)123` would be represented by the DID `did:bearer:abc%24%2a%29123`. Anyone presenting the token `abc$*)123` by any means a service accepts may be considered by that service to be the [DID subject](https://www.w3.org/TR/did-core/#dfn-did-subjects) of `did:bearer:abc%24%2a%29123`. + +The DID Document for a `did:bearer` DID is extremely simple. A `did:bearer` DID is associated with no cryptographic keys. It is not capable of signing. All authentication occurs out of band, so it has no verification methods. The DID Document is therefore trivially generated from the DID. 
An example is given below that expands the DID `did:bearer:abcde12345` into its associated DID Document: + +```json +{ + "@context": "https://www.w3.org/ns/did/v1", + "id": "did:bearer:abcde12345" +} +``` + +#### Read + +Reading a `did:bearer` value is a matter of deterministically expanding the value to a DID Document. This process is described above in [Create](#create). + +#### Update + +This DID Method does not support updating the DID Document. + +#### Deactivate + +This DID Method does not support deactivating the DID Document. + +## Making content retrievable with a token + +A UCAN delegation is used to allow token access to content. Typically, this is performed by an Agent within a Storacha Client, who has the authority to `/space/content/retrieve` from the Space by way of logging into an Account—thus, the delegation chain flows Space → Account → Agent → Token. + +```json5 +// UCAN 1.0 +{ + "iss": "did:key:zAgent", + "aud": "did:bearer:abcde12345", + "sub": "did:key:zSpace", + "cmd": "/space/content/retrieve", + "pol": [ + ["==", ".cid", "bafy...7pcu"], + ], + "nonce": …, + "exp": … +} +``` + +```json5 +// UCAN 0.9 +{ + "iss": "did:key:zAgent", + "aud": "did:bearer:abcde12345", + "att": [ + { + "with": "did:key:zSpace", + "can": "space/content/retrieve", + "nb": { + "cid": "bafy...7pcu" + }, + } + ], + "nnc": …, + "exp": …, + "prf": […] +} +``` + +The delegation must be available to the Executor at invocation-time. Since the Invoker will be using a token and not speaking UCAN, they will not be able to deliver the proof, so the Executor must have access to it in a store. The Client should therefore invoke `access/delegate` (UCAN 1.0 equivalent TBD) to store the delegation with Storacha. + +## The Gateway + +To retrieve a piece of content over HTTP, the Retriever will connect to a Gateway. At first, there will be exactly one Gateway: the existing Storacha Gateway (also known as the web3.storage Gateway). 
Later, as Storage Nodes become more sophisticated, they may decide to run their own Gateways, offering their own direct egress to customers. + +The Gateway will offer an HTTP endpoint. Currently, the Storacha Gateway's endpoint takes the form of `https://.ipfs.w3s.link/`. The Gateway will accept a token as part of the URL. To serve the request, the Gateway will perform the following steps: + +1. Look up the Location Commitment for the given CID. If not found, respond with 404 Not Found. +2. If there is no token given, serve the request through the standard rate-limiter. (This behavior is likely to change in the future.) +3. If there is a token given (eg, `abcde12345`), build its corresponding DID (eg, `did:bearer:abcde12345`). +4. Look up all delegations with the token DID as audience. +5. Attempt to prove the ability to invoke `/space/content/retrieve` on the Space listed in Location Commitment, with the given CID. If more than one Location Commitment is found, attempt each in turn: a CID may be stored in multiple Spaces, and the token may be able to retrieve the content through one and not another. +6. If no such proof chain is available, respond with 401 Unauthorized. [or 404 Not Found?] +7. If a proof chain is found, execute the invocation on the Executor. +8. Using the information in the receipt, fetch the content and proxy it as the response. + +## The Executor + +The [Executor](https://github.com/ucan-wg/invocation?tab=readme-ov-file#executor) of the `/space/content/retrieve` invocation will be the same service as the Gateway. [It would also be possible for Storage Nodes to run their own Executors for the central Storacha Gateway to use, but this would require more sophistication from the Storage Nodes without the benefit to them of offering their own egress service.] + +To execute the invocation, the Executor will perform the following steps: + +1. Validate the proof chain. +2. [Look up the Location Commitment again? :/ Should it become an argument?] +3. 
[Charge the Space for the egress.] +4. [Produce a suitable receipt.] + +## [To Come] + +* Rather than serve non-token content rate-limited by default, require a delegation of `/space/content/retrieve` to some DID representing "anyone". +* Bitswap should execute a `/space/content/retrieve` as well to respond to requests. Bitswap should be authorized by delegating to some DID representing Bitswap/Hoverboard. + +## Open questions + +* What does the `/space/content/retrieve` receipt look like? +* Can you enumerate the contents of a space? Is that in scope here? +* What does a Gateway URLs with a token look like? +* What can be cached? This seems relatively cacheable, but we should be explicit in the design to make sure we're on a suitable path. This process needs to be *fast*, at least once the cache is warm. From cddd71bfb1d5324bf182753a626ebc9682604dc4 Mon Sep 17 00:00:00 2001 From: Petra Jaros Date: Fri, 11 Oct 2024 12:54:36 -0400 Subject: [PATCH 2/3] Egress proposal, round 2 --- rfc/egress-with-ucan.md | 222 ++++++++++++++++++++++++---------------- 1 file changed, 135 insertions(+), 87 deletions(-) diff --git a/rfc/egress-with-ucan.md b/rfc/egress-with-ucan.md index ead6663..de1bd1f 100644 --- a/rfc/egress-with-ucan.md +++ b/rfc/egress-with-ucan.md @@ -1,97 +1,120 @@ # Egress, authorized by UCAN +## Authors + +- [Petra Jaros](https://github.com/Peeja), [Storacha Network](https://storacha.network/) + +## Language + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119). + ## Goals -* We (Storacha) want to charge users for data egress when retrieving data through our Gateway. +* We (Storacha) want to charge customers for data egress when retrieving data through our Gateway. * We want to provide the same free option we offer today, with a rate-limit. 
-* We want to continue to provide access through Bitswap, but make it optional. -* We eventually want to allow Storage Nodes to offer their own direct egress, bypassing our Gateway, and to charge the customer according to an agreement with them. +* We want to allow customers to make their data private: inaccessible without a token and inaccessible through Bitswap. ## Abstract A proposed means for offering content egress from the Storacha Network which can be selectively authorized by UCAN, and which can track egress to bill appropriate customers. As usual within the Storacha Network, we begin from self-sovereignty: a Space definitionally has the authority to do anything with its content. -To make content available as on a traditional CDN, an Agent acts on authority of a Space to make that Space's contents, or some specific CID in it, available using a bearer token, an opaque, unguessable string. The Gateway then responds to requests which contain a token by validating the proof chain, finding the content, charging for egress, and proxying it to the requester. +To make content available as on a traditional CDN, an Agent acts on authority of a Space to authorize the Gateway to serve that Space's contents to anyone bearing a token, an opaque, unguessable string. The Gateway then responds to requests which contain a token by validating the proof chain, finding the content, charging for egress, and proxying it to the requester. This process does *not* include making and tracking Location Commitments. A Location Commitment is an attestation by a Storage Node that it holds a particular piece of content on behalf of a particular Space, and that it can provide it. In this proposal, we assume such a system already exists. -## A new command *(ability)*: `/space/content/retrieve` *(`space/content/retrieve`)* - -We will add a new UCAN command *(or in UCAN 0.9, ability)* to our system, called `/space/content/retrieve` *(or in UCAN 0.9, `space/content/retrieve`)*. 
- -> **Command:** `/space/content/retrieve` *(0.9: `space/content/retrieve`)* -> -> * **Meaning:** To retrieve a piece of content from a Space. -> * **Subject:** The Space from which content will be retrieved. -> * **Arguments *(0.9: `nb`)*:** -> * `cid`: The CID of the content which will be retrieved. -> * **Receipt:** [TBD, but must provide instructions to access the data (using HTTP?) without further UCAN authorization.] +## Amend command *(ability)*: Get Blob -The Space may then delegate this to another Principal to give them authority to access the Space's content. Typically, this will not be done directly (though it may), but indirectly through an Account and an Agent: the Space will delegate all of its capability to an Account, which will delegate all of *its* capability to an Agent when it logs in. Then the Agent (ie, the logged-in customer) can share access to the content as they see fit. +We will modify the [Get Blob](https://github.com/storacha/specs/blob/main/w3-blob.md#get-blob) UCAN command *(or in UCAN 0.9, ability)*. -## A new DID method: `did:bearer` +First, we update the original spec to reality. Where the spec names the command `/space/content/get/blob/0/1`, we have implemented it in UCAN 0.9 as `space/blob/get/0/1`. We will continue to use that name, and use that name in this document, as it provides a better authorization hierarchy—specifically, the Space can grant a Principal `space/blob/*`. -Sometimes, it's impractical to delegate retrieval to every entity which may want to get the content from a Space. For instance, ordinary users browsing a website cannot reasonably acquire a delegation giving them access to each image on the page. The images must be bound to the Space's billing account, and should be rate-limitable to avoid excessive charges and abuse, but also must be accessible with a simple HTTP GET. +Second, we will add an additional argument, `token`. 
This represents the bearer token when invoked through the Gateway, or something similar. It can be `null` or missing, and will be in existing uses of the command. -Traditional content delivery networks often use a **token** for analogous purposes. A CDN customer receives a token from the service which they attach to the URLs of their assets. When those URLs are fetched, the CDN uses the token to charge the correct customer and to allow the customer to shut off abusive access. The token is a bearer token: simply knowing and presenting the token is enough to authorize the HTTP request. +Note that the command has a `/0/1` suffix. This appears to be included to mark it as unstable. This proposal does not include removing the suffix, though that will likely make sense to do soon. -The `did:bearer` method provides a similar scheme within a DID, which can then be applied to UCAN. The DID includes a token, and anyone who presents that token is by definition authenticated as that DID. This allows for a simple authorization bridge from an HTTP bearer token scheme to a UCAN delegation chain. +```ts +type GetBlob = { + /** "Retrieve details about a Blob from a Space." */ + cmd: "/space/blob/get/0/1" // (0.9: "space/blob/get/0/1") -### Format + /** + * The Space which contains the Blob. This Space will be charged egress fees + * if the Blob data is actually retrieved by way of this invocation. + */ + sub: SpaceDID -The format for the `did:bearer` method conforms to the [DID specification](https://www.w3.org/TR/did-core/). It consists of the `did:bearer` prefix, followed by the percent-encoded token. The ABNF for the format is described below, using the ABNF syntax described in [RFC 5234](https://www.rfc-editor.org/rfc/rfc5234). The definition of `idchar` is found in the [DID syntax](https://www.w3.org/TR/did-core/#did-syntax). + args: { + /** The multihash digest of the Blob to look up. 
*/ + digest: Uint8Array -```rfc5234 -did-bearer-format = "did:bearer:" idchar + /** The token, if any, used for this request. */ + token?: string | null + } +} ``` -### Operations - -The following section outlines the DID operations for the `did:bearer` method. - -#### Create +The receipt type remains [unchanged](https://github.com/storacha/specs/blob/main/w3-blob.md#get-blob-receipt). -Creating a `did:bearer` value begins with a token (a string) which it should represent. +### Existing uses -1. Percent-encode the token value, as specified in [RFC 3986 Section 2.1](https://www.rfc-editor.org/rfc/rfc3986#section-2.1). -2. Prepend the result with `did:bearer:`. +The existing use of this command is to read metadata about the Blob, specifically the size. For instance the Console uses this to display the size of a file Shard in the UI. This use remains unchanged. The `token` should be left missing, or set to `null`. -For example, the token `abc$*)123` would be represented by the DID `did:bearer:abc%24%2a%29123`. Anyone presenting the token `abc$*)123` by any means a service accepts may be considered by that service to be the [DID subject](https://www.w3.org/TR/did-core/#dfn-did-subjects) of `did:bearer:abc%24%2a%29123`. +## Making content retrievable with a token -The DID Document for a `did:bearer` DID is extremely simple. A `did:bearer` DID is associated with no cryptographic keys. It is not capable of signing. All authentication occurs out of band, so it has no verification methods. The DID Document is therefore trivially generated from the DID. An example is given below that expands the DID `did:bearer:abcde12345` into its associated DID Document: +The Gateway itself will be authorized to `/space/blob/get/0/1` by delegation. This signals that the Gateway may serve the data. 
Typically, this delegation is performed by an Agent within a Storacha Client, who has received the authority to `/space/blob/get/0/1` from the Space by way of logging into an Account—thus, the delegation chain flows Space → Account → Agent → Gateway. -```json +```json5 +// UCAN 1.0 { - "@context": "https://www.w3.org/ns/did/v1", - "id": "did:bearer:abcde12345" + "iss": "did:key:zAgent", + "aud": "did:web:ipfs.w3s.link", + "sub": "did:key:zSpace", + "cmd": "/space/blob/get/0/1", + "pol": [ + ["==", ".token", "abc123def456"], + ], + "nonce": …, + "exp": … } ``` -#### Read - -Reading a `did:bearer` value is a matter of deterministically expanding the value to a DID Document. This process is described above in [Create](#create). - -#### Update - -This DID Method does not support updating the DID Document. +```json5 +// UCAN 0.9 +{ + "iss": "did:key:zAgent", + "aud": "did:web:ipfs.w3s.link", + "att": [ + { + "with": "did:key:zSpace", + "can": "space/blob/get/0/1", + "nb": { + "token": "abc123def456" + }, + } + ], + "nnc": …, + "exp": …, + "prf": […] +} +``` -#### Deactivate +Such delegations SHOULD NOT specify a `digest`, as this will be the digest of a Shard Blob, not the file which it contains part or all of. We are not (currently) supporting a use case in which the files in a Space have different access permissions from one another: all permissions should apply to an entire Space. -This DID Method does not support deactivating the DID Document. +The UCAN 1.0 version of these delegations SHOULD specify a top-level policy of the form `["==", ".token", ]`. This prevents customers from shooting themselves in the foot by getting too fancy with their tokens, which is not a supported use case, or one we see value in. Such use cases may be considered for support in the future. -## Making content retrievable with a token +## Making content retrievable *without* a token -A UCAN delegation is used to allow token access to content. 
Typically, this is performed by an Agent within a Storacha Client, who has the authority to `/space/content/retrieve` from the Space by way of logging into an Account—thus, the delegation chain flows Space → Account → Agent → Token. +The customer may also authorize the Gateway to serve requests with no token. Separately, the Gateway will treat these requests differently: it will rate-limit them and not charge for egress. However, that has no bearing on the UCAN semantics of the delegation. ```json5 // UCAN 1.0 { "iss": "did:key:zAgent", - "aud": "did:bearer:abcde12345", + "aud": "did:web:ipfs.w3s.link", "sub": "did:key:zSpace", - "cmd": "/space/content/retrieve", + "cmd": "/space/blob/get/0/1", "pol": [ - ["==", ".cid", "bafy...7pcu"], + ["==", ".token", null], ], "nonce": …, "exp": … @@ -102,13 +125,13 @@ A UCAN delegation is used to allow token access to content. Typically, this is p // UCAN 0.9 { "iss": "did:key:zAgent", - "aud": "did:bearer:abcde12345", + "aud": "did:web:ipfs.w3s.link", "att": [ { "with": "did:key:zSpace", - "can": "space/content/retrieve", + "can": "space/blob/get/0/1", "nb": { - "cid": "bafy...7pcu" + "token": null }, } ], @@ -118,42 +141,67 @@ A UCAN delegation is used to allow token access to content. Typically, this is p } ``` -The delegation must be available to the Executor at invocation-time. Since the Invoker will be using a token and not speaking UCAN, they will not be able to deliver the proof, so the Executor must have access to it in a store. The Client should therefore invoke `access/delegate` (UCAN 1.0 equivalent TBD) to store the delegation with Storacha. - -## The Gateway - -To retrieve a piece of content over HTTP, the Retriever will connect to a Gateway. At first, there will be exactly one Gateway: the existing Storacha Gateway (also known as the web3.storage Gateway). Later, as Storage Nodes become more sophisticated, they may decide to run their own Gateways, offering their own direct egress to customers. 
- -The Gateway will offer an HTTP endpoint. Currently, the Storacha Gateway's endpoint takes the form of `https://.ipfs.w3s.link/`. The Gateway will accept a token as part of the URL. To serve the request, the Gateway will perform the following steps: +The UCAN 1.0 version of these delegations MUST specify an explicit `null` policy for the token, rather than include no policy. The Gateway MUST refuse to serve a request on the authority of an unchecked `token` field. This prevents a malicious actor from generating unexpected egress-incurring requests by including an invented token which will trigger the egress billing. -1. Look up the Location Commitment for the given CID. If not found, respond with 404 Not Found. -2. If there is no token given, serve the request through the standard rate-limiter. (This behavior is likely to change in the future.) -3. If there is a token given (eg, `abcde12345`), build its corresponding DID (eg, `did:bearer:abcde12345`). -4. Look up all delegations with the token DID as audience. -5. Attempt to prove the ability to invoke `/space/content/retrieve` on the Space listed in Location Commitment, with the given CID. If more than one Location Commitment is found, attempt each in turn: a CID may be stored in multiple Spaces, and the token may be able to retrieve the content through one and not another. -6. If no such proof chain is available, respond with 401 Unauthorized. [or 404 Not Found?] -7. If a proof chain is found, execute the invocation on the Executor. -8. Using the information in the receipt, fetch the content and proxy it as the response. +The delegation must be available to the Gateway at request-time, so it must have access to it in a store. The authorizing Client should therefore invoke `access/delegate` (UCAN 1.0 equivalent TBD) to store the delegation with Storacha. 
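The explicit-`null` rule above can be illustrated with a small check. This is only a sketch under stated assumptions: `PolicyStatement` and `tokenPolicyAllows` are hypothetical names, the function looks only at top-level `["==", ".token", value]` statements, and it does not attempt signature checks, proof-chain validation, or the rest of the UCAN policy language.

```ts
// A single top-level policy statement, e.g. ["==", ".token", null].
// Hypothetical type for illustration; not taken from a UCAN library.
type PolicyStatement = [string, string, unknown];

/**
 * Returns true only if the policy explicitly checks `.token` and every such
 * check matches the token on the request (`null` meaning "no token given").
 */
function tokenPolicyAllows(pol: PolicyStatement[], token: string | null): boolean {
  let sawTokenCheck = false;
  for (const [op, selector, expected] of pol) {
    // Statements about other fields are out of scope for this sketch.
    if (op !== "==" || selector !== ".token") continue;
    sawTokenCheck = true;
    if (expected !== token) return false;
  }
  // A policy that never mentions `.token` leaves the field unchecked,
  // so the request is refused rather than served.
  return sawTokenCheck;
}
```

With this shape, a delegation carrying no token policy cannot be used to serve a request bearing an invented token, which is the billing abuse the paragraph above describes.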
-## The Executor - -The [Executor](https://github.com/ucan-wg/invocation?tab=readme-ov-file#executor) of the `/space/content/retrieve` invocation will be the same service as the Gateway. [It would also be possible for Storage Nodes to run their own Executors for the central Storacha Gateway to use, but this would require more sophistication from the Storage Nodes without the benefit to them of offering their own egress service.] - -To execute the invocation, the Executor will perform the following steps: - -1. Validate the proof chain. -2. [Look up the Location Commitment again? :/ Should it become an argument?] -3. [Charge the Space for the egress.] -4. [Produce a suitable receipt.] - -## [To Come] +## The Gateway -* Rather than serve non-token content rate-limited by default, require a delegation of `/space/content/retrieve` to some DID representing "anyone". -* Bitswap should execute a `/space/content/retrieve` as well to respond to requests. Bitswap should be authorized by delegating to some DID representing Bitswap/Hoverboard. +To retrieve a piece of content over HTTP, the HTTP client will connect to the existing Storacha Gateway (also known as the web3.storage Gateway). Later, other entities (either Storage Providers or customers themselves, or entirely separate parties) may implement their own Gateways, which may or may not follow the behaviors here. This specification applies only to the Storacha Gateway itself. + +The Gateway will offer an HTTP endpoint. Currently, the Storacha Gateway's endpoint takes the form of `https://.ipfs.w3s.link/`. (Note that any query params are currently ignored by the Gateway, and could only have any effect on the client, such as JavaScript in a served HTML document reading them.) The Gateway will now accept a token as part of the URL, as an `authToken` query param, which may be among any other (ignored) query params. It will also recognize a token provided in the request headers as `Authorization: Bearer `. 
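As an illustration of the two ways a token can arrive, the following sketch reads it from either the `Authorization` header or the `authToken` query param. `extractToken` is a hypothetical helper, not part of the Gateway. The RFC does not say which source wins when both are present or how an empty `authToken=` is treated, so both choices below are assumptions.

```ts
/**
 * Returns the bearer token carried by a request, or null if there is none.
 * Hypothetical helper for illustration only.
 */
function extractToken(url: string, headers: { [name: string]: string }): string | null {
  // HTTP header names are case-insensitive, so compare in lower case.
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() !== "authorization") continue;
    const match = /^Bearer\s+(\S+)$/i.exec(headers[name].trim());
    // Assumption: a well-formed Bearer header takes precedence over the query param.
    if (match) return match[1];
  }
  // Any other query params are ignored, as described above.
  const fromQuery = new URL(url).searchParams.get("authToken");
  // Assumption: an empty `authToken=` is treated the same as no token at all.
  return fromQuery ? fromQuery : null;
}
```

A `null` result corresponds to the no-token path, which the Gateway rate-limits rather than bills.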
+ +To serve the request, the Gateway will perform the following steps: + +1. Look up the Location Commitment(s) for the given root CID from the Indexing Service. + * If not found, fall through to the Gateway's prior retrieval behavior. This is a deprecated branch of behavior to serve data stored before the advent of the Indexing Service, which will launch at the same time as these changes. Only data added after these changes are launched is subject to this authorization scheme. +2. Get the set of unique Spaces from those Location Commitments. (There will usually be one Location Commitment, with one Space.) +3. Repeat with each Space in any order, stopping if a successful response is produced: + 1. Look up delegations in the store where the audience is `did:web:ipfs.w3s.link` and the subject is the Space. + 2. Add them to the collection. + 3. Create an invocation: + + ```json5 + // Shown as UCAN 1.0 + { + "iss": "did:web:ipfs.w3s.link", + "aud": "did:web:ipfs.w3s.link", + // The Space + "sub": "did:key:zSpace", + "cmd": "/space/blob/get/0/1", + "args": { + // The `authToken` in the HTTP request, or... + "token": "the-token", + // null if none was given. + "token": null + }, + // Random + "nonce": { "/": "abc123" }, + "exp": null, + "prf": [ + // The found delegations + ] + } + ``` + + 4. Attempt to execute the invocation. + * If the invocation fails to authorize, respond with 401 Unauthorized. + * If authorization succeeds, execute as: + 1. Perform the existing Gateway file retrieval, responding with the file data. + 2. Asynchronously, inform the Accounting Service of the egress, to be billed to the Space. + +After deciding that a given token may retrieve a certain root CID from a certain Space, the Gateway MAY cache the decision by `(root-cid, token)` for some reasonable duration. The Gateway SHOULD NOT cache authorization *failures*, as the customer may be in the process of storing the delegation. 
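The per-Space loop and the caching rule above can be sketched roughly as follows. This is a simplified model, not the Gateway's implementation: `StoredDelegation`, `authorizeServe`, and `CACHE_TTL_MS` are hypothetical names, the delegation store is an in-memory array, and "authorization" is reduced to matching audience, subject, and token, where a real Gateway would validate a UCAN proof chain. Only successes are cached, keyed by `(root-cid, token)`.

```ts
// Simplified stand-in for a stored UCAN delegation (hypothetical shape).
type StoredDelegation = { aud: string; sub: string; token: string | null };

const GATEWAY_DID = "did:web:ipfs.w3s.link";
// Assumption: the RFC only says "some reasonable duration".
const CACHE_TTL_MS = 60000;

// Successful decisions only, keyed by (root-cid, token).
const successCache: { [key: string]: { space: string; expiresAt: number } } = {};

/** Returns the Space to bill for egress, or null (respond 401 Unauthorized). */
function authorizeServe(
  rootCid: string,
  token: string | null,
  spaces: string[], // Spaces named by the Location Commitments
  store: StoredDelegation[],
  now: number,
): string | null {
  const key = JSON.stringify([rootCid, token]);
  const cached = successCache[key];
  if (cached && cached.expiresAt > now) return cached.space;

  const tried: string[] = [];
  for (const space of spaces) {
    if (tried.indexOf(space) !== -1) continue; // visit each unique Space once
    tried.push(space);
    for (const d of store) {
      if (d.aud === GATEWAY_DID && d.sub === space && d.token === token) {
        successCache[key] = { space: space, expiresAt: now + CACHE_TTL_MS };
        return space;
      }
    }
  }
  // Failures are deliberately not cached: the customer may still be
  // in the process of storing the delegation.
  return null;
}
```

Passing `now` explicitly keeps the sketch deterministic; a caller would normally supply `Date.now()`.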
+ +## Hoverboard (Bitswap) + +Hoverboard does not handle tokens, but must respect the same authorization semantics as the Gateway does with a non-token request. Because there is no shareable service to perform that auth, Hoverboard will need to reimplement a simpler version of the same logic, only caring about the no-token execution path. This may be improved in the future, but some version of this is required to correctly implement private data storage. + +## Potential future work + +We may eventually create an object-store service. This service will sit between the Gateway and the Storage Nodes. It will speak UCAN, while the Storage Nodes will continue to speak only HTTP byte-range requests. This service will be the Executor of these `/space/blob/get/0/1` invocations, rather than the Gateway. This will allow other Principals to use the same mechanisms to retrieve Blobs, including Hoverboard as well as customers storing and retrieving Blobs directly. It would return a means to fetch the actual data in the receipt for the invocation, such as an HTTP URL. ## Open questions -* What does the `/space/content/retrieve` receipt look like? -* Can you enumerate the contents of a space? Is that in scope here? -* What does a Gateway URLs with a token look like? -* What can be cached? This seems relatively cacheable, but we should be explicit in the design to make sure we're on a suitable path. This process needs to be *fast*, at least once the cache is warm. +* When auth fails, should we return 401 Unauthorized, or 404 Not Found to mask that the data exists? +* Should `token` be `gatewayToken`? Or `authToken`? Is `token` too broad a term to be in the args of that invocation? +* What if we find a Location Commitment, but it has no Space on it (since [it's optional](https://github.com/storacha/indexing-service/pull/18))? 
From 880327d6b433b96fe565901cf23375b2b462471d Mon Sep 17 00:00:00 2001 From: Petra Jaros Date: Tue, 15 Oct 2024 15:50:00 -0400 Subject: [PATCH 3/3] Egress proposal, round 3 --- rfc/egress-with-ucan.md | 134 ++++++++++++++++++++++++---------------- 1 file changed, 82 insertions(+), 52 deletions(-) diff --git a/rfc/egress-with-ucan.md b/rfc/egress-with-ucan.md index de1bd1f..4e61c3c 100644 --- a/rfc/egress-with-ucan.md +++ b/rfc/egress-with-ucan.md @@ -22,54 +22,69 @@ To make content available as on a traditional CDN, an Agent acts on authority of This process does *not* include making and tracking Location Commitments. A Location Commitment is an attestation by a Storage Node that it holds a particular piece of content on behalf of a particular Space, and that it can provide it. In this proposal, we assume such a system already exists. -## Amend command *(ability)*: Get Blob +## New command *(ability)*: `/space/content/serve` *(`space/content/serve`)* -We will modify the [Get Blob](https://github.com/storacha/specs/blob/main/w3-blob.md#get-blob) UCAN command *(or in UCAN 0.9, ability)*. +We add a new command *(UCAN 0.9: ability)* to serve content, named `/space/content/serve` *(`space/content/serve`)*. This command is used to validate the authority to serve content. Notably, it will not (yet) actually *serve* the content, though it may in the future. That is, a successful response will not contain the means to get the content. The Invoker of this command will (for now, at least) always be an entity which *can* get any content in the network (such as the Gateway), but would like to determine whether it *should*. -First, we update the original spec to reality. Where the spec names the command `/space/content/get/blob/0/1`, we have implemented it in UCAN 0.9 as `space/blob/get/0/1`. We will continue to use that name, and use that name in this document, as it provides a better authorization hierarchy—specifically, the Space can grant a Principal `space/blob/*`. 
+In the future, this could be invoked from external clients to actually get content, in essence porting the [Gateway spec](https://specs.ipfs.tech/http-gateways/path-gateway/) to UCAN. -Second, we will add an additional argument, `token`. This represents the bearer token when invoked through the Gateway, or something similar. It can be `null` or missing, and will be in existing uses of the command. - -Note that the command has a `/0/1` suffix. This appears to be included to mark it as unstable. This proposal does not include removing the suffix, though that will likely make sense to do soon. +### Invocation ```ts -type GetBlob = { - /** "Retrieve details about a Blob from a Space." */ - cmd: "/space/blob/get/0/1" // (0.9: "space/blob/get/0/1") +type ServeContent = { + /** + * "Serve content owned by the subject Space." + * + * A Principal who may `/space/content/serve` is permitted to serve any + * content owned by the Space, in the manner of an [IPFS Gateway]. The + * content may be a Blob stored by a Storage Node, or indexed content stored + * within such Blobs (ie, Shards). + * + * Note that the args do not currently specify *what* content should be + * served. Invoking this command does not currently *serve* the content in + * any way, but merely validates the authority to do so. Currently, the + * entirety of a Space must use the same authorization, thus the content does + * not need to be identified. In the future, this command may refer directly + * to a piece of content by CID. + * + * [IPFS Gateway]: https://specs.ipfs.tech/http-gateways/path-gateway/ + */ + cmd: "/space/content/serve" // (0.9: "space/content/serve") /** - * The Space which contains the Blob. This Space will be charged egress fees - * if the Blob data is actually retrieved by way of this invocation. + * The Space which contains the content. This Space will be charged egress + * fees if content is actually retrieved by way of this invocation. 
*/ sub: SpaceDID args: { - /** The multihash digest of the Blob to look up. */ - digest: Uint8Array - - /** The token, if any, used for this request. */ + /** The authorization token, if any, used for this request. */ token?: string | null } } ``` -The receipt type remains [unchanged](https://github.com/storacha/specs/blob/main/w3-blob.md#get-blob-receipt). +### Result -### Existing uses +```ts +type ServeContentResult = Result<{}, ServeContentError> -The existing use of this command is to read metadata about the Blob, specifically the size. For instance the Console uses this to display the size of a file Shard in the UI. This use remains unchanged. The `token` should be left missing, or set to `null`. +type ServeContentError = { + message: string +} +``` ## Making content retrievable with a token -The Gateway itself will be authorized to `/space/blob/get/0/1` by delegation. This signals that the Gateway may serve the data. Typically, this delegation is performed by an Agent within a Storacha Client, who has received the authority to `/space/blob/get/0/1` from the Space by way of logging into an Account—thus, the delegation chain flows Space → Account → Agent → Gateway. +The Gateway itself will be authorized to `/space/content/serve` by delegation. This signals that the Gateway may serve the data. Typically, this delegation is performed by an Agent within a Storacha Client, who has received the authority to `/space/content/serve` from the Space by way of logging into an Account—thus, the delegation chain flows Space → Account → Agent → Gateway. ```json5 // UCAN 1.0 { "iss": "did:key:zAgent", - "aud": "did:web:ipfs.w3s.link", + "aud": "did:web:w3s.link", "sub": "did:key:zSpace", - "cmd": "/space/blob/get/0/1", + "cmd": "/space/content/serve", "pol": [ ["==", ".token", "abc123def456"], ], @@ -82,11 +97,11 @@ The Gateway itself will be authorized to `/space/blob/get/0/1` by delegation. 
Th // UCAN 0.9 { "iss": "did:key:zAgent", - "aud": "did:web:ipfs.w3s.link", + "aud": "did:web:w3s.link", "att": [ { "with": "did:key:zSpace", - "can": "space/blob/get/0/1", + "can": "space/content/serve", "nb": { "token": "abc123def456" }, @@ -98,10 +113,6 @@ The Gateway itself will be authorized to `/space/blob/get/0/1` by delegation. Th } ``` -Such delegations SHOULD NOT specify a `digest`, as this will be the digest of a Shard Blob, not the file which it contains part or all of. We are not (currently) supporting a use case in which the files in a Space have different access permissions from one another: all permissions should apply to an entire Space. - -The UCAN 1.0 version of these delegations SHOULD specify a top-level policy of the form `["==", ".token", ]`. This prevents customers from shooting themselves in the foot by getting too fancy with their tokens, which is not a supported use case, or one we see value in. Such use cases may be considered for support in the future. - ## Making content retrievable *without* a token The customer may also authorize the Gateway to serve requests with no token. Separately, the Gateway will treat these requests differently: it will rate-limit them and not charge for egress. However, that has no bearing on the UCAN semantics of the delegation. @@ -110,9 +121,9 @@ The customer may also authorize the Gateway to serve requests with no token. Sep // UCAN 1.0 { "iss": "did:key:zAgent", - "aud": "did:web:ipfs.w3s.link", + "aud": "did:web:w3s.link", "sub": "did:key:zSpace", - "cmd": "/space/blob/get/0/1", + "cmd": "/space/content/serve", "pol": [ ["==", ".token", null], ], @@ -125,11 +136,11 @@ The customer may also authorize the Gateway to serve requests with no token. 
Sep // UCAN 0.9 { "iss": "did:key:zAgent", - "aud": "did:web:ipfs.w3s.link", + "aud": "did:web:w3s.link", "att": [ { "with": "did:key:zSpace", - "can": "space/blob/get/0/1", + "can": "space/content/serve", "nb": { "token": null }, @@ -141,34 +152,59 @@ The customer may also authorize the Gateway to serve requests with no token. Sep } ``` -The UCAN 1.0 version of these delegations MUST specify an explicit `null` policy for the token, rather than include no policy. The Gateway MUST refuse to serve a request on the authority of an unchecked `token` field. This prevents a malicious actor from generating unexpected egress-incurring requests by including an invented token which will trigger the egress billing. +The UCAN 1.0 version of these delegations SHOULD specify an explicit `null` policy for the token, rather than include no policy: no policy would match both a `null`/missing token *and* any token, which is almost certainly not intended. + +The Gateway MAY refuse to serve a request on the authority of an unchecked `token` field. This prevents a malicious actor from generating unexpected egress-incurring requests by including an invented token which will trigger the egress billing. The primary interface for creating these delegations SHOULD provide no means to create such a delegation; given that, the Gateway does not strictly need to check this, as the customer would have to go through some effort to make such a delegation. The delegation must be available to the Gateway at request-time, so it must have access to it in a store. The authorizing Client should therefore invoke `access/delegate` (UCAN 1.0 equivalent TBD) to store the delegation with Storacha. ## The Gateway -To retrieve a piece of content over HTTP, the HTTP client will connect to the existing Storacha Gateway (also known as the web3.storage Gateway). 
Later, other entities (either Storage Providers or customers themselves, or entirely separate parties) may implement their own Gateways, which may or may not follow the behaviors here. This specification applies only to the Storacha Gateway itself. +To retrieve a piece of content over HTTP, the HTTP client will connect to the existing [Storacha Gateway](https://w3s.link/) (also known as the web3.storage Gateway). The Gateway offers an HTTP interface which conforms to the [IPFS Subdomain Gateway Spec](https://specs.ipfs.tech/http-gateways/subdomain-gateway/). We extend that spec as follows: + +### Request Headers + +#### `Authorization` (request header) -The Gateway will offer an HTTP endpoint. Currently, the Storacha Gateway's endpoint takes the form of `https://.ipfs.w3s.link/`. (Note that any query params are currently ignored by the Gateway, and could only have any effect on the client, such as JavaScript in a served HTML document reading them.) The Gateway will now accept a token as part of the URL, as an `authToken` query param, which may be among any other (ignored) query params. It will also recognize a token provided in the request headers as `Authorization: Bearer `. +Optional. Provides a means of authorizing the content read. Follows the normal [HTTP `Authorization` spec](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization). The authentication scheme must be [`Bearer`](https://datatracker.ietf.org/doc/html/rfc6750). The token is the authorization token the Gateway should use. + +```text +Authorization: Bearer abc123def456 +``` + +### Request Query Parameters + +#### `auth-token` (request query parameter) + +Optional. Provides a means of authorizing the content read. The value is the authorization token the Gateway should use. 
+ +```text +https://bafybeidd2gyhagleh47qeg77xqndy2qy3yzn4vkxmk775bg2t5lpuy7pcu.ipfs.w3s.link/?auth-token=abc123def456 +``` + +Requesting clients SHOULD NOT provide both an `Authorization` header and an `auth-token` query parameter. If a client does provide both, the Gateway MAY use either token, so the resulting behavior is undefined. + +### Serving a request To serve the request, the Gateway will perform the following steps: 1. Look up the Location Commitment(s) for the given root CID from the Indexing Service. - * If not found, fall through to the Gateway's prior retrieval behavior. This is a deprecated branch of behavior to serve data stored before the advent of the Indexing Service, which will launch at the same time as these changes. Only data added after these changes are launched is subject to this authorization scheme. -2. Get the set of unique Spaces from those Location Commitments. (There will usually be one Location Commitment, with one Space.) -3. Repeat with each Space in any order, stopping if a successful response is produced: - 1. Look up delegations in the store where the audience is `did:web:ipfs.w3s.link` and the subject is the Space. +2. If none are found, respond with `404 Not Found`. +3. Note if any Location Commitments were found with no Spaces. (If so, these are from before these changes, and mean we should fall back to the previous behavior later.) +4. Get the set of unique Spaces from those Location Commitments. (There will usually be one Location Commitment, with one Space.) +5. Repeat with each Space in any order, stopping if a successful response is produced: + 1. Look up delegations in the store where the audience is `did:web:w3s.link` and the subject is the Space. 2. Add them to the collection. 3. 
Create an invocation: ```json5 // Shown as UCAN 1.0 { - "iss": "did:web:ipfs.w3s.link", - "aud": "did:web:ipfs.w3s.link", + "iss": "did:web:w3s.link", + "aud": "did:web:w3s.link", // The Space "sub": "did:key:zSpace", - "cmd": "/space/blob/get/0/1", + "cmd": "/space/content/serve", "args": { // The `authToken` in the HTTP request, or... "token": "the-token", @@ -185,9 +221,11 @@ To serve the request, the Gateway will perform the following steps: ``` 4. Attempt to execute the invocation. - * If the invocation fails to authorize, respond with 401 Unauthorized. + * If the invocation fails to authorize: + * If any Location Commitments were found with no Spaces attached, perform the existing Gateway file retrieval *with no token*, responding with the file data. + * Otherwise, respond with `404 Not Found`. * If authorization succeeds, execute as: - 1. Perform the existing Gateway file retrieval, responding with the file data. + 1. Perform the existing Gateway file retrieval, passing along the token, responding with the file data. 2. Asynchronously, inform the Accounting Service of the egress, to be billed to the Space. After deciding that a given token may retrieve a certain root CID from a certain Space, the Gateway MAY cache the decision by `(root-cid, token)` for some reasonable duration. The Gateway SHOULD NOT cache authorization *failures*, as the customer may be in the process of storing the delegation. @@ -196,12 +234,4 @@ After deciding that a given token may retrieve a certain root CID from a certain Hoverboard does not handle tokens, but must respect the same authorization semantics as the Gateway does with a non-token request. Because there is no shareable service to perform that auth, Hoverboard will need to reimplement a simpler version of the same logic, only caring about the no-token execution path. This may be improved in the future, but some version of this is required to correctly implement private data storage. 
-## Potential future work - -We may eventually create an object-store service. This service will sit between the Gateway and the Storage Nodes. It will speak UCAN, while the Storage Nodes will continue to speak only HTTP byte-range requests. This service will be the Executor of these `/space/blob/get/0/1` invocations, rather than the Gateway. This will allow other Principals to use the same mechanisms to retrieve Blobs, including Hoverboard as well as customers storing and retrieving Blobs directly. It would return a means to fetch the actual data in the receipt for the invocation, such as an HTTP URL. - -## Open questions - -* When auth fails, should we return 401 Unauthorized, or 404 Not Found to mask that the data exists? -* Should `token` be `gatewayToken`? Or `authToken`? Is `token` too broad a term to be in the args of that invocation? -* What if we find a Location Commitment, but it has no Space on it (since [it's optional](https://github.com/storacha/indexing-service/pull/18))? +The DID representing Hoverboard is TBD. \ No newline at end of file
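
For reviewers, the token handling described in the Gateway section of this patch can be sketched roughly as follows. This is a minimal illustration, not the Gateway's actual code: `extractAuthToken`, `buildServeInvocation`, `ExtractedToken`, and `TokenSource` are hypothetical names, and preferring the `Authorization` header when both it and `auth-token` are present is an assumption, since the proposal leaves that case undefined. The invocation shows only the core fields; proofs, expiry, and similar fields are omitted.

```ts
// Illustrative sketch only; these names are hypothetical and not defined by the RFC.
const GATEWAY_DID = "did:web:w3s.link";

type TokenSource = "header" | "query" | "none";

interface ExtractedToken {
  token: string | null;
  source: TokenSource;
}

/**
 * Reads the authorization token from an incoming request.
 * Assumption: the header wins when both are present. The proposal leaves that
 * case undefined, so a real Gateway may choose differently.
 */
function extractAuthToken(
  requestUrl: string,
  authorizationHeader?: string | null
): ExtractedToken {
  if (authorizationHeader) {
    // Only the Bearer scheme is accepted, per the proposal.
    const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader.trim());
    const headerToken = match?.[1];
    if (headerToken) {
      return { token: headerToken, source: "header" };
    }
  }
  // Fall back to the `auth-token` query parameter.
  const queryToken = new URL(requestUrl).searchParams.get("auth-token");
  if (queryToken) {
    return { token: queryToken, source: "query" };
  }
  // No token: the request follows the rate-limited, no-token path.
  return { token: null, source: "none" };
}

/** Builds the core fields of the `/space/content/serve` invocation. */
function buildServeInvocation(spaceDID: string, token: string | null) {
  return {
    iss: GATEWAY_DID,
    aud: GATEWAY_DID,
    sub: spaceDID,
    cmd: "/space/content/serve",
    args: { token },
  };
}

const extracted = extractAuthToken(
  "https://bafybeidd2gyhagleh47qeg77xqndy2qy3yzn4vkxmk775bg2t5lpuy7pcu.ipfs.w3s.link/?auth-token=abc123def456"
);
const invocation = buildServeInvocation("did:key:zSpace", extracted.token);
console.log(invocation.args.token);
```

Passing `null` rather than omitting `token` keeps the no-token case explicit, which lines up with the delegations that match on `["==", ".token", null]`.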