Egress proposal, round 2
Peeja committed Oct 11, 2024
1 parent e3c2839 commit a672e64
Showing 1 changed file with 135 additions and 87 deletions.
222 changes: 135 additions & 87 deletions rfc/egress-with-ucan.md
@@ -1,97 +1,120 @@
# Egress, authorized by UCAN

## Authors

- [Petra Jaros](https://github.com/Peeja), [Storacha Network](https://storacha.network/)

## Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).

## Goals

* We (Storacha) want to charge users for data egress when retrieving data through our Gateway.
* We (Storacha) want to charge customers for data egress when retrieving data through our Gateway.
* We want to provide the same free option we offer today, with a rate-limit.
* We want to continue to provide access through Bitswap, but make it optional.
* We eventually want to allow Storage Nodes to offer their own direct egress, bypassing our Gateway, and to charge the customer according to an agreement with them.
* We want to allow customers to make their data private: inaccessible without a token and inaccessible through Bitswap.

## Abstract

A proposed means for offering content egress from the Storacha Network which can be selectively authorized by UCAN, and which can track egress to bill appropriate customers. As usual within the Storacha Network, we begin from self-sovereignty: a Space definitionally has the authority to do anything with its content.

To make content available as on a traditional CDN, an Agent acts on authority of a Space to make that Space's contents, or some specific CID in it, available using a bearer token, an opaque, unguessable string. The Gateway then responds to requests which contain a token by validating the proof chain, finding the content, charging for egress, and proxying it to the requester.
To make content available as on a traditional CDN, an Agent acts on authority of a Space to authorize the Gateway to serve that Space's contents to anyone bearing a token, an opaque, unguessable string. The Gateway then responds to requests which contain a token by validating the proof chain, finding the content, charging for egress, and proxying it to the requester.

This process does *not* include making and tracking Location Commitments. A Location Commitment is an attestation by a Storage Node that it holds a particular piece of content on behalf of a particular Space, and that it can provide it. In this proposal, we assume such a system already exists.

## A new command *(ability)*: `/space/content/retrieve` *(`space/content/retrieve`)*

We will add a new UCAN command *(or in UCAN 0.9, ability)* to our system, called `/space/content/retrieve` *(or in UCAN 0.9, `space/content/retrieve`)*.

> **Command:** `/space/content/retrieve` *(0.9: `space/content/retrieve`)*
>
> * **Meaning:** To retrieve a piece of content from a Space.
> * **Subject:** The Space from which content will be retrieved.
> * **Arguments *(0.9: `nb`)*:**
> * `cid`: The CID of the content which will be retrieved.
> * **Receipt:** [TBD, but must provide instructions to access the data (using HTTP?) without further UCAN authorization.]
## Amend command *(ability)*: Get Blob

The Space may then delegate this to another Principal to give them authority to access the Space's content. Typically, this will not be done directly (though it may), but indirectly through an Account and an Agent: the Space will delegate all of its capability to an Account, which will delegate all of *its* capability to an Agent when it logs in. Then the Agent (ie, the logged-in customer) can share access to the content as they see fit.
We will modify the [Get Blob](https://github.com/storacha/specs/blob/main/w3-blob.md#get-blob) UCAN command *(or in UCAN 0.9, ability)*.

## A new DID method: `did:bearer`
First, we update the original spec to match reality. Where the spec names the command `/space/content/get/blob/0/1`, we have implemented it in UCAN 0.9 as `space/blob/get/0/1`. We will continue to use that name, both in the implementation and in this document, as it provides a better authorization hierarchy: specifically, the Space can grant a Principal `space/blob/*`.

Sometimes, it's impractical to delegate retrieval to every entity which may want to get the content from a Space. For instance, ordinary users browsing a website cannot reasonably acquire a delegation giving them access to each image on the page. The images must be bound to the Space's billing account, and should be rate-limitable to avoid excessive charges and abuse, but also must be accessible with a simple HTTP GET.
Second, we will add an argument, `token`. This represents the bearer token when the command is invoked through the Gateway, or something similar in other contexts. It can be `null` or missing, as it will be in all existing uses of the command.

Traditional content delivery networks often use a **token** for analogous purposes. A CDN customer receives a token from the service which they attach to the URLs of their assets. When those URLs are fetched, the CDN uses the token to charge the correct customer and to allow the customer to shut off abusive access. The token is a bearer token: simply knowing and presenting the token is enough to authorize the HTTP request.
Note that the command has a `/0/1` suffix. This appears to be included to mark it as unstable. This proposal does not include removing the suffix, though that will likely make sense to do soon.

The `did:bearer` method provides a similar scheme within a DID, which can then be applied to UCAN. The DID includes a token, and anyone who presents that token is by definition authenticated as that DID. This allows for a simple authorization bridge from an HTTP bearer token scheme to a UCAN delegation chain.
```ts
type GetBlob = {
/** "Retrieve details about a Blob from a Space." */
cmd: "/space/blob/get/0/1" // (0.9: "space/blob/get/0/1")

### Format
/**
* The Space which contains the Blob. This Space will be charged egress fees
* if the Blob data is actually retrieved by way of this invocation.
*/
sub: SpaceDID

The format for the `did:bearer` method conforms to the [DID specification](https://www.w3.org/TR/did-core/). It consists of the `did:bearer` prefix, followed by the percent-encoded token. The ABNF for the format is described below, using the ABNF syntax described in [RFC 5234](https://www.rfc-editor.org/rfc/rfc5234). The definition of `idchar` is found in the [DID syntax](https://www.w3.org/TR/did-core/#did-syntax).
args: {
/** The multihash digest of the Blob to look up. */
digest: Uint8Array

```rfc5234
did-bearer-format = "did:bearer:" idchar
/** The token, if any, used for this request. */
token?: string | null
}
}
```
### Operations

The following section outlines the DID operations for the `did:bearer` method.

#### Create
The receipt type remains [unchanged](https://github.com/storacha/specs/blob/main/w3-blob.md#get-blob-receipt).
Creating a `did:bearer` value begins with a token (a string) which it should represent.
### Existing uses
1. Percent-encode the token value, as specified in [RFC 3986 Section 2.1](https://www.rfc-editor.org/rfc/rfc3986#section-2.1).
2. Prepend the result with `did:bearer:`.
The existing use of this command is to read metadata about the Blob, specifically the size. For instance the Console uses this to display the size of a file Shard in the UI. This use remains unchanged. The `token` should be left missing, or set to `null`.
For example, the token `abc$*)123` would be represented by the DID `did:bearer:abc%24%2a%29123`. Anyone presenting the token `abc$*)123` by any means a service accepts may be considered by that service to be the [DID subject](https://www.w3.org/TR/did-core/#dfn-did-subjects) of `did:bearer:abc%24%2a%29123`.
## Making content retrievable with a token
The DID Document for a `did:bearer` DID is extremely simple. A `did:bearer` DID is associated with no cryptographic keys. It is not capable of signing. All authentication occurs out of band, so it has no verification methods. The DID Document is therefore trivially generated from the DID. An example is given below that expands the DID `did:bearer:abcde12345` into its associated DID Document:
The Gateway itself will be authorized to `/space/blob/get/0/1` by delegation. This signals that the Gateway may serve the data. Typically, this delegation is performed by an Agent within a Storacha Client, who has received the authority to `/space/blob/get/0/1` from the Space by way of logging into an Account—thus, the delegation chain flows Space → Account → Agent → Gateway.
```json
```json5
// UCAN 1.0
{
"@context": "https://www.w3.org/ns/did/v1",
"id": "did:bearer:abcde12345"
"iss": "did:key:zAgent",
"aud": "did:web:ipfs.w3s.link",
"sub": "did:key:zSpace",
"cmd": "/space/blob/get/0/1",
"pol": [
["==", ".token", "abc123def456"],
],
"nonce": …,
"exp": …
}
```
#### Read

Reading a `did:bearer` value is a matter of deterministically expanding the value to a DID Document. This process is described above in [Create](#create).

#### Update

This DID Method does not support updating the DID Document.
```json5
// UCAN 0.9
{
"iss": "did:key:zAgent",
"aud": "did:web:ipfs.w3s.link",
"att": [
{
"with": "did:key:zSpace",
"can": "space/blob/get/0/1",
"nb": {
"token": "abc123def456"
},
}
],
"nnc": …,
"exp": …,
"prf": […]
}
```
#### Deactivate
Such delegations SHOULD NOT specify a `digest`, as this would be the digest of a Shard Blob, not of the file that the Shard contains part or all of. We are not (currently) supporting a use case in which the files in a Space have different access permissions from one another: all permissions should apply to an entire Space.
This DID Method does not support deactivating the DID Document.
The UCAN 1.0 version of these delegations SHOULD specify a top-level policy of the form `["==", ".token", <string>]`. This prevents customers from shooting themselves in the foot by getting too fancy with their token policies, which is neither a supported use case nor one we currently see value in. Such use cases may be considered for support in the future.
## Making content retrievable with a token
## Making content retrievable *without* a token
A UCAN delegation is used to allow token access to content. Typically, this is performed by an Agent within a Storacha Client, who has the authority to `/space/content/retrieve` from the Space by way of logging into an Account—thus, the delegation chain flows Space → Account → Agent → Token.
The customer may also authorize the Gateway to serve requests with no token. Separately, the Gateway will treat these requests differently: it will rate-limit them and not charge for egress. However, that has no bearing on the UCAN semantics of the delegation.
```json5
// UCAN 1.0
{
"iss": "did:key:zAgent",
"aud": "did:bearer:abcde12345",
"aud": "did:web:ipfs.w3s.link",
"sub": "did:key:zSpace",
"cmd": "/space/content/retrieve",
"cmd": "/space/blob/get/0/1",
"pol": [
["==", ".cid", "bafy...7pcu"],
["==", ".token", null],
],
"nonce": …,
"exp": …
@@ -102,13 +125,13 @@ A UCAN delegation is used to allow token access to content. Typically, this is p
// UCAN 0.9
{
"iss": "did:key:zAgent",
"aud": "did:bearer:abcde12345",
"aud": "did:web:ipfs.w3s.link",
"att": [
{
"with": "did:key:zSpace",
"can": "space/content/retrieve",
"can": "space/blob/get/0/1",
"nb": {
"cid": "bafy...7pcu"
"token": null
},
}
],
@@ -118,42 +141,67 @@ A UCAN delegation is used to allow token access to content. Typically, this is p
}
```
The delegation must be available to the Executor at invocation-time. Since the Invoker will be using a token and not speaking UCAN, they will not be able to deliver the proof, so the Executor must have access to it in a store. The Client should therefore invoke `access/delegate` (UCAN 1.0 equivalent TBD) to store the delegation with Storacha.

## The Gateway

To retrieve a piece of content over HTTP, the Retriever will connect to a Gateway. At first, there will be exactly one Gateway: the existing Storacha Gateway (also known as the web3.storage Gateway). Later, as Storage Nodes become more sophisticated, they may decide to run their own Gateways, offering their own direct egress to customers.

The Gateway will offer an HTTP endpoint. Currently, the Storacha Gateway's endpoint takes the form of `https://<cid>.ipfs.w3s.link/`. The Gateway will accept a token as part of the URL. To serve the request, the Gateway will perform the following steps:
The UCAN 1.0 version of these delegations MUST specify an explicit `null` policy for the token, rather than include no policy. The Gateway MUST refuse to serve a request on the authority of an unchecked `token` field. This prevents a malicious actor from generating unexpected egress-incurring requests by including an invented token which will trigger the egress billing.
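The "unchecked `token`" rule can be illustrated with a minimal sketch. It is not the Gateway's implementation: `Delegation`, `PolicyStatement`, and `tokenIsAuthorized` are hypothetical names, only `==` statements on `.token` are examined, and a missing token is assumed to have been normalized to `null` before the call.

```ts
// Hypothetical, simplified shapes; a real UCAN 1.0 delegation carries more fields.
type PolicyStatement = [string, string, unknown];

interface Delegation {
  pol: PolicyStatement[];
}

/**
 * Decide whether a delegation may authorize a request carrying `token`
 * (`null` when the request has no token).
 */
function tokenIsAuthorized(
  delegation: Delegation,
  token: string | null,
): boolean {
  const tokenStatements = delegation.pol.filter(
    ([op, selector]) => op === "==" && selector === ".token",
  );

  // No statement constrains `.token`: the field is unchecked, so refuse.
  if (tokenStatements.length === 0) return false;

  // Policy statements are conjunctive, so every one of them must hold.
  return tokenStatements.every(([, , expected]) => expected === token);
}
```

Under this sketch, a delegation with an empty policy authorizes nothing, so an invented token cannot trigger egress billing, and a delegation pinned to `null` serves only token-less requests.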
1. Look up the Location Commitment for the given CID. If not found, respond with 404 Not Found.
2. If there is no token given, serve the request through the standard rate-limiter. (This behavior is likely to change in the future.)
3. If there is a token given (eg, `abcde12345`), build its corresponding DID (eg, `did:bearer:abcde12345`).
4. Look up all delegations with the token DID as audience.
5. Attempt to prove the ability to invoke `/space/content/retrieve` on the Space listed in Location Commitment, with the given CID. If more than one Location Commitment is found, attempt each in turn: a CID may be stored in multiple Spaces, and the token may be able to retrieve the content through one and not another.
6. If no such proof chain is available, respond with 401 Unauthorized. [or 404 Not Found?]
7. If a proof chain is found, execute the invocation on the Executor.
8. Using the information in the receipt, fetch the content and proxy it as the response.
The delegation must be available to the Gateway at request-time, so it must have access to it in a store. The authorizing Client should therefore invoke `access/delegate` (UCAN 1.0 equivalent TBD) to store the delegation with Storacha.
## The Executor

The [Executor](https://github.com/ucan-wg/invocation?tab=readme-ov-file#executor) of the `/space/content/retrieve` invocation will be the same service as the Gateway. [It would also be possible for Storage Nodes to run their own Executors for the central Storacha Gateway to use, but this would require more sophistication from the Storage Nodes without the benefit to them of offering their own egress service.]

To execute the invocation, the Executor will perform the following steps:

1. Validate the proof chain.
2. [Look up the Location Commitment again? :/ Should it become an argument?]
3. [Charge the Space for the egress.]
4. [Produce a suitable receipt.]

## [To Come]
## The Gateway
* Rather than serve non-token content rate-limited by default, require a delegation of `/space/content/retrieve` to some DID representing "anyone".
* Bitswap should execute a `/space/content/retrieve` as well to respond to requests. Bitswap should be authorized by delegating to some DID representing Bitswap/Hoverboard.
To retrieve a piece of content over HTTP, the HTTP client will connect to the existing Storacha Gateway (also known as the web3.storage Gateway). Later, other entities (either Storage Providers or customers themselves, or entirely separate parties) may implement their own Gateways, which may or may not follow the behaviors here. This specification applies only to the Storacha Gateway itself.
The Gateway will offer an HTTP endpoint. Currently, the Storacha Gateway's endpoint takes the form of `https://<root-cid>.ipfs.w3s.link/<optional-path-and-query-params>`. (Note that any query params are currently ignored by the Gateway, and could only have any effect on the client, such as JavaScript in a served HTML document reading them.) The Gateway will now accept a token as part of the URL, as an `authToken` query param, which may be among any other (ignored) query params. It will also recognize a token provided in the request headers as `Authorization: Bearer <token>`.
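As a rough sketch of how a request's token might be read from either location, the hypothetical helper below checks the `Authorization` header first and then the `authToken` query param. The helper name, the precedence of the header over the query param, and treating an empty `authToken=` as "no token" are all assumptions; the proposal does not specify them.

```ts
/**
 * Hypothetical helper: find the bearer token in a Gateway request.
 * Returns null when the request carries no token.
 */
function extractToken(
  requestUrl: string,
  headers: Record<string, string | undefined>,
): string | null {
  // Header names are case-insensitive, so compare them in lower case.
  const authorization = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === "authorization",
  )?.[1];

  // Assumption: a token in the header takes precedence over the query param.
  const match = authorization?.match(/^Bearer\s+(\S+)\s*$/i);
  if (match) return match[1] ?? null;

  // Other query params are ignored, as they are by the Gateway today.
  const fromQuery = new URL(requestUrl).searchParams.get("authToken");

  // Assumption: an empty `authToken=` is treated as no token at all.
  return fromQuery ? fromQuery : null;
}
```

The returned value (a string or `null`) is what would populate the `token` argument of the invocation.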

To serve the request, the Gateway will perform the following steps:

1. Look up the Location Commitment(s) for the given root CID from the Indexing Service.
* If not found, fall through to the Gateway's prior retrieval behavior. This is a deprecated branch of behavior to serve data stored before the advent of the Indexing Service, which will launch at the same time as these changes. Only data added after these changes are launched is subject to this authorization scheme.
2. Get the set of unique Spaces from those Location Commitments. (There will usually be one Location Commitment, with one Space.)
3. Repeat with each Space in any order, stopping if a successful response is produced:
1. Look up delegations in the store where the audience is `did:web:ipfs.w3s.link` and the subject is the Space.
2. Add them to the collection.
3. Create an invocation:

```json5
// Shown as UCAN 1.0
{
"iss": "did:web:ipfs.w3s.link",
"aud": "did:web:ipfs.w3s.link",
// The Space
"sub": "did:key:zSpace",
"cmd": "/space/blob/get/0/1",
"args": {
// The `authToken` in the HTTP request, or...
"token": "the-token",
// null if none was given.
"token": null
},
// Random
"nonce": { "/": "abc123" },
"exp": null,
"prf": [
// The found delegations
]
}
```

4. Attempt to execute the invocation.
* If the invocation fails to authorize, respond with 401 Unauthorized.
* If authorization succeeds, execute as:
1. Perform the existing Gateway file retrieval, responding with the file data.
2. Asynchronously, inform the Accounting Service of the egress, to be billed to the Space.
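The per-Space loop in the steps above might be sketched as follows. This is an illustration under several assumptions: `StoredDelegation` reduces a delegation to the Space it covers and the token value it pins, `findDelegations` stands in for the delegation store lookup and is synchronous for brevity, and the 401 is returned only after every Space has been tried, which is one reading of "stopping if a successful response is produced".

```ts
type SpaceDID = string;

// Hypothetical simplification: a stored delegation reduced to the Space it
// covers and the token value its policy pins (null for token-less access).
interface StoredDelegation {
  sub: SpaceDID;
  token: string | null;
}

type Outcome = { status: 200; space: SpaceDID } | { status: 401 };

function authorizeRetrieval(
  spaces: SpaceDID[],
  token: string | null,
  findDelegations: (space: SpaceDID) => StoredDelegation[],
): Outcome {
  // Step 2: the same Space may appear on several Location Commitments.
  const uniqueSpaces = Array.from(new Set(spaces));

  // Step 3: try each Space, stopping at the first one that authorizes.
  for (const space of uniqueSpaces) {
    const proofs = findDelegations(space);
    if (proofs.some((d) => d.sub === space && d.token === token)) {
      // The caller would serve the file and report egress for this Space.
      return { status: 200, space };
    }
  }

  // No Space produced a valid proof chain.
  return { status: 401 };
}
```

Returning the Space alongside the success lets the caller tell the Accounting Service which Space to bill.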

After deciding that a given token may retrieve a certain root CID from a certain Space, the Gateway MAY cache the decision by `(root-cid, token)` for some reasonable duration. The Gateway SHOULD NOT cache authorization *failures*, as the customer may be in the process of storing the delegation.
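One possible shape for such a cache is sketched below. The class name, the TTL parameter, and the injectable clock are assumptions made for illustration; the proposal only says that successes MAY be cached for "some reasonable duration" and that failures SHOULD NOT be cached.

```ts
/** Hypothetical cache of successful authorization decisions. */
class AuthorizationCache {
  // Maps a (root-cid, token) key to its expiry time in milliseconds.
  private readonly entries = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  private key(rootCid: string, token: string | null): string {
    // JSON keeps the pair unambiguous and distinguishes null from "null".
    return JSON.stringify([rootCid, token]);
  }

  /** Record a success. There is deliberately no way to record a failure. */
  recordSuccess(rootCid: string, token: string | null): void {
    this.entries.set(this.key(rootCid, token), this.now() + this.ttlMs);
  }

  /** True only if an unexpired success was recorded for this pair. */
  isAuthorized(rootCid: string, token: string | null): boolean {
    const key = this.key(rootCid, token);
    const expiresAt = this.entries.get(key);
    if (expiresAt === undefined) return false;
    if (this.now() >= expiresAt) {
      this.entries.delete(key);
      return false;
    }
    return true;
  }
}
```

A `false` result here only means "not cached"; the Gateway would then run the full authorization, so a delegation stored a moment ago takes effect on the next request.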

## Hoverboard (Bitswap)

Hoverboard does not handle tokens, but must respect the same authorization semantics as the Gateway does with a non-token request. Because there is no shareable service to perform that auth, Hoverboard will need to reimplement a simpler version of the same logic, only caring about the no-token execution path. This may be improved in the future, but some version of this is required to correctly implement private data storage.

## Potential future work

We may eventually create an object-store service. This service will sit between the Gateway and the Storage Nodes. It will speak UCAN, while the Storage Nodes will continue to speak only HTTP byte-range requests. This service will be the Executor of these `/space/blob/get/0/1` invocations, rather than the Gateway. This will allow other Principals to use the same mechanisms to retrieve Blobs, including Hoverboard as well as customers storing and retrieving Blobs directly. It would return a means to fetch the actual data in the receipt for the invocation, such as an HTTP URL.

## Open questions

* What does the `/space/content/retrieve` receipt look like?
* Can you enumerate the contents of a space? Is that in scope here?
* What does a Gateway URL with a token look like?
* What can be cached? This seems relatively cacheable, but we should be explicit in the design to make sure we're on a suitable path. This process needs to be *fast*, at least once the cache is warm.
* Should we return 401 Unauthorized, or 404 Not Found to mask that the data exists?
- Should `token` be `gatewayToken`? Or `authToken`? Is `token` too broad a term to be in the args of that invocation?
- What if we find a Location Commitment, but it has no Space on it (which is optional)?
