feat: add head command to verify CAR is in dest bucket (#2)
* feat: add `head` command to verify CAR is in dest bucket

Check the head response for a CAR CID at a bucket endpoint. Logs an error if the HTTP status code is not 200 or the `content-length` header is `0`.

```shell
sha256it head -r auto -b carpark --endpoint https://<ACCOUNT_ID>.r2.cloudflarestorage.com [car cid]
```

- `DEST_ACCESS_KEY_ID` and `DEST_SECRET_ACCESS_KEY` must be set in env
- `--endpoint` is the S3-compatible API URL

**output**

```json
{"bucket":"carpark","cid":{"/":"[car cid]"},"key":"[car cid]/[car cid].car","length":10862134,"region":"auto","status":200}
```

License: MIT
Signed-off-by: Oli Evans <[email protected]>

* chore: typo

License: MIT
Signed-off-by: Oli Evans <[email protected]>

---------

Signed-off-by: Oli Evans <[email protected]>
olizilla authored Sep 11, 2023
1 parent bc0567d commit 2da56d2
Showing 2 changed files with 161 additions and 4 deletions.
README.md (95 additions, 1 deletion)
# sha256it

Tools to calculate the sha256 CAR CID for an object in S3 and copy it to R2 _(or another S3-compatible API)_.

Lambdas to hash and copy CARs are deployed via SST and seed.run.

The `sha256it` cli uses those lambdas to do the work.

## Getting started

The repo contains the infra deployment code and a cli to use it.

```
├── packages
| ├── cli - sha256it cli to hash, copy, and verify cars
| └── functions - lambdas for hashing and copying CARs
└── stacks - sst and aws cdk code to deploy the lambdas
```

To work on this codebase **you need**:

- Node.js >= v18 (prod env is node v18)
- Install the deps with `npm i`

You can then run the tests locally with `npm test`.

## Usage

Commands for the cli, defined in the `packages/cli` directory.

### list

Fetch a list of keys. Partition the work by starting key with the `--start-after` flag.

```shell
sha256it list --region us-west-2 --bucket bucketname --prefix complete \
> keys.ndjson
```

- `ACCESS_KEY_ID` and `SECRET_ACCESS_KEY` must be set in env

**output**

```json
{"region":"us-west-2","bucket":"[bucket-name]","key":"complete/[root cid].car"}
```
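Each line of the ndjson output is an independent JSON object, so downstream steps can consume it as a stream. A minimal sketch of parsing it (the `parseNdjson` helper and the sample record are illustrative, not part of the cli):

```javascript
// Split ndjson text into lines and parse each non-empty line as JSON.
// Every record describes one object found by `list`.
function parseNdjson (text) {
  return text
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line))
}

const sample = '{"region":"us-west-2","bucket":"my-bucket","key":"complete/bafyroot.car"}\n'
const records = parseNdjson(sample)
// records[0].key is 'complete/bafyroot.car'
```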

### hash

Hash a list of keys.

```shell
sha256it hash --endpoint https://???.lambda-url.us-west-2.on.aws/ \
< keys.ndjson \
> hashed.ndjson
```

- `--endpoint` is the function url of the `hash` lambda

**output**

```json
{"bucket":"[bucket name]","cid":{"/":"[car cid]"},"key":"complete/[root cid].car","region":"us-west-2"}
```
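The `cid` field uses the dag-json convention of encoding a CID as an object with a single `/` key. A plain `JSON.parse` is enough to pull the CID string out (the sample line below is illustrative):

```javascript
// dag-json encodes a CID link as {"/": "<cid string>"}; after JSON.parse
// the CID string sits under the '/' property of the cid object.
const line = '{"bucket":"my-bucket","cid":{"/":"bagbaieraexamplecid"},"key":"complete/bafyroot.car","region":"us-west-2"}'
const record = JSON.parse(line)
const carCid = record.cid['/']
// carCid is 'bagbaieraexamplecid'
```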

### copy

Copy CARs from source to dest.
_Note: get the function URL of the `copy` lambda from the AWS console._

```shell
sha256it copy --endpoint https://!!!.lambda-url.us-west-2.on.aws/ \
< hashed.ndjson \
> copied.ndjson
```

- `--endpoint` is the function url of the `copy` lambda

### head

Check the head response for a CAR CID at a bucket endpoint.

```shell
sha256it head --region auto --bucket carpark --endpoint https://<ACCOUNT_ID>.r2.cloudflarestorage.com \
< copied.ndjson \
> verified.ndjson
```

- `DEST_ACCESS_KEY_ID` and `DEST_SECRET_ACCESS_KEY` must be set in env
- `--endpoint` is the S3-compatible API URL

**output**

```json
{"bucket":"carpark","cid":{"/":"[car cid]"},"key":"[car cid]/[car cid].car","length":10862134,"region":"auto","status":200}
```
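Records written by `head` carry an `error` field when verification failed (non-200 status or an empty body). A sketch for picking the failures out of `verified.ndjson` (the `failures` helper and sample records are illustrative, not part of the cli):

```javascript
// Keep only the records that `head` flagged with an error, e.g. so the
// corresponding CARs can be re-copied and checked again.
function failures (ndjson) {
  return ndjson
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line))
    .filter(record => record.error !== undefined)
}

const verified =
  '{"bucket":"carpark","key":"a.car","status":200,"length":10}\n' +
  '{"bucket":"carpark","key":"b.car","status":200,"length":0,"error":"content-length: 0"}\n'
const bad = failures(verified)
// bad contains only the b.car record
```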
packages/cli/bin.js (66 additions, 3 deletions)
import fs from 'node:fs'
import { Readable, Writable } from 'node:stream'
import sade from 'sade'
import { S3Client, ListObjectsV2Command, HeadObjectCommand } from '@aws-sdk/client-s3'
import dotenv from 'dotenv'
import { Parse } from 'ndjson-web'
import * as dagJSON from '@ipld/dag-json'
// …

cli
.option('-p, --prefix', 'Key prefix.')
.option('-s, --start-after', 'Start listing after this key.')
.action(async (/** @type {Record<string, string|undefined>} */ options) => {
    const accessKeyId = notNully(process.env, 'AWS_ACCESS_KEY_ID', 'missing environment variable')
    const secretAccessKey = notNully(process.env, 'AWS_SECRET_ACCESS_KEY', 'missing environment variable')
const { endpoint } = options
const region = notNully(options, 'region', 'missing required option')
const bucket = notNully(options, 'bucket', 'missing required option')
// …

const notNully = (obj, key, msg = 'unexpected null value') => {
return value
}

cli.command('head [carCid]')
.describe('Check head response for a car cid at a bucket endpoint')
.example('head bagbaieraaosiqlj4gia2lx35dl7ofqynk7xs47aijrnyrfwny6nea4i6srua -r auto -b carpark --endpoint https://<ACCOUNT_ID>.r2.cloudflarestorage.com')
.option('-e, --endpoint', 'Bucket endpoint. e.g https://<ACCOUNT_ID>.r2.cloudflarestorage.com')
.option('-r, --region', 'Bucket region', 'us-east-1') // "When using the S3 API, the region for an R2 bucket is auto. For compatibility with tools that do not allow you to specify a region, an empty value and us-east-1 will alias to the auto region."
.option('-b, --bucket', 'Bucket name.')
.action(async (/** @type {string|undefined} */ cidstr, /** @type {Record<string, string|undefined>} */ options) => {
const accessKeyId = notNully(process.env, 'DEST_ACCESS_KEY_ID', 'missing environment variable')
const secretAccessKey = notNully(process.env, 'DEST_SECRET_ACCESS_KEY', 'missing environment variable')
const endpoint = options.endpoint ?? notNully(process.env, 'DEST_ENDPOINT', 'missing required option')
const region = notNully(options, 'region', 'missing required option')
const bucket = notNully(options, 'bucket', 'missing required option')
const client = new S3Client({ region, endpoint, credentials: { accessKeyId, secretAccessKey }, })

if (cidstr) {
const res = await head(cidstr, bucket, region, client)
return console.log(dagJSON.stringify(res))
}

const source = /** @type {ReadableStream<Uint8Array>} */ (Readable.toWeb(process.stdin))
await source
.pipeThrough(/** @type {Parse<{ region?: string, bucket?: string, key: string, cid: { '/': string }, root?: { '/': string } }>} */ (new Parse()))
.pipeThrough(new Parallel(concurrency, item => head(item.cid['/'], bucket, region, client)))
.pipeThrough(new TransformStream({
transform: (item, controller) => controller.enqueue(`${dagJSON.stringify(item)}\n`)
}))
.pipeTo(Writable.toWeb(process.stdout))
})

/**
* Check head response for car cid at the given bucket endpoint
* Flag error if status not 200 or content-length: 0
*
* public url access is not enabled on carpark, so we must provide auth
*
* @param {string} cidstr
* @param {string} bucket
* @param {string} region
* @param {S3Client} client
*/
async function head (cidstr, bucket, region, client) {
const cid = Link.parse(cidstr)
const key = `${cid}/${cid}.car`
try {
const cmd = new HeadObjectCommand({ Bucket: bucket, Key: key })
const res = await client.send(cmd)
const length = res.ContentLength
const status = res.$metadata.httpStatusCode
if (status === 200) {
if (length > 0) {
return { cid, region, bucket, key, status, length }
}
console.warn(`error: ${region}/${bucket}/${key} - content-length: 0`)
return { cid, region, bucket, key, status, length, error: `content-length: 0` }
}
console.warn(`error: ${region}/${bucket}/${key} - http status: ${status}`)
return { cid, region, bucket, key, status, error: `http status: ${status}` }
} catch (err) {
console.warn(`error: ${region}/${bucket}/${key}`, err.message ?? err)
return { cid, region, bucket, key, error: err.message ?? err }
}
}

cli.parse(process.argv)
