This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

fix: dedupe batch writes (#2)
A batch can contain duplicate multihash+CAR entries. We have to dedupe these
from batch write commands, or DynamoDB rejects the request.

Conceptually, it is redundant to track multiple instances of the same
block appearing in the same CAR, so it is fine and good to dedupe here.

License: MIT

Signed-off-by: Oli Evans <[email protected]>
olizilla authored Sep 1, 2023
1 parent 33bc598 commit 6cd9c55
Showing 2 changed files with 10 additions and 1 deletion.
2 changes: 2 additions & 0 deletions write-cli.js
100644 → 100755
@@ -1,3 +1,5 @@
#!/usr/bin/env node

import { DynamoDBClient } from '@aws-sdk/client-dynamodb'
import { createDynamo, createDynamoTable } from './test/_helpers.js'
import { write } from './write.js'
9 changes: 8 additions & 1 deletion write.js
@@ -36,14 +36,21 @@ export async function write (srcStream, dst, segment, totalSegments, client = ne
srcCount += batch.length
spinner.suffixText = `src: ${srcCount} dst: ${dstCount}`

+ // remove duplicates
+ const itemMap = new Map()
+ for (const item of batch) {
+ itemMap.set(`${item.blockmultihash}#${item.carpath}`, item)
+ }
+
/** @type {Array<import('@aws-sdk/client-dynamodb').PutRequest} */
- const puts = batch.map(item => {
+ const puts = Array.from(itemMap.values()).map(item => {
return {
PutRequest: {
Item: marshall(item)
}
}
})

const cmd = new BatchWriteItemCommand({
RequestItems: {
[dst]: puts
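
The dedupe step in this change can be tried in isolation. The following is a minimal sketch, not code from the commit: `dedupeByKey` is a hypothetical helper name, and the sample items (including the `offset` field) are made-up data used only to show the behaviour. It relies on the same idea as the diff: writing to a `Map` under a `blockmultihash#carpath` key means a later duplicate overwrites the earlier one, so each key appears at most once in the batch.

```javascript
// Hypothetical helper, not part of the commit.
// Keeps one item per key; when a key repeats, the last item seen wins.
function dedupeByKey (items, keyOf) {
  const byKey = new Map()
  for (const item of items) {
    byKey.set(keyOf(item), item)
  }
  return Array.from(byKey.values())
}

// Made-up sample batch: the first two items share multihash and CAR path.
const batch = [
  { blockmultihash: 'zQmA', carpath: 'a.car', offset: 1 },
  { blockmultihash: 'zQmA', carpath: 'a.car', offset: 2 },
  { blockmultihash: 'zQmA', carpath: 'b.car', offset: 3 }
]

// Same composite key shape as the diff: `${blockmultihash}#${carpath}`.
const unique = dedupeByKey(batch, item => `${item.blockmultihash}#${item.carpath}`)

console.log(unique.length)
```

Because the last duplicate wins, any fields that differ between two entries with the same key are taken from the later entry. That matches the commit's reasoning that repeated instances of one block in one CAR are redundant.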
