feat: add import data from gql script #247

Merged 17 commits on Dec 16, 2024
Changes from 3 commits
2 changes: 2 additions & 0 deletions .gitignore
@@ -14,6 +14,8 @@
 /vendor
 /wallets
 test-results.xml
+/scripts/import-data/bundles
+/scripts/import-data/transactions

 # Generated docs
 /docs/sqlite/bundles
327 changes: 327 additions & 0 deletions scripts/import-data/fetch-data-gql.ts
@@ -0,0 +1,327 @@
/**
* AR.IO Gateway
* Copyright (C) 2022-2023 Permanent Data Solutions, Inc. All Rights Reserved.
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Affero General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
import * as fs from 'node:fs/promises';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
const args = process.argv.slice(2);
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

let GQL_ENDPOINT = 'https://arweave-search.goldsky.com/graphql';
let MIN_BLOCK_HEIGHT = 0;
let MAX_BLOCK_HEIGHT: number | undefined;
let BLOCK_RANGE_SIZE = 100;

args.forEach((arg, index) => {
  switch (arg) {
    case '--gqlEndpoint':
      if (args[index + 1]) {
        GQL_ENDPOINT = args[index + 1];
      } else {
        console.error('Missing value for --gqlEndpoint');
        process.exit(1);
      }
      break;
    case '--minHeight':
      if (args[index + 1]) {
        MIN_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
      } else {
        console.error('Missing value for --minHeight');
        process.exit(1);
      }
      break;
    case '--maxHeight':
      if (args[index + 1]) {
        MAX_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
      } else {
        console.error('Missing value for --maxHeight');
        process.exit(1);
      }
      break;
    case '--blockRangeSize':
      if (args[index + 1]) {
        BLOCK_RANGE_SIZE = parseInt(args[index + 1], 10);
      } else {
        console.error('Missing value for --blockRangeSize');
        process.exit(1);
      }
      break;
    default:
      break;
  }
});
Contributor

🛠️ Refactor suggestion

Add input validation for parsed arguments

While the code handles missing values well, it should validate the parsed values to ensure they are valid numbers and within acceptable ranges.

Apply this diff to add validation:

-    case '--minHeight':
+    case '--minHeight': {
+      // Braces scope each `const value` to its own case clause.
       if (args[index + 1]) {
-        MIN_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
+        const value = parseInt(args[index + 1], 10);
+        if (isNaN(value) || value < 0) {
+          console.error('Invalid value for --minHeight. Must be a non-negative integer.');
+          process.exit(1);
+        }
+        MIN_BLOCK_HEIGHT = value;
       } else {
         console.error('Missing value for --minHeight');
         process.exit(1);
       }
       break;
+    }
-    case '--maxHeight':
+    case '--maxHeight': {
       if (args[index + 1]) {
-        MAX_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
+        const value = parseInt(args[index + 1], 10);
+        if (isNaN(value) || value < 0) {
+          console.error('Invalid value for --maxHeight. Must be a non-negative integer.');
+          process.exit(1);
+        }
+        MAX_BLOCK_HEIGHT = value;
       } else {
         console.error('Missing value for --maxHeight');
         process.exit(1);
       }
       break;
+    }
-    case '--blockRangeSize':
+    case '--blockRangeSize': {
       if (args[index + 1]) {
-        BLOCK_RANGE_SIZE = parseInt(args[index + 1], 10);
+        const value = parseInt(args[index + 1], 10);
+        if (isNaN(value) || value <= 0) {
+          console.error('Invalid value for --blockRangeSize. Must be a positive integer.');
+          process.exit(1);
+        }
+        BLOCK_RANGE_SIZE = value;
       } else {
         console.error('Missing value for --blockRangeSize');
         process.exit(1);
       }
       break;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let GQL_ENDPOINT = 'https://arweave-search.goldsky.com/graphql';
let MIN_BLOCK_HEIGHT = 0;
let MAX_BLOCK_HEIGHT: number | undefined;
let BLOCK_RANGE_SIZE = 100;
args.forEach((arg, index) => {
  switch (arg) {
    case '--gqlEndpoint':
      if (args[index + 1]) {
        GQL_ENDPOINT = args[index + 1];
      } else {
        console.error('Missing value for --gqlEndpoint');
        process.exit(1);
      }
      break;
    // Braces scope each `const value` to its own case clause.
    case '--minHeight': {
      if (args[index + 1]) {
        const value = parseInt(args[index + 1], 10);
        if (isNaN(value) || value < 0) {
          console.error('Invalid value for --minHeight. Must be a non-negative integer.');
          process.exit(1);
        }
        MIN_BLOCK_HEIGHT = value;
      } else {
        console.error('Missing value for --minHeight');
        process.exit(1);
      }
      break;
    }
    case '--maxHeight': {
      if (args[index + 1]) {
        const value = parseInt(args[index + 1], 10);
        if (isNaN(value) || value < 0) {
          console.error('Invalid value for --maxHeight. Must be a non-negative integer.');
          process.exit(1);
        }
        MAX_BLOCK_HEIGHT = value;
      } else {
        console.error('Missing value for --maxHeight');
        process.exit(1);
      }
      break;
    }
    case '--blockRangeSize': {
      if (args[index + 1]) {
        const value = parseInt(args[index + 1], 10);
        if (isNaN(value) || value <= 0) {
          console.error('Invalid value for --blockRangeSize. Must be a positive integer.');
          process.exit(1);
        }
        BLOCK_RANGE_SIZE = value;
      } else {
        console.error('Missing value for --blockRangeSize');
        process.exit(1);
      }
      break;
    }
    default:
      break;
  }
});
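
The three numeric flags repeat the same parse-and-check steps, and `parseInt` accepts trailing garbage (`parseInt('12abc', 10)` is `12`), which an `isNaN` check does not catch. A minimal sketch of a shared, stricter helper follows. The name `parseIntegerArg` and its signature are assumptions for illustration and are not part of this PR.

```typescript
// Hypothetical helper (not in the PR): accept only plain base-10 digit strings
// and enforce a lower bound, so '12abc', '-5', '1.5' and '' are all rejected.
const parseIntegerArg = (
  flag: string,
  raw: string | undefined,
  min: number,
): number => {
  const message = `Invalid value for ${flag}. Must be an integer >= ${min}.`;

  // Only digits are allowed; this also rejects a missing or empty value.
  if (raw === undefined || !/^\d+$/.test(raw)) {
    throw new Error(message);
  }

  const value = Number(raw);
  if (!Number.isSafeInteger(value) || value < min) {
    throw new Error(message);
  }

  return value;
};
```

Under these assumptions, a case clause could shrink to something like `MIN_BLOCK_HEIGHT = parseIntegerArg('--minHeight', args[index + 1], 0);`. The sketch throws instead of calling `process.exit(1)`, so the caller would decide how to report the error.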


const fetchWithRetry = async (
  url: string,
  options: RequestInit = {},
  retries = 5,
  retryInterval = 300, // interval in milliseconds
): Promise<Response> => {
  let attempt = 0;

  while (attempt < retries) {
    try {
      const response = await fetch(url, options);

      if (response.ok) {
        return response;
      }

      throw new Error(`HTTP error! status: ${response.status}`);
    } catch (error) {
      attempt++;

      if (attempt >= retries) {
        throw new Error(
          `Fetch failed after ${retries} attempts: ${(error as Error).message}`,
        );
      }

      const waitTime = retryInterval * attempt;
      console.warn(
        `Fetch attempt ${attempt} failed. Retrying in ${waitTime}ms...`,
      );

      await new Promise((resolve) => setTimeout(resolve, waitTime));
    }
  }

  throw new Error('Unexpected error in fetchWithRetry');
};

const fetchLatestBlockHeight = async () => {
  const response = await fetchWithRetry('https://arweave.net/info', {
    method: 'GET',
  });
  const { blocks } = await response.json();
  return blocks as number;
};

type BlockRange = { min: number; max: number };
const getBlockRanges = ({
  minBlock,
  maxBlock,
  rangeSize,
}: {
  minBlock: number;
  maxBlock: number;
  rangeSize: number;
}) => {
  if (minBlock >= maxBlock || rangeSize <= 0) {
    throw new Error(
      'Invalid input: ensure minBlock < maxBlock and rangeSize > 0',
    );
  }

  const ranges: BlockRange[] = [];
  let currentMin = minBlock;

  while (currentMin < maxBlock) {
    const currentMax = Math.min(currentMin + rangeSize - 1, maxBlock);
    ranges.push({ min: currentMin, max: currentMax });
    currentMin = currentMax + 1;
  }

  return ranges;
};
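
One edge case in `getBlockRanges` may be worth checking. With the loop condition `currentMin < maxBlock`, the block at `maxBlock` is never covered when `maxBlock - minBlock` is an exact multiple of `rangeSize` (for example `0`, `200`, `100` yields `0-99` and `100-199` only). A minimal sketch of an inclusive variant, assuming the GraphQL `block.max` filter is inclusive; the name `getInclusiveBlockRanges` is hypothetical and not part of the PR:

```typescript
type InclusiveRange = { min: number; max: number };

// Hypothetical variant: both bounds are inclusive, so the loop keeps going
// while currentMin <= maxBlock and a single-block range is allowed.
const getInclusiveBlockRanges = (
  minBlock: number,
  maxBlock: number,
  rangeSize: number,
): InclusiveRange[] => {
  if (minBlock > maxBlock || rangeSize <= 0) {
    throw new Error(
      'Invalid input: ensure minBlock <= maxBlock and rangeSize > 0',
    );
  }

  const ranges: InclusiveRange[] = [];
  for (
    let currentMin = minBlock;
    currentMin <= maxBlock;
    currentMin += rangeSize
  ) {
    // Clamp the last range so it never runs past maxBlock.
    const currentMax = Math.min(currentMin + rangeSize - 1, maxBlock);
    ranges.push({ min: currentMin, max: currentMax });
  }

  return ranges;
};
```

For `(0, 200, 100)` this sketch returns three ranges, the last being `{ min: 200, max: 200 }`.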

const gqlQuery = ({
  minBlock,
  maxBlock,
  cursor,
}: {
  minBlock: number;
  maxBlock: number;
  cursor?: string;
}) => `
  query {
    transactions(
      block: {
        min: ${minBlock}
        max: ${maxBlock}
      }
      tags: [

Collaborator

Let's make this configurable (maybe via a CLI-specified file or an environment variable) so users can use custom filters. We can leave the ArDrive tags as the default for now for our own convenience.

        {
          name: "App-Name"
          values: [
            "ArDrive-App"
            "ArDrive-Web"
            "ArDrive-CLI"
            "ArDrive-Desktop"
            "ArDrive-Mobile"
            "ArDrive-Core"
            "ArDrive-Sync"
          ]
        }
      ]
      first: 100
      sort: HEIGHT_ASC
      after: "${cursor !== undefined ? cursor : ''}"
    ) {
      pageInfo {
        hasNextPage
      }
      edges {
        cursor
        node {
          id
          bundledIn {
            id
          }
          block {
            height
          }
        }
      }
    }
  }
`;
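
The review comment about configurable tag filters could be approached roughly as follows. This is a sketch under stated assumptions, not the implementation that was merged: the filters arrive as a JSON string (for instance from an environment variable or the contents of a file named on the command line), and the names `TagFilter`, `loadTagFilters`, and `renderTagFilters` are hypothetical.

```typescript
type TagFilter = { name: string; values: string[] };

// Defaults mirror the ArDrive tags hard-coded in the query above.
const DEFAULT_TAG_FILTERS: TagFilter[] = [
  {
    name: 'App-Name',
    values: [
      'ArDrive-App',
      'ArDrive-Web',
      'ArDrive-CLI',
      'ArDrive-Desktop',
      'ArDrive-Mobile',
      'ArDrive-Core',
      'ArDrive-Sync',
    ],
  },
];

// Hypothetical loader: `raw` is a JSON array such as
// '[{"name":"App-Name","values":["My-App"]}]'; undefined or blank input
// falls back to the defaults.
const loadTagFilters = (raw: string | undefined): TagFilter[] => {
  if (raw === undefined || raw.trim() === '') {
    return DEFAULT_TAG_FILTERS;
  }

  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) {
    throw new Error('Tag filters must be a JSON array');
  }

  return parsed.map((entry: unknown) => {
    const { name, values } = (entry ?? {}) as Partial<TagFilter>;
    if (
      typeof name !== 'string' ||
      !Array.isArray(values) ||
      !values.every((value) => typeof value === 'string')
    ) {
      throw new Error('Each tag filter needs a string name and string values');
    }
    return { name, values };
  });
};

// Renders the filters as the body of the `tags:` argument. JSON.stringify
// supplies the quoting and escaping for the string literals.
const renderTagFilters = (filters: TagFilter[]): string =>
  `[${filters
    .map(
      ({ name, values }) =>
        `{ name: ${JSON.stringify(name)}, values: ${JSON.stringify(values)} }`,
    )
    .join(', ')}]`;
```

In `gqlQuery`, the hard-coded list could then be replaced by `tags: ${renderTagFilters(filters)}`, with `filters` loaded once at startup.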

const fetchGql = async ({
  minBlock,
  maxBlock,
  cursor,
}: {
  minBlock: number;
  maxBlock: number;
  cursor?: string;
}) => {
  const response = await fetchWithRetry(GQL_ENDPOINT, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ query: gqlQuery({ minBlock, maxBlock, cursor }) }),
  });
  const result = await response.json();
  if (result.errors) {
    throw new Error(`GraphQL error: ${JSON.stringify(result.errors)}`);
  }
  const { data } = result;
  return data;
};

type BlockTransactions = Map<number, Set<string>>;
const getTransactionsForRange = async ({ min, max }: BlockRange) => {
  let cursor: string | undefined;
  let hasNextPage = true;
  let page = 0;
  const transactions: BlockTransactions = new Map();
  const bundles: BlockTransactions = new Map();

  while (hasNextPage) {
    console.log(
      `Fetching transactions and bundles from block ${min} to ${max}. Page ${page}`,
    );
    const {
      transactions: { edges, pageInfo },
    } = await fetchGql({
      minBlock: min,
      maxBlock: max,
      cursor,
    });

    hasNextPage = pageInfo.hasNextPage;
    cursor = hasNextPage ? edges[edges.length - 1].cursor : undefined;

    for (const edge of edges) {
      const blockHeight = edge.node.block.height;
      const bundleId = edge.node.bundledIn?.id;
      const id = edge.node.id;

      if (!transactions.has(blockHeight)) {
        transactions.set(blockHeight, new Set());
      }
      if (!bundles.has(blockHeight)) {
        bundles.set(blockHeight, new Set());
      }

      if (bundleId !== undefined) {
        bundles.get(blockHeight)?.add(bundleId);

Collaborator

We should do another GQL call here (one for each unique bundle ID) to retrieve the bundle itself and check whether it has a parent (to get the root ID in case of BDIs). Let's make this behavior configurable via CLI arg too, but default it to enabled.

Collaborator Author

Do you want to be able to get BDIs and L1 bundles and make it configurable like "get only L1 bundles"/"get L1 bundles and BDIs"?

Collaborator

I was thinking it might be nice to skip the extra lookups and only get the BDIs for cases where someone wants the script to run faster.

Collaborator Author

So it would be a setting for bundles where you can get the root tx id OR the bdi id (current behavior in this PR), right?

      } else {
        transactions.get(blockHeight)?.add(id);
      }
    }

    page++;
  }

  return { transactions, bundles };
};
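
The root-bundle lookup discussed in the review thread can be sketched as a walk up the `bundledIn` chain until a bundle with no parent is reached. This is only an illustration of the idea, not the code that was merged. The names `ParentLookup` and `resolveRootBundleId`, the cache, and the depth limit are all assumptions; in the script, `lookupParent` would wrap one GQL query per bundle ID.

```typescript
// Resolves to the ID of the bundle that directly contains `id`, or undefined
// when `id` is not nested in another bundle (an L1 transaction).
type ParentLookup = (id: string) => Promise<string | undefined>;

// Hypothetical helper: follows parents from `bundleId` to the root and caches
// the result for every ID visited, so repeated bundle IDs cost no extra calls.
const resolveRootBundleId = async (
  bundleId: string,
  lookupParent: ParentLookup,
  cache: Map<string, string> = new Map(),
  maxDepth = 10, // assumed safety limit on nesting depth
): Promise<string> => {
  const visited: string[] = [];
  const remember = (root: string): string => {
    for (const id of visited) {
      cache.set(id, root);
    }
    return root;
  };

  let current = bundleId;
  for (let depth = 0; depth <= maxDepth; depth++) {
    const cached = cache.get(current);
    if (cached !== undefined) {
      return remember(cached);
    }

    visited.push(current);
    const parent = await lookupParent(current);
    if (parent === undefined) {
      return remember(current); // no parent: `current` is the root
    }
    current = parent;
  }

  throw new Error(`Bundle nesting deeper than ${maxDepth} for ${bundleId}`);
};
```

A CLI switch like the one proposed in the thread could then choose between storing `bundleId` directly (the current behavior, fewer requests) and storing `await resolveRootBundleId(bundleId, lookupParent, cache)`.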

const writeTransactionsToFile = async ({
  outputDir,
  transactions,
}: {
  outputDir: string;
  transactions: BlockTransactions;
}) => {
  try {
    await fs.mkdir(outputDir, { recursive: true });
  } catch (error) {
    console.error(`Failed to create directory: ${error}`);
    throw error;
  }

  for (const [height, ids] of transactions.entries()) {
    if (ids.size === 0) continue;

    const content = JSON.stringify([...ids], null, 2);
    const filePath = path.join(outputDir, `${height}.json`);

    try {
      await fs.writeFile(filePath, content);
    } catch (error) {
      console.error(`Failed to write ${filePath}: ${error}`);
      throw error;
    }
  }
};

(async () => {
  if (MAX_BLOCK_HEIGHT === undefined) {
    MAX_BLOCK_HEIGHT = await fetchLatestBlockHeight();
  }

  const blockRanges = getBlockRanges({
    minBlock: MIN_BLOCK_HEIGHT,
    maxBlock: MAX_BLOCK_HEIGHT,
    rangeSize: BLOCK_RANGE_SIZE,
  });

  console.log(
    `Starting to fetch transactions and bundles from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT}`,
  );

  for (const range of blockRanges) {
    const { transactions, bundles } = await getTransactionsForRange(range);

    await writeTransactionsToFile({
      outputDir: path.join(__dirname, 'transactions'),
      transactions,
    });
    await writeTransactionsToFile({
      outputDir: path.join(__dirname, 'bundles'),
      transactions: bundles,
    });

    console.log(
      `Transactions and bundles from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT} saved!`,
    );
  }
})();
2 changes: 1 addition & 1 deletion tsconfig.json
@@ -25,5 +25,5 @@
     "swc": true,
     "esm": true
   },
-  "include": ["src", "test"]
+  "include": ["src", "test", "scripts"]
 }