
feat: add import data from gql script #247

Open
wants to merge 16 commits into develop from PE-7172-import-data-from-gql

Conversation

karlprieb
Collaborator

This Node.js script fetches ArDrive L1 transactions and bundles from a gateway through its GQL interface.

The script accepts the following optional parameters:

  • --gqlEndpoint # defaults to Goldsky
  • --minHeight # defaults to 0
  • --maxHeight # defaults to the latest Arweave block
  • --blockRangeSize # defaults to 100

Example:
node --import=./register.js scripts/import-data/fetch-data-gql.ts --minHeight 1000000 --maxHeight 1000500
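
For context, the paged GQL query the script runs looks roughly like the sketch below. Field names follow the public Arweave GraphQL schema; the tag filter and page size shown here are illustrative, not copied from the script.

const buildQuery = (minBlock: number, maxBlock: number, cursor?: string) => `
  query {
    transactions(
      block: { min: ${minBlock}, max: ${maxBlock} }
      tags: [{ name: "App-Name", values: ["ArDrive-App", "ArDrive-Web"] }]
      sort: HEIGHT_ASC
      first: 100
      ${cursor !== undefined ? `after: "${cursor}"` : ''}
    ) {
      pageInfo { hasNextPage }
      edges {
        cursor
        node {
          id
          block { height }
          bundledIn { id }
        }
      }
    }
  }
`;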

@karlprieb karlprieb requested a review from djwhitt December 10, 2024 17:35

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.87%. Comparing base (c977f8d) to head (5e8fb38).
Report is 3 commits behind head on develop.

Files with missing lines: src/workers/bundle-data-importer.ts (patch 66.66%, 2 lines missing ⚠️)
Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #247   +/-   ##
========================================
  Coverage    70.86%   70.87%           
========================================
  Files           35       35           
  Lines         8958     8967    +9     
  Branches       523      523           
========================================
+ Hits          6348     6355    +7     
- Misses        2608     2610    +2     
  Partials         2        2           


Contributor

coderabbitai bot commented Dec 10, 2024

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 eslint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

error @permaweb/[email protected]: The engine "yarn" is incompatible with this module. Expected version "please-use-npm". Got "1.22.22"
error Found incompatible module.

📝 Walkthrough

This pull request touches several files: .gitignore, fetch-data-gql.ts, tsconfig.json, docker-compose.yaml, import-data.ts, utils.ts, config.ts, ar-io.ts, system.ts, and bundle-data-importer.ts. The .gitignore now excludes the directories the import scripts write to, fetch-data-gql.ts implements fetching transaction data from a GraphQL endpoint, tsconfig.json adds the scripts directory to the TypeScript compilation, and the remaining changes make the bundle data importer's queue size configurable.

Changes

  • .gitignore: added entries to ignore /scripts/import-data/bundles, /scripts/import-data/transactions, and /scripts/import-data/parquet.
  • scripts/import-data/fetch-data-gql.ts: new script that fetches transaction data from a GraphQL endpoint, with several utility functions.
  • tsconfig.json: updated the include array to add the scripts directory to TypeScript compilation.
  • docker-compose.yaml: added the BUNDLE_DATA_IMPORTER_QUEUE_SIZE environment variable to the core service.
  • scripts/import-data/import-data.ts: new script for importing transaction and bundle data into the AR.IO Gateway.
  • scripts/import-data/utils.ts: new file with fetchWithRetry and fetchLatestBlockHeight functions.
  • src/config.ts: added the BUNDLE_DATA_IMPORTER_QUEUE_SIZE constant, initialized from the environment variable.
  • src/routes/ar-io.ts: the /ar-io/admin/queue-bundle endpoint now checks queue fullness and responds with 429 when full.
  • src/system.ts: exported bundleDataImporter and updated its constructor call to use maxQueueSize from config.
  • src/workers/bundle-data-importer.ts: constructor now takes maxQueueSize from config; added an isQueueFull method.
  • scripts/import-data/count-fetched-ids.ts: new script that counts IDs from transaction and bundle files in the specified directories.
  • scripts/import-data/export-parquet.ts: new script for exporting data in Parquet format from the AR.IO Gateway.
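
Of these, count-fetched-ids.ts can stay very small, since fetch-data-gql.ts writes one JSON array of IDs per block height. A hedged sketch of the counting logic (the actual script may differ in structure and options):

import fs from 'node:fs/promises';
import path from 'node:path';

// Hedged sketch: sum the number of IDs across the per-block JSON files
// written by fetch-data-gql.ts; the real count-fetched-ids.ts may differ.
const countIds = async (dir: string): Promise<number> => {
  let counter = 0;
  for (const file of await fs.readdir(dir)) {
    if (path.extname(file) !== '.json') continue;
    const ids: string[] = JSON.parse(
      await fs.readFile(path.join(dir, file), 'utf8'),
    );
    counter += ids.length;
  }
  return counter;
};

console.log('transactions:', await countIds('scripts/import-data/transactions'));
console.log('bundles:', await countIds('scripts/import-data/bundles'));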

Suggested reviewers

  • djwhitt

📜 Recent review details


📥 Commits

Reviewing files that changed from the base of the PR and between 035c895 and 5e8fb38.

📒 Files selected for processing (3)
  • scripts/import-data/export-parquet.ts (1 hunks)
  • scripts/import-data/fetch-data-gql.ts (1 hunks)
  • scripts/import-data/utils.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • scripts/import-data/utils.ts
  • scripts/import-data/export-parquet.ts
  • scripts/import-data/fetch-data-gql.ts


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (2)
scripts/import-data/fetch-data-gql.ts (2)

30-46: Consider using a command-line argument parsing library

Utilizing a library such as commander, yargs, or minimist can simplify argument handling, provide built-in validations, and improve code readability.
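
For example, Node's built-in util.parseArgs (available since Node 18.3) would remove the hand-rolled switch without adding a dependency. A minimal sketch, assuming the same flags and defaults as the script:

import { parseArgs } from 'node:util';

// Minimal sketch using node:util's parseArgs in place of the manual
// args.forEach/switch; flag names and defaults mirror the script's CLI.
const { values } = parseArgs({
  options: {
    gqlEndpoint: { type: 'string' },
    minHeight: { type: 'string' },
    maxHeight: { type: 'string' },
    blockRangeSize: { type: 'string' },
  },
});

const GQL_ENDPOINT =
  values.gqlEndpoint ?? 'https://arweave-search.goldsky.com/graphql';
const MIN_BLOCK_HEIGHT = parseInt(values.minHeight ?? '0', 10);
const MAX_BLOCK_HEIGHT =
  values.maxHeight !== undefined ? parseInt(values.maxHeight, 10) : undefined;
const BLOCK_RANGE_SIZE = parseInt(values.blockRangeSize ?? '100', 10);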


243-270: Improve error handling in writeTransactionsToFile

Enhance robustness by handling specific file system errors and providing more informative messages to the user.

Consider wrapping file operations in try-catch blocks and distinguishing between different types of errors (e.g., permission issues, disk space).
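
One way to do that is to branch on the Node error code before rethrowing. A hedged sketch (EACCES and ENOSPC are standard Node fs error codes; the wrapper name is hypothetical):

import fs from 'node:fs/promises';

// Hedged sketch: translate common fs error codes into actionable
// messages before rethrowing, instead of printing the raw error.
const writeFileVerbose = async (filePath: string, content: string) => {
  try {
    await fs.writeFile(filePath, content);
  } catch (error) {
    const code = (error as NodeJS.ErrnoException).code;
    if (code === 'EACCES') {
      console.error(`Permission denied writing ${filePath}`);
    } else if (code === 'ENOSPC') {
      console.error(`No disk space left while writing ${filePath}`);
    } else {
      console.error(`Failed to write ${filePath}: ${error}`);
    }
    throw error;
  }
};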

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between c977f8d and f384110.

📒 Files selected for processing (3)
  • .gitignore (1 hunks)
  • scripts/import-data/fetch-data-gql.ts (1 hunks)
  • tsconfig.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🔇 Additional comments (2)
scripts/import-data/fetch-data-gql.ts (1)

273-275: Verify that MAX_BLOCK_HEIGHT is defined after fetching

Ensure that MAX_BLOCK_HEIGHT is correctly assigned after calling fetchLatestBlockHeight() to avoid potential issues in subsequent code.

Run the following script to confirm MAX_BLOCK_HEIGHT is properly assigned:

✅ Verification successful

MAX_BLOCK_HEIGHT assignment is properly implemented

The verification shows that:

  • MAX_BLOCK_HEIGHT is correctly declared as a number
  • It's properly assigned either from CLI arguments or by fetching from the Arweave network
  • fetchLatestBlockHeight correctly retrieves and returns the block height
  • The value is used consistently throughout the code for block range calculations and logging
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that MAX_BLOCK_HEIGHT is assigned after fetching the latest block height.

node -e "
(async () => {
  let MAX_BLOCK_HEIGHT;
  const fetchLatestBlockHeight = async () => {
    const response = await fetch('https://arweave.net/info');
    const data = await response.json();
    return data.height;
  };
  MAX_BLOCK_HEIGHT = await fetchLatestBlockHeight();
  console.log('MAX_BLOCK_HEIGHT:', MAX_BLOCK_HEIGHT);
})();
"

Length of output: 381


Script:

#!/bin/bash
# Let's examine how MAX_BLOCK_HEIGHT is used in the codebase
rg "MAX_BLOCK_HEIGHT" -A 3

# Also check the implementation of fetchLatestBlockHeight
ast-grep --pattern 'async function fetchLatestBlockHeight() { $$$ }'

Length of output: 1788


Script:

#!/bin/bash
# Let's check the implementation of fetchLatestBlockHeight with a different pattern
ast-grep --pattern 'const fetchLatestBlockHeight = async () => { $$$ }'

# Also check how getBlockRanges uses maxBlock parameter
ast-grep --pattern 'function getBlockRanges($$$) { $$$ }'

Length of output: 634

tsconfig.json (1)

28-28: Including the scripts directory in TypeScript compilation

Adding "scripts" to the include array ensures that TypeScript compiles the new script files, which is appropriate.

karlprieb and others added 2 commits December 10, 2024 16:53
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
scripts/import-data/fetch-data-gql.ts (1)

323-325: Update log message to reflect the current block range

The log message indicates the full block range from MIN_BLOCK_HEIGHT to MAX_BLOCK_HEIGHT, which may be misleading during iteration. Modify it to display the current range being processed for clearer progress tracking.

Apply this diff to fix the log message:

   console.log(
-    `Transactions and bundles from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT} saved!`,
+    `Transactions and bundles from block ${range.min} to ${range.max} saved!`,
   );
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between f384110 and d6cfb5f.

📒 Files selected for processing (1)
  • scripts/import-data/fetch-data-gql.ts (1 hunks)
🔇 Additional comments (4)
scripts/import-data/fetch-data-gql.ts (4)

111-112: ⚠️ Potential issue

Correct the property name when fetching the latest block height

The fetchLatestBlockHeight function attempts to extract blocks from the response, but the correct property for the latest block height is height. This change ensures that the latest block height is accurately retrieved.

Apply this diff to fix the property name:

- const { blocks } = await response.json();
- return blocks as number;
+ const { height } = await response.json();
+ return height as number;

Or adjust the code accordingly:

   const data = await response.json();
-  const { blocks } = data;
+  const { height } = data;
   return height as number;

125-129: 🛠️ Refactor suggestion

Allow processing when minBlock equals maxBlock

The current condition prevents processing a single block range where minBlock equals maxBlock. Adjust the condition to include this scenario, allowing the script to handle ranges where both values are the same.

Apply this diff to modify the condition:

- if (minBlock >= maxBlock || rangeSize <= 0) {
+ if (minBlock > maxBlock || rangeSize <= 0) {
     throw new Error(
-      'Invalid input: ensure minBlock < maxBlock and rangeSize > 0',
+      'Invalid input: ensure minBlock <= maxBlock and rangeSize > 0',
     );
   }
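
For reference, a getBlockRanges that satisfies the relaxed condition could look like the sketch below (a hedged sketch; the implementation in this PR may differ in details):

interface BlockRange {
  min: number;
  max: number;
}

// Hedged sketch: split [minBlock, maxBlock] into inclusive chunks of at
// most rangeSize blocks; a single-block range (min === max) is valid.
const getBlockRanges = ({
  minBlock,
  maxBlock,
  rangeSize,
}: {
  minBlock: number;
  maxBlock: number;
  rangeSize: number;
}): BlockRange[] => {
  if (minBlock > maxBlock || rangeSize <= 0) {
    throw new Error(
      'Invalid input: ensure minBlock <= maxBlock and rangeSize > 0',
    );
  }
  const ranges: BlockRange[] = [];
  for (let start = minBlock; start <= maxBlock; start += rangeSize) {
    ranges.push({ min: start, max: Math.min(start + rangeSize - 1, maxBlock) });
  }
  return ranges;
};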

174-174: ⚠️ Potential issue

Conditionally include the after parameter in the GraphQL query

Passing an empty string for the after parameter may cause issues with the GraphQL endpoint. Omit the after parameter when cursor is undefined to prevent potential errors.

Apply this diff to adjust the query:

       sort: HEIGHT_ASC
-      after: "${cursor !== undefined ? cursor : ''}"
+      ${cursor !== undefined ? `after: "${cursor}"` : ''}
     ) {

239-241: ⚠️ Potential issue

Handle empty edges array to prevent errors

When processing the edges array, ensure it is not empty before accessing its elements to avoid runtime exceptions when the array is empty.

Apply this diff to add a condition:

     hasNextPage = pageInfo.hasNextPage;
-    cursor = hasNextPage ? edges[edges.length - 1].cursor : undefined;
+    if (hasNextPage && edges.length > 0) {
+      cursor = edges[edges.length - 1].cursor;
+    } else {
+      cursor = undefined;
+    }

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
scripts/import-data/fetch-data-gql.ts (3)

158-171: Consider moving App-Name values to configuration

The hardcoded App-Name values should be moved to a configuration file for better maintainability and reusability.

Create a config file and import the values:

// config.ts
export const ARDRIVE_APP_NAMES = [
  "ArDrive-App",
  "ArDrive-Web",
  "ArDrive-CLI",
  "ArDrive-Desktop",
  "ArDrive-Mobile",
  "ArDrive-Core",
  "ArDrive-Sync",
];

Then interpolate the values into the query template (the query is built as a template string, so the array must be serialized):

    tags: [
      {
        name: "App-Name"
-        values: [
-          "ArDrive-App"
-          "ArDrive-Web"
-          "ArDrive-CLI"
-          "ArDrive-Desktop"
-          "ArDrive-Mobile"
-          "ArDrive-Core"
-          "ArDrive-Sync"
-        ]
+        values: ${JSON.stringify(ARDRIVE_APP_NAMES)}
      }
    ]

227-262: Improve pagination handling with rate limiting

The while loop for pagination could benefit from rate limiting to prevent overwhelming the API.

Add rate limiting:

+ const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));
+ const RATE_LIMIT_DELAY = 1000; // 1 second delay between requests

  while (hasNextPage) {
    console.log(
      `Fetching transactions and bundles from block ${min} to ${max}. Page ${page}`,
    );
    const {
      transactions: { edges, pageInfo },
    } = await fetchGql({
      minBlock: min,
      maxBlock: max,
      cursor,
    });

+   // Add delay between requests
+   await sleep(RATE_LIMIT_DELAY);

267-294: Consider batching file operations for large datasets

For better performance with large datasets, consider batching file operations and adding progress tracking.

+ const BATCH_SIZE = 100;
+ let processedCount = 0;
+ const totalCount = transactions.size;

- for (const [height, ids] of transactions.entries()) {
+ const entries = Array.from(transactions.entries());
+ for (let i = 0; i < entries.length; i += BATCH_SIZE) {
+   const batch = entries.slice(i, i + BATCH_SIZE);
+   await Promise.all(
+     batch.map(async ([height, ids]) => {
      if (ids.size === 0) return;

      const content = JSON.stringify([...ids], null, 2);
      const filePath = path.join(outputDir, `${height}.json`);

      try {
        await fs.writeFile(filePath, content);
+       processedCount += 1;
+       if (processedCount % 10 === 0) {
+         console.log(`Progress: ${processedCount}/${totalCount} files written`);
+       }
      } catch (error) {
        console.error(`Failed to write ${filePath}: ${error}`);
        throw error;
      }
+     })
+   );
  }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between d6cfb5f and 4bbf4b6.

📒 Files selected for processing (1)
  • scripts/import-data/fetch-data-gql.ts (1 hunks)
🔇 Additional comments (3)
scripts/import-data/fetch-data-gql.ts (3)

1-24: LGTM! Well-structured imports and license header

The code follows best practices by:

  • Including the AGPL v3 license header
  • Using Node.js built-in modules
  • Implementing proper ESM path resolution

111-112: ⚠️ Potential issue

Fix incorrect property access in fetchLatestBlockHeight

The response from arweave.net/info contains a height property, not blocks.


239-241: ⚠️ Potential issue

Handle empty edges array to prevent errors

When processing the edges array, ensure it's not empty before accessing its elements.

Comment on lines 25 to 67
let GQL_ENDPOINT = 'https://arweave-search.goldsky.com/graphql';
let MIN_BLOCK_HEIGHT = 0;
let MAX_BLOCK_HEIGHT: number | undefined;
let BLOCK_RANGE_SIZE = 100;

args.forEach((arg, index) => {
  switch (arg) {
    case '--gqlEndpoint':
      if (args[index + 1]) {
        GQL_ENDPOINT = args[index + 1];
      } else {
        console.error('Missing value for --gqlEndpoint');
        process.exit(1);
      }
      break;
    case '--minHeight':
      if (args[index + 1]) {
        MIN_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
      } else {
        console.error('Missing value for --minHeight');
        process.exit(1);
      }
      break;
    case '--maxHeight':
      if (args[index + 1]) {
        MAX_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
      } else {
        console.error('Missing value for --maxHeight');
        process.exit(1);
      }
      break;
    case '--blockRangeSize':
      if (args[index + 1]) {
        BLOCK_RANGE_SIZE = parseInt(args[index + 1], 10);
      } else {
        console.error('Missing value for --blockRangeSize');
        process.exit(1);
      }
      break;
    default:
      break;
  }
});
Contributor

🛠️ Refactor suggestion

Add input validation for parsed arguments

While the code handles missing values well, it should validate the parsed values to ensure they are valid numbers and within acceptable ranges.

Apply this diff to add validation:

     case '--minHeight':
       if (args[index + 1]) {
-        MIN_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
+        const value = parseInt(args[index + 1], 10);
+        if (isNaN(value) || value < 0) {
+          console.error('Invalid value for --minHeight. Must be a non-negative integer.');
+          process.exit(1);
+        }
+        MIN_BLOCK_HEIGHT = value;
       } else {
         console.error('Missing value for --minHeight');
         process.exit(1);
       }
       break;
     case '--maxHeight':
       if (args[index + 1]) {
-        MAX_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
+        const value = parseInt(args[index + 1], 10);
+        if (isNaN(value) || value < 0) {
+          console.error('Invalid value for --maxHeight. Must be a non-negative integer.');
+          process.exit(1);
+        }
+        MAX_BLOCK_HEIGHT = value;
       } else {
         console.error('Missing value for --maxHeight');
         process.exit(1);
       }
       break;
     case '--blockRangeSize':
       if (args[index + 1]) {
-        BLOCK_RANGE_SIZE = parseInt(args[index + 1], 10);
+        const value = parseInt(args[index + 1], 10);
+        if (isNaN(value) || value <= 0) {
+          console.error('Invalid value for --blockRangeSize. Must be a positive integer.');
+          process.exit(1);
+        }
+        BLOCK_RANGE_SIZE = value;
       } else {
         console.error('Missing value for --blockRangeSize');
         process.exit(1);
       }

Comment on lines 296 to 329
(async () => {
  if (MAX_BLOCK_HEIGHT === undefined) {
    MAX_BLOCK_HEIGHT = await fetchLatestBlockHeight();
  }

  const blockRanges = getBlockRanges({
    minBlock: MIN_BLOCK_HEIGHT,
    maxBlock: MAX_BLOCK_HEIGHT,
    rangeSize: BLOCK_RANGE_SIZE,
  });

  console.log(
    `Starting to fetch transactions and bundles from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT}`,
  );

  for (const range of blockRanges) {
    const { transactions, bundles } = await getTransactionsForRange(range);

    await writeTransactionsToFile({
      outputDir: path.join(__dirname, 'transactions'),
      transactions,
    });
    await writeTransactionsToFile({
      outputDir: path.join(__dirname, 'bundles'),
      transactions: bundles,
    });

    if (transactions.size !== 0 || bundles.size !== 0) {
      console.log(
        `Transactions and bundles from block ${range.min} to ${range.max} saved!`,
      );
    }
  }
})();
Contributor

🛠️ Refactor suggestion

Add interrupt handling and detailed progress tracking

The main execution could benefit from proper cleanup on interruption and more detailed progress tracking.

+ let interrupted = false;
+ 
+ process.on('SIGINT', () => {
+   console.log('\nGracefully shutting down...');
+   interrupted = true;
+ });
+ 
+ const totalRanges = blockRanges.length;
+ let processedRanges = 0;
+ 
  for (const range of blockRanges) {
+   if (interrupted) {
+     console.log('Operation interrupted by user');
+     process.exit(0);
+   }
+ 
    const { transactions, bundles } = await getTransactionsForRange(range);

    await writeTransactionsToFile({
      outputDir: path.join(__dirname, 'transactions'),
      transactions,
    });
    await writeTransactionsToFile({
      outputDir: path.join(__dirname, 'bundles'),
      transactions: bundles,
    });

+   processedRanges += 1;
    if (transactions.size !== 0 || bundles.size !== 0) {
      console.log(
-       `Transactions and bundles from block ${range.min} to ${range.max} saved!`,
+       `Progress: ${processedRanges}/${totalRanges} ranges processed. ` +
+       `Transactions and bundles from block ${range.min} to ${range.max} saved!`,
      );
    }
  }

Collaborator

@djwhitt djwhitt left a comment

We need an extra GQL call to get the root parent for BDIs. It also would be worthwhile to make the GQL query we're using to find data items configurable.

I commented on a couple of places where we could pull in libraries too, but perhaps the intent is to make this script as standalone as possible. I think that's worth doing for now, but if it becomes too cumbersome, pulling in some libraries is fine too.
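
A hedged sketch of the extra lookup being requested, following bundledIn links until reaching the L1 root (the query shape follows the public Arweave GraphQL schema, and the helper name matches the getRootTxId referenced in a later review diff; the body itself is an assumption, not the PR's code):

const GQL_ENDPOINT = 'https://arweave-search.goldsky.com/graphql';

// Hedged sketch: resolve a (possibly nested) bundle id to its L1 root
// transaction by walking bundledIn links via GQL.
const getRootTxId = async (id: string): Promise<string> => {
  let currentId = id;
  for (;;) {
    const response = await fetch(GQL_ENDPOINT, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        query: `query { transaction(id: "${currentId}") { bundledIn { id } } }`,
      }),
    });
    const { data } = await response.json();
    const parentId = data?.transaction?.bundledIn?.id;
    if (parentId == null) {
      return currentId; // no parent: this is the L1 root
    }
    currentId = parentId; // a BDI: keep walking toward the root
  }
};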

min: ${minBlock}
max: ${maxBlock}
}
tags: [
Collaborator

Let's make this configurable (maybe via a CLI-specified file or environment variable) so users can use custom filters. We can leave the ArDrive tags as the default for now for our own convenience.

}

if (bundleId !== undefined) {
bundles.get(blockHeight)?.add(bundleId);
Collaborator

We should do another GQL call here (one for each unique bundle ID) to retrieve the bundle itself and check whether it has a parent (to get the root ID in case of BDIs). Let's make this behavior configurable via CLI arg too, but default it to enabled.

Collaborator Author

Do you want to be able to get BDIs and L1 bundles and make it configurable like "get only L1 bundles"/"get L1 bundles and BDIs"?

Collaborator

I was thinking it might be nice to skip the extra lookups and only get the BDIs for cases where someone wants the script to run faster.

Collaborator Author

So it would be a setting for bundles where you can get the root tx id OR the bdi id (current behavior in this PR), right?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (7)
scripts/import-data/import-data.ts (2)

36-103: Refactor command-line argument parsing for scalability and maintainability

The repetitive pattern in the command-line argument parsing can make the code harder to maintain as new options are added. Consider using a dedicated command-line argument parsing library like commander, yargs, or minimist to simplify this section and improve scalability.


31-31: Verify existence of directories and handle missing paths

The script assumes that TRANSACTIONS_DIR and BUNDLES_DIR exist. If these directories do not exist, the script may fail. It's advisable to check for the existence of these directories and create them if necessary or inform the user about the missing directories.

Would you like assistance in adding code to verify and create directories if they are missing?

Also applies to: 70-85

scripts/import-data/utils.ts (1)

19-62: Include response details in error messages

When a non-OK HTTP response is received, the error thrown only includes the status code. Including the response body or status text can provide more context and aid in debugging.

Apply this diff to enhance error messages:

...
if (!response.ok) {
-   throw new Error(`HTTP error! status: ${response.status}`);
+   const errorText = await response.text();
+   throw new Error(`HTTP error! status: ${response.status}, body: ${errorText}`);
}
...
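
For context, a fetchWithRetry along the lines used by these scripts might look like the following hedged sketch (the exponential backoff policy is an assumption; only the throw-on-failure behavior is confirmed later in this thread):

// Hedged sketch: retry transient failures with exponential backoff and
// throw once retries are exhausted or the response is not OK.
const fetchWithRetry = async (
  url: string,
  init?: RequestInit,
  retries = 3,
): Promise<Response> => {
  for (let attempt = 0; ; attempt++) {
    try {
      const response = await fetch(url, init);
      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(
          `HTTP error! status: ${response.status}, body: ${errorText}`,
        );
      }
      return response;
    } catch (error) {
      if (attempt >= retries) throw error;
      const delayMs = 1000 * 2 ** attempt;
      console.warn(`Request to ${url} failed, retrying in ${delayMs}ms...`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
};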
src/workers/bundle-data-importer.ts (1)

141-143: Add documentation for the isQueueFull method

The new isQueueFull method is a valuable addition. Adding JSDoc comments would improve code readability and maintainability by explaining the purpose and usage of this method.

Apply this diff to add documentation:

+ /**
+  * Checks if the bundle data importer queue has reached its maximum capacity.
+  * @returns {Promise<boolean>} True if the queue is full, false otherwise.
+  */
  async isQueueFull(): Promise<boolean> {
    return this.queue.length() >= this.maxQueueSize;
  }
docker-compose.yaml (1)

111-111: Set a default value for BUNDLE_DATA_IMPORTER_QUEUE_SIZE

The environment variable BUNDLE_DATA_IMPORTER_QUEUE_SIZE currently defaults to an empty string if not set, which may lead to unexpected behavior. Consider setting a sensible default value to ensure the application functions correctly even if the environment variable is not provided.

Apply this diff to provide a default value (e.g., 1000):

-       - BUNDLE_DATA_IMPORTER_QUEUE_SIZE=${BUNDLE_DATA_IMPORTER_QUEUE_SIZE:-}
+       - BUNDLE_DATA_IMPORTER_QUEUE_SIZE=${BUNDLE_DATA_IMPORTER_QUEUE_SIZE:-1000}
scripts/import-data/fetch-data-gql.ts (2)

241-285: Improve error handling and progress tracking in getTransactionsForRange

The function could benefit from:

  1. Better error handling for GraphQL responses
  2. Progress tracking for long-running operations
  3. Graceful interruption handling

Apply this diff to enhance the function:

 const getTransactionsForRange = async ({ min, max }: BlockRange) => {
   let cursor: string | undefined;
   let hasNextPage = true;
   const transactions: BlockTransactions = new Map();
   const bundles: BlockTransactions = new Map();
+  let totalProcessed = 0;
+  const startTime = Date.now();
 
   while (hasNextPage) {
+    try {
       const {
         transactions: { edges, pageInfo },
       } = await fetchGql({
         minBlock: min,
         maxBlock: max,
         cursor,
       });
 
       hasNextPage = pageInfo.hasNextPage;
-      cursor = hasNextPage ? edges[edges.length - 1].cursor : undefined;
+      if (hasNextPage && edges.length > 0) {
+        cursor = edges[edges.length - 1].cursor;
+      } else {
+        cursor = undefined;
+      }
 
       for (const edge of edges) {
         const blockHeight = edge.node.block.height;
         const bundleId = edge.node.bundledIn?.id;
         const id = edge.node.id;
 
         if (!transactions.has(blockHeight)) {
           transactions.set(blockHeight, new Set());
         }
         if (!bundles.has(blockHeight)) {
           bundles.set(blockHeight, new Set());
         }
 
         if (bundleId !== undefined) {
           if (BUNDLES_FETCH_ROOT_TX) {
             const rootTxId = await getRootTxId(bundleId);
             bundles.get(blockHeight)?.add(rootTxId);
           } else {
             bundles.get(blockHeight)?.add(bundleId);
           }
         } else {
           transactions.get(blockHeight)?.add(id);
         }
+        totalProcessed++;
       }
+
+      // Log progress every 1000 items
+      if (totalProcessed % 1000 === 0) {
+        const elapsedSeconds = (Date.now() - startTime) / 1000;
+        console.log(
+          `Progress: Processed ${totalProcessed} items in ${elapsedSeconds.toFixed(2)}s`,
+        );
+      }
+    } catch (error) {
+      console.error(`Error processing range ${min}-${max}: ${error}`);
+      throw error;
+    }
   }
 
   return { transactions, bundles };
 };

287-314: Enhance error handling in writeTransactionsToFile

The function should handle more specific error cases and provide better error messages.

Apply this diff to improve error handling:

 const writeTransactionsToFile = async ({
   outputDir,
   transactions,
 }: {
   outputDir: string;
   transactions: BlockTransactions;
 }) => {
   try {
     await fs.mkdir(outputDir, { recursive: true });
   } catch (error) {
-    console.error(`Failed to create directory: ${error}`);
+    const errorMessage = error instanceof Error ? error.message : String(error);
+    console.error(`Failed to create directory ${outputDir}: ${errorMessage}`);
     throw error;
   }
 
   for (const [height, ids] of transactions.entries()) {
     if (ids.size === 0) continue;
 
-    const content = JSON.stringify([...ids]);
     const filePath = path.join(outputDir, `${height}.json`);
 
     try {
+      const content = JSON.stringify([...ids], null, 2); // Pretty print for better readability
       await fs.writeFile(filePath, content);
     } catch (error) {
-      console.error(`Failed to write ${filePath}: ${error}`);
+      const errorMessage = error instanceof Error ? error.message : String(error);
+      console.error(
+        `Failed to write to ${filePath} for block height ${height}: ${errorMessage}`,
+      );
       throw error;
     }
   }
 };
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 4bbf4b6 and 4b57bb4.

📒 Files selected for processing (8)
  • docker-compose.yaml (1 hunks)
  • scripts/import-data/fetch-data-gql.ts (1 hunks)
  • scripts/import-data/import-data.ts (1 hunks)
  • scripts/import-data/utils.ts (1 hunks)
  • src/config.ts (1 hunks)
  • src/routes/ar-io.ts (1 hunks)
  • src/system.ts (1 hunks)
  • src/workers/bundle-data-importer.ts (3 hunks)
🔇 Additional comments (3)
src/routes/ar-io.ts (1)

213-216: LGTM! Good addition of queue overflow protection

The check for queue capacity before accepting new bundles is a good practice to prevent system overload.
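
The guard presumably looks something like this hedged sketch (the route path and the system.bundleDataImporter export come from this PR's summary; the handler wiring and response body are assumptions):

import express from 'express';
import * as system from '../system.js';

const arIoRouter = express.Router();

// Hedged sketch of the queue-fullness guard on the admin endpoint;
// auth middleware and the actual queueing call are omitted.
arIoRouter.post('/ar-io/admin/queue-bundle', async (_req, res) => {
  if (await system.bundleDataImporter.isQueueFull()) {
    res.status(429).json({ error: 'Bundle importer queue is full' });
    return;
  }
  // ... existing logic that queues the bundle for import ...
});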

src/config.ts (1)

201-205: LGTM! Well-documented configuration constant

The new configuration constant follows the established pattern and provides a reasonable default value.
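
A hedged sketch of what that constant likely looks like (the 1000 default is taken from the docker-compose suggestion above and is an assumption about the actual source):

// Hedged sketch of the src/config.ts addition; the repository's own
// env parsing helper and default value may differ.
export const BUNDLE_DATA_IMPORTER_QUEUE_SIZE = parseInt(
  process.env.BUNDLE_DATA_IMPORTER_QUEUE_SIZE ?? '1000',
  10,
);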

src/system.ts (1)

472-477: LGTM! Verify the usage of the exported bundleDataImporter.

The changes look good. The export of bundleDataImporter and the addition of maxQueueSize parameter align with the PR objectives for importing data from GQL.

Let's verify the usage of the exported bundleDataImporter:

✅ Verification successful

The previous search might have been too specific. Let's broaden our search to find any references to bundleDataImporter and system imports, as well as locate route files where this exported instance might be used.


We found references to bundleDataImporter in three files. Let's examine how it's actually used in the routes and test files to verify if the export is properly utilized and necessary.


Export of bundleDataImporter is correctly used and necessary

The export is properly utilized in the routes file (src/routes/ar-io.ts) where it's accessed through the system instance to check queue status and process bundles. The test file confirms the implementation details and usage patterns align with the exported functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for imports and usage of bundleDataImporter
rg -A 3 "import.*bundleDataImporter.*from.*system" 

Length of output: 52


Script:

#!/bin/bash
# Let's try a broader search for bundleDataImporter usage
rg -l "bundleDataImporter"

# Also search for any imports from system.ts
rg "from ['\"].*system['\"]"

# Let's also check if there are any route files that might use it
fd "routes?"

Length of output: 163


Script:

#!/bin/bash
# Let's check how bundleDataImporter is used in the routes file
rg -A 5 "bundleDataImporter" src/routes/ar-io.ts

# Also check the test file to understand the usage pattern
rg -A 5 "bundleDataImporter" src/workers/bundle-data-importer.test.ts

Length of output: 2442

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (3)
scripts/import-data/utils.ts (1)

76-100: Optimize file processing and add folder validation

The function could benefit from several improvements:

  1. Validate folder existence before processing
  2. Use async iteration for better memory efficiency with large directories
  3. Add input validation for min/max values

Apply this diff to implement the improvements:

 export const getFilesInRange = async ({
   folder,
   min,
   max,
 }: {
   folder: string;
   min: number;
   max: number;
 }): Promise<string[]> => {
+  if (min > max) {
+    throw new Error('Min value cannot be greater than max value');
+  }
+
   try {
+    await fs.access(folder);
     const files = await fs.readdir(folder);
 
     const filesInRange = files
       .filter((file) => path.extname(file) === '.json')
       .filter((file) => {
         const match = file.match(/^\d+/);
-        const number = match ? parseInt(match[0], 10) : null;
+        if (!match) return false;
+        const number = parseInt(match[0], 10);
+        if (isNaN(number)) return false;
         return number !== null && number >= min && number <= max;
       });
 
     return filesInRange;
   } catch (error) {
+    if (error instanceof Error && error.message.includes('ENOENT')) {
+      throw new Error(`Folder does not exist: ${folder}`);
+    }
     throw new Error(`Error processing files: ${(error as Error).message}`);
   }
 };
scripts/import-data/count-fetched-ids.ts (2)

26-29: Move configuration to a separate file

Consider moving these constants to a dedicated configuration file for better maintainability and reusability across scripts.

Create a new file config.ts:

import path from 'node:path';
import { fileURLToPath } from 'node:url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

export const DEFAULT_CONFIG = {
  TRANSACTIONS_DIR: path.join(__dirname, 'transactions'),
  BUNDLES_DIR: path.join(__dirname, 'bundles'),
  MIN_BLOCK_HEIGHT: 0,
  MAX_BLOCK_HEIGHT: Infinity,
} as const;

86-130: Remove commented-out code

The commented-out importFromFiles function should be removed if it's no longer needed. If it's for future use, consider moving it to a separate file or tracking it in an issue.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 4b57bb4 and 72c9922.

📒 Files selected for processing (3)
  • scripts/import-data/count-fetched-ids.ts (1 hunks)
  • scripts/import-data/import-data.ts (1 hunks)
  • scripts/import-data/utils.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/import-data/import-data.ts

Collaborator

@djwhitt djwhitt left a comment

Noticed some commented-out code that was left in and could be removed, but otherwise looks good.

return counter;
};

// const importFromFiles = async ({
Collaborator

Looks like this could be removed.

Collaborator Author

Yep, my bad

@djwhitt djwhitt self-requested a review December 12, 2024 22:38
Collaborator

djwhitt commented Dec 12, 2024

Marked my review as pending again since the Parquet export script will go into this branch too.

@karlprieb karlprieb force-pushed the PE-7172-import-data-from-gql branch from a970197 to 035c895 Compare December 13, 2024 16:41
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
scripts/import-data/export-parquet.ts (1)

25-31: Prefer const over let for declarations that are never reassigned.

The names already follow SCREAMING_SNAKE_CASE, so the remaining inconsistency is the use of let for values that read like constants. Note that ADMIN_KEY and MAX_BLOCK_HEIGHT are reassigned later (ADMIN_KEY during CLI parsing, MAX_BLOCK_HEIGHT when fetching the latest block height), so they must stay let; a const declaration without an initializer is invalid TypeScript. The same applies to any of the others that are reassigned during CLI parsing:

-let ARIO_ENDPOINT = 'http://localhost:4000';
-let OUTPUT_DIR = path.join(__dirname, 'parquet');
-let MAX_FILE_ROWS = 1_000_000;
-let MIN_BLOCK_HEIGHT = 0;
+const ARIO_ENDPOINT = 'http://localhost:4000';
+const OUTPUT_DIR = path.join(__dirname, 'parquet');
+const MAX_FILE_ROWS = 1_000_000;
+const MIN_BLOCK_HEIGHT = 0;
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between a970197 and 035c895.

📒 Files selected for processing (5)
  • .gitignore (1 hunks)
  • scripts/import-data/count-fetched-ids.ts (1 hunks)
  • scripts/import-data/export-parquet.ts (1 hunks)
  • scripts/import-data/fetch-data-gql.ts (1 hunks)
  • scripts/import-data/utils.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • .gitignore
  • scripts/import-data/count-fetched-ids.ts
  • scripts/import-data/utils.ts
  • scripts/import-data/fetch-data-gql.ts
🔇 Additional comments (1)
scripts/import-data/export-parquet.ts (1)

1-24: LGTM! Well-structured imports and proper license header.

The code follows best practices by using the node: protocol for built-in modules and includes the necessary .js extension for ESM imports.

Comment on lines 88 to 114
(async () => {
  if (ADMIN_KEY === undefined) {
    throw new Error('Missing admin key');
  }

  if (MAX_BLOCK_HEIGHT === undefined) {
    MAX_BLOCK_HEIGHT = await fetchLatestBlockHeight();
  }

  await fetchWithRetry(`${ARIO_ENDPOINT}/ar-io/admin/export-parquet`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${ADMIN_KEY}`,
    },
    body: JSON.stringify({
      outputDir: OUTPUT_DIR,
      startHeight: MIN_BLOCK_HEIGHT,
      endHeight: MAX_BLOCK_HEIGHT,
      maxFileRows: MAX_FILE_ROWS,
    }),
  });

  console.log(
    `Parquet export started from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT}`,
  );
})();
Contributor

@coderabbitai coderabbitai bot commented Dec 13, 2024

🛠️ Refactor suggestion

Enhance error handling and user feedback.

The main execution block needs improvements in error handling and user feedback:

 (async () => {
+  try {
     if (ADMIN_KEY === undefined) {
       throw new Error('Missing admin key');
     }
 
     if (MAX_BLOCK_HEIGHT === undefined) {
       MAX_BLOCK_HEIGHT = await fetchLatestBlockHeight();
     }
 
-    await fetchWithRetry(`${ARIO_ENDPOINT}/ar-io/admin/export-parquet`, {
+    const response = await fetchWithRetry(`${ARIO_ENDPOINT}/ar-io/admin/export-parquet`, {
       method: 'POST',
       headers: {
         'Content-Type': 'application/json',
         Authorization: `Bearer ${ADMIN_KEY}`,
       },
       body: JSON.stringify({
         outputDir: OUTPUT_DIR,
         startHeight: MIN_BLOCK_HEIGHT,
         endHeight: MAX_BLOCK_HEIGHT,
         maxFileRows: MAX_FILE_ROWS,
       }),
     });
+
+    if (!response.ok) {
+      throw new Error(`Export failed with status: ${response.status}`);
+    }
 
     console.log(
       `Parquet export started from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT}`,
     );
+  } catch (error) {
+    console.error('Export failed:', error.message);
+    process.exit(1);
+  }
 })();

Also consider adding:

  1. Progress updates for long-running exports
  2. Status endpoint to check export progress
  3. Graceful shutdown handling

Collaborator Author

It will throw an error if the POST fails.

Contributor

@karlprieb, understood. Since fetchWithRetry throws an error if the POST fails, additional error handling might not be necessary. Thank you for the clarification.

🐰🤔


✏️ Learnings added
Learnt from: karlprieb
PR: ar-io/ar-io-node#247
File: scripts/import-data/export-parquet.ts:0-0
Timestamp: 2024-12-13T17:40:34.289Z
Learning: The `fetchWithRetry` function throws an error if the POST request fails.


@karlprieb karlprieb force-pushed the PE-7172-import-data-from-gql branch from cfe7e45 to 3e409f4 Compare December 13, 2024 17:19