Performance page update with parallelism pitfalls section #2240
Conversation
✅ Deploy Preview for dlt-hub-docs ready!
@@ -6,7 +6,13 @@ keywords: [scaling, parallelism, finetuning]

# Optimizing dlt

## Yield pages instead of rows
This page contains a collection of tips and tricks to optimize dlt pipelines for speed, scalability and memory footprint.
Review comment: I would add a small reminder here that dlt works in three steps and link this page: https://dlthub.com/docs/reference/explainers/how-dlt-works
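For context on the "Yield pages instead of rows" section touched by this diff, here is a minimal sketch of the technique: a resource that yields whole pages (lists of records) rather than one row at a time, which reduces per-row overhead during extraction. The API URL, endpoint, and page size are hypothetical placeholders, not part of the PR.

```py
import dlt
from dlt.sources.helpers import requests  # built-in wrapper with retries


@dlt.resource(name="issues")
def issues_by_page():
    # Hypothetical paginated API; URL and page size are placeholders.
    url = "https://api.example.com/issues"
    page = 1
    while True:
        response = requests.get(url, params={"page": page, "per_page": 100})
        response.raise_for_status()
        items = response.json()
        if not items:
            break
        # Yield the whole page (a list of records) instead of one row at a time.
        yield items
        page += 1
```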
Instead of using Python Requests directly, you can use the built-in [requests wrapper](../general-usage/http/requests) or [`RESTClient`](../general-usage/http/rest-client) for API calls. This will make your pipeline more resilient to intermittent network errors and other random glitches.
2. If you are running pipelines in parallel against the same destination dataset and are using a staging destination, you should change the staging destination bucket subfolder to be unique for each pipeline or alternatively disable cleaning up the staging destination after each load for all pipelines: [how to prevent staging files truncation](../dlt-ecosystem/staging#how-to-prevent-staging-files-truncation). If you do not, files might be deleted by one pipeline that are still required to be loaded by another pipeline running in parallel.
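As an illustration of the first point in the excerpt above, this is a hedged sketch using `RESTClient` in place of raw Python Requests; the base URL, endpoint, and pagination parameters are hypothetical.

```py
import dlt
from dlt.sources.helpers.rest_client import RESTClient

# RESTClient adds retries and pagination handling on top of plain HTTP calls.
# The base URL, endpoint, and query parameters below are placeholders.
client = RESTClient(base_url="https://api.example.com")


@dlt.resource(name="orders")
def orders():
    # paginate() yields one page (a list of records) at a time,
    # which also follows the "yield pages instead of rows" advice.
    for page in client.paginate("/orders", params={"per_page": 100}):
        yield page
```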
Review comment: can you rephrase it in a clearer way? Something like:
2. If you're running multiple pipelines in parallel that write to the same destination dataset and use a staging area, make sure to do one of the following:
a. Assign a unique subfolder in the staging destination bucket for each pipeline, or
b. [Disable automatic cleanup of the staging area](../dlt-ecosystem/staging#how-to-prevent-staging-files-truncation) after each load for all pipelines.
If you don’t do this, one pipeline might delete staging files that are still needed by another pipeline running at the same time.
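To illustrate option (a) from the suggested rewording, a minimal sketch assuming a filesystem staging destination; the bucket URL, destination, dataset, and pipeline names are hypothetical, and the alternative (b), disabling automatic cleanup, is described on the linked staging page.

```py
import os

import dlt

# Give each parallel pipeline its own staging subfolder so one pipeline
# cannot delete staging files another pipeline still needs to load.
# The bucket URL, destination, and names below are placeholders.
os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "s3://my-staging-bucket/pipeline_a"

pipeline = dlt.pipeline(
    pipeline_name="pipeline_a",
    destination="bigquery",
    staging="filesystem",
    dataset_name="shared_dataset",
)
```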
Co-authored-by: Alena Astrakhantseva <[email protected]>
Description
This PR does the following: