Throttling when parsing multiple sitemaps #77
bsq-panagiotis added a commit to bsq-panagiotis/sitemapper that referenced this issue on Feb 10, 2021
# New features added
* Ability to report on sitemap crawl errors in returned results. Added a new "errors" property in the `SitesData` object
* Added an option to set a concurrency limit to rate limit sitemap crawling. Useful when crawling sitemaps with multiple children to avoid getting blocked by firewalls. seantomburke#77
* Added an option to have retry requests upon failure and to set the number of maximum retries per crawl.

# Documentation changes
* Updated documentation to include all the new features described above.

Co-Authored-By: Panagiotis Tzamtzis <[email protected]>
Co-Authored-By: PanagiotisTzamtzis <[email protected]>
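The commit message above describes a concurrency limit combined with retries on failure. As a rough illustration of that general technique (not the library's actual implementation), here is a minimal sketch. The helper name `runLimited` and its `limit` and `retries` parameters are hypothetical and chosen only for this example; each task is assumed to be a function returning a promise, such as a request for one child sitemap.

```javascript
// Hypothetical helper, not part of sitemapper: runs async tasks with at most
// `limit` in flight, retrying each failed task up to `retries` extra times.
async function runLimited(tasks, { limit = 2, retries = 0 } = {}) {
  const results = new Array(tasks.length);
  const errors = [];
  let next = 0; // index of the next task that has not been claimed yet

  // Each worker repeatedly claims the next unclaimed task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      for (let attempt = 0; ; attempt++) {
        try {
          results[index] = await tasks[index]();
          break; // success, move on to the next task
        } catch (err) {
          if (attempt >= retries) {
            // Out of retries: record the failure instead of throwing.
            errors.push({ index, message: String((err && err.message) || err) });
            break;
          }
        }
      }
    }
  }

  // Start no more workers than there are tasks.
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { results, errors };
}
```

Collecting failures in an `errors` array rather than rejecting mirrors the idea of reporting crawl errors alongside results, so one failing child sitemap does not discard the others.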
seantomburke pushed a commit to bsq-panagiotis/sitemapper that referenced this issue on Nov 6, 2021
seantomburke pushed a commit that referenced this issue on Nov 11, 2021
seantomburke added a commit that referenced this issue on Nov 11, 2021
* New features & updated documentation

  # New features added
  * Ability to report on sitemap crawl errors in returned results. Added a new "errors" property in the `SitesData` object
  * Added an option to set a concurrency limit to rate limit sitemap crawling. Useful when crawling sitemaps with multiple children to avoid getting blocked by firewalls. #77
  * Added an option to have retry requests upon failure and to set the number of maximum retries per crawl.

  # Documentation changes
  * Updated documentation to include all the new features described above.

  Co-Authored-By: Panagiotis Tzamtzis <[email protected]>
  Co-Authored-By: PanagiotisTzamtzis <[email protected]>
* Fix for error on the main sitemap

  In this case the errors object in the results was not an ErrorsDataArray but a single ErrorsData
* Bug fixes
  * Error logging improvements with more details for `UnknownStateErrors` & errors when parsing the parent sitemap
  * Retries option was not working when `debug` was set to false
* Bug fix
  * Console.log statement was getting triggered when `debug` option was set to false
* Update src/examples/index.js
* 3.2.0
* Cleaning up, changing error to errors, updating Typescript, removing returnErrors option
* Removing returnErrors option
* quotes fix
* Updates
* Fixing errors array
* updating tests

Co-authored-by: PanagiotisTzamtzis <[email protected]>
Co-authored-by: Sean Thomas Burke <[email protected]>
Co-authored-by: Sean Thomas Burke <[email protected]>
seantomburke added a commit that referenced this issue on Dec 24, 2021
Is there an option available to add an artificial delay (throttling) between requests to avoid getting blocked by firewalls?
I couldn't find any mention of this feature in the documentation.
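For anyone who needs an artificial delay in their own code, the idea can be sketched independently of the library. This is a minimal example under stated assumptions: `fetchThrottled`, `fetchOne`, and `delayMs` are hypothetical names introduced here, and `fetchOne` stands in for whatever function actually requests a single sitemap URL.

```javascript
// Resolves after roughly `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper, not part of sitemapper: calls fetchOne(url) for each
// URL in order, pausing delayMs between consecutive requests.
async function fetchThrottled(urls, fetchOne, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(delayMs); // no delay before the first request
    results.push(await fetchOne(urls[i]));
  }
  return results;
}
```

Requests run strictly one at a time here, so the delay is a lower bound on the gap between them; the trade-off is that total crawl time grows linearly with the number of sitemaps.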