New features & updated documentation #78

Merged: 12 commits, Nov 11, 2021

bsq-panagiotis
Contributor

New features added

  • Sitemap crawl errors are now reported in the returned results, through a new "errors" property in the SitesData object.

  • Added an option to set a concurrency limit, using the p-limit library, to rate-limit sitemap crawling. This is useful when crawling sitemaps with many child sitemaps, to avoid getting blocked by firewalls. The default concurrency limit is 10. Related issue: "Throttling when parsing multiple sitemaps" #77

  • Added an option to retry requests upon failure and to set the maximum number of retries per crawl.
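
A minimal sketch of how the three features above could fit together. It assumes nothing about the library's internals: `createLimiter` is a hand-rolled stand-in for p-limit, and `crawlAll`, `fetchOne`, and the shape of the error entries are hypothetical names chosen for illustration.

```javascript
// Illustration only: hypothetical helpers, not the library's actual code.

// Stand-in for p-limit: returns a function that runs at most
// `concurrency` queued tasks at the same time.
function createLimiter(concurrency) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= concurrency || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // start the next queued task, if any
      });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}

// Crawl every URL with a concurrency cap, retry failed requests,
// and collect failures in an `errors` array instead of throwing.
// `fetchOne(url)` is assumed to resolve to an array of site URLs.
async function crawlAll(urls, fetchOne, { concurrency = 10, retries = 0 } = {}) {
  const limit = createLimiter(concurrency);
  const sites = [];
  const errors = [];

  const crawl = async (url) => {
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        sites.push(...(await fetchOne(url)));
        return;
      } catch (err) {
        // Record the error only once all retries are used up.
        if (attempt === retries) {
          errors.push({ url, type: err.name, retries: attempt });
        }
      }
    }
  };

  await Promise.all(urls.map((url) => limit(() => crawl(url))));
  return { sites, errors };
}
```

In this sketch a URL keeps its concurrency slot while it is being retried, so retries never push the number of in-flight requests above the limit.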

Documentation changes

  • Updated documentation to include all the new features described above.

Co-Authored-By: Panagiotis Tzamtzis [email protected]
Co-Authored-By: Panagiotis Tzamtzis [email protected]

@bsq-panagiotis
Contributor Author

bsq-panagiotis commented Feb 11, 2021

Hi @seantomburke,

First of all, congrats on the work you've done on this library.
I hope the changes I prepared make sense to you. Let me know if you'd like anything changed.

@seantomburke seantomburke self-requested a review February 16, 2021 22:06
@seantomburke
Owner

@bsq-panagiotis Thanks for submitting a PR with all the great additions! I'll be reviewing thoroughly to make sure there are no breaking changes with the existing package.

Owner

@seantomburke seantomburke left a comment

Everything looks great! Let's actually bump the minor version to 3.2.0 since we're adding an external dependency:
`npm version minor`

Two review comments on src/examples/index.js (outdated, resolved)
bsq-panagiotis and others added 12 commits November 6, 2021 03:18
# New features added

* Ability to report on sitemap crawl errors in returned results. Added a new "errors" property in the `SitesData` object

* Added an option to set a concurrency limit to rate limit sitemap crawling. Useful when crawling sitemaps with multiple children to avoid getting blocked by firewalls. seantomburke#77

* Added an option to have retry requests upon failure and to set the number of maximum retries per crawl.

# Documentation changes

* Updated documentation to include all the new features described above.

Co-Authored-By: Panagiotis Tzamtzis <[email protected]>
Co-Authored-By: PanagiotisTzamtzis <[email protected]>

# Fix for error on the main sitemap

In this case the errors object in the results was not an ErrorsDataArray but a single ErrorsData

# Bug fixes

* Error logging improvements with more details for `UnknownStateErrors` & errors when parsing the parent sitemap

* Retries option was not working when `debug` was set to false

# Bug fix

* Console.log statement was getting triggered when `debug` option was set to false

@seantomburke seantomburke merged commit 19f9e12 into seantomburke:master Nov 11, 2021
seantomburke added a commit that referenced this pull request Dec 24, 2021
* New features & updated documentation

* Ability to report on sitemap crawl errors in returned results. Added a new "errors" property in the `SitesData` object

* Added an option to set a concurrency limit to rate limit sitemap crawling. Useful when crawling sitemaps with multiple children to avoid getting blocked by firewalls. #77

* Added an option to have retry requests upon failure and to set the number of maximum retries per crawl.

* Updated documentation to include all the new features described above.

Co-Authored-By: Panagiotis Tzamtzis <[email protected]>
Co-Authored-By: PanagiotisTzamtzis <[email protected]>

* Fix for error on the main sitemap

In this case the errors object in the results was not an ErrorsDataArray but a single ErrorsData

* Bug fixes

* Error logging improvements with more details for `UnknownStateErrors` & errors when parsing the parent sitemap

* Retries option was not working when `debug` was set to false

* Bug fix

* Console.log statement was getting triggered when `debug` option was set to false

* Update src/examples/index.js

* 3.2.0

* Cleaning up, changing error to errors, updating Typescript, removing returnErrors option

* Removing returnErrors option

* quotes fix

* Updates

* Fixing errors array

* updating tests

Co-authored-by: PanagiotisTzamtzis <[email protected]>
Co-authored-by: Sean Thomas Burke <[email protected]>
Co-authored-by: Sean Thomas Burke <[email protected]>