New features & updated documentation
# New features added

* Sitemap crawl errors can now be reported in the returned results, through a new "errors" property in the `SitesData` object.

* Added an option to set a concurrency limit that throttles sitemap crawling. This is useful when crawling sitemaps with multiple children, to avoid getting blocked by firewalls. seantomburke#77

* Added an option to retry requests upon failure and to set the maximum number of retries per crawl.

# Documentation changes

* Updated documentation to include all the new features described above.

Co-Authored-By: Panagiotis Tzamtzis <[email protected]>
Co-Authored-By: PanagiotisTzamtzis <[email protected]>
2 people authored and seantomburke committed Nov 6, 2021
1 parent d20782d commit 382a92f
Showing 7 changed files with 209 additions and 33 deletions.
27 changes: 25 additions & 2 deletions README.md
@@ -62,8 +62,13 @@ sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')

You can add options on the initial Sitemapper object when instantiating it.

+ `requestHeaders`: (Object) - Additional Request Headers
+ `timeout`: (Number) - Maximum timeout for a single URL
+ `requestHeaders`: (Object) - Additional Request Headers (e.g. `User-Agent`)
+ `timeout`: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
+ `url`: (String) - Sitemap URL to crawl
+ `debug`: (Boolean) - Enables/Disables debug console logging. Default: False
+ `concurrency`: (Number) - Sets the maximum number of concurrent sitemap crawling threads. Default: 10
+ `retries`: (Number) - Sets the maximum number of retries to attempt in case of an error response (e.g. 404 or Timeout). Default: 0
+ `returnErrors`: (Boolean) - Enables/Disables the reporting of errors in results ("errors" property). Default: False

```javascript

@@ -77,6 +82,24 @@ const sitemapper = new Sitemapper({

```

An example using all available options:

```javascript

const sitemapper = new Sitemapper({
url: 'https://art-works.community/sitemap.xml',
timeout: 15000,
requestHeaders: {
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
},
debug: true,
concurrency: 2,
retries: 1,
returnErrors: true
});

```
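
With `returnErrors` enabled, the result is described as carrying an "errors" property alongside the crawled sites. The following is a minimal sketch of how a caller might summarize such a result. It runs against a hand-written sample object rather than a live crawl; the `sites` key, the per-error fields (`type`, `url`, `retries`), and the helper name `summarizeCrawl` are assumptions made for illustration, not something this diff confirms.

```javascript
// Hypothetical sample shaped like a crawl result when `returnErrors` is true.
// The per-error fields (`type`, `url`, `retries`) are assumed for illustration.
const sampleResult = {
  url: 'https://example.com/sitemap.xml',
  sites: ['https://example.com/', 'https://example.com/about'],
  errors: [
    { type: 'HTTPError', url: 'https://example.com/missing-sitemap.xml', retries: 1 },
  ],
};

// Summarize a result without assuming `errors` is present,
// since it is only reported when `returnErrors` is enabled.
function summarizeCrawl(result) {
  const sites = result.sites || [];
  const errors = result.errors || [];
  return {
    siteCount: sites.length,
    errorCount: errors.length,
    failedUrls: errors.map((e) => e.url),
  };
}

console.log(summarizeCrawl(sampleResult));
```

Defaulting a missing `errors` to an empty array keeps the same helper usable whether or not error reporting is switched on.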

### Examples in ES5
```javascript
var Sitemapper = require('sitemapper');
```
2 changes: 1 addition & 1 deletion lib/assets/sitemapper.js

Generated file; diff not rendered by default.

2 changes: 1 addition & 1 deletion lib/examples/index.js

Generated file; diff not rendered by default.

45 changes: 38 additions & 7 deletions package-lock.json

Generated file; diff not rendered by default.

1 change: 1 addition & 0 deletions package.json
@@ -78,6 +78,7 @@
},
"dependencies": {
"got": "^11.8.0",
"p-limit": "^3.1.0",
"xml2js": "^0.4.23"
}
}
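
The `p-limit` dependency added here is a promise concurrency limiter, which fits the new `concurrency` option. The library source is not rendered in this diff, so the following is only a dependency-free sketch of the general idea behind a concurrency cap combined with a retry count. `createLimiter`, `withRetries`, `demo`, and the fake fetch are hypothetical names, not the library's code.

```javascript
// Returns a function that runs async tasks with at most `limit` in flight.
function createLimiter(limit) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active += 1;
    const { task, resolve, reject } = queue.shift();
    // Promise.resolve().then(task) also captures synchronous throws.
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // start the next queued task, if any
      });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}

// Calls `task` once, then up to `retries` more times if it rejects.
async function withRetries(task, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Demo with a fake fetch that records how many calls overlap.
async function demo() {
  const limit = createLimiter(2);
  let inFlight = 0;
  let peak = 0;
  const fakeFetch = (url) => async () => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise((r) => setTimeout(r, 10));
    inFlight -= 1;
    return url;
  };
  const urls = ['a.xml', 'b.xml', 'c.xml', 'd.xml'];
  const results = await Promise.all(
    urls.map((u) => limit(() => withRetries(fakeFetch(u), 1)))
  );
  return { results, peak };
}

demo().then(({ results, peak }) => {
  console.log(results.length, 'fetched; peak in flight:', peak);
});
```

Wrapping the retry inside the limited task means a retried request still occupies one of the concurrency slots, so retries cannot push the number of simultaneous requests above the cap.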