New features & updated documentation (#78)
* New features & updated documentation

# New features added

* Ability to report on sitemap crawl errors in returned results. Added a new `errors` property to the `SitesData` object.

* Added an option to set a concurrency limit to rate-limit sitemap crawling. Useful when crawling sitemaps with multiple children to avoid getting blocked by firewalls. #77

* Added an option to retry requests upon failure and to set the maximum number of retries per crawl.

# Documentation changes

* Updated documentation to include all the new features described above.

Co-Authored-By: Panagiotis Tzamtzis <[email protected]>
Co-Authored-By: PanagiotisTzamtzis <[email protected]>

* Fix for error on the main sitemap

In this case the errors object in the results was not an ErrorsDataArray but a single ErrorsData

* Bug fixes

* Error logging improvements with more details for `UnknownStateErrors` & errors when parsing the parent sitemap

* Retries option was not working when `debug` was set to false

* Bug fix

* A `console.log` statement was still being triggered when the `debug` option was set to false

* Update src/examples/index.js

* 3.2.0

* Cleaning up, changing error to errors, updating TypeScript definitions, removing returnErrors option

* Removing returnErrors option

* quotes fix

* Updates

* Fixing errors array

* updating tests

Co-authored-by: PanagiotisTzamtzis <[email protected]>
Co-authored-by: Sean Thomas Burke <[email protected]>
Co-authored-by: Sean Thomas Burke <[email protected]>
4 people authored Nov 11, 2021
1 parent d20782d commit 19f9e12
Showing 11 changed files with 304 additions and 50 deletions.
25 changes: 23 additions & 2 deletions README.md
@@ -62,8 +62,12 @@ sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')

You can add options on the initial Sitemapper object when instantiating it.

-+ `requestHeaders`: (Object) - Additional Request Headers
-+ `timeout`: (Number) - Maximum timeout for a single URL
++ `requestHeaders`: (Object) - Additional Request Headers (e.g. `User-Agent`)
++ `timeout`: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
++ `url`: (String) - Sitemap URL to crawl
++ `debug`: (Boolean) - Enables/Disables debug console logging. Default: False
++ `concurrency`: (Number) - Sets the maximum number of concurrent sitemap crawling threads. Default: 10
++ `retries`: (Number) - Sets the maximum number of retries to attempt in case of an error response (e.g. 404 or Timeout). Default: 0

```javascript

@@ -77,6 +81,23 @@ const sitemapper = new Sitemapper({

```

An example using all available options:

```javascript

const sitemapper = new Sitemapper({
url: 'https://art-works.community/sitemap.xml',
timeout: 15000,
requestHeaders: {
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
},
debug: true,
concurrency: 2,
retries: 1,
});

```
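
The effect of the `concurrency` and `retries` options can be pictured with a small stand-alone sketch. This is not the library's implementation: the commit adds `p-limit` to the dependencies, presumably for the limiting part, whereas the sketch hand-rolls a limiter so that it runs without any packages. `createLimiter`, `withRetries`, `fakeFetch`, and the file names are hypothetical, and the sketch assumes that `retries: N` means up to N extra attempts after the first failure.

```javascript
// Hypothetical limiter: runs at most `concurrency` async tasks at a time.
function createLimiter(concurrency) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= concurrency || queue.length === 0) return;
    active += 1;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // start the next queued task, if any
      });
  };

  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}

// Hypothetical retry wrapper: one initial attempt plus up to `retries` more.
async function withRetries(task, retries) {
  let attempt = 0;
  for (;;) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries) throw err;
      attempt += 1;
    }
  }
}

// Simulated crawl of four child sitemaps; no network access is involved.
async function demo() {
  const limit = createLimiter(2); // like `concurrency: 2`
  let inFlight = 0;
  let peak = 0;
  const attempts = {};

  // Stand-in for an HTTP request; 'flaky.xml' fails on its first attempt only.
  const fakeFetch = (url) => async () => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    attempts[url] = (attempts[url] || 0) + 1;
    await new Promise((done) => setTimeout(done, 10));
    inFlight -= 1;
    if (url === 'flaky.xml' && attempts[url] === 1) throw new Error('timeout');
    return url;
  };

  const urls = ['a.xml', 'b.xml', 'flaky.xml', 'c.xml'];
  const results = await Promise.all(
    urls.map((url) => limit(() => withRetries(fakeFetch(url), 1))) // like `retries: 1`
  );
  return { results, peak, attempts };
}

demo().then(({ peak, attempts }) => {
  console.log('peak requests in flight:', peak);
  console.log('attempts per sitemap:', attempts);
});
```

In this sketch the retry loop runs inside the limited task, so a retried request keeps its slot and the number of requests in flight never exceeds the limit. Whether the library orders the two the same way is not stated in the README.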

### Examples in ES5
```javascript
var Sitemapper = require('sitemapper');
```
2 changes: 1 addition & 1 deletion lib/assets/sitemapper.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion lib/examples/index.js

47 changes: 39 additions & 8 deletions package-lock.json

3 changes: 2 additions & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "sitemapper",
-"version": "3.1.16",
+"version": "3.2.0",
"description": "Parser for XML Sitemaps to be used with Robots.txt and web crawlers",
"keywords": [
"parse",
@@ -78,6 +78,7 @@
},
"dependencies": {
"got": "^11.8.0",
+"p-limit": "^3.1.0",
"xml2js": "^0.4.23"
}
}
36 changes: 23 additions & 13 deletions sitemapper.d.ts
@@ -1,26 +1,36 @@
export interface SitemapperResponse {
-url: string;
-sites: string[];
+url: string;
+sites: string[];
+errors: SitemapperErrorData[];
}

+export interface SitemapperErrorData {
+type: string;
+url: string;
+retries: number;
+}
+
export interface SitemapperOptions {
-url?: string;
-timeout?: number;
-requestHeaders?: {[name: string]: string};
+url?: string;
+timeout?: number;
+requestHeaders?: {[name: string]: string};
+debug?: boolean;
+concurrency?: number;
+retries?: number;
}

declare class Sitemapper {

-timeout: number;
+timeout: number;

-constructor(options: SitemapperOptions)
+constructor(options: SitemapperOptions)

-/**
-* Gets the sites from a sitemap.xml with a given URL
-*
-* @param url URL to the sitemap.xml file
-*/
-fetch(url?: string): Promise<SitemapperResponse>;
+/**
+* Gets the sites from a sitemap.xml with a given URL
+*
+* @param url URL to the sitemap.xml file
+*/
+fetch(url?: string): Promise<SitemapperResponse>;
}

export default Sitemapper;
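
As a rough illustration of how a caller might consume the new `errors` array, the sketch below groups entries by their `type` field. It works on a hand-written object shaped like `SitemapperResponse`; the URLs and the `type` strings are made up for the example and are not taken from the library, and `groupErrorsByType` is a hypothetical helper.

```javascript
// Hypothetical helper: group a response's `errors` entries by `type`.
function groupErrorsByType(response) {
  const groups = {};
  for (const { type, url, retries } of response.errors || []) {
    if (!groups[type]) groups[type] = [];
    groups[type].push({ url, retries });
  }
  return groups;
}

// Hand-written sample shaped like SitemapperResponse (values are invented).
const sampleResponse = {
  url: 'https://example.com/sitemap.xml',
  sites: ['https://example.com/', 'https://example.com/about'],
  errors: [
    { type: 'RequestError', url: 'https://example.com/sitemap-posts.xml', retries: 1 },
    { type: 'RequestError', url: 'https://example.com/sitemap-tags.xml', retries: 1 },
    { type: 'UnknownStateError', url: 'https://example.com/sitemap-misc.xml', retries: 0 },
  ],
};

const grouped = groupErrorsByType(sampleResponse);
for (const [type, entries] of Object.entries(grouped)) {
  console.log(`${type}: ${entries.length} sitemap(s) failed`);
}
```

With the real library, the same helper could be applied to the value that `fetch()` resolves with, since the declaration above types it as `Promise<SitemapperResponse>`.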