Repository Scraper is a collection of Grunt tasks that can be used to pull useful repository data, including repository names, descriptions, commit and code line counts, language breakdowns, and readme content, and organize it as JSON. There are also tasks to filter the repository data, such as abnormal data and language abbreviations.
- Pulls all repository names and stores them indexed in a JSON array
- Gets the commit count for each repository in the repository list
- Gets the code line count for each repository
- Pulls the language breakdown for each repository as a JSON array, e.g. ['PHP', 'JS', 'HTML']
- Gets the readme content for all repositories
- Pulls the description of each repository
- Checks each repository against code line and language criteria and flags any abnormal repositories
- Filters and replaces repository properties using the filters defined in data-filters.json (see the sketch after this list)
- Filters and replaces readme data such as URLs and directory names
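The structure of data-filters.json is not documented in this section, so the example below is only a guess at what a filter file might contain, assuming a simple list of property/match/replace entries; the real schema may differ.

```json
{
  "filters": [
    { "property": "languages", "match": "JS", "replace": "JavaScript" },
    { "property": "name",      "match": "_",  "replace": "-" }
  ]
}
```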
- Grunt v1.01+ (see the install commands below if Grunt is not yet set up)
- JavaScript ES6
- Node.js
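If Grunt is not already installed, it is typically added through npm; these are the standard Grunt setup commands rather than anything specific to this project.

```
npm install -g grunt-cli
npm install grunt --save-dev
```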
- Clone the Repository Scraper repository into your project folder

```
git clone https://github.com/kyleruss/repository-scraper.git
```
- Initialize the tasks in your Gruntfile.js

```js
module.exports = function (grunt)
{
    grunt.loadTasks('Tasks');
};
```
- Run any of the tasks via the Grunt CLI (an optional alias-task sketch follows these steps), for example:

```
grunt repo-load
```
- You can find the pulled repository data in repository-data.json after executing a task
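If you want a single command that runs several scraper tasks in sequence, you could register an alias task in your Gruntfile. This is only a sketch: repo-load is the only task name shown above, and the 'scrape' alias name is made up here; substitute the task names your Tasks directory actually provides.

```js
module.exports = function (grunt)
{
    grunt.loadTasks('Tasks');

    // Hypothetical alias that chains scraper tasks; extend the list
    // with the other task names you use.
    grunt.registerTask('scrape', ['repo-load']);
};
```

Running grunt scrape then executes the listed tasks in order.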
The repository JSON data is organized as a JSON object in which each repository object is indexed by its name.
Each repository object has several notable keys: name, link, codeLines, commits, languages and readme.
```json
{
  "my-project":
  {
    "name": "my-project",
    "link": "https://github.com/kyleruss/my-project",
    "codeLines": 1800,
    "commits": 120,
    "languages": ["PHP", "JS", "HTML"],
    "readme": "<readme html content />"
  }
}
```
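As a quick illustration of consuming this format, the snippet below reads repository-data.json with Node.js and prints a short summary for each repository; it is a minimal sketch that assumes the default file name and the layout shown above.

```js
const fs = require('fs');

// Load the data produced by the scraper tasks
const repositories = JSON.parse(fs.readFileSync('repository-data.json', 'utf8'));

// Each repository object is keyed by its repository name
for (const [name, repo] of Object.entries(repositories))
{
    console.log(`${name}: ${repo.commits} commits, ${repo.codeLines} lines of code, languages: ${repo.languages.join(', ')}`);
}
```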
Repository Scraper is available under the MIT License
See LICENSE for more details