Minimal Node crawler boilerplate with modern ES6 features built in (e.g. `Promise`s in requests, `import`/`export` syntax, etc.), cheerio and express
- Start building your own crawler within seconds
- Gives you a minimalist skeleton and modern ES6 features that are not yet supported out of the box in Node
Just clone the repo, install the dependencies (`yarn install`), write your crawler, run `yarn start`, and voilà!
- `yarn start` - serves the app on `localhost` in watch mode
- `yarn run build` - builds the project; the output directory is `/dist`
Just a straightforward example to help you understand the usage of some of the tools in this project:
    import requestPromise from "request-promise-native";
    import cheerio from "cheerio";
    import express from "express";

    const app = express();

    app.get("/", async (req, res) => {
      // fetch the page and hand the HTML to cheerio
      const $ = await requestPromise("https://path-to-website.com/", {
        transform: body => cheerio.load(body),
      });

      const header = $("h1").text();

      // ...do the rest of your crawling...

      // send whatever you'd like to the browser
      res.send(header);
    });

    app.listen(3000);
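If you need more than a single element, cheerio's jQuery-style traversal works the same way. Here is a minimal sketch that collects every link on a page (the URL and the `crawlLinks` name are just placeholders for illustration):

    import requestPromise from "request-promise-native";
    import cheerio from "cheerio";

    const crawlLinks = async () => {
      // load the page into cheerio, exactly as in the example above
      const $ = await requestPromise("https://path-to-website.com/", {
        transform: body => cheerio.load(body),
      });

      // collect the text and href of every <a> element on the page
      return $("a")
        .map((i, el) => ({
          text: $(el).text().trim(),
          href: $(el).attr("href"),
        }))
        .get();
    };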
- TypeScript is here just to get modern ES6 features in Node, like `import`/`export`
- cheerio - jQuery-like selectors for Node
- request-promise-native - use `Promise`s in Node requests
- express - see (and interact with) whatever you output in the browser rather than in the CLI (see the sketch after this list)
- nodemon - runs the server in watch mode (i.e. will rebuild each time the code has changed)
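
For instance, express makes it easy to drive the crawler from the browser. A small sketch (the `/crawl` route and the `url` query parameter are assumptions for illustration, not part of the boilerplate):

    import requestPromise from "request-promise-native";
    import cheerio from "cheerio";
    import express from "express";

    const app = express();

    // e.g. open http://localhost:3000/crawl?url=https://path-to-website.com/
    app.get("/crawl", async (req, res) => {
      const $ = await requestPromise(req.query.url, {
        transform: body => cheerio.load(body),
      });

      // show the page title right in the browser
      res.send($("title").text());
    });

    app.listen(3000);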
- It would be nice to add a script to run tests
- If you use the filesystem: fs-extra - lets you use `Promise`s in filesystem methods instead of callbacks (see the sketch below)
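
A minimal sketch of what that would look like (fs-extra isn't part of the boilerplate yet, and the file path is just a placeholder):

    import fs from "fs-extra";

    const saveResult = async header => {
      // writeJson and readJson return Promises, so no callbacks are needed
      await fs.writeJson("./output.json", { header });
      const saved = await fs.readJson("./output.json");
      console.log(saved.header);
    };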
MIT
Thanks for using this boilerplate! 🙏 @eliranlevi