Minimal Node crawler boilerplate with modern ES6 features built-in (Promise, import/export, etc.), cheerio and express. Clone and start building your own crawler within seconds

eliranlevi/node-crawler-boilerplate

Node crawler boilerplate

Minimal Node crawler boilerplate with modern ES6 features built-in (e.g. Promises in requests, import/export syntax), cheerio and express

Goals

  • Start building your own crawler within seconds
  • Give you a minimalist skeleton plus modern ES6 features (such as import/export) that Node did not support out of the box when this boilerplate was written

How?

Just clone the repo, install the dependencies (yarn install), write your crawler and run yarn start, voilà!

Scripts

  • yarn start - serves the app on localhost in watch mode
  • yarn run build - builds the project; the output directory is /dist

Basic example

A straightforward example showing how some of the tools in this project are used together:

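import requestPromise from "request-promise-native";
import cheerio from "cheerio";
import express from "express";

const app = express();

app.get("/", async (req, res) => {
  try {
    // Fetch the page; `transform` makes the promise resolve with a cheerio instance
    const $ = await requestPromise("https://path-to-website.com/", {
      transform: body => cheerio.load(body),
    });

    const header = $("h1").text();
    // ...do the rest of your crawling...

    // send whatever you'd like to the browser
    res.send(header);
  } catch (err) {
    // Express 4 doesn't catch rejected promises in async handlers,
    // so respond explicitly instead of leaving the request hanging
    res.status(500).send(`Crawling failed: ${String(err)}`);
  }
});

// Serve the app on http://localhost:3000
app.listen(3000);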
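The "...do the rest of your crawling..." step usually means following links from page to page. Below is a minimal sketch, assuming a breadth-first crawl that stays on the start page's origin. crawl, LinkExtractor and maxPages are hypothetical names, not part of this boilerplate. The link extractor is passed in, so you could back it with requestPromise + cheerio (e.g. reading href from $("a[href]")) or with a fake for testing:

```typescript
// Hypothetical helper type: resolves with the raw href values found on a page.
// In this boilerplate you might implement it with requestPromise + cheerio.
type LinkExtractor = (pageUrl: string) => Promise<string[]>;

// Hypothetical breadth-first crawler that only follows links on the start URL's origin
async function crawl(
  startUrl: string,
  getLinks: LinkExtractor,
  maxPages = 50,
): Promise<string[]> {
  const start = new URL(startUrl);
  start.hash = ""; // normalize the start URL the same way as discovered links
  const origin = start.origin;
  const visited = new Set<string>();
  const queue: string[] = [start.href];

  while (queue.length > 0 && visited.size < maxPages) {
    const current = queue.shift()!;
    if (visited.has(current)) continue; // already crawled via another link

    visited.add(current);

    let hrefs: string[] = [];
    try {
      hrefs = await getLinks(current);
    } catch {
      continue; // skip pages that fail to load
    }

    for (const href of hrefs) {
      let next: URL;
      try {
        next = new URL(href, current); // resolve relative links against the current page
      } catch {
        continue; // ignore malformed hrefs
      }
      next.hash = ""; // "/a#top" and "/a" point at the same page
      if (next.origin === origin && !visited.has(next.href)) {
        queue.push(next.href);
      }
    }
  }

  return Array.from(visited); // pages in the order they were crawled
}
```

Using a Set for visited pages means each URL is fetched at most once even if it ends up in the queue twice, and the maxPages cap keeps an accidental crawl of a large site bounded.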

Packages

  • TypeScript is here just to get modern ES6 features in Node, like import/export
  • cheerio - jQuery-like selectors for Node
  • request-promise-native - use Promises in Node requests
  • express - view (and interact with) your crawler's results in the browser rather than the CLI
  • nodemon - runs the server in watch mode (i.e. rebuilds each time the code changes)
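The basic example relies on request-promise-native resolving with transform(body) when a transform option is passed. Conceptually, that boils down to wrapping a callback-style request in a native Promise. The sketch below only illustrates that idea; it is not the library's actual implementation, and promisifyRequest, CallbackRequest and PromiseOptions are hypothetical names:

```typescript
// Node-style callback: (error, body) => void
type Callback = (err: Error | null, body?: string) => void;
// Hypothetical callback-based HTTP client, standing in for `request`
type CallbackRequest = (url: string, cb: Callback) => void;

interface PromiseOptions<T> {
  // Same idea as request-promise's `transform`: post-process the body before resolving
  transform?: (body: string) => T;
}

// Hypothetical wrapper turning a callback-based client into a Promise-based one
function promisifyRequest(request: CallbackRequest) {
  return function <T = string>(url: string, options: PromiseOptions<T> = {}): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      request(url, (err, body) => {
        if (err) return reject(err); // transport errors reject the promise
        const raw = body ?? "";
        try {
          // Without a transform, resolve with the raw body
          resolve(options.transform ? options.transform(raw) : (raw as unknown as T));
        } catch (transformErr) {
          reject(transformErr); // a throwing transform rejects instead of crashing
        }
      });
    });
  };
}
```

The payoff is the same as in the basic example: callers can await the result and use try/catch instead of nesting callbacks.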

Future

  • It would be nice to add a script to run tests

Suggestions

  • If you need the fs module, consider fs-extra, which lets you use Promises in filesystem methods instead of callbacks
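For comparison, Node 10 and later also expose a Promise-based filesystem API (fs.promises) with no extra dependency. A minimal sketch of saving crawl results with it, assuming JSON output; saveJson, loadJson and OUTPUT_DIR are hypothetical names, and saveJson only approximates fs-extra's outputJson (it creates missing parent directories before writing):

```typescript
import { promises as fs } from "fs";
import { tmpdir } from "os";
import { dirname, join } from "path";

// Hypothetical output location (an assumption); point it wherever your project needs
export const OUTPUT_DIR = join(tmpdir(), "node-crawler-output");

// Approximates fs-extra's outputJson: create missing parent directories, then write JSON
export async function saveJson(file: string, data: unknown): Promise<void> {
  await fs.mkdir(dirname(file), { recursive: true });
  await fs.writeFile(file, JSON.stringify(data, null, 2), "utf8");
}

// Counterpart to fs-extra's readJson: read the file and parse it as JSON
export async function loadJson<T>(file: string): Promise<T> {
  const text = await fs.readFile(file, "utf8");
  return JSON.parse(text) as T;
}
```

Inside the express route from the basic example you could then call await saveJson(join(OUTPUT_DIR, "results.json"), { header }) before sending the response.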

License

MIT


Thanks for using this boilerplate! 🙏 @eliranlevi
