
# BLKWDW

This is a Node.js application built with Express, Mongoose, Puppeteer, and an Angular frontend. The goal of this open-source project is to build a web crawler equipped with a user interface.


## Getting Started

To get started, install the dependencies for both the Node.js and Angular apps:

```sh
npm install
cd client/
npm install
```

Start the backend with `npm run start`, and the frontend with `ng serve` from the `client/` folder.

You will find the application running locally on port 4200.

## What can this application do

At this point in time, the application works as follows: you give a spider the URL of a page that contains a list of the items you want to crawl, along with a CSS selector matching those items. The spider will then navigate to every item and extract the requested information.

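The extraction step described above can be sketched as plain logic, independent of Puppeteer. Note that the function and field names below are illustrative, not taken from the project's code:

```javascript
// Sketch of the spider's extraction step: given item records already
// collected from the listing page (e.g. via a CSS selector in Puppeteer),
// keep only the attributes the spider is configured to extract.
// `extractAttributes` and the item shape are hypothetical names.
function extractAttributes(items, attributes) {
  return items.map((item) => {
    const record = {};
    for (const attr of attributes) {
      record[attr] = item[attr];
    }
    return record;
  });
}

// Example: a spider configured to extract only `title` and `price`.
const items = [
  { title: 'Widget A', price: '9.99', internalId: 1 },
  { title: 'Widget B', price: '19.99', internalId: 2 },
];
const result = extractAttributes(items, ['title', 'price']);
// result keeps title and price, drops internalId
```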

## Creating a Spider

A 'spider' contains all the data needed to crawl a website. It contains these fields:

  • name
  • description
  • the url where it will start crawling
  • the CSS selector of all the items in that page
  • the attributes you want to extract for each item
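For illustration, a spider definition covering the fields above might look like the following object. The field names are guesses based on the list, not the application's exact Mongoose schema:

```javascript
// Hypothetical spider definition; field names mirror the list above,
// not necessarily the app's real schema.
const spider = {
  name: 'book-store',
  description: 'Crawls every book on the demo store',
  startUrl: 'https://example.com/books',   // where crawling starts
  itemSelector: '.book-list .book',        // CSS selector for the items
  attributes: ['title', 'price', 'author'] // what to extract per item
};
```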

## Pipelines

Each spider can have a number of pipelines. These are snippets of code you add to modify the data the spider has just extracted. You can write this code directly in the frontend and save it; the object you can modify is called `data`.
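As a sketch, a pipeline body might clean up the extracted `data` object like this. Only the `data` name comes from the README; the wrapper function and how the app invokes pipelines are assumptions:

```javascript
// Hypothetical pipeline: normalizes fields on the extracted `data`
// object. The `runPipeline` wrapper is illustrative; only the `data`
// variable name is documented by the project.
function runPipeline(data) {
  // Convert a price string like "9.99" into a number.
  data.price = parseFloat(data.price);
  // Strip stray whitespace from the title.
  data.title = data.title.trim();
  return data;
}

const data = { title: '  Widget A ', price: '9.99' };
runPipeline(data);
// data is now { title: 'Widget A', price: 9.99 }
```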

## Gallery

### Homepage

*(screenshot)*

### Spiders

*(screenshot)*

### Creating a new spider

*(screenshot)*

### Pipelines

*(screenshot)*

### Creating a new pipeline

*(screenshot)*