This is a Node.js library for parsing large files line by line without crashing your system.
It is useful when you want to parse a very large file, for example a big log file whose lines you insert into Elasticsearch for analysis, or when you want to run your own algorithm over the data line by line while parsing.
The library can also be used for data imports from large feed files.
Usage is as shown in the example below.
- Include the library:
const bigFileParser = require('../lib/bigFileParser');
- Create a new instance for the file you wish to parse, passing the full file path including the filename:
let myParser = new bigFileParser(filePathWithName);
- Start parsing by calling the `parse` function on the parser instance:
myParser.parse();
- You can listen to the `line` event, which is emitted as soon as a line is read from the file, along with the data that was read (a complete example putting these steps together follows the list):
myParser.on('line', (line) => console.log("Line received :"+ line));
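Putting the steps above together, here is a minimal end-to-end sketch. The file path used is just a placeholder; substitute the file you actually want to parse.

```js
const bigFileParser = require('../lib/bigFileParser');

// Placeholder path; use the full path of the file you want to parse.
const filePathWithName = '/var/log/app.log';

const myParser = new bigFileParser(filePathWithName);

// Receive each line as it is read from the file.
myParser.on('line', (line) => console.log('Line received: ' + line));

// Start parsing; 'line' events are emitted as the file is read.
myParser.parse();
```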
For every instance, when you start parsing by calling the `parse` function, the library checks the number of CPU cores available on the machine and the size of the file to be parsed.
It then forks as many Node processes as there are CPU cores, each with start and end markers indicating where that process should start and stop reading the file.
Each process creates a separate read stream and reads the file according to the markers it was given. For every line read it emits a `line` event with the data that was read, so you need to assign a listener for the `line` event on the library instance to receive the data read from the file.
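To make the chunking idea concrete, here is a rough sketch of how a file can be divided into one byte range per CPU core. This is not the library's internal code, and the helper name `computeByteRanges` is invented for illustration.

```js
const fs = require('fs');
const os = require('os');

// Sketch only: split a file into one byte range per CPU core, so that each
// forked process knows where to start and stop reading.
function computeByteRanges(filePath) {
  const fileSize = fs.statSync(filePath).size; // size of the file in bytes
  const cores = os.cpus().length;              // one range per CPU core
  const rangeSize = Math.ceil(fileSize / cores);

  const ranges = [];
  for (let start = 0; start < fileSize; start += rangeSize) {
    // 'end' is an inclusive byte offset, matching fs.createReadStream's options.
    ranges.push({ start, end: Math.min(start + rangeSize - 1, fileSize - 1) });
  }
  return ranges;
}

// Each forked process could then read only its slice of the file, for example:
//   fs.createReadStream(filePath, { start: range.start, end: range.end })
```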
The library internally joins lines when the end marker passed to a process does not fall exactly on a line ending.
You do not need to know the length of the lines in the file; it currently treats `\n`, `\r\n`, and `\r` as line breaks.
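The boundary handling can be pictured with a small sketch. Again, this is not the library's code; it only illustrates splitting a chunk of data on `\n`, `\r\n`, or `\r` while carrying an incomplete last line over to the next chunk.

```js
// Sketch only: split one chunk of text into complete lines.
// 'carry' is the partial line left over from the previous chunk; the
// incomplete last line of this chunk is returned as 'rest' so it can be
// joined with the data that follows it.
function splitLines(carry, chunk) {
  const pieces = (carry + chunk).split(/\r\n|\n|\r/); // \n, \r\n, or \r end a line
  const rest = pieces.pop();                          // possibly incomplete last line
  return { lines: pieces, rest };
}

// Example:
//   splitLines('', 'foo\r\nbar\nbaz')  ->  { lines: ['foo', 'bar'], rest: 'baz' }
```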
Planned improvements:

- Package the library and publish it on npm.
- Profile the library and add statistics to README.md.
- Make the end-of-line characters, currently assumed to be `\n`, `\r\n`, or `\r`, configurable.
Please suggest features and raise bugs as you find them (:smirk:), and I love pull requests.