Lambda-architecture

This project builds a data pipeline based on the Lambda architecture, which handles massive quantities of data by combining batch and stream processing. The pipeline is used to analyze tweets from Twitter.
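As a rough illustration of the idea, the sketch below shows the three Lambda layers in plain Python: a batch layer that recomputes a view from the full dataset, a speed layer that updates a small real-time view as new tweets arrive, and a serving step that merges both at query time. It is not this project's code. The function and class names, the `hashtags` field, and the hashtag-count use case are all assumptions made for the example; the real pipeline runs on Spark and Cassandra.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute hashtag counts from the full, immutable dataset."""
    view = Counter()
    for tweet in master_dataset:
        view.update(tweet["hashtags"])  # hypothetical tweet field
    return view


class SpeedLayer:
    """Speed layer: count hashtags in tweets not yet covered by a batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def ingest(self, tweet):
        # Incremental update, one tweet at a time.
        self.realtime_view.update(tweet["hashtags"])

    def reset(self):
        # Call after a batch run has absorbed the recent tweets.
        self.realtime_view.clear()


def query(batch_view, realtime_view, hashtag):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view[hashtag] + realtime_view[hashtag]


# Hypothetical sample data, for illustration only.
historical = [{"hashtags": ["spark", "bigdata"]}, {"hashtags": ["spark"]}]
batch_view = build_batch_view(historical)

speed = SpeedLayer()
speed.ingest({"hashtags": ["spark", "kafka"]})

print(query(batch_view, speed.realtime_view, "spark"))  # 3
```

The batch view is accurate but only as fresh as the last batch run, while the real-time view covers the gap since that run. Merging the two is what lets a query see both historical and recent data.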

Prerequisites

Python 3.*
Apache Spark 3.2.*
A Twitter API account

Setup

config.ini file
- Rename config.template.ini to config.ini
- Adjust the basic values in config.ini
logs folder
- Grant full permissions: sudo chmod a+rwx src/logs
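To check that config.ini has been filled in before starting the pipeline, a small script along these lines may help. It is only a sketch: the `twitter` section and `bearer_token` key are hypothetical placeholders, so substitute the section and key names that actually appear in config.template.ini.

```python
import configparser


def load_config(path="config.ini"):
    """Read an INI file and fail loudly if it does not exist."""
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(
            f"{path} not found; rename config.template.ini to config.ini first"
        )
    return parser


def get_setting(parser, section, key):
    """Return a non-empty setting, or raise a descriptive error."""
    if not parser.has_option(section, key):
        raise KeyError(f"missing [{section}] {key}")
    value = parser.get(section, key).strip()
    if not value:
        raise ValueError(f"[{section}] {key} is empty")
    return value


# Demo on an in-memory sample; the section and key names are assumptions.
sample = configparser.ConfigParser()
sample.read_string("[twitter]\nbearer_token = abc123\n")
print(get_setting(sample, "twitter", "bearer_token"))  # abc123
```

In real use, call `load_config()` from the repository root and pass the result to `get_setting` for each value the pipeline needs.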

Usage

Clone repository

  git clone

Run Docker containers

  make start-docker

Setup virtual env for project

  make setup-env

Run project

  make start-all

Analyze

  Open the notebook to analyze the data

Common Error

If the twitter keyspace is not found, run the cassandra-init-schema container again.
