Skip to content

Latest commit

 

History

History
48 lines (36 loc) · 889 Bytes

README.md

File metadata and controls

48 lines (36 loc) · 889 Bytes

Lambda-architecture

In this project, we are trying to build data pipeline using Lambda architecture to handle massive quantities of data by taking advantage of both batch and stream processing methods. Besides, we also analyze Twitter's tweets.

Prerequisite

  • Python 3.*
  • Apache Spark 3.2.*
  • Account for Twitter API

Setup

  1. Config.ini file
    • Change config.template.ini to config.ini
    • Adjust some basic value in config.ini
  2. logs folder
    • Grant full permission : sudo chmod a+rwx src/logs

Usage

  1. Clone repository
  git clone 
  1. Run Docker containers
  make start-docker
  1. Setup virtual env for project
  make setup-env
  1. Run project
   make start-all
  1. Analyze
  Go to notebook for analyzing

Common Error

  1. If not find twitter keyspace, run container cassandra-init-schema again