Creating-Streaming-Data-Pipeline-Using-Kafka

Introduction

In this project, we aim to reduce traffic congestion by analyzing road traffic data from several toll gates. As a vehicle passes a toll, its data (vehicle_id, vehicle_type, toll_plaza_id and timestamp) is streamed to Kafka.
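A minimal sketch of one such toll record and its JSON encoding for Kafka. The four field names come from the description above. The field types, the ISO-8601 timestamp format and the helper names `serialize`/`deserialize` are assumptions for illustration, not taken from the project's notebooks.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TollEvent:
    """One vehicle passing one toll plaza (field types are assumed)."""
    vehicle_id: int
    vehicle_type: str
    toll_plaza_id: int
    timestamp: str  # assumed ISO-8601 string, e.g. "2024-01-01T08:00:00+00:00"


def serialize(event: TollEvent) -> bytes:
    """Encode an event as UTF-8 JSON bytes, the form a Kafka producer sends."""
    return json.dumps(asdict(event)).encode("utf-8")


def deserialize(payload: bytes) -> TollEvent:
    """Decode bytes received from Kafka back into a TollEvent."""
    return TollEvent(**json.loads(payload.decode("utf-8")))


if __name__ == "__main__":
    event = TollEvent(
        vehicle_id=4821,
        vehicle_type="car",
        toll_plaza_id=3,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    payload = serialize(event)
    print(payload)
    # The round trip should give back an identical event.
    assert deserialize(payload) == event
```

JSON keeps each message self-describing, which also suits the later Glue crawler and Athena steps.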

Our job is to create an end-to-end data pipeline that collects the streaming data with Kafka and loads it into an S3 bucket for further analysis with Athena.
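One possible way to move records from Kafka into S3 is sketched below. It assumes the `kafka-python` and `boto3` packages, a topic named `toll`, and AWS credentials already configured on the EC2 instance. The batching, the newline-delimited JSON layout (one object per line, which Athena's JSON SerDe expects) and the Hive-style `dt=` key prefix are design choices for this sketch. The original `toll_traffic_consumer.ipynb` may write records differently. Helper names are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, List


def to_json_lines(records: Iterable[dict]) -> bytes:
    """Join records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")


def object_key(prefix: str, batch_no: int, now: datetime) -> str:
    """Build a date-partitioned key, e.g. toll/dt=2024-01-01/batch_083000_00007.json.
    The time of day in the name avoids overwriting objects after a restart."""
    return f"{prefix}/dt={now:%Y-%m-%d}/batch_{now:%H%M%S}_{batch_no:05d}.json"


def consume_to_s3(bootstrap: str, bucket: str, batch_size: int = 100) -> None:
    """Read the assumed 'toll' topic and upload every `batch_size` records to S3.
    Assumes kafka-python and boto3 are installed and AWS credentials are set."""
    import boto3
    from kafka import KafkaConsumer  # imported here so the helpers work without it

    consumer = KafkaConsumer(
        "toll",
        bootstrap_servers=bootstrap,
        auto_offset_reset="earliest",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    s3 = boto3.client("s3")
    batch: List[dict] = []
    batch_no = 0
    for message in consumer:  # blocks and yields records as they arrive
        batch.append(message.value)
        if len(batch) >= batch_size:
            key = object_key("toll", batch_no, datetime.now(timezone.utc))
            s3.put_object(Bucket=bucket, Key=key, Body=to_json_lines(batch))
            batch.clear()
            batch_no += 1
```

Batching trades a little latency for far fewer S3 objects. The `dt=` prefix lets a Glue crawler register the date as a partition column.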

Objectives:

In this project, we create a streaming data pipeline with the following steps:

Install Kafka and Java on an EC2 instance
Start the ZooKeeper server
Start the Kafka server
Create a topic named toll in Kafka
Create a streaming data generator program, toll_traffic_generator_producer.ipynb
Produce the toll data to the topic
Consume the toll data using toll_traffic_consumer.ipynb
Load the data into S3
Create a Glue crawler that reads the data from S3 and creates a table
Analyse the data using Athena
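For the final analysis step, a minimal sketch of running an Athena query from Python with `boto3` is shown below. The database and table names `toll_db.toll` are placeholders (a Glue crawler usually names the table after the S3 folder it crawls). The query itself, counting vehicles per toll plaza, is an illustrative example, not one taken from the project. The in-memory `busiest_plazas_local` helper performs the same aggregation on a small sample, which can be used to sanity-check Athena's result.

```python
import time
from collections import Counter
from typing import Iterable, List, Tuple

# Placeholder names: adjust to the database/table the Glue crawler created.
BUSIEST_PLAZAS_SQL = """
SELECT toll_plaza_id, COUNT(*) AS vehicles
FROM toll_db.toll
GROUP BY toll_plaza_id
ORDER BY vehicles DESC
"""


def busiest_plazas_local(records: Iterable[dict]) -> List[Tuple[int, int]]:
    """Same aggregation as BUSIEST_PLAZAS_SQL, computed in memory."""
    return Counter(r["toll_plaza_id"] for r in records).most_common()


def run_athena_query(sql: str, output_s3: str) -> str:
    """Start a query, poll until it finishes, and return its final state.
    Assumes boto3 is installed and AWS credentials/region are configured;
    output_s3 is an S3 URI (e.g. s3://bucket/athena-results/) for results."""
    import boto3  # imported here so the local helper works without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(1)  # QUEUED or RUNNING: wait before polling again
```

Once the state is `SUCCEEDED`, the rows can be fetched with the Athena client's `get_query_results` call or read directly from the output location.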

Architecture

Technology Used

Programming Language - Python
Amazon Web Service (AWS)
1. S3 (Simple Storage Service)
2. Athena
3. Glue Crawler
4. Glue Catalog
5. EC2
Apache Kafka

Dataset Used

We generate the data with a real-time traffic simulator written in Python.
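A sketch of what such a simulator could look like, assuming the `kafka-python` package and a topic named `toll`. The vehicle types, plaza IDs, ID range and random delay between vehicles are all hypothetical simulation parameters. The project's `toll_traffic_generator_producer.ipynb` may generate its records differently.

```python
import json
import random
import time
from datetime import datetime, timezone

# Hypothetical simulation parameters, not taken from the original notebook.
VEHICLE_TYPES = ("car", "car", "car", "truck", "bus", "motorcycle")  # cars 3x as likely
TOLL_PLAZAS = (1001, 1002, 1003, 1004)


def make_event(rng: random.Random) -> dict:
    """Simulate one vehicle passing a randomly chosen toll plaza."""
    return {
        "vehicle_id": rng.randint(10000, 99999),
        "vehicle_type": rng.choice(VEHICLE_TYPES),
        "toll_plaza_id": rng.choice(TOLL_PLAZAS),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def stream_events(bootstrap: str, count: int, max_gap_s: float = 1.0) -> None:
    """Send `count` simulated events to the 'toll' topic, pausing a random
    0..max_gap_s seconds between vehicles. Assumes kafka-python is installed."""
    from kafka import KafkaProducer  # imported here so make_event works without it

    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    rng = random.Random()
    for _ in range(count):
        producer.send("toll", value=make_event(rng))  # asynchronous send
        time.sleep(rng.uniform(0, max_gap_s))
    producer.flush()  # block until all buffered messages are delivered
```

Passing a seeded `random.Random` to `make_event` makes the simulated stream reproducible, which helps when testing the consumer.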

vekr1518/Creating-Streaming-Data-Pipeline-Using-Kafka
