This project simulates real-time accident data and builds a pipeline that helps us analyze it and take quick action using AWS Kinesis, Apache Flink, Grafana, and Amazon SNS. The services used in this project are highly scalable, on-demand, and cost-effective.
This project uses the US Accidents dataset, which includes fields such as the following:
- Severity
- Start_Time
- End_Time
- Location
- Description
- City
- Country
- Raw layer - Raw data is streamed directly and serves as the single source of truth (SSOT).
- Analytical layer - This layer stores raw data that has been filtered and transformed with Apache Flink in the lakehouse, for future reporting, BI applications, and data science applications built on top of it.
- Real-time layer - This layer serves the cases above a specific severity threshold, which are reported via a Grafana dashboard. Alerts can be created easily in Grafana from the metrics and the resulting graphs, and notifications are sent quickly to the concerned departments.
- SQL, Python3
- AWS S3
- AWS Glue
- AWS Athena
- AWS Cloud9
- Apache Flink
- Amazon Kinesis
- Amazon SNS
- AWS Lambda
- Amazon CloudWatch
- Grafana
- Apache Zeppelin
- Create an S3 bucket
- Upload the US Accidents dataset to S3 through the AWS CLI or the console
- Create two data streams in Kinesis Data Streams: one for raw data capture and another for processed/aggregated data (a setup sketch follows below)
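A minimal sketch of this setup with boto3 might look like the following; the region, bucket name, dataset path, and stream names are placeholder assumptions, not the project's exact values:

```python
# Setup sketch: S3 bucket, dataset upload, and the two Kinesis data streams.
# All names below are assumptions -- adjust them to your environment.
import boto3

REGION = "us-east-1"                      # assumption: deployment region
BUCKET = "us-accidents-pipeline-raw"      # assumption: bucket name
DATASET = "US_Accidents.csv"              # assumption: local dataset file

s3 = boto3.client("s3", region_name=REGION)
kinesis = boto3.client("kinesis", region_name=REGION)

# 1. Create the S3 bucket and upload the dataset
#    (regions other than us-east-1 also need a CreateBucketConfiguration)
s3.create_bucket(Bucket=BUCKET)
s3.upload_file(DATASET, BUCKET, f"input/{DATASET}")

# 2. Create the raw and processed/aggregated data streams
for stream in ("accidents-raw-stream", "accidents-processed-stream"):
    kinesis.create_stream(StreamName=stream, ShardCount=1)
    kinesis.get_waiter("stream_exists").wait(StreamName=stream)
```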
- Set up a Cloud9 environment and run the simulation code, which will (see the sketch after these sub-steps):
- Read the file from S3
- Convert each row to JSON
- Add a Transaction Timestamp (Txn_Timestamp)
- Push each record to the raw data stream created earlier
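A minimal simulation sketch, assuming the bucket, object key, and stream name from the setup above; the real simulator may batch records or pace them differently:

```python
# Simulation sketch: replay the dataset from S3 into the raw Kinesis stream.
import csv
import io
import json
import time
from datetime import datetime, timezone

import boto3

BUCKET = "us-accidents-pipeline-raw"      # assumption: bucket created earlier
KEY = "input/US_Accidents.csv"            # assumption: dataset key in S3
RAW_STREAM = "accidents-raw-stream"       # assumption: raw data stream name

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")

# Read the file from S3 (loaded fully into memory for simplicity)
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode("utf-8")

for row in csv.DictReader(io.StringIO(body)):
    # Add a transaction timestamp, then convert the row to JSON
    row["Txn_Timestamp"] = datetime.now(timezone.utc).isoformat()
    kinesis.put_record(
        StreamName=RAW_STREAM,
        Data=json.dumps(row).encode("utf-8"),
        PartitionKey=row.get("City") or "unknown",
    )
    time.sleep(0.01)  # pace the records to mimic a live feed
```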
- Create a Flink Streaming Application
- Execute the SQL code (a sketch appears after these steps)
- This code will create tables in the database and insert records
- Build and deploy the Flink application in Kinesis Data Analytics (KDA)
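A rough sketch of what that SQL could look like (for example in a Zeppelin note behind the KDA application); the schema, stream names, region, and the severity threshold of 3 are assumptions, not the project's exact code:

```sql
-- Source table backed by the raw Kinesis data stream
CREATE TABLE raw_accidents (
    Severity      INT,
    Start_Time    STRING,
    End_Time      STRING,
    Description   STRING,
    City          STRING,
    Country       STRING,
    Txn_Timestamp STRING
) WITH (
    'connector' = 'kinesis',
    'stream' = 'accidents-raw-stream',
    'aws.region' = 'us-east-1',
    'scan.stream.initpos' = 'LATEST',
    'format' = 'json'
);

-- Sink table backed by the processed/aggregated data stream
CREATE TABLE processed_accidents (
    Severity      INT,
    City          STRING,
    Description   STRING,
    Txn_Timestamp STRING
) WITH (
    'connector' = 'kinesis',
    'stream' = 'accidents-processed-stream',
    'aws.region' = 'us-east-1',
    'format' = 'json'
);

-- Keep only the higher-severity accidents for the real-time layer
INSERT INTO processed_accidents
SELECT Severity, City, Description, Txn_Timestamp
FROM raw_accidents
WHERE Severity >= 3;
```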
- Execute the Lambda Function
- The Lambda function reads data from the data stream, processes each record, and publishes metrics to CloudWatch
- An SNS alert is also configured in the Lambda function to notify the concerned departments (see the sketch below)
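A minimal Lambda sketch along these lines, assuming a custom metric namespace, a severity threshold of 3, and an SNS topic ARN supplied through an environment variable:

```python
# Lambda sketch: read Kinesis records, publish a CloudWatch metric, alert via SNS.
import base64
import json
import os

import boto3

cloudwatch = boto3.client("cloudwatch")
sns = boto3.client("sns")

SEVERITY_THRESHOLD = 3                          # assumption: alerting threshold
TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]       # assumption: set as a Lambda env var


def lambda_handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded
        accident = json.loads(base64.b64decode(record["kinesis"]["data"]))
        severity = int(accident.get("Severity", 0))

        # Publish a custom metric so CloudWatch/Grafana can graph severities
        cloudwatch.put_metric_data(
            Namespace="AccidentsPipeline",      # assumption: metric namespace
            MetricData=[{
                "MetricName": "AccidentSeverity",
                "Dimensions": [{"Name": "City", "Value": accident.get("City", "unknown")}],
                "Value": severity,
            }],
        )

        # Notify the concerned department when severity crosses the threshold
        if severity >= SEVERITY_THRESHOLD:
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="High-severity accident reported",
                Message=json.dumps(accident),
            )
    return {"processed": len(event["Records"])}
```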
- Create CloudWatch Metrics
- Create Grafana Dashboards for visualizing the data
- Create a Kinesis Data Firehose delivery stream that reads the data from the data stream and stores it in S3 (NOTE: This step is required for data retention. KDA is the main player for processing the data, but if KDA fails for unforeseen reasons the data will be lost, because the Kinesis data stream keeps it for only 1 day. To avoid this single point of failure (SPOF), we create a Kinesis Data Firehose delivery stream to save a copy of the data back to S3)
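A minimal sketch of that delivery stream with boto3; the ARNs, names, and buffering settings are placeholders, and the IAM role must already allow Firehose to read the stream and write to the bucket:

```python
# Retention sketch: Firehose delivery stream from the raw data stream to S3.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="accidents-raw-to-s3",                 # assumption
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/accidents-raw-stream",  # assumption
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",                        # assumption
    },
    ExtendedS3DestinationConfiguration={
        "BucketARN": "arn:aws:s3:::us-accidents-pipeline-raw", # assumption
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",                        # assumption
        "Prefix": "retention/",
        # Buffer up to 5 MiB or 60 seconds before each S3 write
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 60},
    },
)
```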