Bigdata-Processing-AWS

Introduction

In this project, we process a dataset from Kaggle (Amazon Book Ratings.csv) on the AWS EMR framework. The dataset is large: about 3 GB, with 1 million records.
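Before moving a file this large into S3, it helps to inspect just the header and the first record rather than reading the whole file into memory. The snippet below is a sketch using a tiny in-memory stand-in; the column names are assumptions for illustration, not the dataset's actual schema:

```python
import csv
import io

# Tiny stand-in for the 3 GB Amazon Book Ratings.csv; the column names
# here are assumptions, not the dataset's actual schema.
sample = io.StringIO(
    "user_id,book_title,rating\n"
    "A1,Example Book,4.0\n"
    "A2,Another Book,2.5\n"
)

reader = csv.reader(sample)
header = next(reader)     # read only the header row
first_row = next(reader)  # and the first data row

print(header)     # ['user_id', 'book_title', 'rating']
print(first_row)  # ['A1', 'Example Book', '4.0']
```

Against the real file, opening it with `open("Amazon Book Ratings.csv")` and reading two rows the same way costs almost nothing, since nothing beyond those rows is loaded.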

Architecture

(Architecture diagram)

Implementation Steps

  • Load the source data from Kaggle into an S3 bucket
  • Create an EMR cluster backed by EC2 instances
  • Create an EC2 key pair and connect to the cluster over SSH from your terminal
  • Create a script, Amazon_Book_Review.py
  • Execute the script with spark-submit
  • After processing, the filtered data is written back to the S3 bucket
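The repository's Amazon_Book_Review.py is not shown here, so the sketch below only illustrates the kind of filtering such a script might perform, written in plain Python so it runs anywhere; on EMR the same logic would use PySpark DataFrames. The column names and the rating threshold are assumptions:

```python
import csv
import io

# Stand-in for the Kaggle CSV stored in S3; schema and values are assumptions.
raw = io.StringIO(
    "user_id,book_title,rating\n"
    "A1,Example Book,5.0\n"
    "A2,Another Book,2.0\n"
    "A3,Third Book,4.0\n"
)

THRESHOLD = 4.0  # hypothetical filter: keep only highly rated books

reader = csv.DictReader(raw)
filtered = [row for row in reader if float(row["rating"]) >= THRESHOLD]

for row in filtered:
    print(row["book_title"], row["rating"])
# Example Book 5.0
# Third Book 4.0
```

In the PySpark version, the list comprehension becomes a `filter` on a DataFrame read from `s3://...`, and the result is written back to S3 with the DataFrame writer instead of being printed.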

With the AWS Elastic MapReduce (EMR) framework, we can process big data without installing Spark or Hadoop ourselves.
