In this project we will analyse over some data available from Covid Dataset
To analyse the daily covid data received from GOI website and show that by transforming in the below format:
Date | State | Confirmed | Recovered | Deceased | Total tested |
---|---|---|---|---|---|
Technological Stacks Implemented: Hadoop, Hive and Spark
Data Pipeline:
Mapping: Below table shows us the mapping of the data and different files are used to decide where to get the data of each column,
Report Fields | Source file | Source field | Rule |
---|---|---|---|
Date | Raw Data 25,26,27,28 | Date Announced | Directly |
State | Raw Data 25,26,27,28 | Detected State | Directly |
Confirmed | Raw Data 25,26,27,28 | Num Cases | Aggregated on state |
Recovered | Raw Data 25,26,27,28 | Num Cases | Aggregated on state |
Deceased | Raw Data 25,26,27,28 | Num Cases | Aggregated on state |
Total tested | Raw Data 25,26,27,28 | Total Tested | Convert cumulative to daily count |