COVID19 Data Analysis using Python

Kaggle Link - https://www.kaggle.com/code/rasikasrimal/covid-19-data-analysis

My other Covid-19 Data Visualization Project (advanced) - https://github.com/rasikasrimal/covid-19-data-visualization
Kaggle Link - https://www.kaggle.com/code/rasikasrimal/covid-19-data-visualization

This project is a hands-on exploration of COVID-19 data analysis in Python. It covers data preparation, analysis, and visualization, and examines how each country's maximum infection rate correlates with World Happiness Report indicators such as GDP per capita and social support.

Project Overview

Course Objectives: Learn data preparation, exploration, and visualization techniques.
Datasets Used: COVID-19 dataset (Johns Hopkins University) and World Happiness Report dataset.
Project Structure: Divided into tasks covering dataset import, measure calculation, and result visualization.

Getting Started

Clone the repository: git clone https://github.com/rasikasrimal/Covid19DataAnalysisUsingPython.git
Navigate to the project directory: cd Covid19DataAnalysisUsingPython
Explore the project files and tasks in the respective folders.

Requirements

Python 3
Libraries: pandas, matplotlib, seaborn

In Detail Overview Of The Project

1. Importing the COVID-19 dataset

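import pandas as pd
import numpy as np                # used later for the log transform
import matplotlib.pyplot as plt   # used later for plot labels and legends
import seaborn as sns             # used later for scatter and regression plots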

corona_dataset_csv = pd.read_csv("Datasets/covid19_Confirmed_dataset.csv")
corona_dataset_csv.head(10)

2. Aggregating the rows by country

corona_dataset_aggregated = corona_dataset_csv.groupby("Country/Region").sum()
corona_dataset_aggregated.head()
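The aggregation step can be tried on a small made-up frame. This is a minimal sketch, assuming the same column layout as the confirmed-cases file; the rows and counts are invented for illustration. It keeps only the country name and the date columns before summing, which is one possible variation on the step above rather than the notebook's exact code.

```python
import pandas as pd

# Hypothetical toy rows mimicking the layout of the confirmed-cases file.
toy = pd.DataFrame(
    {
        "Province/State": ["Region A", "Region B", None],
        "Country/Region": ["Australia", "Australia", "Albania"],
        "Lat": [-35.0, -33.0, 41.0],
        "Long": [149.0, 151.0, 20.0],
        "1/22/20": [0, 1, 0],
        "1/23/20": [2, 3, 1],
    }
)

# Keep the country name and the date columns, then sum the counts per country.
keep_cols = [c for c in toy.columns if c not in ("Province/State", "Lat", "Long")]
toy_aggregated = toy[keep_cols].groupby("Country/Region").sum()
print(toy_aggregated)
```

With only date columns left, each row of the result is a single cumulative time series per country.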

3. Visualizing data for individual countries, for example China and Italy

corona_dataset_aggregated.loc["China"][1:].plot(label='China')  
corona_dataset_aggregated.loc["Italy"][1:].plot(label='Italy')

plt.legend()
plt.title('COVID-19 Cases in China and Italy')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.grid(True)

plt.show()

4. Calculating a good measure

corona_dataset_aggregated.loc['China'][1:].plot()

corona_dataset_aggregated.loc['Italy'][1:].plot()

5. Calculating the first derivative of the curve

corona_dataset_aggregated = corona_dataset_aggregated.apply(pd.to_numeric, errors='coerce')

# Forward-fill missing values (fillna(method='ffill') is deprecated in recent pandas)
corona_dataset_aggregated.ffill(inplace=True)
corona_dataset_aggregated.loc['China'].diff().plot()


plt.xlabel('Date')
plt.ylabel('Difference')
plt.title('Daily Change in COVID-19 Cases for China')

plt.grid(True)
plt.show()
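To see what the first derivative means here, the sketch below applies diff() to a short cumulative series. The numbers are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical cumulative case counts for one country (made-up numbers).
cumulative = pd.Series(
    [0, 2, 5, 5, 9],
    index=["1/22/20", "1/23/20", "1/24/20", "1/25/20", "1/26/20"],
)

# diff() subtracts the previous day's total from each day's total,
# so the first entry has no predecessor and becomes NaN.
daily_new = cumulative.diff()
print(daily_new)
```

Each value is the number of new cases reported on that day, and the largest of these values is the quantity used as the infection-rate measure in the next step.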

6. Maximum infection rate for all countries

Dataset preview (first 10 rows of the raw file):

Province/State	Country/Region	Lat	Long	...	4/29/20	4/30/20
NaN	Afghanistan	33.0000	65.0000	...	1828	1939
NaN	Albania	41.1533	20.1683	...	750	766
NaN	Algeria	28.0339	1.6596	...	3649	3848
NaN	Andorra	42.5063	1.5218	...	743	743
NaN	Angola	-11.2027	17.8739	...	27	27
NaN	Antigua and Barbuda	17.0608	-61.7964	...	24	24
NaN	Argentina	-38.4161	-63.6167	...	4127	4285
NaN	Armenia	40.0691	45.0382	...	1867	1932
Australian Capital Territory	Australia	-35.4735	149.0124	...	106	106
New South Wales	Australia	-33.8688	151.2093	...	3016	3016

countries = list(corona_dataset_aggregated.index)
max_infection_rates = []

for c in countries:
    max_infection_rates.append(corona_dataset_aggregated.loc[c].diff().max())
corona_dataset_aggregated["max_infection_rate"] = max_infection_rates
corona_dataset_aggregated.head()

Code Execution Results:

Province/State	...	4/29/20	4/30/20	max_infection_rate
Afghanistan	...	1939	2171	232.0
Albania	...	766	773	34.0
Algeria	...	3848	4006	199.0
Andorra	...	743	745	43.0
Angola	...	27	27	5.0
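The per-country loop can also be written as a single vectorized expression. This is a sketch on made-up numbers, and it assumes the frame holds only date columns; if columns such as Lat or Long are still present, they would enter the row-wise difference as well.

```python
import pandas as pd

# Hypothetical per-country cumulative counts (dates as columns), made-up numbers.
toy_aggregated = pd.DataFrame(
    {"1/22/20": [0, 1], "1/23/20": [4, 1], "1/24/20": [6, 9]},
    index=["Country A", "Country B"],
)

# Difference along the date axis, then the largest daily increase per row.
max_rate = toy_aggregated.diff(axis=1).max(axis=1)
print(max_rate)
```

The result is a Series indexed by country, so it can be assigned directly as a new column without building a list first.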

7. Creating a new dataframe with only the needed column

corona_data = pd.DataFrame(corona_dataset_aggregated["max_infection_rate"])
corona_data.head()

Country/Region	max_infection_rate
Afghanistan	232.0
Albania	34.0
Algeria	199.0
Andorra	43.0
Angola	5.0

WorldHappinessReport.csv dataset:

Importing the WorldHappinessReport.csv dataset
Selecting the columns needed for our analysis
Joining the datasets
Calculating the correlations as the result of our analysis

Importing the dataset

happiness_report_csv = pd.read_csv("Datasets/worldwide_happiness_report.csv")
happiness_report_csv.head()

Overall rank	Country or region	Score	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices	Generosity	Perceptions of corruption
1	Finland	7.769	1.340	1.587	0.986	0.596	0.153	0.393
2	Denmark	7.600	1.383	1.573	0.996	0.592	0.252	0.410
3	Norway	7.554	1.488	1.582	1.028	0.603	0.271	0.341
4	Iceland	7.494	1.380	1.624	1.026	0.591	0.354	0.118
5	Netherlands	7.488	1.396	1.522	0.999	0.557	0.322	0.298

Dropping the columns not needed for the analysis:

drop_cols = ["Overall rank", "Score", "Generosity", "Perceptions of corruption"]
happiness_report_csv.drop(drop_cols, axis=1, inplace=True)
happiness_report_csv.head()

Country or region	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
Finland	1.340	1.587	0.986	0.596
Denmark	1.383	1.573	0.996	0.592
Norway	1.488	1.582	1.028	0.603
Iceland	1.380	1.624	1.026	0.591
Netherlands	1.396	1.522	0.999	0.557

Setting the country name as the index of the dataframe:

happiness_report_csv.set_index("Country or region", inplace=True)
happiness_report_csv.head()

Corona dataset:

Country/Region	max_infection_rate
Afghanistan	232.0
Albania	34.0
Algeria	199.0
Andorra	43.0
Angola	5.0

World Happiness Report dataset:

Country or region	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
Finland	1.340	1.587	0.986	0.596
Denmark	1.383	1.573	0.996	0.592
Norway	1.488	1.582	1.028	0.603
Iceland	1.380	1.624	1.026	0.591
Netherlands	1.396	1.522	0.999	0.557

# Inner join on the country index: keeps only countries present in both datasets
data = corona_data.join(happiness_report_csv, how="inner")
data.head()

Country	max_infection_rate	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
Afghanistan	232.0	0.350	0.517	0.361	0.000
Albania	34.0	0.947	0.848	0.874	0.383
Algeria	199.0	1.002	1.160	0.785	0.086
Argentina	291.0	1.092	1.432	0.881	0.471
Armenia	134.0	0.850	1.055	0.815	0.283
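An inner join drops any country whose name does not appear in both indexes, which is easy to miss. The sketch below uses made-up frames and placeholder country names to show the behavior and one way to list the labels that were dropped.

```python
import pandas as pd

# Hypothetical frames indexed by country name (all values made up).
cases = pd.DataFrame(
    {"max_infection_rate": [10.0, 20.0, 30.0]},
    index=["Country A", "Country B", "Country C"],
)
happiness = pd.DataFrame(
    {"GDP per capita": [1.1, 0.9]},
    index=["Country A", "Country C"],
)

# An inner join keeps only the index labels present in both frames.
joined = cases.join(happiness, how="inner")

# Labels that exist on one side only are dropped silently; list them to check.
only_in_cases = cases.index.difference(happiness.index)
print(joined)
print(list(only_in_cases))
```

Running the same difference check on the real indexes would show whether any countries are lost because the two datasets spell their names differently.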

Correlation matrix:

data.corr()

	max_infection_rate	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
max_infection_rate	1.000000	0.250118	0.191958	0.289263	0.078196
GDP per capita	0.250118	1.000000	0.759468	0.863062	0.394603
Social support	0.191958	0.759468	1.000000	0.765286	0.456246
Healthy life expectancy	0.289263	0.863062	0.765286	1.000000	0.427892
Freedom to make life choices	0.078196	0.394603	0.456246	0.427892	1.000000
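Each entry of the matrix is a Pearson correlation coefficient, which is the default method of corr(). As a sketch on made-up numbers, the coefficient can be computed by hand and compared with the pandas result.

```python
import numpy as np
import pandas as pd

# Hypothetical values for two columns (made up for illustration).
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 5.0, 9.0]})

# Pearson r: sum of the products of deviations from the mean, divided by
# the square root of the product of the summed squared deviations.
x_dev = toy["x"] - toy["x"].mean()
y_dev = toy["y"] - toy["y"].mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev**2).sum() * (y_dev**2).sum())

# The same number appears off the diagonal of the correlation matrix.
r_pandas = toy.corr().loc["x", "y"]
print(r_manual, r_pandas)
```

The coefficient lies between -1 and 1 and only measures linear association, which is one reason the plots that follow also look at the infection rate on a log scale.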

Visualization of the results

Plotting GDP per capita vs maximum infection rate:

x = data["GDP per capita"]
y = data["max_infection_rate"]
sns.scatterplot(x=x, y=y)

With the infection rate on a log scale:

x = data["GDP per capita"]
y = data["max_infection_rate"]
sns.scatterplot(x=x, y=np.log(y))

sns.regplot(x=x, y=np.log(y))
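One detail to watch with the log transform: np.log returns -inf for a value of zero. Whether the joined data contains zero rates is not stated here, so the following is only a sketch, on made-up numbers, of two common ways to guard against that case.

```python
import numpy as np
import pandas as pd

# Hypothetical maximum daily increases, including a zero (made-up numbers).
y = pd.Series([0.0, 1.0, 10.0, 100.0], index=["A", "B", "C", "D"])

# Option 1: keep only strictly positive values before taking the log.
log_positive = np.log(y[y > 0])

# Option 2: log1p computes log(1 + y), which is defined at zero.
log_shifted = np.log1p(y)
print(log_positive)
print(log_shifted)
```

Option 1 changes which countries are plotted, while option 2 keeps every country but shifts the scale slightly, so the choice depends on what the plot is meant to show.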

Plotting social support vs maximum infection rate (log scale):

x = data["Social support"]
y = data["max_infection_rate"]
sns.scatterplot(x=x, y=np.log(y))
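The plots compare one factor at a time against the log of the infection rate. As a complement, the sketch below computes the correlation of every factor with the log-transformed rate in one call. The frame is made up for illustration, and corrwith is used here as one convenient option, not as the notebook's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical joined frame (all values made up for illustration).
toy = pd.DataFrame(
    {
        "max_infection_rate": [1.0, 10.0, 100.0, 1000.0],
        "GDP per capita": [0.5, 1.0, 1.5, 2.0],
        "Social support": [1.2, 0.8, 1.5, 1.0],
    }
)

log_rate = np.log(toy["max_infection_rate"])
factors = toy.drop(columns="max_infection_rate")

# Pearson correlation of each factor column with the log-transformed rate.
log_corr = factors.corrwith(log_rate)
print(log_corr)
```

In this toy frame the GDP column was chosen to be exactly linear in the log of the rate, so its coefficient comes out as 1; real data would not be this clean.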
