This Python script analyzes and visualizes COVID-19 data from the Worldometer dataset between April 18 and May 18, 2020. It performs calculations like case fatality rates (CFR) and tests-per-positive-case ratios, then visualizes this data using various plots.
The code is designed to:
- Filter and extract COVID-19 statistics for specific countries or dates.
- Calculate important metrics such as the case fatality ratio and tests per positive case.
- Plot histograms and scatter plots to visualize trends in COVID-19 death rates and testing quality across countries.
- Highlight specific countries in the visualizations.
The dataset used in this script is a CSV file named worldometer_snapshots_April18_to_May18.csv
. The file contains:
- COVID-19 data for multiple countries.
- Daily snapshots of case numbers, deaths, tests conducted, and populations.
To run this project, you need to install the following Python libraries:
pip install numpy pandas matplotlib
worldometer_snapshots_April18_to_May18.csv
: CSV file containing COVID-19 data.covid_analysis.py
: The main script for analyzing and visualizing the data.README.md
: This file.
- Ensure that you have the dataset (
worldometer_snapshots_April18_to_May18.csv
) in the same directory as the script. - Run the Python script:
python covid_analysis.py
The script will display several visualizations, including histograms and scatter plots, in pop-up windows.
numpy
,pandas
: Used for data manipulation and numerical calculations.matplotlib
: Used for plotting graphs.datetime
: Used for handling date formatting and filtering by date.
- The dataset is loaded using
pandas.read_csv
. - The script filters data for a specific country (
USA
) and a specific date (18/05/2020
).
-
The case fatality ratio (CFR) is computed as:
Case Fatality Ratio = Total Deaths / Total Cases
-
Histograms: Plots showing the distribution of death rates across countries.
-
Scatter Plot: Shows the relationship between testing quality (number of tests per positive case) and death rates, with marker size representing the population and color representing the total number of deaths.
Specific countries are highlighted on the scatter plot for better context.
min_number_of_cases
: Used to filter out countries with fewer than 1000 cases.x_axis_limit
: Used to limit the x-axis in the scatter plot for better readability.
The script also calculates the overall death rate for countries with "good testing" (defined as having more than 50 tests per positive case).
- Histogram of Death Rates: Distribution of death rates (case fatality ratio) for all countries.
- Histogram of Greatly Affected Countries: Same as above, but limited to countries with more than 1000 cases.
- Scatter Plot: Visualizes the relationship between testing quality and death rate, highlighting certain countries.
This project is for educational purposes and is open for modification and reuse. No license specified.
This README.md
file provides an overview of the project, how to set it up, and explains what the code does step-by-step. Feel free to adjust it as needed!