DISCLAIMER - This README.md was auto-generated within JupyterLab from the main India_&_World_COVID_19.ipynb file and therefore has some cuts and glitches. I highly recommend going through the main notebook file for the complete version.
To run the project in full, extra datasets (provided as .zip) are needed. You can clone the repository into your working directory with the command:
git clone https://github.com/rawatraghav/INDIA-and-COVID-19.git
or download the repository as a zip file.
The coronavirus (COVID-19) pandemic is the greatest global humanitarian challenge the world has faced since World War II. The pandemic virus has spread widely, and the number of cases is rising daily. The government is working to slow down its spread.
To date it has spread across 215 countries, infecting 5,491,194 people and killing 346,331. In India, as many as 138,536 COVID-19 cases have been reported so far. Of these, 57,692 have recovered and 4,024 have died.
Corona Virus Explained in Simple Terms:
- Let's say Raghav got infected yesterday, but he won't know it for the next 14 days
- Raghav thinks he is healthy, but he is infecting 10 people per day
- Now these 10 people think they are completely healthy; they travel, go out and infect 100 others
- These 100 people think they are healthy, but they have already infected 1000 people
- No one knows who is healthy or who can infect you
- All you can do is be responsible and stay in quarantine
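The chain above is geometric growth: if every infected person passes the virus to 10 others, the number of new infections is multiplied by 10 at each step. The sketch below reproduces the 1, 10, 100, 1000 sequence from the example. The function name `new_infections_per_step` and the fixed factor of 10 are illustrative assumptions taken from the story, not epidemiological estimates.

```python
def new_infections_per_step(contacts_per_person, steps):
    """Return the number of newly infected people at each step of the chain.

    Assumes every infected person infects exactly `contacts_per_person`
    others and nobody is infected twice (a deliberate simplification).
    """
    return [contacts_per_person ** step for step in range(steps + 1)]


# Raghav (1) -> 10 -> 100 -> 1000, as in the bullet points above
chain = new_infections_per_step(contacts_per_person=10, steps=3)
print(chain)
print("Total infected after 3 steps:", sum(chain))
```

Real outbreaks grow more slowly than this, because contacts overlap and people isolate, but the example shows why early, unnoticed spread matters.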
India has responded quickly, implementing a proactive nationwide lockdown to flatten the curve and use the time to plan and resource responses adequately. As of 23rd May 2020, India has witnessed 3720 deaths across 32 States and Union Territories, with a total of 123202 confirmed cases of COVID-19. Globally, data scientists are using AI and machine learning to analyze, predict, and take safety measures against COVID-19, including in India.
We need to explore the COVID-19 situation in India and the world, and build a strong model that predicts how the virus could spread across India in the next 15 days.
###Steps to be achieved:
- Analyze the present condition in India
- Collect the COVID-19 data from websites
- Figure out the death rate and cure rate per 100 citizens across the affected states
- Plot charts to visualize the following:
  - Age group distribution of affected patients
  - Total sample tests done till date
  - Growth rate of COVID in top 15 states
  - Top 10 States in each health facility
  - State wise testing insights
  - ICMR testing centres in each state
- Use Facebook Prophet to predict the confirmed cases in India
- Use ARIMA time series model to predict the confirmed cases in India
- Compare the Indian COVID-19 cases on global level
# importing the required libraries
# !pip install folium
import pandas as pd
# Visualisation libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import folium
from folium import plugins
# Manipulating the default plot size
plt.rcParams['figure.figsize'] = 10, 12
# Disable warnings
import warnings
warnings.filterwarnings('ignore')
How did it start in India?
The first COVID-19 case was reported on 30th January 2020, when a student arrived in Kerala from Wuhan. In just the next 2 days, Kerala reported 2 more cases. For almost a month, no new cases were reported in India; however, on 2nd March 2020, five new cases of coronavirus were reported in Kerala again, and since then the cases have been rising, affecting 25 states so far (Bihar and Manipur being the most recent). Here is a brief timeline of the cases in India.
###COVID-19 in India - Timeline
- Sikkim on Saturday reported its first +ve COVID-19 case
- With over 6,500 fresh cases, India's COVID-19 tally rose to 1,25,101 on Saturday morning, with 3,720 fatalities
- West Bengal asks Railways not to send migrant trains to State till May 26 in view of Cyclone Amphan
- 196 new COVID 19 positive cases were reported in Karnataka on Saturday
- Complete lockdown in Bengaluru on Sunday.
- Bruhat Bengaluru Mahanagara Palike (BBMP) Commissioner B.H. Anil Kumar said the conditions and restrictions on Sunday will be similar to that under coronavirus lockdown 1.0.
- Medical resource optimization
- Ensuring demand planning stability
- Contact tracing
- Situational awareness and critical response analysis
1.1 Scraping the datasets from the official Govt. website
# for date and time operations
from datetime import datetime
# for file and folder operations
import os
# for regular expression operations
import re
# for listing files in a folder
import glob
# for getting web contents
import requests
import json
import csv
import numpy as np
# for scraping web contents
# from bs4 import BeautifulSoup
# the raw patient-level data is split across 16 numbered CSV files
raw_url = 'https://api.covid19india.org/csv/latest/raw_data{}.csv'
raw_parts = [pd.read_csv(raw_url.format(i)) for i in range(1, 17)]
# stack all parts into a single dataframe
full_data = pd.concat(raw_parts)
print(full_data.shape)
full_data.head()
(352525, 22)
| | Patient Number | State Patient Number | Date Announced | Estimated Onset Date | Age Bracket | Gender | Detected City | Detected District | Detected State | State code | ... | Contracted from which Patient (Suspected) | Nationality | Type of transmission | Status Change Date | Source_1 | Source_2 | Source_3 | Backup Notes | Num Cases | Entry_ID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | KL-TS-P1 | 30/01/2020 | NaN | 20 | F | Thrissur | Thrissur | Kerala | KL | ... | NaN | India | Imported | 14/02/2020 | https://twitter.com/vijayanpinarayi/status/122... | https://weather.com/en-IN/india/news/news/2020... | NaN | Student from Wuhan | 1.0 | NaN |
1 | 2.0 | KL-AL-P1 | 02/02/2020 | NaN | NaN | NaN | Alappuzha | Alappuzha | Kerala | KL | ... | NaN | India | Imported | 14/02/2020 | https://www.indiatoday.in/india/story/kerala-r... | https://weather.com/en-IN/india/news/news/2020... | NaN | Student from Wuhan | 1.0 | NaN |
2 | 3.0 | KL-KS-P1 | 03/02/2020 | NaN | NaN | NaN | Kasaragod | Kasaragod | Kerala | KL | ... | NaN | India | Imported | 14/02/2020 | https://www.indiatoday.in/india/story/kerala-n... | https://twitter.com/ANI/status/122422148580539... | https://weather.com/en-IN/india/news/news/2020... | Student from Wuhan | 1.0 | NaN |
3 | 4.0 | DL-P1 | 02/03/2020 | NaN | 45 | M | East Delhi (Mayur Vihar) | East Delhi | Delhi | DL | ... | NaN | India | Imported | 15/03/2020 | https://www.indiatoday.in/india/story/not-a-ja... | https://economictimes.indiatimes.com/news/poli... | NaN | Travel history to Italy and Austria | 1.0 | NaN |
4 | 5.0 | TS-P1 | 02/03/2020 | NaN | 24 | M | Hyderabad | Hyderabad | Telangana | TG | ... | NaN | India | Imported | 02/03/2020 | https://www.deccanherald.com/national/south/qu... | https://www.indiatoday.in/india/story/coronavi... | https://www.thehindu.com/news/national/coronav... | Travel history to Dubai, Singapore contact | 1.0 | NaN |
5 rows × 22 columns
day_wise = pd.read_csv('https://api.covid19india.org/csv/latest/case_time_series.csv')
day_wise.head()
| | Date | Daily Confirmed | Total Confirmed | Daily Recovered | Total Recovered | Daily Deceased | Total Deceased |
---|---|---|---|---|---|---|---|
0 | 30 January | 1 | 1 | 0 | 0 | 0 | 0 |
1 | 31 January | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 01 February | 0 | 1 | 0 | 0 | 0 | 0 |
3 | 02 February | 1 | 2 | 0 | 0 | 0 | 0 |
4 | 03 February | 1 | 3 | 0 | 0 | 0 | 0 |
state_wise = pd.read_csv('https://api.covid19india.org/csv/latest/state_wise.csv')
state_wise.head()
| | State | Confirmed | Recovered | Deaths | Active | Last_Updated_Time | Migrated_Other | State_code | Delta_Confirmed | Delta_Recovered | Delta_Deaths | State_Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Total | 6877073 | 5871898 | 106037 | 898090 | 08/10/2020 21:24:52 | 1048 | TT | 44085 | 47436 | 483 | NaN |
1 | Maharashtra | 1480489 | 1196441 | 39072 | 244527 | 07/10/2020 22:32:57 | 449 | MH | 0 | 0 | 0 | [Sep 9] :239 cases have been removed from the ... |
2 | Andhra Pradesh | 734427 | 678828 | 6086 | 49513 | 07/10/2020 17:50:55 | 0 | AP | 0 | 0 | 0 | NaN |
3 | Karnataka | 679356 | 552519 | 9675 | 117143 | 08/10/2020 21:13:54 | 19 | KA | 10704 | 9613 | 101 | NaN |
4 | Tamil Nadu | 640943 | 586454 | 10052 | 44437 | 08/10/2020 21:13:56 | 0 | TN | 5088 | 5718 | 68 | [July 22]: 444 backdated deceased entries adde... |
from datetime import date
state_wise_daily = pd.read_csv('https://api.covid19india.org/csv/latest/state_wise_daily.csv')
state_wise_daily = state_wise_daily.melt(id_vars=['Date','Status'],
value_vars=state_wise_daily.columns[2:],
var_name='State',value_name='Count')
state_wise_daily = state_wise_daily.pivot_table(index=['Date','State'],
columns=['Status'],values='Count').reset_index()
state_codes = {code:state for code, state in zip(state_wise['State_code'], state_wise['State'])}
state_codes['DD'] = 'Daman and Diu'
state_wise_daily['State_Name'] = state_wise_daily['State'].map(state_codes)
state_wise_daily
Status | Date | State | Confirmed | Deceased | Recovered | State_Name |
---|---|---|---|---|---|---|
0 | 01-Apr-20 | AN | 0 | 0 | 0 | Andaman and Nicobar Islands |
1 | 01-Apr-20 | AP | 67 | 0 | 1 | Andhra Pradesh |
2 | 01-Apr-20 | AR | 0 | 0 | 0 | Arunachal Pradesh |
3 | 01-Apr-20 | AS | 15 | 0 | 0 | Assam |
4 | 01-Apr-20 | BR | 3 | 0 | 0 | Bihar |
... | ... | ... | ... | ... | ... | ... |
8107 | 31-May-20 | TT | 8789 | 222 | 4928 | Total |
8108 | 31-May-20 | UN | 448 | 0 | 0 | State Unassigned |
8109 | 31-May-20 | UP | 374 | 4 | 192 | Uttar Pradesh |
8110 | 31-May-20 | UT | 158 | 0 | 0 | Uttarakhand |
8111 | 31-May-20 | WB | 371 | 8 | 187 | West Bengal |
8112 rows × 6 columns
# date-time information
# =====================
#saving a copy of the dataframe
df_India = state_wise.copy()
# today's date
now = datetime.now()
# store today's date as a month/day/year string in a new 'Date' column
df_India['Date'] = now.strftime("%m/%d/%Y")
# convert the 'Date' column to datetime
df_India['Date'] = pd.to_datetime(df_India['Date'], format='%m/%d/%Y')
df_India.head(36)
| | State | Confirmed | Recovered | Deaths | Active | Last_Updated_Time | Migrated_Other | State_code | Delta_Confirmed | Delta_Recovered | Delta_Deaths | State_Notes | Date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Total | 6877073 | 5871898 | 106037 | 898090 | 08/10/2020 21:24:52 | 1048 | TT | 44085 | 47436 | 483 | NaN | 2020-10-08 |
1 | Maharashtra | 1480489 | 1196441 | 39072 | 244527 | 07/10/2020 22:32:57 | 449 | MH | 0 | 0 | 0 | [Sep 9] :239 cases have been removed from the ... | 2020-10-08 |
2 | Andhra Pradesh | 734427 | 678828 | 6086 | 49513 | 07/10/2020 17:50:55 | 0 | AP | 0 | 0 | 0 | NaN | 2020-10-08 |
3 | Karnataka | 679356 | 552519 | 9675 | 117143 | 08/10/2020 21:13:54 | 19 | KA | 10704 | 9613 | 101 | NaN | 2020-10-08 |
4 | Tamil Nadu | 640943 | 586454 | 10052 | 44437 | 08/10/2020 21:13:56 | 0 | TN | 5088 | 5718 | 68 | [July 22]: 444 backdated deceased entries adde... | 2020-10-08 |
5 | Uttar Pradesh | 427459 | 378662 | 6245 | 42552 | 08/10/2020 21:21:49 | 0 | UP | 3133 | 3690 | 45 | NaN | 2020-10-08 |
6 | Delhi | 300833 | 272948 | 5643 | 22242 | 08/10/2020 21:13:59 | 0 | DL | 2726 | 2643 | 27 | [July 14]: Value for the total tests conducted... | 2020-10-08 |
7 | West Bengal | 284030 | 249737 | 5439 | 28854 | 08/10/2020 21:21:51 | 0 | WB | 3526 | 2970 | 63 | NaN | 2020-10-08 |
8 | Odisha | 244142 | 216984 | 1027 | 26131 | 08/10/2020 21:14:03 | 0 | OR | 3144 | 3312 | 16 | [July 12th] :20 non-covid deaths reported in s... | 2020-10-08 |
9 | Kerala | 258851 | 167256 | 931 | 90579 | 08/10/2020 20:37:52 | 85 | KL | 5445 | 7003 | 24 | Mahe native who expired in Kannur included in ... | 2020-10-08 |
10 | Telangana | 206644 | 179075 | 1201 | 26368 | 08/10/2020 10:44:51 | 0 | TG | 1896 | 2067 | 12 | [July 27] : Telangana bulletin for the previou... | 2020-10-08 |
11 | Bihar | 192671 | 180357 | 929 | 11384 | 08/10/2020 21:21:53 | 1 | BR | 1244 | 1006 | 2 | NaN | 2020-10-08 |
12 | Assam | 191397 | 157635 | 794 | 32965 | 08/10/2020 21:14:07 | 3 | AS | 1188 | 0 | 9 | NaN | 2020-10-08 |
13 | Gujarat | 147950 | 128023 | 3541 | 16386 | 08/10/2020 21:14:08 | 0 | GJ | 1278 | 1266 | 10 | NaN | 2020-10-08 |
14 | Rajasthan | 150467 | 127526 | 1590 | 21351 | 07/10/2020 19:54:02 | 0 | RJ | 0 | 0 | 0 | NaN | 2020-10-08 |
15 | Madhya Pradesh | 142012 | 122687 | 2547 | 16778 | 08/10/2020 21:24:54 | 0 | MP | 1705 | 2420 | 29 | NaN | 2020-10-08 |
16 | Haryana | 138582 | 126267 | 1548 | 10767 | 08/10/2020 21:21:58 | 0 | HR | 1184 | 1426 | 20 | [Aug 2]: 21 Foreign Evacuees have been merged ... | 2020-10-08 |
17 | Chhattisgarh | 131739 | 103828 | 1134 | 26777 | 07/10/2020 23:21:49 | 0 | CT | 0 | 0 | 0 | [Sep 9]:57 backdated deceased cases have been ... | 2020-10-08 |
18 | Punjab | 120868 | 107200 | 3741 | 9927 | 08/10/2020 20:37:54 | 0 | PB | 0 | 1615 | 29 | NaN | 2020-10-08 |
19 | Jharkhand | 89702 | 79176 | 767 | 9759 | 07/10/2020 22:33:14 | 0 | JH | 0 | 0 | 0 | NaN | 2020-10-08 |
20 | Jammu and Kashmir | 81793 | 69020 | 1291 | 11482 | 08/10/2020 21:14:12 | 0 | JK | 696 | 1336 | 9 | NaN | 2020-10-08 |
21 | Uttarakhand | 52959 | 43631 | 688 | 8367 | 07/10/2020 21:48:59 | 273 | UT | 0 | 0 | 0 | NaN | 2020-10-08 |
22 | Goa | 37102 | 31902 | 484 | 4716 | 08/10/2020 21:14:14 | 0 | GA | 432 | 458 | 7 | NaN | 2020-10-08 |
23 | Puducherry | 30539 | 25256 | 556 | 4727 | 08/10/2020 21:14:16 | 0 | PY | 378 | 326 | 5 | NaN | 2020-10-08 |
24 | Tripura | 27756 | 23043 | 301 | 4389 | 08/10/2020 11:38:56 | 23 | TR | 214 | 389 | 3 | [Aug 4]: Tripura bulletin for the previous day... | 2020-10-08 |
25 | Himachal Pradesh | 16565 | 13316 | 226 | 2996 | 07/10/2020 22:33:17 | 27 | HP | 0 | 0 | 0 | NaN | 2020-10-08 |
26 | Chandigarh | 12922 | 11344 | 186 | 1392 | 08/10/2020 21:14:18 | 0 | CH | 102 | 154 | 4 | NaN | 2020-10-08 |
27 | Manipur | 12489 | 9604 | 80 | 2805 | 07/10/2020 19:40:19 | 0 | MN | 0 | 0 | 0 | NaN | 2020-10-08 |
28 | Arunachal Pradesh | 11267 | 8396 | 21 | 2850 | 08/10/2020 00:47:05 | 0 | AR | 0 | 0 | 0 | [July 25]: All numbers corresponding to Papum ... | 2020-10-08 |
29 | Meghalaya | 7165 | 4694 | 60 | 2411 | 07/10/2020 21:49:06 | 0 | ML | 0 | 0 | 0 | NaN | 2020-10-08 |
30 | Nagaland | 6715 | 5450 | 12 | 1194 | 07/10/2020 19:40:22 | 59 | NL | 0 | 0 | 0 | NaN | 2020-10-08 |
31 | Ladakh | 4802 | 3511 | 63 | 1228 | 08/10/2020 01:48:06 | 0 | LA | 0 | 0 | 0 | [Sep 08] : Testing details are not available i... | 2020-10-08 |
32 | Andaman and Nicobar Islands | 3935 | 3696 | 54 | 185 | 07/10/2020 23:21:51 | 0 | AN | 0 | 0 | 0 | NaN | 2020-10-08 |
33 | Dadra and Nagar Haveli and Daman and Diu | 3118 | 2979 | 2 | 109 | 07/10/2020 20:13:04 | 28 | DN | 0 | 0 | 0 | NaN | 2020-10-08 |
34 | Sikkim | 3234 | 2534 | 51 | 568 | 07/10/2020 23:21:53 | 81 | SK | 0 | 0 | 0 | NaN | 2020-10-08 |
35 | Mizoram | 2150 | 1919 | 0 | 231 | 08/10/2020 10:41:00 | 0 | MZ | 2 | 24 | 0 | NaN | 2020-10-08 |
# latitude and longitude information
# ==================================
# latitude of the states
lat = {'Delhi':28.7041, 'Haryana':29.0588, 'Kerala':10.8505, 'Rajasthan':27.0238,
'Telengana':18.1124, 'Uttar Pradesh':26.8467, 'Ladakh':34.2996, 'Tamil Nadu':11.1271,
'Jammu and Kashmir':33.7782, 'Punjab':31.1471, 'Karnataka':15.3173, 'Maharashtra':19.7515,
'Andhra Pradesh':15.9129, 'Odisha':20.9517, 'Uttarakhand':30.0668, 'West Bengal':22.9868,
'Puducherry': 11.9416, 'Chandigarh': 30.7333, 'Chhattisgarh':21.2787, 'Gujarat': 22.2587,
'Himachal Pradesh': 31.1048, 'Madhya Pradesh': 22.9734, 'Bihar': 25.0961, 'Manipur':24.6637,
'Mizoram':23.1645, 'Goa': 15.2993, 'Andaman and Nicobar Islands': 11.7401, 'Assam' : 26.2006,
'Jharkhand': 23.6102, 'Arunachal Pradesh': 28.2180, 'Tripura': 23.9408, 'Nagaland': 26.1584,
'Meghalaya' : 25.4670, 'Dadar Nagar Haveli' : 20.1809, 'Sikkim': 27.5330}
# longitude of the states
long = {'Delhi':77.1025, 'Haryana':76.0856, 'Kerala':76.2711, 'Rajasthan':74.2179,
'Telengana':79.0193, 'Uttar Pradesh':80.9462, 'Ladakh':78.2932, 'Tamil Nadu':78.6569,
'Jammu and Kashmir':76.5762, 'Punjab':75.3412, 'Karnataka':75.7139, 'Maharashtra':75.7139,
'Andhra Pradesh':79.7400, 'Odisha':85.0985, 'Uttarakhand':79.0193, 'West Bengal':87.8550,
'Puducherry': 79.8083, 'Chandigarh': 76.7794, 'Chhattisgarh':81.8661, 'Gujarat': 71.1924,
'Himachal Pradesh': 77.1734, 'Madhya Pradesh': 78.6569, 'Bihar': 85.3131, 'Manipur':93.9063,
'Mizoram':92.9376, 'Goa': 74.1240, 'Andaman and Nicobar Islands': 92.6586, 'Assam' : 92.9376,
'Jharkhand': 85.2799, 'Arunachal Pradesh': 94.7278, 'Tripura': 91.9882, 'Nagaland': 94.5624,
'Meghalaya' : 91.3662, 'Dadar Nagar Haveli' : 73.0169, 'Sikkim': 88.5122}
# add latitude column based on the 'State' column; names without a matching key
# (e.g. 'Telangana', which is spelled 'Telengana' in the dictionaries) map to NaN
df_India['Latitude'] = df_India['State'].map(lat)
# add longitude column based on the 'State' column
df_India['Longitude'] = df_India['State'].map(long)
df_India.head()
| | State | Confirmed | Recovered | Deaths | Active | Last_Updated_Time | Migrated_Other | State_code | Delta_Confirmed | Delta_Recovered | Delta_Deaths | State_Notes | Date | Latitude | Longitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Total | 6877073 | 5871898 | 106037 | 898090 | 08/10/2020 21:24:52 | 1048 | TT | 44085 | 47436 | 483 | NaN | 2020-10-08 | NaN | NaN |
1 | Maharashtra | 1480489 | 1196441 | 39072 | 244527 | 07/10/2020 22:32:57 | 449 | MH | 0 | 0 | 0 | [Sep 9] :239 cases have been removed from the ... | 2020-10-08 | 19.7515 | 75.7139 |
2 | Andhra Pradesh | 734427 | 678828 | 6086 | 49513 | 07/10/2020 17:50:55 | 0 | AP | 0 | 0 | 0 | NaN | 2020-10-08 | 15.9129 | 79.7400 |
3 | Karnataka | 679356 | 552519 | 9675 | 117143 | 08/10/2020 21:13:54 | 19 | KA | 10704 | 9613 | 101 | NaN | 2020-10-08 | 15.3173 | 75.7139 |
4 | Tamil Nadu | 640943 | 586454 | 10052 | 44437 | 08/10/2020 21:13:56 | 0 | TN | 5088 | 5718 | 68 | [July 22]: 444 backdated deceased entries adde... | 2020-10-08 | 11.1271 | 78.6569 |
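
`Series.map` with a dictionary returns NaN for any value that is not a key, which is why the `Total` row above has no coordinates. The sketch below, in plain Python, lists the names that would be left without coordinates. The helper name `names_without_coordinates` is hypothetical; the sample keys and state names are copied from the notebook.

```python
def names_without_coordinates(state_names, coordinates):
    """Return the state names that have no entry in the coordinates dict."""
    return [name for name in state_names if name not in coordinates]


# A few keys exactly as they are spelled in the `lat` dictionary
lat_sample = {'Kerala': 10.8505, 'Telengana': 18.1124, 'Dadar Nagar Haveli': 20.1809}

# The corresponding names as they appear in the 'State' column
states_sample = ['Kerala', 'Telangana', 'Dadra and Nagar Haveli and Daman and Diu']

missing = names_without_coordinates(states_sample, lat_sample)
print(missing)
```

Any name printed here needs its dictionary key corrected or added before the states are plotted on a map.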
# rename columns
df_India = df_India.rename(columns={'Delta_Recovered' :'Cured/Discharged',
'Total Confirmed cases *': 'Confirmed',
'Total Confirmed cases ': 'Confirmed',
'Total Confirmed cases* ': 'Confirmed'})
df_India = df_India.rename(columns={'Cured/Discharged':'Delta_Cured'})
df_India = df_India.rename(columns={'State':'State/UnionTerritory'})
# raw string so that \( and \) are passed to the regex engine unchanged
df_India = df_India.rename(columns=lambda x: re.sub(r'Total Confirmed cases \(Including .* foreign Nationals\) ',
'Total Confirmed cases',x))
df_India = df_India.rename(columns={'Deaths ( more than 70% cases due to comorbidities )':'Deaths',
'Deaths**':'Deaths'})
# unique state names
df_India['State/UnionTerritory'].unique()
array(['Total', 'Maharashtra', 'Andhra Pradesh', 'Karnataka',
'Tamil Nadu', 'Uttar Pradesh', 'Delhi', 'West Bengal', 'Odisha',
'Kerala', 'Telangana', 'Bihar', 'Assam', 'Gujarat', 'Rajasthan',
'Madhya Pradesh', 'Haryana', 'Chhattisgarh', 'Punjab', 'Jharkhand',
'Jammu and Kashmir', 'Uttarakhand', 'Goa', 'Puducherry', 'Tripura',
'Himachal Pradesh', 'Chandigarh', 'Manipur', 'Arunachal Pradesh',
'Meghalaya', 'Nagaland', 'Ladakh', 'Andaman and Nicobar Islands',
'Dadra and Nagar Haveli and Daman and Diu', 'Sikkim', 'Mizoram',
'State Unassigned', 'Lakshadweep'], dtype=object)
# number of missing values
df_India.isna().sum()
State/UnionTerritory 0
Confirmed 0
Recovered 0
Deaths 0
Active 0
Last_Updated_Time 0
Migrated_Other 0
State_code 0
Delta_Confirmed 0
Delta_Cured 0
Delta_Deaths 0
State_Notes 26
Date 0
Latitude 5
Longitude 5
dtype: int64
# number of unique values
df_India.nunique()
State/UnionTerritory 38
Confirmed 37
Recovered 37
Deaths 36
Active 37
Last_Updated_Time 38
Migrated_Other 13
State_code 38
Delta_Confirmed 21
Delta_Cured 21
Delta_Deaths 19
State_Notes 12
Date 1
Latitude 33
Longitude 30
dtype: int64
# fix datatype
df_India['Date'] = pd.to_datetime(df_India['Date'])
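# rename state/UT names (assign the result back instead of using inplace on a column)
df_India['State/UnionTerritory'] = df_India['State/UnionTerritory'].replace(
{'Chattisgarh': 'Chhattisgarh', 'Pondicherry': 'Puducherry'})
# drop columns that are not needed for the analysis
df_India = df_India.drop(['Migrated_Other','State_Notes'], axis=1)
# drop the aggregate 'Total' row (index 0) so only states and UTs remain
df_India = df_India.drop([0], axis=0)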
df_India.head(36)
| | State/UnionTerritory | Confirmed | Recovered | Deaths | Active | Last_Updated_Time | State_code | Delta_Confirmed | Delta_Cured | Delta_Deaths | Date | Latitude | Longitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Maharashtra | 1480489 | 1196441 | 39072 | 244527 | 07/10/2020 22:32:57 | MH | 0 | 0 | 0 | 2020-10-08 | 19.7515 | 75.7139 |
2 | Andhra Pradesh | 734427 | 678828 | 6086 | 49513 | 07/10/2020 17:50:55 | AP | 0 | 0 | 0 | 2020-10-08 | 15.9129 | 79.7400 |
3 | Karnataka | 679356 | 552519 | 9675 | 117143 | 08/10/2020 21:13:54 | KA | 10704 | 9613 | 101 | 2020-10-08 | 15.3173 | 75.7139 |
4 | Tamil Nadu | 640943 | 586454 | 10052 | 44437 | 08/10/2020 21:13:56 | TN | 5088 | 5718 | 68 | 2020-10-08 | 11.1271 | 78.6569 |
5 | Uttar Pradesh | 427459 | 378662 | 6245 | 42552 | 08/10/2020 21:21:49 | UP | 3133 | 3690 | 45 | 2020-10-08 | 26.8467 | 80.9462 |
6 | Delhi | 300833 | 272948 | 5643 | 22242 | 08/10/2020 21:13:59 | DL | 2726 | 2643 | 27 | 2020-10-08 | 28.7041 | 77.1025 |
7 | West Bengal | 284030 | 249737 | 5439 | 28854 | 08/10/2020 21:21:51 | WB | 3526 | 2970 | 63 | 2020-10-08 | 22.9868 | 87.8550 |
8 | Odisha | 244142 | 216984 | 1027 | 26131 | 08/10/2020 21:14:03 | OR | 3144 | 3312 | 16 | 2020-10-08 | 20.9517 | 85.0985 |
9 | Kerala | 258851 | 167256 | 931 | 90579 | 08/10/2020 20:37:52 | KL | 5445 | 7003 | 24 | 2020-10-08 | 10.8505 | 76.2711 |
10 | Telangana | 206644 | 179075 | 1201 | 26368 | 08/10/2020 10:44:51 | TG | 1896 | 2067 | 12 | 2020-10-08 | NaN | NaN |
11 | Bihar | 192671 | 180357 | 929 | 11384 | 08/10/2020 21:21:53 | BR | 1244 | 1006 | 2 | 2020-10-08 | 25.0961 | 85.3131 |
12 | Assam | 191397 | 157635 | 794 | 32965 | 08/10/2020 21:14:07 | AS | 1188 | 0 | 9 | 2020-10-08 | 26.2006 | 92.9376 |
13 | Gujarat | 147950 | 128023 | 3541 | 16386 | 08/10/2020 21:14:08 | GJ | 1278 | 1266 | 10 | 2020-10-08 | 22.2587 | 71.1924 |
14 | Rajasthan | 150467 | 127526 | 1590 | 21351 | 07/10/2020 19:54:02 | RJ | 0 | 0 | 0 | 2020-10-08 | 27.0238 | 74.2179 |
15 | Madhya Pradesh | 142012 | 122687 | 2547 | 16778 | 08/10/2020 21:24:54 | MP | 1705 | 2420 | 29 | 2020-10-08 | 22.9734 | 78.6569 |
16 | Haryana | 138582 | 126267 | 1548 | 10767 | 08/10/2020 21:21:58 | HR | 1184 | 1426 | 20 | 2020-10-08 | 29.0588 | 76.0856 |
17 | Chhattisgarh | 131739 | 103828 | 1134 | 26777 | 07/10/2020 23:21:49 | CT | 0 | 0 | 0 | 2020-10-08 | 21.2787 | 81.8661 |
18 | Punjab | 120868 | 107200 | 3741 | 9927 | 08/10/2020 20:37:54 | PB | 0 | 1615 | 29 | 2020-10-08 | 31.1471 | 75.3412 |
19 | Jharkhand | 89702 | 79176 | 767 | 9759 | 07/10/2020 22:33:14 | JH | 0 | 0 | 0 | 2020-10-08 | 23.6102 | 85.2799 |
20 | Jammu and Kashmir | 81793 | 69020 | 1291 | 11482 | 08/10/2020 21:14:12 | JK | 696 | 1336 | 9 | 2020-10-08 | 33.7782 | 76.5762 |
21 | Uttarakhand | 52959 | 43631 | 688 | 8367 | 07/10/2020 21:48:59 | UT | 0 | 0 | 0 | 2020-10-08 | 30.0668 | 79.0193 |
22 | Goa | 37102 | 31902 | 484 | 4716 | 08/10/2020 21:14:14 | GA | 432 | 458 | 7 | 2020-10-08 | 15.2993 | 74.1240 |
23 | Puducherry | 30539 | 25256 | 556 | 4727 | 08/10/2020 21:14:16 | PY | 378 | 326 | 5 | 2020-10-08 | 11.9416 | 79.8083 |
24 | Tripura | 27756 | 23043 | 301 | 4389 | 08/10/2020 11:38:56 | TR | 214 | 389 | 3 | 2020-10-08 | 23.9408 | 91.9882 |
25 | Himachal Pradesh | 16565 | 13316 | 226 | 2996 | 07/10/2020 22:33:17 | HP | 0 | 0 | 0 | 2020-10-08 | 31.1048 | 77.1734 |
26 | Chandigarh | 12922 | 11344 | 186 | 1392 | 08/10/2020 21:14:18 | CH | 102 | 154 | 4 | 2020-10-08 | 30.7333 | 76.7794 |
27 | Manipur | 12489 | 9604 | 80 | 2805 | 07/10/2020 19:40:19 | MN | 0 | 0 | 0 | 2020-10-08 | 24.6637 | 93.9063 |
28 | Arunachal Pradesh | 11267 | 8396 | 21 | 2850 | 08/10/2020 00:47:05 | AR | 0 | 0 | 0 | 2020-10-08 | 28.2180 | 94.7278 |
29 | Meghalaya | 7165 | 4694 | 60 | 2411 | 07/10/2020 21:49:06 | ML | 0 | 0 | 0 | 2020-10-08 | 25.4670 | 91.3662 |
30 | Nagaland | 6715 | 5450 | 12 | 1194 | 07/10/2020 19:40:22 | NL | 0 | 0 | 0 | 2020-10-08 | 26.1584 | 94.5624 |
31 | Ladakh | 4802 | 3511 | 63 | 1228 | 08/10/2020 01:48:06 | LA | 0 | 0 | 0 | 2020-10-08 | 34.2996 | 78.2932 |
32 | Andaman and Nicobar Islands | 3935 | 3696 | 54 | 185 | 07/10/2020 23:21:51 | AN | 0 | 0 | 0 | 2020-10-08 | 11.7401 | 92.6586 |
33 | Dadra and Nagar Haveli and Daman and Diu | 3118 | 2979 | 2 | 109 | 07/10/2020 20:13:04 | DN | 0 | 0 | 0 | 2020-10-08 | NaN | NaN |
34 | Sikkim | 3234 | 2534 | 51 | 568 | 07/10/2020 23:21:53 | SK | 0 | 0 | 0 | 2020-10-08 | 27.5330 | 88.5122 |
35 | Mizoram | 2150 | 1919 | 0 | 231 | 08/10/2020 10:41:00 | MZ | 2 | 24 | 0 | 2020-10-08 | 23.1645 | 92.9376 |
36 | State Unassigned | 0 | 0 | 0 | 0 | 19/07/2020 09:40:01 | UN | 0 | 0 | 0 | 2020-10-08 | NaN | NaN |
# complete data info
df_India.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 37 entries, 1 to 37
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 State/UnionTerritory 37 non-null object
1 Confirmed 37 non-null int64
2 Recovered 37 non-null int64
3 Deaths 37 non-null int64
4 Active 37 non-null int64
5 Last_Updated_Time 37 non-null object
6 State_code 37 non-null object
7 Delta_Confirmed 37 non-null int64
8 Delta_Cured 37 non-null int64
9 Delta_Deaths 37 non-null int64
10 Date 37 non-null datetime64[ns]
11 Latitude 33 non-null float64
12 Longitude 33 non-null float64
dtypes: datetime64[ns](1), float64(2), int64(7), object(3)
memory usage: 4.0+ KB
# saving data
# ===========
# file names as year-month-day.csv format
file_name = now.strftime("%Y_%m_%d")+' - COVID-19_India_preprocessed.csv'
# location for saving the file
file_loc = '/content/'
# save file as a csv file
df_India.to_csv(file_loc + file_name, index=False)
from datetime import date
total_cases = df_India['Confirmed'].sum()
print('Total number of confirmed COVID 2019 cases across India till date ',date.today(),':', total_cases)
Total number of confirmed COVID 2019 cases across India till date 2020-10-08 : 6877073
#Learn how to highlight your dataframe
df_temp = df_India.drop(['Latitude', 'Longitude', 'Date'], axis = 1) #Removing Date, Latitude and Longitude and other extra columns
df_temp.style.background_gradient(cmap='Reds')
today = now.strftime("%Y_%m_%d")
total_cured = df_India['Delta_Cured'].sum()
recovered = df_India['Recovered'].sum()
print("Total people who were recovered as of "+today+" are: ", recovered)
total_cases = df_India['Confirmed'].sum()
print("Total people who were detected COVID+ve as of "+today+" are: ", total_cases)
total_death = df_India['Deaths'].sum()
print("Total people who died due to COVID19 as of "+today+" are: ",total_death)
total_active = total_cases-recovered-total_death
print("Total active COVID19 cases as of "+today+" are: ",total_active)
Total people who were recovered as of 2020_10_08 are: 5871898
Total people who were detected COVID+ve as of 2020_10_08 are: 6877073
Total people who died due to COVID19 as of 2020_10_08 are: 106037
Total active COVID19 cases as of 2020_10_08 are: 899138
#Total Active is the Total cases - (Number of death + Cured)
df_India['Total Active'] = df_India['Confirmed'] - (df_India['Deaths'] + df_India['Recovered'])
total_active = df_India['Total Active'].sum()
print('Total number of active COVID 19 cases across India:', total_active)
Tot_Cases = df_India.groupby('State/UnionTerritory')['Total Active'].sum().sort_values(ascending=False).to_frame()
Tot_Cases.style.background_gradient(cmap='Reds')
Total number of active COVID 19 cases across India: 899138
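As a quick sanity check of the formula Active = Confirmed - (Deaths + Recovered), the national totals printed above can be plugged in by hand. The figures below are copied from the `Total` row of `state_wise`; the variable names are only illustrative. The result also shows that the computed value differs from the `Active` column of that row by exactly the `Migrated_Other` count.

```python
# National totals from the 'Total' row of state_wise (08/10/2020)
confirmed = 6877073
recovered = 5871898
deaths = 106037
reported_active = 898090   # 'Active' column of the same row
migrated_other = 1048      # 'Migrated_Other' column of the same row

# Active = Confirmed - (Deaths + Recovered)
computed_active = confirmed - (deaths + recovered)
print(computed_active)                      # 899138
print(computed_active - reported_active)    # 1048
```

So the 899138 printed by the notebook still includes patients recorded as migrated, while the source's own `Active` figure excludes them.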
# select several columns with a list; tuple-style selection on a groupby is not supported in recent pandas
state_cases = df_India.groupby('State/UnionTerritory')[['Confirmed','Deaths','Recovered']].max().reset_index()
#state_cases = state_cases.astype({'Deaths': 'int'})
# use the cumulative 'Recovered' column here: 'Delta_Cured' holds only the latest daily change
state_cases['Active'] = state_cases['Confirmed'] - (state_cases['Deaths']+state_cases['Recovered'])
state_cases["Death Rate (per 100)"] = np.round(100*state_cases["Deaths"]/state_cases["Confirmed"],2)
state_cases["Cure Rate (per 100)"] = np.round(100*state_cases["Recovered"]/state_cases["Confirmed"],2)
state_cases.sort_values('Confirmed', ascending= False).fillna(0).style.background_gradient(cmap='Blues',subset=["Confirmed"])\
.background_gradient(cmap='Blues',subset=["Deaths"])\
.background_gradient(cmap='Blues',subset=["Recovered"])\
.background_gradient(cmap='Blues',subset=["Active"])\
.background_gradient(cmap='Blues',subset=["Death Rate (per 100)"])\
.background_gradient(cmap='Blues',subset=["Cure Rate (per 100)"])
Visualization Inference:
- About 1,611 new COVID-19 cases have been reported today (23rd May), taking the total to 123202.
- The cases have been confirmed across 32 states and union territories.
- Out of 123202 cases, 51784 people have been cured, discharged or migrated.
- Maharashtra, Tamil Nadu, Gujarat and Delhi are the worst-affected states, with the highest numbers of confirmed cases.
- As of 23rd May, 3720 people have died in India.
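The death and cure rates per 100 confirmed cases are simple ratios. Here is a minimal sketch using Maharashtra's figures from the `state_wise` table shown earlier. The helper `rate_per_100` is a hypothetical name, and the cure rate is assumed to be based on the cumulative `Recovered` count.

```python
def rate_per_100(count, confirmed):
    """Return `count` as a rate per 100 confirmed cases, rounded to 2 decimals."""
    if confirmed == 0:
        # rows with no cases (e.g. 'State Unassigned') would otherwise divide by zero
        return 0.0
    return round(100 * count / confirmed, 2)


# Maharashtra, as listed in the state_wise table
confirmed, recovered, deaths = 1480489, 1196441, 39072

death_rate = rate_per_100(deaths, confirmed)
cure_rate = rate_per_100(recovered, confirmed)
print(death_rate, cure_rate)
```

The two rates do not add up to 100, because the remaining share of confirmed cases is still active.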
# age_details = pd.read_csv('/content/AgeGroupDetails.csv')
india_covid_19 = pd.read_csv('./covid_19_india.csv')
hospital_beds = pd.read_csv('./HospitalBedsIndia.csv')
individual_details = pd.read_csv('./IndividualDetails.csv')
ICMR_details = pd.read_csv('./ICMRTestingDetails.csv')
ICMR_labs = pd.read_csv('./ICMRTestingLabs.csv')
state_testing = pd.read_csv('./StatewiseTestingDetails.csv')
population = pd.read_csv('./population_india_census2011.csv')
india_covid_19['Date'] = pd.to_datetime(india_covid_19['Date'],dayfirst = True)
state_testing['Date'] = pd.to_datetime(state_testing['Date'])
ICMR_details['DateTime'] = pd.to_datetime(ICMR_details['DateTime'],dayfirst = True)
ICMR_details = ICMR_details.dropna(subset=['TotalSamplesTested', 'TotalPositiveCases'])
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
recovered_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
latest_data = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/09-29-2020.csv')
The under-40 age group is the most affected, which runs against the common expectation that elderly people are at greater risk. Only 17% of affected people are over 60.
dates = list(confirmed_df.columns[4:])
dates = list(pd.to_datetime(dates))
dates_india = dates[8:]
# print(dates_india)
df1 = confirmed_df.groupby('Country/Region').sum().reset_index()
df2 = deaths_df.groupby('Country/Region').sum().reset_index()
df3 = recovered_df.groupby('Country/Region').sum().reset_index()
k = df1[df1['Country/Region']=='India'].loc[:,'1/30/20':]
india_confirmed = k.values.tolist()[0]
k = df2[df2['Country/Region']=='India'].loc[:,'1/30/20':]
india_deaths = k.values.tolist()[0]
k = df3[df3['Country/Region']=='India'].loc[:,'1/30/20':]
india_recovered = k.values.tolist()[0]
plt.figure(figsize= (15,10))
plt.xticks(rotation = 90 ,fontsize = 11)
plt.yticks(fontsize = 10)
plt.xlabel("Dates",fontsize = 20)
plt.ylabel('Total cases',fontsize = 20)
plt.title("Total Confirmed, Recovered, Death in India" , fontsize = 20)
ax1 = plt.plot_date(y= india_confirmed,x= dates_india,label = 'Confirmed',linestyle ='-',color = 'b')
ax2 = plt.plot_date(y= india_recovered,x= dates_india,label = 'Recovered',linestyle ='-',color = 'g')
ax3 = plt.plot_date(y= india_deaths,x= dates_india,label = 'Death',linestyle ='-',color = 'r')
plt.legend()
import matplotlib.dates as mdates
ICMR_details['Percent_positive'] = round((ICMR_details['TotalPositiveCases']/ICMR_details['TotalSamplesTested'])*100,1)
fig, ax1 = plt.subplots(figsize= (15,5))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
ax1.set_ylabel('Positive Cases (% of Total Samples Tested)')
ax1.bar(ICMR_details['DateTime'] , ICMR_details['Percent_positive'], color="red",label = 'Percentage of Positive Cases')
ax1.text(ICMR_details['DateTime'][0],4, 'Total Samples Tested as of Apr 23rd = 541789', style='italic',fontsize= 10,
bbox={'facecolor': 'white' ,'alpha': 0.5, 'pad': 5})
ax2 = ax1.twinx()
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
ax2.set_ylabel('Num Samples Tested')
ax2.fill_between(ICMR_details['DateTime'],ICMR_details['TotalSamplesTested'],color = 'black',alpha = 0.5,label = 'Samples Tested');
plt.legend(loc="upper left")
plt.title('Total Samples Tested')
plt.show()
import json
import requests
# get response from the web page
response = requests.get('https://api.covid19india.org/state_test_data.json')
# get contents from the response
content = response.content
# parse the json file
parsed = json.loads(content)
# keys
parsed.keys()
dict_keys(['states_tested_data'])
# save data in a dataframe
tested = pd.DataFrame(parsed['states_tested_data'])
# last few rows
tested.tail()
antigentests | coronaenquirycalls | cumulativepeopleinquarantine | negative | numcallsstatehelpline | numicubeds | numisolationbeds | numventilators | othertests | peopleinicu | ... | testsperpositivecase | testsperthousand | totaln95masks | totalpeoplecurrentlyinquarantine | totalpeoplereleasedfromquarantine | totalppe | totaltested | unconfirmed | updatedon | _djhdx | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6026 | 1221370 | 1243 | 12715 | 790 | ... | 2383040 | 2426 | 107757 | 2409262 | 3397988 | 04/10/2020 | NaN | |||||||||
6027 | 1245401 | 1243 | 12715 | 790 | ... | 2395040 | 2425 | 107761 | 2415262 | 3438128 | 05/10/2020 | NaN | |||||||||
6028 | 1267956 | 1243 | 12715 | 790 | ... | 2405040 | 2413 | 107779 | 2420262 | 3480510 | 06/10/2020 | NaN | |||||||||
6029 | 1288884 | 1243 | 12715 | 790 | ... | 2417040 | 2410 | 107787 | 2425262 | 3523161 | 07/10/2020 | NaN | |||||||||
6030 | 1243 | 12715 | 790 | ... | 2428040 | 2415 | 107792 | 2430262 | 3565602 | 08/10/2020 | NaN |
5 rows × 32 columns
# fix datatype (dates are written day-first, e.g. 04/10/2020 is 4 October 2020)
tested['updatedon'] = pd.to_datetime(tested['updatedon'], dayfirst=True)
# save file as a csv file
tested.to_csv('updated_tests_latest_state_level.csv', index=False)
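The `updatedon` values shown in the output above appear to be written day-first (for example `04/10/2020` followed by `05/10/2020`), so how they are parsed matters. A minimal sketch on made-up sample strings, showing how the default parse and a day-first parse can differ:

```python
import pandas as pd

# Hypothetical sample values in the same DD/MM/YYYY shape as `updatedon`.
raw = pd.Series(['04/10/2020', '05/10/2020', '13/10/2020'])

# An explicit format removes any ambiguity between day and month.
parsed = pd.to_datetime(raw, format='%d/%m/%Y')

# Without a hint, pandas reads an ambiguous string month-first.
ambiguous_default = pd.to_datetime('04/10/2020')
ambiguous_dayfirst = pd.to_datetime('04/10/2020', dayfirst=True)

print(parsed.dt.strftime('%Y-%m-%d').tolist())
print(ambiguous_default.date(), ambiguous_dayfirst.date())
```

Passing an explicit `format` is usually the safest option when the layout of the column is known in advance.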
state_test_cases = tested.groupby(['updatedon','state'])[['totaltested','populationncp2019projection','testpositivityrate', 'testsperpositivecase', 'testsperthousand','totalpeoplecurrentlyinquarantine']].max().reset_index()
state_test_cases[:-50]
| | updatedon | state | totaltested | populationncp2019projection | testpositivityrate | testsperpositivecase | testsperthousand | totalpeoplecurrentlyinquarantine |
---|---|---|---|---|---|---|---|---|
0 | 2020-01-04 | Delhi | 2621 | 19814000 | 0.00% | | 0.13 | |
1 | 2020-01-04 | Kerala | 7965 | 35125000 | 3.33% | 30 | 0.23 | 622 |
2 | 2020-01-04 | West Bengal | 659 | 96906000 | 5.61% | 18 | 0.01 | |
3 | 2020-01-05 | Andaman and Nicobar Islands | 3754 | 397000 | 0.88% | 114 | 9.46 | 643 |
4 | 2020-01-05 | Andhra Pradesh | 102460 | 52221000 | 1.43% | 70 | 1.96 | |
... | ... | ... | ... | ... | ... | ... | ... | ... |
5975 | 2020-12-08 | Jammu and Kashmir | 750847 | 13203000 | 3.52% | 28 | 56.87 | 42494 |
5976 | 2020-12-08 | Jharkhand | 402072 | 37403000 | 5.04% | 20 | 10.75 | |
5977 | 2020-12-08 | Karnataka | 1826317 | 65798000 | 0.00% | | 27.76 | 289355 |
5978 | 2020-12-08 | Kerala | 1056360 | 35125000 | 3.61% | 28 | 30.07 | 12426 |
5979 | 2020-12-08 | Ladakh | 23034 | 293000 | 7.86% | 13 | 78.61 | 363 |
5980 rows × 8 columns
state_test_cases = tested.groupby('state')[['totaltested','populationncp2019projection','testpositivityrate', 'testsperpositivecase', 'testsperthousand','totalpeoplecurrentlyinquarantine']].max()
state_test_cases['testpositivityrate'] = state_test_cases['testpositivityrate'].str.replace('%', '')
state_test_cases = state_test_cases.apply(pd.to_numeric)
state_test_cases.nunique()
totaltested 35
populationncp2019projection 34
testpositivityrate 34
testsperpositivecase 19
testsperthousand 25
totalpeoplecurrentlyinquarantine 28
dtype: int64
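The cleanup above strips the `%` sign and then converts every column to numbers. A small sketch of the same idea on a made-up frame; `errors='coerce'` is an addition here (an assumption, not something the notebook uses) so that any entry that cannot be parsed becomes NaN:

```python
import pandas as pd

# Hypothetical miniature of the state-level frame; values are illustrative only.
sample = pd.DataFrame({
    'totaltested': ['2621', '7965', '659'],
    'testpositivityrate': ['0.00%', '3.33%', ''],
}, index=['Delhi', 'Kerala', 'West Bengal'])

# Drop the literal percent sign, then convert column by column.
sample['testpositivityrate'] = sample['testpositivityrate'].str.replace('%', '', regex=False)
numeric = sample.apply(pd.to_numeric, errors='coerce')

print(numeric.dtypes)
print(numeric)
```

Converting to numeric types before sorting or styling matters because string columns sort lexicographically rather than by value.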
state_test_cases.sort_values('totaltested', ascending= False).style.background_gradient(cmap='Blues',subset=["totaltested"])\
.background_gradient(cmap='Blues',subset=["populationncp2019projection"])\
.background_gradient(cmap='Blues',subset=["testpositivityrate"])\
.background_gradient(cmap='Blues',subset=["testsperpositivecase"])\
.background_gradient(cmap='Blues',subset=["testsperthousand"])\
.background_gradient(cmap='Blues',subset=["totalpeoplecurrentlyinquarantine"])
all_state = list(df_India['State/UnionTerritory'].unique())
latest = india_covid_19[india_covid_19['Date'] > '2020-03-24'].copy()
state_cases = latest.groupby('State/UnionTerritory')[['Confirmed','Deaths','Cured']].max().reset_index()
# active cases = confirmed minus those who died or were cured
latest['Active'] = latest['Confirmed'] - (latest['Deaths'] + latest['Cured'])
state_cases = state_cases.sort_values('Confirmed', ascending= False).fillna(0)
states =list(state_cases['State/UnionTerritory'][0:15])
states_confirmed = {}
states_deaths = {}
states_recovered = {}
states_dates = {}
for state in states:
    df = latest[latest['State/UnionTerritory'] == state].reset_index()
    k = []
    l = []
    m = []
    n = []
    for i in range(1,len(df)):
        k.append(df['Confirmed'][i]-df['Confirmed'][i-1])
        l.append(df['Deaths'][i]-df['Deaths'][i-1])
        m.append(df['Cured'][i]-df['Cured'][i-1])
        n.append(df['Active'][i]-df['Active'][i-1])
    states_confirmed[state] = k
    states_deaths[state] = l
    states_recovered[state] = m
    # states_active[state] = n
    date = list(df['Date'])
    states_dates[state] = date[1:]
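The loop above turns cumulative counts into daily new counts by subtracting each day's value from the next day's. As a sketch of the same calculation, pandas can do the subtraction per state with `groupby(...).diff()`. The small frame below is made up for illustration and reuses only the column names from `covid_19_india.csv`; `NewConfirmed` and `daily_by_state` are hypothetical names.

```python
import pandas as pd

# Hypothetical cumulative counts for two states (not real data).
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-25', '2020-03-26', '2020-03-27'] * 2),
    'State/UnionTerritory': ['Kerala'] * 3 + ['Delhi'] * 3,
    'Confirmed': [100, 120, 150, 30, 35, 50],
})

toy = toy.sort_values(['State/UnionTerritory', 'Date'])
# diff() within each state gives the day-over-day increase;
# the first day of each state has no predecessor and becomes NaN.
toy['NewConfirmed'] = toy.groupby('State/UnionTerritory')['Confirmed'].diff()
daily = toy.dropna(subset=['NewConfirmed'])

# Same shape as states_confirmed: state -> list of daily increases.
daily_by_state = {
    state: grp['NewConfirmed'].astype(int).tolist()
    for state, grp in daily.groupby('State/UnionTerritory')
}
print(daily_by_state)
```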
fig = plt.figure(figsize= (25,17))
plt.suptitle('Day-by-Day Confirmed Cases in Top 15 States in India',fontsize = 20,y=1.0)
k=0
# subplot positions run from 1 to 15, one per state
for i in range(1,16):
    ax = fig.add_subplot(5,3,i)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
    ax.bar(states_dates[states[k]],states_confirmed[states[k]],label = 'Day wise Confirmed Cases ')
    plt.title(states[k],fontsize = 20)
    k=k+1
# one shared legend, taken from the last subplot
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='upper left')
plt.tight_layout(pad=5.0)
def calc_growthRate(values):
    k = []
    for i in range(1,len(values)):
        summ = 0
        for j in range(i):
            summ = summ + values[j]
        rate = (values[i]/summ)*100
        k.append(int(rate))
    return k
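`calc_growthRate` expresses each day's new cases as a percentage of all new cases recorded on earlier days. A vectorized sketch of the same definition with NumPy follows; the name `growth_rate_pct` is hypothetical, and returning 0 when nothing was recorded earlier is an assumption added here to avoid dividing by zero.

```python
import numpy as np

def growth_rate_pct(daily_new):
    """Percent ratio of each day's new cases to the total of all earlier days."""
    values = np.asarray(daily_new, dtype=float)
    prior_total = np.cumsum(values)[:-1]   # sum(values[:i]) for i = 1 .. n-1
    today = values[1:]
    # Assumption: report 0 where the earlier total is zero instead of dividing by zero.
    ratio = np.divide(today, prior_total,
                      out=np.zeros_like(today), where=prior_total != 0)
    # Truncate toward zero, matching int(rate) in the loop version.
    return (ratio * 100).astype(int).tolist()

print(growth_rate_pct([10, 5, 15, 30]))
print(growth_rate_pct([0, 4, 2]))
```

The cumulative sum replaces the inner loop, so the work grows linearly with the number of days rather than quadratically.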
fig = plt.figure(figsize= (25,17))
plt.suptitle('Growth Rate in Top 15 States',fontsize = 20,y=1.0)
k=0
# subplot positions run from 1 to 15, one per state
for i in range(1,16):
    ax = fig.add_subplot(5,3,i)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
    #ax.bar(states_dates[states[k]],states_confirmed[states[k]],label = 'Day wise Confirmed Cases ')
    growth_rate = calc_growthRate(states_confirmed[states[k]])
    ax.plot_date(states_dates[states[k]][21:],growth_rate[20:],color = '#9370db',label = 'Growth Rate',linewidth =3,linestyle='-')
    plt.title(states[k],fontsize = 20)
    k=k+1
# one shared legend, taken from the last subplot
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='upper left')
plt.tight_layout(pad=3.0)
Despite India's large population, its number of confirmed cases is low relative to other countries. This could be because of two reasons:
- The 67-day lockdown imposed by Prime Minister Narendra Modi in several stages (Source: Health Ministry)
- A low testing rate (Source: news18)
cols_object = list(hospital_beds.columns[2:8])
for cols in cols_object:
    hospital_beds[cols] = hospital_beds[cols].astype(int,errors = 'ignore')
hospital_beds = hospital_beds.drop('Sno',axis=1)
hospital_beds.head(36)
| | State/UT | NumPrimaryHealthCenters_HMIS | NumCommunityHealthCenters_HMIS | NumSubDistrictHospitals_HMIS | NumDistrictHospitals_HMIS | TotalPublicHealthFacilities_HMIS | NumPublicBeds_HMIS | NumRuralHospitals_NHP18 | NumRuralBeds_NHP18 | NumUrbanHospitals_NHP18 | NumUrbanBeds_NHP18 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Andaman & Nicobar Islands | 27 | 4 | NaN | 3 | 34 | 1246 | 27 | 575 | 3 | 500 |
1 | Andhra Pradesh | 1417 | 198 | 31.0 | 20 | 1666 | 60799 | 193 | 6480 | 65 | 16658 |
2 | Arunachal Pradesh | 122 | 62 | NaN | 15 | 199 | 2320 | 208 | 2136 | 10 | 268 |
3 | Assam | 1007 | 166 | 14.0 | 33 | 1220 | 19115 | 1176 | 10944 | 50 | 6198 |
4 | Bihar | 2007 | 63 | 33.0 | 43 | 2146 | 17796 | 930 | 6083 | 103 | 5936 |
5 | Chandigarh | 40 | 2 | 1.0 | 4 | 47 | 3756 | 0 | 0 | 4 | 778 |
6 | Chhattisgarh | 813 | 166 | 12.0 | 32 | 1023 | 14354 | 169 | 5070 | 45 | 4342 |
7 | Dadra & Nagar Haveli | 9 | 2 | 1.0 | 1 | 13 | 568 | 10 | 273 | 1 | 316 |
8 | Daman & Diu | 4 | 2 | NaN | 2 | 8 | 298 | 5 | 240 | 0 | 0 |
9 | Delhi | 534 | 25 | 9.0 | 47 | 615 | 20572 | 0 | 0 | 109 | 24383 |
10 | Goa | 31 | 4 | 2.0 | 3 | 40 | 2666 | 17 | 1405 | 25 | 1608 |
11 | Gujarat | 1770 | 385 | 44.0 | 37 | 2236 | 41129 | 364 | 11715 | 122 | 20565 |
12 | Haryana | 500 | 131 | 24.0 | 28 | 683 | 13841 | 609 | 6690 | 59 | 4550 |
13 | Himachal Pradesh | 516 | 79 | 61.0 | 15 | 671 | 8706 | 705 | 5665 | 96 | 6734 |
14 | Jammu & Kashmir | 702 | 87 | NaN | 29 | 818 | 11342 | 56 | 7234 | 76 | 4417 |
15 | Jharkhand | 343 | 179 | 13.0 | 23 | 558 | 7404 | 519 | 5842 | 36 | 4942 |
16 | Karnataka | 2547 | 207 | 147.0 | 42 | 2943 | 56333 | 2471 | 21072 | 374 | 49093 |
17 | Kerala | 933 | 229 | 82.0 | 53 | 1297 | 39511 | 981 | 16865 | 299 | 21139 |
18 | Lakshadweep | 4 | 3 | 2.0 | 1 | 10 | 250 | 9 | 300 | 0 | 0 |
19 | Madhya Pradesh | 1420 | 324 | 72.0 | 51 | 1867 | 38140 | 334 | 10020 | 117 | 18819 |
20 | Maharashtra | 2638 | 430 | 101.0 | 70 | 3239 | 68998 | 273 | 12398 | 438 | 39048 |
21 | Manipur | 87 | 17 | 1.0 | 9 | 114 | 2562 | 23 | 730 | 7 | 697 |
22 | Meghalaya | 138 | 29 | NaN | 13 | 180 | 4585 | 143 | 1970 | 14 | 2487 |
23 | Mizoram | 65 | 10 | 3.0 | 9 | 87 | 2312 | 56 | 604 | 34 | 1393 |
24 | Nagaland | 134 | 21 | NaN | 11 | 166 | 1944 | 21 | 630 | 15 | 1250 |
25 | Odisha | 1360 | 377 | 27.0 | 35 | 1799 | 16497 | 1655 | 6339 | 149 | 12180 |
26 | Puducherry | 40 | 4 | 5.0 | 4 | 53 | 4462 | 3 | 96 | 11 | 3473 |
27 | Punjab | 521 | 146 | 47.0 | 28 | 742 | 13527 | 510 | 5805 | 172 | 12128 |
28 | Rajasthan | 2463 | 579 | 64.0 | 33 | 3139 | 51844 | 602 | 21088 | 150 | 10760 |
29 | Sikkim | 25 | 2 | 1.0 | 4 | 32 | 1145 | 24 | 260 | 9 | 1300 |
30 | Tamil Nadu | 1854 | 385 | 310.0 | 32 | 2581 | 72616 | 692 | 40179 | 525 | 37353 |
31 | Telangana | 788 | 82 | 47.0 | 15 | 932 | 17358 | 802 | 7668 | 61 | 13315 |
32 | Tripura | 114 | 22 | 12.0 | 9 | 157 | 4895 | 99 | 1140 | 56 | 3277 |
33 | Uttar Pradesh | 3277 | 671 | NaN | 174 | 4122 | 58310 | 4442 | 39104 | 193 | 37156 |
34 | Uttarakhand | 275 | 69 | 19.0 | 20 | 383 | 6660 | 410 | 3284 | 50 | 5228 |
35 | West Bengal | 1374 | 406 | 70.0 | 55 | 1905 | 51163 | 1272 | 19684 | 294 | 58882 |
# top_10_primary = hospital_beds.nlargest(10,'NumPrimaryHealthCenters_HMIS')
top_10_community = hospital_beds.nlargest(10,'NumCommunityHealthCenters_HMIS')
top_10_district_hospitals = hospital_beds.nlargest(10,'NumDistrictHospitals_HMIS')
top_10_public_facility = hospital_beds.nlargest(10,'TotalPublicHealthFacilities_HMIS')
top_10_public_beds = hospital_beds.nlargest(10,'NumPublicBeds_HMIS')
plt.subplot(222)
plt.title('Community Health Centers')
plt.barh(top_10_community['State/UT'],top_10_community['NumCommunityHealthCenters_HMIS'],color = '#9370db');
plt.subplot(224)
plt.title('Total Public Health Facilities')
plt.barh(top_10_public_facility['State/UT'],top_10_public_facility['TotalPublicHealthFacilities_HMIS'],color='#9370db');
plt.subplot(223)
plt.title('District Hospitals')
plt.barh(top_10_district_hospitals['State/UT'],top_10_district_hospitals['NumDistrictHospitals_HMIS'],color = '#87479d');
top_rural_hos = hospital_beds.nlargest(10,'NumRuralHospitals_NHP18')
top_rural_beds = hospital_beds.nlargest(10,'NumRuralBeds_NHP18')
top_urban_hos = hospital_beds.nlargest(10,'NumUrbanHospitals_NHP18')
top_urban_beds = hospital_beds.nlargest(10,'NumUrbanBeds_NHP18')
plt.figure(figsize=(15,10))
plt.suptitle('Urban and Rural Health Facility',fontsize=20)
plt.subplot(221)
plt.title('Rural Hospitals')
plt.barh(top_rural_hos['State/UT'],top_rural_hos['NumRuralHospitals_NHP18'],color = '#87479d');
plt.subplot(222)
plt.title('Urban Hospitals')
plt.barh(top_urban_hos['State/UT'],top_urban_hos['NumUrbanHospitals_NHP18'],color = '#9370db');
plt.subplot(223)
plt.title('Rural Beds')
plt.barh(top_rural_beds['State/UT'],top_rural_beds['NumRuralBeds_NHP18'],color = '#87479d');
plt.subplot(224)
plt.title('Urban Beds')
plt.barh(top_urban_beds['State/UT'],top_urban_beds['NumUrbanBeds_NHP18'],color = '#9370db');
state_test = pd.pivot_table(state_testing, values=['TotalSamples','Negative','Positive'], index='State', aggfunc='max')
state_names = list(state_test.index)
state_test['State'] = state_names
plt.figure(figsize=(25,20))
sns.set_color_codes("pastel")
sns.barplot(x="TotalSamples", y= state_names, data=state_test,label="Total Samples", color = '#7370db')
sns.barplot(x='Negative', y=state_names, data=state_test,label='Negative', color= '#af8887')
sns.barplot(x='Positive', y=state_names, data=state_test,label='Positive', color='#6ff79d')
plt.title('Testing statewise insight',fontsize = 20)
plt.legend(ncol=2, loc="lower right", frameon=True);
values = list(ICMR_labs['state'].value_counts())
names = list(ICMR_labs['state'].value_counts().index)
plt.figure(figsize=(15,10))
sns.set_color_codes("pastel")
plt.title('ICMR Testing Centers in each State', fontsize = 20)
sns.barplot(x= values, y= names,color = '#ff2345');
train = pd.read_csv('/content/train.csv')
test = pd.read_csv('/content/test.csv')
train['Date'] = pd.to_datetime(train['Date'])
test['Date'] = pd.to_datetime(test['Date'])
Prophet is open source software released by Facebook’s Core Data Science team. It is available for download on CRAN and PyPI.
We use Prophet, a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
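In the notation of the Prophet paper, this additive model can be written as below, where g(t) is the trend, s(t) the periodic (seasonal) component, h(t) the holiday effect, and ε_t the error term:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```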
- Accurate and fast: Prophet is used in many applications across Facebook for producing reliable forecasts for planning and goal setting. Facebook finds it to perform better than any other approach in the majority of cases. It fits models in Stan so that you get forecasts in just a few seconds.
- Fully automatic: Get a reasonable forecast on messy data with no manual effort. Prophet is robust to outliers, missing data, and dramatic changes in your time series.
- Tunable forecasts: The Prophet procedure includes many possibilities for users to tweak and adjust forecasts. You can use human-interpretable parameters to improve your forecast by adding your domain knowledge.
- Available in R or Python: Facebook has implemented the Prophet procedure in R and Python. Both of them share the same underlying Stan code for fitting. You can use whatever language you’re comfortable with to get forecasts.
- https://facebook.github.io/prophet/
- https://facebook.github.io/prophet/docs/
- https://github.com/facebook/prophet
- https://facebook.github.io/prophet/docs/quick_start.html
!pip install Prophet
# !pip install pystan
# !pip install fbprophet
!conda install -c conda-forge fbprophet
from fbprophet import Prophet
from fbprophet.plot import plot_plotly, add_changepoints_to_plot
k = df1[df1['Country/Region']=='India'].loc[:,'1/22/20':]
india_confirmed = k.values.tolist()[0]
data = pd.DataFrame(columns = ['ds','y'])
data['ds'] = dates
data['y'] = india_confirmed
The input to Prophet is always a dataframe with two columns: ds and y. The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric, and represents the measurement we wish to forecast.
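As a minimal sketch of that input shape, the frame below is built from made-up counts (in the notebook the values come from the JHU time series) and then checked for the two properties described above: a datetime `ds` column and a numeric `y` column.

```python
import pandas as pd

# Hypothetical cumulative counts for five consecutive days (illustrative only).
dates = pd.date_range('2020-01-22', periods=5, freq='D')
counts = [0, 0, 1, 1, 3]

data = pd.DataFrame({'ds': dates, 'y': counts})

# Quick sanity checks on the shape Prophet expects.
assert list(data.columns) == ['ds', 'y']
assert pd.api.types.is_datetime64_any_dtype(data['ds'])
assert pd.api.types.is_numeric_dtype(data['y'])
print(data)
```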
We generate a 15-day-ahead forecast of confirmed COVID-19 cases using Prophet, with a 95% prediction interval, by creating a base model with no tweaking of seasonality-related parameters and no additional regressors.
prop = Prophet(interval_width=0.95)
prop.fit(data)
future = prop.make_future_dataframe(periods=15)
future.tail(15)
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
| | ds |
---|---|
253 | 2020-10-01 |
254 | 2020-10-02 |
255 | 2020-10-03 |
256 | 2020-10-04 |
257 | 2020-10-05 |
258 | 2020-10-06 |
259 | 2020-10-07 |
260 | 2020-10-08 |
261 | 2020-10-09 |
262 | 2020-10-10 |
263 | 2020-10-11 |
264 | 2020-10-12 |
265 | 2020-10-13 |
266 | 2020-10-14 |
267 | 2020-10-15 |
The predict method will assign each row in future a predicted value which it names yhat. If you pass in historical dates, it will provide an in-sample fit. The forecast object here is a new dataframe that includes a column yhat with the forecast, as well as columns for components and uncertainty intervals.
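Because the forecast frame covers both the historical dates and the new ones, it can be handy to keep only the rows beyond the last observed date. A sketch with a made-up stand-in for the forecast frame follows (the numbers are placeholders, not model output, and `history`, `forecast_like` and `future_only` are hypothetical names); it assumes only the `ds`, `yhat`, `yhat_lower` and `yhat_upper` column names.

```python
import pandas as pd

# Stand-in for the training data: three observed days.
history = pd.DataFrame({
    'ds': pd.date_range('2020-09-28', periods=3, freq='D'),
    'y': [100, 110, 125],
})

# Stand-in for the predicted frame: the same dates plus two future days.
forecast_like = pd.DataFrame({
    'ds': pd.date_range('2020-09-28', periods=5, freq='D'),
    'yhat': [101.0, 111.0, 124.0, 138.0, 152.0],
    'yhat_lower': [95.0, 105.0, 118.0, 130.0, 141.0],
    'yhat_upper': [107.0, 117.0, 130.0, 146.0, 163.0],
})

# Keep only the out-of-sample rows.
last_observed = history['ds'].max()
future_only = forecast_like[forecast_like['ds'] > last_observed]
print(future_only[['ds', 'yhat', 'yhat_lower', 'yhat_upper']])
```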
#predicting the future with date, and upper and lower limit of y value
forecast = prop.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
| | ds | yhat | yhat_lower | yhat_upper |
---|---|---|---|---|
263 | 2020-10-11 | 7.100590e+06 | 7.006306e+06 | 7.195882e+06 |
264 | 2020-10-12 | 7.178642e+06 | 7.074952e+06 | 7.289030e+06 |
265 | 2020-10-13 | 7.257725e+06 | 7.147775e+06 | 7.366396e+06 |
266 | 2020-10-14 | 7.336894e+06 | 7.217428e+06 | 7.455318e+06 |
267 | 2020-10-15 | 7.417946e+06 | 7.301104e+06 | 7.551448e+06 |
You can plot the forecast by calling the Prophet.plot method and passing in your forecast dataframe.
confirmed_forecast_plot = prop.plot(forecast)
confirmed_forecast_plot =prop.plot_components(forecast)
from statsmodels.tsa.arima_model import ARIMA
from datetime import timedelta
arima = ARIMA(data['y'], order=(5, 1, 0))
arima = arima.fit(trend='c', full_output=True, disp=True)
forecast = arima.forecast(steps= 30)
pred = list(forecast[0])
start_date = data['ds'].max()
prediction_dates = []
for i in range(30):
    date = start_date + timedelta(days=1)
    prediction_dates.append(date)
    start_date = date
plt.figure(figsize= (15,10))
plt.xlabel("Dates",fontsize = 20)
plt.ylabel('Total cases',fontsize = 20)
plt.title("Predicted Values for the next 30 Days" , fontsize = 20)
plt.plot_date(y= pred,x= prediction_dates,linestyle ='dashed',color = '#ff9999',label = 'Predicted');
plt.plot_date(y=data['y'],x=data['ds'],linestyle = '-',color = 'blue',label = 'Actual');
plt.legend();
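The list of prediction dates is built above by adding one day at a time. As a sketch of an equivalent construction, `pd.date_range` can generate the same run of dates in one call; the start date below is a stand-in for `data['ds'].max()` and `horizon` is a hypothetical name for the number of forecast steps.

```python
import pandas as pd

start_date = pd.Timestamp('2020-09-30')  # stands in for data['ds'].max()
horizon = 30                              # number of forecast steps

# Begin one day after the last observation and step one day at a time.
prediction_dates = pd.date_range(start=start_date + pd.Timedelta(days=1),
                                 periods=horizon, freq='D')
print(prediction_dates[0].date(), prediction_dates[-1].date())
```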
# df_India.head()
# dropna with inplace=True returns None, so the result is not assigned
df_India.dropna(subset = ["Latitude","Longitude"], inplace=True)
# Learn how to use folium to create a zoomable map
map = folium.Map(location=[20, 70], zoom_start=4,tiles='Stamenterrain')
for lat, lon, value, name in zip(df_India['Latitude'], df_India['Longitude'], df_India['Confirmed'], df_India['State/UnionTerritory']):
    folium.CircleMarker([lat, lon], radius=value*0.002, popup = ('<strong>State</strong>: ' + str(name).capitalize() + '<br>''<strong>Total Cases</strong>: ' + str(value) + '<br>'),color='red',fill_color='red',fill_opacity=0.09 ).add_to(map)
map
# Part 3: Exploring Worldwide Data
# totals on the latest date, as plain numbers
world_confirmed = confirmed_df.iloc[:, -1].sum()
world_recovered = recovered_df.iloc[:, -1].sum()
world_deaths = deaths_df.iloc[:, -1].sum()
# active cases = confirmed minus those who recovered or died
world_active = world_confirmed - (world_recovered + world_deaths)
labels = ['Active','Recovered','Deceased']
sizes = [world_active,world_recovered,world_deaths]
color= ['blue','green','red']
explode = [0.05] * len(labels)
plt.figure(figsize= (15,10))
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=9, explode = explode,colors = color)
centre_circle = plt.Circle((0,0),0.70,fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.title('World COVID-19 Cases',fontsize = 20)
plt.axis('equal')
plt.tight_layout()
dates
[Timestamp('2020-01-22 00:00:00'),
 Timestamp('2020-01-23 00:00:00'),
 ...
 Timestamp('2020-09-30 00:00:00')]
hotspots = ['China','Germany','Iran','Italy','Spain','US','Korea, South','France','Turkey','United Kingdom','India']
dates = list(confirmed_df.columns[4:])
dates = list(pd.to_datetime(dates))
dates_india = dates[8:]
df1 = confirmed_df.groupby('Country/Region').sum().reset_index()
df2 = deaths_df.groupby('Country/Region').sum().reset_index()
df3 = recovered_df.groupby('Country/Region').sum().reset_index()
global_confirmed = {}
global_deaths = {}
global_recovered = {}
global_active = {}
for country in hotspots:
    k = df1[df1['Country/Region'] == country].loc[:,'1/30/20':]
    global_confirmed[country] = k.values.tolist()[0]
    k = df2[df2['Country/Region'] == country].loc[:,'1/30/20':]
    global_deaths[country] = k.values.tolist()[0]
    k = df3[df3['Country/Region'] == country].loc[:,'1/30/20':]
    global_recovered[country] = k.values.tolist()[0]
# for country in hotspots:
#     k = list(map(int.__sub__, global_confirmed[country], global_deaths[country]))
#     global_active[country] = list(map(int.__sub__, k, global_recovered[country]))
fig = plt.figure(figsize=(15,25))
plt.suptitle('Active, Recovered, Deaths in Hotspot Countries and India as of '+ today, fontsize=20, y=1.0)
#plt.legend()
k = 0
for i in range(1,12):
    ax = fig.add_subplot(6,2,i)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
    # ax.bar(dates_india,global_active[hotspots[k]],color = 'green',alpha = 0.6,label = 'Active');
    ax.bar(dates_india, global_confirmed[hotspots[k]], color='blue', label='Confirmed');
    ax.bar(dates_india, global_recovered[hotspots[k]], color='grey', label='Recovered');
    ax.bar(dates_india, global_deaths[hotspots[k]], color='red', label='Death');
    plt.title(hotspots[k])
    handles, labels = ax.get_legend_handles_labels()
    fig.legend(handles, labels, loc='upper left')
    k = k+1
plt.tight_layout(pad=3.0)
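The commented-out lines in the cell above derive active cases as confirmed minus deaths minus recovered. A minimal sketch of that arithmetic follows; `active_cases` is a hypothetical helper name and the numbers are made up for illustration only.

```python
def active_cases(confirmed, deaths, recovered):
    """Element-wise confirmed - deaths - recovered for equal-length series."""
    if not (len(confirmed) == len(deaths) == len(recovered)):
        raise ValueError("all three series must have the same length")
    return [c - d - r for c, d, r in zip(confirmed, deaths, recovered)]


# Made-up daily cumulative counts, not real data
example = active_cases([10, 20, 35], [0, 1, 2], [1, 4, 10])
print(example)  # [9, 15, 23]
```

The length check is a design choice: the date axis and the three series must line up, so a mismatch is better reported early than plotted silently.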
countries = ['China','Germany','Iran','Italy','Spain','US','Korea, South','France','United Kingdom','India']
global_confirmed = []
global_recovered = []
global_deaths = []
for country in countries:
    k = df1[df1['Country/Region'] == country].loc[:,'1/30/20':]
    global_confirmed.append(k.values.tolist()[0])
    k = df2[df2['Country/Region'] == country].loc[:,'1/30/20':]
    global_deaths.append(k.values.tolist()[0])
    k = df3[df3['Country/Region'] == country].loc[:,'1/30/20':]
    # recovered series for this country
    global_recovered.append(k.values.tolist()[0])
plt.figure(figsize=(15,10))
plt.xticks(rotation=90, fontsize=11)
plt.yticks(fontsize=10)
plt.xlabel("Dates", fontsize=20)
plt.ylabel('Total cases', fontsize=20)
plt.title("Comparison with other Countries", fontsize=20)
for i in range(len(countries)):
    plt.plot_date(y=global_confirmed[i], x=dates_india, label=countries[i])
plt.legend();
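Both loops above repeat the same pattern: sum by country, slice the date columns from a start date, and take the values as a list. A small sketch of that pattern as one function is shown below. `country_series` is a hypothetical name, and the toy frame only mimics the wide layout implied by the column names used above (`Country/Region` plus one column per date); its numbers are invented.

```python
import pandas as pd


def country_series(wide_df, country, start_col):
    """Cumulative counts for one country, from start_col to the last date column."""
    # Sum provinces into one row per country; non-numeric columns are dropped
    totals = wide_df.groupby('Country/Region').sum(numeric_only=True)
    # Slice the date columns first so the selected row keeps an integer dtype
    return totals.loc[:, start_col:].loc[country].tolist()


# Toy data in the assumed wide layout (values are invented)
sample = pd.DataFrame({
    'Province/State': ['A', 'B', None],
    'Country/Region': ['X', 'X', 'Y'],
    'Lat': [0.0, 1.0, 2.0],
    'Long': [0.0, 1.0, 2.0],
    '1/29/20': [1, 2, 0],
    '1/30/20': [2, 3, 1],
    '1/31/20': [4, 5, 1],
})
print(country_series(sample, 'X', '1/30/20'))  # [5, 9]
```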
SOURCE: www.cdc.gov/coronavirus
- https://www.mohfw.gov.in/
- https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
- https://www.worldometers.info/coronavirus/#countries
- https://api.covid19india.org/
The latest data can also be pulled from the available APIs by reading their JSON responses. The crowd-sourced APIs are listed below; extract and use this data to find meaningful insights.
- National time series, statewise stats and test counts
- State-district-wise and State-district-wise V2
- Travel history
- Raw data
- States Daily changes
- Statewise Tested Numbers
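Each endpoint in the list above returns a JSON document, so one small download helper can serve all of them. The sketch below is an assumption-laden illustration: `fetch_json` and its `opener` parameter are hypothetical names, and it uses the standard library's `urllib.request` rather than `requests` so that it runs without third-party packages. The demo call uses a fake opener and a placeholder URL, so nothing is downloaded.

```python
import io
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10, opener=urlopen):
    """Download url and return the decoded JSON payload."""
    # opener is injectable so the function can be exercised offline
    with opener(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))


def _fake_opener(url, timeout=10):
    # Stand-in for a network response; the payload is invented
    return io.BytesIO(b'{"raw_data": [{"patientnumber": "27892"}]}')


demo = fetch_json('https://example.invalid/raw_data.json', opener=_fake_opener)
print(list(demo.keys()))  # ['raw_data']
```

With the default opener, a call such as `fetch_json(some_url)` would perform a real request, and the explicit timeout keeps a slow endpoint from hanging the notebook.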
Extracting data from Herokuapp
api = pd.read_json('https://corona-virus-stats.herokuapp.com/api/v1/cases/countries-search')
json_data = api['data']['rows']
# pd.json_normalize is available from pandas 1.0; older versions use pandas.io.json.json_normalize
data = pd.json_normalize(json_data)
data
| | country | country_abbreviation | total_cases | new_cases | total_deaths | new_deaths | total_recovered | active_cases | serious_critical | cases_per_mill_pop | flag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | World | | 4,525,103 | 3,077 | 303,351 | 269 | 1,703,742 | 2,518,010 | 45,560 | 581.0 | https://upload.wikimedia.org/wikipedia/commons... |
| 1 | USA | US | 1,457,593 | 0 | 86,912 | 0 | 318,027 | 1,052,654 | 16,240 | 4,404.0 | https://www.worldometers.info/img/flags/us-fla... |
| 2 | Spain | ES | 272,646 | 0 | 27,321 | 0 | 186,480 | 58,845 | 1,376 | 5,831.0 | https://www.worldometers.info/img/flags/sp-fla... |
| 3 | Russia | RU | 252,245 | 0 | 2,305 | 0 | 53,530 | 196,410 | 2,300 | 1,728.0 | https://www.worldometers.info/img/flags/rs-fla... |
| 4 | UK | GB | 233,151 | 0 | 33,614 | 0 | N/A | 199,193 | 1,559 | 3,434.0 | https://www.worldometers.info/img/flags/uk-fla... |
| 5 | Italy | IT | 223,096 | 0 | 31,368 | 0 | 115,288 | 76,440 | 855 | 3,690.0 | https://www.worldometers.info/img/flags/it-fla... |
| 6 | Brazil | BR | 203,165 | 247 | 13,999 | 6 | 79,479 | 109,687 | 8,318 | 956.0 | https://www.worldometers.info/img/flags/br-fla... |
| 7 | France | FR | 178,870 | 0 | 27,425 | 0 | 59,605 | 91,840 | 2,299 | 2,740.0 | https://www.worldometers.info/img/flags/fr-fla... |
| 8 | Germany | DE | 174,975 | 0 | 7,928 | 0 | 150,300 | 16,747 | 1,329 | 2,088.0 | https://www.worldometers.info/img/flags/gm-fla... |
| 9 | Turkey | TR | 144,749 | 0 | 4,007 | 0 | 104,030 | 36,712 | 963 | 1,716.0 | https://www.worldometers.info/img/flags/tu-fla... |
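The counts in this table arrive as strings with thousands separators, and missing values appear as `N/A`, so they cannot be summed or plotted directly. A possible cleanup is sketched below on two rows copied from the table; the choice of columns to convert is an assumption.

```python
import pandas as pd

# Two records shaped like the API rows shown in the table above
rows = [
    {'country': 'USA', 'total_cases': '1,457,593', 'total_recovered': '318,027'},
    {'country': 'UK', 'total_cases': '233,151', 'total_recovered': 'N/A'},
]
data = pd.json_normalize(rows)

# Strip the thousands separators, then convert; 'N/A' becomes NaN
for col in ['total_cases', 'total_recovered']:
    data[col] = pd.to_numeric(
        data[col].str.replace(',', '', regex=False), errors='coerce'
    )

print(data.dtypes)
```

Using `errors='coerce'` trades strictness for convenience: any unexpected token turns into NaN instead of raising, so it is worth checking `data.isna().sum()` afterwards.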
# to parse json contents
import json
# to parse csv files
import csv
import requests
# get response from the web page for LIVE data
response = requests.get('https://api.covid19india.org/raw_data3.json')
# get contents from the response
content = response.content
# parse the json file
parsed = json.loads(content)
# keys
parsed.keys()
dict_keys(['raw_data'])
# save to df
df = pd.DataFrame(parsed['raw_data'])
# shape of the dataframe
print(df.shape)
# list of columns
print(df.columns)
# first few rows
df.head()
(10020, 20)
Index(['agebracket', 'contractedfromwhichpatientsuspected', 'currentstatus',
'dateannounced', 'detectedcity', 'detecteddistrict', 'detectedstate',
'entryid', 'gender', 'nationality', 'notes', 'numcases',
'patientnumber', 'source1', 'source2', 'source3', 'statecode',
'statepatientnumber', 'statuschangedate', 'typeoftransmission'],
dtype='object')
agebracket | contractedfromwhichpatientsuspected | currentstatus | dateannounced | detectedcity | detecteddistrict | detectedstate | entryid | gender | nationality | notes | numcases | patientnumber | source1 | source2 | source3 | statecode | statepatientnumber | statuschangedate | typeoftransmission | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Hospitalized | 27/04/2020 | West Bengal | 1 | Details awaited | 38 | 27892 | mohfw.gov.in | WB | |||||||||||
1 | Hospitalized | 27/04/2020 | Bhilwara | Rajasthan | 2 | Details awaited | 2 | 27893 | https://twitter.com/ANI/status/125461859651442... | RJ | ||||||||||
2 | Hospitalized | 27/04/2020 | Jaipur | Rajasthan | 3 | Details awaited | 9 | 27894 | https://twitter.com/ANI/status/125461859651442... | RJ | ||||||||||
3 | 28 | Deceased | 27/04/2020 | Surajpol | Jaipur | Rajasthan | 4 | M | Details awaited | 1 | 27895 | https://twitter.com/ANI/status/125461859651442... | RJ | |||||||
4 | Hospitalized | 27/04/2020 | Jaisalmer | Rajasthan | 5 | Details awaited | 1 | 27896 | https://twitter.com/ANI/status/125461859651442... | RJ |
# creating patient id column from patient number
# ===============================================
df['p_id'] = df['patientnumber'].apply(lambda x : 'P'+str(x))
df.columns
Index(['agebracket', 'contractedfromwhichpatientsuspected', 'currentstatus',
'dateannounced', 'detectedcity', 'detecteddistrict', 'detectedstate',
'entryid', 'gender', 'nationality', 'notes', 'numcases',
'patientnumber', 'source1', 'source2', 'source3', 'statecode',
'statepatientnumber', 'statuschangedate', 'typeoftransmission', 'p_id'],
dtype='object')
# order of columns
cols = ['patientnumber', 'p_id', 'statepatientnumber',
'dateannounced', 'agebracket', 'gender',
'detectedcity', 'detecteddistrict', 'detectedstate', 'statecode', 'nationality',
'typeoftransmission', 'contractedfromwhichpatientsuspected',
'statuschangedate', 'currentstatus', 'source1', 'source2', 'source3', 'notes']
# rearrange columns
df = df[cols]
# rename columns
df.columns = ['patient_number', 'p_id', 'state_patient_number',
'date_announced', 'age_bracket', 'gender',
'detected_city', 'detected_district', 'detected_state', 'state_code', 'nationality',
'type_of_transmission', 'contracted_from_which_patient_suspected',
'status_change_date', 'current_status', 'source1', 'source2', 'source3', 'notes']
# dataframe shape
df.shape
(10020, 19)
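Assigning a list to `df.columns` works only as long as the list order matches the column order exactly. An alternative that ties each old name to its new name is sketched below; `rename_map` is a hypothetical name and covers only three of the columns for brevity.

```python
import pandas as pd

# Explicit old-name -> new-name pairs (subset of the columns, for illustration)
rename_map = {
    'patientnumber': 'patient_number',
    'dateannounced': 'date_announced',
    'detectedstate': 'detected_state',
}

sample = pd.DataFrame({
    'dateannounced': ['27/04/2020'],
    'patientnumber': ['27892'],
    'detectedstate': ['West Bengal'],
    'entryid': ['1'],
})

# Select in the desired order, then rename; 'entryid' is dropped on purpose
tidy = sample[list(rename_map)].rename(columns=rename_map)
print(list(tidy.columns))  # ['patient_number', 'date_announced', 'detected_state']
```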
# first 3 rows of the dataframe
df.head(3)
patient_number | p_id | state_patient_number | date_announced | age_bracket | gender | detected_city | detected_district | detected_state | state_code | nationality | type_of_transmission | contracted_from_which_patient_suspected | status_change_date | current_status | source1 | source2 | source3 | notes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 27892 | P27892 | 27/04/2020 | West Bengal | WB | Hospitalized | mohfw.gov.in | Details awaited | |||||||||||
1 | 27893 | P27893 | 27/04/2020 | Bhilwara | Rajasthan | RJ | Hospitalized | https://twitter.com/ANI/status/125461859651442... | Details awaited | ||||||||||
2 | 27894 | P27894 | 27/04/2020 | Jaipur | Rajasthan | RJ | Hospitalized | https://twitter.com/ANI/status/125461859651442... | Details awaited |
# no. of empty values in each column
# ==================================
print(df.shape, '\n')
for i in df.columns:
    print(i, '\t', df[df[i]==''].shape[0])
(10020, 19)
patient_number 12
p_id 0
state_patient_number 4882
date_announced 0
age_bracket 4837
gender 5366
detected_city 9599
detected_district 86
detected_state 5
state_code 5
nationality 10020
type_of_transmission 10020
contracted_from_which_patient_suspected 9779
status_change_date 10020
current_status 0
source1 65
source2 9940
source3 9989
notes 8089
# no. of non-empty values in each column
# ===================================
print(df.shape, '\n')
for i in df.columns:
    print(i, '\t', df[df[i]!=''].shape[0])
(10020, 19)
patient_number 10008
p_id 10020
state_patient_number 5138
date_announced 10020
age_bracket 5183
gender 4654
detected_city 421
detected_district 9934
detected_state 10015
state_code 10015
nationality 0
type_of_transmission 0
contracted_from_which_patient_suspected 241
status_change_date 0
current_status 10020
source1 9955
source2 80
source3 31
notes 1931
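The two loops above can also be written without explicit iteration. The sketch below builds both counts in one small summary frame; the sample data is invented and only borrows column names from the patient table.

```python
import pandas as pd

sample = pd.DataFrame({
    'p_id': ['P1', 'P2', 'P3', 'P4'],
    'gender': ['M', '', 'F', ''],
    'notes': ['', '', '', 'Details awaited'],
})

# eq('') marks empty-string cells; summing the booleans counts them per column
empty_counts = sample.eq('').sum()
summary = pd.DataFrame({
    'empty': empty_counts,
    'non_empty': len(sample) - empty_counts,
})
print(summary)
```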
# replacing empty strings with np.nan
# ===================================
print(df.shape)
# an exact match on '' is enough here, so no regex is needed
df = df.replace('', np.nan)
df.isna().sum()
(10020, 19)
patient_number 12
p_id 0
state_patient_number 4882
date_announced 0
age_bracket 4837
gender 5366
detected_city 9599
detected_district 86
detected_state 5
state_code 5
nationality 10020
type_of_transmission 10020
contracted_from_which_patient_suspected 9779
status_change_date 10020
current_status 0
source1 65
source2 9940
source3 9989
notes 8089
dtype: int64
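The NaN counts above match the empty-string counts, so the step catches cells that are exactly empty. If whitespace-only cells should also count as missing, one option is a regex that matches blank strings, sketched here on invented data.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'gender': ['M', '', '  '],
    'notes': ['Details awaited', ' ', ''],
})

# ^\s*$ matches empty strings and strings made only of whitespace
cleaned = sample.replace(r'^\s*$', np.nan, regex=True)
missing = cleaned.isna().sum()
print(missing)
```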
# dropping empty rows (rows with just a row number but no patient entry)
# ======================================================================
print(df.shape)
# df.dropna(subset=['detected_state'], inplace=True)
print(df.shape)
df.isna().sum()
(10020, 19)
(10020, 19)
patient_number 12
p_id 0
state_patient_number 4882
date_announced 0
age_bracket 4837
gender 5366
detected_city 9599
detected_district 86
detected_state 5
state_code 5
nationality 10020
type_of_transmission 10020
contracted_from_which_patient_suspected 9779
status_change_date 10020
current_status 0
source1 65
source2 9940
source3 9989
notes 8089
dtype: int64
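The `dropna` call in the cell above is commented out, which is why both printed shapes are identical. If it were enabled, a version that returns a new frame and reports how many rows were removed could look like the sketch below. The sample rows are invented, and treating a missing `detected_state` as the marker of an empty row is taken from the commented-out line.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'p_id': ['P1', 'P2', 'P3'],
    'detected_state': ['Rajasthan', np.nan, 'West Bengal'],
})

before = len(sample)
# Keep only rows that name a state, and renumber the index
cleaned = sample.dropna(subset=['detected_state']).reset_index(drop=True)
dropped = before - len(cleaned)
print(f'dropped {dropped} of {before} rows')
```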
# save to csv
df.to_csv('patients_data.csv', index=False)
# get response from the web page
response = requests.get('https://api.covid19india.org/state_test_data.json')
# get contents from the response
content = response.content
# parse the json file
parsed = json.loads(content)
# keys
parsed.keys()
dict_keys(['states_tested_data'])
# save data in a dataframe
th = pd.DataFrame(parsed['states_tested_data'])
# display the dataframe (first and last rows)
th
antigentests | coronaenquirycalls | cumulativepeopleinquarantine | negative | numcallsstatehelpline | numicubeds | numisolationbeds | numventilators | othertests | peopleinicu | ... | testsperpositivecase | testsperthousand | totaln95masks | totalpeoplecurrentlyinquarantine | totalpeoplereleasedfromquarantine | totalppe | totaltested | unconfirmed | updatedon | _djhdx | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1210 | 50 | ... | 117 | 3.53 | 1403 | 181 | 17/04/2020 | NaN | ||||||||||||
1 | 280 | 50 | ... | 99 | 6.75 | 614 | 347 | 2679 | 246 | 24/04/2020 | NaN | ||||||||||
2 | 298 | 50 | ... | 86 | 7.17 | 724 | 420 | 2848 | 106 | 27/04/2020 | NaN | ||||||||||
3 | 340 | 50 | ... | 114 | 9.46 | 643 | 556 | 3754 | 199 | 01/05/2020 | NaN | ||||||||||
4 | 471 | 98 | ... | 202 | 16.82 | 16 | 1196 | 6677 | 136 | 16/05/2020 | NaN | ||||||||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5803 | 1057947 | 1243 | 12675 | 790 | ... | 2298040 | 2444 | 107697 | 2370262 | 3098657 | 27/09/2020 | NaN | |||||||||
5804 | 1082706 | 1243 | 12675 | 790 | ... | 2308040 | 2447 | 107712 | 2375262 | 3139938 | 28/09/2020 | NaN | |||||||||
5805 | 1107750 | 1243 | 12675 | 790 | ... | 2323040 | 2439 | 107706 | 2381262 | 3183697 | 29/09/2020 | NaN | |||||||||
5806 | 1131535 | 1243 | 12715 | 790 | ... | 2335040 | 2442 | 107721 | 2387262 | 3227462 | 30/09/2020 | NaN | |||||||||
5807 | 1155254 | 1243 | 12715 | 790 | ... | 2350040 | 2439 | 107726 | 2393262 | 3271316 | 01/10/2020 | NaN |
5808 rows × 32 columns
th.columns
Index(['antigentests', 'coronaenquirycalls', 'cumulativepeopleinquarantine',
'negative', 'numcallsstatehelpline', 'numicubeds', 'numisolationbeds',
'numventilators', 'othertests', 'peopleinicu', 'peopleonventilators',
'populationncp2019projection', 'positive', 'rtpcrtests', 'source1',
'source2', 'source3', 'state', 'tagpeopleinquarantine',
'tagtotaltested', 'testpositivityrate', 'testspermillion',
'testsperpositivecase', 'testsperthousand', 'totaln95masks',
'totalpeoplecurrentlyinquarantine', 'totalpeoplereleasedfromquarantine',
'totalppe', 'totaltested', 'unconfirmed', 'updatedon', '_djhdx'],
dtype='object')
# save to csv
th.to_csv('tests_latest_state_level.csv', index=False)
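The testing data is saved exactly as it arrives, and the many blank cells in the table above suggest the values are strings. Before any analysis it may help to convert the counts to numbers and `updatedon` to real dates. The sketch below assumes string values and a day/month/year date format, as the `updatedon` column suggests; the sample rows are invented.

```python
import pandas as pd

sample = pd.DataFrame({
    'state': ['A', 'A'],
    'totaltested': ['1403', '2679'],
    'positive': ['12', ''],
    'updatedon': ['17/04/2020', '24/04/2020'],
})

# Dates are written day first, so the format is given explicitly
sample['updatedon'] = pd.to_datetime(sample['updatedon'], format='%d/%m/%Y')

# Blank or malformed counts become NaN instead of raising
for col in ['totaltested', 'positive']:
    sample[col] = pd.to_numeric(sample[col], errors='coerce')

print(sample.dtypes)
```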
# to get web contents
import requests
# to parse json contents
import json
# to parse csv files
import csv
# get response from the web page
response = requests.get('https://api.covid19india.org/zones.json')
# get contents from the response
content = response.content
# parse the json file
parsed = json.loads(content)
# keys
parsed.keys()
dict_keys(['zones'])
zo = pd.DataFrame(parsed['zones'])
zo.head()
| | district | districtcode | lastupdated | source | state | statecode | zone |
|---|---|---|---|---|---|---|---|
| 0 | Nicobars | AN_Nicobars | 01/05/2020 | https://www.facebook.com/airnewsalerts/photos/... | Andaman and Nicobar Islands | AN | Green |
| 1 | North and Middle Andaman | AN_North and Middle Andaman | 01/05/2020 | https://www.facebook.com/airnewsalerts/photos/... | Andaman and Nicobar Islands | AN | Green |
| 2 | South Andaman | AN_South Andaman | 01/05/2020 | https://www.facebook.com/airnewsalerts/photos/... | Andaman and Nicobar Islands | AN | Red |
| 3 | Anantapur | AP_Anantapur | 01/05/2020 | https://www.facebook.com/airnewsalerts/photos/... | Andhra Pradesh | AP | Orange |
| 4 | Chittoor | AP_Chittoor | 01/05/2020 | https://www.facebook.com/airnewsalerts/photos/... | Andhra Pradesh | AP | Red |
# save to csv
zo.to_csv('zones.csv', index=False)
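One quick use of the zones data is counting how many districts of each state fall into each zone. A sketch with `pd.crosstab` follows, using the five rows shown in the table above as sample input.

```python
import pandas as pd

sample = pd.DataFrame({
    'state': ['Andaman and Nicobar Islands'] * 3 + ['Andhra Pradesh'] * 2,
    'district': ['Nicobars', 'North and Middle Andaman', 'South Andaman',
                 'Anantapur', 'Chittoor'],
    'zone': ['Green', 'Green', 'Red', 'Orange', 'Red'],
})

# Rows are states, columns are zones, cells are district counts
zone_counts = pd.crosstab(sample['state'], sample['zone'])
print(zone_counts)
```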
response = requests.get('https://api.covid19india.org/data.json')
content = response.content
parsed = json.loads(content)
parsed.keys()
dict_keys(['cases_time_series', 'statewise', 'tested'])
national = pd.DataFrame(parsed['cases_time_series'])
national.head()
| | dailyconfirmed | dailydeceased | dailyrecovered | date | totalconfirmed | totaldeceased | totalrecovered |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 30 January | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 | 31 January | 1 | 0 | 0 |
| 2 | 0 | 0 | 0 | 01 February | 1 | 0 | 0 |
| 3 | 1 | 0 | 0 | 02 February | 2 | 0 | 0 |
| 4 | 1 | 0 | 0 | 03 February | 3 | 0 | 0 |
national.columns
Index(['dailyconfirmed', 'dailydeceased', 'dailyrecovered', 'date',
'totalconfirmed', 'totaldeceased', 'totalrecovered'],
dtype='object')
national = national[['date', 'totalconfirmed', 'totaldeceased', 'totalrecovered',
'dailyconfirmed', 'dailydeceased', 'dailyrecovered']]
national.head()
| | date | totalconfirmed | totaldeceased | totalrecovered | dailyconfirmed | dailydeceased | dailyrecovered |
|---|---|---|---|---|---|---|---|
| 0 | 30 January | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 31 January | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 01 February | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 02 February | 2 | 0 | 0 | 1 | 0 | 0 |
| 4 | 03 February | 3 | 0 | 0 | 1 | 0 | 0 |
# save to csv
national.to_csv('nation_level_daily.csv', index=False)
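The `date` column holds values like `30 January` with no year, so it cannot be sorted or plotted as a date as it stands. The sketch below appends a year before parsing. The year 2020 is an assumption that only holds while the series stays within one calendar year, and the string-typed counts and the stray trailing space in the sample are assumptions as well.

```python
import pandas as pd

sample = pd.DataFrame({
    'date': ['30 January ', '31 January', '01 February'],
    'dailyconfirmed': ['1', '0', '0'],
})

# Assumption: every row belongs to 2020
sample['date'] = pd.to_datetime(
    sample['date'].str.strip() + ' 2020', format='%d %B %Y'
)

# Sanity check: the running sum of daily counts should match totalconfirmed
sample['dailyconfirmed'] = sample['dailyconfirmed'].astype(int)
sample['running_total'] = sample['dailyconfirmed'].cumsum()
print(sample)
```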
state_level = pd.DataFrame(parsed['statewise'])
state_level.head()
| | active | confirmed | deaths | deltaconfirmed | deltadeaths | deltarecovered | lastupdatedtime | migratedother | recovered | state | statecode | statenotes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 945551 | 6397896 | 99833 | 5936 | 29 | 2941 | 02/10/2020 12:20:45 | 918 | 5351594 | Total | TT | |
| 1 | 259006 | 1400922 | 37056 | 0 | 0 | 0 | 01/10/2020 23:39:43 | 434 | 1104426 | Maharashtra | MH | [Sep 9] :239 cases have been removed from the ... |
| 2 | 57858 | 700235 | 5869 | 0 | 0 | 0 | 01/10/2020 22:07:48 | 0 | 636508 | Andhra Pradesh | AP | |
| 3 | 46369 | 603290 | 9586 | 0 | 0 | 0 | 01/10/2020 18:46:44 | 0 | 547335 | Tamil Nadu | TN | [July 22]: 444 backdated deceased entries adde... |
| 4 | 110412 | 611837 | 8994 | 0 | 0 | 0 | 01/10/2020 22:07:49 | 19 | 492412 | Karnataka | KA | |
state_level.columns
Index(['active', 'confirmed', 'deaths', 'deltaconfirmed', 'deltadeaths',
'deltarecovered', 'lastupdatedtime', 'migratedother', 'recovered',
'state', 'statecode', 'statenotes'],
dtype='object')
state_level = state_level[['state', 'statecode', 'lastupdatedtime',
'confirmed', 'active', 'deaths', 'recovered',
'deltaconfirmed', 'deltadeaths', 'deltarecovered', 'statenotes']]
state_level.head()
| | state | statecode | lastupdatedtime | confirmed | active | deaths | recovered | deltaconfirmed | deltadeaths | deltarecovered | statenotes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Total | TT | 02/10/2020 12:20:45 | 6397896 | 945551 | 99833 | 5351594 | 5936 | 29 | 2941 | |
| 1 | Maharashtra | MH | 01/10/2020 23:39:43 | 1400922 | 259006 | 37056 | 1104426 | 0 | 0 | 0 | [Sep 9] :239 cases have been removed from the ... |
| 2 | Andhra Pradesh | AP | 01/10/2020 22:07:48 | 700235 | 57858 | 5869 | 636508 | 0 | 0 | 0 | |
| 3 | Tamil Nadu | TN | 01/10/2020 18:46:44 | 603290 | 46369 | 9586 | 547335 | 0 | 0 | 0 | [July 22]: 444 backdated deceased entries adde... |
| 4 | Karnataka | KA | 01/10/2020 22:07:49 | 611837 | 110412 | 8994 | 492412 | 0 | 0 | 0 | |
# save to csv
state_level.to_csv('state_level_latest.csv', index=False)
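The first row of the state-level data is the national total (`statecode` `TT`), so any per-state statistic should leave it out. The sketch below does that and derives a case fatality percentage, using three rows from the table above. It assumes the counts arrive as strings, and `fatality_pct` is a hypothetical column name.

```python
import pandas as pd

sample = pd.DataFrame({
    'state': ['Total', 'Maharashtra', 'Andhra Pradesh'],
    'statecode': ['TT', 'MH', 'AP'],
    'confirmed': ['6397896', '1400922', '700235'],
    'deaths': ['99833', '37056', '5869'],
})

# Drop the national aggregate so it is not treated as a state
states = sample[sample['statecode'] != 'TT'].copy()

# Assumption: counts are strings, so cast before doing arithmetic
states = states.astype({'confirmed': int, 'deaths': int})
states['fatality_pct'] = (100 * states['deaths'] / states['confirmed']).round(2)
print(states[['state', 'fatality_pct']])
```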