
This notebook scrapes government websites that host live coronavirus case updates for India and uses the data to forecast future cases and assess preparedness for the pandemic.



rawatraghav/INDIA-and-COVID-19


DISCLAIMER - This is an auto-generated README.md, created within JupyterLab from the main India_&_World_COVID_19.ipynb file, so it has some cuts and glitches. I highly recommend going through the main notebook file for the complete version.


To run the project in full, you need the extra datasets (provided as .zip files). You can clone the repository into your working directory with the command: git clone https://github.com/rawatraghav/INDIA-and-COVID-19.git or download the repository as a zip file.



COVID-19 - Pandemic in India!

About COVID-19

The coronavirus (COVID-19) pandemic is the greatest global humanitarian challenge the world has faced since World War II. The pandemic virus has spread widely, and the number of cases is rising daily. The government is working to slow down its spread.

To date, it has spread across 215 countries, infecting 5,491,194 people and killing 346,331. In India, as many as 138,536 COVID-19 cases have been reported so far. Of these, 57,692 have recovered and 4,024 have died.

Coronavirus Explained in Simple Terms:

  • Let's say Raghav got infected yesterday, but he won't know it for up to 14 days
  • Raghav thinks he is healthy, but he is infecting 10 people per day
  • These 10 people think they are completely healthy; they travel, go out and infect 100 others
  • These 100 people think they are healthy, but they have already infected 1000 people
  • No one knows who is healthy or who can infect you
  • All you can do is be responsible and stay in quarantine
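The bullet points above describe geometric (exponential) growth, where each round of infections is ten times larger than the one before. Below is a minimal sketch of that arithmetic. It assumes, as the example does, that every infected person passes the virus to exactly 10 new people per step. The function name `infections_per_step` is hypothetical, and real transmission is far less regular than this.

```python
def infections_per_step(initial, contacts_per_person, steps):
    """Return the number of newly infected people at each step of a simple
    geometric spread model (an illustration, not an epidemiological model)."""
    new_cases = [initial]
    for _ in range(steps):
        # every person infected in the previous step infects a fixed number of others
        new_cases.append(new_cases[-1] * contacts_per_person)
    return new_cases


wave = infections_per_step(initial=1, contacts_per_person=10, steps=3)
print(wave)       # [1, 10, 100, 1000]
print(sum(wave))  # 1111 people infected in total
```

Three steps are enough to go from one unknowing carrier to over a thousand infections. This is why early isolation matters so much in the example.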


Problem Statement:

India responded quickly, implementing a proactive nationwide lockdown to flatten the curve and to use the time to plan and resource responses adequately. As of 23rd May 2020, India has witnessed 3720 deaths from 32 States and Union Territories, with a total of 123202 confirmed cases due to COVID-19. Data scientists globally are using AI and machine learning to analyze and predict the spread of COVID-19 in India and to take safety measures against it.

We need to explore the COVID-19 situation in India and the world, and to build a strong model that predicts how the virus could spread across India over the next 15 days.

Steps to be achieved:

  • Analyze the present condition in India
  • Collect the COVID-19 data from websites
  • Figure out the death rate and cure rate per 100 confirmed cases across the affected states
  • Plot charts to visualize the following:
      • Age group distribution of affected patients
      • Total samples tested to date
      • Growth rate of COVID-19 in the top 15 states
      • Top 10 states in each health facility
      • State-wise testing insights
      • ICMR testing centres in each state
  • Use Facebook Prophet to predict the confirmed cases in India
  • Use an ARIMA time-series model to predict the confirmed cases in India
  • Compare the Indian COVID-19 cases on a global level

Importing the required libraries

# importing the required libraries
# !pip install folium
import pandas as pd

# Visualisation libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import folium 
from folium import plugins

# Manipulating the default plot size
plt.rcParams['figure.figsize'] = 10, 12

# Disable warnings 
import warnings
warnings.filterwarnings('ignore')

Part 1: Analysing the present condition in India

How did it start in India?

The first COVID-19 case was reported on 30th January 2020, when a student arrived in Kerala from Wuhan. Within the next few days, Kerala reported 2 more cases. For almost a month no new cases were reported in India. However, on 2nd March 2020, five new cases of coronavirus were reported in Kerala again, and since then the number of cases has been rising. So far, 25 states have been affected, with Bihar and Manipur being the most recent. Here is a brief timeline of the cases in India.

COVID-19 in India - Timeline

Recent COVID-19 updates in India

  • Sikkim on Saturday reported its first positive COVID-19 case
  • With over 6,500 fresh cases, the COVID-19 tally in India rose to 1,25,101 on Saturday morning, with 3,720 fatalities
  • West Bengal asks Railways not to send migrant trains to State till May 26 in view of Cyclone Amphan
  • 196 new COVID 19 positive cases were reported in Karnataka on Saturday
  • Complete lockdown in Bengaluru on Sunday.
  • Bruhat Bengaluru Mahanagara Palike (BBMP) Commissioner B.H. Anil Kumar said the conditions and restrictions on Sunday will be similar to that under coronavirus lockdown 1.0.

How is AI-ML useful in fighting the COVID-19 pandemic?

  • Medical resource optimization
  • Ensuring demand planning stability
  • Contact tracing
  • Situational awareness and critical response analysis

1.1 Scraping the datasets from the official Govt. website

# for date and time operations
from datetime import datetime
# for file and folder operations
import os
# for regular expression operations
import re
# for listing files in a folder
import glob
# for getting web contents
import requests
import json
import csv
import numpy as np
# for scraping web contents
# from bs4 import BeautifulSoup

# the raw patient-level data is split across 16 numbered CSV files
raw_urls = [f'https://api.covid19india.org/csv/latest/raw_data{i}.csv' for i in range(1, 17)]
raw_frames = [pd.read_csv(url) for url in raw_urls]

# stack all files into one dataframe; ignore_index=True renumbers the rows
# so that index labels are not repeated across files
full_data = pd.concat(raw_frames, ignore_index=True)

print(full_data.shape)
full_data.head()
(352525, 22)
Patient Number State Patient Number Date Announced Estimated Onset Date Age Bracket Gender Detected City Detected District Detected State State code ... Contracted from which Patient (Suspected) Nationality Type of transmission Status Change Date Source_1 Source_2 Source_3 Backup Notes Num Cases Entry_ID
0 1.0 KL-TS-P1 30/01/2020 NaN 20 F Thrissur Thrissur Kerala KL ... NaN India Imported 14/02/2020 https://twitter.com/vijayanpinarayi/status/122... https://weather.com/en-IN/india/news/news/2020... NaN Student from Wuhan 1.0 NaN
1 2.0 KL-AL-P1 02/02/2020 NaN NaN NaN Alappuzha Alappuzha Kerala KL ... NaN India Imported 14/02/2020 https://www.indiatoday.in/india/story/kerala-r... https://weather.com/en-IN/india/news/news/2020... NaN Student from Wuhan 1.0 NaN
2 3.0 KL-KS-P1 03/02/2020 NaN NaN NaN Kasaragod Kasaragod Kerala KL ... NaN India Imported 14/02/2020 https://www.indiatoday.in/india/story/kerala-n... https://twitter.com/ANI/status/122422148580539... https://weather.com/en-IN/india/news/news/2020... Student from Wuhan 1.0 NaN
3 4.0 DL-P1 02/03/2020 NaN 45 M East Delhi (Mayur Vihar) East Delhi Delhi DL ... NaN India Imported 15/03/2020 https://www.indiatoday.in/india/story/not-a-ja... https://economictimes.indiatimes.com/news/poli... NaN Travel history to Italy and Austria 1.0 NaN
4 5.0 TS-P1 02/03/2020 NaN 24 M Hyderabad Hyderabad Telangana TG ... NaN India Imported 02/03/2020 https://www.deccanherald.com/national/south/qu... https://www.indiatoday.in/india/story/coronavi... https://www.thehindu.com/news/national/coronav... Travel history to Dubai, Singapore contact 1.0 NaN

5 rows × 22 columns
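Each `raw_dataN.csv` file has its own 0-based index, so a plain `pd.concat` repeats index labels across files. The sketch below uses two small in-memory CSV chunks, which are hypothetical data standing in for the downloaded files. It shows how `ignore_index=True` produces a clean 0..n-1 index, and how `keys=` can record which file each row came from.

```python
import io

import pandas as pd

# Two hypothetical CSV chunks standing in for raw_data1.csv and raw_data2.csv
chunk_1 = io.StringIO("Patient Number,Detected State\n1,Kerala\n2,Kerala\n")
chunk_2 = io.StringIO("Patient Number,Detected State\n3,Kerala\n4,Delhi\n")
frames = [pd.read_csv(chunk_1), pd.read_csv(chunk_2)]

# A plain concat keeps each chunk's own index, so labels 0 and 1 appear twice
naive = pd.concat(frames)
print(naive.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the rows 0..n-1
clean = pd.concat(frames, ignore_index=True)
print(clean.index.tolist())  # [0, 1, 2, 3]

# keys= adds an outer index level naming the source chunk of every row
tagged = pd.concat(frames, keys=["raw_data1", "raw_data2"])
print(tagged.loc["raw_data2"]["Detected State"].tolist())  # ['Kerala', 'Delhi']
```

Duplicate index labels are harmless for `head()`, but they can make later label-based operations such as `.loc` or `drop` select more rows than intended.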

Daily Cases

day_wise = pd.read_csv('https://api.covid19india.org/csv/latest/case_time_series.csv')

day_wise.head()
Date Daily Confirmed Total Confirmed Daily Recovered Total Recovered Daily Deceased Total Deceased
0 30 January 1 1 0 0 0 0
1 31 January 0 1 0 0 0 0
2 01 February 0 1 0 0 0 0
3 02 February 1 2 0 0 0 0
4 03 February 1 3 0 0 0 0
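In `case_time_series.csv`, the *Total* columns are running sums of the *Daily* columns. This gives a quick consistency check on the downloaded data. Below is a minimal sketch using the five rows shown above, restricted to confirmed cases.

```python
import pandas as pd

# The first five rows of day_wise shown above (confirmed cases only)
day_wise_sample = pd.DataFrame({
    "Date": ["30 January", "31 January", "01 February", "02 February", "03 February"],
    "Daily Confirmed": [1, 0, 0, 1, 1],
    "Total Confirmed": [1, 1, 1, 2, 3],
})

# The cumulative sum of the daily counts should reproduce the running total
running_total = day_wise_sample["Daily Confirmed"].cumsum()
print((running_total == day_wise_sample["Total Confirmed"]).all())  # True

# Conversely, first differences of the total recover the daily counts
# (the first day has no predecessor, so its total is used as-is)
daily_from_total = (day_wise_sample["Total Confirmed"].diff()
                    .fillna(day_wise_sample["Total Confirmed"])
                    .astype(int))
print(daily_from_total.tolist())  # [1, 0, 0, 1, 1]
```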

State-wise

state_wise = pd.read_csv('https://api.covid19india.org/csv/latest/state_wise.csv')

state_wise.head()
State Confirmed Recovered Deaths Active Last_Updated_Time Migrated_Other State_code Delta_Confirmed Delta_Recovered Delta_Deaths State_Notes
0 Total 6877073 5871898 106037 898090 08/10/2020 21:24:52 1048 TT 44085 47436 483 NaN
1 Maharashtra 1480489 1196441 39072 244527 07/10/2020 22:32:57 449 MH 0 0 0 [Sep 9] :239 cases have been removed from the ...
2 Andhra Pradesh 734427 678828 6086 49513 07/10/2020 17:50:55 0 AP 0 0 0 NaN
3 Karnataka 679356 552519 9675 117143 08/10/2020 21:13:54 19 KA 10704 9613 101 NaN
4 Tamil Nadu 640943 586454 10052 44437 08/10/2020 21:13:56 0 TN 5088 5718 68 [July 22]: 444 backdated deceased entries adde...
from datetime import date

state_wise_daily = pd.read_csv('https://api.covid19india.org/csv/latest/state_wise_daily.csv')

# wide -> long: one row per (Date, Status, State code)
state_wise_daily = state_wise_daily.melt(id_vars=['Date','Status'],
                                        value_vars=state_wise_daily.columns[2:],
                                        var_name='State',value_name='Count')

# long -> wide: one column per Status (Confirmed, Deceased, Recovered)
state_wise_daily = state_wise_daily.pivot_table(index=['Date','State'],
                                        columns=['Status'],values='Count').reset_index()

# map state codes (e.g. 'KL') to full state names;
# 'DD' has no entry in state_wise, so it is added by hand
state_codes = {code:state for code, state in zip(state_wise['State_code'], state_wise['State'])}
state_codes['DD'] = 'Daman and Diu'
state_wise_daily['State_Name'] = state_wise_daily['State'].map(state_codes)

state_wise_daily
Status Date State Confirmed Deceased Recovered State_Name
0 01-Apr-20 AN 0 0 0 Andaman and Nicobar Islands
1 01-Apr-20 AP 67 0 1 Andhra Pradesh
2 01-Apr-20 AR 0 0 0 Arunachal Pradesh
3 01-Apr-20 AS 15 0 0 Assam
4 01-Apr-20 BR 3 0 0 Bihar
... ... ... ... ... ... ...
8107 31-May-20 TT 8789 222 4928 Total
8108 31-May-20 UN 448 0 0 State Unassigned
8109 31-May-20 UP 374 4 192 Uttar Pradesh
8110 31-May-20 UT 158 0 0 Uttarakhand
8111 31-May-20 WB 371 8 187 West Bengal

8112 rows × 6 columns
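`state_wise_daily.csv` is stored wide, with one column per state code and one row per date and status. The cell above first `melt`s it to long format, then uses `pivot_table` to turn the `Status` values into columns. Below is a minimal sketch on a tiny made-up table with the same layout. The counts are illustrative, not real data.

```python
import pandas as pd

# Hypothetical wide table: one row per (Date, Status), one column per state code
wide = pd.DataFrame({
    "Date": ["01-Apr-20"] * 3,
    "Status": ["Confirmed", "Recovered", "Deceased"],
    "AP": [5, 1, 0],
    "KL": [3, 2, 1],
})

# Step 1: wide -> long, one row per (Date, Status, State)
long_df = wide.melt(id_vars=["Date", "Status"], value_vars=wide.columns[2:],
                    var_name="State", value_name="Count")

# Step 2: long -> wide again, this time with the Status values as columns
tidy = long_df.pivot_table(index=["Date", "State"], columns="Status",
                           values="Count").reset_index()
print(tidy)
```

By default, `pivot_table` averages duplicate (Date, State, Status) entries. Passing `aggfunc="sum"` would add them instead.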


Data Cleaning

# date-time information
# =====================
# save a copy of the dataframe
df_India = state_wise.copy()
# today's date
now = datetime.now()
# add a 'Date' column holding today's date as a month/day/year string
df_India['Date'] = now.strftime("%m/%d/%Y")

# convert the 'Date' column to datetime
df_India['Date'] = pd.to_datetime(df_India['Date'], format='%m/%d/%Y')

df_India.head(36)
State Confirmed Recovered Deaths Active Last_Updated_Time Migrated_Other State_code Delta_Confirmed Delta_Recovered Delta_Deaths State_Notes Date
0 Total 6877073 5871898 106037 898090 08/10/2020 21:24:52 1048 TT 44085 47436 483 NaN 2020-10-08
1 Maharashtra 1480489 1196441 39072 244527 07/10/2020 22:32:57 449 MH 0 0 0 [Sep 9] :239 cases have been removed from the ... 2020-10-08
2 Andhra Pradesh 734427 678828 6086 49513 07/10/2020 17:50:55 0 AP 0 0 0 NaN 2020-10-08
3 Karnataka 679356 552519 9675 117143 08/10/2020 21:13:54 19 KA 10704 9613 101 NaN 2020-10-08
4 Tamil Nadu 640943 586454 10052 44437 08/10/2020 21:13:56 0 TN 5088 5718 68 [July 22]: 444 backdated deceased entries adde... 2020-10-08
5 Uttar Pradesh 427459 378662 6245 42552 08/10/2020 21:21:49 0 UP 3133 3690 45 NaN 2020-10-08
6 Delhi 300833 272948 5643 22242 08/10/2020 21:13:59 0 DL 2726 2643 27 [July 14]: Value for the total tests conducted... 2020-10-08
7 West Bengal 284030 249737 5439 28854 08/10/2020 21:21:51 0 WB 3526 2970 63 NaN 2020-10-08
8 Odisha 244142 216984 1027 26131 08/10/2020 21:14:03 0 OR 3144 3312 16 [July 12th] :20 non-covid deaths reported in s... 2020-10-08
9 Kerala 258851 167256 931 90579 08/10/2020 20:37:52 85 KL 5445 7003 24 Mahe native who expired in Kannur included in ... 2020-10-08
10 Telangana 206644 179075 1201 26368 08/10/2020 10:44:51 0 TG 1896 2067 12 [July 27] : Telangana bulletin for the previou... 2020-10-08
11 Bihar 192671 180357 929 11384 08/10/2020 21:21:53 1 BR 1244 1006 2 NaN 2020-10-08
12 Assam 191397 157635 794 32965 08/10/2020 21:14:07 3 AS 1188 0 9 NaN 2020-10-08
13 Gujarat 147950 128023 3541 16386 08/10/2020 21:14:08 0 GJ 1278 1266 10 NaN 2020-10-08
14 Rajasthan 150467 127526 1590 21351 07/10/2020 19:54:02 0 RJ 0 0 0 NaN 2020-10-08
15 Madhya Pradesh 142012 122687 2547 16778 08/10/2020 21:24:54 0 MP 1705 2420 29 NaN 2020-10-08
16 Haryana 138582 126267 1548 10767 08/10/2020 21:21:58 0 HR 1184 1426 20 [Aug 2]: 21 Foreign Evacuees have been merged ... 2020-10-08
17 Chhattisgarh 131739 103828 1134 26777 07/10/2020 23:21:49 0 CT 0 0 0 [Sep 9]:57 backdated deceased cases have been ... 2020-10-08
18 Punjab 120868 107200 3741 9927 08/10/2020 20:37:54 0 PB 0 1615 29 NaN 2020-10-08
19 Jharkhand 89702 79176 767 9759 07/10/2020 22:33:14 0 JH 0 0 0 NaN 2020-10-08
20 Jammu and Kashmir 81793 69020 1291 11482 08/10/2020 21:14:12 0 JK 696 1336 9 NaN 2020-10-08
21 Uttarakhand 52959 43631 688 8367 07/10/2020 21:48:59 273 UT 0 0 0 NaN 2020-10-08
22 Goa 37102 31902 484 4716 08/10/2020 21:14:14 0 GA 432 458 7 NaN 2020-10-08
23 Puducherry 30539 25256 556 4727 08/10/2020 21:14:16 0 PY 378 326 5 NaN 2020-10-08
24 Tripura 27756 23043 301 4389 08/10/2020 11:38:56 23 TR 214 389 3 [Aug 4]: Tripura bulletin for the previous day... 2020-10-08
25 Himachal Pradesh 16565 13316 226 2996 07/10/2020 22:33:17 27 HP 0 0 0 NaN 2020-10-08
26 Chandigarh 12922 11344 186 1392 08/10/2020 21:14:18 0 CH 102 154 4 NaN 2020-10-08
27 Manipur 12489 9604 80 2805 07/10/2020 19:40:19 0 MN 0 0 0 NaN 2020-10-08
28 Arunachal Pradesh 11267 8396 21 2850 08/10/2020 00:47:05 0 AR 0 0 0 [July 25]: All numbers corresponding to Papum ... 2020-10-08
29 Meghalaya 7165 4694 60 2411 07/10/2020 21:49:06 0 ML 0 0 0 NaN 2020-10-08
30 Nagaland 6715 5450 12 1194 07/10/2020 19:40:22 59 NL 0 0 0 NaN 2020-10-08
31 Ladakh 4802 3511 63 1228 08/10/2020 01:48:06 0 LA 0 0 0 [Sep 08] : Testing details are not available i... 2020-10-08
32 Andaman and Nicobar Islands 3935 3696 54 185 07/10/2020 23:21:51 0 AN 0 0 0 NaN 2020-10-08
33 Dadra and Nagar Haveli and Daman and Diu 3118 2979 2 109 07/10/2020 20:13:04 28 DN 0 0 0 NaN 2020-10-08
34 Sikkim 3234 2534 51 568 07/10/2020 23:21:53 81 SK 0 0 0 NaN 2020-10-08
35 Mizoram 2150 1919 0 231 08/10/2020 10:41:00 0 MZ 2 24 0 NaN 2020-10-08
# latitude and longitude information
# ==================================
# keys must match the names in the 'State' column exactly,
# otherwise .map() below returns NaN for that state
# (the merged Dadra and Nagar Haveli and Daman and Diu UT uses Dadra and Nagar Haveli's coordinates)

# latitude of the states
lat = {'Delhi':28.7041, 'Haryana':29.0588, 'Kerala':10.8505, 'Rajasthan':27.0238,
       'Telangana':18.1124, 'Uttar Pradesh':26.8467, 'Ladakh':34.2996, 'Tamil Nadu':11.1271,
       'Jammu and Kashmir':33.7782, 'Punjab':31.1471, 'Karnataka':15.3173, 'Maharashtra':19.7515,
       'Andhra Pradesh':15.9129, 'Odisha':20.9517, 'Uttarakhand':30.0668, 'West Bengal':22.9868, 
       'Puducherry': 11.9416, 'Chandigarh': 30.7333, 'Chhattisgarh':21.2787, 'Gujarat': 22.2587, 
       'Himachal Pradesh': 31.1048, 'Madhya Pradesh': 22.9734, 'Bihar': 25.0961, 'Manipur':24.6637, 
       'Mizoram':23.1645, 'Goa': 15.2993, 'Andaman and Nicobar Islands': 11.7401, 'Assam' : 26.2006, 
       'Jharkhand': 23.6102, 'Arunachal Pradesh': 28.2180, 'Tripura': 23.9408, 'Nagaland': 26.1584, 
       'Meghalaya' : 25.4670, 'Dadra and Nagar Haveli and Daman and Diu' : 20.1809, 'Sikkim': 27.5330}

# longitude of the states
long = {'Delhi':77.1025, 'Haryana':76.0856, 'Kerala':76.2711, 'Rajasthan':74.2179,
        'Telangana':79.0193, 'Uttar Pradesh':80.9462, 'Ladakh':78.2932, 'Tamil Nadu':78.6569,
        'Jammu and Kashmir':76.5762, 'Punjab':75.3412, 'Karnataka':75.7139, 'Maharashtra':75.7139,
        'Andhra Pradesh':79.7400, 'Odisha':85.0985, 'Uttarakhand':79.0193, 'West Bengal':87.8550, 
        'Puducherry': 79.8083, 'Chandigarh': 76.7794, 'Chhattisgarh':81.8661, 'Gujarat': 71.1924, 
        'Himachal Pradesh': 77.1734, 'Madhya Pradesh': 78.6569, 'Bihar': 85.3131, 'Manipur':93.9063, 
        'Mizoram':92.9376, 'Goa': 74.1240, 'Andaman and Nicobar Islands': 92.6586, 'Assam' : 92.9376, 
        'Jharkhand': 85.2799, 'Arunachal Pradesh': 94.7278, 'Tripura': 91.9882, 'Nagaland': 94.5624,
        'Meghalaya' : 91.3662, 'Dadra and Nagar Haveli and Daman and Diu' : 73.0169, 'Sikkim': 88.5122}

# add latitude column based on the 'State' column
df_India['Latitude'] = df_India['State'].map(lat)

# add longitude column based on the 'State' column
df_India['Longitude'] = df_India['State'].map(long)

df_India.head()
State Confirmed Recovered Deaths Active Last_Updated_Time Migrated_Other State_code Delta_Confirmed Delta_Recovered Delta_Deaths State_Notes Date Latitude Longitude
0 Total 6877073 5871898 106037 898090 08/10/2020 21:24:52 1048 TT 44085 47436 483 NaN 2020-10-08 NaN NaN
1 Maharashtra 1480489 1196441 39072 244527 07/10/2020 22:32:57 449 MH 0 0 0 [Sep 9] :239 cases have been removed from the ... 2020-10-08 19.7515 75.7139
2 Andhra Pradesh 734427 678828 6086 49513 07/10/2020 17:50:55 0 AP 0 0 0 NaN 2020-10-08 15.9129 79.7400
3 Karnataka 679356 552519 9675 117143 08/10/2020 21:13:54 19 KA 10704 9613 101 NaN 2020-10-08 15.3173 75.7139
4 Tamil Nadu 640943 586454 10052 44437 08/10/2020 21:13:56 0 TN 5088 5718 68 [July 22]: 444 backdated deceased entries adde... 2020-10-08 11.1271 78.6569
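`Series.map` silently returns `NaN` for any state name that is not a key in the dictionary. A spelling mismatch between the dictionary and the data therefore shows up only as missing coordinates (for instance, a key spelled `'Telengana'` while the data says `'Telangana'`). Below is a minimal sketch of a check that lists such names before mapping. The small dictionary and state list are a hypothetical subset.

```python
import pandas as pd

# Hypothetical subset of the latitude dictionary (note the misspelled key)
lat = {"Kerala": 10.8505, "Telengana": 18.1124, "Delhi": 28.7041}
states = pd.Series(["Kerala", "Telangana", "Delhi", "Lakshadweep"])

# Names in the data that have no matching dictionary key
missing = sorted(set(states) - set(lat))
print(missing)  # ['Lakshadweep', 'Telangana']

# map() leaves exactly those rows as NaN
latitudes = states.map(lat)
print(latitudes.isna().sum())  # 2
```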
# rename columns
df_India = df_India.rename(columns={'Delta_Recovered': 'Delta_Cured',
                                    'State': 'State/UnionTerritory',
                                    'Total Confirmed cases *': 'Confirmed',
                                    'Total Confirmed cases ': 'Confirmed',
                                    'Total Confirmed cases* ': 'Confirmed'})

# normalise any 'Total Confirmed cases (Including ... foreign Nationals) ' header;
# a raw string keeps \( as a regex escape rather than a Python string escape
df_India = df_India.rename(columns=lambda x: re.sub(r'Total Confirmed cases \(Including .* foreign Nationals\) ',
                                                    'Total Confirmed cases', x))
df_India = df_India.rename(columns={'Deaths ( more than 70% cases due to comorbidities )': 'Deaths',
                                    'Deaths**': 'Deaths'})
# unique state names
df_India['State/UnionTerritory'].unique()
array(['Total', 'Maharashtra', 'Andhra Pradesh', 'Karnataka',
       'Tamil Nadu', 'Uttar Pradesh', 'Delhi', 'West Bengal', 'Odisha',
       'Kerala', 'Telangana', 'Bihar', 'Assam', 'Gujarat', 'Rajasthan',
       'Madhya Pradesh', 'Haryana', 'Chhattisgarh', 'Punjab', 'Jharkhand',
       'Jammu and Kashmir', 'Uttarakhand', 'Goa', 'Puducherry', 'Tripura',
       'Himachal Pradesh', 'Chandigarh', 'Manipur', 'Arunachal Pradesh',
       'Meghalaya', 'Nagaland', 'Ladakh', 'Andaman and Nicobar Islands',
       'Dadra and Nagar Haveli and Daman and Diu', 'Sikkim', 'Mizoram',
       'State Unassigned', 'Lakshadweep'], dtype=object)
# number of missing values 
df_India.isna().sum()
State/UnionTerritory     0
Confirmed                0
Recovered                0
Deaths                   0
Active                   0
Last_Updated_Time        0
Migrated_Other           0
State_code               0
Delta_Confirmed          0
Delta_Cured              0
Delta_Deaths             0
State_Notes             26
Date                     0
Latitude                 5
Longitude                5
dtype: int64
# number of unique values 
df_India.nunique()
State/UnionTerritory    38
Confirmed               37
Recovered               37
Deaths                  36
Active                  37
Last_Updated_Time       38
Migrated_Other          13
State_code              38
Delta_Confirmed         21
Delta_Cured             21
Delta_Deaths            19
State_Notes             12
Date                     1
Latitude                33
Longitude               30
dtype: int64
# fix datatype
df_India['Date'] = pd.to_datetime(df_India['Date'])
# normalise state/UT spellings (assign the result back instead of a chained inplace=True)
df_India['State/UnionTerritory'] = df_India['State/UnionTerritory'].replace(
    {'Chattisgarh': 'Chhattisgarh', 'Pondicherry': 'Puducherry'})

Final dataframe

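# drop columns that are not used further
df_India = df_India.drop(['Migrated_Other','State_Notes'], axis=1)
# drop row 0, the national 'Total' row, so that sums over states are not doubled
df_India = df_India.drop([0], axis=0)

df_India.head(36)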
State/UnionTerritory Confirmed Recovered Deaths Active Last_Updated_Time State_code Delta_Confirmed Delta_Cured Delta_Deaths Date Latitude Longitude
1 Maharashtra 1480489 1196441 39072 244527 07/10/2020 22:32:57 MH 0 0 0 2020-10-08 19.7515 75.7139
2 Andhra Pradesh 734427 678828 6086 49513 07/10/2020 17:50:55 AP 0 0 0 2020-10-08 15.9129 79.7400
3 Karnataka 679356 552519 9675 117143 08/10/2020 21:13:54 KA 10704 9613 101 2020-10-08 15.3173 75.7139
4 Tamil Nadu 640943 586454 10052 44437 08/10/2020 21:13:56 TN 5088 5718 68 2020-10-08 11.1271 78.6569
5 Uttar Pradesh 427459 378662 6245 42552 08/10/2020 21:21:49 UP 3133 3690 45 2020-10-08 26.8467 80.9462
6 Delhi 300833 272948 5643 22242 08/10/2020 21:13:59 DL 2726 2643 27 2020-10-08 28.7041 77.1025
7 West Bengal 284030 249737 5439 28854 08/10/2020 21:21:51 WB 3526 2970 63 2020-10-08 22.9868 87.8550
8 Odisha 244142 216984 1027 26131 08/10/2020 21:14:03 OR 3144 3312 16 2020-10-08 20.9517 85.0985
9 Kerala 258851 167256 931 90579 08/10/2020 20:37:52 KL 5445 7003 24 2020-10-08 10.8505 76.2711
10 Telangana 206644 179075 1201 26368 08/10/2020 10:44:51 TG 1896 2067 12 2020-10-08 NaN NaN
11 Bihar 192671 180357 929 11384 08/10/2020 21:21:53 BR 1244 1006 2 2020-10-08 25.0961 85.3131
12 Assam 191397 157635 794 32965 08/10/2020 21:14:07 AS 1188 0 9 2020-10-08 26.2006 92.9376
13 Gujarat 147950 128023 3541 16386 08/10/2020 21:14:08 GJ 1278 1266 10 2020-10-08 22.2587 71.1924
14 Rajasthan 150467 127526 1590 21351 07/10/2020 19:54:02 RJ 0 0 0 2020-10-08 27.0238 74.2179
15 Madhya Pradesh 142012 122687 2547 16778 08/10/2020 21:24:54 MP 1705 2420 29 2020-10-08 22.9734 78.6569
16 Haryana 138582 126267 1548 10767 08/10/2020 21:21:58 HR 1184 1426 20 2020-10-08 29.0588 76.0856
17 Chhattisgarh 131739 103828 1134 26777 07/10/2020 23:21:49 CT 0 0 0 2020-10-08 21.2787 81.8661
18 Punjab 120868 107200 3741 9927 08/10/2020 20:37:54 PB 0 1615 29 2020-10-08 31.1471 75.3412
19 Jharkhand 89702 79176 767 9759 07/10/2020 22:33:14 JH 0 0 0 2020-10-08 23.6102 85.2799
20 Jammu and Kashmir 81793 69020 1291 11482 08/10/2020 21:14:12 JK 696 1336 9 2020-10-08 33.7782 76.5762
21 Uttarakhand 52959 43631 688 8367 07/10/2020 21:48:59 UT 0 0 0 2020-10-08 30.0668 79.0193
22 Goa 37102 31902 484 4716 08/10/2020 21:14:14 GA 432 458 7 2020-10-08 15.2993 74.1240
23 Puducherry 30539 25256 556 4727 08/10/2020 21:14:16 PY 378 326 5 2020-10-08 11.9416 79.8083
24 Tripura 27756 23043 301 4389 08/10/2020 11:38:56 TR 214 389 3 2020-10-08 23.9408 91.9882
25 Himachal Pradesh 16565 13316 226 2996 07/10/2020 22:33:17 HP 0 0 0 2020-10-08 31.1048 77.1734
26 Chandigarh 12922 11344 186 1392 08/10/2020 21:14:18 CH 102 154 4 2020-10-08 30.7333 76.7794
27 Manipur 12489 9604 80 2805 07/10/2020 19:40:19 MN 0 0 0 2020-10-08 24.6637 93.9063
28 Arunachal Pradesh 11267 8396 21 2850 08/10/2020 00:47:05 AR 0 0 0 2020-10-08 28.2180 94.7278
29 Meghalaya 7165 4694 60 2411 07/10/2020 21:49:06 ML 0 0 0 2020-10-08 25.4670 91.3662
30 Nagaland 6715 5450 12 1194 07/10/2020 19:40:22 NL 0 0 0 2020-10-08 26.1584 94.5624
31 Ladakh 4802 3511 63 1228 08/10/2020 01:48:06 LA 0 0 0 2020-10-08 34.2996 78.2932
32 Andaman and Nicobar Islands 3935 3696 54 185 07/10/2020 23:21:51 AN 0 0 0 2020-10-08 11.7401 92.6586
33 Dadra and Nagar Haveli and Daman and Diu 3118 2979 2 109 07/10/2020 20:13:04 DN 0 0 0 2020-10-08 NaN NaN
34 Sikkim 3234 2534 51 568 07/10/2020 23:21:53 SK 0 0 0 2020-10-08 27.5330 88.5122
35 Mizoram 2150 1919 0 231 08/10/2020 10:41:00 MZ 2 24 0 2020-10-08 23.1645 92.9376
36 State Unassigned 0 0 0 0 19/07/2020 09:40:01 UN 0 0 0 2020-10-08 NaN NaN
# complete data info
df_India.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 37 entries, 1 to 37
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   State/UnionTerritory  37 non-null     object        
 1   Confirmed             37 non-null     int64         
 2   Recovered             37 non-null     int64         
 3   Deaths                37 non-null     int64         
 4   Active                37 non-null     int64         
 5   Last_Updated_Time     37 non-null     object        
 6   State_code            37 non-null     object        
 7   Delta_Confirmed       37 non-null     int64         
 8   Delta_Cured           37 non-null     int64         
 9   Delta_Deaths          37 non-null     int64         
 10  Date                  37 non-null     datetime64[ns]
 11  Latitude              33 non-null     float64       
 12  Longitude             33 non-null     float64       
dtypes: datetime64[ns](1), float64(2), int64(7), object(3)
memory usage: 4.0+ KB

Save as .csv file

# saving data
# ===========

# file names as year-month-day.csv format
file_name = now.strftime("%Y_%m_%d")+' - COVID-19_India_preprocessed.csv'

# location for saving the file
file_loc = '/content/'

# save the dataframe as a .csv file
df_India.to_csv(file_loc + file_name, index=False)

1.2 Analysing COVID19 Cases in India

from datetime import date



total_cases = df_India['Confirmed'].sum()
print('Total number of confirmed COVID 2019 cases across India till date ',date.today(),':', total_cases)
Total number of confirmed COVID 2019 cases across India till date  2020-10-08 : 6877073
# highlight the table with a red colour gradient
df_temp = df_India.drop(['Latitude', 'Longitude', 'Date'], axis = 1) #Removing Date, Latitude and Longitude and other extra columns
df_temp.style.background_gradient(cmap='Reds')
today = now.strftime("%Y_%m_%d")
total_cured = df_India['Delta_Cured'].sum()
recovered = df_India['Recovered'].sum()
print("Total people who were recovered as of "+today+" are: ", recovered)
total_cases = df_India['Confirmed'].sum()
print("Total people who were detected COVID+ve as of "+today+" are: ", total_cases)
total_death = df_India['Deaths'].sum()
print("Total people who died due to COVID19 as of "+today+" are: ",total_death)
total_active = total_cases-recovered-total_death
print("Total active COVID19 cases as of "+today+" are: ",total_active)
Total people who were recovered as of 2020_10_08 are:  5871898
Total people who were detected COVID+ve as of 2020_10_08 are:  6877073
Total people who died due to COVID19 as of 2020_10_08 are:  106037
Total active COVID19 cases as of 2020_10_08 are:  899138
#Total Active  is the Total cases - (Number of death + Cured)
df_India['Total Active'] = df_India['Confirmed'] - (df_India['Deaths'] + df_India['Recovered'])
total_active = df_India['Total Active'].sum()
print('Total number of active COVID 19 cases across India:', total_active)
Tot_Cases = df_India.groupby('State/UnionTerritory')['Total Active'].sum().sort_values(ascending=False).to_frame()
Tot_Cases.style.background_gradient(cmap='Reds')
Total number of active COVID 19 cases across India: 899138
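The active count is derived as Confirmed - (Deaths + Recovered). With the national totals printed above, this gives 899,138. That is 1,048 more than the `Active` value (898,090) in the *Total* row of `state_wise`, and the gap equals that row's `Migrated_Other` figure. The derived count therefore appears to include cases recorded as migrated. A minimal arithmetic check, using only numbers from the outputs above:

```python
# National totals from the state_wise 'Total' row (the 2020-10-08 run shown above)
confirmed = 6877073
recovered = 5871898
deaths = 106037
reported_active = 898090
migrated_other = 1048

derived_active = confirmed - (deaths + recovered)
print(derived_active)  # 899138

# Gap between the derived and the reported active counts
print(derived_active - reported_active)  # 1048
print(derived_active - reported_active == migrated_other)  # True
```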
# select several columns with a list (a tuple of column names is rejected by newer pandas)
state_cases = df_India.groupby('State/UnionTerritory')[['Confirmed','Deaths','Recovered']].max().reset_index()

# use the cumulative 'Recovered' count; 'Delta_Cured' holds only the latest day's recoveries
state_cases['Active'] = state_cases['Confirmed'] - (state_cases['Deaths']+state_cases['Recovered'])
state_cases["Death Rate (per 100)"] = np.round(100*state_cases["Deaths"]/state_cases["Confirmed"],2)
state_cases["Cure Rate (per 100)"] = np.round(100*state_cases["Recovered"]/state_cases["Confirmed"],2)
state_cases.sort_values('Confirmed', ascending= False).fillna(0).style.background_gradient(cmap='Blues',subset=["Confirmed"])\
                        .background_gradient(cmap='Blues',subset=["Deaths"])\
                        .background_gradient(cmap='Blues',subset=["Recovered"])\
                        .background_gradient(cmap='Blues',subset=["Active"])\
                        .background_gradient(cmap='Blues',subset=["Death Rate (per 100)"])\
                        .background_gradient(cmap='Blues',subset=["Cure Rate (per 100)"])
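The rate columns are computed per 100 *confirmed cases*, so they are case-based ratios rather than per-population rates. Below is a minimal sketch for a single state, using Maharashtra's row from the `state_wise` output above. It takes the cumulative `Recovered` count as the numerator of the cure rate.

```python
import pandas as pd

# Maharashtra's cumulative figures from the state_wise table above
state = pd.DataFrame({"State": ["Maharashtra"], "Confirmed": [1480489],
                      "Recovered": [1196441], "Deaths": [39072]})

# Rates per 100 confirmed cases (not per 100 residents), rounded to 2 decimals
state["Death Rate (per 100)"] = (100 * state["Deaths"] / state["Confirmed"]).round(2)
state["Cure Rate (per 100)"] = (100 * state["Recovered"] / state["Confirmed"]).round(2)
print(state[["State", "Death Rate (per 100)", "Cure Rate (per 100)"]])
```

States with zero confirmed cases (such as *State Unassigned*) give 0/0 = NaN here. This is why the styled table above applies `fillna(0)` before display.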

Visualization Inference:

  • Nearly 1,611 new COVID-19 cases have been reported today (23rd May), taking the total to 123202.
  • The cases have been confirmed across 32 states and union territories.
  • Of the 123202 cases, 51784 people have been cured, discharged or migrated.
  • Maharashtra, Tamil Nadu, Gujarat and Delhi are the worst-affected states, with the highest numbers of confirmed cases.
  • As of 23rd May, 3720 people have died in India.

Finding more detailed COVID insights in India

# age_details = pd.read_csv('/content/AgeGroupDetails.csv')
india_covid_19 = pd.read_csv('./covid_19_india.csv')
hospital_beds = pd.read_csv('./HospitalBedsIndia.csv')
individual_details = pd.read_csv('./IndividualDetails.csv')
ICMR_details = pd.read_csv('./ICMRTestingDetails.csv')
ICMR_labs = pd.read_csv('./ICMRTestingLabs.csv')
state_testing = pd.read_csv('./StatewiseTestingDetails.csv')
population = pd.read_csv('./population_india_census2011.csv')
india_covid_19['Date'] = pd.to_datetime(india_covid_19['Date'],dayfirst = True)
state_testing['Date'] = pd.to_datetime(state_testing['Date'])
ICMR_details['DateTime'] = pd.to_datetime(ICMR_details['DateTime'],dayfirst = True)
ICMR_details = ICMR_details.dropna(subset=['TotalSamplesTested', 'TotalPositiveCases'])
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
recovered_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
latest_data = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/09-29-2020.csv')

The age-group data shows that people under 40 are the most affected group, which runs against the common view that elderly people are more at risk. People over 60 account for only 17% of cases.

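# JHU CSSE layout: Province/State, Country/Region, Lat, Long, then one column per date
dates = list(confirmed_df.columns[4:])
dates = list(pd.to_datetime(dates))
dates_india = dates[8:]   # dates[8] is 1/30/20, where the India slices below start
# print(dates_india)
# Sum all provinces of a country; numeric_only skips the text Province/State column
df1 = confirmed_df.groupby('Country/Region').sum(numeric_only=True).reset_index()
df2 = deaths_df.groupby('Country/Region').sum(numeric_only=True).reset_index()
df3 = recovered_df.groupby('Country/Region').sum(numeric_only=True).reset_index()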

k = df1[df1['Country/Region']=='India'].loc[:,'1/30/20':]
india_confirmed = k.values.tolist()[0] 

k = df2[df2['Country/Region']=='India'].loc[:,'1/30/20':]
india_deaths = k.values.tolist()[0] 

k = df3[df3['Country/Region']=='India'].loc[:,'1/30/20':]
india_recovered = k.values.tolist()[0] 

plt.figure(figsize= (15,10))
plt.xticks(rotation = 90 ,fontsize = 11)
plt.yticks(fontsize = 10)
plt.xlabel("Dates",fontsize = 20)
plt.ylabel('Total cases',fontsize = 20)
plt.title("Total Confirmed, Recovered and Deaths in India" , fontsize = 20)

ax1 = plt.plot_date(y= india_confirmed,x= dates_india,label = 'Confirmed',linestyle ='-',color = 'b')
ax2 = plt.plot_date(y= india_recovered,x= dates_india,label = 'Recovered',linestyle ='-',color = 'g')
ax3 = plt.plot_date(y= india_deaths,x= dates_india,label = 'Death',linestyle ='-',color = 'r')
plt.legend()

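
The JHU CSSE time-series files used above are wide: one row per Province/State and one column per date in M/D/YY form, which is why the cell sums rows per country and slices the columns from `'1/30/20'`. A minimal sketch of the same steps on a tiny frame with invented numbers, plus a `melt` into a long (country, date, value) table, which is often easier to filter and plot:

```python
import pandas as pd

# Invented rows in the JHU wide layout (country "X" has two provinces)
wide = pd.DataFrame({
    "Province/State": ["A", "B", None],
    "Country/Region": ["X", "X", "India"],
    "Lat": [0.0, 1.0, 20.6],
    "Long": [0.0, 1.0, 79.0],
    "1/29/20": [1, 2, 0],
    "1/30/20": [3, 4, 1],
    "1/31/20": [5, 6, 1],
})

# Sum provinces per country; numeric_only drops the text Province/State column
by_country = wide.groupby("Country/Region").sum(numeric_only=True).reset_index()

# One country's cumulative series from 30 Jan onwards (label slice over columns)
india = by_country.loc[by_country["Country/Region"] == "India", "1/30/20":]
india_confirmed = india.values.tolist()[0]

# Long format: one row per (country, date)
long_df = by_country.drop(columns=["Lat", "Long"]).melt(
    id_vars="Country/Region", var_name="Date", value_name="Confirmed")
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")
```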

Total Samples Tested

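import matplotlib.dates as mdates
# Positive cases as a percentage of samples tested, rounded to 1 decimal place
ICMR_details['Percent_positive'] = round((ICMR_details['TotalPositiveCases']/ICMR_details['TotalSamplesTested'])*100,1)

fig, ax1 = plt.subplots(figsize= (15,5))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
ax1.set_ylabel('Positive Cases (% of Total Samples Tested)')
ax1.bar(ICMR_details['DateTime'] , ICMR_details['Percent_positive'], color="red",label = 'Percentage of Positive Cases')
# .iloc[0] is the first remaining row; the label lookup [0] fails if dropna() removed row 0
ax1.text(ICMR_details['DateTime'].iloc[0],4, 'Total Samples Tested as of Apr 23rd = 541789', style='italic',fontsize= 10,
        bbox={'facecolor': 'white' ,'alpha': 0.5, 'pad': 5})

# Second y-axis on the same dates for the absolute number of samples tested
ax2 = ax1.twinx()
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
ax2.set_ylabel('Num Samples Tested')
ax2.fill_between(ICMR_details['DateTime'],ICMR_details['TotalSamplesTested'],color = 'black',alpha = 0.5,label = 'Samples Tested');

# Combine legend entries from both axes (plt.legend() alone only sees ax2)
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, loc="upper left")
plt.title('Total Samples Tested')
plt.show()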

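`dropna()` keeps the original index labels, so after rows are dropped a label lookup such as `ICMR_details['DateTime'][0]` only works if row 0 survived, while `.iloc[0]` always returns the first remaining row. A small illustration with invented testing numbers, which also computes the percentage of positive samples the same way as above:

```python
import pandas as pd

# Invented testing rows; the first one has no counts and will be dropped
icmr = pd.DataFrame({
    "DateTime": pd.to_datetime(["2020-03-13", "2020-03-14", "2020-03-15"]),
    "TotalSamplesTested": [None, 6500.0, 6700.0],
    "TotalPositiveCases": [None, 78.0, 107.0],
})
icmr = icmr.dropna(subset=["TotalSamplesTested", "TotalPositiveCases"])

# Percentage of samples that came back positive, 1 decimal place
icmr["Percent_positive"] = (100 * icmr["TotalPositiveCases"]
                            / icmr["TotalSamplesTested"]).round(1)

print(list(icmr.index))                # labels 1 and 2 remain; label 0 is gone
first_date = icmr["DateTime"].iloc[0]  # positional: first remaining row
```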

Testing LIVE Status

import json
import requests

# get response from the web page
response = requests.get('https://api.covid19india.org/state_test_data.json')
response.raise_for_status()   # stop on an HTTP error instead of parsing an error page

# get contents from the response
content = response.content

# parse the json file
parsed = json.loads(content)

# keys
parsed.keys()
dict_keys(['states_tested_data'])
# save data in a dataframe
tested = pd.DataFrame(parsed['states_tested_data'])

# last few rows
tested.tail()
antigentests coronaenquirycalls cumulativepeopleinquarantine negative numcallsstatehelpline numicubeds numisolationbeds numventilators othertests peopleinicu ... testsperpositivecase testsperthousand totaln95masks totalpeoplecurrentlyinquarantine totalpeoplereleasedfromquarantine totalppe totaltested unconfirmed updatedon _djhdx
6026 1221370 1243 12715 790 ... 2383040 2426 107757 2409262 3397988 04/10/2020 NaN
6027 1245401 1243 12715 790 ... 2395040 2425 107761 2415262 3438128 05/10/2020 NaN
6028 1267956 1243 12715 790 ... 2405040 2413 107779 2420262 3480510 06/10/2020 NaN
6029 1288884 1243 12715 790 ... 2417040 2410 107787 2425262 3523161 07/10/2020 NaN
6030 1243 12715 790 ... 2428040 2415 107792 2430262 3565602 08/10/2020 NaN

5 rows × 32 columns
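
The endpoint returns a JSON object whose `states_tested_data` key holds a list of records, and `pd.DataFrame` turns each record into a row. A minimal offline sketch with a two-record payload in that shape; the field names come from the columns shown above and the values from the later output table, while the string-typed numbers and dd/mm/yyyy dates are assumptions based on that output (for example the `%` sign in `testpositivityrate`):

```python
import json

import pandas as pd

# Two-record payload in the same shape as the API response (assumed format)
payload = '''{"states_tested_data": [
  {"state": "Kerala", "updatedon": "01/04/2020", "totaltested": "7965", "testpositivityrate": "3.33%"},
  {"state": "Delhi",  "updatedon": "01/04/2020", "totaltested": "2621", "testpositivityrate": "0.00%"}
]}'''

parsed = json.loads(payload)
# Each dict in the list becomes one row; keys become columns
tested = pd.DataFrame(parsed["states_tested_data"])
```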

# fix datatype: updatedon is written day-first (dd/mm/yyyy)
tested['updatedon'] = pd.to_datetime(tested['updatedon'], dayfirst=True)
# save file as a csv file
tested.to_csv('updated_tests_latest_state_level.csv', index=False)
state_test_cases = tested.groupby(['updatedon','state'])[['totaltested','populationncp2019projection','testpositivityrate','testsperpositivecase','testsperthousand','totalpeoplecurrentlyinquarantine']].max().reset_index()
state_test_cases[:-50]
updatedon state totaltested populationncp2019projection testpositivityrate testsperpositivecase testsperthousand totalpeoplecurrentlyinquarantine
0 2020-01-04 Delhi 2621 19814000 0.00% 0.13
1 2020-01-04 Kerala 7965 35125000 3.33% 30 0.23 622
2 2020-01-04 West Bengal 659 96906000 5.61% 18 0.01
3 2020-01-05 Andaman and Nicobar Islands 3754 397000 0.88% 114 9.46 643
4 2020-01-05 Andhra Pradesh 102460 52221000 1.43% 70 1.96
... ... ... ... ... ... ... ... ...
5975 2020-12-08 Jammu and Kashmir 750847 13203000 3.52% 28 56.87 42494
5976 2020-12-08 Jharkhand 402072 37403000 5.04% 20 10.75
5977 2020-12-08 Karnataka 1826317 65798000 0.00% 27.76 289355
5978 2020-12-08 Kerala 1056360 35125000 3.61% 28 30.07 12426
5979 2020-12-08 Ladakh 23034 293000 7.86% 13 78.61 363

5980 rows × 8 columns
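
The `updatedon` values appear to be day-first (the consecutive rows above run from `04/10/2020` to `08/10/2020`, i.e. 4 to 8 October), while `pd.to_datetime` reads ambiguous strings month-first by default. A quick check of the difference; passing `dayfirst=True` or an explicit `format='%d/%m/%Y'` gives the day-first reading:

```python
import pandas as pd

raw = pd.Series(["04/10/2020", "12/08/2020"])

month_first = pd.to_datetime(raw)                   # default reading: MM/DD/YYYY
day_first = pd.to_datetime(raw, format="%d/%m/%Y")  # explicit DD/MM/YYYY

print(month_first.dt.month.tolist(), day_first.dt.month.tolist())
```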

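test_cols = ['totaltested','populationncp2019projection','testpositivityrate','testsperpositivecase','testsperthousand','totalpeoplecurrentlyinquarantine']
numeric_tested = tested[['state'] + test_cols].copy()
# Convert to numbers *before* taking the max: on text columns max() compares
# character by character, so '99' would beat '1000'
numeric_tested['testpositivityrate'] = numeric_tested['testpositivityrate'].str.replace('%', '', regex=False)
numeric_tested[test_cols] = numeric_tested[test_cols].apply(pd.to_numeric, errors='coerce')
state_test_cases = numeric_tested.groupby('state')[test_cols].max()
state_test_cases.nunique()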
totaltested                         35
populationncp2019projection         34
testpositivityrate                  34
testsperpositivecase                19
testsperthousand                    25
totalpeoplecurrentlyinquarantine    28
dtype: int64
state_test_cases.sort_values('totaltested', ascending= False).style.background_gradient(cmap='Blues',subset=["totaltested"])\
                        .background_gradient(cmap='Blues',subset=["populationncp2019projection"])\
                        .background_gradient(cmap='Blues',subset=["testpositivityrate"])\
                        .background_gradient(cmap='Blues',subset=["testsperpositivecase"])\
                        .background_gradient(cmap='Blues',subset=["testsperthousand"])\
                        .background_gradient(cmap='Blues',subset=["totalpeoplecurrentlyinquarantine"])
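
If the columns are still text when `.max()` runs, values are compared character by character, so a larger count can lose to one that merely starts with a higher digit. A small illustration with invented numbers for a single state:

```python
import pandas as pd

# Invented cumulative test counts for one state, stored as strings
rows = pd.DataFrame({"state": ["Kerala"] * 3,
                     "totaltested": ["99000", "150000", "1056360"]})

# Text comparison: '9' > '1', so "99000" wins
as_text = rows.groupby("state")["totaltested"].max()

# Numeric comparison after conversion
as_number = (rows.assign(totaltested=pd.to_numeric(rows["totaltested"]))
                 .groupby("state")["totaltested"].max())
```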
                       

Day-by-Day Confirmed Cases in Top 15 States in India

all_state = list(df_India['State/UnionTerritory'].unique())

# Records after 24 March 2020 (ISO date string avoids day/month ambiguity)
latest = india_covid_19[india_covid_19['Date'] > '2020-03-24'].copy()
state_cases = latest.groupby('State/UnionTerritory')[['Confirmed','Deaths','Cured']].max().reset_index()
# Active = Confirmed - (Deaths + Cured)
latest['Active'] = latest['Confirmed'] - (latest['Deaths'] + latest['Cured'])
state_cases = state_cases.sort_values('Confirmed', ascending= False).fillna(0)
states = list(state_cases['State/UnionTerritory'].head(15))   # 15 states with the most confirmed cases

states_confirmed = {}
states_deaths = {}
states_recovered = {}
states_active = {}
states_dates = {}

for state in states:
    df = latest[latest['State/UnionTerritory'] == state].reset_index()
    k = []
    l = []
    m = []
    n = []
    # Daily new values = today's cumulative count minus yesterday's
    for i in range(1,len(df)):
        k.append(df['Confirmed'][i]-df['Confirmed'][i-1])
        l.append(df['Deaths'][i]-df['Deaths'][i-1])
        m.append(df['Cured'][i]-df['Cured'][i-1])
        n.append(df['Active'][i]-df['Active'][i-1])
    states_confirmed[state] = k
    states_deaths[state] = l
    states_recovered[state] = m
    states_active[state] = n
    date = list(df['Date'])
    states_dates[state] = date[1:]
    
fig = plt.figure(figsize= (25,17))
plt.suptitle('Day-by-Day Confirmed Cases in Top 15 States in India',fontsize = 20,y=1.0)
# One panel per state in a 5x3 grid (subplot positions 1..15)
for k, state in enumerate(states):
    ax = fig.add_subplot(5,3,k+1)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
    ax.bar(states_dates[state],states_confirmed[state],label = 'Day wise Confirmed Cases ')
    ax.set_title(state,fontsize = 20)
# Every panel shows the same series, so a single figure-level legend is enough
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='upper left')
plt.tight_layout(pad=5.0)

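The loop above turns each state's cumulative `Confirmed` series into daily new cases by subtracting the previous day's value. `Series.diff()` does the same in one call. A minimal sketch on invented numbers, assuming one row per state per day in date order:

```python
import pandas as pd

# Invented cumulative confirmed counts for one state
cumulative = pd.Series([100, 130, 130, 180],
                       index=pd.date_range("2020-04-01", periods=4, freq="D"))

# Difference from the previous day = new cases that day; day 1 has no previous value
daily_new = cumulative.diff().dropna().astype(int)
```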

Growth Rate in top 15 States in India

def calc_growthRate(values):
    """Daily new cases as a % of all cases reported on earlier days (truncated to int)."""
    k = []
    summ = 0
    for i in range(1,len(values)):
        summ = summ + values[i-1]                   # running total of values[0..i-1]
        rate = (values[i]/summ)*100 if summ else 0  # avoid dividing by zero
        k.append(int(rate))
    return k

fig = plt.figure(figsize= (25,17))
plt.suptitle('Growth Rate in Top 15 States',fontsize = 20,y=1.0)
for k, state in enumerate(states):
    ax = fig.add_subplot(5,3,k+1)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
    growth_rate = calc_growthRate(states_confirmed[state])
    # growth_rate[j] belongs to date j+1, so growth_rate[20:] lines up with dates[21:]
    ax.plot_date(states_dates[state][21:],growth_rate[20:],color = '#9370db',label = 'Growth Rate',linewidth =3,linestyle='-')
    ax.set_title(state,fontsize = 20)
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='upper left')
plt.tight_layout(pad=3.0)

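`calc_growthRate` expresses each day's new cases as a percentage of all cases reported on the earlier days. The same quantity can be computed without the nested loop by using a cumulative sum shifted by one day. A sketch under the assumption that a zero prior total should give a rate of 0; the function name `growth_rate` is hypothetical:

```python
import pandas as pd


def growth_rate(values):
    """New cases on day i as a % of all cases on days 0..i-1, truncated to int."""
    s = pd.Series(values, dtype="float64")
    # Sum of values[0..i-1]; a zero total becomes NaN so it cannot be divided by
    prior_total = s.cumsum().shift(1).replace(0, float("nan"))
    rate = (s / prior_total * 100).iloc[1:]  # day 0 has no earlier days
    return rate.fillna(0).astype(int).tolist()


print(growth_rate([10, 5, 15, 30]))
```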

Despite India's large population, its number of confirmed cases per capita is low compared with other countries. This could be for two reasons:

  • The 67-day lockdown imposed in several phases by Prime Minister Narendra Modi (Source: Health Ministry)
  • A low testing rate (Source: news18)

Exploring different types of hospital beds available in India during lockdown

cols_object = list(hospital_beds.columns[2:8])

for cols in cols_object:
    hospital_beds[cols] = hospital_beds[cols].astype(int,errors = 'ignore')

hospital_beds = hospital_beds.drop('Sno',axis=1)
hospital_beds.head(36)
State/UT NumPrimaryHealthCenters_HMIS NumCommunityHealthCenters_HMIS NumSubDistrictHospitals_HMIS NumDistrictHospitals_HMIS TotalPublicHealthFacilities_HMIS NumPublicBeds_HMIS NumRuralHospitals_NHP18 NumRuralBeds_NHP18 NumUrbanHospitals_NHP18 NumUrbanBeds_NHP18
0 Andaman & Nicobar Islands 27 4 NaN 3 34 1246 27 575 3 500
1 Andhra Pradesh 1417 198 31.0 20 1666 60799 193 6480 65 16658
2 Arunachal Pradesh 122 62 NaN 15 199 2320 208 2136 10 268
3 Assam 1007 166 14.0 33 1220 19115 1176 10944 50 6198
4 Bihar 2007 63 33.0 43 2146 17796 930 6083 103 5936
5 Chandigarh 40 2 1.0 4 47 3756 0 0 4 778
6 Chhattisgarh 813 166 12.0 32 1023 14354 169 5070 45 4342
7 Dadra & Nagar Haveli 9 2 1.0 1 13 568 10 273 1 316
8 Daman & Diu 4 2 NaN 2 8 298 5 240 0 0
9 Delhi 534 25 9.0 47 615 20572 0 0 109 24383
10 Goa 31 4 2.0 3 40 2666 17 1405 25 1608
11 Gujarat 1770 385 44.0 37 2236 41129 364 11715 122 20565
12 Haryana 500 131 24.0 28 683 13841 609 6690 59 4550
13 Himachal Pradesh 516 79 61.0 15 671 8706 705 5665 96 6734
14 Jammu & Kashmir 702 87 NaN 29 818 11342 56 7234 76 4417
15 Jharkhand 343 179 13.0 23 558 7404 519 5842 36 4942
16 Karnataka 2547 207 147.0 42 2943 56333 2471 21072 374 49093
17 Kerala 933 229 82.0 53 1297 39511 981 16865 299 21139
18 Lakshadweep 4 3 2.0 1 10 250 9 300 0 0
19 Madhya Pradesh 1420 324 72.0 51 1867 38140 334 10020 117 18819
20 Maharashtra 2638 430 101.0 70 3239 68998 273 12398 438 39048
21 Manipur 87 17 1.0 9 114 2562 23 730 7 697
22 Meghalaya 138 29 NaN 13 180 4585 143 1970 14 2487
23 Mizoram 65 10 3.0 9 87 2312 56 604 34 1393
24 Nagaland 134 21 NaN 11 166 1944 21 630 15 1250
25 Odisha 1360 377 27.0 35 1799 16497 1655 6339 149 12180
26 Puducherry 40 4 5.0 4 53 4462 3 96 11 3473
27 Punjab 521 146 47.0 28 742 13527 510 5805 172 12128
28 Rajasthan 2463 579 64.0 33 3139 51844 602 21088 150 10760
29 Sikkim 25 2 1.0 4 32 1145 24 260 9 1300
30 Tamil Nadu 1854 385 310.0 32 2581 72616 692 40179 525 37353
31 Telangana 788 82 47.0 15 932 17358 802 7668 61 13315
32 Tripura 114 22 12.0 9 157 4895 99 1140 56 3277
33 Uttar Pradesh 3277 671 NaN 174 4122 58310 4442 39104 193 37156
34 Uttarakhand 275 69 19.0 20 383 6660 410 3284 50 5228
35 West Bengal 1374 406 70.0 55 1905 51163 1272 19684 294 58882

Exploring the Top 10 States for Each Type of Health Facility

# top_10_primary = hospital_beds.nlargest(10,'NumPrimaryHealthCenters_HMIS')
top_10_community = hospital_beds.nlargest(10,'NumCommunityHealthCenters_HMIS')
top_10_district_hospitals = hospital_beds.nlargest(10,'NumDistrictHospitals_HMIS')
top_10_public_facility = hospital_beds.nlargest(10,'TotalPublicHealthFacilities_HMIS')
top_10_public_beds = hospital_beds.nlargest(10,'NumPublicBeds_HMIS')

plt.figure(figsize=(15,10))

# Each panel takes its state names from its own ranking: nlargest orders states differently per column
plt.subplot(222)
plt.title('Community Health Centers')
plt.barh(top_10_community['State/UT'],top_10_community['NumCommunityHealthCenters_HMIS'],color = '#9370db');

plt.subplot(224)
plt.title('Total Public Health Facilities')
plt.barh(top_10_public_facility['State/UT'],top_10_public_facility['TotalPublicHealthFacilities_HMIS'],color='#9370db');

plt.subplot(223)
plt.title('District Hospitals')
plt.barh(top_10_district_hospitals['State/UT'],top_10_district_hospitals['NumDistrictHospitals_HMIS'],color = '#87479d');

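Each `nlargest` call ranks states by its own column, so the state order differs from panel to panel, and the bar labels need to come from the same frame as the bar lengths. A small check using five rows copied from the HMIS table above:

```python
import pandas as pd

# Five rows copied from the hospital_beds table above
hmis = pd.DataFrame({
    "State/UT": ["Uttar Pradesh", "Rajasthan", "Maharashtra", "West Bengal", "Kerala"],
    "NumCommunityHealthCenters_HMIS": [671, 579, 430, 406, 229],
    "NumDistrictHospitals_HMIS": [174, 33, 70, 55, 53],
})

top3_community = hmis.nlargest(3, "NumCommunityHealthCenters_HMIS")
top3_district = hmis.nlargest(3, "NumDistrictHospitals_HMIS")

# Labels and values taken from the same frame stay paired correctly
community_pairs = list(zip(top3_community["State/UT"],
                           top3_community["NumCommunityHealthCenters_HMIS"]))
```

Reusing `top3_community["State/UT"]` as the labels for the district-hospital bars would put Rajasthan's name on Maharashtra's bar.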

Exploring Urban and Rural Healthcare Facility

top_rural_hos = hospital_beds.nlargest(10,'NumRuralHospitals_NHP18')
top_rural_beds = hospital_beds.nlargest(10,'NumRuralBeds_NHP18')
top_urban_hos = hospital_beds.nlargest(10,'NumUrbanHospitals_NHP18')
top_urban_beds = hospital_beds.nlargest(10,'NumUrbanBeds_NHP18')

plt.figure(figsize=(15,10))
plt.suptitle('Urban and Rural Health Facility',fontsize=20)
plt.subplot(221)
plt.title('Rural Hospitals')
plt.barh(top_rural_hos['State/UT'],top_rural_hos['NumRuralHospitals_NHP18'],color = '#87479d');

plt.subplot(222)
plt.title('Urban Hospitals')
plt.barh(top_urban_hos['State/UT'],top_urban_hos['NumUrbanHospitals_NHP18'],color = '#9370db');

plt.subplot(223)
plt.title('Rural Beds')
plt.barh(top_rural_beds['State/UT'],top_rural_beds['NumRuralBeds_NHP18'],color = '#87479d');

plt.subplot(224)
plt.title('Urban Beds')
plt.barh(top_urban_beds['State/UT'],top_urban_beds['NumUrbanBeds_NHP18'],color = '#9370db');


Exploring Statewise Testing Insights

state_test = pd.pivot_table(state_testing, values=['TotalSamples','Negative','Positive'], index='State', aggfunc='max')
state_names = list(state_test.index)
state_test['State'] = state_names

plt.figure(figsize=(25,20))
sns.set_color_codes("pastel")
sns.barplot(x="TotalSamples", y= state_names, data=state_test,label="Total Samples", color = '#7370db')
sns.barplot(x='Negative', y=state_names, data=state_test,label='Negative', color= '#af8887')
sns.barplot(x='Positive', y=state_names, data=state_test,label='Positive', color='#6ff79d')
plt.title('Testing statewise insight',fontsize = 20)
plt.legend(ncol=2, loc="lower right", frameon=True);


Number of ICMR Testing Centres in each state

values = list(ICMR_labs['state'].value_counts())
names = list(ICMR_labs['state'].value_counts().index)

plt.figure(figsize=(15,10))
sns.set_color_codes("pastel")
plt.title('ICMR Testing Centers in each State', fontsize = 20)
sns.barplot(x= values, y= names,color = '#ff2345');


Let's Start with the predictions

train = pd.read_csv('/content/train.csv')
test = pd.read_csv('/content/test.csv')
train['Date'] = pd.to_datetime(train['Date'])
test['Date'] = pd.to_datetime(test['Date'])

Prophet

Prophet is open source software released by Facebook’s Core Data Science team. It is available for download on CRAN and PyPI.

We use Prophet, a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
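
Written out, the additive model described above takes the form used by Prophet's authors, where g(t) is the trend, s(t) the periodic seasonality, h(t) the holiday effects and ε_t the error. Each seasonal component is a truncated Fourier series with period P (in days, P = 7 for weekly and P = 365.25 for yearly seasonality):

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t

s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
```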

Why Prophet?

  • Accurate and fast: Prophet is used in many applications across Facebook for producing reliable forecasts for planning and goal setting. Facebook finds it to perform better than any other approach in the majority of cases. It fits models in Stan so that you get forecasts in just a few seconds.

  • Fully automatic: Get a reasonable forecast on messy data with no manual effort. Prophet is robust to outliers, missing data, and dramatic changes in your time series.

  • Tunable forecasts: The Prophet procedure includes many possibilities for users to tweak and adjust forecasts. You can use human-interpretable parameters to improve your forecast by adding your domain knowledge.

  • Available in R or Python: Facebook has implemented the Prophet procedure in R and Python. Both of them share the same underlying Stan code for fitting. You can use whatever language you’re comfortable with to get forecasts.


# Facebook's forecasting library was published as `fbprophet` when this notebook was written;
# from v1.0 it is named `prophet` (pip install prophet, then `from prophet import Prophet`).
# !pip install pystan
# !pip install fbprophet
!conda install -c conda-forge fbprophet
from fbprophet import Prophet
from fbprophet.plot import plot_plotly, add_changepoints_to_plot

k = df1[df1['Country/Region']=='India'].loc[:,'1/22/20':]
india_confirmed = k.values.tolist()[0] 
data = pd.DataFrame(columns = ['ds','y'])
data['ds'] = dates
data['y'] = india_confirmed

The input to Prophet is always a dataframe with two columns: ds and y. The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric, and represents the measurement we wish to forecast.
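
A minimal sketch of building that two-column frame from JHU-style date strings. The counts are invented, and casting `y` to float is a choice made here; Prophet only requires it to be numeric:

```python
import pandas as pd

# Invented cumulative counts keyed by JHU-style M/D/YY date strings
series = {"1/22/20": 0, "1/23/20": 0, "1/30/20": 1, "1/31/20": 1}

# Prophet expects exactly two columns: ds (datestamp) and y (numeric target)
data = pd.DataFrame({
    "ds": pd.to_datetime(list(series.keys()), format="%m/%d/%y"),
    "y": pd.Series(list(series.values()), dtype="float64"),
})
```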

4.1 Forecasting Confirmed COVID-19 Cases in India with Prophet (Base model)

Generating a 15-day-ahead forecast of confirmed COVID-19 cases in India using Prophet with a 95% prediction interval, from a base model with no tuning of seasonality-related parameters and no additional regressors.

prop = Prophet(interval_width=0.95)
prop.fit(data)
future = prop.make_future_dataframe(periods=15)
future.tail(15)
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
ds
253 2020-10-01
254 2020-10-02
255 2020-10-03
256 2020-10-04
257 2020-10-05
258 2020-10-06
259 2020-10-07
260 2020-10-08
261 2020-10-09
262 2020-10-10
263 2020-10-11
264 2020-10-12
265 2020-10-13
266 2020-10-14
267 2020-10-15

The predict method will assign each row in future a predicted value which it names yhat. If you pass in historical dates, it will provide an in-sample fit. The forecast object here is a new dataframe that includes a column yhat with the forecast, as well as columns for components and uncertainty intervals.

#predicting the future with date, and upper and lower limit of y value
forecast = prop.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
ds yhat yhat_lower yhat_upper
263 2020-10-11 7.100590e+06 7.006306e+06 7.195882e+06
264 2020-10-12 7.178642e+06 7.074952e+06 7.289030e+06
265 2020-10-13 7.257725e+06 7.147775e+06 7.366396e+06
266 2020-10-14 7.336894e+06 7.217428e+06 7.455318e+06
267 2020-10-15 7.417946e+06 7.301104e+06 7.551448e+06

You can plot the forecast by calling the Prophet.plot method and passing in your forecast dataframe.

confirmed_forecast_plot = prop.plot(forecast)


confirmed_forecast_plot =prop.plot_components(forecast)


ARIMA Model

# statsmodels.tsa.arima_model.ARIMA (used originally) was removed in statsmodels 0.13;
# statsmodels.tsa.arima.model.ARIMA is its replacement.
from statsmodels.tsa.arima.model import ARIMA

from datetime import timedelta

# ARIMA(5,1,0): difference once, then 5 autoregressive lags, no moving-average terms.
# With d=1, trend='t' adds a drift term (the counterpart of trend='c' in the old API).
arima = ARIMA(data['y'].astype(float), order=(5, 1, 0), trend='t')
arima_fit = arima.fit()
pred = list(arima_fit.forecast(steps=30))

# One date per forecast step, starting the day after the last observed date
start_date = data['ds'].max()
prediction_dates = [start_date + timedelta(days=i) for i in range(1, 31)]

plt.figure(figsize= (15,10))
plt.xlabel("Dates",fontsize = 20)
plt.ylabel('Total cases',fontsize = 20)
plt.title("Predicted Values for the next 30 Days" , fontsize = 20)

plt.plot_date(y= pred,x= prediction_dates,linestyle ='dashed',color = '#ff9999',label = 'Predicted');
plt.plot_date(y=data['y'],x=data['ds'],linestyle = '-',color = 'blue',label = 'Actual');
plt.legend();

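The order `(5, 1, 0)` means: difference the series once (d = 1), model the differences with 5 autoregressive lags (p = 5), and use no moving-average terms (q = 0); a drift term adds a constant to the differenced model. A rough sketch of that structure with plain NumPy least squares, assuming an OLS fit is an acceptable stand-in for statsmodels' maximum-likelihood estimate (forecasts will differ somewhat); `ar_on_differences_forecast` is a hypothetical helper:

```python
import numpy as np


def ar_on_differences_forecast(y, p=5, steps=30):
    """Rough ARIMA(p,1,0)-with-drift forecast: OLS fit of an AR(p) on first differences."""
    y = np.asarray(y, dtype=float)
    d = np.diff(y)  # the "I" part: difference once
    if len(d) <= p:
        raise ValueError("need more than p differenced observations")

    # Design matrix: intercept (drift) + lag 1..p of the differences
    X = np.column_stack([np.ones(len(d) - p)] +
                        [d[p - k - 1:len(d) - k - 1] for k in range(p)])
    coef, *_ = np.linalg.lstsq(X, d[p:], rcond=None)

    history = list(d)
    level = y[-1]
    forecasts = []
    for _ in range(steps):
        lags = history[-1:-p - 1:-1]  # most recent first: lag 1, ..., lag p
        next_diff = coef[0] + np.dot(coef[1:], lags)
        history.append(next_diff)
        level += next_diff            # undo the differencing
        forecasts.append(level)
    return forecasts
```

On a perfectly linear series every difference is the same, so the forecast simply continues the line; on real case counts the AR terms let recent acceleration or slowdown carry forward.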

1.4 Visualising the spread geographically

import folium

# df_India.head()
# dropna(..., inplace=True) returns None, so reassign the filtered frame instead
df_India = df_India.dropna(subset = ["Latitude","Longitude"])
# Zoomable map centred on India (OpenStreetMap tiles; the original used 'Stamenterrain')
india_map = folium.Map(location=[20, 70], zoom_start=4, tiles='OpenStreetMap')

# One circle per row, with radius proportional to the confirmed count
for lat, lon, value, name in zip(df_India['Latitude'], df_India['Longitude'], df_India['Confirmed'], df_India['State/UnionTerritory']):
    folium.CircleMarker([lat, lon], radius=value*0.002, popup = ('<strong>State</strong>: ' + str(name).capitalize() + '<br>''<strong>Total Cases</strong>: ' + str(value) + '<br>'),color='red',fill_color='red',fill_opacity=0.09 ).add_to(india_map)
india_map

Part 3: Exploring Worldwide Data

3.1 Visualizing Worldwide COVID-19 Cases

# Latest worldwide totals: .iloc[:, -1] is the most recent date column, .sum() adds all rows
world_confirmed = confirmed_df.iloc[:, -1].sum()
world_recovered = recovered_df.iloc[:, -1].sum()
world_deaths = deaths_df.iloc[:, -1].sum()
# Active = Confirmed - Recovered - Deaths
world_active = world_confirmed - world_recovered - world_deaths

labels = ['Active','Recovered','Deceased']
sizes = [world_active,world_recovered,world_deaths]
color= ['blue','green','red']
explode = [0.05] * len(labels)   # pull every wedge out slightly
    
plt.figure(figsize= (15,10))
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=9, explode = explode,colors = color)
centre_circle = plt.Circle((0,0),0.70,fc='white')

fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.title('World COVID-19 Cases',fontsize = 20)
plt.axis('equal')  
plt.tight_layout()


dates
[Timestamp('2020-01-22 00:00:00'), Timestamp('2020-01-23 00:00:00'), ..., Timestamp('2020-09-29 00:00:00'), Timestamp('2020-09-30 00:00:00')]
hotspots = ['China','Germany','Iran','Italy','Spain','US','Korea, South','France','Turkey','United Kingdom','India']
dates = list(confirmed_df.columns[4:])
dates = list(pd.to_datetime(dates))
dates_india = dates[8:]

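# one row per country: provinces/states are summed (numeric columns only, since Province/State is text)
df1 = confirmed_df.groupby('Country/Region').sum(numeric_only=True).reset_index()
df2 = deaths_df.groupby('Country/Region').sum(numeric_only=True).reset_index()
df3 = recovered_df.groupby('Country/Region').sum(numeric_only=True).reset_index()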

global_confirmed = {}
global_deaths = {}
global_recovered = {}
global_active= {}

for country in hotspots:
    k =df1[df1['Country/Region'] == country].loc[:,'1/30/20':]
    global_confirmed[country] = k.values.tolist()[0]

    k =df2[df2['Country/Region'] == country].loc[:,'1/30/20':]
    global_deaths[country] = k.values.tolist()[0]

    k =df3[df3['Country/Region'] == country].loc[:,'1/30/20':]
    global_recovered[country] = k.values.tolist()[0]
    
# for country in hotspots:
#     k = list(map(int.__sub__, global_confirmed[country], global_deaths[country]))
#     global_active[country] = list(map(int.__sub__, k, global_recovered[country]))
    
fig = plt.figure(figsize=(15, 25))
plt.suptitle('Confirmed, Recovered and Deaths in Hotspot Countries and India as of ' + today, fontsize=20, y=1.0)

for i, country in enumerate(hotspots, start=1):
    ax = fig.add_subplot(6, 2, i)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
    # ax.bar(dates_india, global_active[country], color='green', alpha=0.6, label='Active');
    ax.bar(dates_india, global_confirmed[country], color='blue', label='Confirmed');
    ax.bar(dates_india, global_recovered[country], color='grey', label='Recovered');
    ax.bar(dates_india, global_deaths[country], color='red', label='Death');
    ax.set_title(country)

# every panel shows the same three series, so a single figure-level legend is enough
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='upper left')

plt.tight_layout(pad=3.0)

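The commented-out `global_active` loop above computes active cases element by element from Python lists. A minimal vectorized sketch of the same subtraction, assuming JHU-style wide frames like `confirmed_df` (a 'Country/Region' column plus one column per date such as '1/30/20'). The helper `active_by_country` and the toy numbers are hypothetical, made up only to show the mechanics:

```python
import pandas as pd


# hypothetical helper, not part of the original notebook
def active_by_country(confirmed: pd.DataFrame, deaths: pd.DataFrame,
                      recovered: pd.DataFrame, start: str) -> pd.DataFrame:
    """Active cases (confirmed - deaths - recovered) per country and date.

    Provinces are summed into countries first, so the three frames do not
    need identical row sets; the subtraction then aligns on country (index)
    and date (columns).
    """
    def by_country(df: pd.DataFrame) -> pd.DataFrame:
        # keep only the date columns from `start` onwards (drops Lat/Long etc.)
        date_cols = list(df.columns[df.columns.get_loc(start):])
        return df.groupby('Country/Region')[date_cols].sum()

    return by_country(confirmed) - by_country(deaths) - by_country(recovered)


# hypothetical toy data: country X has two provinces in confirmed/deaths
# but a single national row in recovered
confirmed = pd.DataFrame({
    'Province/State': ['A', 'B', None],
    'Country/Region': ['X', 'X', 'Y'],
    '1/29/20': [1, 1, 0],
    '1/30/20': [5, 3, 2],
    '1/31/20': [8, 4, 6],
})
deaths = pd.DataFrame({
    'Province/State': ['A', 'B', None],
    'Country/Region': ['X', 'X', 'Y'],
    '1/29/20': [0, 0, 0],
    '1/30/20': [1, 0, 0],
    '1/31/20': [1, 1, 1],
})
recovered = pd.DataFrame({
    'Province/State': [None, None],
    'Country/Region': ['X', 'Y'],
    '1/29/20': [0, 0],
    '1/30/20': [2, 0],
    '1/31/20': [3, 2],
})

active = active_by_country(confirmed, deaths, recovered, '1/30/20')
print(active)  # X -> 5, 7 and Y -> 2, 3
```

Summing by country before subtracting means the result does not depend on whether each file splits a country into provinces.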

countries = ['China','Germany','Iran','Italy','Spain','US','Korea, South','France','United Kingdom','India']

global_confirmed = []
global_recovered = []
global_deaths = []

for country in countries:
    k =df1[df1['Country/Region'] == country].loc[:,'1/30/20':]
    global_confirmed.append(k.values.tolist()[0]) 

    k =df2[df2['Country/Region'] == country].loc[:,'1/30/20':]
    global_deaths.append(k.values.tolist()[0]) 

    k =df3[df3['Country/Region'] == country].loc[:,'1/30/20':]
    global_recovered.append(k.values.tolist()[0])
plt.figure(figsize= (15,10))
plt.xticks(rotation = 90 ,fontsize = 11)
plt.yticks(fontsize = 10)
plt.xlabel("Dates",fontsize = 20)
plt.ylabel('Total cases',fontsize = 20)
plt.title("Comparison with other Countries" , fontsize = 20)

for i in range(len(countries)):
    # plt.plot_date is discouraged in recent Matplotlib; plot() with 'o' markers draws the same point series
    plt.plot(dates_india, global_confirmed[i], 'o', label=countries[i])
plt.legend();

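The comparison cell above, like the hotspot cell before it, rebuilds per-country lists by filtering `df1`, `df2` and `df3` one country at a time. A hedged alternative sketch, assuming the same JHU wide layout: reshape once into a date-indexed frame with one column per country, which pandas can then plot directly (for example with `series.plot(marker='o', linestyle='')`). The helper `country_timeseries` and the toy frame are hypothetical:

```python
import pandas as pd


# hypothetical helper, not part of the original notebook
def country_timeseries(wide: pd.DataFrame, countries: list, start: str) -> pd.DataFrame:
    """Reshape a JHU-style wide frame into one column per country.

    The date column labels (month/day/two-digit year, e.g. '1/30/20') become
    a DatetimeIndex, so the result can be plotted with DataFrame.plot().
    """
    date_cols = list(wide.columns[wide.columns.get_loc(start):])
    by_country = wide.groupby('Country/Region')[date_cols].sum()
    # .loc with a list raises KeyError for a misspelt country instead of
    # silently dropping it
    out = by_country.loc[countries].T
    out.index = pd.to_datetime(out.index, format='%m/%d/%y')
    out.index.name = 'date'
    return out


# hypothetical toy frame in the same layout as confirmed_df
confirmed = pd.DataFrame({
    'Province/State': ['A', 'B', None],
    'Country/Region': ['X', 'X', 'Y'],
    'Lat': [0.0, 1.0, 2.0],
    'Long': [0.0, 1.0, 2.0],
    '1/29/20': [1, 1, 0],
    '1/30/20': [5, 3, 2],
    '1/31/20': [8, 4, 6],
})

series = country_timeseries(confirmed, ['Y', 'X'], '1/30/20')
print(series)
```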

COVID-19 Symptoms


SOURCE: www.cdc.gov/coronavirus

Data Source:

The latest data can also be extracted from the available APIs by reading their JSON responses. Below is a list of crowd-sourced APIs; extract and use this data to find meaningful insights.

Extracting data from Herokuapp

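api = pd.read_json('https://corona-virus-stats.herokuapp.com/api/v1/cases/countries-search')
# the country records sit under data -> rows in the response
json_data = api['data']['rows']

# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import
data = pd.json_normalize(json_data)
data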
   country  country_abbreviation  total_cases  new_cases  total_deaths  new_deaths  total_recovered  active_cases  serious_critical  cases_per_mill_pop  flag
0  World                            4,525,103      3,077       303,351         269        1,703,742     2,518,010            45,560               581.0  https://upload.wikimedia.org/wikipedia/commons...
1  USA      US                      1,457,593          0        86,912           0          318,027     1,052,654            16,240             4,404.0  https://www.worldometers.info/img/flags/us-fla...
2  Spain    ES                        272,646          0        27,321           0          186,480        58,845             1,376             5,831.0  https://www.worldometers.info/img/flags/sp-fla...
3  Russia   RU                        252,245          0         2,305           0           53,530       196,410             2,300             1,728.0  https://www.worldometers.info/img/flags/rs-fla...
4  UK       GB                        233,151          0        33,614           0              N/A       199,193             1,559             3,434.0  https://www.worldometers.info/img/flags/uk-fla...
5  Italy    IT                        223,096          0        31,368           0          115,288        76,440               855             3,690.0  https://www.worldometers.info/img/flags/it-fla...
6  Brazil   BR                        203,165        247        13,999           6           79,479       109,687             8,318               956.0  https://www.worldometers.info/img/flags/br-fla...
7  France   FR                        178,870          0        27,425           0           59,605        91,840             2,299             2,740.0  https://www.worldometers.info/img/flags/fr-fla...
8  Germany  DE                        174,975          0         7,928           0          150,300        16,747             1,329             2,088.0  https://www.worldometers.info/img/flags/gm-fla...
9  Turkey   TR                        144,749          0         4,007           0          104,030        36,712               963             1,716.0  https://www.worldometers.info/img/flags/tu-fla...
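In the table above the counts arrive as strings with thousands separators, and a missing value appears as 'N/A' (UK `total_recovered`). A small sketch, assuming the columns are plain strings as shown, that converts them to numbers before any arithmetic; the helper `to_number` is hypothetical and the sample rows reuse values from the table:

```python
import pandas as pd


# hypothetical helper, not part of the original notebook
def to_number(col: pd.Series) -> pd.Series:
    """Convert strings like '4,525,103' or '4,404.0' to numbers; 'N/A' becomes NaN."""
    cleaned = col.astype(str).str.replace(',', '', regex=False).str.strip()
    return pd.to_numeric(cleaned, errors='coerce')


# two rows copied from the table above
sample = pd.DataFrame({
    'country': ['World', 'UK'],
    'total_cases': ['4,525,103', '233,151'],
    'total_recovered': ['1,703,742', 'N/A'],
    'cases_per_mill_pop': ['581.0', '3,434.0'],
})

numeric_cols = ['total_cases', 'total_recovered', 'cases_per_mill_pop']
sample[numeric_cols] = sample[numeric_cols].apply(to_number)
print(sample.dtypes)
```

`errors='coerce'` keeps the conversion from failing on placeholders such as 'N/A'; they become NaN and are skipped by pandas aggregations.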

Collecting Data for Statewise Insights

# to parse json contents
import json
# to parse csv files
import csv
import requests
# get response from the web page for LIVE data
response = requests.get('https://api.covid19india.org/raw_data3.json')
# get contents from the response
content = response.content
# parse the json file
parsed = json.loads(content)
# keys
parsed.keys()
dict_keys(['raw_data'])
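Each data-collection cell in this section repeats the same steps: request a URL, parse the JSON, and build a DataFrame from one key. A minimal sketch of a reusable helper, swapping `requests` for the standard-library `urllib.request` so it has no third-party dependency. The helper names `records_to_frame` and `fetch_frame` are hypothetical, and `fetch_frame` assumes the endpoint returns a JSON object whose `key` holds a list of records (as `raw_data3.json` does with 'raw_data'):

```python
import json
from urllib.request import urlopen

import pandas as pd


# hypothetical helper: parsing is kept separate so it can be tested offline
def records_to_frame(payload: bytes, key: str) -> pd.DataFrame:
    """Parse a JSON document and build a DataFrame from the list under `key`."""
    parsed = json.loads(payload)
    if key not in parsed:
        raise KeyError(f'{key!r} not found; available keys: {list(parsed)}')
    return pd.DataFrame(parsed[key])


# hypothetical helper: download and parse in one call
def fetch_frame(url: str, key: str, timeout: float = 30.0) -> pd.DataFrame:
    """Download `url` and return the records under `key` as a DataFrame.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urlopen(url, timeout=timeout) as resp:
        return records_to_frame(resp.read(), key)


# offline check with an inline payload shaped like zones.json
payload = b'{"zones": [{"district": "Nicobars", "zone": "Green"}]}'
zones = records_to_frame(payload, 'zones')
print(zones)
```

With network access, a call such as `fetch_frame('https://api.covid19india.org/raw_data3.json', 'raw_data')` would cover the request, parse and DataFrame steps of these cells in one line.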
# save to df
df = pd.DataFrame(parsed['raw_data'])

# shape of the dataframe
print(df.shape)

# list of columns
print(df.columns)

# first few rows
df.head()
(10020, 20)
Index(['agebracket', 'contractedfromwhichpatientsuspected', 'currentstatus',
       'dateannounced', 'detectedcity', 'detecteddistrict', 'detectedstate',
       'entryid', 'gender', 'nationality', 'notes', 'numcases',
       'patientnumber', 'source1', 'source2', 'source3', 'statecode',
       'statepatientnumber', 'statuschangedate', 'typeoftransmission'],
      dtype='object')
agebracket contractedfromwhichpatientsuspected currentstatus dateannounced detectedcity detecteddistrict detectedstate entryid gender nationality notes numcases patientnumber source1 source2 source3 statecode statepatientnumber statuschangedate typeoftransmission
0 Hospitalized 27/04/2020 West Bengal 1 Details awaited 38 27892 mohfw.gov.in WB
1 Hospitalized 27/04/2020 Bhilwara Rajasthan 2 Details awaited 2 27893 https://twitter.com/ANI/status/125461859651442... RJ
2 Hospitalized 27/04/2020 Jaipur Rajasthan 3 Details awaited 9 27894 https://twitter.com/ANI/status/125461859651442... RJ
3 28 Deceased 27/04/2020 Surajpol Jaipur Rajasthan 4 M Details awaited 1 27895 https://twitter.com/ANI/status/125461859651442... RJ
4 Hospitalized 27/04/2020 Jaisalmer Rajasthan 5 Details awaited 1 27896 https://twitter.com/ANI/status/125461859651442... RJ
# creating patient id column from patient number
# ===============================================

df['p_id'] = df['patientnumber'].apply(lambda x : 'P'+str(x))
df.columns
Index(['agebracket', 'contractedfromwhichpatientsuspected', 'currentstatus',
       'dateannounced', 'detectedcity', 'detecteddistrict', 'detectedstate',
       'entryid', 'gender', 'nationality', 'notes', 'numcases',
       'patientnumber', 'source1', 'source2', 'source3', 'statecode',
       'statepatientnumber', 'statuschangedate', 'typeoftransmission', 'p_id'],
      dtype='object')

Rearrange and rename columns

# order of columns
cols = ['patientnumber', 'p_id', 'statepatientnumber', 
        'dateannounced', 'agebracket', 'gender', 
        'detectedcity', 'detecteddistrict', 'detectedstate', 'statecode', 'nationality',
        'typeoftransmission', 'contractedfromwhichpatientsuspected',
        'statuschangedate', 'currentstatus', 'source1', 'source2', 'source3', 'notes']

# rearrange columns
df = df[cols]

# rename columns
df.columns = ['patient_number', 'p_id', 'state_patient_number', 
              'date_announced', 'age_bracket', 'gender', 
              'detected_city', 'detected_district', 'detected_state', 'state_code', 'nationality',
              'type_of_transmission', 'contracted_from_which_patient_suspected',
              'status_change_date', 'current_status', 'source1', 'source2', 'source3', 'notes']

# dataframe shape
df.shape
(10020, 19)
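The rearrange and rename steps above rely on two parallel 19-item lists staying in the same order. A sketch of an alternative, assuming pandas: keep a single dict from raw to clean names, whose insertion order also fixes the column order. `COLUMN_MAP` below is a hypothetical five-column subset of the full mapping, `select_and_rename` is a hypothetical helper, and the sample row reuses values from the first record shown earlier:

```python
import pandas as pd

# Hypothetical five-column subset of the raw -> clean mapping; the full
# version would list all 19 columns. Insertion order sets the output order.
COLUMN_MAP = {
    'patientnumber': 'patient_number',
    'p_id': 'p_id',
    'dateannounced': 'date_announced',
    'detectedstate': 'detected_state',
    'currentstatus': 'current_status',
}


# hypothetical helper, not part of the original notebook
def select_and_rename(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Keep only the mapped columns, in mapping order, under their new names."""
    return df[list(mapping)].rename(columns=mapping)


# values from the first record shown above; entryid is not in the mapping
raw = pd.DataFrame({
    'currentstatus': ['Hospitalized'],
    'dateannounced': ['27/04/2020'],
    'detectedstate': ['West Bengal'],
    'entryid': ['1'],
    'patientnumber': ['27892'],
    'p_id': ['P27892'],
})

clean = select_and_rename(raw, COLUMN_MAP)
print(clean.columns.tolist())
```

Because each old name sits next to its new name, adding or dropping a column cannot shift the rest of the renaming out of step.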
# first 3 rows of the dataframe
df.head(3)
patient_number p_id state_patient_number date_announced age_bracket gender detected_city detected_district detected_state state_code nationality type_of_transmission contracted_from_which_patient_suspected status_change_date current_status source1 source2 source3 notes
0 27892 P27892 27/04/2020 West Bengal WB Hospitalized mohfw.gov.in Details awaited
1 27893 P27893 27/04/2020 Bhilwara Rajasthan RJ Hospitalized https://twitter.com/ANI/status/125461859651442... Details awaited
2 27894 P27894 27/04/2020 Jaipur Rajasthan RJ Hospitalized https://twitter.com/ANI/status/125461859651442... Details awaited

Missing values

# no. of empty values in each column
# ==================================

print(df.shape, '\n')

for i in df.columns:
    print(i, '\t', df[df[i]==''].shape[0])
(10020, 19) 

patient_number 	 12
p_id 	 0
state_patient_number 	 4882
date_announced 	 0
age_bracket 	 4837
gender 	 5366
detected_city 	 9599
detected_district 	 86
detected_state 	 5
state_code 	 5
nationality 	 10020
type_of_transmission 	 10020
contracted_from_which_patient_suspected 	 9779
status_change_date 	 10020
current_status 	 0
source1 	 65
source2 	 9940
source3 	 9989
notes 	 8089
# no. of non-empty values in each column
# ===================================

print(df.shape, '\n')

for i in df.columns:
    print(i, '\t', df[df[i]!=''].shape[0])
(10020, 19) 

patient_number 	 10008
p_id 	 10020
state_patient_number 	 5138
date_announced 	 10020
age_bracket 	 5183
gender 	 4654
detected_city 	 421
detected_district 	 9934
detected_state 	 10015
state_code 	 10015
nationality 	 0
type_of_transmission 	 0
contracted_from_which_patient_suspected 	 241
status_change_date 	 0
current_status 	 10020
source1 	 9955
source2 	 80
source3 	 31
notes 	 1931
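The two loops above filter the whole frame once per column. A vectorized sketch of the same counts, assuming missing values are still empty strings at this point (cells that are NaN count as non-empty, exactly as in the loops). The helper `emptiness_report` and the sample are hypothetical:

```python
import pandas as pd


# hypothetical helper, not part of the original notebook
def emptiness_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count empty-string and non-empty cells per column in one table."""
    empty = df.eq('').sum()
    return pd.DataFrame({'empty': empty, 'non_empty': len(df) - empty})


# hypothetical sample mirroring the raw patient columns
sample = pd.DataFrame({
    'gender': ['M', '', 'F'],
    'nationality': ['', '', ''],
})

report = emptiness_report(sample)
print(report)
```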
# replacing empty strings with np.nan
# ===================================

print(df.shape)

# exact match on the empty string; as a regex, '' would match the start of every string
df = df.replace('', np.nan)
df.isna().sum()
(10020, 19)

patient_number                                12
p_id                                           0
state_patient_number                        4882
date_announced                                 0
age_bracket                                 4837
gender                                      5366
detected_city                               9599
detected_district                             86
detected_state                                 5
state_code                                     5
nationality                                10020
type_of_transmission                       10020
contracted_from_which_patient_suspected     9779
status_change_date                         10020
current_status                                 0
source1                                       65
source2                                     9940
source3                                     9989
notes                                       8089
dtype: int64
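Raw counts are easier to compare as percentages of the 10,020 rows, and the output above shows three columns (`nationality`, `type_of_transmission`, `status_change_date`) that are missing in every row. A hedged sketch, on a small hypothetical frame, of summarizing missingness and dropping columns that carry no data at all; `missing_summary` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


# hypothetical helper, not part of the original notebook
def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Percentage of NaN values per column, highest first."""
    return df.isna().mean().mul(100).round(1).sort_values(ascending=False)


# hypothetical sample: nationality is empty in every row, as in the full data
sample = pd.DataFrame({
    'detected_state': ['Rajasthan', 'West Bengal', np.nan, 'Rajasthan'],
    'gender': ['M', np.nan, np.nan, 'F'],
    'nationality': [np.nan] * 4,
})

pct = missing_summary(sample)
print(pct)

# drop columns that are missing in every row
trimmed = sample.dropna(axis=1, how='all')
print(list(trimmed.columns))
```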
# dropping empty rows (rows with a row number but no patient entry)
# ==================================================================

print(df.shape)

# left commented out, so the shape printed below is unchanged
# df.dropna(subset=['detected_state'], inplace=True)

print(df.shape)
df.isna().sum()
(10020, 19)
(10020, 19)

patient_number                                12
p_id                                           0
state_patient_number                        4882
date_announced                                 0
age_bracket                                 4837
gender                                      5366
detected_city                               9599
detected_district                             86
detected_state                                 5
state_code                                     5
nationality                                10020
type_of_transmission                       10020
contracted_from_which_patient_suspected     9779
status_change_date                         10020
current_status                                 0
source1                                       65
source2                                     9940
source3                                     9989
notes                                       8089
dtype: int64
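The `dropna(..., inplace=True)` call above is commented out, which is why both shapes print as (10020, 19). If it is re-enabled, it can help to look at the rows that would be discarded (5 rows lack `detected_state` in the output above) before dropping them. A minimal sketch with a hypothetical helper `split_on_missing` that splits a frame into kept and dropped rows without modifying it in place; the sample rows are made up:

```python
import numpy as np
import pandas as pd


# hypothetical helper, not part of the original notebook
def split_on_missing(df: pd.DataFrame, required: list) -> tuple:
    """Split `df` into (rows with all `required` values, rows missing any)."""
    complete = df[required].notna().all(axis=1)
    return df[complete], df[~complete]


# hypothetical sample with one row lacking a state
sample = pd.DataFrame({
    'p_id': ['P1', 'P2', 'P3'],
    'detected_state': ['West Bengal', 'Rajasthan', np.nan],
})

kept, dropped = split_on_missing(sample, ['detected_state'])
print(dropped)  # inspect before discarding
print(kept.shape)
```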

Save data

# save to csv
df.to_csv('patients_data.csv', index=False)
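Dates in `date_announced` are stored as day-first strings such as '27/04/2020'. When the CSV is read back for analysis, they can be parsed with an explicit format so that '01/05/2020' is read as 1 May rather than 5 January. A sketch with a hypothetical helper `parse_announced`; `errors='coerce'` turns blanks into NaT rather than raising:

```python
import pandas as pd


# hypothetical helper, not part of the original notebook
def parse_announced(df: pd.DataFrame, col: str = 'date_announced') -> pd.DataFrame:
    """Return a copy of `df` with `col` parsed from dd/mm/yyyy strings."""
    out = df.copy()
    # explicit day-first format; unparseable or blank values become NaT
    out[col] = pd.to_datetime(out[col], format='%d/%m/%Y', errors='coerce')
    return out


# hypothetical sample: one date from the table above, one ambiguous date, one blank
sample = pd.DataFrame({'date_announced': ['27/04/2020', '01/05/2020', '']})
parsed_dates = parse_announced(sample)
print(parsed_dates['date_announced'].tolist())
```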

Collecting Some more Statewise Data

# get response from the web page
response = requests.get('https://api.covid19india.org/state_test_data.json')

# get contents from the response
content = response.content

# parse the json file
parsed = json.loads(content)

# keys
parsed.keys()
dict_keys(['states_tested_data'])
# save data in a dataframe
th = pd.DataFrame(parsed['states_tested_data'])

# display the dataframe (pandas truncates it to the first and last rows)
th
antigentests coronaenquirycalls cumulativepeopleinquarantine negative numcallsstatehelpline numicubeds numisolationbeds numventilators othertests peopleinicu ... testsperpositivecase testsperthousand totaln95masks totalpeoplecurrentlyinquarantine totalpeoplereleasedfromquarantine totalppe totaltested unconfirmed updatedon _djhdx
0 1210 50 ... 117 3.53 1403 181 17/04/2020 NaN
1 280 50 ... 99 6.75 614 347 2679 246 24/04/2020 NaN
2 298 50 ... 86 7.17 724 420 2848 106 27/04/2020 NaN
3 340 50 ... 114 9.46 643 556 3754 199 01/05/2020 NaN
4 471 98 ... 202 16.82 16 1196 6677 136 16/05/2020 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5803 1057947 1243 12675 790 ... 2298040 2444 107697 2370262 3098657 27/09/2020 NaN
5804 1082706 1243 12675 790 ... 2308040 2447 107712 2375262 3139938 28/09/2020 NaN
5805 1107750 1243 12675 790 ... 2323040 2439 107706 2381262 3183697 29/09/2020 NaN
5806 1131535 1243 12715 790 ... 2335040 2442 107721 2387262 3227462 30/09/2020 NaN
5807 1155254 1243 12715 790 ... 2350040 2439 107726 2393262 3271316 01/10/2020 NaN

5808 rows × 32 columns

th.columns
Index(['antigentests', 'coronaenquirycalls', 'cumulativepeopleinquarantine',
       'negative', 'numcallsstatehelpline', 'numicubeds', 'numisolationbeds',
       'numventilators', 'othertests', 'peopleinicu', 'peopleonventilators',
       'populationncp2019projection', 'positive', 'rtpcrtests', 'source1',
       'source2', 'source3', 'state', 'tagpeopleinquarantine',
       'tagtotaltested', 'testpositivityrate', 'testspermillion',
       'testsperpositivecase', 'testsperthousand', 'totaln95masks',
       'totalpeoplecurrentlyinquarantine', 'totalpeoplereleasedfromquarantine',
       'totalppe', 'totaltested', 'unconfirmed', 'updatedon', '_djhdx'],
      dtype='object')
# save to csv
th.to_csv('tests_latest_state_level.csv', index=False)
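`th` holds one row per state per update date, with every value stored as a string (blank where not reported). A sketch, assuming the `state` and `updatedon` columns listed in `th.columns` and day-first dates like '17/04/2020', that converts chosen columns to numbers and keeps each state's most recent row. The helper `latest_per_state` and the toy rows are hypothetical:

```python
import pandas as pd


# hypothetical helper, not part of the original notebook
def latest_per_state(th: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Return each state's most recent row with `numeric_cols` as numbers."""
    out = th.copy()
    # blanks and other non-numeric strings become NaN
    out[numeric_cols] = out[numeric_cols].apply(pd.to_numeric, errors='coerce')
    out['updatedon'] = pd.to_datetime(out['updatedon'], format='%d/%m/%Y', errors='coerce')
    latest = out.sort_values('updatedon').groupby('state').tail(1)
    return latest.sort_values('state').reset_index(drop=True)


# hypothetical rows in the same shape as `th` (strings, day-first dates)
sample = pd.DataFrame({
    'state': ['State A', 'State A', 'State B'],
    'totaltested': ['1403', '2679', ''],
    'updatedon': ['17/04/2020', '24/04/2020', '01/05/2020'],
})

latest = latest_per_state(sample, ['totaltested'])
print(latest)
```

Sorting by the parsed date before `groupby(...).tail(1)` matters: sorting the raw 'dd/mm/yyyy' strings would order by day of month, not by date.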
# to get web contents
import requests
# to parse json contents
import json
# to parse csv files
import csv

Zones

# get response from the web page
response = requests.get('https://api.covid19india.org/zones.json')

# get contents from the response
content = response.content

# parse the json file
parsed = json.loads(content)

# keys
parsed.keys()
dict_keys(['zones'])
zo = pd.DataFrame(parsed['zones'])
zo.head()
   district                  districtcode                 lastupdated  source                                             state                        statecode  zone
0  Nicobars                  AN_Nicobars                  01/05/2020   https://www.facebook.com/airnewsalerts/photos/...  Andaman and Nicobar Islands  AN         Green
1  North and Middle Andaman  AN_North and Middle Andaman  01/05/2020   https://www.facebook.com/airnewsalerts/photos/...  Andaman and Nicobar Islands  AN         Green
2  South Andaman             AN_South Andaman             01/05/2020   https://www.facebook.com/airnewsalerts/photos/...  Andaman and Nicobar Islands  AN         Red
3  Anantapur                 AP_Anantapur                 01/05/2020   https://www.facebook.com/airnewsalerts/photos/...  Andhra Pradesh               AP         Orange
4  Chittoor                  AP_Chittoor                  01/05/2020   https://www.facebook.com/airnewsalerts/photos/...  Andhra Pradesh               AP         Red
# save to csv
zo.to_csv('zones.csv', index=False)
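With one row per district, the zone labels can be summarized per state using `pd.crosstab`. A short sketch on the five rows shown above; the helper `zone_counts` is hypothetical:

```python
import pandas as pd


# hypothetical helper, not part of the original notebook
def zone_counts(zo: pd.DataFrame) -> pd.DataFrame:
    """Districts per zone colour for each state, with row and column totals."""
    return pd.crosstab(zo['state'], zo['zone'], margins=True, margins_name='Total')


# the five districts shown in the table above
sample = pd.DataFrame({
    'district': ['Nicobars', 'North and Middle Andaman', 'South Andaman',
                 'Anantapur', 'Chittoor'],
    'state': ['Andaman and Nicobar Islands'] * 3 + ['Andhra Pradesh'] * 2,
    'zone': ['Green', 'Green', 'Red', 'Orange', 'Red'],
})

counts = zone_counts(sample)
print(counts)
```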

National level daily

response = requests.get('https://api.covid19india.org/data.json')
content = response.content
parsed = json.loads(content)
parsed.keys()
dict_keys(['cases_time_series', 'statewise', 'tested'])
national = pd.DataFrame(parsed['cases_time_series'])
national.head()
   dailyconfirmed  dailydeceased  dailyrecovered         date  totalconfirmed  totaldeceased  totalrecovered
0               1              0               0   30 January               1              0               0
1               0              0               0   31 January               1              0               0
2               0              0               0  01 February               1              0               0
3               1              0               0  02 February               2              0               0
4               1              0               0  03 February               3              0               0
national.columns
Index(['dailyconfirmed', 'dailydeceased', 'dailyrecovered', 'date',
       'totalconfirmed', 'totaldeceased', 'totalrecovered'],
      dtype='object')
national = national[['date', 'totalconfirmed', 'totaldeceased', 'totalrecovered', 
                     'dailyconfirmed', 'dailydeceased', 'dailyrecovered']]
national.head()
          date  totalconfirmed  totaldeceased  totalrecovered  dailyconfirmed  dailydeceased  dailyrecovered
0   30 January               1              0               0               1              0               0
1   31 January               1              0               0               0              0               0
2  01 February               1              0               0               0              0               0
3  02 February               2              0               0               1              0               0
4  03 February               3              0               0               1              0               0
# save to csv
national.to_csv('nation_level_daily.csv', index=False)
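The `date` column holds day and month only ('30 January'), and the counts are strings. A sketch that attaches a full date and a rolling average of `dailyconfirmed`, assuming every row belongs to 2020 (true for the rows shown, an assumption for the rest of the series) and English month names. The helper `add_dates_and_trend` is hypothetical, and the sample reuses the first five rows above:

```python
import pandas as pd


# hypothetical helper, not part of the original notebook
def add_dates_and_trend(national: pd.DataFrame, year: int = 2020,
                        window: int = 7) -> pd.DataFrame:
    """Attach a full datetime and a rolling mean of daily confirmed cases.

    Assumes every 'date' value (e.g. '30 January') falls in `year` and uses
    English month names; a series spanning several years needs another rule.
    """
    out = national.copy()
    out['date'] = pd.to_datetime(out['date'].str.strip() + f' {year}', format='%d %B %Y')
    daily = pd.to_numeric(out['dailyconfirmed'])
    # min_periods=1 averages over the days available at the start of the series
    out[f'dailyconfirmed_{window}d_avg'] = daily.rolling(window, min_periods=1).mean()
    return out


# the first five rows shown above (values are strings, as in the API)
sample = pd.DataFrame({
    'date': ['30 January', '31 January', '01 February', '02 February', '03 February'],
    'dailyconfirmed': ['1', '0', '0', '1', '1'],
})

trend = add_dates_and_trend(sample, window=3)
print(trend)
```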

State level latest

state_level = pd.DataFrame(parsed['statewise'])
state_level.head()
   active  confirmed  deaths  deltaconfirmed  deltadeaths  deltarecovered  lastupdatedtime      migratedother  recovered  state           statecode  statenotes
0  945551    6397896   99833            5936           29            2941  02/10/2020 12:20:45            918    5351594  Total           TT
1  259006    1400922   37056               0            0               0  01/10/2020 23:39:43            434    1104426  Maharashtra     MH         [Sep 9] :239 cases have been removed from the ...
2   57858     700235    5869               0            0               0  01/10/2020 22:07:48              0     636508  Andhra Pradesh  AP
3   46369     603290    9586               0            0               0  01/10/2020 18:46:44              0     547335  Tamil Nadu      TN         [July 22]: 444 backdated deceased entries adde...
4  110412     611837    8994               0            0               0  01/10/2020 22:07:49             19     492412  Karnataka       KA
state_level.columns
Index(['active', 'confirmed', 'deaths', 'deltaconfirmed', 'deltadeaths',
       'deltarecovered', 'lastupdatedtime', 'migratedother', 'recovered',
       'state', 'statecode', 'statenotes'],
      dtype='object')
state_level = state_level[['state', 'statecode', 'lastupdatedtime',  
                           'confirmed', 'active', 'deaths', 'recovered',
                           'deltaconfirmed', 'deltadeaths', 'deltarecovered', 'statenotes']]
state_level.head()
   state           statecode  lastupdatedtime      confirmed  active  deaths  recovered  deltaconfirmed  deltadeaths  deltarecovered  statenotes
0  Total           TT         02/10/2020 12:20:45    6397896  945551   99833    5351594            5936           29            2941
1  Maharashtra     MH         01/10/2020 23:39:43    1400922  259006   37056    1104426               0            0               0  [Sep 9] :239 cases have been removed from the ...
2  Andhra Pradesh  AP         01/10/2020 22:07:48     700235   57858    5869     636508               0            0               0
3  Tamil Nadu      TN         01/10/2020 18:46:44     603290   46369    9586     547335               0            0               0  [July 22]: 444 backdated deceased entries adde...
4  Karnataka       KA         01/10/2020 22:07:49     611837  110412    8994     492412               0            0               0
# save to csv
state_level.to_csv('state_level_latest.csv', index=False)
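For the five rows printed above, `active` equals `confirmed - deaths - recovered - migratedother` exactly (for Maharashtra: 1400922 - 37056 - 1104426 - 434 = 259006). A sketch that checks this identity for every row and adds case-fatality and recovery percentages. It assumes the raw `pd.DataFrame(parsed['statewise'])` frame, because the reordered `state_level` drops `migratedother`; the helper `add_rates` is hypothetical, and the sample reuses three rows from the table:

```python
import pandas as pd

COUNT_COLS = ['confirmed', 'active', 'deaths', 'recovered', 'migratedother']


# hypothetical helper, not part of the original notebook
def add_rates(statewise: pd.DataFrame) -> pd.DataFrame:
    """Convert counts to integers, check the active-case identity, add percentages."""
    out = statewise.copy()
    out[COUNT_COLS] = out[COUNT_COLS].apply(pd.to_numeric)
    expected_active = (out['confirmed'] - out['deaths']
                       - out['recovered'] - out['migratedother'])
    out['active_matches'] = expected_active == out['active']
    out['case_fatality_pct'] = (100 * out['deaths'] / out['confirmed']).round(2)
    out['recovery_pct'] = (100 * out['recovered'] / out['confirmed']).round(2)
    return out


# three rows copied from the statewise table above (strings, as in the API)
sample = pd.DataFrame({
    'state': ['Total', 'Maharashtra', 'Karnataka'],
    'confirmed': ['6397896', '1400922', '611837'],
    'active': ['945551', '259006', '110412'],
    'deaths': ['99833', '37056', '8994'],
    'recovered': ['5351594', '1104426', '492412'],
    'migratedother': ['918', '434', '19'],
})

rates = add_rates(sample)
print(rates[['state', 'active_matches', 'case_fatality_pct', 'recovery_pct']])
```

A row where `active_matches` is False would point at a reporting adjustment worth reading about in `statenotes`.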
