Bing COVID-19
Bing COVID-19 data includes confirmed, fatal, and recovered cases from all regions, updated daily. The Bing COVID-19 Tracker reflects this data.
Bing collects data from multiple trusted, reliable sources, including:
- BNO News
- Centers for Disease Control and Prevention (CDC)
- National/regional and state public health departments
- Wikipedia
- The World Health Organization (WHO)
- 24/7 Wall St.
Note
Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.
This dataset is provided under the original terms that Microsoft received source data. The dataset may include data sourced from Microsoft.
Datasets
Modified Bing COVID-19 datasets are available in CSV, JSON, JSON-Lines, and Parquet:
All modified datasets have ISO 3166 subdivision codes and load times added. They use lower case column names with underscore separators.
Earlier versions of modified and raw data are available at this resource.
Data volume
All datasets receive daily updates. As of March 5, 2023 they contained 4,766,737 rows. The dataset is available in these file formats:
- CSV (560.3 MB)
- JSON (1515.6 MB)
- JSONL (1506.2 MB)
- Parquet (55.4 MB)
License and use rights attribution
The data is available strictly for educational and academic purposes under these terms and conditions. Valid purposes include:
- academic institutions
- government agencies
- medical research
Data used or cited in publications should include an attribution to ‘Bing COVID-19 Tracker’, with a link to www.bing.com/covid.
Contact
For any questions or feedback about this or other datasets in the COVID-19 Data Lake contact askcovid19dl@microsoft.com.
Columns
Name | Data type | Unique | Values (sample) | Description |
---|---|---|---|---|
admin_region_1 | string | 864 | Texas Georgia | Region within country_region |
admin_region_2 | string | 3,143 | Washington County Jefferson County | Region within admin_region_1 |
confirmed | int | 120,692 | 1 2 | Confirmed case count for the region |
confirmed_change | int | 12,120 | 1 2 | Change of confirmed case count from the previous day |
country_region | string | 237 | United States India | Country/region |
deaths | int | 20,616 | 1 2 | Death case count for the region |
deaths_change | smallint | 1,981 | 1 2 | Change of death count from the previous day |
id | int | 1,783,534 | 742546 69019298 | Unique identifier |
iso_subdivision | string | 484 | US-TX US-GA | Two-part ISO subdivision code |
iso2 | string | 226 | US IN | 2 letter country code identifier |
iso3 | string | 226 | USA IND | 3 letter country code identifier |
latitude | double | 5,675 | 42.28708 19.59852 | Latitude of the centroid of the region |
load_time | timestamp | 1 | 2021-04-26 00:06:34.719000 | The date and time the file was loaded from the Bing source on GitHub |
longitude | double | 5,693 | -2.5396 -155.5186 | Longitude of the centroid of the region |
recovered | int | 73,287 | 1 2 | Recovered count for the region |
recovered_change | int | 10,441 | 1 2 | Change of recovered case count from the previous day |
updated | date | 457 | 2021-04-23 2021-04-22 | The as at date for the record |
Preview
id | updated | confirmed | deaths | iso2 | iso3 | country_region | admin_region_1 | iso_subdivision | admin_region_2 | load_time | confirmed_change | deaths_change |
---|---|---|---|---|---|---|---|---|---|---|---|---|
338995 | 2020-01-21 | 262 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | ||
338996 | 2020-01-22 | 313 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | 51 | 0 |
338997 | 2020-01-23 | 578 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | 265 | 0 |
338998 | 2020-01-24 | 841 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | 263 | 0 |
338999 | 2020-01-25 | 1320 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | 479 | 0 |
339000 | 2020-01-26 | 2014 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | 694 | 0 |
339001 | 2020-01-27 | 2798 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | 784 | 0 |
339002 | 2020-01-28 | 4593 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | 1795 | 0 |
339003 | 2020-01-29 | 6065 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | 1472 | 0 |
339004 | 2020-01-30 | 7818 | 0 | null | null | Worldwide | null | null | null | 4/26/2021 12:06:34 AM | 1753 | 0 |
Data access - Azure Notebooks
Note
This notebook documents the URLs and sample code to access the Bing COVID-19 Dataset.
Use these URLs to get specific file formats hosted on Azure Blob Storage:
Download the dataset file using the built-in capability of Pandas to download from an HTTP URL. Pandas has readers for various file formats:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
df = pd.read_parquet("https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet")
df.head(10)
To verify that the updated column has a datetime format, check the data types of the various fields:
df.dtypes
Review the Worldwide data. To visualize the data, build some charts:
df_Worldwide=df[df['country_region']=='Worldwide']
df_Worldwide_pivot=df_Worldwide.pivot_table(df_Worldwide, index=['country_region','updated'])
df_Worldwide_pivot
df_Worldwide.plot(kind='line',x='updated',y="confirmed",grid=True)
df_Worldwide.plot(kind='line',x='updated',y="deaths",grid=True)
df_Worldwide.plot(kind='line',x='updated',y="confirmed_change",grid=True)
df_Worldwide.plot(kind='line',x='updated',y="deaths_change",grid=True)
Data Access - Azure Databricks
A sample isn't available for this platform / package combination.
Data Access - Azure Synapse
A sample isn't available for this platform / package combination.
Next steps
View the rest of the datasets in the Open Datasets catalog.