COVID Tracking project
The COVID Tracking Project dataset provides the latest numbers on tests, confirmed cases, hospitalizations, and patient outcomes from every US state and territory.
For more information about this dataset, see the project GitHub repository.
Note
Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.
This dataset is provided under the original terms that Microsoft received source data. The dataset may include data sourced from Microsoft.
Datasets
Modified versions of the dataset are available in CSV, JSON, JSON-Lines, and Parquet.
- https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/covid_tracking/latest/covid_tracking.csv
- https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/covid_tracking/latest/covid_tracking.json
- https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/covid_tracking/latest/covid_tracking.jsonl
- https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/covid_tracking/latest/covid_tracking.parquet
All modified versions have ISO 3166 subdivision codes and load times added, and use lower case column names with underscore separators.
Raw data: 'https://pandemicdatalake.blob.core.windows.net/public/raw/covid-19/covid_tracking/latest/daily.json'
Previous versions of modified and raw data: https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/covid_tracking/
https://pandemicdatalake.blob.core.windows.net/public/raw/covid-19/covid_tracking/
Data volume
All datasets are updated daily. As of May 13, 2020 they contained 4,100 rows (CSV 574 KB, JSON 1.8 MB, JSONL 1.8 MB, Parquet 334 KB).
Data source
This data is originally published by the COVID Tracking Project at the Atlantic. Raw data is ingested from the COVID Tracking GitHub repo using the states_daily_4p_et.csv file. For more information on this dataset including its origins from the COVID Tracking Project API, see the project GitHub repository.
Data quality
COVID Tracking Project grades the data quality for each state and provides further information about their assessment of the quality of the data. For more information, see the COVID Tracking Project data page. Data in the GitHub repository may be an hour behind the API; use of the API is necessary to access the most recent data.
License and use rights attribution
This data is licensed under the terms and conditions of the Apache License 2.0.
Any use of the data must retain all copyright, patent, trademark, and attribution notices.
Contact
For any questions or feedback about this or other datasets in the COVID-19 Data Lake, contact askcovid19dl@microsoft.com.
Columns
Name | Data type | Unique | Values (sample) | Description |
---|---|---|---|---|
date | date | 420 | 2020-11-10 2021-01-30 | Date for which the daily totals were collected. |
date_checked | string | 9,487 | 2020-12-01T00:00:00Z 2020-09-01T00:00:00Z | Deprecated |
death | smallint | 7,327 | 2 5 | Total number of people who have died as a result of COVID-19 so far. |
death_increase | smallint | 429 | 1 2 | Deprecated |
fips | smallint | 56 | 26 55 | Census FIPS code for the state. |
fips_code | string | 60 | 53 25 | Census FIPS code for the state. |
hash | string | 20,780 | 63df8cccd23a5476bab2d8111b138e4c9becd35e c606cd6990f16086b5382e12d84f6206172d493d | A hash for this record |
hospitalized | int | 7,641 | 89995 4 | Deprecated |
hospitalized_cumulative | int | 7,641 | 89995 4 | Total number of people who have gone to the hospital for COVID-19 so far, including those who have since recovered or died. |
hospitalized_currently | smallint | 3,886 | 8 13 | Number of people in hospital for COVID-19 on this day. |
hospitalized_increase | smallint | 615 | 1 2 | Deprecated |
in_icu_cumulative | smallint | 2,295 | 990 220 | Total number of people who have gone to the ICU for COVID-19 so far, including those who have since recovered or died. |
in_icu_currently | smallint | 1,643 | 2 8 | Total number of people in the ICU for COVID-19 on this day. |
iso_country | string | 1 | US | ISO 3166 country or region code |
iso_subdivision | string | 57 | US-UM US-WA | ISO 3166 subdivision code |
last_update_et | timestamp | 9,487 | 2020-12-01 00:00:00 2020-09-01 00:00:00 | Last time the day’s data was updated |
load_time | timestamp | 1 | 2021-04-26 00:06:49.883000 | Date and time the data was loaded to Azure from the source |
negative | int | 10,864 | 305972 2140 | Total number of people who have tested negative for COVID-19 so far. |
negative_increase | int | 7,328 | 6 17 | Deprecated |
on_ventilator_cumulative | smallint | 677 | 411 412 | Total number of people who have used a ventilator for COVID-19 so far, including those who have since recovered or died. |
on_ventilator_currently | smallint | 837 | 4 10 | Number of people using a ventilator for COVID-19 on this day. |
pending | smallint | 944 | 2 17 | Number of tests whose results have yet to be determined. |
pos_neg | int | 18,282 | 2140 2 | Deprecated |
positive | int | 16,837 | 2 1 | Total number of people who have tested positive for COVID-19 so far. |
positive_increase | smallint | 4,754 | 1 2 | Deprecated |
recovered | int | 8,286 | 29 19 | Total number of people who have recovered from COVID-19 so far. |
state | string | 56 | MI PA | Two-letter code for the state. |
total | int | 18,283 | 2140 2 | Deprecated |
total_test_results | int | 18,648 | 2140 3 | Total test results provided by the State |
total_test_results_increase | int | 13,463 | 1 2 | Deprecated |
Preview
date | state | positive | hospitalized_currently | hospitalized_cumulative | on_ventilator_currently | data_quality_grade | last_update_et | hash | date_checked | death | hospitalized | total | total_test_results | pos_neg | fips | death_increase | hospitalized_increase | negative_increase | positive_increase | total_test_results_increase | fips_code | iso_subdivision | load_time | iso_country | negative | in_icu_cumulative | on_ventilator_cumulative | recovered | in_icu_currently |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-03-07 | AK | 56886 | 33 | 1293 | 2 | null | 3/5/2021 3:59:00 AM | dc4bccd4bb885349d7e94d6fed058e285d4be164 | 3/5/2021 3:59:00 AM | 305 | 1293 | 56886 | 1731628 | 56886 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | US-AK | 4/26/2021 12:06:49 AM | US | |||||
2021-03-07 | AL | 499819 | 494 | 45976 | null | 3/7/2021 11:00:00 AM | 997207b430824ea40b8eb8506c19a93e07bc972e | 3/7/2021 11:00:00 AM | 10148 | 45976 | 2431530 | 2323788 | 2431530 | 1 | -1 | 0 | 2087 | 408 | 2347 | 1 | US-AL | 4/26/2021 12:06:49 AM | US | 1931711 | 2676 | 1515 | 295690 | ||
2021-03-07 | AR | 324818 | 335 | 14926 | 65 | null | 3/7/2021 12:00:00 AM | 50921aeefba3e30d31623aa495b47fb2ecc72fae | 3/7/2021 12:00:00 AM | 5319 | 14926 | 2805534 | 2736442 | 2805534 | 5 | 22 | 11 | 3267 | 165 | 3380 | 5 | US-AR | 4/26/2021 12:06:49 AM | US | 2480716 | 1533 | 315517 | 141 | |
2021-03-07 | AS | 0 | null | 12/1/2020 12:00:00 AM | 96d23f888c995b9a7f3b4b864de6414f45c728ff | 12/1/2020 12:00:00 AM | 0 | 2140 | 2140 | 2140 | 60 | 0 | 0 | 0 | 0 | 0 | 60 | US-AS | 4/26/2021 12:06:49 AM | US | 2140 | ||||||||
2021-03-07 | AZ | 826454 | 963 | 57907 | 143 | null | 3/7/2021 12:00:00 AM | 0437a7a96f4471666f775e63e86923eb5cbd8cdf | 3/7/2021 12:00:00 AM | 16328 | 57907 | 3899464 | 7908105 | 3899464 | 4 | 5 | 44 | 13678 | 1335 | 45110 | 4 | US-AZ | 4/26/2021 12:06:49 AM | US | 3073010 | 273 | |||
2021-03-07 | CA | 3501394 | 4291 | null | 3/7/2021 2:59:00 AM | 63c5c0fd2daef2fb65150e9db486de98ed3f7b72 | 3/7/2021 2:59:00 AM | 3501394 | 49646014 | 3501394 | 6 | 258 | 0 | 0 | 3816 | 133186 | 6 | US-CA | 4/26/2021 12:06:49 AM | US | 1159 | ||||||||
2021-03-07 | CO | 436602 | 326 | 23904 | null | 3/7/2021 1:59:00 AM | 444746cda3a596f183f3fa3269c8cab68704e819 | 3/7/2021 1:59:00 AM | 5989 | 23904 | 2636060 | 6415123 | 2636060 | 8 | 3 | 18 | 0 | 840 | 38163 | 8 | US-CO | 4/26/2021 12:06:49 AM | US | 2199458 | |||||
2021-03-07 | CT | 285330 | 428 | 12257 | null | 3/4/2021 11:59:00 PM | bcc0f7bc8c2bf77eec31b25f8b59d510f679d3e7 | 3/4/2021 11:59:00 PM | 7704 | 12257 | 285330 | 6520366 | 285330 | 9 | 0 | 0 | 0 | 0 | 0 | 9 | US-CT | 4/26/2021 12:06:49 AM | US | ||||||
2021-03-07 | DC | 41419 | 150 | 16 | null | 3/6/2021 12:00:00 AM | a3aa0d623d538807fb9577ad64354f48cf728cc8 | 3/6/2021 12:00:00 AM | 1030 | 41419 | 1261363 | 41419 | 11 | 0 | 0 | 0 | 146 | 5726 | 11 | US-DC | 4/26/2021 12:06:49 AM | US | 29570 | 38 | |||||
2021-03-07 | DE | 88354 | 104 | null | 3/6/2021 6:00:00 PM | 059d870e689d5cc19c35f5eb398214d7d9856373 | 3/6/2021 6:00:00 PM | 1473 | 633424 | 1431942 | 633424 | 10 | 9 | 0 | 917 | 215 | 5867 | 10 | US-DE | 4/26/2021 12:06:49 AM | US | 545070 | 13 |
Data access
Azure Notebooks
URLs of different dataset file formats hosted on Azure Blob Storage:
Download the dataset file using the built-in capability download from an http URL in Pandas. Pandas has readers for various file formats:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_parquet.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
df = pd.read_parquet("https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/covid_tracking/latest/covid_tracking.parquet ")
df.head(10)
df.dtypes
df.groupby('state').first().filter(['date','positive', 'death'])
df.groupby(df.state).agg({'state': 'count','positive_increase': 'sum','death_increase': 'sum'})
df_NY=df[df['state'] == 'NY']
df_NY.plot(kind='line',x='date',y="positive",grid=True)
df_NY.plot(kind='line',x='date',y="positive_increase",grid=True)
df_NY.plot(kind='line',x='date',y="death",grid=True)
df_NY.plot(kind='line',x='date',y="death_increase",grid=True)
df_US=df.groupby(df.date).agg({'positive': 'sum','positive_increase': 'sum','death':'sum','death_increase': 'sum'}).reset_index()
df_US.plot(kind='line',x='date',y="positive",grid=True)
df_US.plot(kind='line',x='date',y="positive_increase",grid=True)
df_US.plot(kind='line',x='date',y="death",grid=True)
df_US.plot(kind='line',x='date',y="death_increase",grid=True)
Azure Databricks
Sample not available for this platform/package combination.
Azure Synapse
Sample not available for this platform/package combination.
Next steps
View the rest of the datasets in the Open Datasets catalog.