Improve the accuracy of your machine learning models with publicly available datasets. Save time on data discovery and preparation by using curated datasets that are ready to use in machine learning projects.
The COVID-19 Data Lake is a collection of COVID-19 related datasets from various sources, covering testing and patient outcome tracking data, social distancing policies, hospital capacity, mobility, and more.
The Genomics Data Lake provides various public datasets that you can access for free and integrate into your genomics analysis workflows and applications. The datasets include genome sequences, variant information, and subject/sample metadata in BAM, FASTA, VCF, and CSV file formats.
The US Local Area Unemployment Statistics dataset provides monthly and annual employment, unemployment, and labor force data for Census regions and divisions, states, counties, metropolitan areas, and many cities in the United States.
This dataset is derived from the Dominick’s OJ dataset and includes extra simulated data with the goal of providing a dataset that makes it easy to simultaneously train thousands of models on Azure Machine Learning.
Microsoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It serves as a benchmark for news recommendation and supports research on recommender systems more broadly.
Russian Open STT is a large-scale open speech-to-text dataset for the Russian language.