Example Jupyter notebooks show how to enrich data with Open Datasets

Article
08/28/2024

The example Jupyter notebooks for Azure Open Datasets show how to load open datasets and use them to enrich demo data. The techniques include using Apache Spark and Pandas to process data.

Important

When you work in a non-Spark environment, certain Open Datasets classes allow downloads of only one month of data at a time, to avoid MemoryError problems with large datasets.
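
To work within a one-month limit, a longer date range can be split into calendar-month windows that are requested one at a time. The following is a minimal sketch using only the Python standard library; `month_windows` is a hypothetical helper written for this illustration and is not part of Open Datasets.

```python
from datetime import date, timedelta


def month_windows(start: date, end: date):
    """Yield (first_day, last_day) pairs, one per calendar month, covering start..end inclusive."""
    current = start
    while current <= end:
        # Work out the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The window ends on the last day of this month, or at `end` if that comes first.
        last_day = min(next_month - timedelta(days=1), end)
        yield current, last_day
        current = next_month


# Example: a range that spans parts of three months.
windows = list(month_windows(date(2024, 1, 15), date(2024, 3, 10)))
for first_day, last_day in windows:
    print(first_day, last_day)
```

Each pair could then serve as the start and end date for a single download, with the resulting monthly dataframes concatenated afterward if they fit in memory. The download call is left out so that the sketch runs without extra packages or network access.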

Load NOAA Integrated Surface Database (ISD) data

Notebook	Description
Load one recent month of weather data into a Pandas dataframe	Learn how to load historical weather data into a Pandas dataframe.
Load one recent month of weather data into a Spark dataframe	Learn how to load historical weather data into a Spark dataframe.

Join demo data with NOAA ISD data

Notebook	Description
Join demo data with weather data - Pandas	Join a one-month demo dataset of sensor locations with weather readings in a Pandas dataframe.
Join demo data with weather data - Spark	Join a demo dataset of sensor locations with weather readings in a Spark dataframe.
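
As a rough illustration of what joining sensor locations with weather readings can involve, the sketch below pairs each sensor with its nearest weather station and then attaches the most recent reading from that station. It uses plain pandas and is not the code from the notebooks; all dataframes, column names, and values are invented for the example, and the squared difference in degrees is only a crude stand-in for a real distance.

```python
import pandas as pd

# Hypothetical weather stations and readings (values invented for illustration).
stations = pd.DataFrame({
    "station_id": ["S1", "S2"],
    "lat": [47.6, 40.7],
    "lon": [-122.3, -74.0],
})
weather = pd.DataFrame({
    "station_id": ["S1", "S1", "S2", "S2"],
    "time": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00",
                            "2024-01-01 00:00", "2024-01-01 01:00"]),
    "temperature": [1.5, 2.0, -3.0, -2.5],
})

# Hypothetical demo data: one row per sensor observation.
sensors = pd.DataFrame({
    "sensor_id": ["a", "b", "a"],
    "lat": [47.5, 40.8, 47.5],
    "lon": [-122.2, -73.9, -122.2],
    "time": pd.to_datetime(["2024-01-01 00:20", "2024-01-01 00:40",
                            "2024-01-01 01:50"]),
})

# Step 1: pair each distinct sensor with its nearest station.
# Squared difference in degrees is a crude distance, adequate only for a sketch.
locations = sensors[["sensor_id", "lat", "lon"]].drop_duplicates()
pairs = locations.merge(stations, how="cross", suffixes=("", "_station"))
pairs["dist2"] = ((pairs["lat"] - pairs["lat_station"]) ** 2
                  + (pairs["lon"] - pairs["lon_station"]) ** 2)
nearest = pairs.loc[pairs.groupby("sensor_id")["dist2"].idxmin(),
                    ["sensor_id", "station_id"]]

# Step 2: attach the most recent weather reading that is at most one hour old.
# merge_asof needs both frames sorted by the time key.
enriched = pd.merge_asof(
    sensors.merge(nearest, on="sensor_id").sort_values("time"),
    weather.sort_values("time"),
    on="time",
    by="station_id",
    direction="backward",
    tolerance=pd.Timedelta("1h"),
)
print(enriched[["sensor_id", "station_id", "time", "temperature"]])
```

The cross join is fine for a handful of sensors and stations but grows with the product of the two tables, so a larger dataset would need a spatial index or a coarser grid-based match.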

Join NYC taxi data with NOAA ISD data

Notebook	Description
Taxi trip data enriched with weather data - Pandas	Load NYC green taxi data (over one month) and enrich it with weather data in a Pandas dataframe. This example overrides the method `get_pandas_limit` and balances data load performance with the amount of data.
Taxi trip data enriched with weather data - Spark	Load NYC green taxi data and enrich it with weather data in a Spark dataframe.
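
One common way to enrich trip records with weather is to aggregate the weather observations to one row per hour and then join on the pickup hour. The sketch below shows that pattern in plain pandas under stated assumptions: the column names (`pickup_time`, `fare`, `temperature`, `precipitation`) and all values are hypothetical, and this is not the code used in the notebooks.

```python
import pandas as pd

# Hypothetical taxi trips (column names and values invented for illustration).
trips = pd.DataFrame({
    "pickup_time": pd.to_datetime(["2024-01-01 08:05", "2024-01-01 08:40",
                                   "2024-01-01 09:15"]),
    "fare": [12.5, 8.0, 20.0],
})

# Hypothetical sub-hourly weather observations.
weather = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 08:30",
                            "2024-01-01 09:00"]),
    "temperature": [2.0, 4.0, 5.0],
    "precipitation": [0.0, 1.0, 0.0],
})

# Aggregate weather to one row per hour: mean temperature, maximum precipitation.
hourly = (
    weather.assign(hour=weather["time"].dt.floor("h"))
    .groupby("hour", as_index=False)
    .agg(temperature=("temperature", "mean"),
         precipitation=("precipitation", "max"))
)

# Truncate each pickup time to the hour and left-join so every trip is kept.
trips["hour"] = trips["pickup_time"].dt.floor("h")
enriched = trips.merge(hourly, on="hour", how="left")
print(enriched)
```

Aggregating before the join keeps the weather table small and guarantees at most one match per trip, which matters when the trip table is much larger than the weather table.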
