azureml-opendatasets Package
Packages
opendatasets |
Contains functionality for consuming Azure Open Datasets as dataframes and for enriching customer data. Azure Open Datasets are curated public datasets that you can use to add scenario-specific features to machine learning solutions for more accurate models. You can convert these public datasets into Spark and pandas dataframes with filters applied. For some datasets, you can use an enricher to join the public data with your data. For example, you can join your data with weather data by longitude and latitude or zip code and time. Included in Azure Open Datasets are public-domain data for weather, census, holidays, public safety, and location that help you train machine learning models and enrich predictive solutions. Open Datasets are in the cloud on Microsoft Azure and are integrated into Azure Machine Learning. For more information about working with Azure Open Datasets, see Create datasets with Azure Open Datasets. For general information about Azure Open Datasets, see Azure Open Datasets Documentation. |
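The enrichment described above matches customer rows to public data by location and time. The following sketch shows that join in plain Python. It is a minimal, hypothetical illustration that uses only the standard library and is not the azureml-opendatasets API: the station coordinates, the readings, and the `haversine_km`, `closest_station`, and `enrich_by_location` helpers are assumptions made for this example. Each customer row is paired with the closest weather station by great-circle distance, and that station's reading for the same hour is attached.

```python
"""Conceptual sketch of location-and-time enrichment.

Not the azureml-opendatasets API: stations, readings, and helper names
are hypothetical and exist only to illustrate the join.
"""
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def closest_station(lat, lon, stations):
    """Return the id of the station nearest to (lat, lon)."""
    return min(stations, key=lambda sid: haversine_km(lat, lon, *stations[sid]))


def enrich_by_location(customer_rows, stations, readings):
    """Attach the reading from the closest station for the same hour.

    customer_rows: list of dicts with 'lat', 'lon', and 'time' keys.
    readings: dict mapping (station_id, hour_start) -> temperature.
    """
    enriched = []
    for row in customer_rows:
        sid = closest_station(row["lat"], row["lon"], stations)
        # Bucket the customer timestamp to the start of its hour.
        hour = row["time"].replace(minute=0, second=0, microsecond=0)
        enriched.append({**row, "station": sid, "temperature": readings.get((sid, hour))})
    return enriched


# Hypothetical public data: two stations and one hourly reading each.
stations = {"SEA": (47.45, -122.31), "PDX": (45.59, -122.60)}
readings = {
    ("SEA", datetime(2020, 1, 1, 9)): 5.0,
    ("PDX", datetime(2020, 1, 1, 9)): 7.5,
}
# Hypothetical customer data: one row near Seattle, one near Portland.
customers = [
    {"lat": 47.61, "lon": -122.33, "time": datetime(2020, 1, 1, 9, 42)},
    {"lat": 45.52, "lon": -122.68, "time": datetime(2020, 1, 1, 9, 5)},
]
for r in enrich_by_location(customers, stations, readings):
    print(r["station"], r["temperature"])
```

Choosing the single closest station and bucketing to the hour are illustrative choices. In the package, this behavior is provided by the enricher, selector, and granularity classes listed under Modules.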
Modules
country_or_region_time_customer_data |
Customer data with location and time columns should be wrapped using this class. |
country_or_region_time_public_data |
Public data with country_or_region and time columns can be wrapped with this class. |
country_region_data |
Contains functionality for working with location data, with supported column classes. |
customer_data |
Contains the base class of all customer data. |
location_data |
Contains functionality for working with location data, with supported column classes. |
location_time_customer_data |
Contains functionality for wrapping customer data with location and time columns. |
location_time_public_data |
Contains functionality for wrapping public data with location and time columns. |
open_dataset_base |
Base class for tabular open datasets. |
public_data |
Contains the public data base class. |
time_data |
Contains functionality for representing time data and related operations in opendatasets. |
aggregator |
Defines the base class for all aggregators. |
aggregator_all |
Contains the aggregator for including all columns, that is, for when no aggregation is performed. |
aggregator_avg |
Contains the aggregator average class. |
aggregator_max |
Contains the aggregator max class. |
aggregator_min |
Contains the aggregator min class. |
aggregator_top |
Contains the aggregator top class. |
base_blob_info |
Contains the blob info base class. |
blob_parquet_descriptor |
Contains the descriptor for Parquet files stored in blobs. |
dataset_partition_prep |
Contains functionality for specifying dataset partition preparation. Partition preparation occurs automatically when you use an opendatasets class that requires a partition of data, such as the NycTlcGreen class. |
pandas_data_load_limit |
Contains functionality for limiting how much data is loaded into a pandas dataframe when the underlying Parquet files are large. Use this module to specify a load limit when the files are too large to load in full. |
common_weather_enricher |
Contains functionality for enriching customer data with public weather data. |
enricher |
Defines the generic enricher class for joining data of different granularities by using aggregators. This module contains static function overloads: |
holiday_enricher |
Contains functionality for enriching customer data with public holiday data. |
environ |
Defines classes for the runtime environments in which Azure Open Datasets are used. The classes in this module ensure that Azure Open Datasets functionality is optimized for different environments.
In general, you do not need to instantiate these environment classes or worry about their implementation.
Instead, use the |
granularity |
Contains granularity definitions for time and location. The granularities are organized as follows: You work with a granularity by specifying it in an enricher function. For example, when using the HolidayEnricher class methods to enrich data, specify the TimeGranularity as an input parameter to the method. |
country_region_selector |
Contains the country region selector class. |
enricher_selector |
Contains the base classes for location and time selectors. EnricherSelector is the root class of its two subclasses, LocationClosestSelector and TimeNearestSelector. |
location_closest_selector |
Contains the location closest selector class. |
time_nearest_selector |
Contains the time nearest selector class. |
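The enricher, granularity, and aggregator modules listed above work together: public data recorded at one granularity is summarized to match the customer data's granularity before the join. The sketch below illustrates that idea with the standard library only. It is a hypothetical stand-in, not the package's implementation: the granularity strings, the `AGGREGATORS` table, and the `truncate`, `aggregate`, and `enrich` helpers are assumptions. The `top` aggregator is omitted because its exact behavior is not described here.

```python
"""Conceptual sketch: roll public data up to a coarser time granularity
with a chosen aggregator, then join it to customer data.

Granularity names and aggregators are hypothetical stand-ins for the
granularity and aggregator classes, not their actual implementation.
"""
from collections import defaultdict
from datetime import datetime
from statistics import mean


def truncate(ts, granularity):
    """Floor a timestamp to the start of its hour, day, or month."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


AGGREGATORS = {
    "all": list,  # keep every value in the bucket; no aggregation
    "avg": mean,
    "max": max,
    "min": min,
}


def aggregate(public_rows, granularity, how):
    """Group (timestamp, value) rows by truncated time and aggregate each group."""
    groups = defaultdict(list)
    for ts, value in public_rows:
        groups[truncate(ts, granularity)].append(value)
    agg = AGGREGATORS[how]
    return {bucket: agg(values) for bucket, values in groups.items()}


def enrich(customer_rows, public_rows, granularity="day", how="avg"):
    """Join each customer row to the aggregated public value for its time bucket."""
    table = aggregate(public_rows, granularity, how)
    return [
        {**row, f"temp_{how}": table.get(truncate(row["time"], granularity))}
        for row in customer_rows
    ]


# Hypothetical hourly public temperatures and one daily-level customer row.
hourly_temps = [
    (datetime(2020, 1, 1, 6), 2.0),
    (datetime(2020, 1, 1, 12), 8.0),
    (datetime(2020, 1, 1, 18), 5.0),
    (datetime(2020, 1, 2, 12), 10.0),
]
sales = [{"store": "A", "time": datetime(2020, 1, 1, 15, 30)}]

print(enrich(sales, hourly_temps, granularity="day", how="avg"))
```

Aggregating before the join keeps a single public value per customer row, which is why a coarser customer granularity needs an aggregator while an equal or finer one does not.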
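A time-nearest selector, as named by the time_nearest_selector module above, picks the public record whose timestamp is closest to each customer timestamp. Below is a minimal sketch of that matching with binary search, assuming the public timestamps are already sorted. The `nearest_time` function, its tie-breaking rule, and the optional `tolerance` cutoff are hypothetical choices, not the TimeNearestSelector API.

```python
"""Conceptual sketch of nearest-time matching.

Function name, tie-breaking, and tolerance are hypothetical.
"""
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_time(sorted_times, target, tolerance=None):
    """Return the timestamp in sorted_times closest to target.

    Ties go to the earlier timestamp. Returns None when the list is empty
    or the best match is farther from target than tolerance (a timedelta).
    """
    if not sorted_times:
        return None
    i = bisect_left(sorted_times, target)
    # The nearest value is either just before or at the insertion point.
    candidates = sorted_times[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda t: abs(t - target))
    if tolerance is not None and abs(best - target) > tolerance:
        return None
    return best


# Hypothetical 3-hourly public observations and two customer timestamps.
observations = [datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 3), datetime(2020, 1, 1, 6)]
for ts in (datetime(2020, 1, 1, 4), datetime(2020, 1, 1, 12)):
    print(ts, "->", nearest_time(observations, ts, tolerance=timedelta(hours=2)))
```

Binary search keeps each lookup at O(log n) instead of scanning every public record, and the tolerance prevents a customer row from being matched to an observation that is too far away in time to be meaningful.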