DataReference Class

Represents a reference to data in a datastore.

A DataReference represents a path in a datastore and can be used to describe how and where data should be made available in a run. It is no longer the recommended approach for data access and delivery in Azure Machine Learning. Dataset supports accessing data from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL through a unified interface with added data management capabilities. It is recommended that you use Dataset for reading data in your machine learning projects.

For more information on how to use Azure ML datasets, see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.

Class DataReference constructor.

Inheritance
builtins.object
DataReference

Constructor

DataReference(datastore, data_reference_name=None, path_on_datastore=None, mode='mount', path_on_compute=None, overwrite=False)

Parameters

datastore
Union[AbstractAzureStorageDatastore, AzureDataLakeDatastore]
Required

The datastore to reference.

data_reference_name
str
default value: None

The name of the data reference.

path_on_datastore
str
default value: None

The relative path in the backing storage for the data reference.

mode
str
default value: mount

The operation on the data reference. Supported values are 'mount' (the default) and 'download'.

Use the 'download' mode when your script expects a specific (e.g., hard-coded) path for the input data. In this case, specify the path with the path_on_compute parameter when you declare the DataReference. Azure Machine Learning will download the data specified by that path before executing your script.

With the 'mount' mode, a temporary directory is created with the mounted data and an environment variable $AZUREML_DATAREFERENCE_<data_reference_name> is set with the path to the temporary directory. If you pass a DataReference into the arguments list for a pipeline step (e.g. PythonScriptStep), then the reference will be expanded to the local data path at runtime.

path_on_compute
str
default value: None

The path on the compute target for the data reference.

overwrite
bool
default value: False

Indicates whether to overwrite existing data.

Remarks

A DataReference defines both the data location and how the data is made available on the compute target (mounted, downloaded, or uploaded). The path to the data in the datastore can be the root /, a directory within the datastore, or a file in the datastore.
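
The following is a minimal sketch of constructing a DataReference and passing it to a pipeline step. It assumes a workspace configuration file is available and a blob datastore registered under the name 'workspaceblobstore'; the datastore path, script name, and compute target name are hypothetical.

from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
blob_datastore = Datastore.get(ws, 'workspaceblobstore')    # assumed datastore name

# Reference a folder in the datastore; 'mount' is the default mode.
data_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name='training_data',
    path_on_datastore='datasets/iris',                      # hypothetical path
    mode='mount',
)

# Passed in a step's arguments, the reference expands to the local data path at runtime
# (also exposed through $AZUREML_DATAREFERENCE_training_data).
step = PythonScriptStep(
    script_name='train.py',                                 # hypothetical script
    arguments=['--data-dir', data_ref],
    inputs=[data_ref],
    compute_target='cpu-cluster',                           # hypothetical compute target name
    source_directory='.',
)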

Methods

as_download

Switch data reference operation to download.

DataReference download only supports Azure Blob and Azure File Share. To download data from Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2, we recommend using an Azure Machine Learning Dataset. For more information on how to create and use datasets, please visit https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.

as_mount

Switch data reference operation to mount.

DataReference mount only supports Azure Blob. To mount data in Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2, we recommend using an Azure Machine Learning Dataset. For more information on how to create and use datasets, please visit https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.

as_upload

Switch data reference operation to upload.

For more information on which computes and datastores support uploading of the data, see: https://aka.ms/datastore-matrix.

create

Create a DataReference using DataPath and DataPathComputeBinding.

path

Create a DataReference instance based on the given path.

to_config

Convert the DataReference object to DataReferenceConfiguration object.

as_download

Switch data reference operation to download.

DataReference download only supports Azure Blob and Azure File Share. To download data from Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2, we recommend using an Azure Machine Learning Dataset. For more information on how to create and use datasets, please visit https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.

as_download(path_on_compute=None, overwrite=False)

Parameters

path_on_compute
str
default value: None

The path on the compute for the data reference.

overwrite
bool
default value: False

Indicates whether to overwrite existing data.

Returns

A new data reference object.

Return type

DataReference
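
A minimal sketch of declaring a download reference, assuming a blob datastore registered as 'workspaceblobstore'; the paths are hypothetical.

from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference

blob_datastore = Datastore.get(Workspace.from_config(), 'workspaceblobstore')  # assumed datastore name

# Download the referenced folder to a fixed path on the compute before the script runs.
download_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name='training_data',
    path_on_datastore='datasets/iris',                                          # hypothetical path
).as_download(path_on_compute='/tmp/training_data', overwrite=False)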

as_mount

Switch data reference operation to mount.

DataReference mount only supports Azure Blob. To mount data in Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2, we recommend using an Azure Machine Learning Dataset. For more information on how to create and use datasets, please visit https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.

as_mount()

Returns

A new data reference object.

Return type

DataReference
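
A minimal sketch of switching a reference to mount mode, under the same assumptions as the as_download example above.

from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference

blob_datastore = Datastore.get(Workspace.from_config(), 'workspaceblobstore')  # assumed datastore name

# Start from a download reference and switch it to mount; a new DataReference is returned.
mount_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name='training_data',
    path_on_datastore='datasets/iris',                                          # hypothetical path
    mode='download',
).as_mount()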

as_upload

Switch data reference operation to upload.

For more information on which computes and datastores support uploading of the data, see: https://aka.ms/datastore-matrix.

as_upload(path_on_compute=None, overwrite=False)

Parameters

path_on_compute
str
default value: None

The path on the compute for the data reference.

overwrite
bool
default value: False

Indicates whether to overwrite existing data.

Returns

A new data reference object.

Return type

DataReference
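
A minimal sketch of declaring an upload reference, assuming a blob datastore registered as 'workspaceblobstore' and a compute/datastore combination that supports upload (see the matrix linked above); names and paths are hypothetical.

from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference

blob_datastore = Datastore.get(Workspace.from_config(), 'workspaceblobstore')  # assumed datastore name

# Upload whatever the run writes under ./outputs/models to the datastore path.
upload_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name='model_output',
    path_on_datastore='models/latest',                                          # hypothetical destination
).as_upload(path_on_compute='./outputs/models', overwrite=True)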

create

Create a DataReference using DataPath and DataPathComputeBinding.

static create(data_reference_name=None, datapath=None, datapath_compute_binding=None)

Parameters

data_reference_name
str
default value: None

The name for the data reference to create.

datapath
DataPath
default value: None

[Required] The datapath to use.

datapath_compute_binding
DataPathComputeBinding
default value: None

[Required] The datapath compute binding to use.

Returns

A DataReference object.

Return type

DataReference
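
A minimal sketch of building a reference from a DataPath and a DataPathComputeBinding, assuming a blob datastore registered as 'workspaceblobstore'; the path and reference name are hypothetical.

from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference
from azureml.data.datapath import DataPath, DataPathComputeBinding

blob_datastore = Datastore.get(Workspace.from_config(), 'workspaceblobstore')  # assumed datastore name

datapath = DataPath(datastore=blob_datastore, path_on_datastore='datasets/iris')  # hypothetical path
compute_binding = DataPathComputeBinding(mode='mount')

data_ref = DataReference.create(
    data_reference_name='training_data',
    datapath=datapath,
    datapath_compute_binding=compute_binding,
)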

path

Create a DataReference instance based on the given path.

path(path=None, data_reference_name=None)

Parameters

path
str
default value: None

The path on the datastore.

data_reference_name
str
default value: None

The name of the data reference.

Returns

The data reference object.

Return type

DataReference
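
A minimal sketch of deriving a new reference for a specific path, assuming a blob datastore registered as 'workspaceblobstore'; the paths and reference names are hypothetical.

from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference

blob_datastore = Datastore.get(Workspace.from_config(), 'workspaceblobstore')  # assumed datastore name

base_ref = DataReference(datastore=blob_datastore, data_reference_name='data_root')

# Create a new DataReference on the same datastore for the given path on the datastore.
train_ref = base_ref.path(path='datasets/iris/train', data_reference_name='train_split')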

to_config

Convert the DataReference object to DataReferenceConfiguration object.

to_config()

Returns

A new DataReferenceConfiguration object.

Return type

DataReferenceConfiguration
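
A minimal sketch of converting a reference into a DataReferenceConfiguration and attaching it to a run configuration, assuming a blob datastore registered as 'workspaceblobstore'; attaching it through RunConfiguration.data_references reflects typical usage and is an assumption, not part of this class's API.

from azureml.core import Workspace, Datastore
from azureml.core.runconfig import RunConfiguration
from azureml.data.data_reference import DataReference

blob_datastore = Datastore.get(Workspace.from_config(), 'workspaceblobstore')  # assumed datastore name
data_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name='training_data',
    path_on_datastore='datasets/iris',                                          # hypothetical path
)

run_config = RunConfiguration()
# Register the configuration under the reference name so the run can deliver the data (assumed wiring).
run_config.data_references[data_ref.data_reference_name] = data_ref.to_config()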