DataReference Class
Represents a reference to data in a datastore.
A DataReference represents a path in a datastore and can be used to describe how and where data should be made available in a run. It is no longer the recommended approach for data access and delivery in Azure Machine Learning. Dataset supports accessing data from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL through a unified interface with added data management capabilities. Use Dataset to read data in your machine learning projects.
For more information on how to use Azure ML dataset in two common scenarios, see the articles:
Inheritance
builtins.object -> DataReference
Constructor
DataReference(datastore, data_reference_name=None, path_on_datastore=None, mode='mount', path_on_compute=None, overwrite=False)
Parameters
- datastore
- Union[<xref:azureml.data.azure_storage_datastore.AbstractAzureStorageDatastore,azureml.data.azure_data_lake_datastore.AzureDataLakeDatastore>]
The datastore to reference.
- path_on_datastore
- str
The relative path in the backing storage for the data reference.
- mode
- str
The operation on the data reference. Supported values are 'mount' (the default) and 'download'.
Use the 'download' mode when your script expects a specific (e.g., hard-coded) path for the input data. In this case, specify the path with the path_on_compute parameter when you declare the DataReference. Azure Machine Learning will download the data specified by that path before executing your script.
With the 'mount' mode, a temporary directory is created with the mounted data and an environment variable $AZUREML_DATAREFERENCE_<data_reference_name> is set with the path to the temporary directory. If you pass a DataReference into the arguments list for a pipeline step (e.g. PythonScriptStep), then the reference will be expanded to the local data path at runtime.
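On the script side, the environment variable described above could be read roughly as follows. This is a minimal sketch, not part of the SDK: the helper `resolve_data_path`, its `cli_path` parameter, and the reference name `training_data` are hypothetical. Only the `AZUREML_DATAREFERENCE_<data_reference_name>` variable name comes from the description above.

```python
import os


def resolve_data_path(data_reference_name, cli_path=None, environ=None):
    """Return the local path for a data reference inside a run.

    Prefers a path passed explicitly (for example, one expanded from a
    pipeline step argument); otherwise falls back to the
    AZUREML_DATAREFERENCE_<data_reference_name> environment variable.
    """
    environ = os.environ if environ is None else environ
    if cli_path:
        return cli_path
    env_key = f"AZUREML_DATAREFERENCE_{data_reference_name}"
    if env_key not in environ:
        raise RuntimeError(
            f"{env_key} is not set; is the data reference attached to this run?"
        )
    return environ[env_key]
```

A training script might call `resolve_data_path("training_data")` once at startup and build all file paths relative to the returned directory.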
Remarks
A DataReference defines both the data location and how the data is used on the target compute binding (mount, download, or upload). The path to the data in the datastore can be the root /, a directory within the datastore, or a file in the datastore.
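As an illustration of the constructor shown earlier, the sketch below declares a reference to a directory and asks for it to be downloaded to a fixed path. It assumes `azureml-core` is installed and that `datastore` is an already-registered datastore object. The reference name and both paths are hypothetical values, and the function name `build_download_reference` is not part of the SDK.

```python
def build_download_reference(datastore):
    """Build a DataReference that downloads a directory to a fixed path."""
    # Imported lazily so this module loads even where azureml-core is absent.
    from azureml.data.data_reference import DataReference

    return DataReference(
        datastore=datastore,
        data_reference_name="training_data",  # hypothetical name
        path_on_datastore="datasets/train",   # hypothetical relative path
        mode="download",
        path_on_compute="/tmp/train",         # hypothetical target path
        overwrite=True,
    )
```

With `mode="download"` the script can keep using its hard-coded `/tmp/train` path, which is the scenario the `mode` parameter description calls out.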
Methods
| Method | Description |
| --- | --- |
| as_download | Switch data reference operation to download. DataReference download only supports Azure Blob and Azure File Share. To download data from Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2 we recommend using Azure Machine Learning Dataset. For more information on how to create and use Dataset, please visit https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets. |
| as_mount | Switch data reference operation to mount. DataReference mount only supports Azure Blob. To mount data in Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2 we recommend using Azure Machine Learning Dataset. For more information on how to create and use Dataset, please visit https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets. |
| as_upload | Switch data reference operation to upload. For more information on which computes and datastores support uploading of the data, see: https://aka.ms/datastore-matrix. |
| create | Create a DataReference using DataPath and DataPathComputeBinding. |
| path | Create a DataReference instance based on the given path. |
| to_config | Convert the DataReference object to DataReferenceConfiguration object. |
as_download
Switch data reference operation to download.
DataReference download only supports Azure Blob and Azure File Share. To download data from Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2 we recommend using Azure Machine Learning Dataset. For more information on how to create and use Dataset, please visit https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.
as_download(path_on_compute=None, overwrite=False)
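Parameters
- path_on_compute
- str
The path on the compute for the data reference.
- overwrite
- bool
Indicates whether to overwrite existing data.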
Returns
A new data reference object.
Return type
DataReference
as_mount
Switch data reference operation to mount.
DataReference mount only supports Azure Blob. To mount data in Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2 we recommend using Azure Machine Learning Dataset. For more information on how to create and use Dataset, please visit https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.
as_mount()
Returns
A new data reference object.
Return type
DataReference
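The support notes for as_download and as_mount can be folded into a small helper that picks a mode the backing storage allows. This is a sketch under stated assumptions: the storage labels (`azure_blob`, `azure_file_share`) and the helper `switch_mode` are hypothetical and are not SDK identifiers. Only the support rules (download for Azure Blob and Azure File Share, mount for Azure Blob only) are taken from the descriptions above.

```python
# Modes each storage kind supports through DataReference, per the method
# descriptions above. The keys are hypothetical labels, not SDK identifiers.
SUPPORTED_MODES = {
    "azure_blob": ("mount", "download"),
    "azure_file_share": ("download",),
}


def switch_mode(reference, storage_kind, preferred="mount"):
    """Return a new reference in a mode the storage kind supports.

    Uses `preferred` when it is supported, otherwise the first supported
    mode. Raises ValueError when neither mount nor download is listed for
    the storage kind, in which case Dataset is the recommended route.
    """
    supported = SUPPORTED_MODES.get(storage_kind, ())
    if not supported:
        raise ValueError(
            f"DataReference mount/download is not listed for {storage_kind!r}; "
            "consider using Dataset instead."
        )
    mode = preferred if preferred in supported else supported[0]
    if mode == "mount":
        return reference.as_mount()
    return reference.as_download()
```

Because as_mount and as_download each return a new data reference object, the caller should keep the returned value rather than rely on the original reference changing.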
as_upload
Switch data reference operation to upload.
For more information on which computes and datastores support uploading of the data, see: https://aka.ms/datastore-matrix.
as_upload(path_on_compute=None, overwrite=False)
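Parameters
- path_on_compute
- str
The path on the compute for the data reference.
- overwrite
- bool
Indicates whether to overwrite existing data.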
Returns
A new data reference object.
Return type
DataReference
create
Create a DataReference using DataPath and DataPathComputeBinding.
static create(data_reference_name=None, datapath=None, datapath_compute_binding=None)
Parameters
- datapath_compute_binding
- DataPathComputeBinding
[Required] The datapath compute binding to use.
Returns
A DataReference object.
Return type
DataReference
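A call to create might look like the sketch below. It assumes `azureml-core` is installed and that `DataPath` and `DataPathComputeBinding` are importable from `azureml.data.datapath` with the keyword arguments shown; those import paths and keywords are recalled from the v1 SDK rather than stated on this page, so verify them against your installed version. The reference name and path are hypothetical, as is the function name.

```python
def build_reference_from_datapath(datastore):
    """Create a DataReference from a DataPath and a compute binding."""
    # Imported lazily so this module loads even where azureml-core is absent.
    from azureml.data.data_reference import DataReference
    from azureml.data.datapath import DataPath, DataPathComputeBinding

    # Where the data lives (hypothetical relative path).
    datapath = DataPath(datastore=datastore, path_on_datastore="datasets/train")
    # How the data should be made available on the compute target.
    binding = DataPathComputeBinding(mode="mount")

    return DataReference.create(
        data_reference_name="training_data",  # hypothetical name
        datapath=datapath,
        datapath_compute_binding=binding,
    )
```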
path
Create a DataReference instance based on the given path.
path(path=None, data_reference_name=None)
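Parameters
- path
- str
The path on the datastore.
- data_reference_name
- str
The name of the data reference.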
Returns
The data reference object.
Return type
DataReference
to_config
Convert the DataReference object to DataReferenceConfiguration object.
to_config()
Returns
A new DataReferenceConfiguration object.
Return type
DataReferenceConfiguration
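One common use of to_config is attaching references to a run configuration. The sketch below assumes a run configuration object that exposes a `data_references` dictionary keyed by reference name and references that expose a `data_reference_name` attribute. The helper `attach_data_references` is hypothetical and not part of the SDK.

```python
def attach_data_references(run_config, references):
    """Register each reference's configuration on a run configuration.

    Existing entries are kept; entries with the same name are replaced.
    """
    merged = dict(getattr(run_config, "data_references", None) or {})
    for ref in references:
        # to_config() returns a new DataReferenceConfiguration object.
        merged[ref.data_reference_name] = ref.to_config()
    run_config.data_references = merged
    return run_config
```

Keying the dictionary by `data_reference_name` keeps it aligned with the `AZUREML_DATAREFERENCE_<data_reference_name>` variable that the script reads at runtime.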