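DataReference Class

Represents a reference to data in a datastore.

A DataReference represents a path in a datastore and can be used to describe how and where data should be made available in a run. It is no longer the recommended approach for data access and delivery in Azure Machine Learning. Datasets support accessing data from Azure Blob storage, Azure File storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL through a unified interface with added data management capabilities. Using datasets is recommended for reading data in machine learning projects.

For more information on how to use Azure ML datasets in two common scenarios, see the following articles:

Class DataReference constructor.

Constructor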

DataReference(datastore, data_reference_name=None, path_on_datastore=None, mode='mount', path_on_compute=None, overwrite=False)

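Parameters

Name	Description
datastore Required	Union[AbstractAzureStorageDatastore, AzureDataLakeDatastore] The datastore to reference.
data_reference_name	str The name of the data reference. Default value: None
path_on_datastore	str The relative path of the data reference in the backing storage. Default value: None
mode	str The operation on the data reference. Supported values are "mount" (the default) and "download". Use "download" mode when your script expects a specific (for example, hard-coded) path for the input data. In that case, specify the path with the `path_on_compute` parameter when you declare the DataReference. Azure Machine Learning downloads the data specified by that path before executing your script. In "mount" mode, a temporary directory is created with the mounted data, and the environment variable $AZUREML_DATAREFERENCE_<data_reference_name> is set to the path of that temporary directory. If you pass a DataReference in the arguments list of a pipeline step (for example, PythonScriptStep), the reference is expanded to the local data path at runtime. Default value: mount
path_on_compute	str The path on the compute target for the data reference. Default value: None
overwrite	bool Indicates whether to overwrite existing data. Default value: False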
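
The constructor parameters above can be combined as in the following minimal sketch. It assumes the `azureml-core` SDK is installed and that `datastore` is an already registered Azure Blob datastore. The reference name, the relative path, and the compute path are hypothetical values chosen for illustration.

```python
def make_training_reference(datastore):
    """Build a DataReference that downloads data before the script runs.

    `datastore` is assumed to be a registered Azure Blob datastore.
    """
    # Imported lazily so this module can be loaded without the SDK installed.
    from azureml.data.data_reference import DataReference

    return DataReference(
        datastore=datastore,
        data_reference_name="training_data",   # hypothetical name
        path_on_datastore="data/train",        # hypothetical relative path
        mode="download",                       # the script expects a fixed path
        path_on_compute="/tmp/training_data",  # hypothetical hard-coded path
        overwrite=True,
    )
```

"download" mode is used here because the sketch assumes a script with a hard-coded input path; with "mount" mode, `path_on_compute` can be left at its default.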

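Remarks

A DataReference defines both the data location and how the data is used on the target compute binding (mount or upload). The path to the data in the datastore can be the root /, a directory within the datastore, or a file in the datastore.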
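
On the compute side, a script can locate mounted data through the environment variable named in the `mode` parameter description. The following is a minimal sketch of that lookup using only the standard library. The helper name `mounted_path` and the reference name used in the comment are hypothetical.

```python
import os

# Prefix of the environment variable that is set in "mount" mode.
ENV_PREFIX = "AZUREML_DATAREFERENCE_"


def mounted_path(data_reference_name, environ=None):
    """Return the local directory for a mounted data reference.

    `environ` defaults to the process environment; passing a dict makes
    the function easy to exercise outside of a run.
    """
    environ = os.environ if environ is None else environ
    key = ENV_PREFIX + data_reference_name
    try:
        return environ[key]
    except KeyError:
        raise RuntimeError(
            f"{key} is not set; was the data reference mounted for this run?"
        ) from None


# Inside a training script (hypothetical reference name):
#   data_dir = mounted_path("training_data")
```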

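Methods

as_download	Switch the data reference operation to download. DataReference download supports only Azure Blob and Azure File Share. To download data from Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2, using an Azure Machine Learning dataset is recommended. For more information on how to create and use datasets, see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.
as_mount	Switch the data reference operation to mount. DataReference mount supports only Azure Blob. To mount data in Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2, using an Azure Machine Learning dataset is recommended. For more information on how to create and use datasets, see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.
as_upload	Switch the data reference operation to upload. For more information on which computes and datastores support uploading data, see https://aka.ms/datastore-matrix.
create	Create a DataReference using a DataPath and a DataPathComputeBinding.
path	Create a DataReference instance based on the given path.
to_config	Convert the DataReference object to a DataReferenceConfiguration object.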

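as_download

Switch the data reference operation to download.

DataReference download supports only Azure Blob and Azure File Share. To download data from Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2, using an Azure Machine Learning dataset is recommended. For more information on how to create and use datasets, see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.

as_download(path_on_compute=None, overwrite=False)

Parameters

Name	Description
path_on_compute	str The path on the compute for the data reference. Default value: None
overwrite	bool Indicates whether to overwrite existing data. Default value: False

Returns

Type	Description
DataReference	A new data reference object.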

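as_mount

Switch the data reference operation to mount.

DataReference mount supports only Azure Blob. To mount data in Azure Blob, Azure File Share, Azure Data Lake Gen1, and Azure Data Lake Gen2, using an Azure Machine Learning dataset is recommended. For more information on how to create and use datasets, see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.

as_mount()

Returns

Type	Description
DataReference	A new data reference object.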
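
The storage restrictions stated for as_download and as_mount can be checked before switching modes. The following sketch encodes them in a small lookup table. The helper `bind` and the labels "blob" and "file_share" are made up for this example and are not SDK values; only `as_mount()` and `as_download(path_on_compute, overwrite)` come from the class.

```python
# Storage kinds each DataReference mode supports, as stated above.
# The labels "blob" and "file_share" are hypothetical names for this sketch.
SUPPORTED_STORAGE = {
    "mount": {"blob"},
    "download": {"blob", "file_share"},
}


def bind(data_ref, mode, storage_kind, path_on_compute=None):
    """Return a new data reference in `mode`, or raise if unsupported."""
    if storage_kind not in SUPPORTED_STORAGE.get(mode, set()):
        raise ValueError(
            f"DataReference {mode!r} is not supported for {storage_kind!r}; "
            "consider an Azure Machine Learning dataset instead."
        )
    if mode == "mount":
        return data_ref.as_mount()
    return data_ref.as_download(path_on_compute=path_on_compute, overwrite=False)
```

Both methods return a new data reference object, so the original reference is left unchanged and can be bound differently elsewhere.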

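as_upload

Switch the data reference operation to upload.

For more information on which computes and datastores support uploading data, see https://aka.ms/datastore-matrix.

as_upload(path_on_compute=None, overwrite=False)

Parameters

Name	Description
path_on_compute	str The path on the compute for the data reference. Default value: None
overwrite	bool Indicates whether to overwrite existing data. Default value: False

Returns

Type	Description
DataReference	A new data reference object.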

create

Create a DataReference using a DataPath and a DataPathComputeBinding.

static create(data_reference_name=None, datapath=None, datapath_compute_binding=None)

Parameters

Name	Description
data_reference_name	str The name of the data reference to create. Default value: None
datapath	DataPath [Required] The data path to use. Default value: None
datapath_compute_binding	DataPathComputeBinding [Required] The data path compute binding to use. Default value: None

Returns

Type	Description
DataReference	A DataReference object.
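
A minimal sketch of calling create follows. It assumes the `azureml-core` SDK is installed and that `DataPath` and `DataPathComputeBinding` are importable from `azureml.data.datapath` with the keyword arguments shown. The helper name and its arguments are hypothetical.

```python
def reference_from_datapath(datastore, path_on_datastore, name):
    """Create a mounted DataReference from a DataPath and a compute binding."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from azureml.data.data_reference import DataReference
    from azureml.data.datapath import DataPath, DataPathComputeBinding

    # Where the data lives in the datastore.
    datapath = DataPath(datastore=datastore, path_on_datastore=path_on_datastore)
    # How the data is made available on the compute target.
    binding = DataPathComputeBinding(mode="mount")

    return DataReference.create(
        data_reference_name=name,
        datapath=datapath,
        datapath_compute_binding=binding,
    )
```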

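path

Create a DataReference instance based on the given path.

path(path=None, data_reference_name=None)

Parameters

Name	Description
path	str The path on the datastore. Default value: None
data_reference_name	str The name of the data reference. Default value: None

Returns

Type	Description
DataReference	The data reference object.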

to_config

Convert the DataReference object to a DataReferenceConfiguration object.

to_config()

Returns

Type	Description
DataReferenceConfiguration	A new DataReferenceConfiguration object.
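
When several references are used in one run, their configurations can be collected by name. The sketch below assumes each DataReference exposes its name as a `data_reference_name` attribute and that the resulting dictionary is what a run configuration expects; both are assumptions, and the helper name is hypothetical.

```python
def reference_configs(data_refs):
    """Map each data reference name to its DataReferenceConfiguration."""
    # Assumes every item has `data_reference_name` and `to_config()`.
    return {ref.data_reference_name: ref.to_config() for ref in data_refs}
```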
