DatasetConsumptionConfig Class

Reference

Represents how to deliver the dataset to a compute target.

Inheritance: builtins.object

DatasetConsumptionConfig

Constructor

DatasetConsumptionConfig(name, dataset, mode='direct', path_on_compute=None)

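Parameters

name: str

Required

The name of the dataset in the run, which can differ from the registered name. The name is registered as an environment variable and can be used in the data plane.

dataset: AbstractDataset or PipelineParameter or OutputDatasetConfig

Required

The dataset that will be consumed in the run.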

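mode: str

Default value: direct

Defines how the dataset should be delivered to the compute target. There are four modes:

'direct': consume the dataset as a dataset.
'download': download the dataset and consume it as the download path.
'mount': mount the dataset and consume it as the mount path.
'hdfs': consume the dataset from the resolved hdfs path (currently supported only on SynapseSpark compute).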

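path_on_compute: str

Default value: None

The target path on the compute at which to make the data available. The folder structure of the source data is kept, but a prefix may be added to this folder structure to avoid collisions. Call tabular_dataset.to_path to see the output folder structure.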

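dataset: Dataset or PipelineParameter or tuple(Workspace, str) or tuple(Workspace, str, str) or OutputDatasetConfig

Required

The dataset to be delivered, given as a Dataset object, a pipeline parameter that ingests a dataset, a tuple of (workspace, dataset name), or a tuple of (workspace, dataset name, dataset version). If only a name is provided, DatasetConsumptionConfig uses the latest version of the dataset.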
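
The two tuple forms accepted for dataset can be pictured with a small, SDK-independent sketch. The names DatasetRef and normalize_dataset_ref are hypothetical and exist only for illustration; this is not how the SDK resolves the tuple internally. A version of None stands in for "use the latest version".

```python
from typing import Any, NamedTuple, Optional


class DatasetRef(NamedTuple):
    """Hypothetical container for a dataset reference (not an SDK type)."""
    workspace: Any
    name: str
    version: Optional[str]  # None stands for "use the latest version"


def normalize_dataset_ref(spec: tuple) -> DatasetRef:
    """Turn (workspace, name) or (workspace, name, version) into a DatasetRef."""
    if len(spec) == 2:
        workspace, name = spec
        # Only a name was given, so the latest version is implied.
        return DatasetRef(workspace, name, None)
    if len(spec) == 3:
        workspace, name, version = spec
        return DatasetRef(workspace, name, version)
    raise ValueError("expected (workspace, name) or (workspace, name, version)")
```

For example, normalize_dataset_ref((ws, "titanic")) yields a reference whose version is None, while normalize_dataset_ref((ws, "titanic", "2")) pins version "2".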


Methods

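as_download

Set the mode to download.

In the submitted run, files in the dataset are downloaded to a local path on the compute target. The download location can be retrieved from the argument values and from the input_datasets field of the run context.


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # `experiment` (an Experiment) and `source_directory` are assumed to be defined already.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_download()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The download location can be retrieved from the argument values.
   import sys
   download_location = sys.argv[1]

   # The download location can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   download_location = Run.get_context().input_datasets['input_1']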

as_download(path_on_compute=None)

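Parameters

path_on_compute: str

Default value: None

The target path on the compute at which to make the data available.

Remarks

When the dataset is created from the path of a single file, the download location is the path of that single downloaded file. Otherwise, the download location is the path of the folder that encloses all the downloaded files.

If path_on_compute starts with a /, it is treated as an absolute path. If it does not start with a /, it is treated as a path relative to the working directory. If you specify an absolute path, make sure that the job has permission to write to that directory.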
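
The absolute-versus-relative rule above can be illustrated with a short sketch. The function resolve_path_on_compute is a hypothetical helper written for this page, not part of the SDK, and it assumes POSIX-style paths on the compute target.

```python
import posixpath


def resolve_path_on_compute(path_on_compute: str, working_dir: str) -> str:
    """Apply the rule from the remarks: a leading '/' means an absolute path,
    anything else is taken relative to the working directory."""
    if path_on_compute.startswith("/"):
        return posixpath.normpath(path_on_compute)
    return posixpath.normpath(posixpath.join(working_dir, path_on_compute))


# Absolute path: used as given.
print(resolve_path_on_compute("/mnt/data", "/home/job"))       # /mnt/data
# Relative path: joined onto the working directory.
print(resolve_path_on_compute("inputs/titanic", "/home/job"))  # /home/job/inputs/titanic
```

The sketch only resolves the string. It does not check that the job can write to the resulting directory, which remains the caller's responsibility for absolute paths.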

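as_hdfs

Set the mode to hdfs.

In the submitted synapse run, files in the dataset are converted to a local path on the compute target. The hdfs path can be retrieved from the argument values and from the OS environment variables.


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # `experiment` (an Experiment) and `source_directory` are assumed to be defined already.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_hdfs()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The hdfs path can be retrieved from the argument values.
   import sys
   hdfs_path = sys.argv[1]

   # The hdfs path can also be retrieved from the OS environment variables.
   import os
   hdfs_path = os.environ['input_1']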

as_hdfs()

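Remarks

When the dataset is created from the path of a single file, the hdfs path is the path of that single file. Otherwise, the hdfs path is the path of the folder that encloses all the mounted files.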
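
Because the hdfs path is available both as an argument value and as an environment variable named after the input, a run script can try one and fall back to the other. The following is a minimal sketch using only the standard library. The helper get_input_path and its arg_index parameter are hypothetical names, and the argument position depends on how the script's arguments were declared.

```python
import os
import sys
from typing import Mapping, Optional, Sequence


def get_input_path(
    input_name: str,
    argv: Optional[Sequence[str]] = None,
    environ: Optional[Mapping[str, str]] = None,
    arg_index: int = 1,
) -> str:
    """Return the path passed as a script argument, falling back to the
    environment variable named after the dataset input."""
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ
    if len(argv) > arg_index:
        return argv[arg_index]
    if input_name in environ:
        return environ[input_name]
    raise KeyError(f"no argument or environment variable found for {input_name!r}")
```

Inside the submitted run, get_input_path("input_1") would mirror the two lookups shown in the sample above.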

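as_mount

Set the mode to mount.

In the submitted run, files in the dataset are mounted to a local path on the compute target. The mount point can be retrieved from the argument values and from the input_datasets field of the run context.


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # `experiment` (an Experiment) and `source_directory` are assumed to be defined already.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_mount()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The mount point can be retrieved from the argument values.
   import sys
   mount_point = sys.argv[1]

   # The mount point can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   mount_point = Run.get_context().input_datasets['input_1']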

as_mount(path_on_compute=None)

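Parameters

path_on_compute: str

Default value: None

The target path on the compute at which to make the data available.

Remarks

When the dataset is created from the path of a single file, the mount point is the path of that single mounted file. Otherwise, the mount point is the path of the folder that encloses all the mounted files.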
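
Since the mount point (like the download location) may be either a single file or an enclosing folder, a run script may want to handle both cases uniformly. The sketch below is one way to do that with the standard library. The helper list_input_files is a hypothetical name and is not part of the SDK.

```python
import os
from typing import List


def list_input_files(location: str) -> List[str]:
    """Return every file under `location`, which may be a single file
    or a folder that encloses all the delivered files."""
    if os.path.isfile(location):
        return [location]
    files = []
    for root, _dirs, names in os.walk(location):
        for file_name in names:
            files.append(os.path.join(root, file_name))
    return sorted(files)
```

In a submitted run, the value read from sys.argv[1] or from input_datasets['input_1'] could be passed directly to list_input_files.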

Attributes

name

Name of the input.
