DatasetConsumptionConfig Class

Represents how to deliver the dataset to a compute target.

Inheritance: builtins.object

DatasetConsumptionConfig

Constructor

DatasetConsumptionConfig(name, dataset, mode='direct', path_on_compute=None)

Parameters

name: str

Required

The name of the dataset in the run, which can be different from the registered name. The name will be registered as an environment variable and can be used in the data plane.

dataset: AbstractDataset or PipelineParameter or tuple(Workspace, str) or tuple(Workspace, str, str) or OutputDatasetConfig

Required

The dataset that will be consumed in the run. It can be passed as a Dataset object, a pipeline parameter that ingests a dataset, a tuple of (workspace, dataset name), or a tuple of (workspace, dataset name, dataset version). If only a name is provided, DatasetConsumptionConfig uses the latest version of the dataset.

mode: str

Default value: direct

Defines how the dataset should be delivered to the compute target. There are four modes:

'direct': consume the dataset as a dataset.
'download': download the dataset and consume the dataset as the downloaded path.
'mount': mount the dataset and consume the dataset as the mount path.
'hdfs': consume the dataset from the resolved hdfs path (currently only supported on SynapseSpark compute).

path_on_compute: str

Default value: None

The target path on the compute to make the data available at. The folder structure of the source data is kept, but prefixes might be added to this folder structure to avoid collisions. Use tabular_dataset.to_path to see the output folder structure.
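The four delivery modes can be summarized by what the submitted script ends up receiving. The following is a minimal sketch only: `DELIVERY_MODES` and `check_mode` are hypothetical names introduced for illustration and are not part of the azureml SDK.

```python
# Hypothetical lookup table (not part of azureml): what the run receives per mode.
DELIVERY_MODES = {
    "direct": "the dataset object itself",
    "download": "a local path to the downloaded files",
    "mount": "a local path to the mount point",
    "hdfs": "a resolved hdfs path (SynapseSpark compute only)",
}


def check_mode(mode="direct"):
    """Return a short description of `mode`, or raise ValueError if it is unknown."""
    if mode not in DELIVERY_MODES:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of {sorted(DELIVERY_MODES)}"
        )
    return DELIVERY_MODES[mode]


print(check_mode("mount"))
```

A check like this can catch a misspelled mode string before a run is submitted, assuming the set of modes matches the four listed in the parameter description.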

Methods

as_download

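Set the mode to download.

In the submitted run, files in the dataset will be downloaded to a local path on the compute target. The download location can be retrieved from argument values and the input_datasets field of the run context.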


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # `experiment` and `source_directory` are assumed to be defined elsewhere.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_download()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The download location can be retrieved from argument values
   import sys
   download_location = sys.argv[1]

   # The download location can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   download_location = Run.get_context().input_datasets['input_1']

as_download(path_on_compute=None)

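Parameters

path_on_compute: str

Default value: None

The target path on the compute to make the data available at.

Remarks

When the dataset is created from the path of a single file, the download location will be the path of the single downloaded file. Otherwise, the download location will be the path of the enclosing folder for all the downloaded files.

If path_on_compute starts with a /, it is treated as an absolute path. If it does not start with a /, it is treated as a relative path, relative to the working directory. If you have specified an absolute path, make sure that the job has permission to write to that directory.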
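The absolute-versus-relative rule for path_on_compute can be illustrated with a short sketch. `resolve_path_on_compute` is a hypothetical helper written for this example, and the working directory `/mnt/job/wd` is an assumed value; the SDK performs its own resolution on the compute target, and any prefixes it may add are not modeled here.

```python
import posixpath


def resolve_path_on_compute(path_on_compute, working_dir):
    """Hypothetical helper: apply the leading-slash rule to a target path."""
    if path_on_compute.startswith("/"):
        # A leading / means the path is used as an absolute path.
        return posixpath.normpath(path_on_compute)
    # Otherwise the path is taken relative to the working directory.
    return posixpath.normpath(posixpath.join(working_dir, path_on_compute))


# "/mnt/job/wd" is an assumed working directory, used only for illustration.
print(resolve_path_on_compute("/tmp/titanic", "/mnt/job/wd"))
print(resolve_path_on_compute("data/titanic", "/mnt/job/wd"))
```

For an absolute target, it may also be worth checking in the job that the directory is writable, since the remarks note that the job needs write permission there.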

as_hdfs

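Set the mode to hdfs.

In the submitted Synapse run, files in the dataset will be converted to a local path on the compute target. The hdfs path can be retrieved from argument values and the OS environment variables.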


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # `experiment` and `source_directory` are assumed to be defined elsewhere.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_hdfs()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The hdfs path can be retrieved from argument values
   import sys
   hdfs_path = sys.argv[1]

   # The hdfs path can also be retrieved from the OS environment variables.
   import os
   hdfs_path = os.environ['input_1']

as_hdfs()

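Remarks

When the dataset is created from the path of a single file, the hdfs path will be the path of the single file. Otherwise, the hdfs path will be the path of the enclosing folder for all the mounted files.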
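Because the path is available both as a script argument and as an environment variable named after the input, a run script can try one and fall back to the other. This is a minimal sketch under that assumption: `get_input_location` is a hypothetical helper, and the sample path value is made up for the demonstration.

```python
import os
import sys


def get_input_location(name, position=1, argv=None, environ=None):
    """Hypothetical helper: read an input path from argv, else from the environment."""
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ
    if len(argv) > position:
        # The input was passed in the `arguments` list of the run configuration.
        return argv[position]
    # Fall back to the environment variable registered under the input name.
    # Raises KeyError if the variable is not set either.
    return environ[name]


# Simulated values; a real run would rely on sys.argv and os.environ instead.
print(get_input_location("input_1", argv=["train.py"], environ={"input_1": "/example/path"}))
```

Passing `argv` and `environ` explicitly keeps the helper easy to exercise outside a submitted run.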

as_mount

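Set the mode to mount.

In the submitted run, files in the dataset will be mounted to a local path on the compute target. The mount point can be retrieved from argument values and the input_datasets field of the run context.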


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # `experiment` and `source_directory` are assumed to be defined elsewhere.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_mount()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The mount point can be retrieved from argument values
   import sys
   mount_point = sys.argv[1]

   # The mount point can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   mount_point = Run.get_context().input_datasets['input_1']

as_mount(path_on_compute=None)

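Parameters

path_on_compute: str

Default value: None

The target path on the compute to make the data available at.

Remarks

When the dataset is created from the path of a single file, the mount point will be the path of the single mounted file. Otherwise, the mount point will be the path of the enclosing folder for all the mounted files.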
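Since the location handed to the script may be either a single file or an enclosing folder, a run script may want to handle both cases uniformly. The sketch below shows one way to do that; `list_input_files` is a hypothetical helper, and a temporary folder stands in for a real mount point or download location.

```python
import os
import tempfile


def list_input_files(location):
    """Return a sorted list of files, whether `location` is a file or a folder."""
    if os.path.isfile(location):
        # Dataset created from a single file path: the location is that file.
        return [location]
    found = []
    for root, _dirs, names in os.walk(location):
        found.extend(os.path.join(root, name) for name in names)
    return sorted(found)


# A temporary folder stands in for a real mount point in this demonstration.
with tempfile.TemporaryDirectory() as fake_mount:
    csv_path = os.path.join(fake_mount, "Titanic.csv")
    with open(csv_path, "w") as handle:
        handle.write("PassengerId,Survived\n")
    print(list_input_files(fake_mount) == [csv_path])
    print(list_input_files(csv_path) == [csv_path])
```

In a submitted run, the argument would be the value read from sys.argv or from the input_datasets field of the run context.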

Attributes

name

Returns

The name of the input, as passed to the constructor.
