DatasetConsumptionConfig class

Represents how to deliver a dataset to a compute target.

Constructor

DatasetConsumptionConfig(name, dataset, mode='direct', path_on_compute=None)

Parameters

Name	Description
name Required	str The name of the dataset in the run, which can differ from the registered name. The name is registered as an environment variable and can be used in the data plane.
dataset Required	Dataset (AbstractDataset) or PipelineParameter or tuple(Workspace, str) or tuple(Workspace, str, str) or OutputDatasetConfig The dataset to be consumed in the run: a Dataset object, a pipeline parameter that ingests a dataset, a tuple of (workspace, dataset name), or a tuple of (workspace, dataset name, dataset version). If only a name is provided, DatasetConsumptionConfig uses the latest version of the dataset.
mode	str Defines how the dataset is delivered to the compute target. There are four modes. 'direct': consume the dataset as a dataset. 'download': download the dataset and consume it as the downloaded path. 'mount': mount the dataset and consume it as the mount path. 'hdfs': consume the dataset from the resolved hdfs path (currently supported only on SynapseSpark compute). Default value: direct
path_on_compute	str The target path on the compute at which to make the data available. The folder structure of the source data is preserved, but prefixes may be added to it to avoid collisions. Calling tabular_dataset.to_path to check the output folder structure is recommended. Default value: None
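Because the run-time name is registered as an environment variable, a script can look an input up by name instead of relying on argument position. The following is a minimal sketch and not part of the SDK: `resolve_input` is a hypothetical helper, and it assumes the environment variable is named exactly after the input (as the as_hdfs sample on this page does with `os.environ['input_1']`). What the value contains depends on the delivery mode.

```python
import os
import sys


def resolve_input(name, argv=None, environ=None, arg_index=1):
    """Return the value delivered for a named input (hypothetical helper).

    Looks for an environment variable called `name` first, then falls back
    to the positional command-line argument at `arg_index`.
    """
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    if len(argv) > arg_index:
        return argv[arg_index]
    raise KeyError(f"input {name!r} not found in environment or arguments")
```

Passing `argv` and `environ` explicitly keeps the helper easy to exercise outside a submitted run; inside a run, `resolve_input("input_1")` would read the real process environment.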

Methods

as_download

Sets the mode to download.

In the submitted run, files in the dataset are downloaded to a local path on the compute target. The download location can be retrieved from argument values and from the input_datasets field of the run context.

   # Imports needed by the submission code below; `experiment` and
   # `source_directory` are assumed to be defined by the caller.
   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_download()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The download location can be retrieved from argument values
   import sys
   download_location = sys.argv[1]

   # The download location can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   download_location = Run.get_context().input_datasets['input_1']

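as_hdfs

Sets the mode to hdfs.

In the submitted Synapse run, files in the dataset are converted to a local path on the compute target. The hdfs path can be retrieved from argument values and from the OS environment variables.

   # Imports needed by the submission code below; `experiment` and
   # `source_directory` are assumed to be defined by the caller.
   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_hdfs()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The hdfs path can be retrieved from argument values
   import sys
   hdfs_path = sys.argv[1]

   # The hdfs path can also be retrieved from the environment variable named after the input.
   import os
   hdfs_path = os.environ['input_1']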

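as_mount

Sets the mode to mount.

In the submitted run, files in the dataset are mounted to a local path on the compute target. The mount point can be retrieved from argument values and from the input_datasets field of the run context.

   # Imports needed by the submission code below; `experiment` and
   # `source_directory` are assumed to be defined by the caller.
   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_mount()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The mount point can be retrieved from argument values
   import sys
   mount_point = sys.argv[1]

   # The mount point can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   mount_point = Run.get_context().input_datasets['input_1']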

as_download(path_on_compute=None)

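Parameters

Name	Description
path_on_compute	str The target path on the compute at which to make the data available. Default value: None

Remarks

When the dataset is created from the path of a single file, the download location is the path of that single downloaded file. Otherwise, the download location is the path of the folder that encloses all the downloaded files.

If path_on_compute starts with a /, it is treated as an absolute path. If it does not start with a /, it is treated as a path relative to the working directory. If you specify an absolute path, make sure the job has permission to write to that directory.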
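The absolute-versus-relative rule above can be sketched as a small function. This is an illustration of the documented rule only, not the SDK's implementation: `resolve_path_on_compute` and its `working_dir` argument are hypothetical names, and the real service may also add prefixes to the folder structure to avoid collisions.

```python
import posixpath


def resolve_path_on_compute(path_on_compute, working_dir):
    """Apply the documented rule (hypothetical helper).

    A value starting with '/' is used as an absolute path; anything else is
    interpreted relative to `working_dir`.
    """
    if path_on_compute.startswith("/"):
        return posixpath.normpath(path_on_compute)
    return posixpath.normpath(posixpath.join(working_dir, path_on_compute))
```

For example, with a working directory of `/work`, the value `data/in` resolves under `/work`, while `/data/in` is left where it is and therefore needs to be writable by the job.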

as_hdfs()

Remarks

When the dataset is created from the path of a single file, the hdfs path is the path of that single file. Otherwise, the hdfs path is the path of the folder that encloses all the mounted files.

as_mount(path_on_compute=None)

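Parameters

Name	Description
path_on_compute	str The target path on the compute at which to make the data available. Default value: None

Remarks

When the dataset is created from the path of a single file, the mount point is the path of that single mounted file. Otherwise, the mount point is the path of the folder that encloses all the mounted files.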
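Since the delivered location may be either a single file or an enclosing folder, run-side code usually has to handle both shapes. The sketch below shows one way to do that with the standard library; `list_delivered_files` is a hypothetical helper, not an SDK function, and it only assumes the location is an ordinary local path.

```python
import os


def list_delivered_files(location):
    """Return every file path under `location` (hypothetical helper).

    `location` may point at a single file (dataset created from one file
    path) or at the folder that encloses all delivered files.
    """
    if os.path.isfile(location):
        return [location]
    files = []
    for root, _dirs, names in os.walk(location):
        for file_name in names:
            files.append(os.path.join(root, file_name))
    # Sort so the result does not depend on directory traversal order.
    return sorted(files)
```

In a submitted run, the `location` argument would be the value read from `sys.argv[1]` or from the input_datasets field of the run context.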

Attributes

name

The name of the input.

Returns

Type	Description
	The name of the input.
