DatasetConsumptionConfig Class

Reference

Represents how to deliver the dataset to a compute target.

Inheritance: builtins.object

DatasetConsumptionConfig

Constructor

DatasetConsumptionConfig(name, dataset, mode='direct', path_on_compute=None)

Parameters

Name	Description
name Required	str The name of the dataset in the run, which can be different from the registered name. The name is registered as an environment variable and can be used in the data plane.
dataset Required	AbstractDataset or PipelineParameter or tuple(Workspace, str) or tuple(Workspace, str, str) or OutputDatasetConfig The dataset to be delivered, given as a Dataset object, a pipeline parameter that ingests a dataset, a tuple of (workspace, dataset name), or a tuple of (workspace, dataset name, dataset version). If only a name is provided, DatasetConsumptionConfig uses the latest version of the dataset.
mode	str Defines how the dataset is delivered to the compute target. There are four modes. 'direct': consume the dataset as a dataset. 'download': download the dataset and consume it as the downloaded path. 'mount': mount the dataset and consume it as the mount path. 'hdfs': consume the dataset from the resolved hdfs path (currently supported only on SynapseSpark compute). Default value: direct
path_on_compute	str The target path on the compute at which the data is made available. The folder structure of the source data is kept, but a prefix may be added to this folder structure to avoid collisions. Calling tabular_dataset.to_path is recommended to see the output folder structure. Default value: None
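The tuple forms accepted by dataset can be illustrated with a small sketch. The helper below expands a (workspace, name) or (workspace, name, version) tuple into an explicit name and version, falling back to a "latest" marker when no version is given. The function name normalize_dataset_ref, the LATEST marker, and the use of a plain object in place of a Workspace are assumptions made for illustration only; this is not the SDK's own implementation.

```python
from typing import Any, Tuple

# Hypothetical marker meaning "use the newest registered version".
LATEST = "latest"


def normalize_dataset_ref(ref: Tuple[Any, ...]) -> Tuple[Any, str, str]:
    """Expand (workspace, name) or (workspace, name, version) to a 3-tuple."""
    if len(ref) == 2:
        workspace, name = ref
        # No version supplied: mirror the documented "latest version" rule.
        return workspace, name, LATEST
    if len(ref) == 3:
        workspace, name, version = ref
        return workspace, name, str(version)
    raise ValueError("expected (workspace, name) or (workspace, name, version)")
```

A caller could pass any placeholder for the workspace when experimenting with this sketch, since the helper only inspects the tuple length.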

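Methods

as_download

Sets the mode to download.

In the submitted run, files in the dataset are downloaded to a local path on the compute target. The download location can be retrieved from the argument values and from the input_datasets field of the run context.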


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # `experiment` and `source_directory` are assumed to be defined by the caller.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_download()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The download location can be retrieved from the argument values.
   import sys
   download_location = sys.argv[1]

   # The download location can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   download_location = Run.get_context().input_datasets['input_1']

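as_hdfs

Sets the mode to hdfs.

In the submitted Synapse run, files in the dataset are converted to a local path on the compute target. The hdfs path can be retrieved from the argument values and from the OS environment variables.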


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # `experiment` and `source_directory` are assumed to be defined by the caller.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_hdfs()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The hdfs path can be retrieved from the argument values.
   import sys
   hdfs_path = sys.argv[1]

   # The hdfs path can also be retrieved from the OS environment variables.
   import os
   hdfs_path = os.environ['input_1']
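
The two lookups described for as_hdfs (argument values and the environment variable named after the input) can be combined in the run script. The following is a minimal sketch, assuming the path is passed as the first script argument and that the environment variable carries the input name; the helper name resolve_input_path and its fallback order are illustrative choices, not part of the SDK.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_input_path(
    name: str,
    argv: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the input path from argv[1] if present, else from the env var `name`."""
    # Fall back to the real process environment only when no mapping is given.
    env = os.environ if environ is None else environ
    if len(argv) > 1 and argv[1]:
        return argv[1]
    if name in env:
        return env[name]
    raise KeyError(f"input {name!r} not found in arguments or environment")
```

In a run script this would typically be called as resolve_input_path("input_1", sys.argv), after importing sys.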

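as_mount

Sets the mode to mount.

In the submitted run, files in the dataset are mounted to a local path on the compute target. The mount point can be retrieved from the argument values and from the input_datasets field of the run context.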


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # `experiment` and `source_directory` are assumed to be defined by the caller.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_mount()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The mount point can be retrieved from the argument values.
   import sys
   mount_point = sys.argv[1]

   # The mount point can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   mount_point = Run.get_context().input_datasets['input_1']

as_download

Sets the mode to download.



as_download(path_on_compute=None)

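Parameters

Name	Description
path_on_compute	str The target path on the compute at which the data is made available. Default value: None

Remarks

When the dataset is created from the path of a single file, the download location is the path of that single downloaded file. Otherwise, the download location is the path of the folder that contains all the downloaded files.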
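
Because the location can be either a single file or a folder, a run script may want to handle both cases uniformly. The following is a minimal sketch using only the standard library; the helper name list_input_files is hypothetical, and the location is assumed to be a local path that already exists on the compute target.

```python
import os
from typing import List


def list_input_files(location: str) -> List[str]:
    """Return all file paths under `location`, which may be a file or a folder."""
    if os.path.isfile(location):
        # Dataset created from a single file: the location is that file.
        return [location]
    found = []
    # Otherwise the location is a folder; collect files recursively.
    for root, _dirs, files in os.walk(location):
        for file_name in files:
            found.append(os.path.join(root, file_name))
    return sorted(found)
```

The returned list is sorted so that repeated runs process the files in a stable order.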

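If path_on_compute starts with a /, it is treated as an absolute path. If it does not start with a /, it is treated as a path relative to the working directory. If you specify an absolute path, make sure that the job has permission to write to that directory.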
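
The absolute-versus-relative rule can be expressed as a short sketch. The helper name resolve_path_on_compute and the working_dir argument are assumptions for illustration; the actual resolution happens inside the service and may differ in detail.

```python
import posixpath


def resolve_path_on_compute(path_on_compute: str, working_dir: str) -> str:
    """A leading '/' means absolute; anything else is relative to working_dir."""
    if path_on_compute.startswith("/"):
        return posixpath.normpath(path_on_compute)
    return posixpath.normpath(posixpath.join(working_dir, path_on_compute))
```

For example, with a working directory of /work, the value data/in would resolve to /work/data/in under this sketch, while /tmp/data would be left unchanged.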

as_hdfs

Sets the mode to hdfs.



as_hdfs()

Remarks

When the dataset is created from the path of a single file, the hdfs path is the path of that single file. Otherwise, the hdfs path is the path of the folder that contains all the mounted files.

as_mount

Sets the mode to mount.



as_mount(path_on_compute=None)

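Parameters

Name	Description
path_on_compute	str The target path on the compute at which the data is made available. Default value: None

Remarks

When the dataset is created from the path of a single file, the mount point is the path of that single mounted file. Otherwise, the mount point is the path of the folder that contains all the mounted files.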

Attributes

name

Name of the input.

Returns

Type	Description
	Name of the input.
