DatasetConsumptionConfig Class

Represents how to deliver a dataset to a compute target.

Inheritance: builtins.object

DatasetConsumptionConfig

Constructor

DatasetConsumptionConfig(name, dataset, mode='direct', path_on_compute=None)

Parameters

Name	Description
name (required)	str. The name of the dataset in the run, which can be different from the registered name. The name is registered as an environment variable and can be used in the data plane.
dataset (required)	AbstractDataset or PipelineParameter or tuple(Workspace, str) or tuple(Workspace, str, str) or OutputDatasetConfig. The dataset to consume in the run: a Dataset object, a pipeline parameter that ingests a dataset, a tuple of (workspace, dataset name), or a tuple of (workspace, dataset name, dataset version). If only a name is provided, DatasetConsumptionConfig uses the latest version of the dataset.
mode	str. Defines how the dataset is delivered to the compute target. There are four modes. 'direct': consume the dataset as a dataset. 'download': download the dataset and consume it as the downloaded path. 'mount': mount the dataset and consume it as the mount path. 'hdfs': consume the dataset from the resolved hdfs path (currently supported only on SynapseSpark compute). Default value: direct
path_on_compute	str. The target path on the compute where the data is made available. The folder structure of the source data is kept, but prefixes might be added to this folder structure to avoid collisions. Call `tabular_dataset.to_path` to see the output folder structure. Default value: None
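
The mode values listed above can be checked before a config is built. The following is a minimal sketch only: `SUPPORTED_MODES` and `check_mode` are hypothetical names introduced for illustration and are not part of the azureml SDK, which may validate the mode differently.

```python
# Hypothetical helper for illustration; not part of the azureml SDK.
# The four keys are the mode strings documented for the `mode` parameter.
SUPPORTED_MODES = {
    "direct": "consume the dataset as a dataset",
    "download": "download the dataset and consume it as the downloaded path",
    "mount": "mount the dataset and consume it as the mount path",
    "hdfs": "consume the dataset from the resolved hdfs path (SynapseSpark only)",
}


def check_mode(mode: str = "direct") -> str:
    """Return `mode` unchanged if it is one of the documented values."""
    if mode not in SUPPORTED_MODES:
        expected = ", ".join(sorted(SUPPORTED_MODES))
        raise ValueError(f"Unsupported mode {mode!r}; expected one of: {expected}")
    return mode
```

Calling `check_mode()` with no argument returns the documented default, 'direct'.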

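Methods

as_download

Set the mode to download.

In the submitted run, files in the dataset are downloaded to a local path on the compute target. The download location can be retrieved from argument values and from the input_datasets field of the run context.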


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # Assumes `experiment` and `source_directory` are already defined.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_download()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The download location can be retrieved from argument values.
   import sys
   download_location = sys.argv[1]

   # The download location can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   download_location = Run.get_context().input_datasets['input_1']

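as_hdfs

Set the mode to hdfs.

In the submitted Synapse run, files in the dataset are converted to a local path on the compute target. The hdfs path can be retrieved from argument values and from the OS environment variables.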


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # Assumes `experiment` and `source_directory` are already defined.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_hdfs()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The hdfs path can be retrieved from argument values.
   import sys
   hdfs_path = sys.argv[1]

   # The hdfs path can also be retrieved from the OS environment variables.
   import os
   hdfs_path = os.environ['input_1']
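
The two lookups in the sample (argument value, then environment variable) can be combined into one fallback inside the run script. This is a sketch under assumptions: `get_input_location` and its parameters are hypothetical names, and it assumes the location was passed as a positional argument or exposed under the input name as an environment variable, as the sample does with 'input_1'.

```python
import os
import sys
from typing import Mapping, Optional, Sequence


def get_input_location(
    name: str,
    argv: Optional[Sequence[str]] = None,
    environ: Optional[Mapping[str, str]] = None,
    arg_index: int = 1,
) -> str:
    """Return the input location from argv, falling back to the environment.

    Hypothetical helper for illustration; not part of the azureml SDK.
    """
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    # Prefer the positional argument when the script received one.
    if len(argv) > arg_index:
        return argv[arg_index]

    # Otherwise fall back to the environment variable named after the input.
    if name in environ:
        return environ[name]

    raise KeyError(f"No location found for input {name!r}")
```

Passing `argv` and `environ` explicitly keeps the helper easy to test outside a submitted run; inside a run, `get_input_location("input_1")` reads `sys.argv` and `os.environ`.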

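as_mount

Set the mode to mount.

In the submitted run, files in the dataset are mounted to a local path on the compute target. The mount point can be retrieved from argument values and from the input_datasets field of the run context.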


   from azureml.core import Dataset, ScriptRunConfig
   from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
   from azureml.pipeline.core import PipelineParameter

   # Assumes `experiment` and `source_directory` are already defined.
   file_dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
   file_pipeline_param = PipelineParameter(name="file_ds_param", default_value=file_dataset)
   dataset_input = DatasetConsumptionConfig("input_1", file_pipeline_param).as_mount()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # The following sample code runs in the context of the submitted run:

   # The mount point can be retrieved from argument values.
   import sys
   mount_point = sys.argv[1]

   # The mount point can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   mount_point = Run.get_context().input_datasets['input_1']

as_download

Set the mode to download.


as_download(path_on_compute=None)

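Parameters

Name	Description
path_on_compute	str. The target path on the compute where the data is made available. Default value: None

Remarks

When the dataset is created from the path of a single file, the download location is the path of the single downloaded file. Otherwise, the download location is the path of the enclosing folder for all the downloaded files.

If path_on_compute starts with a /, it is treated as an absolute path. If it does not start with a /, it is treated as a path relative to the working directory. If you specify an absolute path, make sure that the job has permission to write to that directory.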
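
The absolute-versus-relative rule can be illustrated with a short sketch. `resolve_path_on_compute` and the example directories are hypothetical and only mirror the rule as described here; the actual resolution happens on the compute target and is not shown in this reference.

```python
import posixpath


def resolve_path_on_compute(path_on_compute: str, working_dir: str) -> str:
    """Illustrate how a path_on_compute value is interpreted.

    Hypothetical helper for illustration; not part of the azureml SDK.
    """
    # A leading "/" means the value is already an absolute path.
    if path_on_compute.startswith("/"):
        return path_on_compute

    # Anything else is taken relative to the working directory.
    return posixpath.join(working_dir, path_on_compute)


# Example with assumed directory names:
absolute = resolve_path_on_compute("/data/titanic", "/job/wd")
relative = resolve_path_on_compute("data/titanic", "/job/wd")
```

Here `absolute` stays `/data/titanic`, while `relative` becomes `/job/wd/data/titanic`.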

as_hdfs

Set the mode to hdfs.

as_hdfs()

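Remarks

When the dataset is created from the path of a single file, the hdfs path is the path of the single file. Otherwise, the hdfs path is the path of the enclosing folder for all the mounted files.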

as_mount

Set the mode to mount.


as_mount(path_on_compute=None)

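Parameters

Name	Description
path_on_compute	str. The target path on the compute where the data is made available. Default value: None

Remarks

When the dataset is created from the path of a single file, the mount point is the path of the single mounted file. Otherwise, the mount point is the path of the enclosing folder for all the mounted files.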
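
Because the location handed to the script may be either a single file or an enclosing folder, a run script often needs to handle both cases. The following is a minimal sketch using only the standard library; `list_input_files` is a hypothetical helper name, not an azureml API.

```python
import os
from typing import List


def list_input_files(location: str) -> List[str]:
    """Return every file under `location`, whether it is a file or a folder.

    Hypothetical helper for illustration; not part of the azureml SDK.
    """
    # Single-file dataset: the location is the file itself.
    if os.path.isfile(location):
        return [location]

    # Multi-file dataset: walk the enclosing folder recursively.
    files = []
    for root, _dirs, names in os.walk(location):
        for file_name in names:
            files.append(os.path.join(root, file_name))
    return sorted(files)
```

In a submitted run this could be called with the mount point or download location, for example `list_input_files(sys.argv[1])`.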

Attributes

name

Name of the input.

Returns

Type	Description
	Name of the input.
