PipelineOutputTabularDataset Class

Represent intermediate pipeline data promoted to an Azure Machine Learning Tabular Dataset.

Once intermediate data is promoted to an Azure Machine Learning Dataset, subsequent steps consume it as a Dataset instead of a DataReference.

Create intermediate data that will be promoted to an Azure Machine Learning Dataset.

Inheritance
PipelineOutputTabularDataset

Constructor

PipelineOutputTabularDataset(pipeline_output_dataset, additional_transformations)

Parameters

pipeline_output_dataset
PipelineOutputFileDataset
Required

The file dataset representing the intermediate output that will be transformed into a tabular Dataset.

additional_transformations
azureml.dataprep.Dataflow
Required

Additional transformations that will be applied on top of the file dataset.
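
Because the constructor takes a PipelineOutputFileDataset, an instance is usually derived from a file-dataset output rather than built by hand. The following is a minimal sketch, not a definitive recipe. It assumes Azure ML SDK v1 (the azureml-pipeline-core package) is installed and relies on PipelineData.as_dataset() and PipelineOutputFileDataset.parse_delimited_files() from that SDK. The output name "prepared_data" and the column name "row_id" are hypothetical placeholders.

```python
# Sketch only: needs the azureml-pipeline-core package (Azure ML SDK v1).
try:
    from azureml.pipeline.core import PipelineData
except ImportError:  # SDK not installed; the function below will refuse to run
    PipelineData = None


def build_tabular_outputs(name="prepared_data"):
    """Return (train, test) tabular outputs derived from one intermediate output.

    `name` and the column "row_id" are hypothetical placeholders.
    """
    if PipelineData is None:
        raise RuntimeError("azureml-pipeline-core is not installed")

    # Intermediate output of an upstream step, promoted to a file dataset.
    file_output = PipelineData(name).as_dataset()

    # Parsing the delimited files yields a PipelineOutputTabularDataset.
    tabular_output = file_output.parse_delimited_files()

    # Methods documented on this page; each returns new objects.
    trimmed = tabular_output.drop_columns("row_id")
    train, test = trimmed.random_split(percentage=0.8, seed=42)
    return train, test
```

Nothing runs at import time. Calling build_tabular_outputs() only describes the outputs; the data itself is produced when the pipeline that writes the intermediate output is executed.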

Methods

create_input_binding

Create an input binding.

drop_columns

Drop the specified columns from the dataset.

keep_columns

Keep the specified columns and drop all others from the dataset.

random_split

Randomly split the records in the dataset into two parts, approximately in the proportion specified.

create_input_binding

Create an input binding.

create_input_binding()

Returns

The InputPortBinding with this PipelineData as the source.

Return type

InputPortBinding

drop_columns

Drop the specified columns from the dataset.

drop_columns(columns)

Parameters

columns
str or list[str]
Required

The name or a list of names for the columns to drop.

Returns

Returns a new intermediate data object with only the specified columns dropped.

Return type

keep_columns

Keep the specified columns and drop all others from the dataset.

keep_columns(columns)

Parameters

columns
str or list[str]
Required

The name or a list of names for the columns to keep.

Returns

Returns a new intermediate data object with only the specified columns kept.
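
drop_columns and keep_columns are complementary, and both accept either a single name or a list of names. The plain-Python sketch below illustrates that contract on a list of dictionaries. It is a stand-in for illustration, not the SDK implementation, and it does not model how the SDK treats column names that are absent from the data.

```python
def _as_list(columns):
    """Accept a single column name or a list of names (str or list[str])."""
    return [columns] if isinstance(columns, str) else list(columns)


def drop_columns(rows, columns):
    """Return new rows without the named columns; the input is not modified."""
    dropped = set(_as_list(columns))
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


def keep_columns(rows, columns):
    """Return new rows containing only the named columns."""
    kept = set(_as_list(columns))
    return [{k: v for k, v in row.items() if k in kept} for row in rows]


rows = [
    {"id": 1, "name": "a", "score": 0.5},
    {"id": 2, "name": "b", "score": 0.9},
]
without_id = drop_columns(rows, "id")              # single name
only_scores = keep_columns(rows, ["id", "score"])  # list of names
```

Both helpers return new objects and leave rows unchanged, mirroring the "returns a new intermediate data" wording above.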

Return type

random_split

Randomly split the records in the dataset into two parts, approximately in the proportion specified.

random_split(percentage, seed=None)

Parameters

percentage
float
Required

The approximate percentage to split the dataset by. This must be a number between 0.0 and 1.0.

seed
int
default value: None

Optional seed to use for the random generator.

Returns

Returns a tuple of new TabularDataset objects representing the two datasets after the split.
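
To show why the split is only approximate, here is a plain-Python sketch in which each record is assigned independently to the first part with probability percentage. This per-record assignment is an assumption made for illustration; the SDK may implement the split differently.

```python
import random


def random_split(records, percentage, seed=None):
    """Split records into two lists; about `percentage` of them go to the first."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    first, second = [], []
    for record in records:
        # Independent draw per record, so part sizes vary around the target.
        (first if rng.random() < percentage else second).append(record)
    return first, second


records = list(range(1000))
train, test = random_split(records, percentage=0.8, seed=42)
train_again, _ = random_split(records, percentage=0.8, seed=42)
```

With the same seed the two calls produce identical parts. Every record lands in exactly one part, and the first part holds roughly, not exactly, 80% of the records.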

Return type