PipelineOutputTabularDataset Class
Represent intermediate pipeline data promoted to an Azure Machine Learning Tabular Dataset.
Once intermediate data is promoted to an Azure Machine Learning Dataset, subsequent steps consume it as a Dataset instead of a DataReference.
Create intermediate data that will be promoted to an Azure Machine Learning Dataset.
Inheritance
PipelineOutputTabularDataset
Constructor
PipelineOutputTabularDataset(pipeline_output_dataset, additional_transformations)
Parameters
Name | Description |
---|---|
pipeline_output_dataset (Required) | The file dataset that represents the intermediate output, which will be transformed to a tabular Dataset. |
additional_transformations (Required) | Type: azureml.dataprep.Dataflow. Additional transformations that will be applied on top of the file dataset. |
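One common way to obtain an instance, rather than calling the constructor directly, is to promote a `PipelineData` output with `as_dataset()` and then parse the resulting file dataset. The following is a minimal sketch under that assumption; the helper name `promote_to_tabular` is hypothetical, and its argument is assumed to be an `azureml.pipeline.core.PipelineData` whose files are delimited text.

```python
def promote_to_tabular(pipeline_data):
    """Promote intermediate pipeline output to a tabular dataset.

    Assumes `pipeline_data` is an azureml.pipeline.core.PipelineData whose
    files are delimited text (for example CSV).
    """
    # as_dataset() promotes the intermediate output to a file dataset.
    file_dataset = pipeline_data.as_dataset()
    # parse_delimited_files() turns the file dataset into a tabular one.
    return file_dataset.parse_delimited_files()


# Hypothetical usage inside a pipeline definition (needs the azureml SDK
# and a workspace datastore, so it is left commented out here):
#
#   from azureml.pipeline.core import PipelineData
#   prepared = PipelineData("prepared", datastore=datastore)
#   tabular_output = promote_to_tabular(prepared)
```

The helper only chains two calls, so it can be exercised with stand-in objects when the SDK is not installed.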
Methods
Name | Description |
---|---|
create_input_binding | Create an input binding. |
drop_columns | Drop the specified columns from the dataset. |
keep_columns | Keep the specified columns and drop all others from the dataset. |
random_split | Split records in the dataset into two parts randomly and approximately by the percentage specified. |
create_input_binding
Create an input binding.
create_input_binding()
Returns
Type | Description |
---|---|
The InputPortBinding with this PipelineData as the source. |
drop_columns
Drop the specified columns from the dataset.
drop_columns(columns)
Parameters
Name | Description |
---|---|
columns (Required) | The name or a list of names for the columns to drop. |
Returns
Type | Description |
---|---|
Returns new intermediate data with the specified columns dropped. |
keep_columns
Keep the specified columns and drop all others from the dataset.
keep_columns(columns)
Parameters
Name | Description |
---|---|
columns (Required) | The name or a list of names for the columns to keep. |
Returns
Type | Description |
---|---|
Returns new intermediate data with only the specified columns kept. |
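To make the difference between `drop_columns` and `keep_columns` concrete, here is a plain-Python sketch of the column semantics described above. It is an illustration only, not the SDK implementation: representing rows as dictionaries and the function names `drop_columns_sketch` and `keep_columns_sketch` are assumptions made for this example.

```python
def _as_list(columns):
    # Accept either a single column name or a list of names.
    return [columns] if isinstance(columns, str) else list(columns)


def drop_columns_sketch(rows, columns):
    # Remove the named columns from every row; all other columns remain.
    dropped = set(_as_list(columns))
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


def keep_columns_sketch(rows, columns):
    # Retain only the named columns in every row; all others are removed.
    kept = set(_as_list(columns))
    return [{k: v for k, v in row.items() if k in kept} for row in rows]


rows = [
    {"id": 1, "name": "a", "score": 0.5},
    {"id": 2, "name": "b", "score": 0.9},
]
print(drop_columns_sketch(rows, "score"))
# [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
print(keep_columns_sketch(rows, ["id", "score"]))
# [{'id': 1, 'score': 0.5}, {'id': 2, 'score': 0.9}]
```

Both helpers return new rows and leave the input untouched, mirroring the documented behavior of returning new intermediate data.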
random_split
Split records in the dataset into two parts randomly and approximately by the percentage specified.
random_split(percentage, seed=None)
Parameters
Name | Description |
---|---|
percentage (Required) | The approximate percentage to split the dataset by. This must be a number between 0.0 and 1.0. |
seed | Optional seed to use for the random generator. Default value: None |
Returns
Type | Description |
---|---|
Returns a tuple of new TabularDataset objects representing the two datasets after the split. |
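The phrase "randomly and approximately by the percentage specified" can be illustrated with a small plain-Python sketch. It assumes each record is assigned independently to the first part with probability `percentage`, which is one way to obtain an approximate split; the sampling method the SDK actually uses is not described here, and `random_split_sketch` is a hypothetical name.

```python
import random


def random_split_sketch(records, percentage, seed=None):
    # Validate the documented range for percentage.
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    # A seeded generator makes the split reproducible.
    rng = random.Random(seed)
    first, second = [], []
    for record in records:
        # Each record lands in the first part with probability `percentage`,
        # so the resulting sizes are approximate rather than exact.
        (first if rng.random() < percentage else second).append(record)
    return first, second


train, holdout = random_split_sketch(range(1000), percentage=0.8, seed=42)
print(len(train), len(holdout))
```

Passing the same `seed` twice yields the same two parts, while leaving it as `None` gives a different split on each call.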