OutputTabularDatasetConfig Class
Represents how to copy the output of a run and promote it as a TabularDataset.
Initialize an OutputTabularDatasetConfig.
- Inheritance
-
OutputTabularDatasetConfig
Constructor
OutputTabularDatasetConfig(**kwargs)
Remarks
Do not call this constructor directly. Instead, create an OutputFileDatasetConfig and then call one of its read_* methods to convert it into an OutputTabularDatasetConfig.
The output of an OutputTabularDatasetConfig is copied to the destination in the same way as for an OutputFileDatasetConfig. The difference is that the Dataset that is created will be a TabularDataset containing all the specified transformations.
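The conversion described above might look like the following minimal sketch. It assumes azureml-core is installed. The output name "prepared_data" and the helper function build_tabular_output are illustrative, and read_delimited_files stands in for whichever read_* method matches the files your run writes.

```python
def build_tabular_output():
    """Build an OutputTabularDatasetConfig from a file output (illustrative helper)."""
    # Imported lazily so this snippet can be loaded without azureml-core installed.
    from azureml.data import OutputFileDatasetConfig

    # "prepared_data" is a hypothetical output name chosen for this example.
    file_output = OutputFileDatasetConfig(name="prepared_data")

    # Calling a read_* method converts the file output into a tabular output,
    # so the run's delimited files are promoted as a TabularDataset.
    tabular_output = file_output.read_delimited_files()
    return tabular_output
```

The returned object can then be chained with the methods listed below, such as drop_columns or random_split, before it is passed to a run or pipeline step.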
Methods
as_input | Specify how to consume the output as an input in subsequent pipeline steps.
as_mount | Set the mode of the output to mount. For mount mode, the output directory will be a FUSE mounted directory. Files written to the mounted directory will be uploaded when the file is closed.
as_upload | Set the mode of the output to upload. For upload mode, files written to the output directory will be uploaded at the end of the job. If the job fails or gets canceled, then the output directory will not be uploaded.
drop_columns | Drop the specified columns from the Dataset.
keep_columns | Keep the specified columns and drop all others from the Dataset.
random_split | Split records in the dataset into two parts randomly and approximately by the percentage specified. The resultant output configs are renamed: the first has _1 appended to its name and the second has _2. If this would cause a name collision, or if you want a custom name, set the names manually.
as_input
Specify how to consume the output as an input in subsequent pipeline steps.
as_input(name=None)
Parameters
Returns
A DatasetConsumptionConfig instance describing how to deliver the input data.
Return type
DatasetConsumptionConfig
as_mount
Set the mode of the output to mount.
For mount mode, the output directory will be a FUSE mounted directory. Files written to the mounted directory will be uploaded when the file is closed.
as_mount()
Returns
An OutputTabularDatasetConfig instance with mode set to mount.
Return type
OutputTabularDatasetConfig
as_upload
Set the mode of the output to upload.
For upload mode, files written to the output directory will be uploaded at the end of the job. If the job fails or gets canceled, then the output directory will not be uploaded.
as_upload(overwrite=False, source_globs=None)
Parameters
Returns
An OutputTabularDatasetConfig instance with mode set to upload.
Return type
OutputTabularDatasetConfig
drop_columns
Drop the specified columns from the Dataset.
drop_columns(columns)
Parameters
Returns
An OutputTabularDatasetConfig instance that records which columns to drop.
Return type
OutputTabularDatasetConfig
keep_columns
Keep the specified columns and drop all others from the Dataset.
keep_columns(columns)
Parameters
Returns
An OutputTabularDatasetConfig instance that records which columns to keep.
Return type
OutputTabularDatasetConfig
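To illustrate how drop_columns and keep_columns relate, here is a plain-Python model of their effect on records. It is not the SDK's implementation: the helper names drop_columns_from_rows and keep_columns_in_rows, the sample rows, and the choice to silently ignore column names that are absent from a row are all assumptions made for this sketch.

```python
def drop_columns_from_rows(rows, columns):
    """Return copies of the rows without the named columns (illustrative model)."""
    unwanted = set(columns)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def keep_columns_in_rows(rows, columns):
    """Return copies of the rows containing only the named columns (illustrative model)."""
    wanted = set(columns)
    return [{k: v for k, v in row.items() if k in wanted} for row in rows]


# Hypothetical sample records standing in for the rows of a TabularDataset.
rows = [
    {"id": 1, "age": 34, "label": "a"},
    {"id": 2, "age": 51, "label": "b"},
]

# Dropping "id" and keeping the other two columns select the same data.
dropped = drop_columns_from_rows(rows, ["id"])
kept = keep_columns_in_rows(rows, ["age", "label"])
```

In this model the two operations are complements: dropping a set of columns gives the same result as keeping every other column.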
random_split
Split records in the dataset into two parts randomly and approximately by the percentage specified.
The resultant output configs are renamed: the first has _1 appended to its name and the second has _2. If this would cause a name collision, or if you want a custom name, set the names manually.
random_split(percentage, seed=None)
Parameters
- percentage
- float
The approximate percentage to split the dataset by. This must be a number between 0.0 and 1.0.
Returns
Returns a tuple of two OutputTabularDatasetConfig objects representing the two Datasets after the split.
Return type
tuple
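The "approximately by the percentage" behavior and the _1 / _2 naming can be pictured with the plain-Python sketch below. It is a model only, not the SDK's implementation: the function split_records, the base name "output", and the assumption that each record is assigned to the first part independently with probability `percentage` are all hypothetical.

```python
import random


def split_records(records, percentage, seed=None, name="output"):
    """Randomly split records into two named parts (illustrative model)."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be a number between 0.0 and 1.0")

    # A dedicated generator keeps the split reproducible for a given seed.
    rng = random.Random(seed)
    first, second = [], []
    for record in records:
        # Assumed rule: each record lands in the first part with
        # probability `percentage`, so part sizes are only approximate.
        if rng.random() < percentage:
            first.append(record)
        else:
            second.append(record)

    # Mirror the naming convention: _1 for the first part, _2 for the second.
    return (f"{name}_1", first), (f"{name}_2", second)


(train_name, train), (test_name, test) = split_records(
    list(range(1000)), 0.8, seed=42, name="output"
)
```

Under this model every record ends up in exactly one part, and repeating the call with the same seed reproduces the same split, while the first part holds roughly, not exactly, 80% of the records.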