DataFlow Class

Definition

Namespace: Microsoft.DataPrep.Common

Assembly: Microsoft.DataPrep.dll

Important

Some information relates to a prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

A Dataflow represents a series of lazily evaluated, immutable operations on data. It is only an execution plan: no data is loaded from the source until you get data from the Dataflow by using Head, GetProfile, or one of the write methods.
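To make "lazily evaluated, immutable" concrete, the following C# sketch models a plan as an immutable array of deferred LINQ steps. It illustrates the concept only and does not use the Microsoft.DataPrep API: `Step`, `Source`, `AddStep`, and this local `Head` function are hypothetical names, and the real DataFlow methods may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
// Hypothetical alias: one plan step is a function from rows to rows.
using Step = System.Func<System.Collections.Generic.IEnumerable<int>, System.Collections.Generic.IEnumerable<int>>;

int rowsRead = 0; // how many rows have been pulled from the stand-in source

// Stand-in data source (hypothetical): yields 1..10 and counts each row read.
IEnumerable<int> Source()
{
    for (int i = 1; i <= 10; i++)
    {
        rowsRead++;
        yield return i;
    }
}

// Adding a step never mutates the existing plan; it returns a new array.
Step[] AddStep(Step[] plan, Step step) => plan.Append(step).ToArray();

// Only Head pulls data: it chains the steps over the source and materializes n rows.
List<int> Head(Step[] plan, int n) =>
    plan.Aggregate(Source(), (rows, step) => step(rows)).Take(n).ToList();

Step[] empty = Array.Empty<Step>();
Step[] evens = AddStep(empty, rows => rows.Where(x => x % 2 == 0));
Step[] doubled = AddStep(evens, rows => rows.Select(x => x * 2));

Console.WriteLine(rowsRead);               // 0: building the plan read nothing
List<int> head = Head(doubled, 2);
Console.WriteLine(string.Join(",", head)); // 4,8
Console.WriteLine(evens.Length);           // 1: the earlier plan is unchanged
```

Building `evens` and `doubled` reads no rows; only the call to `Head` pulls data, and it stops shortly after the requested rows are produced rather than reading the whole source. Because `AddStep` returns a new array, `evens` is still a valid one-step plan after `doubled` is derived from it.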

C++: public ref class DataFlow

C#: public class DataFlow

F#: type DataFlow = class

VB: Public Class DataFlow

Inheritance: Object → DataFlow

Properties

Activity	An object that contains the Dataflow under construction and associated support methods, such as methods to add or remove a step.
Builders	Builder classes that can be used to create various intelligent transformation steps.
SecretManager	Maintains a list of secrets that will be used during execution.

Methods

AppendColumns(List<ActivityReference>)	Appends the columns from the referenced dataflows to the current one. Duplicate columns will result in failure.
AppendColumns(List<DataFlow>)	Appends the columns from the referenced dataflows to the current one. Duplicate columns will result in failure.
AppendColumnsAsync(List<DataFlow>)	Asynchronously appends the columns from the referenced dataflows to the current one. Duplicate columns will result in failure.
AppendRows(List<ActivityReference>)	Appends the records in the specified dataflows to the current one. If the schemas of the dataflows are distinct, this will result in records with different schemas.
AppendRows(List<DataFlow>)	Appends the records in the specified dataflows to the current one. If the schemas of the dataflows are distinct, this will result in records with different schemas.
AppendRowsAsync(List<DataFlow>)	Asynchronously appends the records in the specified dataflows to the current one. If the schemas of the dataflows are distinct, this will result in records with different schemas.
Clip(ColumnsSelector, Nullable<Double>, Nullable<Double>, Boolean)	Clips values so that all values are between the lower and upper boundaries.
ConvertUnixTimestampToDateTime(ColumnsSelector, Boolean)	Converts the specified columns to DateTime values by treating the existing values as Unix timestamps.
ConvertUnixTimestampToDateTime(String, Boolean)	Converts the specified column to DateTime values by treating the existing value as a Unix timestamp.
ConvertUnixTimestampToDateTime(String[], Boolean)	Converts the specified columns to DateTime values by treating the existing values as Unix timestamps.
Distinct(ColumnsSelector)	Filters out records that contain duplicate values in the specified columns, leaving only a single instance.
DistinctRows()	Filters out records that contain duplicate values in all columns, leaving only a single instance.
DropColumns(ColumnsSelector)	Drops the specified columns.
DropColumns(String)	Drops the specified columns.
DropColumns(String[])	Drops the specified columns.
DropErrors(ColumnsSelector, ColumnRelationship)	Drops rows where all or any of the selected columns contain an error value.
DropNulls(ColumnsSelector, ColumnRelationship)	Drops rows where all or any of the selected columns are null.
DuplicateColumn(Dictionary<String,String>)	Creates new columns that are duplicates of the specified source columns.
ExtractErrorDetails(String, String, Boolean, String)	Extracts the error details from error values into a new column.
FromDPrepFile(String, SecretManager)	Creates a DataFlow object by loading the specified DPrep file.
FromDPrepJSONString(String, SecretManager)	Creates a DataFlow object from the specified DPrep JSON string.
GetFiles(String)	Expands the specified path by reading globs and files in folders, and outputs one record per file found.
GetProfile(Boolean, Int64)	Requests the data profile which collects summary statistics on the full data produced by the Dataflow. A data profile can be very useful to understand the input data, identify anomalies and missing values, and verify that data preparation operations produced the desired result.
Join(DataFlow, List<KeyValuePair<String,String>>, JoinType, String, String, List<String>, List<String>)	Creates a new Dataflow that is a result of joining this Dataflow with the provided right_dataflow.
JoinAsync(DataFlow, DataFlow, List<KeyValuePair<String,String>>, JoinType, String, String, List<String>, List<String>)	Creates a new Dataflow that is a result of joining two provided Dataflows.
JoinAsync(DataFlow, List<KeyValuePair<String,String>>, JoinType, String, String, List<String>, List<String>)	Creates a new Dataflow that is a result of joining this Dataflow with the provided right_dataflow.
KeepColumns(ColumnsSelector)	Keeps the specified columns and drops all others.
KeepColumns(String)	Keeps the specified columns and drops all others.
KeepColumns(String[])	Keeps the specified columns and drops all others.
MapColumn(String, String, List<ReplacementsValue>)	Creates a new column where matching values in the source column have been replaced with the specified values.
NullCoalesce(List<String>, String)	For each record, selects the first non-null value from the columns specified and uses it as the value of a new column.
ParseDelimited(Char, PromoteHeadersMode, FileEncoding, Boolean, Int64, SkipMode, Nullable<Char>)	Returns a DataFlow object that parses data using the specified delimiter.
ParseJsonColumn(String)	Parses the values in the specified column as JSON objects and expands them into multiple columns.
PromoteHeaders()	Sets the first record in the dataset as headers, replacing any existing ones.
ReadParquetDataset(DataSourcePropertyValue)	Creates a step to read a Parquet file.
Reference(ActivityReference)	Creates a reference to an existing activity object.
RenameColumns(Dictionary<String,String>)	Renames the specified columns.
ReplaceNa(ColumnsSelector, Boolean, Boolean, Boolean, String)	Replaces NA values in the specified columns with nulls. You can use the default list of NA values, supply your own, or both.
ReplaceNa(String, Boolean, Boolean, Boolean, String)	Replaces NA values in the specified columns with nulls. You can use the default list of NA values, supply your own, or both.
ReplaceNa(String[], Boolean, Boolean, Boolean, String)	Replaces NA values in the specified columns with nulls. You can use the default list of NA values, supply your own, or both.
Round(String, Int64)	Rounds the values in the column specified to the desired number of decimal places.
SaveToDPrepFile(String)	Serializes the current Dataflow into a specified DPrep file in JSON format.
SetColumnTypes(ColumnDefinitionInfo[])	Converts values in specified columns to the corresponding data types.
Skip(Int64)	Skips the specified number of records.
Sort(List<Tuple<String,Boolean>>)	Sorts the dataset by the specified columns.
SplitSType(String, SType, List<String>, List<String>)	Creates new columns from an existing column, interpreting its values as a semantic type.
StrReplace(ColumnsSelector, String, String, Boolean, Boolean)	Replaces values in a string column that match a search string with the specified value.
Summarize(List<SummaryColumnsValue>, List<String>, Boolean, String)	Summarizes data by running aggregate functions over specific columns.
Take(Int64)	Takes the specified count of records.
ToBool(ColumnsSelector, List<String>, List<String>, MismatchAsOption)	Converts the values in the specified columns to booleans.
ToBool(String, List<String>, List<String>, MismatchAsOption)	Converts the values in the specified columns to booleans.
ToBool(String[], List<String>, List<String>, MismatchAsOption)	Converts the values in the specified columns to booleans.
ToDataView()	Performs the execution needed to produce an IDataView result from the DataFlow object.
ToDPrepJson()	Serializes the current Dataflow into a JSON string.
ToLong(ColumnsSelector)	Converts the values in the specified columns to 64-bit integers.
ToLong(String)	Converts the values in the specified columns to 64-bit integers.
ToLong(String[])	Converts the values in the specified columns to 64-bit integers.
ToNumber(ColumnsSelector, DecimalMark)	Converts the values in the specified columns to floating point numbers.
ToNumber(String, DecimalMark)	Converts the values in the specified columns to floating point numbers.
ToNumber(String[], DecimalMark)	Converts the values in the specified columns to floating point numbers.
ToString(ColumnsSelector)	Converts the values in the specified columns to strings.
ToString(String)	Converts the values in the specified columns to strings.
ToString(String[])	Converts the values in the specified columns to strings.
TrimString(ColumnsSelector, Boolean, Boolean, TrimType, String)	Trims string values in specific columns.
WriteDelimitedFile(String, Char, String, String)	Writes out the data in the Dataflow in a delimited text format. The output is specified as a directory, which will contain multiple files, one per partition processed in the Dataflow.
WriteStreams(String, OutputFilePropertyValue, String)	Writes the streams in the specified column to the destination path. By default, the files are named after the resource identifiers of the streams. You can override this behavior by specifying a column that contains the names to use.
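The effect of a few of the methods above (Clip, NullCoalesce, DistinctRows, Skip, and Take) can be approximated on in-memory rows with plain C#. This is a minimal sketch, not the library's implementation: the local functions are hypothetical stand-ins with simplified signatures, Clip's Boolean parameter is not modeled, and treating a null Clip bound as "unbounded on that side" is an assumption about the Nullable<Double> arguments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Clip (hypothetical stand-in): keeps a value within [lower, upper]. A null bound
// is assumed to mean "no limit on that side"; a null value stays null.
double? Clip(double? value, double? lower, double? upper)
{
    if (value is null) return null;
    double v = value.Value;
    if (lower.HasValue && v < lower.Value) v = lower.Value;
    if (upper.HasValue && v > upper.Value) v = upper.Value;
    return v;
}

// NullCoalesce (hypothetical stand-in): first non-null value across the columns, else null.
double? NullCoalesce(params double?[] columns) => columns.FirstOrDefault(c => c.HasValue);

// A tiny in-memory table; each value tuple is one record.
var rows = new List<(string City, int Year)>
{
    ("Oslo", 2020), ("Lima", 2021), ("Oslo", 2020), ("Pune", 2022),
};

// DistinctRows: drop records that duplicate an earlier record in every column.
var distinct = rows.Distinct().ToList();

// Skip/Take: skip the first record, then take the next two.
var page = distinct.Skip(1).Take(2).ToList();

Console.WriteLine(Clip(150, 0, 100));        // 100
Console.WriteLine(Clip(-5, 0, null));        // 0
Console.WriteLine(NullCoalesce(null, 4, 7)); // 4
Console.WriteLine(distinct.Count);           // 3
Console.WriteLine(string.Join("; ", page));  // (Lima, 2021); (Pune, 2022)
```

`Distinct` relies on the structural equality of value tuples, so the second ("Oslo", 2020) record is dropped while the first occurrence keeps its position; the library's own handling of column types, nulls, and errors may differ.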

Applies to