DataOperationsCatalog Class

Definition

Class used to create components that operate on data, but are not part of the model training pipeline. Includes components to load, save, cache, filter, shuffle, and split data.

public sealed class DataOperationsCatalog
type DataOperationsCatalog = class
Public NotInheritable Class DataOperationsCatalog
Inheritance
DataOperationsCatalog

Methods

BootstrapSample(IDataView, Nullable<Int32>, Boolean)

Take an approximate bootstrap sample of input.

Cache(IDataView, String[])

Creates a lazy in-memory cache of input.

CreateEnumerable<TRow>(IDataView, Boolean, Boolean, SchemaDefinition)

Convert an IDataView into a strongly-typed IEnumerable<T>.

CrossValidationSplit(IDataView, Int32, String, Nullable<Int32>)

Split the dataset into cross-validation folds of train set and test set. Respects the samplingKeyColumnName if provided.

FilterRowsByColumn(IDataView, String, Double, Double)

Filter the dataset by the values of a numeric column.
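As a minimal sketch of how this filter behaves, the example below keeps rows whose value falls between a lower and an upper bound. The `Reading` type, its `Temperature` column, and the `KeepInRange` helper are hypothetical names introduced only for illustration. The sketch assumes the lower bound is inclusive and the upper bound is exclusive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public static class FilterRowsByColumnSketch
{
    // Hypothetical row type used only for this illustration.
    public class Reading
    {
        public float Temperature { get; set; }
    }

    // Returns the temperatures that survive the filter, in input order.
    public static float[] KeepInRange(IEnumerable<Reading> rows, double lower, double upper)
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Keeps rows whose value lies in [lower, upper).
        IDataView filtered = mlContext.Data.FilterRowsByColumn(
            data, nameof(Reading.Temperature), lowerBound: lower, upperBound: upper);

        return mlContext.Data
            .CreateEnumerable<Reading>(filtered, reuseRowObject: false)
            .Select(r => r.Temperature)
            .ToArray();
    }

    public static void Main()
    {
        var rows = new[]
        {
            new Reading { Temperature = 10f },
            new Reading { Temperature = 20f },
            new Reading { Temperature = 30f },
        };

        foreach (float t in KeepInRange(rows, 15, 30))
            Console.WriteLine(t);
    }
}
```

Because the upper bound is treated as exclusive, a row whose value equals `upperBound` is dropped.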

FilterRowsByKeyColumnFraction(IDataView, String, Double, Double)

Filter the dataset by the values of a KeyDataViewType column.

FilterRowsByMissingValues(IDataView, String[])

Drop rows where any column in columns contains a missing value.

LoadFromEnumerable<TRow>(IEnumerable<TRow>, DataViewSchema)

Create a new IDataView over an enumerable of the items of user-defined type using the provided DataViewSchema, which might contain more information about the schema than the type can capture.

LoadFromEnumerable<TRow>(IEnumerable<TRow>, SchemaDefinition)

Create a new IDataView over an enumerable of the items of user-defined type. The user maintains ownership of the data and the resulting data view will never alter the contents of the data. Since IDataView is assumed to be immutable, the user is expected to support multiple enumerations of the data that would return the same results, unless the user knows that the data will only be cursored once.

A typical use for a streaming data view is to create a data view that lazily loads data as needed, apply pre-trained transformations to it, and then cursor through it to read the transformation results.
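The following sketch illustrates the ownership and re-enumeration points above: an in-memory list is wrapped in an IDataView and then cursored twice through CreateEnumerable. The `Point` type and the `CountRows` helper are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public static class LoadFromEnumerableSketch
{
    // Hypothetical row type used only for this illustration.
    public class Point
    {
        public float X { get; set; }
        public float Y { get; set; }
    }

    // Cursors through the view once and counts the rows it yields.
    public static int CountRows(MLContext mlContext, IDataView view) =>
        mlContext.Data.CreateEnumerable<Point>(view, reuseRowObject: true).Count();

    public static void Main()
    {
        var mlContext = new MLContext();

        // A List<T> can be enumerated repeatedly with the same results,
        // which is what the data view expects from its source.
        var points = new List<Point>
        {
            new Point { X = 1, Y = 2 },
            new Point { X = 3, Y = 4 },
        };

        // No rows are copied here; the view reads from `points` when cursored.
        IDataView view = mlContext.Data.LoadFromEnumerable(points);

        Console.WriteLine(CountRows(mlContext, view));
        Console.WriteLine(CountRows(mlContext, view));
    }
}
```

A source that can be consumed only once, such as a forward-only stream, fits this pattern only if the view is cursored a single time.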

ShuffleRows(IDataView, Nullable<Int32>, Int32, Boolean)

Shuffle the rows of input.

SkipRows(IDataView, Int64)

Skip count rows in input.

TakeRows(IDataView, Int64)

Take count rows from input.
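SkipRows and TakeRows compose naturally into a simple paging operation. The sketch below is one way to combine them. The `Item` type and the `Page` helper are hypothetical, and the example assumes the source is cursored sequentially so that input order is preserved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public static class SkipTakeRowsSketch
{
    // Hypothetical row type used only for this illustration.
    public class Item
    {
        public int Id { get; set; }
    }

    // Skips `skip` rows, then takes up to `take` rows from what remains.
    public static int[] Page(IEnumerable<Item> rows, long skip, long take)
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        IDataView page = mlContext.Data.TakeRows(
            mlContext.Data.SkipRows(data, skip), take);

        return mlContext.Data
            .CreateEnumerable<Item>(page, reuseRowObject: false)
            .Select(r => r.Id)
            .ToArray();
    }

    public static void Main()
    {
        var rows = Enumerable.Range(0, 10).Select(i => new Item { Id = i }).ToList();

        // Skips ids 0 to 2 and takes the next four rows.
        Console.WriteLine(string.Join(", ", Page(rows, skip: 3, take: 4)));
    }
}
```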

TrainTestSplit(IDataView, Double, String, Nullable<Int32>)

Split the dataset into the train set and test set according to the given fraction. Respects the samplingKeyColumnName if provided.
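To illustrate what respecting samplingKeyColumnName means, the sketch below splits rows that share a key and then checks that no key appears in both sets. The `Visit` type, its `PatientId` and `Measurement` members, and the `Split` helper are hypothetical names chosen for the example. The test fraction is approximate, so the exact set sizes are not asserted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public static class TrainTestSplitSketch
{
    // Hypothetical row type; PatientId plays the role of the sampling key.
    public class Visit
    {
        public string PatientId { get; set; }
        public float Measurement { get; set; }
    }

    // Splits the rows and returns the patient ids that ended up in each set.
    public static (string[] Train, string[] Test) Split(IEnumerable<Visit> visits)
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(visits);

        // Rows sharing a PatientId are kept together in one set.
        var split = mlContext.Data.TrainTestSplit(
            data,
            testFraction: 0.3,
            samplingKeyColumnName: nameof(Visit.PatientId),
            seed: 1);

        string[] Ids(IDataView view) => mlContext.Data
            .CreateEnumerable<Visit>(view, reuseRowObject: false)
            .Select(v => v.PatientId)
            .ToArray();

        return (Ids(split.TrainSet), Ids(split.TestSet));
    }

    public static void Main()
    {
        // Ten hypothetical patients with three visits each.
        var visits = Enumerable.Range(0, 30)
            .Select(i => new Visit { PatientId = "p" + (i % 10), Measurement = i })
            .ToList();

        var (train, test) = Split(visits);
        Console.WriteLine($"train rows: {train.Length}, test rows: {test.Length}");
        Console.WriteLine($"patients in both sets: {train.Intersect(test).Count()}");
    }
}
```

Grouping by a key in this way is useful when several rows describe the same entity and leaking that entity into the test set would inflate evaluation metrics.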

Extension Methods

LoadFromBinary(DataOperationsCatalog, IMultiStreamSource)

Load an IDataView from an IMultiStreamSource on a binary file. Note that IDataViews are lazy, so no actual loading happens here, just schema validation.

LoadFromBinary(DataOperationsCatalog, String)

Load an IDataView from a binary file. Note that IDataViews are lazy, so no actual loading happens here, just schema validation.

SaveAsBinary(DataOperationsCatalog, IDataView, Stream, Boolean)

Save the IDataView into a binary stream.

FilterByCustomPredicate<TSrc>(DataOperationsCatalog, IDataView, Func<TSrc,Boolean>)

Drop rows where a specified predicate returns true.
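The polarity of the predicate is easy to get backwards: a return value of true removes the row. The sketch below shows this with a hypothetical `Review` type and a hypothetical `DropLowRatings` helper, neither of which is part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public static class FilterByCustomPredicateSketch
{
    // Hypothetical row type used only for this illustration.
    public class Review
    {
        public string Text { get; set; }
        public float Rating { get; set; }
    }

    // Returns the text of every review whose rating is at least minRating.
    public static string[] DropLowRatings(IEnumerable<Review> reviews, float minRating)
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(reviews);

        // The predicate describes rows to DROP: returning true removes the row.
        IDataView kept = mlContext.Data.FilterByCustomPredicate<Review>(
            data, review => review.Rating < minRating);

        return mlContext.Data
            .CreateEnumerable<Review>(kept, reuseRowObject: false)
            .Select(r => r.Text)
            .ToArray();
    }

    public static void Main()
    {
        var reviews = new[]
        {
            new Review { Text = "great", Rating = 5f },
            new Review { Text = "poor", Rating = 1f },
            new Review { Text = "fine", Rating = 3f },
        };

        foreach (string text in DropLowRatings(reviews, minRating: 3f))
            Console.WriteLine(text);
    }
}
```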

FilterByStatefulCustomPredicate<TSrc,TState>(DataOperationsCatalog, IDataView, Func<TSrc,TState,Boolean>, Action<TState>)

Drop rows where a specified predicate returns true. This filter can maintain per-cursor state.

CreateSvmLightLoader(DataOperationsCatalog, Nullable<Int64>, Int32, Boolean, IMultiStreamSource)

Creates an SvmLightLoader, a loader for SVM-light format files.

CreateSvmLightLoaderWithFeatureNames(DataOperationsCatalog, Nullable<Int64>, IMultiStreamSource)

Creates a loader that loads SVM-light like files, where features are specified by their names.

LoadFromSvmLightFile(DataOperationsCatalog, String, Nullable<Int64>, Int32, Boolean)

Load an IDataView from a text file using SvmLightLoader.

LoadFromSvmLightFileWithFeatureNames(DataOperationsCatalog, String, Nullable<Int64>)

Load an IDataView from a text file containing features specified by feature names, using SvmLightLoader.

SaveInSvmLightFormat(DataOperationsCatalog, IDataView, Stream, Boolean, Boolean, String, String, String, String)

Save the IDataView in SVM-light format. Four columns can be saved: a label and a features column, and optionally a group ID column and an example weight column.

CreateDatabaseLoader(DataOperationsCatalog, DatabaseLoader+Column[])

Create a DatabaseLoader.

CreateDatabaseLoader(DataOperationsCatalog, DatabaseLoader+Options)

Create a DatabaseLoader.

CreateDatabaseLoader<TInput>(DataOperationsCatalog)

Create a DatabaseLoader.

CreateTextLoader(DataOperationsCatalog, TextLoader+Column[], Char, Boolean, IMultiStreamSource, Boolean, Boolean, Boolean)

Create a TextLoader.

CreateTextLoader(DataOperationsCatalog, TextLoader+Options, IMultiStreamSource)

Create a TextLoader.

CreateTextLoader<TInput>(DataOperationsCatalog, TextLoader+Options, IMultiStreamSource)

Create a TextLoader by inferring the dataset schema from a data model type.

CreateTextLoader<TInput>(DataOperationsCatalog, Char, Boolean, IMultiStreamSource, Boolean, Boolean, Boolean)

Create a TextLoader by inferring the dataset schema from a data model type.

LoadFromTextFile(DataOperationsCatalog, String, TextLoader+Column[], Char, Boolean, Boolean, Boolean, Boolean)

Load an IDataView from a text file using TextLoader. Note that IDataViews are lazy, so no actual loading happens here, just schema validation.

LoadFromTextFile(DataOperationsCatalog, String, TextLoader+Options)

Load an IDataView from a text file using TextLoader. Note that IDataViews are lazy, so no actual loading happens here, just schema validation.

LoadFromTextFile<TInput>(DataOperationsCatalog, String, TextLoader+Options)

Load an IDataView from a text file using TextLoader. Note that IDataViews are lazy, so no actual loading happens here, just schema validation.

LoadFromTextFile<TInput>(DataOperationsCatalog, String, Char, Boolean, Boolean, Boolean, Boolean)

Load an IDataView from a text file using TextLoader. Note that IDataViews are lazy, so no actual loading happens here, just schema validation.
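As a rough sketch of the generic overload, the example below loads a small comma-separated file into a data model type and materializes the rows. The `Flower` type, its members, the sample file contents, and the `Load` helper are all hypothetical. The column positions are assumed to be supplied through `LoadColumn` attributes.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class LoadFromTextFileSketch
{
    // Hypothetical data model; LoadColumn maps each member to a zero-based column index.
    public class Flower
    {
        [LoadColumn(0)]
        public float SepalLength { get; set; }

        [LoadColumn(1)]
        public string Species { get; set; }
    }

    // Loads the file at `path` and materializes every row.
    public static Flower[] Load(string path)
    {
        var mlContext = new MLContext();

        // Rows are not read here; they are read when the view is cursored below.
        IDataView data = mlContext.Data.LoadFromTextFile<Flower>(
            path, separatorChar: ',', hasHeader: true);

        return mlContext.Data
            .CreateEnumerable<Flower>(data, reuseRowObject: false)
            .ToArray();
    }

    public static void Main()
    {
        // Write a tiny hypothetical dataset to a temporary file.
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        File.WriteAllLines(path, new[]
        {
            "SepalLength,Species",
            "5.1,setosa",
            "6.3,virginica",
        });

        try
        {
            foreach (Flower flower in Load(path))
                Console.WriteLine($"{flower.SepalLength} {flower.Species}");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Because loading is lazy, the file must still exist when the data view is cursored, which is why the sketch deletes it only after the rows have been materialized.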

SaveAsText(DataOperationsCatalog, IDataView, Stream, Char, Boolean, Boolean, Boolean, Boolean)

Save the IDataView as text.
