TextLoader.Options Class

Definition

The settings for TextLoader.

public class TextLoader.Options
Inheritance
TextLoader.Options

Constructors

Fields

AllowQuoting

Whether the input may include double-quoted values. This parameter is used to distinguish separator characters in an input value from actual separators. When true, separators within double quotes are treated as part of the input value. When false, all separators, even those within quotes, are treated as delimiting a new column.
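The difference between the two settings can be illustrated with a small stand-alone sketch. This is not the ML.NET parser: `QuotingSketch` and `SplitFields` are hypothetical names, the sketch ignores EscapeChar and ReadMultilines, and it assumes that quote characters are kept as ordinary characters when quoting is disabled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotingSketch
{
    // Splits one line into fields. With allowQuoting, separators inside
    // double quotes stay part of the value; without it, every separator
    // starts a new field and quotes are treated as ordinary characters.
    public static List<string> SplitFields(string line, char separator, bool allowQuoting)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (allowQuoting && c == '"')
            {
                inQuotes = !inQuotes; // toggle quoted state and drop the quote itself
            }
            else if (c == separator && !inQuotes)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing separator
        return fields;
    }
}
```

Under these assumptions, the line `a,"b,c",d` splits into three fields (`a`, `b,c`, `d`) when quoting is allowed, and into four fields when it is not.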

AllowSparse

Whether the input may include sparse representations. For example, a row containing "5 2:6 4:3" means that there are 5 columns, and the only non-zero columns are 2 and 4, which have values 6 and 3, respectively. Column indices are zero-based, so columns 2 and 4 represent the 3rd and 5th columns. A column may also have dense values followed by sparse values represented in this fashion. For example, a row containing "1 2 5 2:6 4:3" represents two dense columns with values 1 and 2, followed by 5 sparsely represented columns with values 0, 0, 6, 0, and 3. The indices of the sparse columns start from 0, even though sparse index 0 refers to the third column of the row.

In addition, InputSize should be used when the number of sparse elements (5 in this example) is not present in each line. It should specify the total size, not just the size of the sparse part. However, indices of the sparse part are relative to where the sparse part begins. If InputSize is set to 7, the line "1 2 2:6 4:3" will be mapped to "1 2 0 0 6 0 3", but if set to 10, the same line will be mapped to "1 2 0 0 6 0 3 0 0 0".
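The expansion rules can be sketched in a few lines of stand-alone C#. This illustrates the semantics described here and is not the ML.NET implementation: `SparseRowSketch` and `Expand` are hypothetical names, the row is assumed to be space-separated, and every token before the first "index:value" pair is assumed to be dense.

```csharp
using System;
using System.Globalization;

public static class SparseRowSketch
{
    // Expands one row into dense values.
    // inputSize == null: the token just before the first "index:value"
    //                    pair gives the length of the sparse part.
    // inputSize != null: it is the total row width (dense plus sparse).
    public static float[] Expand(string line, int? inputSize = null)
    {
        string[] tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int firstSparse = Array.FindIndex(tokens, t => t.IndexOf(':') >= 0);
        bool hasSparse = firstSparse >= 0;
        if (!hasSparse)
            firstSparse = tokens.Length; // the whole row is dense

        if (hasSparse && !inputSize.HasValue && firstSparse == 0)
            throw new FormatException("Sparse row without a size and without InputSize.");

        // Without InputSize, the last token before the sparse part is the size, not a value.
        int denseCount = (inputSize.HasValue || !hasSparse) ? firstSparse : firstSparse - 1;
        int sparseCount = inputSize.HasValue
            ? inputSize.Value - denseCount
            : hasSparse ? int.Parse(tokens[firstSparse - 1], CultureInfo.InvariantCulture) : 0;
        if (sparseCount < 0)
            throw new FormatException("Row has more dense values than InputSize allows.");

        var result = new float[denseCount + sparseCount]; // unset slots stay 0
        for (int i = 0; i < denseCount; i++)
            result[i] = float.Parse(tokens[i], CultureInfo.InvariantCulture);

        for (int i = firstSparse; i < tokens.Length; i++)
        {
            string[] pair = tokens[i].Split(':');
            int index = int.Parse(pair[0], CultureInfo.InvariantCulture);
            // Sparse indices are relative to where the sparse part begins.
            result[denseCount + index] = float.Parse(pair[1], CultureInfo.InvariantCulture);
        }
        return result;
    }
}
```

With this sketch, `Expand("1 2 5 2:6 4:3")` and `Expand("1 2 2:6 4:3", 7)` both yield the values 1, 2, 0, 0, 6, 0, 3, while `Expand("1 2 2:6 4:3", 10)` yields the same values followed by three more zeros.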

Columns

Specifies the input columns that should be mapped to IDataView columns.

DecimalMarker

The character that should be used as the decimal marker. Default value is '.'. Only '.' and ',' are allowed to be decimal markers.

EscapeChar

The character used to escape quotes inside quoted fields. It can't be one of the separator characters.

HasHeader

Whether the file has a header with feature names. When true, the loader will skip the first line when Load(IMultiStreamSource) is called. The data sample supplied when the loader is created, if any, can be used to infer slot name annotations.

HeaderFile

File containing a header with feature names. If specified, the header defined in the data file is ignored regardless of HasHeader.

InputSize

Number of source columns in the text data. By default, sparse rows are expected to contain their own size information.

MaxRows

Maximum number of rows to produce.

MissingRealsAsNaNs

If true, missing real fields (i.e. double or single fields) will be loaded as NaN. If false, they'll be loaded as 0. Default is false. A field is considered "missing" if it's empty, if it only has whitespace, or if there are missing columns at the end of a given row.
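The three "missing" cases can be summarized with a short stand-alone sketch. It is not the ML.NET parser: `MissingRealSketch` and `ReadReal` are hypothetical names, and the row is assumed to be already split into fields.

```csharp
using System;
using System.Globalization;

public static class MissingRealSketch
{
    // Reads the real value at the given column of an already-split row.
    // A field counts as missing if the row ends before that column,
    // or if the field is empty or contains only whitespace.
    public static float ReadReal(string[] fields, int column, bool missingRealsAsNaNs)
    {
        bool missing = column >= fields.Length || string.IsNullOrWhiteSpace(fields[column]);
        if (missing)
            return missingRealsAsNaNs ? float.NaN : 0f;

        return float.Parse(fields[column], CultureInfo.InvariantCulture);
    }
}
```

For the row `1.5,, ` split on commas, columns 1 and 2 (empty and whitespace-only) and column 3 (past the end of the row) are all treated as missing, so they read as NaN when the flag is true and as 0 when it is false.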

ReadMultilines

If true, new line characters are acceptable inside a quoted field, and thus one field can have multiple lines of text inside it. If AllowQuoting is false, this option is ignored.

Separators

The characters that should be used as column separators.

TrimWhitespace

Whether to remove trailing whitespace from lines.

UseThreads

Whether to use separate parsing threads.
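The fields above are typically set through an object initializer and passed to a loader. The following is a minimal sketch, assuming the Microsoft.ML NuGet package is referenced. The file contents, the column names "Label" and "Features", and the column layout are made up for illustration.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class TextLoaderOptionsSketch
{
    public static void Main()
    {
        // Hypothetical sample data: a header line plus two comma-separated rows.
        string path = Path.Combine(Path.GetTempPath(), "textloader-options-sample.csv");
        File.WriteAllText(path, "Label,F1,F2,F3\n1,0.5,0.25,0.75\n0,0.1,0.2,0.3\n");

        var options = new TextLoader.Options
        {
            Separators = new[] { ',' },   // column separators
            HasHeader = true,             // skip the first line when loading
            AllowQuoting = true,          // separators inside double quotes belong to the value
            TrimWhitespace = true,        // remove trailing whitespace from lines
            Columns = new[]
            {
                // Source column 0 becomes the "Label" column.
                new TextLoader.Column("Label", DataKind.Single, 0),
                // Source columns 1 through 3 become one vector column.
                new TextLoader.Column("Features", DataKind.Single, 1, 3)
            }
        };

        var mlContext = new MLContext();
        TextLoader loader = mlContext.Data.CreateTextLoader(options);
        IDataView data = loader.Load(path);

        // Print the names of the IDataView columns produced by the mapping.
        foreach (var column in data.Schema)
            Console.WriteLine(column.Name);
    }
}
```

Only fields listed on this page are set here. Any field left out of the initializer keeps its default value.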

Applies to

Product Versions
ML.NET 1.0.0, 1.1.0, 1.2.0, 1.3.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 2.0.0, 3.0.0