TextLoaderSaverCatalog.LoadFromTextFile Method

Definition

Overloads

LoadFromTextFile(DataOperationsCatalog, String, TextLoader+Options)

Loads an IDataView from a text file using TextLoader. IDataView objects are lazy, so no data is actually loaded here; only the schema is validated.

LoadFromTextFile(DataOperationsCatalog, String, TextLoader+Column[], Char, Boolean, Boolean, Boolean, Boolean)

Loads an IDataView from a text file using TextLoader. IDataView objects are lazy, so no data is actually loaded here; only the schema is validated.

LoadFromTextFile<TInput>(DataOperationsCatalog, String, TextLoader+Options)

Loads an IDataView from a text file using TextLoader. IDataView objects are lazy, so no data is actually loaded here; only the schema is validated.

LoadFromTextFile<TInput>(DataOperationsCatalog, String, Char, Boolean, Boolean, Boolean, Boolean)

Loads an IDataView from a text file using TextLoader. IDataView objects are lazy, so no data is actually loaded here; only the schema is validated.

LoadFromTextFile(DataOperationsCatalog, String, TextLoader+Options)

Loads an IDataView from a text file using TextLoader. IDataView objects are lazy, so no data is actually loaded here; only the schema is validated.

public static Microsoft.ML.IDataView LoadFromTextFile (this Microsoft.ML.DataOperationsCatalog catalog, string path, Microsoft.ML.Data.TextLoader.Options options = default);
static member LoadFromTextFile : Microsoft.ML.DataOperationsCatalog * string * Microsoft.ML.Data.TextLoader.Options -> Microsoft.ML.IDataView
<Extension()>
Public Function LoadFromTextFile (catalog As DataOperationsCatalog, path As String, Optional options As TextLoader.Options = Nothing) As IDataView

Parameters

path
String

Specifies a file or path of files from which to load.

options
TextLoader.Options

Defines the settings of the load operation.

Returns

The data view.

Examples

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class SaveAndLoadFromText
    {
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available operations
            // and as the source of randomness. Setting the seed to a fixed number
            // in this example to make outputs deterministic.
            var mlContext = new MLContext(seed: 0);

            // Create a list of training data points.
            var dataPoints = new List<DataPoint>()
            {
                new DataPoint(){ Label = 0, Features = 4},
                new DataPoint(){ Label = 0, Features = 5},
                new DataPoint(){ Label = 0, Features = 6},
                new DataPoint(){ Label = 1, Features = 8},
                new DataPoint(){ Label = 1, Features = 9},
            };

            // Convert the list of data points to an IDataView object, which is
            // consumable by ML.NET API.
            IDataView data = mlContext.Data.LoadFromEnumerable(dataPoints);

            // Create a FileStream object and write the IDataView to it as a text
            // file.
            using (FileStream stream = new FileStream("data.tsv", FileMode.Create))
                mlContext.Data.SaveAsText(data, stream);

            // Create an IDataView object by loading the text file.
            IDataView loadedData = mlContext.Data.LoadFromTextFile("data.tsv");

            // Inspect the data that is loaded from the previously saved text file.
            var loadedDataEnumerable = mlContext.Data
                .CreateEnumerable<DataPoint>(loadedData, reuseRowObject: false);

            foreach (DataPoint row in loadedDataEnumerable)
                Console.WriteLine($"{row.Label}, {row.Features}");

            // Preview of the loaded data.
            // 0, 4
            // 0, 5
            // 0, 6
            // 1, 8
            // 1, 9
        }

        // Example with label and feature values. A data set is a collection of such
        // examples.
        private class DataPoint
        {
            public float Label { get; set; }

            public float Features { get; set; }
        }
    }
}
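
The sample above relies on the default options. As a rough sketch of passing explicit settings instead, the following assumes a hypothetical comma-separated file named points.csv with a header row and two numeric columns. The file name, the column names, and the LoadWithOptionsSketch class are illustrative only. It also prints the schema, which is available right after the call even though rows are read only when the data is enumerated.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class LoadWithOptionsSketch
{
    // Builds explicit TextLoader.Options and loads the file lazily.
    public static IDataView Load(MLContext mlContext, string path)
    {
        var options = new TextLoader.Options
        {
            // Assumption: the file is comma-separated and starts with a header.
            Separators = new[] { ',' },
            HasHeader = true,
            // Assumption: column 0 holds the label, column 1 the feature.
            Columns = new[]
            {
                new TextLoader.Column("Label", DataKind.Single, 0),
                new TextLoader.Column("Features", DataKind.Single, 1),
            },
        };

        return mlContext.Data.LoadFromTextFile(path, options);
    }

    public static void Main()
    {
        // Write a small hypothetical input file so the sketch is self-contained.
        File.WriteAllLines("points.csv", new[] { "Label,Features", "0,4", "1,8" });

        var mlContext = new MLContext(seed: 0);
        IDataView data = Load(mlContext, "points.csv");

        // The schema can be inspected without enumerating any rows.
        foreach (var column in data.Schema)
            Console.WriteLine($"{column.Name}: {column.Type}");
    }
}
```

Setting Columns explicitly is one option among several; it is needed here only because this hypothetical file carries no embedded schema for the loader to pick up.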


LoadFromTextFile(DataOperationsCatalog, String, TextLoader+Column[], Char, Boolean, Boolean, Boolean, Boolean)

Loads an IDataView from a text file using TextLoader. IDataView objects are lazy, so no data is actually loaded here; only the schema is validated.

public static Microsoft.ML.IDataView LoadFromTextFile (this Microsoft.ML.DataOperationsCatalog catalog, string path, Microsoft.ML.Data.TextLoader.Column[] columns, char separatorChar = '\t', bool hasHeader = false, bool allowQuoting = false, bool trimWhitespace = false, bool allowSparse = false);
static member LoadFromTextFile : Microsoft.ML.DataOperationsCatalog * string * Microsoft.ML.Data.TextLoader.Column[] * char * bool * bool * bool * bool -> Microsoft.ML.IDataView
<Extension()>
Public Function LoadFromTextFile (catalog As DataOperationsCatalog, path As String, columns As TextLoader.Column(), Optional separatorChar As Char = '\t', Optional hasHeader As Boolean = false, Optional allowQuoting As Boolean = false, Optional trimWhitespace As Boolean = false, Optional allowSparse As Boolean = false) As IDataView

Parameters

path
String

The path to the file(s).

columns
TextLoader.Column[]

The columns of the schema.

separatorChar
Char

The character used as separator between data points in a row. By default the tab character is used as separator.

hasHeader
Boolean

Whether the file has a header. When true, the loader will skip the first line when Load(IMultiStreamSource) is called.

allowQuoting
Boolean

Whether the input may include double-quoted values. This parameter is used to distinguish separator characters in an input value from actual separators. When true, separators within double quotes are treated as part of the input value. When false, all separators, even those within quotes, are treated as delimiting a new column. It is also used to distinguish empty values from missing values. When true, missing values are denoted by consecutive separators and empty values by "". When false, empty values are denoted by consecutive separators and missing values by the default missing value for each type documented in DataKind.

trimWhitespace
Boolean

Whether to remove trailing whitespace from lines.

allowSparse
Boolean

Whether the input may include sparse representations. For example, a row containing "5 2:6 4:3" means that there are 5 columns, and the only non-zero values are in columns 2 and 4, which have values 6 and 3, respectively. Column indices are zero-based, so columns 2 and 4 represent the 3rd and 5th columns. A column may also have dense values followed by sparse values represented in this fashion. For example, a row containing "1 2 5 2:6 4:3" represents two dense columns with values 1 and 2, followed by 5 sparsely represented columns with values 0, 0, 6, 0, and 3. The indices of the sparse columns start from 0, even though index 0 then refers to the third column of the row.
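
To make the sparse notation concrete, here is a small stand-alone sketch that expands such a row into dense values. It is not the parser that TextLoader uses. The SparseRowSketch class and its Expand method are hypothetical, and treating a row with no index:value pairs as fully dense is an assumption made only for this illustration.

```csharp
using System;
using System.Globalization;

public static class SparseRowSketch
{
    // Expands a row such as "1 2 5 2:6 4:3" into its dense values.
    public static float[] Expand(string row, char separator = ' ')
    {
        string[] tokens = row.Split(separator);

        // The first token of the form index:value marks the start of the sparse entries.
        int firstSparse = Array.FindIndex(tokens, t => t.Contains(":"));

        // Assumption: without any index:value token, every token is a dense value.
        if (firstSparse < 0)
            return Array.ConvertAll(tokens, t => float.Parse(t, CultureInfo.InvariantCulture));

        if (firstSparse == 0)
            throw new FormatException("The sparse column count must precede the sparse entries.");

        // The token just before the sparse entries is the number of sparse columns;
        // everything before that token is a dense value.
        int denseCount = firstSparse - 1;
        int sparseCount = int.Parse(tokens[denseCount], CultureInfo.InvariantCulture);

        var values = new float[denseCount + sparseCount];
        for (int i = 0; i < denseCount; i++)
            values[i] = float.Parse(tokens[i], CultureInfo.InvariantCulture);

        for (int i = firstSparse; i < tokens.Length; i++)
        {
            string[] pair = tokens[i].Split(':');
            int index = int.Parse(pair[0], CultureInfo.InvariantCulture);
            if (index < 0 || index >= sparseCount)
                throw new FormatException($"Sparse index {index} is out of range.");

            // Sparse indices are zero-based and relative to the sparse block.
            values[denseCount + index] = float.Parse(pair[1], CultureInfo.InvariantCulture);
        }

        return values;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Expand("5 2:6 4:3")));      // 0, 0, 6, 0, 3
        Console.WriteLine(string.Join(", ", Expand("1 2 5 2:6 4:3")));  // 1, 2, 0, 0, 6, 0, 3
    }
}
```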

Returns

The data view.
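
A minimal sketch of this overload, assuming a hypothetical comma-separated file named measurements.csv with a header row, one label column, and three feature columns. The file name, the column names, and the LoadWithColumnsSketch class are made up for illustration. The Features column is mapped to the index range 1 through 3, so it should load as a vector of three values.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class LoadWithColumnsSketch
{
    // Describes the schema with TextLoader.Column entries and loads the file lazily.
    public static IDataView Load(MLContext mlContext, string path)
    {
        var columns = new[]
        {
            // Single value taken from file column 0.
            new TextLoader.Column("Label", DataKind.Single, 0),
            // Vector built from file columns 1 through 3 (inclusive).
            new TextLoader.Column("Features", DataKind.Single, 1, 3),
        };

        return mlContext.Data.LoadFromTextFile(
            path, columns, separatorChar: ',', hasHeader: true);
    }

    public static void Main()
    {
        // Write a small hypothetical input file so the sketch is self-contained.
        File.WriteAllLines("measurements.csv", new[]
        {
            "Label,F1,F2,F3",
            "0,1.5,2.5,3.5",
            "1,4.5,5.5,6.5",
        });

        var mlContext = new MLContext(seed: 0);
        IDataView data = Load(mlContext, "measurements.csv");

        // Enumerating a column is what actually triggers reading the file.
        float[] labels = data.GetColumn<float>("Label").ToArray();
        Console.WriteLine($"Loaded {labels.Length} rows.");
    }
}
```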


LoadFromTextFile<TInput>(DataOperationsCatalog, String, TextLoader+Options)

Loads an IDataView from a text file using TextLoader. IDataView objects are lazy, so no data is actually loaded here; only the schema is validated.

public static Microsoft.ML.IDataView LoadFromTextFile<TInput> (this Microsoft.ML.DataOperationsCatalog catalog, string path, Microsoft.ML.Data.TextLoader.Options options);
static member LoadFromTextFile : Microsoft.ML.DataOperationsCatalog * string * Microsoft.ML.Data.TextLoader.Options -> Microsoft.ML.IDataView
<Extension()>
Public Function LoadFromTextFile(Of TInput) (catalog As DataOperationsCatalog, path As String, options As TextLoader.Options) As IDataView

Type Parameters

TInput

Parameters

path
String

Specifies a file or path of files from which to load.

options
TextLoader.Options

Defines the settings of the load operation. There is no need to specify the Columns field, as the columns are inferred by this method.

Returns

The data view.
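
As a hedged illustration of this overload, the sketch below lets the loader infer the columns from a type annotated with LoadColumn attributes, so the options carry only the file layout. The LabeledPoint type, the semicolon-separated file points.txt, and the LoadTypedWithOptionsSketch class are assumptions made for the example.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type: LoadColumn maps each field to a file column index.
public class LabeledPoint
{
    [LoadColumn(0)]
    public float Label;

    [LoadColumn(1)]
    public float Features;
}

public static class LoadTypedWithOptionsSketch
{
    public static IDataView Load(MLContext mlContext, string path)
    {
        // Columns is left unset on purpose; it is inferred from LabeledPoint.
        var options = new TextLoader.Options
        {
            Separators = new[] { ';' },
            HasHeader = true,
        };

        return mlContext.Data.LoadFromTextFile<LabeledPoint>(path, options);
    }

    public static void Main()
    {
        // Write a small hypothetical input file so the sketch is self-contained.
        File.WriteAllLines("points.txt", new[] { "Label;Features", "0;4", "1;8" });

        var mlContext = new MLContext(seed: 0);
        IDataView data = Load(mlContext, "points.txt");

        // Rows are read from the file only while this enumeration runs.
        foreach (LabeledPoint row in mlContext.Data
            .CreateEnumerable<LabeledPoint>(data, reuseRowObject: false))
        {
            Console.WriteLine($"{row.Label}, {row.Features}");
        }
    }
}
```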


LoadFromTextFile<TInput>(DataOperationsCatalog, String, Char, Boolean, Boolean, Boolean, Boolean)

Loads an IDataView from a text file using TextLoader. IDataView objects are lazy, so no data is actually loaded here; only the schema is validated.

public static Microsoft.ML.IDataView LoadFromTextFile<TInput> (this Microsoft.ML.DataOperationsCatalog catalog, string path, char separatorChar = '\t', bool hasHeader = false, bool allowQuoting = false, bool trimWhitespace = false, bool allowSparse = false);
static member LoadFromTextFile : Microsoft.ML.DataOperationsCatalog * string * char * bool * bool * bool * bool -> Microsoft.ML.IDataView
<Extension()>
Public Function LoadFromTextFile(Of TInput) (catalog As DataOperationsCatalog, path As String, Optional separatorChar As Char = '\t', Optional hasHeader As Boolean = false, Optional allowQuoting As Boolean = false, Optional trimWhitespace As Boolean = false, Optional allowSparse As Boolean = false) As IDataView

Type Parameters

TInput

Parameters

path
String

The path to the file(s).

separatorChar
Char

The column separator character. The default is '\t'.

hasHeader
Boolean

Whether the file has a header. When true, the loader will skip the first line when Load(IMultiStreamSource) is called.

allowQuoting
Boolean

Whether the input may include double-quoted values. This parameter is used to distinguish separator characters in an input value from actual separators. When true, separators within double quotes are treated as part of the input value. When false, all separators, even those within quotes, are treated as delimiting a new column. It is also used to distinguish empty values from missing values. When true, missing values are denoted by consecutive separators and empty values by "". When false, empty values are denoted by consecutive separators and missing values by the default missing value for each type documented in DataKind.

trimWhitespace
Boolean

Whether to remove trailing whitespace from lines.

allowSparse
Boolean

Whether the input may include sparse representations. For example, a row containing "5 2:6 4:3" means that there are 5 columns, and the only non-zero values are in columns 2 and 4, which have values 6 and 3, respectively. Column indices are zero-based, so columns 2 and 4 represent the 3rd and 5th columns. A column may also have dense values followed by sparse values represented in this fashion. For example, a row containing "1 2 5 2:6 4:3" represents two dense columns with values 1 and 2, followed by 5 sparsely represented columns with values 0, 0, 6, 0, and 3. The indices of the sparse columns start from 0, even though index 0 then refers to the third column of the row.

Returns

The data view.
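
The following sketch exercises this overload with allowQuoting enabled, so that a comma inside a double-quoted value is kept as part of the value. The Review type, the reviews.csv file, and the LoadTypedQuotedSketch class are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type: LoadColumn maps each field to a file column index.
public class Review
{
    [LoadColumn(0)]
    public string Text;

    [LoadColumn(1)]
    public float Score;
}

public static class LoadTypedQuotedSketch
{
    public static IDataView Load(MLContext mlContext, string path)
    {
        // allowQuoting: true keeps separators inside double quotes in the value.
        return mlContext.Data.LoadFromTextFile<Review>(
            path, separatorChar: ',', hasHeader: true, allowQuoting: true);
    }

    public static void Main()
    {
        // The first data row contains a comma inside a quoted value.
        File.WriteAllLines("reviews.csv", new[]
        {
            "Text,Score",
            "\"Good, but slow\",3.5",
            "Excellent,5",
        });

        var mlContext = new MLContext(seed: 0);
        IDataView data = Load(mlContext, "reviews.csv");

        // Rows are read from the file only while this enumeration runs.
        foreach (Review row in mlContext.Data
            .CreateEnumerable<Review>(data, reuseRowObject: false))
        {
            Console.WriteLine($"[{row.Text}] {row.Score}");
        }
    }
}
```

With allowQuoting left at its default of false, the same quoted row would be split at every comma, as described for the allowQuoting parameter above.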
