Learn how to load data into ML.NET for processing and training, using the API. The data is originally stored in files or other data sources such as databases, JSON, XML, or in-memory collections.
Attributes give ML.NET more information about the data model and the data source.
The LoadColumn attribute specifies your properties' column indices.
Important
LoadColumn is only required when loading data from a file.
Load columns as:
Individual columns, like Size and CurrentPrices in the HousingData class.
Multiple columns at a time in the form of a vector, like HistoricalPrices in the HousingData class.
If you have a vector property, apply the VectorType attribute to the property in your data model. All elements in the vector must be the same type. Keeping columns separate makes feature engineering easier and more flexible, but with a very large number of columns, operating on each column individually can slow down training.
ML.NET operates through column names. If you want to change the name of a column to something other than the property name, use the ColumnName attribute. When creating in-memory objects, you still create objects using the property name. However, for data processing and building machine learning models, ML.NET overrides and references the property with the value provided in the ColumnName attribute.
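As a minimal sketch of how these attributes could fit together, the following data model uses the property names mentioned above (Size, HistoricalPrices, CurrentPrices). The column indices, the vector size of 3, and the "Label" column name are assumptions made for illustration, not values taken from a specific data file.

```csharp
using Microsoft.ML.Data;

// Sketch of a data model for a delimited file with five columns.
// Column indices, vector size, and the "Label" name are assumed for illustration.
public class HousingData
{
    // Column 0 is loaded as an individual column.
    [LoadColumn(0)]
    public float Size { get; set; }

    // Columns 1 through 3 are loaded together as one fixed-size vector.
    [LoadColumn(1, 3)]
    [VectorType(3)]
    public float[] HistoricalPrices { get; set; }

    // Column 4 is loaded individually and exposed to ML.NET under the name "Label".
    [LoadColumn(4)]
    [ColumnName("Label")]
    public float CurrentPrices { get; set; }
}
```

With a model like this, in-memory objects would still be created by setting CurrentPrices, while data processing steps would refer to the column as Label.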
Load data from a single file
To load data from a file, use the LoadFromTextFile method along with the data model for the data to be loaded. Because the separatorChar parameter defaults to the tab character, change it to match your data file as needed. If your file has a header, set the hasHeader parameter to true to skip the first line and begin loading data from the second line.
C#
//Create MLContext
MLContext mlContext = new MLContext();
//Load Data
IDataView data = mlContext.Data.LoadFromTextFile<HousingData>("my-data-file.csv", separatorChar: ',', hasHeader: true);
Load data from multiple files
If your data is stored in multiple files that share the same schema, ML.NET lets you load them together, whether they are in the same directory or in multiple directories.
Load from files in a single directory
When all of your data files are in the same directory, use wildcards in the LoadFromTextFile method.
C#
//Create MLContext
MLContext mlContext = new MLContext();
//Load Data File
IDataView data = mlContext.Data.LoadFromTextFile<HousingData>("Data/*", separatorChar: ',', hasHeader: true);
Load from files in multiple directories
To load data from multiple directories, use the CreateTextLoader method to create a TextLoader. Then, use the TextLoader.Load method and specify the individual file paths (wildcards can't be used).
C#
//Create MLContext
MLContext mlContext = new MLContext();
// Create TextLoader
TextLoader textLoader = mlContext.Data.CreateTextLoader<HousingData>(separatorChar: ',', hasHeader: true);
// Load Data
IDataView data = textLoader.Load("DataFolder/SubFolder1/1.txt", "DataFolder/SubFolder2/1.txt");
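Because the file paths have to be listed individually, one possible approach is to expand each directory into its files yourself and pass the resulting array to Load. The sketch below assumes a hypothetical helper named LoadFromDirectories and a caller-supplied search pattern; neither is part of ML.NET.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class MultiDirectoryLoading
{
    // Hypothetical helper: collects matching files from each directory
    // and loads them through an existing TextLoader.
    public static IDataView LoadFromDirectories(
        TextLoader textLoader, string searchPattern, params string[] directories)
    {
        // Expand every directory into explicit file paths, in a stable order.
        string[] filePaths = directories
            .SelectMany(dir => Directory.GetFiles(dir, searchPattern))
            .OrderBy(path => path, StringComparer.Ordinal)
            .ToArray();

        if (filePaths.Length == 0)
        {
            throw new FileNotFoundException("No data files matched the search pattern.");
        }

        // Load accepts one or more individual file paths.
        return textLoader.Load(filePaths);
    }
}
```

A call such as MultiDirectoryLoading.LoadFromDirectories(textLoader, "*.txt", "DataFolder/SubFolder1", "DataFolder/SubFolder2") would then load every matching file, provided all of them share the same schema.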
Load data from a relational database
ML.NET supports loading data from a variety of relational databases supported by System.Data, which include SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, Progress, and IBM DB2.
Inside your application, create a DatabaseLoader for your data model.
C#
MLContext mlContext = new MLContext();
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<HouseData>();
Define your connection string as well as the SQL command to be executed on the database and create a DatabaseSource instance. This sample uses a LocalDB SQL Server database with a file path. However, DatabaseLoader supports any other valid connection string for databases on-premises and in the cloud.
Important
Microsoft recommends that you use the most secure authentication flow available. If you're connecting to Azure SQL, Managed Identities for Azure resources is the recommended authentication method.
C#
string connectionString = @"Data Source=(LocalDB)\MSSQLLocalDB;AttachDbFilename=<YOUR-DB-FILEPATH>;Database=<YOUR-DB-NAME>;Integrated Security=True;Connect Timeout=30";
string sqlCommand = "SELECT CAST(Size as REAL) as Size, CAST(NumBed as REAL) as NumBed, Price FROM House";
DatabaseSource dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, sqlCommand);
Numerical data that's not of type Real has to be converted to Real. The Real type is represented as a single-precision floating-point value, or Single, which is the input type expected by ML.NET algorithms. In this sample, the Size and NumBed columns are integers in the database, so the CAST built-in function converts them to Real. Because the Price column is already of type Real, it's loaded as-is.
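The HouseData type passed to CreateDatabaseLoader isn't shown here. A minimal sketch, assuming its property names mirror the columns returned by the query (Size, NumBed, Price), might look like this:

```csharp
// Sketch of a data model for the query results (assumed shape).
// Every property is a float (Single) to match the Real columns in the query.
// No LoadColumn attributes are used because the data isn't loaded from a file.
public class HouseData
{
    public float Size { get; set; }

    public float NumBed { get; set; }

    public float Price { get; set; }
}
```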
Use the Load method to load the data into an IDataView.
C#
IDataView data = loader.Load(dbSource);
Load images
To load image data from a directory, first create a model that includes the image path and a label. ImagePath is the absolute path of the image in the data source directory. Label is the class or category of the actual image file.
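The following sketch shows one way such a model could be defined and populated. It assumes a hypothetical folder layout with one subfolder per class, so the parent folder name serves as the label, and it only picks up .jpg and .png files. The ImageLoading class and its method names are illustrative, not part of ML.NET.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.ML;

// Data model holding the image path and its label.
public class ImageData
{
    public string ImagePath { get; set; } = string.Empty;

    public string Label { get; set; } = string.Empty;
}

public static class ImageLoading
{
    // Assumption: images are organized as <folder>/<label>/<image file>.
    public static IEnumerable<ImageData> EnumerateImages(string folder)
    {
        foreach (string file in Directory.EnumerateFiles(folder, "*", SearchOption.AllDirectories))
        {
            // Skip anything that isn't a .jpg or .png file.
            string extension = Path.GetExtension(file).ToLowerInvariant();
            if (extension != ".jpg" && extension != ".png")
            {
                continue;
            }

            yield return new ImageData
            {
                // Store the absolute path of the image.
                ImagePath = Path.GetFullPath(file),
                // Use the name of the containing folder as the label.
                Label = Directory.GetParent(file)?.Name ?? string.Empty
            };
        }
    }

    // Wraps the enumerated images in an IDataView.
    public static IDataView LoadImages(MLContext mlContext, string folder)
    {
        return mlContext.Data.LoadFromEnumerable(EnumerateImages(folder));
    }
}
```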
Load data from other sources
In addition to loading data stored in files, ML.NET supports loading data from sources that include:
In-memory collections
JSON/XML
When working with streaming sources, ML.NET expects input to be in the form of an in-memory collection. Therefore, when working with sources like JSON/XML, make sure to format the data into an in-memory collection.
// Create MLContext
MLContext mlContext = new MLContext();
//Load Data
IDataView data = mlContext.Data.LoadFromEnumerable<HousingData>(inMemoryCollection);
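For a JSON source, the in-memory collection could be produced by deserializing the JSON first. The sketch below uses System.Text.Json with a simplified, hypothetical HouseRecord type and a JsonLoading helper. Both names are assumptions for illustration, and the JSON is assumed to be an array of objects whose property names match the class exactly.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using Microsoft.ML;

// Hypothetical, simplified data model for this sketch.
public class HouseRecord
{
    public float Size { get; set; }

    public float Price { get; set; }
}

public static class JsonLoading
{
    public static IDataView LoadFromJson(MLContext mlContext, string json)
    {
        // Deserialize the JSON array into an in-memory collection first.
        List<HouseRecord> records =
            JsonSerializer.Deserialize<List<HouseRecord>>(json) ?? new List<HouseRecord>();

        // Then hand the collection to ML.NET.
        return mlContext.Data.LoadFromEnumerable(records);
    }
}
```

For example, passing the JSON text [{"Size": 700, "Price": 100000}] would produce an IDataView with a single row.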