Learn how to load your training datasets from a file or a SQL Server database for use in one of the Model Builder scenarios for ML.NET. Model Builder scenarios can use SQL Server databases, image files, and CSV or TSV file formats as training data.
Model Builder accepts only TSV, CSV, and TXT files that use comma, tab, or semicolon delimiters, and PNG and JPG images.
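Before you open Model Builder, it can help to check that a file matches the supported formats. The following Python sketch is only an illustration of the rules stated above: the helper names `classify_training_file` and `guess_delimiter` are hypothetical, and the delimiter guess is a simple count on the header line, not the logic Model Builder itself uses.

```python
from pathlib import Path

# Extensions and delimiters listed in this article. Whether other spellings
# (for example .jpeg) are accepted is not stated, so they are treated as
# unsupported here.
TEXT_EXTENSIONS = {".csv", ".tsv", ".txt"}
IMAGE_EXTENSIONS = {".png", ".jpg"}
DELIMITERS = {",": "comma", "\t": "tab", ";": "semicolon"}


def classify_training_file(filename):
    """Return 'text', 'image', or None when the extension is not listed."""
    suffix = Path(filename).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None


def guess_delimiter(header_line):
    """Return the name of the listed delimiter that occurs most often, or None."""
    counts = {char: header_line.count(char) for char in DELIMITERS}
    best = max(counts, key=counts.get)
    return DELIMITERS[best] if counts[best] > 0 else None
```

For example, `guess_delimiter("price;rooms;area")` returns `"semicolon"`, and a header with none of the three delimiters returns `None`.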
Model Builder scenarios
Model Builder helps you create models for the following machine learning scenarios:
Data classification (binary & multiclass classification): Classify text data into two or more categories.
Value prediction (regression): Predict a numeric value.
Image classification (deep learning): Classify images into two or more categories.
Recommendation (recommendation): Produce a list of suggested items for a particular user.
Object detection (deep learning): Detect and identify objects in images. This scenario can find one or more objects in an image and label them accordingly.
This article covers classification and regression with textual or numerical data, image classification, and object detection scenarios.
Load text or numeric data from a file
You can load text or numeric data from a file into Model Builder. It accepts comma-delimited (CSV) or tab-delimited (TSV) file formats.
In the data step of Model Builder, select File as the data source type.
Select the Browse button next to the text box, and use File Explorer to browse and select the data file.
Choose the column that you want to predict in the Column to predict (Label) dropdown.
Note
(Optional) data classification scenarios: If the data type of your label column (the value in the "Column to predict (Label)" dropdown) is set to Boolean (True/False), a binary classification algorithm is used in your model training pipeline. Otherwise, a multiclass classification trainer is used. Use Advanced data options to modify the data type for your label column and inform Model Builder which type of trainer it should use for your data.
Select the Advanced data options link to set column settings or to update the data formatting.
You're done setting up your data source file for Model Builder. Click the Next step button to move to the next step in Model Builder.
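The note about Boolean label columns describes a simple decision rule. The following Python sketch mirrors only that stated rule. The function name `infer_classification_task` is hypothetical, and how Model Builder actually infers column data types is not described in this article, so treat the Boolean check as an assumption.

```python
# Text spellings treated as Boolean in this sketch (an assumption).
BOOLEAN_TOKENS = {"true", "false"}


def infer_classification_task(label_values):
    """Return 'binary' when every label value reads as Boolean, else 'multiclass'."""
    values = [str(value).strip().lower() for value in label_values]
    if values and all(value in BOOLEAN_TOKENS for value in values):
        return "binary"
    return "multiclass"
```

Under this rule, a label column that contains only `True` and `False` maps to binary classification, while a column with values such as `spam` and `ham` maps to multiclass classification, even though it has only two categories.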
Load data from a SQL Server database
Model Builder supports loading data from local and remote SQL Server databases.
Local database file
To load data from a SQL Server database file into Model Builder:
In the data step of Model Builder, select SQL Server as the data source type.
Select the Choose data source button.
In the Choose Data Source dialog, select Microsoft SQL Server Database File.
Uncheck the Always use this selection checkbox and select Continue.
In the Connection Properties dialog, select Browse and select your .MDF database file.
Select OK.
Choose the dataset name from the Table Name dropdown.
From the Column to predict (Label) dropdown, choose the data category on which you want to make a prediction.
Note
(Optional) data classification scenarios: If the data type of your label column (the value in the "Column to predict (Label)" dropdown) is set to Boolean (True/False), a binary classification algorithm is used in your model training pipeline. Otherwise, a multiclass classification trainer is used. Use Advanced data options to modify the data type for your label column and inform Model Builder which type of trainer it should use for your data.
Select the Advanced data options link to set column settings or to update the data formatting.
Remote database
To load data from a SQL Server database connection into Model Builder:
In the data step of Model Builder, select SQL Server as the data source type.
Select the Choose data source button.
In the Choose Data Source dialog, select Microsoft SQL Server.
In the Connection Properties dialog, enter the properties of your SQL Server database.
Provide the name of the server that hosts the table that you want to connect to.
Set up the authentication to the server. If SQL Server Authentication is selected, input the server's username and password.
Select the database to connect to in the Select or enter a database name dropdown. This dropdown should auto-populate if the server name and login information are correct.
Select OK.
Choose the dataset name from the Table Name dropdown.
From the Column to predict (Label) dropdown, choose the data category on which you want to make a prediction.
Note
(Optional) data classification scenarios: If the data type of your label column (the value in the "Column to predict (Label)" dropdown) is set to Boolean (True/False), a binary classification algorithm is used in your model training pipeline. Otherwise, a multiclass classification trainer is used. Use Advanced data options to modify the data type for your label column and inform Model Builder which type of trainer it should use for your data.
Select the Advanced data options link to set column settings or to update the data formatting.
You're done setting up your SQL Server data source for Model Builder. Click the Next step button to move to the next step in Model Builder.
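The values that you enter in the Connection Properties dialog correspond to the parts of a SQL Server connection string. The following Python sketch shows how those parts could fit together. It's an illustration under assumptions: this article doesn't show the connection string that Model Builder produces, both function names are hypothetical, the `(LocalDB)\MSSQLLocalDB` instance name is an assumed default, and values that contain semicolons or quotes aren't escaped.

```python
def build_remote_connection_string(server, database, username=None, password=None):
    """Assemble a SqlClient-style connection string for a remote database."""
    parts = [f"Server={server}", f"Database={database}"]
    if username is None:
        # No SQL login supplied: fall back to Windows Authentication.
        parts.append("Integrated Security=True")
    else:
        # SQL Server Authentication: user name and password are required.
        parts.append(f"User Id={username}")
        parts.append(f"Password={password}")
    return ";".join(parts) + ";"


def build_local_file_connection_string(mdf_path):
    """Assemble a connection string that attaches a local .MDF database file."""
    parts = [
        r"Data Source=(LocalDB)\MSSQLLocalDB",  # assumed LocalDB instance name
        f"AttachDbFilename={mdf_path}",
        "Integrated Security=True",
    ]
    return ";".join(parts) + ";"
```

For example, `build_remote_connection_string("myserver", "sales")` uses Windows Authentication, while passing `username` and `password` switches to SQL Server Authentication. The server, database, and login names here are placeholders.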
Set up image classification data files
Model Builder expects image classification data to be JPG or PNG files organized in folders that correspond to the categories of the classification.
To load images into Model Builder, provide the path to a single top-level directory:
This top-level directory contains one subfolder for each of the categories to predict.
Each subfolder contains the image files belonging to its category.
For example, suppose the top-level directory is flower_photos. It contains five subdirectories that correspond to the categories you want to predict: daisy, dandelion, roses, sunflowers, and tulips. Each of these subdirectories contains the images that belong to its category.
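To confirm that a directory follows this layout before you point Model Builder at it, you can list each category subfolder and the images it contains. The following Python sketch assumes the layout described above. The function name `collect_labeled_images` is hypothetical, and only the .jpg and .png extensions named in this article are counted.

```python
from pathlib import Path

# Image extensions named in this article.
IMAGE_EXTENSIONS = {".jpg", ".png"}


def collect_labeled_images(top_level_dir):
    """Map each category subfolder name to the sorted image file names in it."""
    root = Path(top_level_dir)
    dataset = {}
    for subfolder in sorted(path for path in root.iterdir() if path.is_dir()):
        # The subfolder name is the category; other file types are skipped.
        dataset[subfolder.name] = sorted(
            item.name
            for item in subfolder.iterdir()
            if item.is_file() and item.suffix.lower() in IMAGE_EXTENSIONS
        )
    return dataset
```

Calling `collect_labeled_images("flower_photos")` would return a dictionary with one key per category, such as `daisy` or `tulips`. An empty list for a category means that the subfolder contains no supported images.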
Set up object detection data files
Model Builder expects object detection image data to be in the JSON format that VoTT (Visual Object Tagging Tool) generates. The JSON file is located in the vott-json-export folder in the Target Location that is specified in the VoTT project settings.
The JSON file consists of the following information generated from VoTT:
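To inspect an export before you load it, you can read the JSON file and list the tagged regions for each image. The following Python sketch is a rough illustration. The field names (`assets`, `asset`, `regions`, `tags`, and `boundingBox`) are assumptions about the VoTT export layout and aren't taken from this article, so check them against your own export file. The function name and the sample data are hypothetical.

```python
import json


def summarize_vott_export(export):
    """Return (image name, tag, left, top, width, height) rows from a parsed export."""
    rows = []
    for entry in export.get("assets", {}).values():
        image_name = entry["asset"]["name"]
        for region in entry.get("regions", []):
            box = region["boundingBox"]
            # A region can carry several tags; emit one row per tag.
            for tag in region.get("tags", []):
                rows.append(
                    (image_name, tag, box["left"], box["top"], box["width"], box["height"])
                )
    return rows


# Hypothetical, heavily trimmed sample in the assumed layout.
sample = json.loads("""
{
  "tags": [{"name": "Stop-Sign", "color": "#5db300"}],
  "assets": {
    "asset-1": {
      "asset": {"name": "image1.jpg", "path": "file:C:/images/image1.jpg"},
      "regions": [
        {
          "tags": ["Stop-Sign"],
          "boundingBox": {"left": 10.0, "top": 20.0, "width": 30.0, "height": 40.0}
        }
      ]
    }
  }
}
""")

rows = summarize_vott_export(sample)
```

To use the function with a real export, load the file with `json.load` and pass the resulting dictionary to `summarize_vott_export`.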