Enter Data Manually
Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.
- See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
- Learn more about Azure Machine Learning.
ML Studio (classic) documentation is being retired and may not be updated in the future.
Enables entering and editing small datasets by typing values.
Category: Data Input and Output
Note
Applies to: Machine Learning Studio (classic) only
Similar drag-and-drop modules are available in Azure Machine Learning designer.
Module overview
This article describes how to use the Enter Data Manually module in Machine Learning Studio (classic) to create a small dataset by typing values. The dataset can have multiple columns.
This module can be helpful in scenarios such as these:
- Generating a small set of values for testing
- Creating a short list of labels
- Entering values for use in Apply Math Operation
- Specifying replacement values for use in Replace Discrete Values
- Typing a list of column names to insert in a dataset
How to use Enter Data Manually
Add the Enter Data Manually module to your experiment. You can find this module in the Data Input and Output category in Machine Learning Studio (classic).
For DataFormat, select one of the following options. These options determine how the data that you provide should be parsed. The requirements for each format differ greatly, so be sure to read the related topics.
- ARFF. The attribute-relation file format, used by Weka. For more information, see Convert to ARFF.
- CSV. Comma-separated values format. For more information, see Convert to CSV.
- SVMLight. A format used by Vowpal Wabbit and other machine learning frameworks. For more information, see Convert to SVMLight.
- TSV. Tab-separated values format. For more information, see Convert to TSV.
If you choose a format and do not provide data that meets the format specifications, a run-time error occurs.
Click inside the Data text box to start entering data. The following formats require special attention:
CSV: To create multiple columns, paste in comma-separated text, or type multiple columns using commas between fields.
If you select the HasHeader option, the first row of values is used as the column headings.
If you deselect this option, the column names Col1, Col2, and so forth are used. You can add or change column names later using Edit Metadata.
TSV: To create multiple columns, paste in tab-separated text, or type multiple columns using tabs between fields.
If you select the HasHeader option, the first row of values is used as the column headings.
If you deselect this option, the column names Col1, Col2, and so forth are used. You can add or change column names later using Edit Metadata.
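The HasHeader behavior for CSV and TSV input can be illustrated with a short Python sketch. It is not the module's implementation; the function name parse_delimited and its parameters are hypothetical and only mirror the rules described above, using the standard csv module.

```python
import csv
import io


def parse_delimited(text, delimiter=",", has_header=True):
    """Parse typed CSV or TSV text into (column_names, rows).

    Hypothetical helper: with has_header the first row supplies the
    column names; otherwise Col1, Col2, ... are generated.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if has_header:
        return rows[0], rows[1:]
    names = [f"Col{i}" for i in range(1, len(rows[0]) + 1)]
    return names, rows


# CSV text whose first row is a header.
csv_text = "Name,Score\nalice,10\nbob,7\n"
# TSV text without a header row.
tsv_text = "alice\t10\nbob\t7\n"

print(parse_delimited(csv_text, ",", has_header=True))
print(parse_delimited(tsv_text, "\t", has_header=False))
```

The only difference between the two formats in this sketch is the delimiter, which matches the way the CSV and TSV options are described.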
ARFF: Paste in an existing ARFF format file. If you are typing values directly, be sure to add the optional header and required attribute fields at the beginning of the data.
For example, the following header and attribute rows could be added to a simple list. The column heading would be SampleText.
    % Title: SampleText.ARFF
    % Source: Enter Data module
    @ATTRIBUTE SampleText STRING
    @DATA
    <type first data row here>
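If the list of values already exists elsewhere, the ARFF text can be generated rather than typed. The following Python sketch builds text in the same layout as the example above for a single STRING attribute. The function name to_arff is hypothetical, and the quoting rule (single quotes with backslash escapes) is an assumption based on common ARFF usage, so check the output against Convert to ARFF before relying on it.

```python
def to_arff(values, attribute="SampleText", title="SampleText.ARFF"):
    """Build ARFF text for one STRING attribute (illustrative sketch)."""

    def quote(value):
        # Assumption: string values are single-quoted, with backslashes
        # and single quotes escaped by a backslash.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"

    lines = [
        f"% Title: {title}",
        "% Source: Enter Data module",
        f"@ATTRIBUTE {attribute} STRING",
        "@DATA",
    ]
    lines.extend(quote(v) for v in values)
    # End with a newline, matching the advice to press ENTER after the final row.
    return "\n".join(lines) + "\n"


print(to_arff(["first sample row", "second sample row"]))
```

The resulting text can be pasted into the Data text box with ARFF selected as the data format.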
SVMLight: Type or paste in values using the SVMLight format.
For example, the following sample represents the first two lines of the Blood Donation dataset, in SVMLight format:
    # features are [Recency], [Frequency], [Monetary], [Time]
    1 1:2 2:50 3:12500 4:98
    1 1:0 2:13 3:3250 4:28
When you run the Enter Data Manually module, these lines are converted to a dataset of columns and index values as follows:
    Col1     Col2   Col3      Col4      Labels
    0.00016  0.004  0.999961  0.00784   1
    0        0.004  0.999955  0.008615  1
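The relationship between the SVMLight rows and the resulting columns can be checked with a short Python sketch. It parses each label and its index:value pairs, then scales each row to unit Euclidean length. The scaling step is an inference from the sample numbers, which it happens to reproduce; it is not documented behavior of the module, and the helper names are hypothetical.

```python
import math


def parse_svmlight_line(line, n_features):
    """Return (label, dense feature list) for one SVMLight row."""
    body = line.split("#", 1)[0].split()  # drop any trailing comment
    label = float(body[0])
    features = [0.0] * n_features
    for pair in body[1:]:
        index, value = pair.split(":")
        features[int(index) - 1] = float(value)  # SVMLight indices start at 1
    return label, features


def unit_length(values):
    """Scale a row so that its Euclidean norm is 1 (assumed step)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm if norm else 0.0 for v in values]


sample = """# features are [Recency], [Frequency], [Monetary], [Time]
1 1:2 2:50 3:12500 4:98
1 1:0 2:13 3:3250 4:28
"""

for line in sample.splitlines():
    if not line.strip() or line.startswith("#"):
        continue  # skip blank lines and comment lines
    label, features = parse_svmlight_line(line, n_features=4)
    print([round(v, 6) for v in unit_length(features)], int(label))
```

Rounded to six decimal places, the scaled values agree with the Col1 through Col4 values shown for the two sample rows.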
Press ENTER after each row, to start a new line.
Be sure to press ENTER after the final row.
If you press ENTER multiple times to add multiple empty trailing rows, the final empty row is trimmed, but other empty rows are treated as missing values.
If you create rows with missing values, you can always filter them out later.
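The handling of empty rows can be pictured with the following Python sketch. It is one reading of the rules above, not the module's code: the final empty row left by the last ENTER is trimmed, and every other empty row becomes a missing value, represented here by None. The function names are hypothetical.

```python
def split_rows(text):
    """Split typed text into rows, following the rules described above."""
    rows = text.split("\n")
    # The final empty row (left by the last ENTER) is trimmed.
    if rows and rows[-1] == "":
        rows.pop()
    # Any other empty row is kept and treated as a missing value.
    return [row if row != "" else None for row in rows]


def drop_missing(rows):
    """Filter out rows that were recorded as missing."""
    return [row for row in rows if row is not None]


typed = "red\ngreen\n\nblue\n\n\n"
rows = split_rows(typed)
print(rows)
print(drop_missing(rows))
```

In this example the blank line between green and blue and the two extra trailing blank lines become missing values, while the very last empty row is dropped.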
Right-click the module and select Run selected to parse the data and load it into your workspace as a dataset.
To view the dataset, click the output port and select Visualize.
Examples
For examples of how this module is used in machine learning, see the Azure AI Gallery:
- Download Data sample: Gets data from the UCI Machine Learning repository and then uses Enter Data Manually to create column names. Sample R code is also provided, which you can use to merge the entered rows with the dataset.
Technical notes
This section contains implementation details, tips, and answers to frequently asked questions.
Regardless of the saved format, data that you enter is implicitly converted to the dataset (Data Table) format for use in experiments. However, data is not persisted as a saved dataset unless you explicitly choose the Save as Dataset option.
If you do not save the data in Enter Data Manually as a dataset, it is removed from the workspace cache when you end the session. However, you can run the experiment again to make the data available.
If you combine the data from Enter Data Manually with another dataset, the combined dataset cannot have two columns with the same name. If there are duplicate column names, a numeric suffix is appended to the column from the right dataset to make the column names unique.
For example, assume that you have two instances of Enter Data Manually that contain the column TestData, and use the Add Columns module to merge them. The column from the left instance of Enter Data Manually would remain as TestData, and the column from the right instance of Enter Data Manually would be renamed TestData (2).
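The renaming rule can be sketched in Python as follows. The function name merge_column_names is hypothetical, and the handling of a third or later collision, shown here as (3), (4), and so on, is an assumption; the text above only confirms the (2) suffix for the column from the right dataset.

```python
def merge_column_names(left, right):
    """Combine two lists of column names, keeping every name unique."""
    merged = list(left)
    for name in right:
        candidate, suffix = name, 2
        # Append a numeric suffix until the name no longer collides.
        while candidate in merged:
            candidate = f"{name} ({suffix})"
            suffix += 1
        merged.append(candidate)
    return merged


print(merge_column_names(["TestData"], ["TestData"]))
```

Columns from the left dataset keep their names; only colliding names from the right dataset are changed.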