Hello,
We are sorry that have not heard from you. Azure ML Designer officially supports Group data into Bins with custom edge. I have highlighted the essential part below.
- Add the Group Data Into Bins module to your pipeline in the designer. You can find this module in the category Data Transformation.
- Connect the dataset that has numerical data to bin. Quantization can be applied only to columns that contain numeric data.
If the dataset contains non-numeric columns, use the Select Columns in Dataset module to select a subset of columns to work with.
- Specify the binning mode. The binning mode determines other parameters, so be sure to select the Binning mode option first. The following types of binning are supported:
Quantiles: The quantile method assigns values to bins based on percentile ranks. This method is also known as equal height binning.
Equal Width: With this option, you must specify the total number of bins. The values from the data column are placed in the bins such that each bin has the same interval between starting and ending values. As a result, some bins might have more values if data is clumped around a certain point.
Custom Edges: You can specify the values that begin each bin. The edge value is always the lower boundary of the bin.
For example, assume you want to group values into two bins. One will have values greater than 0, and one will have values less than or equal to 0. In this case, for bin edges, you enter 0 in Comma-separated list of bin edges. The output of the module will be 1 and 2, indicating the bin index for each row value. Note that the comma-separated value list must be in an ascending order, such as 1, 3, 5, 7.
Note
Entropy MDL mode is defined in Studio (classic) and there's no corresponding open source package which can be leveraged to support in Designer yet.
- If you're using the Quantiles and Equal Width binning modes, use the Number of bins option to specify how many bins, or quantiles, you want to create.
- For Columns to bin, use the column selector to choose the columns that have the values you want to bin. Columns must be a numeric data type.
The same binning rule is applied to all applicable columns that you choose. If you need to bin some columns by using a different method, use a separate instance of the Group Data into Bins module for each set of columns.
Warning
If you choose a column that's not an allowed type, a runtime error is generated. The module returns an error as soon as it finds any column of a disallowed type. If you get an error, review all selected columns. The error does not list all invalid columns.
- For Output mode, indicate how you want to output the quantized values:
Append: Creates a new column with the binned values, and appends that to the input table.
Inplace: Replaces the original values with the new values in the dataset.
ResultOnly: Returns just the result columns.
- If you select the Quantiles binning mode, use the Quantile normalization option to determine how values are normalized before sorting into quantiles. Note that normalizing values transforms the values but doesn't affect the final number of bins.
The following normalization types are supported:
Percent: Values are normalized within the range [0,100].
PQuantile: Values are normalized within the range [0,1].
QuantileIndex: Values are normalized within the range [1,number of bins].
- If you choose the Custom Edges option, enter a comma-separated list of numbers to use as bin edges in the Comma-separated list of bin edges text box.
The values mark the point that divides bins. For example, if you enter one bin edge value, two bins will be generated. If you enter two bin edge values, three bins will be generated.
The values must be sorted in the order that the bins are created, from lowest to highest.
- Select the Tag columns as categorical option to indicate that the quantized columns should be handled as categorical variables.
- Submit the pipeline.
Regards,
Yutong