Apply Filter

05/06/2019

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
Learn more about Azure Machine Learning.

ML Studio (classic) documentation is being retired and may not be updated in the future.

Applies a filter to specified columns of a dataset

Category: Data Transformation / Filter

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

Module overview

This article describes how to use the Apply Filter module in Machine Learning Studio (classic) to transform a column of values by applying a previously defined filter. Filters are used in digital signal processing to reduce noise or highlight a pattern. Thus, the values that you transform are always numeric, and typically represent some kind of audio or visual signal.

Tip

Are you looking for a different type of filter? Studio (classic) provides these modules for sampling data, getting a subset of data, removing bad values, or creating test and training sets: Split Data, Clean Missing Data, Partition and Sample, Apply SQL Transformation, Clip Values. If you need to filter data as you read it from a source, see Import Data. The options depend on the source type.

After determining which type of filter is best for your data source, you specify the parameters, and use Apply Filter to transform the dataset. Because the design of filters is separate from the process of applying a filter, filters are reusable. For example, if you frequently work with data used for forecasting, you might design several types of moving average filters to train and compare multiple models. You can also save the filter to apply to other experiments or to different datasets.
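As a rough illustration of why separating filter design from filter application makes filters reusable, the following Python sketch designs a moving average filter once and then applies it to two unrelated series. The names `make_moving_average` and `apply_filter` are hypothetical and are not part of Studio (classic). The sketch mirrors the idea only, not the module's implementation.

```python
def make_moving_average(length):
    """Design step: fix the filter parameters and return a reusable filter."""
    if length < 1:
        raise ValueError("length must be at least 1")

    def moving_average(values):
        # Each output is the mean of the current value and up to
        # length - 1 preceding values (shorter windows at the start).
        out = []
        for i in range(len(values)):
            window = values[max(0, i - length + 1): i + 1]
            out.append(sum(window) / len(window))
        return out

    return moving_average


def apply_filter(filt, values):
    """Application step: run a previously designed filter over one column."""
    return filt(values)


# Design once, then reuse the same filter on different data.
smooth3 = make_moving_average(3)
sales = apply_filter(smooth3, [10.0, 20.0, 30.0, 40.0])
temps = apply_filter(smooth3, [3.0, 6.0, 9.0])

print(sales)  # [10.0, 15.0, 20.0, 30.0]
print(temps)  # [3.0, 4.5, 6.0]
```

Because `smooth3` carries its own parameters, it can be kept and applied again later, which is loosely what saving a filter for other experiments achieves.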

How to configure Apply Filter

Add the Apply Filter module to your experiment. You can find this module under Data Transformation, in the Filter category.
To the right-hand input, connect a dataset that contains numeric values.
To the left-hand input, connect an existing filter. You can re-use a saved filter, or you can configure a filter by using one of the following filter modules: Threshold Filter, Moving Average Filter, Median Filter, IIR Filter, FIR Filter, User-Defined Filter.
In the Properties pane of Apply Filter, click Launch column selector and choose the columns to which the filter should be applied.
Run the experiment, or right-click Apply Filter and click Run selected.

Results

The output contains only the data in the selected columns, transformed by the specified filter.

If you want to see other columns in the dataset, you can use the Add Columns module to merge the original and filtered datasets.

Note

The values in the original column have not been deleted or overwritten, and are still available in the experiment for reference. However, the output of the filter is usually more useful for modeling.
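The relationship between the filtered output and the original dataset can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the dataset is modeled as a dictionary of columns, and `clip_at`, `apply_filter_to_columns`, `add_columns`, and the `_filtered` suffix are hypothetical names chosen for the example.

```python
def clip_at(limit):
    """A stand-in filter (hypothetical): cap every value at `limit`."""
    return lambda values: [min(v, limit) for v in values]


def apply_filter_to_columns(dataset, filt, columns):
    """Return a new dataset holding only the selected, transformed columns."""
    return {name: filt(dataset[name]) for name in columns}


def add_columns(left, right, suffix="_filtered"):
    """Merge two column dictionaries, renaming the columns taken from `right`."""
    merged = dict(left)
    for name, values in right.items():
        merged[name + suffix] = values
    return merged


original = {"id": [1, 2, 3], "signal": [0.5, 2.5, 1.5]}
filtered = apply_filter_to_columns(original, clip_at(2.0), ["signal"])
combined = add_columns(original, filtered)

print(filtered)  # {'signal': [0.5, 2.0, 1.5]}
print(combined)  # original columns plus 'signal_filtered'
```

The `original` dictionary is left untouched, which corresponds to the source values remaining available in the experiment.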

Examples

For examples of how filters are used in machine learning, see the Azure AI Gallery:

Filters: Demonstrates all filter types, using an engineered waveform dataset.

Technical notes

This section contains implementation details, tips, and answers to frequently asked questions.

The Apply Filter module binds the specified type of filter to the selected columns. If you need to apply different types of filters to different columns, you should use Select Columns in Dataset to isolate the columns and apply different filter types in separate workflows. For more information, see Select Columns in Dataset.
The filters do not pass through data columns that are not affected by the filter. That is, the output of Apply Filter contains only the transformed numeric values. However, you can use the Add Columns module to join transformed values with the source dataset.
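A hedged sketch of the separate-workflow pattern described above: each branch isolates its own columns, applies one filter type, and the branches are combined at the end. The helper names (`select_columns`, `apply_filter`, `moving_average`, `threshold`) and the sample columns are assumptions made for illustration and do not correspond to Studio (classic) APIs.

```python
def moving_average(length):
    """Hypothetical filter factory: trailing mean over up to `length` values."""
    def run(values):
        out = []
        for i in range(len(values)):
            window = values[max(0, i - length + 1): i + 1]
            out.append(sum(window) / len(window))
        return out
    return run


def threshold(limit):
    """Hypothetical filter factory: cap values at `limit`."""
    return lambda values: [min(v, limit) for v in values]


def select_columns(dataset, names):
    """Isolate the named columns, as Select Columns in Dataset would."""
    return {name: dataset[name] for name in names}


def apply_filter(dataset, filt):
    """Bind one filter to every column of the (already isolated) dataset."""
    return {name: filt(values) for name, values in dataset.items()}


dataset = {
    "price": [1.0, 3.0, 5.0],
    "volume": [10.0, 80.0, 30.0],
    "label": ["a", "b", "c"],
}

# Branch 1: smooth the price column.
smoothed = apply_filter(select_columns(dataset, ["price"]), moving_average(2))
# Branch 2: cap the volume column.
capped = apply_filter(select_columns(dataset, ["volume"]), threshold(50.0))

# Only the transformed columns survive; "label" is not passed through.
result = {**smoothed, **capped}
print(result)  # {'price': [1.0, 2.0, 4.0], 'volume': [10.0, 50.0, 30.0]}
```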

Filter periods

The filter period is determined in part by the filter type, as follows:

For finite impulse response (FIR), simple moving average, and triangular moving average filters, the filter period is finite.
For infinite impulse response (IIR), exponential moving average, and cumulative moving average filters, the filter period is infinite.
For threshold filters, the filter period is always 1.
For median filters, regardless of the filter period, NaNs and missing values in the input signal do not produce new NaNs in output.
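One way to picture a finite versus an infinite filter period is to feed each filter a single impulse and observe how long the output stays nonzero. The sketch below is an illustration under stated assumptions: it uses a 3-point simple moving average as the finite case and an exponential moving average with a hypothetical smoothing factor `alpha` as the infinite case. The function names are not Studio (classic) APIs.

```python
def simple_moving_average(values, length):
    """Finite period: each output uses only the last `length` inputs.

    Positions before the start of the signal are treated as 0.0.
    """
    out = []
    for i in range(len(values)):
        window = [values[j] if j >= 0 else 0.0
                  for j in range(i - length + 1, i + 1)]
        out.append(sum(window) / length)
    return out


def exponential_moving_average(values, alpha):
    """Infinite period: each output feeds back into the next one."""
    out = []
    state = 0.0
    for x in values:
        state = alpha * x + (1.0 - alpha) * state
        out.append(state)
    return out


# A single impulse followed by zeros.
impulse = [1.0] + [0.0] * 7

sma = simple_moving_average(impulse, 3)
ema = exponential_moving_average(impulse, 0.5)

print(sma)  # nonzero for the first 3 samples, then exactly 0.0
print(ema)  # halves at every step and stays nonzero over the whole signal
```

The moving average forgets the impulse after three samples, while the exponential moving average still carries a trace of it at the last sample of this signal.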

Missing values

This section describes the behavior when missing values are encountered, by filter type. In general, when a filter encounters a NaN or a missing value in the input dataset, the output dataset is contaminated with NaNs for some number of subsequent samples, depending on the filter period. This has the following consequences:

FIR, simple moving average, or triangular moving average filters have a finite period. As a result, any missing value will be followed by a number of NaNs equal to the filter order minus one.
IIR, exponential moving average, or cumulative moving average filters have an infinite period. As a result, after the first missing value is encountered, NaNs will continue to propagate indefinitely.
The period of a threshold filter is 1. As a result, missing values and NaNs do not propagate.
For median filters, NaNs and missing values encountered in the input dataset do not produce new NaNs in output, regardless of the filter period.
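The propagation rules above can be reproduced with a small Python sketch. It is a simplified model, not the module's code: the four filter functions are hypothetical stand-ins, the moving average uses a window length of 3, and the median filter is assumed to skip NaNs inside its window, which is one plausible way to obtain the behavior described.

```python
import math
from statistics import median

NAN = float("nan")


def simple_moving_average(values, length):
    """Finite period: a NaN taints every window that contains it."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - length + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def exponential_moving_average(values, alpha):
    """Infinite period: once the state is NaN, it stays NaN."""
    out = []
    state = values[0]
    for x in values:
        state = alpha * x + (1.0 - alpha) * state
        out.append(state)
    return out


def threshold(values, limit):
    """Period 1: a NaN affects only its own position."""
    return [v if math.isnan(v) else min(v, limit) for v in values]


def nan_skipping_median(values, length):
    """Assumed behavior: NaNs are dropped from the window before the median."""
    out = []
    for i in range(len(values)):
        window = [v for v in values[max(0, i - length + 1): i + 1]
                  if not math.isnan(v)]
        out.append(median(window) if window else NAN)
    return out


def nan_positions(values):
    """Indexes at which the output is NaN."""
    return [i for i, v in enumerate(values) if math.isnan(v)]


signal = [1.0, 2.0, 3.0, NAN, 5.0, 6.0, 7.0, 8.0]

print(nan_positions(simple_moving_average(signal, 3)))         # [3, 4, 5]
print(nan_positions(exponential_moving_average(signal, 0.5)))  # [3, 4, 5, 6, 7]
print(nan_positions(threshold(signal, 6.0)))                   # [3]
print(nan_positions(nan_skipping_median(signal, 3)))           # []
```

With a window of 3, the moving average emits a NaN at the missing sample and at the two samples that follow it, then recovers. The exponential moving average never recovers.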

Expected inputs

Name	Type	Description
Filter	IFilter interface	Filter implementation
Dataset	Data Table	Input dataset

For a list of errors specific to Studio (classic) modules, see Machine Learning Error codes.

For a list of API exceptions, see Machine Learning REST API Error Codes.

Module parameters

Name	Range	Type	Default	Description
Column set	Any	ColumnSelection	NumericAll	Select the columns to filter

Output

Name	Type	Description
Results dataset	Data Table	Output dataset