Feature engineering

Feature engineering is a machine learning technique that derives new variables (features) from the available data.

Embeddings

ML models can learn how to represent data. For example, a deep learning model can learn embeddings during training. An embedding is a lower-dimensional representation of the data passing through the model.
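
As a rough illustration of what "lower-dimensional representation" means, the sketch below maps categorical tokens to dense vectors through a lookup table. The vocabulary, the dimension, and the names `embedding_matrix` and `embed` are all hypothetical, and the matrix is randomly initialised rather than learned. In a real model it would be a trainable parameter updated during training.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical vocabulary of categorical values.
vocab = ["red", "green", "blue", "cat", "dog"]
token_to_id = {token: i for i, token in enumerate(vocab)}

# Each token gets a dense vector with fewer dimensions than the vocabulary size.
embedding_dim = 3

# Assumption: random values stand in for weights a model would learn in training.
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))


def embed(tokens):
    """Look up the embedding vector for each token."""
    ids = [token_to_id[token] for token in tokens]
    return embedding_matrix[ids]


# A one-hot vector has one dimension per vocabulary entry (5 here) ...
one_hot = np.eye(len(vocab))[token_to_id["cat"]]
# ... while the embedding has only embedding_dim dimensions (3 here).
dense = embed(["cat"])[0]

# The lookup is equivalent to multiplying the one-hot vector by the matrix.
assert np.allclose(one_hot @ embedding_matrix, dense)
```

After training, tokens that behave similarly tend to end up with similar vectors, which is what makes embeddings useful as inputs to other models.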

For more information on embeddings, refer to Understanding Embeddings in Azure OpenAI Service.

Feature extraction

A deep learning model can automatically learn an embedding, or automatically engineer features, from input data. Once trained, the model can be applied to any input data in the expected format to obtain the associated embedding or data representation. This process is known as feature extraction. It is useful for enriching or augmenting existing data, or for clustering the data to group similar records together.
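
The idea can be sketched with a very small network written in NumPy. Everything here is an assumption for illustration: the layer sizes, the names `hidden_features` and `predict`, and the random weights, which stand in for weights that would normally come from training. The point is only that the hidden-layer activations, rather than the final prediction, are read out as features.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for a trained network: 8 inputs -> 4 hidden units -> 2 outputs.
# Assumption: the weights are random here; in practice they come from training.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)


def hidden_features(x):
    """Return the hidden-layer activations, used as the extracted features."""
    return np.maximum(0.0, x @ W1 + b1)  # ReLU activation


def predict(x):
    """Full forward pass: hidden features followed by the output layer."""
    return hidden_features(x) @ W2 + b2


# Five hypothetical records in the format the network expects (8 values each).
records = rng.normal(size=(5, 8))

# Feature extraction: stop at the hidden layer instead of the prediction.
features = hidden_features(records)

# Enrich the existing data by appending the extracted features as new columns.
augmented = np.hstack([records, features])
```

The `features` array could equally be passed to a clustering algorithm to group similar records.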

Here are some tools that offer feature extraction capabilities.

Tool | Description
Keras feature extraction notebook | Shows how an existing model can be used to extract image features for clustering. For more information, see the Data Discovery playbook.
InceptionV3 model pre-trained on ImageNet data | Used to extract data representations (features) for images the model has never seen before, which can then be used to cluster the data. The output features are taken from the second-to-last layer of the model; the last layer would typically produce the ImageNet class prediction.
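
A minimal sketch of the InceptionV3 approach using the Keras applications API follows. It is not the notebook referenced above. The helper names `build_feature_extractor` and `extract_features` are hypothetical, and the sketch assumes TensorFlow is installed and that the ImageNet weights can be downloaded on first use. Setting `include_top=False` with `pooling="avg"` drops the classification layer and returns the globally average-pooled activations, which should correspond to the second-to-last layer of the full model.

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input


def build_feature_extractor(weights="imagenet"):
    """Build InceptionV3 without its final ImageNet classification layer.

    With pooling="avg" the model outputs one 2048-dimensional vector per image.
    """
    return InceptionV3(weights=weights, include_top=False, pooling="avg")


def extract_features(model, images):
    """Extract features for a batch of images.

    images: array of shape (n, 299, 299, 3) with pixel values in [0, 255].
    Returns an array of shape (n, 2048).
    """
    batch = preprocess_input(np.asarray(images, dtype="float32"))
    return model.predict(batch, verbose=0)
```

A typical use would be `features = extract_features(build_feature_extractor(), images)`, after which the rows of `features` can be passed to a clustering algorithm.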