How is Data Drift Magnitude and Data Drift Contribution of each feature calculated in Azure Machine Learning (Azure ML)?

David Z
2022-03-08T16:53:33.733+00:00

The documentation at https://learn.microsoft.com/en-us/azure/machine-learning/how-to-monitor-datasets?tabs=python says:

Data drift magnitude:
A percentage of drift between the baseline and target dataset over time. Ranging from 0 to 100, 0 indicates identical datasets and 100 indicates the Azure Machine Learning data drift model can completely tell the two datasets apart. Noise in the precise percentage measured is expected due to machine learning techniques being used to generate this magnitude.
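The definition above describes magnitude in terms of how well a model can tell the baseline and target apart. Below is a minimal sketch of that idea, assuming a "domain classifier" approach, which is a common drift-detection technique. The documentation quoted here does not say which model Azure ML uses or how its score is scaled. The k-nearest-neighbour classifier, the `drift_magnitude` helper, and the mapping from accuracy to a 0 to 100 score (coin-flip accuracy 0.5 maps to 0, perfect separation maps to 100) are all hypothetical choices made for illustration.

```python
import math
import random


def knn_loo_accuracy(rows, labels, k=5):
    """Leave-one-out accuracy of a k-NN 'domain classifier' that guesses
    whether each row came from the baseline (0) or the target (1)."""
    correct = 0
    for i, x in enumerate(rows):
        # Distances from row i to every other row; row i itself is left out.
        neighbours = sorted(
            (math.dist(x, y), labels[j]) for j, y in enumerate(rows) if j != i
        )
        votes_for_target = sum(label for _, label in neighbours[:k])
        predicted = 1 if 2 * votes_for_target > k else 0
        correct += (predicted == labels[i])
    return correct / len(rows)


def drift_magnitude(baseline, target, k=5):
    """Hypothetical 0-100 score: 0 when the classifier does no better than a
    coin flip, 100 when it separates the two datasets perfectly.
    Assumes baseline and target have the same number of rows."""
    rows = baseline + target
    labels = [0] * len(baseline) + [1] * len(target)
    accuracy = knn_loo_accuracy(rows, labels, k)
    return max(0.0, 2.0 * accuracy - 1.0) * 100.0


rng = random.Random(0)
baseline = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
same_distribution = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
far_shifted = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)]

mag_same = drift_magnitude(baseline, same_distribution)  # small; noise may lift it above 0
mag_shifted = drift_magnitude(baseline, far_shifted)     # 100.0
print(mag_same, mag_shifted)
```

Because the classifier is trained and scored on finite samples, two datasets drawn from the same distribution can still get a small nonzero score. This matches the documentation's remark that some noise in the measured percentage is expected.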

Top drifting features:
Shows the features from the dataset that have drifted the most and are therefore contributing the most to the Drift Magnitude metric. Due to covariate shift, the underlying distribution of a feature does not necessarily need to change to have relatively high feature importance.
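One way to turn "contributing the most to the Drift Magnitude metric" into numbers is to measure how much the domain classifier's accuracy drops when each feature is removed. This is drop-column importance. The sketch below is a hypothetical illustration and not Azure ML's documented method. The k-NN classifier, the normalisation to percentages that sum to 100, and the feature names `shifted`, `noise_1`, `noise_2` are assumptions made up for the example.

```python
import math
import random


def knn_loo_accuracy(rows, labels, k=5):
    """Leave-one-out accuracy of a k-NN baseline-vs-target classifier."""
    correct = 0
    for i, x in enumerate(rows):
        neighbours = sorted(
            (math.dist(x, y), labels[j]) for j, y in enumerate(rows) if j != i
        )
        predicted = 1 if 2 * sum(lab for _, lab in neighbours[:k]) > k else 0
        correct += (predicted == labels[i])
    return correct / len(rows)


def feature_contributions(baseline, target, names, k=5):
    """Hypothetical drop-column importance: classifier accuracy lost when each
    feature is removed, rescaled to percentages that sum to 100."""
    rows = baseline + target
    labels = [0] * len(baseline) + [1] * len(target)
    full_accuracy = knn_loo_accuracy(rows, labels, k)
    losses = {}
    for j, name in enumerate(names):
        # Rebuild every row without column j and re-score the classifier.
        reduced = [tuple(v for c, v in enumerate(r) if c != j) for r in rows]
        losses[name] = max(0.0, full_accuracy - knn_loo_accuracy(reduced, labels, k))
    total = sum(losses.values())
    if total == 0.0:
        return {name: 0.0 for name in names}
    return {name: 100.0 * loss / total for name, loss in losses.items()}


rng = random.Random(0)
names = ["shifted", "noise_1", "noise_2"]  # made-up feature names
baseline = [(rng.gauss(40, 5), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
# Only the first column changes between baseline and target.
target = [(rng.gauss(60, 5), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]

contributions = feature_contributions(baseline, target, names)
print(contributions)  # "shifted" should dominate
```

Permutation importance, which shuffles a column instead of dropping it, is another common choice. Either way, the score reflects what the classifier relies on, not how far each feature's own distribution moved.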

My questions are:

  1. How is data drift magnitude calculated?
  2. How is the data drift contribution of each feature calculated?
  3. In the documentation, there are cases where the Wasserstein distance is low, yet the contribution of the feature is significant. Could you please clarify why that is the case?
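A small synthetic example shows how question 3 can happen under the classifier-based view sketched above, which is an assumption about Azure ML's internals. Suppose the relationship between two features flips while each feature's marginal distribution stays the same. Then the per-feature Wasserstein distance stays near zero, but a classifier that sees both features together can separate the datasets easily. Removing either feature would collapse that accuracy, so both would get a large contribution. The data, `wasserstein_1d`, and `knn_loo_accuracy` are all hypothetical.

```python
import math
import random


def wasserstein_1d(a, b):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples:
    the mean absolute gap between their sorted values."""
    if len(a) != len(b):
        raise ValueError("this simple version assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def knn_loo_accuracy(rows, labels, k=5):
    """Leave-one-out accuracy of a k-NN baseline-vs-target classifier."""
    correct = 0
    for i, x in enumerate(rows):
        neighbours = sorted(
            (math.dist(x, y), labels[j]) for j, y in enumerate(rows) if j != i
        )
        predicted = 1 if 2 * sum(lab for _, lab in neighbours[:k]) > k else 0
        correct += (predicted == labels[i])
    return correct / len(rows)


rng = random.Random(1)
n = 300
x1_base = [rng.gauss(0, 1) for _ in range(n)]
x1_targ = [rng.gauss(0, 1) for _ in range(n)]
# Baseline: x2 follows x1. Target: x2 follows -x1. Because N(0, 1) is
# symmetric, x2 has the same marginal distribution in both datasets.
x2_base = [v + rng.gauss(0, 0.1) for v in x1_base]
x2_targ = [-v + rng.gauss(0, 0.1) for v in x1_targ]

w_x1 = wasserstein_1d(x1_base, x1_targ)  # small: marginals match
w_x2 = wasserstein_1d(x2_base, x2_targ)  # small: marginals match

labels = [0] * n + [1] * n
joint = list(zip(x1_base, x2_base)) + list(zip(x1_targ, x2_targ))
acc_joint = knn_loo_accuracy(joint, labels)                        # high
acc_x1_only = knn_loo_accuracy([(r[0],) for r in joint], labels)   # near chance
acc_x2_only = knn_loo_accuracy([(r[1],) for r in joint], labels)   # near chance
print(w_x1, w_x2, acc_joint, acc_x1_only, acc_x2_only)
```

A per-feature distance such as Wasserstein only compares marginals, so it cannot see a change that lives in the joint distribution. This is one concrete reading of the documentation's note that a feature's own distribution "does not necessarily need to change" for it to rank high.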

Thank you in advance!

Azure Machine Learning

1 answer

  1. Ramr-msft
    2022-03-31T08:51:43.177+00:00

    @David Z Thanks. For internal product details, please share details of your experiment and the issue from the ml.azure.com portal so that a service engineer can look up the issue from the back end. This option is available in the top right-hand corner of the portal by clicking the smiley face. Please select the option "Microsoft can email you about the feedback" and attach a screenshot so our service team can look into it and advise through email.
