In this article, you learn how to get explanations for automated machine learning (automated ML) models in Azure Machine Learning using the Python SDK. Automated ML helps you understand feature importance of the models that are generated.
All SDK versions after 1.0.85 set model_explainability=True by default. In SDK version 1.0.85 and earlier, users need to set model_explainability=True in the AutoMLConfig object to use model interpretability.
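For example, with SDK 1.0.85 or earlier, you can set the flag when you define the run configuration. The following is a minimal sketch; train_data and the label column name are placeholders for your own dataset and settings.
from azureml.train.automl import AutoMLConfig

# Placeholder values: substitute your own dataset and label column
automl_config = AutoMLConfig(task='classification',
                             training_data=train_data,
                             label_column_name='label',
                             model_explainability=True)  # request explanations for the best model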
In this article, you learn how to:
Perform interpretability during training for the best model or any model.
Enable visualizations to help you see patterns in data and explanations.
Implement interpretability during inference or scoring.
Prerequisites
Interpretability features. Run pip install azureml-interpret to get the necessary package.
This feature is currently in public preview. This preview version is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities.
When you compute model explanations and visualize them, you're not limited to an existing model explanation for an AutoML model. You can also get an explanation for your model with different test data. The steps in this section show you how to compute and visualize engineered feature importance based on your test data.
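The explainer setup object used throughout this article comes from the automl_setup_model_explanations helper, the same helper the scoring script imports later. The following is a minimal sketch; fitted_model, X_train, X_test, and y_train are assumed to come from your AutoML run and your own train/test split.
from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations

# Set up the explanations object from your fitted model and data split
automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train,
                                                             X_test=X_test, y=y_train,
                                                             task='classification')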
Initialize the Mimic Explainer for feature importance
To generate an explanation for automated ML models, use the MimicWrapper class. You can initialize the MimicWrapper with these parameters:
The explainer setup object
Your workspace
A surrogate model to explain the fitted_model automated ML model
The MimicWrapper also takes the automl_run object where the engineered explanations will be uploaded.
from azureml.interpret import MimicWrapper

# Initialize the Mimic Explainer
explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator,
                         explainable_model=automl_explainer_setup_obj.surrogate_model,
                         init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,
                         features=automl_explainer_setup_obj.engineered_feature_names,
                         feature_maps=[automl_explainer_setup_obj.feature_map],
                         classes=automl_explainer_setup_obj.classes,
                         explainer_kwargs=automl_explainer_setup_obj.surrogate_model_params)
Use Mimic Explainer for computing and visualizing engineered feature importance
You can call the explain() method in MimicWrapper with the transformed test samples to get the feature importance for the generated engineered features. You can also sign in to Azure Machine Learning studio to view the explanations dashboard visualization of the feature importance values of the engineered features generated by automated ML featurizers.
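A minimal sketch of the call, assuming the explainer and setup object from the previous steps, might look like this:
# Compute engineered feature importances from the transformed test data
engineered_explanations = explainer.explain(['local', 'global'],
                                            eval_dataset=automl_explainer_setup_obj.X_test_transform)
print(engineered_explanations.get_feature_importance_dict())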
For models trained with automated ML, you can get the best model using the get_output() method and compute explanations locally. You can visualize the explanation results with ExplanationDashboard from the raiwidgets package.
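For example, the following sketch retrieves the best run and model and then renders the dashboard; the exact ExplanationDashboard parameters can vary by raiwidgets version.
from raiwidgets import ExplanationDashboard

# Get the best run and the corresponding fitted model from the AutoML run
best_run, fitted_model = automl_run.get_output()

# Render the dashboard for the engineered explanations computed above
ExplanationDashboard(engineered_explanations, automl_explainer_setup_obj.automl_estimator,
                     datasetX=automl_explainer_setup_obj.X_test_transform)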
Use Mimic Explainer for computing and visualizing raw feature importance
You can call the explain() method in MimicWrapper with the transformed test samples to get the feature importance for the raw features. In Azure Machine Learning studio, you can view the dashboard visualization of the feature importance values of the raw features.
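A sketch of the raw-feature call, assuming the same explainer and setup object as before, might look like the following:
# Compute raw feature importances by passing get_raw=True and the raw feature metadata
raw_explanations = explainer.explain(['local', 'global'], get_raw=True,
                                     raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                     eval_dataset=automl_explainer_setup_obj.X_test_transform,
                                     raw_eval_dataset=automl_explainer_setup_obj.X_test_raw)
print(raw_explanations.get_feature_importance_dict())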
Interpretability during inference
In this section, you learn how to operationalize an automated ML model with the explainer that was used to compute the explanations in the previous section.
Register the model and the scoring explainer
Use the TreeScoringExplainer to create the scoring explainer that computes the engineered feature importance values at inference time. Initialize the scoring explainer with the feature_map that was computed previously.
Save the scoring explainer, and then register the model and the scoring explainer with the Model Management Service. Run the following code:
from azureml.interpret.scoring.scoring_explainer import TreeScoringExplainer, save

# Initialize the ScoringExplainer
scoring_explainer = TreeScoringExplainer(explainer.explainer, feature_maps=[automl_explainer_setup_obj.feature_map])

# Pickle scoring explainer locally
save(scoring_explainer, exist_ok=True)

# Register trained automl model present in the 'outputs' folder in the artifacts
original_model = automl_run.register_model(model_name='automl_model',
                                           model_path='outputs/model.pkl')

# Register scoring explainer
automl_run.upload_file('scoring_explainer.pkl', 'scoring_explainer.pkl')
scoring_explainer_model = automl_run.register_model(model_name='scoring_explainer', model_path='scoring_explainer.pkl')
Create the conda dependencies for setting up the service
Next, create the necessary environment dependencies in the container for the deployed model. Note that azureml-defaults with version >= 1.0.45 must be listed as a pip dependency, because it contains the functionality needed to host the model as a web service.
from azureml.core.conda_dependencies import CondaDependencies

azureml_pip_packages = [
    'azureml-interpret', 'azureml-train-automl', 'azureml-defaults'
]

myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas', 'numpy', 'py-xgboost<=0.80'],
                                 pip_packages=azureml_pip_packages,
                                 pin_sdk_version=True)

with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())

with open("myenv.yml", "r") as f:
    print(f.read())
Create the scoring script
Write a script that loads your model and produces predictions and explanations based on a new batch of data.
%%writefile score.py
import joblib
import pandas as pd
from azureml.core.model import Model
from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations


def init():
    global automl_model
    global scoring_explainer

    # Retrieve the path to the model file using the model name
    # Assume original model is named automl_model
    automl_model_path = Model.get_model_path('automl_model')
    scoring_explainer_path = Model.get_model_path('scoring_explainer')

    automl_model = joblib.load(automl_model_path)
    scoring_explainer = joblib.load(scoring_explainer_path)


def run(raw_data):
    data = pd.read_json(raw_data, orient='records')

    # Make prediction
    predictions = automl_model.predict(data)

    # Setup for inferencing explanations
    automl_explainer_setup_obj = automl_setup_model_explanations(automl_model,
                                                                 X_test=data, task='classification')

    # Retrieve model explanations for engineered explanations
    engineered_local_importance_values = scoring_explainer.explain(automl_explainer_setup_obj.X_test_transform)

    # Retrieve model explanations for raw explanations
    raw_local_importance_values = scoring_explainer.explain(automl_explainer_setup_obj.X_test_transform, get_raw=True)

    # You can return any data type as long as it is JSON-serializable
    return {'predictions': predictions.tolist(),
            'engineered_local_importance_values': engineered_local_importance_values,
            'raw_local_importance_values': raw_local_importance_values}
Deploy the service
Deploy the service using the conda file and the scoring file from the previous steps.
from azureml.core.webservice import AciWebservice
from azureml.core.model import Model, InferenceConfig
from azureml.core.environment import Environment

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               tags={"data": "Bank Marketing",
                                                     "method": "local_explanation"},
                                               description='Get local explanations for Bank marketing test data')

myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
# Use the scoring script written in the previous step
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

# Use configs and models generated above
service = Model.deploy(ws,
                       'model-scoring',
                       [scoring_explainer_model, original_model],
                       inference_config,
                       aciconfig)
service.wait_for_deployment(show_output=True)
Inference with test data
Run inference with some test data to see the predicted value from the AutoML model (currently supported only in the Azure Machine Learning SDK), and view the feature importance values that contribute to the predicted value.
if service.state == 'Healthy':
    # Serialize the first row of the test data into json
    X_test_json = X_test[:1].to_json(orient='records')
    print(X_test_json)

    # Call the service to get the predictions and the engineered explanations
    output = service.run(X_test_json)

    # Print the predicted value
    print(output['predictions'])

    # Print the engineered feature importances for the predicted value
    print(output['engineered_local_importance_values'])

    # Print the raw feature importances for the predicted value
    print('raw_local_importance_values:\n{}\n'.format(output['raw_local_importance_values']))
Visualize to discover patterns in data and explanations at training time
You can visualize the feature importance chart in your workspace in Azure Machine Learning studio. After your AutoML run is complete, select View model details to view a specific run. Select the Explanations tab to see the visualizations in the explanation dashboard.
For more information on the explanation dashboard visualizations and specific plots, see the how-to article on model interpretability.
Next steps
For more information about how you can enable model explanations and feature importance in areas other than automated ML, see more techniques for model interpretability.
Manage data ingestion and preparation, model training and deployment, and machine learning solution monitoring with Python, Azure Machine Learning, and MLflow.
Learn how to get explanations for how your machine learning model determines feature importance and makes predictions when using the Azure Machine Learning SDK.