Enabling Model Data Collector for ACI through az ml deploy

Christian Hansen 21 Reputation points
2022-03-31T14:09:39.63+00:00

Hello,

I want to create a release pipeline in DevOps that deploys a model to ACI. I have a PowerShell script that deploys it using the az ml package. However, even though I set the model data collector flag in the deploy command, the endpoint does not collect the inputs. I have tried deploying the same model with the exact same score.py programmatically through azureml.core.webservice, and data collection worked just fine, so the problem is not in the scoring script.

I have deployed my model using the following command in a PowerShell script (paraphrasing some of the arguments):

az ml model deploy -n $name --model $model -g $resource-group -w $workspace --es $entry_script_path --cf $conda_file_path --dc $deployment_config_path --md True --overwrite -v

Notice that I have set the --md argument to True, which is the model data collector flag.

In my score.py, I have imported ModelDataCollector, initialized it, and call collect. I have something like this:

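import numpy as np
import pandas as pd
from azureml.monitoring import ModelDataCollector
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType
from inference_schema.schema_decorators import input_schema, output_schema

def init():
    global model, scaler, input_name, label_name, inputs_dc, prediction_dc

    # variables to monitor model input and output data
    inputs_dc = ModelDataCollector("model", designation="inputs")

input_sample = pd.DataFrame(sample)  # sample input data, elided here
@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(np.array([0])))
def run(data):
    try:
        inputs_dc.collect(data)

        # model inference
        result = model.predict(data)
        return {"result": result.tolist()}
    except Exception as e:
        # return the error message as a string so the response stays JSON-serializable
        return {"result": str(e)}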

When I deploy my model using the PowerShell script, everything goes fine: the endpoint is callable, healthy, and returns correct results. But the data does not get saved in the storage account or in Application Insights (which I have enabled, and can see is enabled on the endpoint as well).

In my deployment config I see the following:

Data collection is not enabled. Set environment variable ML_MODEL_DC_STORAGE_ENABLED to 'true' to enable.

How do I go about that? I have enabled data collection in my deployment command, so why does it not work, and how can I set the environment variable?
I tried adding it to my conda_env.yml, which is passed to the deployment command as "--cf $conda_file_path", like so:

name: my_env
dependencies:
  - python=3.6.2
  - pip:
    - numpy
    - onnxruntime
    - joblib
    - azureml-core~=1.10.0
    - azureml-defaults~=1.10.0
    - scikit-learn==0.22.2.post1
    - inference-schema
    - inference-schema[numpy-support]
    - azureml-monitoring
channels:
  - anaconda
  - conda-forge
variables:
  ML_MODEL_DC_STORAGE_ENABLED = true

But that just produces another error. How do I solve this problem?
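
As a diagnostic aid for the message quoted above, the variable could be read from inside score.py and printed at startup, so the container logs show what the scoring process actually sees. This is only a minimal sketch: `data_collection_enabled` is a hypothetical helper, and the case-insensitive comparison with 'true' is an assumption, not confirmed behaviour of azureml-monitoring.

```python
import os


def data_collection_enabled(environ=None):
    """Return True if ML_MODEL_DC_STORAGE_ENABLED is set to 'true'.

    Hypothetical helper. The case-insensitive comparison is an assumption,
    not confirmed behaviour of the azureml-monitoring package.
    """
    if environ is None:
        environ = os.environ
    value = environ.get("ML_MODEL_DC_STORAGE_ENABLED", "")
    return value.strip().lower() == "true"


def init():
    # Print the flag once at startup so it appears in the container logs.
    print("ML_MODEL_DC_STORAGE_ENABLED is 'true':", data_collection_enabled())
```

If the printed value is False on the az ml deployment but True on the azureml.core.webservice deployment, that would point at the deployment path rather than at score.py.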

Azure Container Instances
Azure Machine Learning