You're encountering the error:

ValueError: Invalid input data type to parse. Expected: <class 'dict'> but got <class 'azureml_common.parallel_run.mini_batch.MiniBatch'>

This issue typically arises in Azure Machine Learning batch scoring jobs when the scoring script expects a dictionary (`dict`) but instead receives a `MiniBatch` object.
Possible reasons
The auto-generated scoring script provided by the Azure ML wizard is designed to work with dictionary inputs. However, when using `ParallelRunStep` or `ParallelRunConfig`, Azure ML passes a `MiniBatch` object to the `run()` method. If your script doesn't handle this object correctly, it throws the error you're seeing.
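For context, here is a minimal sketch of the `run()` contract that `ParallelRunStep` uses with a file-based input (illustrative only; the names are not from your job). Logging the type of `mini_batch` is a quick way to confirm that your script is receiving a `MiniBatch` rather than a `dict`:

```python
import logging

def run(mini_batch):
    # For file-based inputs, mini_batch behaves like a list of file paths,
    # one per file in the current batch -- it is never a plain dict.
    logging.info("run() received %s", type(mini_batch))
    results = []
    for file_path in mini_batch:
        results.append(str(file_path))
    # run() is expected to return one result per input element.
    return results
```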
This mismatch can occur if:
- The scoring script was modified or regenerated incorrectly.
- Azure ML SDK or environment versions changed (e.g., MLflow 2.0 introduced stricter schema validation).
- The job configuration was altered to use parallel processing without updating the script accordingly.
Suggestions
- Recreate the batch endpoint and deployment so that Azure ML automatically generates the correct scoring script and environment (a Python SDK sketch is included at the end of this answer).
- Modify the scoring script manually to convert each mini-batch item into the dictionary of features your model expects (more tedious, and requires knowing the input features); for example:
```python
import json

def run(mini_batch):
    results = []
    for file_path in mini_batch:
        # Load and process the file
        with open(file_path, 'r') as f:
            data = f.read()
        # Convert to dict if needed
        result = process_data(data)  # Your custom logic
        results.append(result)
    return results

def process_data(data):
    try:
        # Parse the input string as JSON
        input_dict = json.loads(data)
        # Extract features expected by your model
        features = [
            input_dict.get("feature1", 0),
            input_dict.get("feature2", 0),
            input_dict.get("feature3", 0)
        ]
        # Predict using the global model loaded in init()
        prediction = model.predict([features])
        return prediction[0]
    except Exception as e:
        return f"Error: {str(e)}"
```
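The snippet above assumes a global `model` that was loaded in `init()`. A minimal sketch of that part, assuming a scikit-learn-style model serialized as `model.pkl` inside the registered model folder (the file name and loader are assumptions, adjust them to your artifact):

```python
import os
import joblib

model = None

def init():
    # Runs once per worker before any call to run(). AZUREML_MODEL_DIR points
    # to the folder where the registered model is mounted.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")  # assumed file name
    model = joblib.load(model_path)
```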
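For the first option, if your model is registered in MLflow format you can recreate the endpoint and deployment without supplying any scoring script or environment, and Azure ML will generate both for you. A rough sketch using the Azure ML Python SDK v2 (`azure-ai-ml`); everything in angle brackets is a placeholder, and parameter names may vary slightly across SDK versions:

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import BatchEndpoint, BatchDeployment

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Create (or recreate) the batch endpoint.
endpoint = BatchEndpoint(name="<batch-endpoint-name>")
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

# For an MLflow-format model, no scoring script or environment is passed here;
# Azure ML generates both automatically.
deployment = BatchDeployment(
    name="default",
    endpoint_name="<batch-endpoint-name>",
    model="azureml:<registered-model-name>:<version>",
    compute="<compute-cluster-name>",
    instance_count=1,
    mini_batch_size=10,
)
ml_client.batch_deployments.begin_create_or_update(deployment).result()
```

After the deployment is created, set it as the endpoint's default deployment (in the studio or via the SDK) before invoking the endpoint.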
Hope it helps.
Thank you