Why do I always get this error when I submit a batch scoring job in Azure ML? ValueError: Invalid input data type to parse. Expected: <class 'dict'> but got <class 'azureml_common.parallel_run.mini_batch.MiniBatch'>

Adam Goldammer 25 Reputation points
2025-06-05T14:26:42.6133333+00:00

I keep getting this error when I try running a batch scoring job in Azure ML:

ValueError: Invalid input data type to parse. Expected: <class 'dict'> but got <class 'azureml_common.parallel_run.mini_batch.MiniBatch'>

It's strange because I had successfully run several jobs previously and then all of a sudden I started to encounter this. Please note that I am using the auto-generated scoring script and environment provided via the Azure ML wizard.

I am self-taught on a lot of this, so I am leaning heavily on the Azure ML UI. I believe I could work around it by using a custom scoring script and environment, but so far those attempts have failed.

Azure Machine Learning

2 answers

  1. Manas Mohanty 6,115 Reputation points Microsoft External Staff Moderator
    2025-06-05T16:40:12.6066667+00:00

    Hi Adam Goldammer

    You're encountering the error:

    ValueError: Invalid input data type to parse. Expected: <class 'dict'> but got <class 'azureml_common.parallel_run.mini_batch.MiniBatch'>
    

    This issue typically arises in Azure Machine Learning batch scoring jobs when the scoring script expects a dictionary (dict) but instead receives a MiniBatch object.

    Possible reasons

    The auto-generated scoring script provided by the Azure ML wizard is designed to work with dictionary inputs. However, when using the ParallelRunStep or ParallelRunConfig, Azure ML passes a MiniBatch object to the run() method. If your script doesn't handle this object correctly, it throws the error you're seeing.

    This mismatch can occur if:

    • The scoring script was modified or regenerated incorrectly.
    • Azure ML SDK or environment versions changed (e.g., MLflow 2.0 introduced stricter schema validation).
    • The job configuration was altered to use parallel processing without updating the script accordingly.
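
    Before changing anything, it can help to confirm what run() is actually being handed. The sketch below is purely diagnostic and is not the auto-generated script; the assumption that the mini-batch arrives either as a pandas DataFrame or as a list of file paths covers the two common shapes, but your job may differ:

         import pandas as pd

         def run(mini_batch):
             # Print the concrete type so the job logs show whether the framework
             # hands you a list of file paths, a pandas DataFrame, or something else.
             print(f"run() received: {type(mini_batch)}")

             if isinstance(mini_batch, pd.DataFrame):
                 # Tabular input: each row is one record to score
                 print(mini_batch.head())
             else:
                 # File input: each item is typically the path to one input file
                 for item in mini_batch:
                     print(f"mini-batch item: {item!r}")

             # Return one placeholder result per input so the job's
             # item/result count check does not fail during this test.
             return ["logged"] * len(mini_batch)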

    Suggestion

    1. Recreate the batch endpoint through the Studio wizard so that the correct scoring script and environment are generated automatically.
    2. Modify the scoring script manually to convert each mini-batch entry into a dictionary (tedious, and it requires knowing the input features). The snippet below shows the idea, and a fuller sketch follows after the reference link:
         import json

         def run(mini_batch):
             results = []
             for file_path in mini_batch:
                 # Each item in the mini-batch is the path to one input file
                 with open(file_path, 'r') as f:
                     data = f.read()
                 # Convert the raw text into a prediction via your custom logic
                 result = process_data(data)
                 results.append(result)
             return results

         def process_data(data):
             try:
                 # Parse the input string as JSON
                 input_dict = json.loads(data)

                 # Extract the features expected by your model
                 features = [
                     input_dict.get("feature1", 0),
                     input_dict.get("feature2", 0),
                     input_dict.get("feature3", 0)
                 ]

                 # Predict using the global model loaded in init()
                 prediction = model.predict([features])
                 return prediction[0]
             except Exception as e:
                 return f"Error: {str(e)}"
      

    Reference - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-automl-endpoint?view=azureml-api-2&tabs=Studio#deploy-from-azure-machine-learning-studio-and-no-code
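
    If you do go the custom-script route (option 2 above), a fuller sketch of the init()/run() pair might look like the following. This is only an illustration, not the wizard's generated script: it assumes the registered model is an MLflow model loadable straight from the AZUREML_MODEL_DIR folder, and that the batch input is CSV files whose columns match the model's training features. Adjust both assumptions to your setup.

         import os

         import mlflow
         import pandas as pd

         model = None

         def init():
             global model
             # AZUREML_MODEL_DIR points at the root of the registered model files.
             # If the MLmodel file sits in a subfolder, append that folder name here.
             model_dir = os.environ["AZUREML_MODEL_DIR"]
             model = mlflow.pyfunc.load_model(model_dir)

         def run(mini_batch):
             results = []
             # For file-based batch inputs, each item is the path of one input file.
             for file_path in mini_batch:
                 df = pd.read_csv(file_path)
                 predictions = model.predict(df)
                 results.extend(str(p) for p in predictions)
             return results

    The reference above covers the no-code deployment path, where this scoring logic is generated for you.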

    Hope it helps.

    Thank you


  2. Azar 29,520 Reputation points MVP Volunteer Moderator
    2025-06-05T18:40:10.4533333+00:00

    Hi there Adam Goldammer

    Thanks for using the Q&A platform.

    This often happens when the input handling in the scoring script isn't aligned with the batch job configuration.

    Try this: open the run() function in your scoring script and check how the inputs are being processed. If you're using ParallelRunStep, the input is a MiniBatch object, so your script should handle it accordingly (a sketch of one way to do that is included below). If the script expects JSON lines, configure ParallelRunConfig with the appropriate input data type.

    Since this issue started suddenly, it might be due to a change in how the job is submitted or a mismatch in the input data type configuration.
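
    As a sketch of what "handle it accordingly" can look like, the run() below first normalises the incoming mini-batch into a list of plain dictionaries and only then scores each record. This is an illustration, not your generated script: score_record() is a placeholder for your own prediction logic, and the assumption that file-based inputs contain JSON is illustrative only.

         import json
         import pandas as pd

         def run(mini_batch):
             records = []
             if isinstance(mini_batch, pd.DataFrame):
                 # Tabular mini-batch: one dictionary per row
                 records = mini_batch.to_dict(orient="records")
             else:
                 # File-based mini-batch: assume each item is a path to a JSON file
                 for file_path in mini_batch:
                     with open(file_path, "r") as f:
                         records.append(json.load(f))
             # score_record() is a hypothetical helper standing in for your model call
             return [score_record(r) for r in records]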

    If this helps, kindly accept the answer. Thanks very much.

