You're encountering the error:

ValueError: Invalid input data type to parse. Expected: <class 'dict'> but got <class 'azureml_common.parallel_run.mini_batch.MiniBatch'>

This issue typically arises in Azure Machine Learning batch scoring jobs when the scoring script expects a dictionary (`dict`) but instead receives a `MiniBatch` object.
Possible reasons
The auto-generated scoring script provided by the Azure ML wizard is designed to work with dictionary inputs. However, when using `ParallelRunStep` or `ParallelRunConfig`, Azure ML passes a `MiniBatch` object to the `run()` method. If your script doesn't handle this object correctly, it throws the error you're seeing.
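For context, here is a minimal sketch of the `run()` contract that `ParallelRunStep` uses with a file-based input (illustrative only; the names are not from your job). Logging the type of `mini_batch` is a quick way to confirm that your script is receiving a `MiniBatch` rather than a `dict`:

```python
import logging

def run(mini_batch):
    # For file-based inputs, mini_batch behaves like a list of file paths,
    # one per file in the current batch -- it is never a plain dict.
    logging.info("run() received %s", type(mini_batch))
    results = []
    for file_path in mini_batch:
        results.append(str(file_path))
    # run() is expected to return one result per input element.
    return results
```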
This mismatch can occur if:
- The scoring script was modified or regenerated incorrectly.
- Azure ML SDK or environment versions changed (e.g., MLflow 2.0 introduced stricter schema validation).
- The job configuration was altered to use parallel processing without updating the script accordingly.
Suggestions
- Recreate the batch endpoint and deployment so that Azure ML automatically generates the correct scoring script and environment (a Python SDK sketch is included at the end of this answer).
- Modify the scoring script manually to convert each mini-batch item into the dictionary of features your model expects (more tedious, and requires knowing the input features); for example:
```python
import json

def run(mini_batch):
    results = []
    for file_path in mini_batch:
        # Load and process the file
        with open(file_path, 'r') as f:
            data = f.read()
        # Convert to dict if needed
        result = process_data(data)  # Your custom logic
        results.append(result)
    return results

def process_data(data):
    try:
        # Parse the input string as JSON
        input_dict = json.loads(data)
        # Extract features expected by your model
        features = [
            input_dict.get("feature1", 0),
            input_dict.get("feature2", 0),
            input_dict.get("feature3", 0)
        ]
        # Predict using the global model loaded in init()
        prediction = model.predict([features])
        return prediction[0]
    except Exception as e:
        return f"Error: {str(e)}"
```
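The snippet above assumes a global `model` that was loaded in `init()`. A minimal sketch of that part, assuming a scikit-learn-style model serialized as `model.pkl` inside the registered model folder (the file name and loader are assumptions, adjust them to your artifact):

```python
import os
import joblib

model = None

def init():
    # Runs once per worker before any call to run(). AZUREML_MODEL_DIR points
    # to the folder where the registered model is mounted.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")  # assumed file name
    model = joblib.load(model_path)
```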
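For the first option, if your model is registered in MLflow format you can recreate the endpoint and deployment without supplying any scoring script or environment, and Azure ML will generate both for you. A rough sketch using the Azure ML Python SDK v2 (`azure-ai-ml`); everything in angle brackets is a placeholder, and parameter names may vary slightly across SDK versions:

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import BatchEndpoint, BatchDeployment

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Create (or recreate) the batch endpoint.
endpoint = BatchEndpoint(name="<batch-endpoint-name>")
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

# For an MLflow-format model, no scoring script or environment is passed here;
# Azure ML generates both automatically.
deployment = BatchDeployment(
    name="default",
    endpoint_name="<batch-endpoint-name>",
    model="azureml:<registered-model-name>:<version>",
    compute="<compute-cluster-name>",
    instance_count=1,
    mini_batch_size=10,
)
ml_client.batch_deployments.begin_create_or_update(deployment).result()
```

After the deployment is created, set it as the endpoint's default deployment (in the studio or via the SDK) before invoking the endpoint.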
Hope it helps.
Thank you