Azure AutoML Batch Inference: Save Predictions in Original Input File Format

Kushagra Gupta 0 Reputation points
2025-06-02T11:41:16.4666667+00:00

I have trained and registered a price prediction model using Azure AutoML (via the drag-and-drop Designer interface). My test file (stored in Azure Blob Storage) has the exact same schema and column order as the training data, except that it is missing the target 'price' column.

I used the no-code Batch Endpoint functionality to perform batch inference. It automatically generated a score.py script and a custom environment. Even though I explicitly selected the "Append column" option, the output is still saved as a new predictions.csv file with the following structure:

    Column1                              Column2
    ['Testing_File_Without_Price.csv'    '1234.5678']
    ...

What I want instead is:

The predicted price values should be appended as a new column to the original test file (i.e., restoring the price column).

The resulting updated file should be saved back to Azure Blob Storage in the same format and structure as the original input file.

Here’s what I’ve already tried without success:

Designer Pipeline: Fails due to custom environment requirements (AutoML model uses MLflow which requires a custom environment not currently supported in Designer).

Manual Batch Endpoint Setup: Attempted using a custom score.py and environment manually, but it crashes during execution.

How can I correctly perform batch inference where the scored 'price' column is appended to the original input file and the output is saved back to Blob Storage in the same format?


Azure Machine Learning
An Azure machine learning service for building and deploying models.

2 answers

  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2025-06-02T18:08:18.5366667+00:00

    Hello Kushagra!

    Thank you for posting on Microsoft Learn.

    To append the predicted values from Azure AutoML batch inference to your original input CSV file, and save the updated file back to Azure Blob Storage, you need to work around the default behavior of no-code batch inference. Here's a working approach using a custom scoring script in a batch inference pipeline, fully compatible with MLflow and your custom environment.

    Azure AutoML no-code batch endpoints generate outputs in a prediction metadata format (filename + prediction array), not your original structure.

    Even with "Append column" selected, it just appends the value to a file reference, not to the actual dataframe you uploaded.

    You need to write a custom score.py that:

    • Loads the input CSV (no target column)
    • Loads your AutoML model via MLflow
    • Generates predictions
    • Appends the 'price' column to the original dataframe
    • Saves the result to the expected output path (for example ./outputs/)

    Then package this in a batch pipeline job with custom environment.


  2. JAYA SHANKAR G S 4,035 Reputation points Microsoft External Staff Moderator
    2025-06-05T09:28:28.8333333+00:00

    Hello @Kushagra Gupta,

    When you deploy a batch endpoint with the default scoring script, it only returns the prediction along with the input file name.

    So, for custom output, you need to customize the batch scoring script.

    For your case, use the sample code below.

    
    import glob
    import os
    import pickle
    from pathlib import Path
    from typing import List

    import pandas as pd


    def init():
        global model
        global output_path

        # Paths provided by the batch deployment at runtime
        output_path = os.environ["AZUREML_BI_OUTPUT_PATH"]
        model_path = os.environ["AZUREML_MODEL_DIR"]
        model_file = glob.glob(f"{model_path}/*/*.pkl")[-1]

        with open(model_file, "rb") as file:
            model = pickle.load(file)


    def run(mini_batch: List[str]):
        for file_path in mini_batch:
            data = pd.read_csv(file_path)
            pred = model.predict(data)

            # Append the scored values as a new column
            data["prediction"] = pred

            # Write one output CSV per input file, keeping the original structure
            output_file_name = Path(file_path).stem
            output_file_path = os.path.join(output_path, output_file_name + ".csv")
            data.to_csv(output_file_path, index=False)

        return mini_batch
    

    Here, the script reads each CSV file from the input mini-batch, adds a prediction column with the scored values, and writes the result back out as a CSV file.

    You can alter the above code according to your requirements, and you can also refer to this GitHub notebook for custom batch output.

    Please try the above and let us know in the comments if you face any error or have any query.
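
    For the deployment itself, the key setting is `output_action: summary_only`, which stops the batch job from aggregating the `run()` return values into a single predictions.csv and leaves only the files your script writes. Below is a sketch of the deployment YAML; the names price-batch-endpoint, price-model, custom-env, and cpu-cluster are placeholders for your own assets.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: custom-output-deployment
endpoint_name: price-batch-endpoint      # placeholder: your batch endpoint
model: azureml:price-model@latest        # placeholder: your registered AutoML model
code_configuration:
  code: src                              # folder containing score.py
  scoring_script: score.py
environment: azureml:custom-env@latest   # placeholder: your custom environment
compute: azureml:cpu-cluster             # placeholder: your compute cluster
output_action: summary_only              # do not aggregate run() results into predictions.csv
```

    You can then create the deployment with `az ml batch-deployment create --file deployment.yml`.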

    Thank you

