Batch Endpoint Deployment Fails with Status Code 42 for AutoML Model

Kushagra Gupta 0 Reputation points
2025-07-16T12:22:15.7766667+00:00

We are consistently encountering a critical error while deploying a trained Azure AutoML regression model using a batch endpoint created through the Azure Machine Learning Studio (No-Code UI). The deployment fails with the following error:

[Screenshot of the deployment error message attached]

Approach Used:

Model Training: Performed via Azure ML Studio (No-Code UI) using AutoML for a regression task.

Batch Deployment: Deployed model for batch inference using Batch Endpoint wizard within Azure ML Studio (No-Code UI).

Data Files:

Training File: Clean, properly formatted, and successfully used during AutoML training.

Testing File: CSV format, accessible in Azure Blob Storage, and successfully used for evaluation and real-time inference.

Trained Model: AutoML-generated model with strong regression metrics (e.g., R², RMSE) that works correctly with real-time endpoints.

Compute Target: Azure ML Compute Cluster (dedicated, functional, and validated).

Steps Taken:

Trained the model using clean training data via AutoML (No-Code UI).

Created batch endpoint and configured the inference job using the same model and test data.

Batch inference job fails consistently with exit code 42.

Re-verified the data, retrained the model, and repeated deployment — the issue persists.

The same model and test file return correct results when run through a real-time endpoint.

Impact: This issue is blocking critical batch inference workflows for production use. Given that the model and data work flawlessly in real-time mode, the issue appears isolated to the batch endpoint mechanism. A fix or workaround is urgently needed.

Requested Action: Investigate the root cause of the exit code 42 and provide resolution or corrective steps, especially considering the use of the No-Code UI pipeline.

Let us know if any further logs or configuration inputs are required.
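
For reference, the batch run can also be triggered outside the Studio UI with the Azure ML Python SDK v2 (azure-ai-ml), which makes it easier to capture the full driver traceback. This is only a sketch: every resource name below is a placeholder, and the invoke keyword varies slightly between SDK versions.

    # Minimal repro of the failing batch scoring call via the azure-ai-ml (v2) SDK.
    # All names below are placeholders for the workspace/endpoint/data in question.
    from azure.ai.ml import MLClient, Input
    from azure.ai.ml.constants import AssetTypes
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<workspace-name>",
    )

    # The same CSV in Blob Storage that works for real-time scoring.
    test_data = Input(
        type=AssetTypes.URI_FILE,
        path="azureml://datastores/<datastore>/paths/<folder>/test.csv",
    )

    # Depending on the azure-ai-ml version this keyword is `input=` (singular)
    # or `inputs={"<input-name>": test_data}`.
    job = ml_client.batch_endpoints.invoke(
        endpoint_name="<batch-endpoint-name>",
        deployment_name="<batch-deployment-name>",
        input=test_data,
    )

    # Stream the driver output; the full logs end up under user_logs/ for the job.
    ml_client.jobs.stream(job.name)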
Azure Machine Learning

3 answers

  1. Chakaravarthi Rangarajan Bhargavi 1,200 Reputation points MVP
    2025-07-18T04:40:38.9466667+00:00

    Hi Kushagra Gupta,

    Welcome to Microsoft Q&A Community!

    Thank you for reaching out. Let's address your issue step-by-step.

    From your error log:

    Execution failed. User process exited with status code 42.
    File "driver/amlbi_main.py", line 226, in main
    sys.exit(exitcode_candidate)
    SystemExit: 42
    

    This suggests a user-level failure during model training or deployment in Azure Machine Learning, often due to:

    Malformed or incomplete data labels

    Missing expected fields in your training documents

    Runtime errors in the custom script or configuration

    Could you please confirm whether the steps below have already been performed to improve your custom extraction model, so that we can work through the different methods.

    1. Review and Refine Labeling Strategy

    Ensure all labeled fields exist across samples. If a field is missing on some pages, it may confuse the model.

    Use consistent bounding boxes – irregular labeling introduces noise.

    Label multiple document instances if the same field appears more than once (e.g., header/footer).

    Reference: Labeling best practices

    2. OCR Layer Not Available in Downloaded JSON?

    If you're exporting from Labeling Tool, it may not include OCR content by default.

    To get the OCR + labeled content, use the Analyze API on your labeled/test document through your trained model and set includeTextDetails=true in your request.

    More info: Build custom model
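
    A rough sketch of that call, assuming the v2.1 REST API (the version where the includeTextDetails flag applies); the endpoint, key, model ID, and document URL are placeholders:

        # Sketch: analyze a test document with a custom model and request OCR text details.
        # Endpoint, key, model id and document URL below are placeholders.
        import time
        import requests

        endpoint = "https://<your-resource>.cognitiveservices.azure.com"
        key = "<your-key>"
        model_id = "<custom-model-id>"

        resp = requests.post(
            f"{endpoint}/formrecognizer/v2.1/custom/models/{model_id}/analyze",
            params={"includeTextDetails": "true"},
            headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
            json={"source": "<sas-url-of-test-document>"},
        )
        resp.raise_for_status()
        result_url = resp.headers["Operation-Location"]

        # Poll until the analysis finishes; OCR lines land under analyzeResult.readResults.
        while True:
            result = requests.get(result_url, headers={"Ocp-Apim-Subscription-Key": key}).json()
            if result["status"] in ("succeeded", "failed"):
                break
            time.sleep(2)
        print(result["status"])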

    3. Auto-labeling Fallback

    If auto-labeling is off or failing, review logs or try resetting the label layout and re-assigning entities manually.

    4. Limited Data? Use Prebuilt + Compose Model

    Combine your Custom Extraction model with a Prebuilt Layout or Read model using Compose Model to extract structure reliably with few samples.

    Reference: Choosing model types

    5. AzureML Error: SystemExit 42

    From the error:

    AzureMLCompute job failed

    Please review the full log in:

    user_logs/std_log_0.txt
    

    Look for:

    Import errors

    File path mismatches

    Environment dependency issues

    Make sure:

    • Your compute instance has necessary permissions and access to training files.
    • The training dataset is correctly mounted or uploaded.
    • When labeling or troubleshooting in the Document Intelligence Studio:
      • Prefer the latest stable API version (v4.0 or v3.1).
      • If the UI seems buggy, try re-uploading clean files and creating a new project.
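
    If it helps, the full log folder (including user_logs/std_log_0.txt) can be pulled down locally with the SDK. A small sketch, with the job name and workspace details as placeholders:

        # Download all outputs and logs of the failed batch scoring job to ./job_logs.
        # Job name and workspace details are placeholders.
        from azure.ai.ml import MLClient
        from azure.identity import DefaultAzureCredential

        ml_client = MLClient(
            DefaultAzureCredential(),
            subscription_id="<subscription-id>",
            resource_group_name="<resource-group>",
            workspace_name="<workspace-name>",
        )
        ml_client.jobs.download(name="<batch-scoring-job-name>", download_path="./job_logs", all=True)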

    Let us know if you need further help!

    Best regards,

    Chakravarthi Rangarajan Bhargavi

    If this answer helped, please click "Accept Answer" and upvote to help others in the community!


  2. Kushagra Gupta 0 Reputation points
    2025-07-18T11:05:23.9866667+00:00

    We are still encountering consistent failures when deploying a regression AutoML model using Batch Endpoints via Azure ML Studio (No-Code UI). The deployment fails with exit code 42, even though the same model performs accurately through real-time endpoints using the same test dataset.

    Troubleshooting Steps Already Performed:

    1. Confirmed the test file format is identical to what was used during training and real-time inference (a schema-check sketch follows after this list).

    2. Verified the compute cluster is active, correctly attached, and accessible.

    3. Retried the batch job with fresh deployments and re-uploaded data; the issue persists.

    4. Confirmed the model runs without issue via real-time endpoints on the same test data.

    5. Reviewed the logs, but there is no clear pointer beyond the exit code and the SystemExit. The log excerpt is included further below for your kind perusal.
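
    For completeness, step 1 can be double-checked locally with a quick column/dtype comparison (a sketch; file names and the target column are placeholders):

        # Sanity check: the batch scoring CSV should expose the same feature columns
        # (target excluded) and compatible dtypes as the training file. Placeholders below.
        import pandas as pd

        train = pd.read_csv("training.csv")
        score = pd.read_csv("test_batch.csv")

        feature_cols = [c for c in train.columns if c != "<target-column>"]
        print("missing in scoring file:", sorted(set(feature_cols) - set(score.columns)))
        print("extra in scoring file:  ", sorted(set(score.columns) - set(feature_cols)))

        # dtypes of the shared columns should also line up
        shared = [c for c in feature_cols if c in score.columns]
        print(pd.DataFrame({"train": train[shared].dtypes, "score": score[shared].dtypes}))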


    user_logs/std_log_0.txt:

    Azure Machine Learning Batch Inference Start
    [2025-07-16 11:25:04.298903] No started flag set. Skip creating started flag.
    Azure Machine Learning Batch Inference End
    Cleaning up all outstanding Run operations, waiting 300.0 seconds
    2 items cleaning up...
    Cleanup took 0.1315138339996338 seconds
    Traceback (most recent call last):
      File "driver/amlbi_main.py", line 275, in <module>
        main()
      File "driver/amlbi_main.py", line 226, in main
    

    This issue is blocking our batch inference pipeline. Given that the real-time endpoint executes perfectly with the same model and data, this issue is isolated to the batch endpoint infrastructure.


  3. Vinu Bhambore 0 Reputation points
    2025-08-20T15:27:21.35+00:00

    @MicrosoftSupport: Hello, posting this here because I cannot raise a support ticket anymore. It keeps pointing me to documentation, and the docs don't have a solution to this issue.

    This is still an active issue. I'm facing the same error on our batch endpoints during inference. Please see the screenshot below. In fact, I tested it on some old batch endpoints that worked fine until a few weeks ago and are now failing.

    Also, I cannot edit the scoring script to fix the issue, because it is auto-generated; I deployed the endpoint using the no-code UI.
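
    In case it is useful to others hitting this, a batch deployment with an editable scoring script can be added to the same endpoint using the azure-ai-ml v2 SDK. This is only a rough sketch under that assumption: every name, path, and environment below is a placeholder, and newer SDK versions expose ModelBatchDeployment instead of BatchDeployment.

        # Sketch: add a batch deployment whose scoring script we control, so init()/run()
        # can be instrumented and debugged. All names/paths below are placeholders.
        from azure.ai.ml import MLClient
        from azure.ai.ml.entities import BatchDeployment, CodeConfiguration
        from azure.identity import DefaultAzureCredential

        ml_client = MLClient(
            DefaultAzureCredential(),
            subscription_id="<subscription-id>",
            resource_group_name="<resource-group>",
            workspace_name="<workspace-name>",
        )

        deployment = BatchDeployment(
            name="custom-script-deployment",
            endpoint_name="<batch-endpoint-name>",
            model="azureml:<registered-model-name>@latest",
            # batch_score.py must define init() and run(mini_batch) returning one result per input.
            code_configuration=CodeConfiguration(code="./scoring", scoring_script="batch_score.py"),
            environment="azureml:<environment-name>@latest",
            compute="<compute-cluster-name>",
            instance_count=1,
            mini_batch_size=10,
        )
        ml_client.batch_deployments.begin_create_or_update(deployment).result()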

    Please let me know if there is a support email I can send further details to.

    [Screenshot 2025-08-20 at 11.22.05 AM attached]

