"AzureMLCompute job failed 500: [REDACTED]: Some(true)" error while creating a custom environment in Azure ML

Sena Aslan 0 Reputation points
2024-07-08T14:00:04.7766667+00:00

Hello everyone,

I am trying to create a custom environment to train and deploy a CatBoost regression model with the Azure ML SDK. However, when I submit the job, it runs for a while and then fails with the error "AzureMLCompute job failed 500: [REDACTED]: Some(true)". When I checked the logs for the job, they were empty, so I could not find anything that points to the cause. Can you please help me identify and solve the problem?

Here are my conda environment definition (conda.yaml) and the code that creates and registers the environment.

channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=1.0.2
  - scipy=1.7.1
  - pandas~=1.5.3
  - catboost
  - pip:
      - inference-schema[numpy-support]~=1.5.0
      - packaging==23.2
      - cloudpickle==2.2.1
      - mlflow==2.8.0
      - mlflow-skinny==2.8.0
      - azureml-mlflow==1.51.0
      - psutil==5.8.0
      - pyyaml==6.0.1
      - tqdm>=4.59,<4.60
      - ipykernel~=6.0
      - azureml-inference-server-http
      - azureml-core
      - azureml-dataset-runtime[fuse]
      - azureml-fsspec
name: model-env

import os
from azure.ai.ml.entities import Environment

# Create a source folder for the script and the conda file.
train_src_dir = "./pipeline_src"
os.makedirs(train_src_dir, exist_ok=True)

# Create and register the custom environment in the workspace.
# ml_client is assumed to be an authenticated azure.ai.ml.MLClient
# created earlier (not shown in this post).
custom_env_name = "model-env"
custom_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for catboost reg",
    tags={"scikit-learn": "1.0.2"},
    conda_file=os.path.join(train_src_dir, "conda.yaml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
custom_job_env = ml_client.environments.create_or_update(custom_job_env)

print(
    f"Environment with name {custom_job_env.name} is registered to workspace, "
    f"the environment version is {custom_job_env.version}"
)
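
The Environment definition reads the spec from pipeline_src/conda.yaml, but the step that writes that file is not shown in the post. A minimal sketch of one way to write it from plain Python follows. The helper name write_conda_file and the shortened CONDA_SPEC are hypothetical and only for illustration; the real file would hold the full dependency list given earlier.

```python
from pathlib import Path

# Same folder name as in the post.
train_src_dir = Path("./pipeline_src")
train_src_dir.mkdir(parents=True, exist_ok=True)

# Shortened spec for illustration only (assumption); the real file
# would contain the full dependency list from the post.
CONDA_SPEC = """\
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip=21.2.4
  - catboost
  - pip:
      - mlflow==2.8.0
      - azureml-mlflow==1.51.0
"""


def write_conda_file(folder: Path, spec: str = CONDA_SPEC) -> Path:
    """Write the spec to <folder>/conda.yaml and return the file path."""
    path = folder / "conda.yaml"
    path.write_text(spec, encoding="utf-8")
    return path


conda_path = write_conda_file(train_src_dir)
# Print the file back to confirm what the Environment definition will read.
print(conda_path.read_text(encoding="utf-8"))
```

Printing the file back is a quick way to confirm that the path passed as conda_file exists and contains the intended spec before the environment is registered.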
Azure Machine Learning