How to save or log pytorch model using MLflow?

Rishavraj Mandal 30 Reputation points
2023-05-24T17:10:11.7533333+00:00

I have main.py at the root directory, and from main.py I call the model script to train the model. The directory looks like this:

User's image

But when saving with code_paths, I am getting an error saying the directory is not found.


# Registering the model to the workspace
    mlflow.pytorch.log_model(
        pytorch_model=model,
        registered_model_name="use-case1-model",
        artifact_path="use-case1-model",
        input_example=df[['Title', 'Attributes']],
        conda_env=os.path.join("./dependencies", "conda.yaml"),
        code_paths="./models"
    )

    # Saving the model to a file
    mlflow.pytorch.save_model(
        pytorch_model= model,
        conda_env=os.path.join("./dependencies", "conda.yaml"),
        input_example=df[['Title', 'Attributes']],
        path=os.path.join(args.model, "use-case1-model"),
        code_paths="./models"
    )

Question 1: Is there a need to set the code_paths and extra_files parameters in my case?

Question 2: What is the right way to specify the directory for the code_paths and extra_files parameters?

Tags: Azure Machine Learning, Azure

1 answer

  1. YutongTie-MSFT 53,966 Reputation points Moderator
    2023-05-24T22:11:01.49+00:00

    Hello @Rishavraj Mandal

    Thanks for reaching out to us. I suggest you try the workflow below.

    To save or log a PyTorch model using MLflow, you can use the mlflow.pytorch.log_model or mlflow.pytorch.save_model functions.

    Regarding your first question, the code_paths and extra_files parameters are optional and are used to specify additional files or directories that should be included when logging or saving the model. If you don't have any additional files or directories that need to be included, you can omit these parameters.

    Regarding your second question, the code_paths parameter should be set to the path of the directory that contains the code used to train the model. This can be a local directory or a remote directory accessible via a URI. If you are running the training script locally, you can set the code_paths parameter to the path of the directory containing the training script and any other necessary files. For example, if your training script is located in the models directory, you can set code_paths to "./models". If you have multiple directories that contain code used to train the model, you can specify them as a list of strings.
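    One common cause of a "directory is not found" error here is that relative paths like "./models" are resolved against the current working directory, not against the script's location. A minimal sketch of one way to make the paths robust (the models and dependencies directory names are taken from the question; resolving against __file__ is an assumption about your setup, not something MLflow requires):

    ```python
    import os

    # Resolve paths relative to this script's own directory, so that
    # running "python main.py" from any working directory still finds
    # models/ and dependencies/ (names taken from the question's layout).
    script_dir = os.path.dirname(os.path.abspath(__file__))

    # code_paths expects a list of strings, not a bare string
    code_paths = [os.path.join(script_dir, "models")]
    conda_env = os.path.join(script_dir, "dependencies", "conda.yaml")
    ```

    These values can then be passed to mlflow.pytorch.log_model or mlflow.pytorch.save_model in place of the relative strings.
    
    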

    Here is an example of how to log and save a PyTorch model using MLflow for your reference; please make the changes needed to fit your scenario.

    import mlflow.pytorch
    import torch
    # Define your PyTorch model
    model = torch.nn.Sequential(
        torch.nn.Linear(2, 1),
        torch.nn.Sigmoid()
    )
    # Train your model and obtain the trained model object
    # Log the model to MLflow
    mlflow.pytorch.log_model(
        pytorch_model=model,
        artifact_path="my-model",
        conda_env="path/to/conda.yaml",
        code_paths=["path/to/training/script.py", "path/to/other/code"],
        registered_model_name="my-registered-model"
    )
    # Save the model to a file
    mlflow.pytorch.save_model(
        pytorch_model=model,
        path="my-model",
        conda_env="path/to/conda.yaml",
        code_paths=["path/to/training/script.py", "path/to/other/code"]
    )
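
    As a follow-up, a model logged this way can be loaded back by its run URI. A minimal sketch, assuming the artifact_path "my-model" from the example above (the run ID is a placeholder you must fill in from your own run, e.g. from mlflow.active_run().info.run_id or the MLflow UI):

    ```python
    # Build the model URI for a logged artifact; "my-model" matches the
    # artifact_path used in log_model above, and run_id is a placeholder.
    run_id = "<run-id>"
    model_uri = f"runs:/{run_id}/my-model"

    # Loading requires an MLflow tracking server and a real run,
    # so it is shown commented out here:
    # import mlflow.pytorch
    # loaded_model = mlflow.pytorch.load_model(model_uri)
    ```
    
    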

    Regards,

    Yutong

    1 person found this answer helpful.
