Is it possible to deploy a custom image with Ray Serve for model parallelization for online inference?

Dragos Toma 0 Reputation points
2024-08-27T11:45:26.3+00:00

I want to use an instance with 4 CPUs for inference and use the Ray Serve library for model parallelization.
This option should be superior to using 4 workers on the same image, because memory management is better with Ray. I can't find an example of how to customize the REST API.

Azure Machine Learning

1 answer

  1. YutongTie-MSFT 51,696 Reputation points
    2024-08-29T02:48:08.17+00:00

    Hello Dragos,

    Thanks for reaching out to us. There are no official Azure documents on this topic, but I suggest checking the Ray Serve resources to see whether there is an example on their end:

    Use case - https://docs.ray.io/en/latest/ray-overview/use-cases.html

    Examples - https://docs.ray.io/en/latest/ray-overview/examples.html

    Discussion forum - https://discuss.ray.io/

    To answer the question generally: yes, it is possible to deploy a custom image with Ray Serve for model parallelization for online inference. Ray Serve is designed to handle complex model-serving scenarios, including model parallelization, and can be deployed using a custom Docker image.

    1. Create a Custom Docker Image

    First, you need to create a Docker image that includes your model, Ray Serve, and any other dependencies required for inference.
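    As a minimal sketch (the base image tag, extra dependencies, and file name here are assumptions you would replace with your own), a Dockerfile built on Ray's official base image could look like this:

    ```dockerfile
    # Sketch of a custom serving image; the Ray version, extra
    # dependencies, and file names are assumptions.
    FROM rayproject/ray:2.9.0

    # Serving extras plus whatever your model actually needs.
    RUN pip install "ray[serve]==2.9.0" scikit-learn

    # Copy in the Ray Serve application defined in the next step.
    WORKDIR /app
    COPY serve_app.py /app/serve_app.py
    ```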

    2. Define Your Ray Serve Application

    Ray Serve allows you to deploy models with parallelization and scaling capabilities. Create a Python script to define your Ray Serve application.
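    There is no Azure-specific sample for this, so here is a minimal sketch (the model loading is a placeholder, and the file and class names are assumptions) of a deployment that parallelizes inference across the 4 CPUs you mentioned:

    ```python
    # serve_app.py - minimal sketch; model loading is a placeholder.
    from ray import serve
    from starlette.requests import Request


    # Four replicas, one CPU each, to match a 4-CPU instance.
    @serve.deployment(num_replicas=4, ray_actor_options={"num_cpus": 1})
    class ModelDeployment:
        def __init__(self):
            # Load your model once per replica (placeholder logic).
            self.model = lambda x: x

        async def __call__(self, request: Request):
            # Parse the incoming HTTP request and run inference.
            payload = await request.json()
            return {"prediction": self.model(payload["input"])}


    # Entry point, deployable with e.g. `serve run serve_app:app`.
    app = ModelDeployment.bind()
    ```

    Running `serve run serve_app:app` starts the application and exposes it over HTTP, on port 8000 by default.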

    3. Deploy the Application

    You can deploy your Ray Serve application using the custom Docker image you created. Using Ray's Docker deployment, push your Docker image to a container registry (e.g., Docker Hub or Azure Container Registry); when launching a Ray cluster, you can then point the cluster at that image, as sketched below.
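    With Ray's cluster launcher, for example, the image is referenced in the `docker` section of the cluster config file; a hypothetical excerpt (the registry path and image tag are assumptions):

    ```yaml
    # Excerpt from a hypothetical Ray cluster launcher config;
    # the registry path and image tag are assumptions.
    cluster_name: serve-cluster
    docker:
      image: "myregistry.azurecr.io/my-serve-image:latest"
      container_name: "ray_container"
      pull_before_run: true
    ```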
    4. Access and Customize the API

    Ray Serve exposes HTTP endpoints for your model deployments. You can interact with these endpoints using standard HTTP requests. Customize your API by modifying the __call__ method in your ModelDeployment class to handle different types of requests or add more functionality.
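    For example, once the application is running, you can call the default endpoint (Ray Serve listens on port 8000 by default; the payload shape follows the sketch above) with a standard HTTP client:

    ```python
    # Client-side call; the payload shape matches the sketch above.
    import requests

    response = requests.post(
        "http://localhost:8000/",  # Ray Serve's default HTTP address
        json={"input": [1.0, 2.0, 3.0]},
    )
    print(response.json())  # e.g. {"prediction": [1.0, 2.0, 3.0]}
    ```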

    Monitoring: Use Ray’s monitoring tools to keep track of your deployment’s performance. Ray provides a dashboard where you can monitor your cluster and deployments.

    Scaling: Adjust the number of replicas and resources allocated to your deployment as needed. Ray Serve can dynamically scale based on traffic and load.
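    As a sketch, replacing the fixed replica count with an autoscaling policy could look like this (the bounds and the target are illustrative values you would tune):

    ```python
    # Autoscaling sketch; the bounds and target are illustrative.
    from ray import serve


    @serve.deployment(
        autoscaling_config={
            "min_replicas": 1,
            "max_replicas": 4,
            "target_num_ongoing_requests_per_replica": 2,
        },
        ray_actor_options={"num_cpus": 1},
    )
    class AutoscalingModelDeployment:
        async def __call__(self, request):
            ...  # same request handling as in the sketch above
    ```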

    I hope this helps and your issue can be solved soon.

    Regards,

    Yutong

    Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.

