Is it possible to deploy a custom image with Ray Serve for model parallelization for online inference?

Dragos Toma 0 Reputation points
2024-08-27T11:45:26.3+00:00

I want to use an instance with 4 CPUs for inference and use the Ray Serve library for model parallelization.
This option should be superior to using 4 workers on the same image, because memory management is better with Ray. I can't find an example of how to customize the REST API.

Azure Machine Learning

1 answer

  1. YutongTie-MSFT 51,696 Reputation points
    2024-08-29T02:48:08.17+00:00

    Hello Dragos,

    Thanks for reaching out to us. There are no official Azure documents on this topic, but I suggest checking the Ray Serve resources to see whether there is an example on their end:

    Use case - https://docs.ray.io/en/latest/ray-overview/use-cases.html

    Examples - https://docs.ray.io/en/latest/ray-overview/examples.html

    Discussion forum - https://discuss.ray.io/

    To answer the question generally: yes, it is possible to deploy a custom image with Ray Serve for model parallelization for online inference. Ray Serve is designed to handle complex model-serving scenarios, including model parallelization, and can be deployed using a custom Docker image.

    1. Create a Custom Docker Image

    First, you need to create a Docker image that includes your model, Ray Serve, and any other dependencies required for inference.
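    As a minimal sketch (the base image tag, extra dependencies, and file name here are assumptions you would replace with your own), a Dockerfile built on Ray's official base image could look like this:

    ```dockerfile
    # Sketch of a custom serving image; the Ray version, extra
    # dependencies, and file names are assumptions.
    FROM rayproject/ray:2.9.0

    # Serving extras plus whatever your model actually needs.
    RUN pip install "ray[serve]==2.9.0" scikit-learn

    # Copy in the Ray Serve application defined in the next step.
    WORKDIR /app
    COPY serve_app.py /app/serve_app.py
    ```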

    2. Define Your Ray Serve Application

    Ray Serve allows you to deploy models with parallelization and scaling capabilities. Create a Python script to define your Ray Serve application.
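    There is no Azure-specific sample for this, so here is a minimal sketch (the model loading is a placeholder, and the file and class names are assumptions) of a deployment that parallelizes inference across the 4 CPUs you mentioned:

    ```python
    # serve_app.py - minimal sketch; model loading is a placeholder.
    from ray import serve
    from starlette.requests import Request


    # Four replicas, one CPU each, to match a 4-CPU instance.
    @serve.deployment(num_replicas=4, ray_actor_options={"num_cpus": 1})
    class ModelDeployment:
        def __init__(self):
            # Load your model once per replica (placeholder logic).
            self.model = lambda x: x

        async def __call__(self, request: Request):
            # Parse the incoming HTTP request and run inference.
            payload = await request.json()
            return {"prediction": self.model(payload["input"])}


    # Entry point, deployable with e.g. `serve run serve_app:app`.
    app = ModelDeployment.bind()
    ```

    Running `serve run serve_app:app` starts the application and exposes it over HTTP, on port 8000 by default.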

    3. Deploy the Application

    You can deploy your Ray Serve application using the custom Docker image you created. Using Ray's Docker deployment, push your Docker image to a container registry (e.g., Docker Hub or Azure Container Registry); when launching a Ray cluster, you can then point the cluster at that image, as sketched below.
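    With Ray's cluster launcher, for example, the image is referenced in the `docker` section of the cluster config file; a hypothetical excerpt (the registry path and image tag are assumptions):

    ```yaml
    # Excerpt from a hypothetical Ray cluster launcher config;
    # the registry path and image tag are assumptions.
    cluster_name: serve-cluster
    docker:
      image: "myregistry.azurecr.io/my-serve-image:latest"
      container_name: "ray_container"
      pull_before_run: true
    ```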
    4. Access and Customize the API

    Ray Serve exposes HTTP endpoints for your model deployments. You can interact with these endpoints using standard HTTP requests. Customize your API by modifying the __call__ method in your ModelDeployment class to handle different types of requests or add more functionality.
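    For example, once the application is running, you can call the default endpoint (Ray Serve listens on port 8000 by default; the payload shape follows the sketch above) with a standard HTTP client:

    ```python
    # Client-side call; the payload shape matches the sketch above.
    import requests

    response = requests.post(
        "http://localhost:8000/",  # Ray Serve's default HTTP address
        json={"input": [1.0, 2.0, 3.0]},
    )
    print(response.json())  # e.g. {"prediction": [1.0, 2.0, 3.0]}
    ```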

    Monitoring: Use Ray’s monitoring tools to keep track of your deployment’s performance. Ray provides a dashboard where you can monitor your cluster and deployments.

    Scaling: Adjust the number of replicas and resources allocated to your deployment as needed. Ray Serve can dynamically scale based on traffic and load.
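    As a sketch, replacing the fixed replica count with an autoscaling policy could look like this (the bounds and the target are illustrative values you would tune):

    ```python
    # Autoscaling sketch; the bounds and target are illustrative.
    from ray import serve


    @serve.deployment(
        autoscaling_config={
            "min_replicas": 1,
            "max_replicas": 4,
            "target_num_ongoing_requests_per_replica": 2,
        },
        ray_actor_options={"num_cpus": 1},
    )
    class AutoscalingModelDeployment:
        async def __call__(self, request):
            ...  # same request handling as in the sketch above
    ```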

    I hope this helps and your issue can be solved soon.

    Regards,

    Yutong

    Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.

