How to increase Azure ML Real-time endpoint Input limit of 100 MB?

Rahul Kurian Jacob 80 Reputation points
2024-11-03T12:17:19.09+00:00

I am deploying a custom model with a custom inference pipeline (with score.py) on an Azure ML Real-time endpoint. I have found that the maximum length of POST request content is 104,857,600 bytes (exactly 100 MB).

The limits listed in the docs make no mention of an upper limit on request size. The endpoint uses the Flask library, which by default imposes no upper limit either.

So, my question is:

  1. Is there a way to increase this limit?
  2. If not, is there a workaround (other than using Batch Endpoints)?
Azure Machine Learning

Accepted answer
RevelinoB 3,675 Reputation points
2024-11-03T14:05:24.4333333+00:00

Hi Rahul,

At a recent customer site, we came across a similar challenge with Azure Machine Learning's managed online endpoints, where there's a 100 MB cap on POST request payloads. Although this limit isn't widely documented, it's enforced by the managed endpoint infrastructure, and there is currently no supported way to raise it.

In our case, one option we explored was deploying the model on a Kubernetes cluster. This approach gives more control over the serving stack, letting us configure the web server or ingress in front (such as Nginx) and the Flask-based scoring app to accept larger payloads. While it involves managing the Kubernetes environment, it does provide the flexibility we needed.
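
For illustration, on the Flask side the body-size cap is just a configuration value. A minimal sketch, assuming you control the scoring container; any proxy in front of it (for example Nginx's client_max_body_size) must also allow the larger body:

```python
# Minimal sketch of a custom Flask scoring app that accepts bodies larger
# than 100 MB. Assumes you control the container and whatever proxy sits
# in front of it; the route name and port here are illustrative.
from flask import Flask, request, jsonify

app = Flask(__name__)
# Allow request bodies up to 500 MB (Flask's default is no limit at all).
app.config["MAX_CONTENT_LENGTH"] = 500 * 1024 * 1024

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_data()  # raw request body
    # ... run the model on `payload` here ...
    return jsonify({"received_bytes": len(payload)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```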

We also looked into splitting the data into smaller chunks, each staying within the 100 MB limit. The chunks can be sent one at a time or in parallel and then reassembled on the endpoint for processing. This approach needs some added logic on both sides to manage chunking and reassembly.
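
A rough client-side sketch of that idea follows; the upload_id/chunk_id/total_chunks fields are hypothetical, and score.py would need matching logic to buffer the chunks (for example in temporary storage) and reassemble them once the last one arrives:

```python
# Client-side sketch: split a large payload into chunks safely under the
# 100 MB cap and POST them one at a time. Field names and URLs are
# illustrative placeholders, not an Azure ML API.
import math
import uuid
import requests

ENDPOINT_URL = "https://<endpoint>.<region>.inference.ml.azure.com/score"  # placeholder
HEADERS = {"Authorization": "Bearer <endpoint-key>"}                       # placeholder
CHUNK_SIZE = 90 * 1024 * 1024  # stay well under the 100 MB limit

def send_in_chunks(data: bytes) -> str:
    upload_id = str(uuid.uuid4())
    total = math.ceil(len(data) / CHUNK_SIZE)
    for i in range(total):
        chunk = data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
        resp = requests.post(
            ENDPOINT_URL,
            headers=HEADERS,
            files={"chunk": ("chunk.bin", chunk)},
            data={"upload_id": upload_id,
                  "chunk_id": str(i),
                  "total_chunks": str(total)},
        )
        resp.raise_for_status()
    return upload_id
```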

Finally, another solution we tried was storing larger inputs in Azure Blob Storage and passing a reference (such as a SAS URL) to the endpoint, which then pulls the data directly from storage. This keeps the request payload small and shifts the large data transfer to Blob Storage.
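
A sketch of that pattern using the azure-storage-blob SDK is below; the account, container, and field names are illustrative, and inside score.py the run() function would download the data from the URL it receives (for example with a plain HTTP GET):

```python
# Client side: upload the large input to Blob Storage, generate a short-lived
# read-only SAS URL, and send only that URL to the online endpoint.
# Account, container, and field names are illustrative placeholders.
from datetime import datetime, timedelta, timezone

import requests
from azure.storage.blob import (BlobSasPermissions, BlobServiceClient,
                                generate_blob_sas)

ACCOUNT_URL = "https://<storageaccount>.blob.core.windows.net"  # placeholder
ACCOUNT_KEY = "<storage-account-key>"                           # placeholder
service = BlobServiceClient(account_url=ACCOUNT_URL, credential=ACCOUNT_KEY)

def upload_and_score(data: bytes, endpoint_url: str, api_key: str) -> dict:
    # Upload the oversized input to a container the endpoint can reach.
    blob = service.get_blob_client(container="inference-inputs", blob="input.bin")
    blob.upload_blob(data, overwrite=True)

    # Short-lived, read-only SAS so score.py can fetch the blob.
    sas = generate_blob_sas(
        account_name=blob.account_name,
        container_name=blob.container_name,
        blob_name=blob.blob_name,
        account_key=ACCOUNT_KEY,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=1),
    )

    # The actual request body is tiny: just a URL, far below the 100 MB cap.
    payload = {"data_url": f"{blob.url}?{sas}"}
    resp = requests.post(endpoint_url, json=payload,
                         headers={"Authorization": f"Bearer {api_key}"})
    resp.raise_for_status()
    return resp.json()
```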

Each of these solutions involves trade-offs and may require small architectural changes, but they're effective for handling larger payloads. Let me know if you'd like more details on any of these approaches!

1 person found this answer helpful.
