Hi Rahul,
At a recent customer site, we came across a similar challenge with Azure Machine Learning's managed online endpoints, where there's a 100 MB cap on POST request payloads. Although this limit isn't widely documented, it's enforced by the underlying managed infrastructure, and there's currently no supported way to raise it.
In our case, one option we explored was deploying the model on a Kubernetes cluster. This approach gives more control over the serving stack, letting us run our own scoring app (for example, a Flask app, optionally behind an Nginx reverse proxy) configured to accept larger payloads. While it involves managing the Kubernetes environment, it does provide the flexibility we needed.
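If you go that route, here's a minimal sketch of what such a custom scoring app could look like. Everything in it is illustrative rather than anything Azure ML provides: the `/score` route, the `run_inference` placeholder, and the 500 MB limit are assumptions to adapt to your own setup, and if Nginx sits in front of the app you'd also raise its `client_max_body_size`.

```python
# Minimal sketch of a custom Flask scoring app that accepts large payloads.
# All names here (score_large_payload, run_inference) are illustrative; wire
# in your own model loading and scoring logic.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Allow request bodies up to 500 MB (adjust to your needs). If Nginx fronts
# this app, raise its client_max_body_size setting as well.
app.config["MAX_CONTENT_LENGTH"] = 500 * 1024 * 1024


def run_inference(payload: bytes) -> dict:
    # Placeholder for the real model call.
    return {"bytes_received": len(payload)}


@app.route("/score", methods=["POST"])
def score_large_payload():
    result = run_inference(request.get_data())
    return jsonify(result)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```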
We also looked into splitting the data into smaller chunks, each staying within the 100 MB limit. These chunks can be sent one at a time or in parallel, then reassembled on the endpoint for processing. This approach does need some added logic to manage chunking and reassembly.
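For the chunking approach, a rough client-side sketch is below, assuming the scoring script has matching logic to buffer and reassemble the pieces. The endpoint URL, the custom `x-upload-id` / `x-chunk-index` headers, and the 90 MB chunk size are all assumptions made for illustration.

```python
# Illustrative client-side chunking: split a large payload into pieces under
# the 100 MB cap and POST them one at a time. The scoring script is assumed
# to buffer chunks by upload_id and reassemble them once all have arrived.
import math
import uuid

import requests

ENDPOINT_URL = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
API_KEY = "<your-key>"
CHUNK_SIZE = 90 * 1024 * 1024  # stay safely under the 100 MB cap


def send_in_chunks(data: bytes) -> None:
    upload_id = str(uuid.uuid4())
    total = math.ceil(len(data) / CHUNK_SIZE)
    for i in range(total):
        chunk = data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
        resp = requests.post(
            ENDPOINT_URL,
            data=chunk,
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Content-Type": "application/octet-stream",
                # Custom headers the scoring script would use to reassemble.
                "x-upload-id": upload_id,
                "x-chunk-index": str(i),
                "x-chunk-total": str(total),
            },
            timeout=300,
        )
        resp.raise_for_status()
```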
Finally, another solution we tried was storing larger data in Azure Blob Storage and passing a reference (like a URL) to the endpoint, which can then pull the data directly from storage. This keeps the payload size manageable and shifts large data transfers to Blob Storage.
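A sketch of that pattern is below. The container name, endpoint URL, and the `data_url` field in the request body are placeholders, and the scoring script would need to download the blob from the SAS URL it receives.

```python
# Sketch of the Blob Storage approach: upload the large input to a container,
# then send only a short-lived SAS URL to the endpoint instead of the data.
from datetime import datetime, timedelta, timezone

import requests
from azure.storage.blob import (
    BlobSasPermissions,
    BlobServiceClient,
    generate_blob_sas,
)

CONNECTION_STRING = "<storage-connection-string>"
CONTAINER = "inference-inputs"
ENDPOINT_URL = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
API_KEY = "<your-key>"


def score_via_blob(local_path: str, blob_name: str) -> dict:
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob = service.get_blob_client(container=CONTAINER, blob=blob_name)

    # Upload the large payload to Blob Storage instead of posting it directly.
    with open(local_path, "rb") as f:
        blob.upload_blob(f, overwrite=True)

    # Generate a read-only SAS token valid for one hour.
    sas = generate_blob_sas(
        account_name=blob.account_name,
        container_name=CONTAINER,
        blob_name=blob_name,
        account_key=service.credential.account_key,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=1),
    )
    data_url = f"{blob.url}?{sas}"

    # The request body now carries only a small JSON reference to the data.
    resp = requests.post(
        ENDPOINT_URL,
        json={"data_url": data_url},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()
```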
Each of these solutions involves some trade-offs and may require small changes in architecture, but they’re effective for handling larger payloads. Let me know if you’d like more details on any of these approaches!