Hi Jas,
To answer your direct question: yes, using six separate real-time endpoints for different PromptFlow workflows is likely overkill and the most expensive way to architect this.
The high cost comes from each endpoint needing its own dedicated compute instance (like a Kubernetes pod) that you're paying for 24/7, even during low traffic periods. The $650 endpoint is probably on a more powerful VM SKU.
More efficient architectural patterns that other enterprises use include:
Consolidate into a Single Endpoint: Instead of six separate endpoints, build one robust real-time endpoint that can handle multiple types of requests. You can route different workflows through a single PromptFlow, using the input to determine which logic path to execute. This alone could cut your compute costs by 50-80%.
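As a rough illustration of input-based routing, here is a minimal sketch in plain Python. The names (`route_request`, `ROUTES`, `run_summarize`, `run_classify`) and the `workflow`/`payload` request shape are assumptions made up for this example, not part of PromptFlow or the Azure ML SDK. In a real flow the same idea would live in a Python node or in conditional branches.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-ins for logic that currently lives in separate endpoints.
def run_summarize(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder "summary": the first 50 characters of the text.
    return {"workflow": "summarize", "output": payload["text"][:50]}

def run_classify(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder classifier: anything ending in "?" is a question.
    is_question = payload["text"].rstrip().endswith("?")
    return {"workflow": "classify", "output": "question" if is_question else "statement"}

# One table maps the request's workflow name to the handler that serves it.
ROUTES: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "summarize": run_summarize,
    "classify": run_classify,
}

def route_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the logic path from the request input and run it."""
    workflow = request.get("workflow")
    handler = ROUTES.get(workflow)
    if handler is None:
        raise ValueError(f"Unknown workflow: {workflow!r}")
    return handler(request.get("payload", {}))
```

Each caller sends the same request shape to one endpoint and only changes the `workflow` field, so adding a seventh workflow means adding one entry to the table rather than deploying another endpoint.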
Use Asynchronous Patterns for Long Tasks: For workflows that exceed the 300-second timeout, don't force them into a real-time mold. The standard pattern is to:
- Have the client kick off the job via a quick real-time call.
- Return a job ID immediately.
- Process the long-running task in the background (using batch endpoints, Azure Functions, or Container Apps).
- Let the client poll for results or use webhooks to notify upon completion. This separates your need for low-latency initiation from the long-running processing.
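The steps above can be sketched in a few lines of Python. This is only an in-process illustration: a thread pool and a dictionary stand in for the background worker and job store, which in practice would be a batch endpoint, Azure Functions, or Container Apps plus durable storage. All names here (`submit_job`, `get_status`, `poll_until_done`, `long_task`) are hypothetical.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict

# Assumption: in-memory stand-ins for a real background worker and job store.
_executor = ThreadPoolExecutor(max_workers=2)
_jobs: Dict[str, Future] = {}

def submit_job(task: Callable[..., Any], *args: Any) -> str:
    """Quick call: start the work in the background and return a job ID at once."""
    job_id = str(uuid.uuid4())
    _jobs[job_id] = _executor.submit(task, *args)
    return job_id

def get_status(job_id: str) -> Dict[str, Any]:
    """Status call the client polls with the job ID."""
    future = _jobs.get(job_id)
    if future is None:
        return {"status": "not_found"}
    if not future.done():
        return {"status": "running"}
    error = future.exception()
    if error is not None:
        return {"status": "failed", "error": str(error)}
    return {"status": "succeeded", "result": future.result()}

def poll_until_done(job_id: str, interval: float = 0.05, timeout: float = 5.0) -> Dict[str, Any]:
    """Client-side polling loop with a deadline."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status["status"] != "running":
            return status
        time.sleep(interval)
    return {"status": "timed_out"}

def long_task(text: str) -> str:
    # Placeholder for a workflow that would exceed the real-time timeout.
    time.sleep(0.2)
    return text.upper()
```

A client would call `submit_job(long_task, "hello")`, keep the returned ID, and then call `poll_until_done` (or wait for a webhook instead of polling). The submit call returns right away, so it stays well inside any real-time timeout regardless of how long the task itself runs.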
Consider Custom Containers on AKS/Container Apps: While more work to set up, hosting your PromptFlow logic in a custom container on Azure Kubernetes Service or Container Apps gives you much finer control over scaling and cost. You can scale to zero during quiet periods, which is not possible with AML real-time endpoints.
Many teams use a hybrid approach: a consolidated real-time endpoint for immediate, simple queries, and an asynchronous system for complex, long-running tasks.
You're not on the wrong track, but the architecture can be optimized significantly. Start by consolidating endpoints and implementing async patterns for long tasks.
Regards,
Alex
P.S. If my answer helped you, please accept it, and click "Yes" to follow me on Q&A if you would like. Thanks.