Prompt Flow compute and costs

Question

Prompt Flow compute and costs

Ilkka Huotelin 40

Hello,

We want to build several AI chat bots using the same data for RAG but variations on system prompts and the flow logic. They each have modest traffic and we want to optimise the total cost. We have been experimenting with AI Foundry's Prompt Flow, and we have a couple of questions on the required compute and related costs:

Does each chat bot deployment authored in Prompt Flow require its own virtual machines or is there a way for them share compute resources?
Are these prompt flow VMs used also for LLM inference (for instance Azure OpenAI) or those resources are consumed and charged separately?
When authoring a prompt flow, a compute session is required. Once the flow has been deployed, can the session be shut down without affecting the running of the deployment?

Thank you!

Accepted answer

0 additional answers

Your answer

Answer 1

Shared compute resources for chat bot deployments:
AI Foundry’s Prompt Flow allows multiple deployments to share compute resources if they are running on the same compute cluster. This means you can deploy multiple bots with different system prompts and flow logic while optimizing costs by pooling resources. Ensure that the cluster is sized appropriately for the combined load across all bots.
Compute for LLM inference vs. Prompt Flow VMs:
The VMs or compute used by Prompt Flow are primarily for orchestrating your workflows, integrating RAG (retrieval-augmented generation), and handling custom logic. LLM inference (e.g., Azure OpenAI Service) is handled separately, and the cost of inference is charged independently based on your usage (tokens processed). These inference calls do not consume the compute resources used for Prompt Flow itself.
Shutting down the compute session after deployment:
Yes, once a prompt flow has been authored and deployed, the active authoring session compute can be shut down without affecting the deployed flow. The deployed flow will continue to operate as it uses the designated deployment infrastructure. Authoring sessions are temporary and only required during design, testing, or debugging phases.

If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

hth

Marcin

Ilkka Huotelin 40 Reputation points

2025-01-23T15:28:16.45+00:00

Thank you Marcin for the reply. A follow on question:

When I deploy a Prompt Flow it creates a new compute resource. How do I select compute resource sharing?
Deleted

This comment has been deleted due to a violation of our Code of Conduct. The comment was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.
Marcin Policht 50,735 Reputation points MVP Volunteer Moderator

2025-01-23T21:56:43+00:00

AFAIK, to enable resource sharing, you'll need to manually create a shared compute cluster via Azure Machine Learning. Then, you may reference this cluster when deploying flows through the SDK or CLI, not directly in the current UI interface.

If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

hth

Marcin

Share via

Prompt Flow compute and costs

0 additional answers

Your answer