Prompt Flow compute and costs

Ilkka Huotelin 40 Reputation points
2025-01-23T12:12:38.95+00:00

Hello,

We want to build several AI chat bots that use the same data for RAG but vary in their system prompts and flow logic. Each bot has modest traffic, and we want to optimise the total cost. We have been experimenting with AI Foundry's Prompt Flow, and we have a couple of questions about the required compute and related costs:

  1. Does each chat bot deployment authored in Prompt Flow require its own virtual machines, or is there a way for them to share compute resources?
  2. Are the Prompt Flow VMs also used for LLM inference (for instance, Azure OpenAI), or are those resources consumed and charged separately?
  3. Authoring a prompt flow requires a compute session. Once the flow has been deployed, can that session be shut down without affecting the running deployment?

Thank you!

Azure OpenAI Service

Accepted answer
  1. Marcin Policht 50,495 Reputation points MVP Volunteer Moderator
    2025-01-23T13:17:18.2733333+00:00
    • Shared compute resources for chat bot deployments:
      AI Foundry’s Prompt Flow allows multiple deployments to share compute resources if they are running on the same compute cluster. This means you can deploy multiple bots with different system prompts and flow logic while optimizing costs by pooling resources. Ensure that the cluster is sized appropriately for the combined load across all bots.
    • Compute for LLM inference vs. Prompt Flow VMs:
      The VMs or compute used by Prompt Flow are primarily for orchestrating your workflows, integrating RAG (retrieval-augmented generation), and handling custom logic. LLM inference (e.g., Azure OpenAI Service) is handled separately, and the cost of inference is charged independently based on your usage (tokens processed). These inference calls do not consume the compute resources used for Prompt Flow itself.
    • Shutting down the compute session after deployment:
      Yes, once a prompt flow has been authored and deployed, the active authoring session compute can be shut down without affecting the deployed flow. The deployed flow will continue to operate as it uses the designated deployment infrastructure. Authoring sessions are temporary and only required during design, testing, or debugging phases.
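To make the billing split in the second point concrete, here is a minimal back-of-the-envelope sketch. All rates and token counts below are hypothetical placeholders (they are not real Azure prices); substitute figures from the Azure pricing calculator for your region and SKU.

```python
# Hypothetical cost comparison: pooled vs. dedicated Prompt Flow serving
# compute, with Azure OpenAI inference billed separately per token.
# All prices here are PLACEHOLDERS, not actual Azure rates.

HOURS_PER_MONTH = 730  # average hours in a month

def compute_cost(vm_count: int, vm_hourly_rate: float) -> float:
    """Monthly cost of the Prompt Flow serving compute (billed per VM-hour)."""
    return vm_count * vm_hourly_rate * HOURS_PER_MONTH

def inference_cost(prompt_tokens: int, completion_tokens: int,
                   price_per_1k_prompt: float,
                   price_per_1k_completion: float) -> float:
    """Monthly LLM inference cost, billed independently per 1K tokens."""
    return (prompt_tokens / 1000) * price_per_1k_prompt \
         + (completion_tokens / 1000) * price_per_1k_completion

bots = 4
vm_rate = 0.30  # hypothetical $/hour for one serving VM

# One VM per bot vs. all four bots pooled on one appropriately sized instance.
dedicated = compute_cost(vm_count=bots, vm_hourly_rate=vm_rate)
shared = compute_cost(vm_count=1, vm_hourly_rate=vm_rate)

# Inference is charged on top of either option and depends only on tokens.
llm = inference_cost(prompt_tokens=2_000_000, completion_tokens=500_000,
                     price_per_1k_prompt=0.001, price_per_1k_completion=0.002)

print(f"dedicated compute: ${dedicated:,.2f}/month")
print(f"shared compute:    ${shared:,.2f}/month")
print(f"LLM inference:     ${llm:,.2f}/month (same under both options)")
```

Note that the inference line item is identical in both scenarios: pooling the Prompt Flow compute reduces the orchestration cost, while token charges track usage alone.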

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin


