Use agent tools to extend, automate, and enhance your agents

Agents become more powerful when you equip them with specialized tools that extend their core capabilities. Copilot Studio provides three primary categories of agent tools:

  • AI prompts
  • Model Context Protocol (MCP) servers
  • The computer use tool

This article explores how each tool type works, when to use them, and how they can help you build more capable and efficient agents. You also learn about the differences between hosted and bring-your-own machines for computer use scenarios, plus guidance on choosing between traditional Robotic Process Automation (RPA) and Computer Using Agents (CUA) approaches.

Generate a response by using AI prompts

AI prompts use a set of instructions to generate a response from an AI model. You can include variables to insert more text or documents into these instructions. The output is typically provided in either plain text or JSON format. You can select any AI model built into Copilot Studio or deployed through Microsoft Foundry to generate the response.
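The following Python sketch illustrates the general idea of instructions with a variable and a JSON-formatted response. It is not Copilot Studio code: the prompt text, the `report_text` variable, and the `render_prompt` and `parse_response` helpers are all hypothetical, and the model call itself is left out.

```python
import json
from string import Template

# Hypothetical instructions; $report_text is the variable filled in at run time.
PROMPT = Template(
    "Extract every person named in the report below.\n"
    'Respond with JSON only, shaped as {"names": ["..."]}.\n\n'
    "Report:\n$report_text"
)


def render_prompt(report_text: str) -> str:
    """Insert the variable value into the instructions."""
    return PROMPT.substitute(report_text=report_text)


def parse_response(raw: str) -> list[str]:
    """Parse the model's JSON output and check that it has the requested shape."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    names = data.get("names") if isinstance(data, dict) else None
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("expected a JSON object with a 'names' list of strings")
    return names
```

Requesting JSON rather than plain text makes the output easier to validate before another tool or topic consumes it.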

You can invoke prompts as an agent tool or from within a topic. All prompts are saved to a prompt library and support application lifecycle management, role-based access control, and sharing.

Learn more about using prompts to make your agent perform specific tasks.

Determine when to use AI prompts vs. the orchestrator

Every agent built in Copilot Studio uses the orchestrator to determine how to respond by selecting tools, topics, and knowledge based on system instructions, user input, and contextual information. The orchestrator is the engine behind generative orchestration, which plans actions and composes responses using the agent's tools and descriptions.

While orchestrator‑driven responses might seem similar to AI prompts, the two capabilities serve different purposes. AI prompts are standalone prompt‑based actions that give makers deeper control over model configuration.

AI prompts support a broader range of models, including the ones available through Microsoft Foundry. They also support features such as Dataverse grounding, file inputs, and code interpreter.

The orchestrator uses a fixed system prompt and tool descriptions to choose the right building blocks for a given request. Makers can't edit the orchestrator's system prompt, but they can influence how it behaves through agent instructions.

AI prompts give full control over the formatting, constraints, and logic, making them the right choice for scenarios that require fine‑tuned or highly structured output. For example, if you need stylistic control beyond simple formatting ("write a rhyming poem in ABAB structure using these exact words"), a prompt is the better fit.

The orchestrator works well for simple tasks like extracting a single name from text. For complex extraction, use AI prompts. An example is pulling multiple entities from a long report and linking them through domain-specific relationships, such as extracting multiple names from an insurance report and identifying the car-repair service owner associated with only one party in the incident.

The decision between orchestrator and AI prompts depends on the level of customization required. If you need precise control over the model's behavior or output, choose AI prompts. For scenarios where general reasoning, tool selection, and lightweight formatting are sufficient, the orchestrator is the appropriate choice.

Integrate agent tools by using MCP

The Model Context Protocol (MCP) is a universal interface that AI models use to interact with external tools, data sources, and user environments in a consistent and scalable way.

By comparison, Power Platform connectors require you to describe each action and its inputs, and to update these descriptions as new definitions become available. Custom coding an integration for each tool is more complex and less scalable.

Use the MCP servers provided with Copilot Studio for Microsoft services like Outlook, Dataverse, and GitHub, or third-party services like Salesforce and JIRA. Build custom MCP servers for services where none exists.

Benefits of MCP include:

  • Standardized context for AI models
  • Seamless integration with Copilot Studio
  • Improved developer efficiency and user experience
  • Governance, monitoring, and extensibility

Consider the following limitations before implementing MCP servers:

  • You can't enrich tool descriptions with more context about when to invoke them.
  • Topics can't call MCP servers directly.

Understand when to use MCP

You can achieve the same outcomes in Copilot Studio through several integration approaches. It's important to understand when to use Model Context Protocol (MCP) servers versus simpler options like Power Platform connectors or direct REST API calls.

Use MCP when you need a standardized, centrally managed way to expose tools and resources to multiple agents without per-client configuration. MCP servers publish tools and resources that agents can automatically discover, version, and use consistently because the MCP server defines the tool descriptions and their inputs. In contrast, adding an API directly requires you to manually describe its purpose and define its inputs per agent.

MCP is especially valuable when upstream APIs change frequently. Instead of updating every agent that consumes the API, you modify the definition once on the MCP server, and all agents automatically use the updated version without republishing. If no MCP server exists, or you're rapidly prototyping, calling APIs directly is faster and avoids the setup overhead required to introduce the full MCP lifecycle.
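The "define once, discover everywhere" idea can be sketched in Python as follows. This is a toy, in-memory stand-in and not a real MCP server, SDK, or transport. The `ToyMcpServer` class, the `discover_tools` helper, and the `get_order_status` tool are hypothetical. The message shapes follow the JSON-RPC 2.0 style that MCP uses, with the `tools/list` method and tool entries that carry `name`, `description`, and `inputSchema`.

```python
from typing import Any


class ToyMcpServer:
    """In-memory stand-in for an MCP server; no real transport or SDK is used."""

    def __init__(self) -> None:
        self._tools: dict[str, dict[str, Any]] = {}

    def register_tool(self, name: str, description: str, input_schema: dict[str, Any]) -> None:
        # The server owns the tool description and its inputs.
        self._tools[name] = {
            "name": name,
            "description": description,
            "inputSchema": input_schema,
        }

    def handle(self, request: dict[str, Any]) -> dict[str, Any]:
        if request.get("method") == "tools/list":
            result = {"tools": list(self._tools.values())}
            return {"jsonrpc": "2.0", "id": request["id"], "result": result}
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        error = {"code": -32601, "message": "Method not found"}
        return {"jsonrpc": "2.0", "id": request["id"], "error": error}


def discover_tools(server: ToyMcpServer) -> dict[str, dict[str, Any]]:
    """What each agent does: ask the server which tools exist instead of hard-coding them."""
    response = server.handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    return {tool["name"]: tool for tool in response["result"]["tools"]}


server = ToyMcpServer()
schema = {"type": "object", "properties": {"order_id": {"type": "string"}}, "required": ["order_id"]}
server.register_tool("get_order_status", "Look up the status of an order.", schema)

# The upstream API changes: update the definition once, on the server.
server.register_tool("get_order_status", "Look up the status and ETA of an order.", schema)

# Any agent that discovers tools after the change sees the new definition.
agent_a_tools = discover_tools(server)
agent_b_tools = discover_tools(server)
```

Neither agent stores its own copy of the tool description, which is why a single change on the server reaches both.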

Generative orchestration must be enabled to use MCP. Learn more in How does MCP work?

Automate desktop processes by using the computer use tool

By using the computer use tool, an agent can operate a computer without automation scripts or APIs. You configure the agent with a prompt, and the agent determines how best to achieve its goals. At each step, the agent takes a screenshot, analyzes it to decide the next action, and executes that action. It repeats this cycle until the task is complete. The screenshots and reasoning steps are available as part of the run history.
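The screenshot, analyze, act cycle can be sketched in Python as a simple loop. This is an illustration only, not the computer use tool's implementation. The `Step` record, the `run_computer_use` function, the three callbacks, and the `max_steps` safety limit are all assumptions, and the "screen" is faked with a list of pending rows.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One entry in the run history: what was seen, why, and what was done."""
    screenshot: str
    reasoning: str
    action: Optional[str]


def run_computer_use(
    take_screenshot: Callable[[], str],
    decide: Callable[[str], tuple[str, Optional[str]]],
    execute: Callable[[str], None],
    max_steps: int = 20,  # assumed safety limit so the loop always ends
) -> list[Step]:
    history: list[Step] = []
    for _ in range(max_steps):
        shot = take_screenshot()          # 1. capture the current screen
        reasoning, action = decide(shot)  # 2. analyze it and pick the next action
        history.append(Step(shot, reasoning, action))
        if action is None:                # the model judged the task complete
            break
        execute(action)                   # 3. act, then repeat
    return history


# Fake environment: two rows still need to be entered.
remaining = ["row-1", "row-2"]


def fake_screenshot() -> str:
    return f"{len(remaining)} rows pending"


def fake_decide(shot: str) -> tuple[str, Optional[str]]:
    if shot.startswith("0 "):
        return ("nothing left to enter", None)
    return ("a row is still pending", "enter_next_row")


def fake_execute(action: str) -> None:
    remaining.pop(0)


history = run_computer_use(fake_screenshot, fake_decide, fake_execute)
```

In this fake run the history holds three steps: two that enter a row and a final one where the decision is to stop.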

Common scenarios where an agent can benefit from the computer use tool include:

  • Data entry: For each row in the incoming CSV file, create the sales order in SAP and write the generated order ID back to the file.
  • Data extraction: Go to each supplier portal, search the listed SKU, extract the price, stock, and lead-time, and insert the results with a timestamp into the database.
  • Cross-app workflows: Export the day's transactions from the desktop finance client, navigate to QuickBooks, and post each entry to the correct account.

Understand hosted machines vs. bring your own machine

Agents can call the computer use tool on a Microsoft-hosted machine or a bring-your-own (BYO) machine. Hosted machines are available for immediate use without IT configuration or billing. They belong to a shared pool of pre-provisioned Windows 365 Cloud PCs that aren't Entra-joined to the customer tenant. BYO machines must be provisioned in advance within the customer's own virtual network. You must register and manage BYO machines in Power Automate.

Use BYO machines for production scenarios. They have Microsoft Entra ID support, are Intune-enrolled, and support both web and desktop automation use cases. Use hosted machines only for prototyping because of their limited capabilities: only one hosted Cloud PC is available per user at a time, and usage can be throttled based on demand.

Learn more in Configure where computer use runs.

Robotic Process Automation (RPA) vs. Computer Using Agents (CUA)

Robotic Process Automation (RPA) is the automation of a computer using a script. You can apply it to many of the same scenarios as CUA. However, it's important to understand the differences between RPA and CUA.

Aspect             | RPA                   | CUA
Automation type    | Rule based            | LLM driven
Interaction method | UI tree               | Vision
Authoring          | Script, complex       | Natural language instructions
Decision making    | Predefined rules      | Autonomous visual-based decisions
Flexibility        | Limited flexibility   | High flexibility
Error handling     | Static error handling | Self-correcting based on visual feedback

Use RPA when:

  • Only Generally Available (GA) features are allowed.
  • The user interface is stable. The screens, fields, and selectors rarely change.
  • The rules are clear. You can capture decisions in rules.
  • Speed matters. The workload is high-volume, and every second counts.
  • An RPA team owns it. The team has existing RPA development and management knowledge.

Use CUA when:

  • User interfaces shift or vary widely. You work with multiple apps and frequent redesigns.
  • You need it fast. The RPA team's backlog is full.
  • User interface matters. The task depends on what's visible on screen, such as charts, colors, and dynamic layouts.
  • Decisions are fuzzy. The agent must reason, pick the next step, or self-correct.
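As a rough aid, the two checklists above can be turned into a tally in Python. This is a sketch under a strong assumption: every criterion carries equal weight, which the guidance above does not state. The `Workload` fields and the `suggest_approach` function are hypothetical names, and a result of "undecided" means the checklists don't settle the question.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Signals from the "Use RPA when" list
    ga_features_only: bool = False
    ui_is_stable: bool = False
    rules_are_clear: bool = False
    speed_critical: bool = False
    has_rpa_team: bool = False
    # Signals from the "Use CUA when" list
    ui_varies_widely: bool = False
    needs_fast_delivery: bool = False
    depends_on_visuals: bool = False
    decisions_are_fuzzy: bool = False


def suggest_approach(w: Workload) -> str:
    """Count matching signals per list (equal weights are an assumption)."""
    rpa_score = sum([w.ga_features_only, w.ui_is_stable, w.rules_are_clear,
                     w.speed_critical, w.has_rpa_team])
    cua_score = sum([w.ui_varies_widely, w.needs_fast_delivery,
                     w.depends_on_visuals, w.decisions_are_fuzzy])
    if rpa_score > cua_score:
        return "RPA"
    if cua_score > rpa_score:
        return "CUA"
    return "undecided"


stable_high_volume = Workload(ui_is_stable=True, rules_are_clear=True, speed_critical=True)
shifting_portals = Workload(ui_varies_widely=True, decisions_are_fuzzy=True)
```

In practice a single criterion, such as a policy that allows only GA features, might outweigh all the others, so treat the tally as a conversation starter and not as a rule.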