Windows 365 for Agents exposes capabilities through complementary surfaces that map to the agent session lifecycle:
- Microsoft Graph APIs for administration. IT admins and agent makers use these APIs to provision and govern pool capacity.
- Windows 365 for Agents session API for runtime session management. Partner applications call this API to check out a Cloud PC, then release it when work completes.
- Model Context Protocol (MCP) tools for in-session operation. AI agents invoke these tools through the per-session MCP endpoint.
- Screenshare SDK for human supervision. A partner application invokes screen share actions on a human's behalf to observe or take control of the session.
Together, these surfaces cover provisioning the pool, acquiring a Cloud PC, performing work, and observing or assisting as needed.
Computer-Create: administration
The Computer-Create plane consists of the W365A Graph API and the W365 admin portal. Through these surfaces, administrators and independent software vendors (ISVs) can:
- Provision Cloud PC agent pools.
- Configure policies and images.
- Register trusted partner callers.
- Scale pool counts.
- Attach metering through MAC billing.
Computer-Get: session checkout and checkin
The Computer-Get plane is a small runtime control surface for partner applications, served by the Windows 365 for Agents session API (not Microsoft Graph).
Checkout reserves a Cloud PC and returns the session identity and connection URLs:
POST /api/pools/{poolId}/sessions?api-version=2.0
A successful checkout returns:
- sessionId: the session identifier.
- computerUrl: the base URL for MCP tool calls (append /mcp).
- screenshareUrl: the base URL for screen share actions.
Checkout may take up to 30 seconds while a device is assigned. Use the x-ms-sessionId header (a UUID v4) as an idempotency key so retries don't allocate duplicate sessions.
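The following Node.js sketch shows one way a partner application might call checkout with an idempotency key. It assumes Node.js 18 or later (global fetch), a hypothetical serviceBaseUrl for the session API host, and a bearer token in the Authorization header. The helper name, retry policy, and timeout are illustrative choices, not documented behavior.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical helper: reserve a Cloud PC from a pool via checkout.
// Assumptions (not part of the documented API): serviceBaseUrl is the
// session API host, the caller authenticates with a bearer token, and
// network errors, HTTP 429, and HTTP 5xx are transient and safe to retry.
async function checkoutSession({
  serviceBaseUrl,
  poolId,
  accessToken,
  extraHeaders = {}, // session-kind headers such as user-object-id
  maxAttempts = 3,
  retryDelayMs = 1000,
  fetchImpl = fetch, // injectable so the helper can be tested offline
}) {
  // Generate the UUID v4 idempotency key once. Every retry reuses it, so
  // the service can recognize a repeated request.
  const sessionKey = randomUUID();
  const url =
    `${serviceBaseUrl}/api/pools/${encodeURIComponent(poolId)}/sessions` +
    "?api-version=2.0";

  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let res;
    try {
      res = await fetchImpl(url, {
        method: "POST",
        headers: {
          ...extraHeaders,
          Authorization: `Bearer ${accessToken}`,
          "x-ms-sessionId": sessionKey,
        },
        // Checkout can take up to 30 seconds; leave some headroom.
        signal: AbortSignal.timeout(45_000),
      });
    } catch (err) {
      lastError = err; // network error or timeout: retry
    }

    if (res?.ok) {
      const { sessionId, computerUrl, screenshareUrl } = await res.json();
      return { sessionId, computerUrl, screenshareUrl, sessionKey };
    }
    if (res) {
      lastError = new Error(`Checkout failed with HTTP ${res.status}`);
      // Other client errors (for example 401 or 403) won't succeed on retry.
      if (res.status < 500 && res.status !== 429) break;
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs * attempt));
    }
  }
  throw lastError;
}
```

The key design choice is generating sessionKey outside the retry loop. A fresh UUID per attempt would defeat the idempotency guarantee and could leave orphaned sessions behind.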
Session kinds are determined at checkout time by the headers you pass:
| Kind | Headers | Purpose |
|---|---|---|
| HumanUser (default) | user-object-id | Standard interactive session bound to an AAD identity. |
| Agentic | x-ms-authorization-auxiliary (agent identity token) + user-object-id (agent user ID) | Agent-driven session. The auxiliary token identifies the specific agent requesting access. Contact wcxcipai@microsoft.com for tenant setup. |
| Local | Neither header | System-account session with no AAD user binding. |
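To make the header-to-kind mapping concrete, here is a small sketch that builds the checkout headers for each session kind in the table. The function and parameter names are hypothetical. It also assumes the agent identity token is passed through unchanged; whether it needs a scheme prefix is something to confirm during tenant setup.

```javascript
// Hypothetical helper that returns the checkout headers for each session
// kind. Header names come from the session-kind table; the auxiliary token
// is passed through unchanged (an assumption to confirm during tenant setup).
function sessionKindHeaders(kind, { userObjectId, agentIdentityToken } = {}) {
  switch (kind) {
    case "HumanUser":
      if (!userObjectId) throw new Error("HumanUser requires userObjectId");
      return { "user-object-id": userObjectId };
    case "Agentic":
      if (!userObjectId || !agentIdentityToken) {
        throw new Error("Agentic requires userObjectId (agent user ID) and agentIdentityToken");
      }
      return {
        "x-ms-authorization-auxiliary": agentIdentityToken,
        "user-object-id": userObjectId,
      };
    case "Local":
      // Sending neither header produces a system-account session.
      return {};
    default:
      throw new Error(`Unknown session kind: ${kind}`);
  }
}
```

Failing fast on a missing header avoids silently falling back to a different session kind. For example, an Agentic request that lacks its auxiliary token would otherwise be treated as a HumanUser session.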
Checkin releases the session:
DELETE /api/sessions/{sessionId}?api-version=2.0
Checkin is fire-and-forget: a 204 No Content response means the release was accepted, and cleanup completes asynchronously. Idle sessions are evicted automatically after 30 minutes of inactivity (any MCP or screen share request counts as activity), but partner applications should always check sessions in explicitly when work completes.
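A minimal checkin sketch follows. As with any client for this API, serviceBaseUrl (the session API host) and the bearer Authorization header are assumptions, and the helper name is hypothetical. The function treats only 204 as success, matching the behavior described above.

```javascript
// Hypothetical helper: release a session with checkin. serviceBaseUrl and
// the bearer Authorization header are assumptions; fetchImpl is injectable
// so the function can be tested without a network.
async function checkinSession({ serviceBaseUrl, sessionId, accessToken, fetchImpl = fetch }) {
  const url =
    `${serviceBaseUrl}/api/sessions/${encodeURIComponent(sessionId)}` +
    "?api-version=2.0";
  const res = await fetchImpl(url, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  // 204 No Content means the release was accepted. Cleanup finishes
  // asynchronously on the service side, so there is nothing to poll.
  if (res.status !== 204) {
    throw new Error(`Checkin failed with HTTP ${res.status}`);
  }
}
```

Because the call returns as soon as the release is accepted, it is cheap enough to run from a finally block or a shutdown hook.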
Computer-Do: in-session operation
After the partner application acquires a Cloud PC, agents use MCP tools to operate it. These tools follow the open Model Context Protocol, so any agent that supports the protocol can discover and invoke tools without custom integration.
All MCP traffic flows through the session's MCP endpoint, formed by appending /mcp to the computerUrl returned at checkout:
POST {computerUrl}/mcp?api-version=1.0
Every request must include the x-ms-computerId header matching the computer ID in the URL. Each POST sends one JSON-RPC message and returns one response.
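The sketch below shows the shape of one MCP POST: a single JSON-RPC message in, a single response out, with the required x-ms-computerId header. It makes several assumptions. It assumes the endpoint accepts a bearer token and answers with a JSON body rather than an event stream. It also assumes the caller already knows the computerId that appears in computerUrl. The helper name is hypothetical.

```javascript
// Hypothetical helper that sends one JSON-RPC message to a session's MCP
// endpoint. Assumptions: bearer-token auth, a JSON reply body (this sketch
// does not parse text/event-stream), and a caller-supplied computerId that
// matches the computer ID in computerUrl.
async function postMcp({ computerUrl, computerId, accessToken, message, fetchImpl = fetch }) {
  // Append /mcp to the computerUrl returned at checkout.
  const url = `${computerUrl.replace(/\/+$/, "")}/mcp?api-version=1.0`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
      // MCP Streamable HTTP clients advertise both types; only JSON is handled here.
      Accept: "application/json, text/event-stream",
      // Required on every request; must match the computer ID in the URL.
      "x-ms-computerId": computerId,
    },
    body: JSON.stringify(message),
  });
  if (!res.ok) throw new Error(`MCP request failed with HTTP ${res.status}`);
  // JSON-RPC notifications have no id and expect no response body.
  return "id" in message ? res.json() : null;
}
```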
MCP session lifecycle. Before calling any tool, the client must complete the MCP initialization handshake:
- Send an initialize request to receive server capabilities.
- Send an initialized notification (no response expected).
- Issue tool calls: tools/list to discover available tools, or tools/call to invoke one.
Initialization is required once per session. The MCP plane covers desktop interaction (mouse, keyboard, screenshot capture), window management, command execution, browser automation and UI accessibility capabilities.
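For reference, the handshake steps above correspond to the following JSON-RPC 2.0 messages. The method names (initialize, notifications/initialized, tools/list, tools/call) come from the Model Context Protocol specification. The protocolVersion string, client name, and helper names are assumptions for illustration.

```javascript
// Hypothetical builder for the MCP handshake messages, as JSON-RPC 2.0
// objects. Method names follow the MCP specification; the protocolVersion
// string is an assumption, and the server replies to initialize with a
// version it supports.
function buildHandshake(clientName, clientVersion) {
  let nextId = 1;
  const request = (method, params) => ({ jsonrpc: "2.0", id: nextId++, method, params });
  return [
    // 1. initialize: the response carries the server's capabilities.
    request("initialize", {
      protocolVersion: "2025-03-26",
      capabilities: {},
      clientInfo: { name: clientName, version: clientVersion },
    }),
    // 2. initialized notification: no id, so the server sends no response.
    { jsonrpc: "2.0", method: "notifications/initialized" },
    // 3. Discover the available tools.
    request("tools/list", {}),
  ];
}

// Builds a tools/call request for a tool name returned by tools/list.
function toolCall(id, name, args) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}
```

Send the three messages in order, one POST each, before issuing any tools/call request. Tool names and argument schemas come from the tools/list response rather than being hard-coded.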
For the full catalog of tools and their parameter schemas, see Windows 365 for Agents MCP Server.
Computer-See/Take-Control: human supervision
The Screenshare SDK lets a partner application embed real-time human observation of agent activity directly in its own UI. It streams the agent's Cloud PC over WebRTC and, when needed, relays mouse and keyboard input back to the session. The SDK creates an iframe inside your page that handles all video streaming, input relay, and screen share API calls, so your application never talks to the streaming stack directly.
The viewer connects to the screenshareUrl returned at checkout. You don't need to construct a separate screen share endpoint; the SDK derives its calls from the base URL and computer ID you supply.
Integration flow
The partner application checks out a session, loads the SDK from the CDN, and hands the returned screenshareUrl and bearer token to a ScreenShareViewer. The iframe takes over from there, calling the ARI screen share API and joining the video call on your behalf:
Partner application ARI service
│ │
│ POST /api/pools/{poolId}/sessions │
│ ──────────────────────────────────────→ │
│ │
│ 200 OK { screenshareUrl: "…" } │
│ ←────────────────────────────────────── │
│ │
│ Load screenshare-embed.js from CDN │
│ new ScreenShareViewer(container, │
│ baseUrl, computerId) │
│ viewer.connect(bearerToken) │
│ ─── postMessage to iframe ────────────→ │
│ │
│ iframe calls ARI screenshare API │
│ iframe joins ACS video call │
│ live video streams back │
│ ←────────────────────────────────────── │
SDK distribution
The SDK is published per environment. Load the screenshare-embed.js build that matches the ring your application runs against:
| Environment | CDN URL |
|---|---|
| PROD | https://packages.global.cloudinferenceplatform.azure.com/screenshare-sdk/latest/screenshare-embed.js |
Viewer methods
A ScreenShareViewer instance exposes methods for the full session lifecycle (connect, optional control handoff, token refresh, and teardown):
| Method | Description |
|---|---|
| connect(bearerToken) | Starts a screen share session. Returns a Promise. See section 2.2 for obtaining a bearer token. |
| takeControl() | Requests mouse and keyboard control (interactive mode only). The most recent caller always wins; requests are never rejected. |
| releaseControl() | Releases control and returns the viewer to view-only mode. |
| updateToken(bearerToken) | Replaces the bearer token without restarting the session. Use it when you receive a TOKEN_EXPIRED error. |
| stop() | Ends the session and removes the iframe from the DOM. The instance can't be reused; create a new ScreenShareViewer to reconnect. |
Error responses
Errors surface through the error event with a code and message. Each code maps to a specific recovery action:
| Code | Meaning | Action |
|---|---|---|
| TOKEN_EXPIRED | Bearer token expired (401). | Call viewer.updateToken(newToken). |
| START_FAILED | ARI Start API failed. | Check computerId and pool registration. |
| JOIN_FAILED | ACS call join failed. | Retry with a fresh token. |
| RECONNECT_FAILED | Auto-reconnect exhausted (3 attempts). | Call viewer.stop(), create a new viewer, and reconnect with a fresh token. |
| IFRAME_LOAD_FAILED | Iframe didn't respond within 10 seconds. | Check that baseUrl is reachable from the browser. |
| MODE_RESTRICTED | Control command issued in viewOnly mode. | Create the viewer with mode: 'interactive'. |
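One way to wire the recovery actions in the table into an error handler is sketched below. The names getFreshToken, createViewer, reportError, and maxRebuilds are hypothetical placeholders your application would supply. The sketch also assumes the viewer's error handler receives (code, message), as in the quick start. For JOIN_FAILED, the table only says to retry with a fresh token; this sketch conservatively rebuilds the viewer in that case.

```javascript
// Hypothetical recovery wiring for the documented error codes.
// getFreshToken(), createViewer(), and reportError() are placeholders your
// application supplies. The viewer is assumed to expose on(), connect(),
// updateToken(), and stop() as described in this section.
function attachRecovery(viewer, { getFreshToken, createViewer, reportError = console.error, maxRebuilds = 3 }) {
  const state = { viewer, rebuilds: 0 };

  async function onError(code, message) {
    switch (code) {
      case "TOKEN_EXPIRED":
        // Swap the token in place; the session keeps running.
        state.viewer.updateToken(await getFreshToken());
        return "token-updated";
      case "JOIN_FAILED":
      case "RECONNECT_FAILED":
        if (state.rebuilds < maxRebuilds) {
          // A stopped viewer can't be reused, so build a new one, reattach
          // this handler, and connect with a fresh token.
          state.rebuilds++;
          state.viewer.stop();
          state.viewer = createViewer();
          state.viewer.on("error", onError);
          await state.viewer.connect(await getFreshToken());
          return "reconnected";
        }
        break; // rebuild budget exhausted
      default:
        // START_FAILED, IFRAME_LOAD_FAILED, and MODE_RESTRICTED point to
        // configuration problems that retrying won't fix.
        break;
    }
    reportError(`Screen share error ${code}: ${message}`);
    return "needs-attention";
  }

  viewer.on("error", onError);
  return state;
}
```

The rebuild budget keeps a persistently failing session from looping forever. Once it is spent, the error is reported to the application like any other unrecoverable code.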
Quick start
The following minimal page mounts a viewer into a container and connects it to an already checked-out session. It assumes you have the checkout response (see section 6.1) and a bearer token (see section 2.2):
<!DOCTYPE html>
<html>
<head><title>Screen Share</title></head>
<body>
  <div id="viewer" style="width: 100%; height: 600px;"></div>
  <script src="https://packages.global.cloudinferenceplatform.azure.com/screenshare-sdk/latest/screenshare-embed.js"></script>
  <script>
    // Replace these placeholders with the checkout response (Section 6.1)
    // and a bearer token (Section 2.2) obtained by your application.
    var checkoutResponse = {
      screenshareUrl: '<screenshareUrl from checkout>',
      computerId: '<computer ID>'
    };
    var bearerToken = '<bearer token>';

    // The viewer connects to the screenshareUrl returned at checkout.
    var viewer = new ScreenShareViewer({
      container: document.getElementById('viewer'),
      baseUrl: checkoutResponse.screenshareUrl,
      computerId: checkoutResponse.computerId
    });

    // Errors arrive as (code, message); see the error table for recovery steps.
    viewer.on('error', function (code, msg) {
      console.error(code, msg);
    });

    viewer.connect(bearerToken);
  </script>
</body>
</html>
Surface summary
| Surface | Plane | Endpoint | Called by | Purpose |
|---|---|---|---|---|
| Graph API | Computer-Create | W365A Graph API and W365 admin portal | IT admin or ISV | Shape and maintain the pool. |
| Session API | Computer-Get | POST /api/pools/{poolId}/sessions (Checkout) | Partner application | Reserve a Cloud PC. |
| Session API | Computer-Get | DELETE /api/sessions/{sessionId} (Checkin) | Partner application | Release the Cloud PC. |
| MCP | Computer-Do | POST {computerUrl}/mcp | AI agent | Operate the Cloud PC. |
| Screenshare SDK | Computer-See, Computer-TakeControl | ScreenShareViewer (from CDN screenshare-embed.js) | Partner application, on behalf of a human | Observe and co-drive. |
How they fit together
The surfaces work in sequence, with a clear handoff between callers:
- Admins and agent makers use Computer-Create to provision the pool.
- The partner application calls Checkout on Computer-Get to reserve a Cloud PC for a specific piece of agent work, specifying the session kind through request headers.
- The AI agent initializes the MCP session against {computerUrl}/mcp and drives the Cloud PC through the Computer-Do tools. Most calls flow through this plane.
- When needed, the partner application invokes Computer-See actions against {screenshareUrl} on behalf of a human to observe or take over.
- The partner application calls Checkin on Computer-Get to release the Cloud PC when the work is done. Sessions left idle for 30 minutes are evicted automatically.
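The sequence above can be sketched as a single orchestration function that guarantees checkin. Here checkout, checkin, runAgent, and log are hypothetical placeholders for your own implementations. The point of the sketch is the try/finally structure, not any particular API call.

```javascript
// Hypothetical orchestration of one unit of agent work. checkout, checkin,
// and runAgent are placeholders for your own implementations. Checkin runs
// in a finally block so the Cloud PC is released even if the agent throws.
async function withCloudPc({ checkout, checkin, runAgent, log = console.error }) {
  // Computer-Get: reserve a Cloud PC ({ sessionId, computerUrl, screenshareUrl }).
  const session = await checkout();
  try {
    // Computer-Do (and optionally Computer-See): the agent works through
    // {computerUrl}/mcp while a human can watch via screenshareUrl.
    return await runAgent(session);
  } finally {
    try {
      // Computer-Get: release the Cloud PC.
      await checkin(session.sessionId);
    } catch (err) {
      // Don't mask the agent's result or error; idle eviction after 30
      // minutes is the backstop if the explicit checkin fails.
      log(`Checkin for session ${session.sessionId} failed: ${err.message}`);
    }
  }
}
```

Swallowing a checkin failure is a deliberate trade-off. The agent's result is usually more valuable than the release confirmation, and idle eviction eventually reclaims the Cloud PC.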
Next steps
- Learn more about Windows 365 for Agents MCP Server.
- Learn about the Windows 365 for Agents architecture.
- Learn about the agent session lifecycle.