Windows 365 for Agents API overview

Windows 365 for Agents exposes capabilities through complementary surfaces that map to the agent session lifecycle:

  • Microsoft Graph APIs for administration. IT admins and agent makers use these APIs to provision and govern pool capacity.
  • Windows 365 for Agents session API for runtime session management. Partner applications call this API to check out a Cloud PC, then release it when work completes.
  • Model Context Protocol (MCP) tools for in-session operation. AI agents invoke these tools through the per-session MCP endpoint. For screen sharing, a partner application invokes screenshare actions on a human's behalf.

Together, these surfaces cover provisioning the pool, acquiring a Cloud PC, performing work, and observing or assisting as needed.

Computer-Create: administration

The Computer-Create plane is the administration surface. It consists of the Windows 365 for Agents (W365A) Graph API, which is part of Microsoft Graph, and the Windows 365 (W365) admin portal. Through these surfaces, administrators and independent software vendors (ISVs) can:

  • Provision Cloud PC agent pools.
  • Configure policies and images.
  • Register trusted partner callers.
  • Scale pool counts.
  • Attach metering through MAC billing.

Computer-Get: session checkout and checkin

The Computer-Get plane is a small runtime control surface for partner applications, served by the Windows 365 for Agents session API (not Microsoft Graph).

Checkout reserves a Cloud PC and returns the session identity and connection URLs:

POST /api/pools/{poolId}/sessions?api-version=2.0

A successful checkout returns:

  • sessionId: the session identifier
  • computerUrl: base URL for MCP tool calls (append /mcp)
  • screenshareUrl: base URL for screen-share actions

Checkout may take up to 30 seconds while a device is assigned. Use the x-ms-sessionId header (a UUID v4) as an idempotency key so retries don't allocate duplicate sessions.
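
The checkout call and its idempotency key can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference client: the host https://sessions.example.com is a placeholder, buildCheckoutRequest and checkoutWithRetry are hypothetical helper names, authentication headers are omitted because they are not described here, and send stands in for whatever HTTP client you use. The sketch retries on any failure, whereas a real client would likely retry only transient errors and add a backoff.

```javascript
// Build the checkout request once. The x-ms-sessionId header is the
// idempotency key, so it must stay the same across retries.
function buildCheckoutRequest(baseUrl, poolId, sessionId) {
  return {
    method: "POST",
    url: `${baseUrl}/api/pools/${encodeURIComponent(poolId)}/sessions?api-version=2.0`,
    headers: { "x-ms-sessionId": sessionId },
  };
}

// Retry by re-sending the same request object, so every attempt carries
// the same idempotency key and cannot allocate a second session.
// `send` is a caller-supplied (hypothetical) function that performs the
// HTTP call and resolves to the parsed checkout response.
async function checkoutWithRetry(send, request, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(request);
    } catch (err) {
      lastError = err; // remember the failure and try again with the same key
    }
  }
  throw lastError;
}
```

A caller would generate the key once (for example with crypto.randomUUID(), which produces a version 4 UUID), build the request, and give the HTTP client a timeout longer than 30 seconds so a slow device assignment is not mistaken for a failure.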

Session kinds are determined at checkout time by the headers you pass:

| Kind | Headers | Purpose |
| --- | --- | --- |
| HumanUser (default) | user-object-id | Standard interactive session bound to an AAD identity. |
| Agentic | x-ms-authorization-auxiliary (agent identity token) + user-object-id (agent user ID) | Agent-driven session. The auxiliary token identifies the specific agent requesting access. Contact wcxcipai@microsoft.com for tenant setup. |
| Local | Neither header | System-account session with no AAD user binding. |
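
One way to keep the header combinations straight is to centralize them in a single helper. The sketch below is illustrative only: the header names come from the table above, while the function name sessionKindHeaders and the option names userObjectId and agentToken are hypothetical. It also assumes the auxiliary token is passed through verbatim, since its exact format is not described here.

```javascript
// Map a session kind to the checkout headers that select it.
function sessionKindHeaders(kind, { userObjectId, agentToken } = {}) {
  switch (kind) {
    case "HumanUser":
      if (!userObjectId) throw new Error("HumanUser requires user-object-id");
      return { "user-object-id": userObjectId };
    case "Agentic":
      if (!userObjectId || !agentToken) {
        throw new Error("Agentic requires user-object-id and an agent identity token");
      }
      return {
        "x-ms-authorization-auxiliary": agentToken, // identifies the requesting agent
        "user-object-id": userObjectId,             // the agent user ID
      };
    case "Local":
      return {}; // neither header: system-account session
    default:
      throw new Error(`Unknown session kind: ${kind}`);
  }
}
```

The returned object can be merged into the checkout request headers alongside x-ms-sessionId.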

Checkin releases the session:

DELETE /api/sessions/{sessionId}?api-version=2.0

Checkin is fire-and-forget: a 204 No Content response means the release was accepted, and cleanup completes asynchronously. Idle sessions are evicted automatically after 30 minutes of inactivity (any MCP or screen share request counts as activity), but partner applications should always check sessions in explicitly when work completes.
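
A try/finally wrapper is one way to make sure checkin always happens, even when the agent's work fails. The sketch below assumes a caller-supplied send function that performs the HTTP call and resolves to an object with a status field. The helper names (buildCheckinRequest, isCheckinAccepted, withCheckin) and the placeholder host are hypothetical.

```javascript
function buildCheckinRequest(baseUrl, sessionId) {
  return {
    method: "DELETE",
    url: `${baseUrl}/api/sessions/${encodeURIComponent(sessionId)}?api-version=2.0`,
  };
}

// 204 means the release was accepted; cleanup continues asynchronously.
function isCheckinAccepted(status) {
  return status === 204;
}

// Run `work` against a checked-out session and always check in afterwards.
// A failed checkin is logged rather than thrown so it cannot mask an error
// raised by `work`; idle eviction remains the fallback in that case.
async function withCheckin(send, baseUrl, session, work) {
  try {
    return await work(session);
  } finally {
    try {
      const { status } = await send(buildCheckinRequest(baseUrl, session.sessionId));
      if (!isCheckinAccepted(status)) {
        console.warn(`Checkin for ${session.sessionId} returned ${status}`);
      }
    } catch (err) {
      console.warn(`Checkin for ${session.sessionId} failed: ${err.message}`);
    }
  }
}
```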

Computer-Do: in-session operation

After the partner application acquires a Cloud PC, agents use MCP tools to operate it. These tools follow the open Model Context Protocol, so any agent that supports the protocol can discover and invoke tools without custom integration.

All MCP traffic flows through the session's MCP endpoint, formed by appending /mcp to the computerUrl returned at checkout:

POST {computerUrl}/mcp?api-version=1.0

Every request must include the x-ms-computerId header matching the computer ID in the URL. Each POST sends one JSON-RPC message and returns one response.
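
The HTTP envelope for a single MCP message might be built like this. It is a sketch under assumptions: buildMcpRequest is a hypothetical helper, the Content-Type header is the usual choice for JSON-RPC over HTTP rather than something stated here, and computerId is passed in by the caller because this overview does not say how to obtain it beyond noting that it must match the URL.

```javascript
// Build the HTTP request that carries one JSON-RPC message to the
// session's MCP endpoint.
function buildMcpRequest(computerUrl, computerId, message) {
  const base = computerUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    method: "POST",
    url: `${base}/mcp?api-version=1.0`,
    headers: {
      "Content-Type": "application/json", // assumption, see lead-in
      "x-ms-computerId": computerId,
    },
    body: JSON.stringify(message),
  };
}
```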

MCP session lifecycle. Before calling any tool, the client must complete the MCP initialization handshake:

  1. Send an initialize request to receive server capabilities.
  2. Send an initialized notification (no response expected).
  3. Issue tool calls: tools/list to discover available tools, or tools/call to invoke one.
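
The handshake steps above correspond to the following JSON-RPC 2.0 messages. The method names and the shape of the initialize parameters follow the public Model Context Protocol specification. The factory name createMcpMessages, the protocol version string, and the client name and tool name used with it are illustrative assumptions, so check the server documentation for the values it actually accepts.

```javascript
// Produce the JSON-RPC messages for an MCP session. Requests get an
// incrementing id; notifications carry no id and expect no response.
function createMcpMessages() {
  let nextId = 1;
  const request = (method, params) => ({ jsonrpc: "2.0", id: nextId++, method, params });
  const notification = (method) => ({ jsonrpc: "2.0", method });

  return {
    // Step 1: ask the server for its capabilities.
    initialize: (clientName, clientVersion, protocolVersion = "2025-03-26") =>
      request("initialize", {
        protocolVersion, // assumed version; the server may negotiate another
        capabilities: {},
        clientInfo: { name: clientName, version: clientVersion },
      }),
    // Step 2: confirm initialization (no response expected).
    initialized: () => notification("notifications/initialized"),
    // Step 3: discover tools, then invoke one by name.
    listTools: () => request("tools/list", {}),
    callTool: (name, args) => request("tools/call", { name, arguments: args }),
  };
}
```

Each message is sent in its own POST to the session's MCP endpoint. The tool names passed to callTool should come from the tools/list response rather than being hard-coded.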

Initialization is required once per session. The MCP plane covers desktop interaction (mouse, keyboard, screenshot capture), window management, command execution, browser automation and UI accessibility capabilities.

For the full catalog of tools and their parameter schemas, see Windows 365 for Agents MCP Server.

Computer-See/Take-Control: human supervision

The Screenshare SDK lets a partner application embed real-time human observation of agent activity directly in its own UI. It streams the agent's Cloud PC over WebRTC and, when needed, relays mouse and keyboard input back to the session. The SDK creates an iframe inside your page that handles all video streaming, input relay, and screen share API calls, so your application never talks to the streaming stack directly.

The viewer connects to the screenshareUrl returned at checkout. No separate screen share endpoint construction is required: the SDK derives its calls from the base URL and computer ID you supply.

Integration flow

The partner application checks out a session, loads the SDK from the CDN, and hands the returned screenshareUrl and bearer token to a ScreenShareViewer. The iframe takes over from there, calling the ARI screen share API and joining the video call on your behalf:

Partner application                       ARI service
  │                                          │
  │  POST /api/pools/{poolId}/sessions       │
  │  ──────────────────────────────────────→ │
  │                                          │
  │  200 OK { screenshareUrl: "…" }          │
  │  ←────────────────────────────────────── │
  │                                          │
  │  Load screenshare-embed.js from CDN      │
  │  new ScreenShareViewer(container,        │
  │      baseUrl, computerId)                │
  │  viewer.connect(bearerToken)             │
  │  ─── postMessage to iframe ────────────→ │
  │                                          │
  │      iframe calls ARI screenshare API    │
  │      iframe joins ACS video call         │
  │      live video streams back             │
  │  ←────────────────────────────────────── │

SDK distribution

The SDK is published per environment. Load the screenshare-embed.js build that matches the ring your application runs against:

| Environment | CDN URL |
| --- | --- |
| PROD | https://packages.global.cloudinferenceplatform.azure.com/screenshare-sdk/latest/screenshare-embed.js |

Viewer methods

A ScreenShareViewer instance exposes methods for the full session lifecycle (connect, optional control handoff, token refresh, and teardown):

| Method | Description |
| --- | --- |
| connect(bearerToken) | Starts a screen share session. Returns a Promise. See section 2.2 for obtaining a bearer token. |
| takeControl() | Requests mouse and keyboard control (interactive mode only). The most recent caller always wins; requests are never rejected. |
| releaseControl() | Releases control and returns the viewer to view-only. |
| updateToken(bearerToken) | Replaces the bearer token without restarting the session. Use when you receive a TOKEN_EXPIRED error. |
| stop() | Ends the session and removes the iframe from the DOM. The instance cannot be reused; create a new ScreenShareViewer to reconnect. |

Error responses

Errors surface through the error event with a code and message. Each code maps to a specific recovery action:

| Code | Meaning | Action |
| --- | --- | --- |
| TOKEN_EXPIRED | Bearer token expired (401). | Call viewer.updateToken(newToken). |
| START_FAILED | ARI Start API failed. | Check computerId and pool registration. |
| JOIN_FAILED | ACS call join failed. | Retry with a fresh token. |
| RECONNECT_FAILED | Auto-reconnect exhausted (3 attempts). | Call viewer.stop(), create a new viewer, and reconnect with a fresh token. |
| IFRAME_LOAD_FAILED | Iframe didn't respond within 10 seconds. | Check that baseUrl is reachable from the browser. |
| MODE_RESTRICTED | Control command issued in viewOnly mode. | Create the viewer with mode: 'interactive'. |
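
The recovery actions in the table above could be wired into an error handler along these lines. This is a sketch, not the SDK's own behavior: recoveryFor and applyRecovery are hypothetical helpers, and getFreshToken and createViewer are callbacks the host page would supply. JOIN_FAILED is handled by recreating the viewer, which is a conservative assumption, since the table only says to retry with a fresh token.

```javascript
// Recovery actions keyed by error code, taken from the table above.
const RECOVERY_ACTIONS = new Map([
  ["TOKEN_EXPIRED", "update-token"],
  ["JOIN_FAILED", "recreate-viewer"], // assumption, see lead-in
  ["RECONNECT_FAILED", "recreate-viewer"],
  ["START_FAILED", "check-configuration"],
  ["IFRAME_LOAD_FAILED", "check-configuration"],
  ["MODE_RESTRICTED", "check-configuration"],
]);

function recoveryFor(code) {
  return RECOVERY_ACTIONS.get(code) ?? "unknown";
}

// Apply the recovery and return the viewer to use afterwards.
async function applyRecovery(viewer, code, getFreshToken, createViewer) {
  const action = recoveryFor(code);
  if (action === "update-token") {
    viewer.updateToken(await getFreshToken());
    return viewer;
  }
  if (action === "recreate-viewer") {
    viewer.stop(); // a stopped instance cannot be reused
    const replacement = createViewer();
    await replacement.connect(await getFreshToken());
    return replacement;
  }
  // Configuration problems and unknown codes need a person to look at them.
  console.error(`Screen share error ${code} needs manual attention (${action})`);
  return viewer;
}
```

A page would call applyRecovery from its error event handler and keep the returned viewer as the current one.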

Quick start

The following minimal page mounts a viewer into a container and connects it to an already-checked-out session. It assumes you have the checkout response (see section 6.1) and a bearer token (see section 2.2):

<!DOCTYPE html>
<html>
<head><title>Screen Share</title></head>
<body>
    <div id="viewer" style="width: 100%; height: 600px;"></div>

    <!-- PROD build; load the build that matches the ring your application runs against -->
    <script src="https://packages.global.cloudinferenceplatform.azure.com/screenshare-sdk/latest/screenshare-embed.js"></script>
    <script>
        // Assumes you already have the checkout response (Section 6.1) and bearer token (Section 2.2).
        // The viewer's base URL is the screenshareUrl returned at checkout.
        var screenshareUrl = checkoutResponse.screenshareUrl;
        var computerId = checkoutResponse.computerId;

        var viewer = new ScreenShareViewer({
            container: document.getElementById('viewer'),
            baseUrl: screenshareUrl,
            computerId: computerId
        });

        viewer.on('error', function (code, msg) {
            console.error(code, msg);
        });

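        // connect() returns a Promise, so handle a rejected connection attempt as well.
        viewer.connect(bearerToken).catch(function (err) {
            console.error('connect failed', err);
        });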
    </script>
</body>
</html>

Surface summary

| Surface | Plane | Endpoint | Called by | Purpose |
| --- | --- | --- | --- | --- |
| Graph API | Computer-Create | W365A Graph API and W365 admin portal | IT admin or ISV | Shape and maintain the pool. |
| Session API | Computer-Get | POST /api/pools/{poolId}/sessions (Checkout) | Partner application | Reserve a Cloud PC. |
| Session API | Computer-Get | DELETE /api/sessions/{sessionId} (Checkin) | Partner application | Release the Cloud PC. |
| MCP | Computer-Do | POST {computerUrl}/mcp | AI agent | Operate the Cloud PC. |
| Screenshare SDK | Computer-See, Computer-TakeControl | ScreenShareViewer (from CDN screenshare-embed.js) | Partner app, on behalf of a human | Observe and co-drive. |

How they fit together

The surfaces work in sequence, with a clear handoff between callers:

  1. Admins and agent makers use Computer-Create to provision the pool.
  2. The partner application calls Checkout on Computer-Get to reserve a Cloud PC for a specific piece of agent work, specifying the session kind through request headers.
  3. The AI agent initializes the MCP session against {computerUrl}/mcp and drives the Cloud PC through the Computer-Do tools. Most calls flow through this plane.
  4. When needed, the partner application invokes Computer-See actions against {screenshareUrl} on behalf of a human to observe or take over.
  5. The partner application calls Checkin on Computer-Get to release the Cloud PC when the work is done. Sessions left idle for 30 minutes are evicted automatically.

Next steps