Monitor Unity AI Gateway cost

Important

This feature is in Beta.

Observe and analyze cost for all Unity AI Gateway traffic by model service, target model, requesting principal, and tags.

Note

Cost observability is based on Azure Databricks billing records. For request-level usage analytics such as token counts, latency, requester details, and request tags, see Model usage for Unity AI Gateway services.

Requirements

Attribution

Unity AI Gateway provides cost attribution through the billable usage system table (system.billing.usage).

Unity AI Gateway enriches MODEL_SERVING billing records in system.billing.usage with service-specific metadata, so you can attribute Azure Databricks cost to the associated services, target models, principals, and service tags. For the complete schema and field definitions, see the Billing usage system table reference.

The billable usage system table includes cost attribution for Azure Databricks-hosted models. For external model cost analysis in the dashboard, see External model cost.

For requests served through a Unity AI Gateway model service, Azure Databricks populates the following fields on MODEL_SERVING records in system.billing.usage:

Field | Description
usage_metadata.ai_gateway_endpoint_name | The name of the Unity AI Gateway model service that received the request. This is the Unity Catalog fully qualified name, in the form <catalog>.<schema>.<modelservice>.
usage_metadata.ai_gateway_endpoint_id | The ID of the Unity AI Gateway model service.
usage_metadata.ai_gateway_destination_model | The destination model that handled the request, for example GPT-5.2.
usage_metadata.ai_gateway_destination_id | The ID of the target that handled the request.
identity_metadata.run_by | The user or service principal that issued the request.
custom_tags | Service tags configured on the Unity AI Gateway model service, such as team or cost_center. See Configure Unity AI Gateway endpoints (legacy).

Unity AI Gateway populates these fields for both real-time and batch inference requests routed through it.
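To inspect the enriched records before aggregating them, a query along these lines lists recent Unity AI Gateway rows together with each attribution field from the table above. It is a sketch: the 7-day window and the 20-row limit are arbitrary illustration values, not defaults.

```sql
-- List recent Unity AI Gateway billing records with their attribution fields.
-- The 7-day window and LIMIT 20 are arbitrary illustration values.
SELECT
  usage_date,
  usage_metadata.ai_gateway_endpoint_name     AS endpoint_name,
  usage_metadata.ai_gateway_endpoint_id       AS endpoint_id,
  usage_metadata.ai_gateway_destination_model AS destination_model,
  usage_metadata.ai_gateway_destination_id    AS destination_id,
  identity_metadata.run_by                    AS run_by,
  custom_tags,
  usage_quantity,
  usage_unit
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  -- Keep only records that carry Unity AI Gateway metadata
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_date >= current_date() - INTERVAL 7 DAYS
ORDER BY usage_date DESC
LIMIT 20;
```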

Observability

The built-in usage dashboard includes a Cost Analysis page for monitoring cost and analyzing cost breakdowns over time. You can analyze cost across multiple dimensions, including:

  • Model service
  • Target model
  • Requesting user or service principal
  • Service tags
  • Request tags

To open the dashboard, click View Dashboard from the AI Gateway page. For details on importing and updating the dashboard, see Built-in usage dashboard.

Image: Cost Analysis page of the usage dashboard.

Image: Cost Analysis drilldown view.

Note

Cost observability is available in dashboard version 0.4 and above. Account admins must update the dashboard to receive the latest template changes. See Built-in usage dashboard.
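To follow the same cost trend outside the dashboard, for example in a notebook or a SQL alert, daily DBUs per model service can be queried directly from system.billing.usage. This sketch reuses the filters from the queries later on this page; it is not the dashboard's own query.

```sql
-- Daily DBUs per Unity AI Gateway model service over the last 30 days.
SELECT
  usage_date,
  usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY usage_date, endpoint_name
ORDER BY usage_date, dbus DESC;
```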

Tag-based analysis

The Cost Analysis page includes tag-based views and filters so you can analyze cost using service tags and request tags.

Service tags are configured on the Unity AI Gateway model service and apply to all requests sent to that model service. Request tags are attached to individual requests and enable more granular attribution within the same model service, such as by project, feature, environment, or end user.

Tag filters accept a semicolon-separated list in the format <entry1>;<entry2>;<entry3>, where each entry is specified as either:

  • <key> to match all values for a tag key. For example, team matches all requests with the team tag.
  • <key>=<value> to match a specific tag key-value pair. For example, team=ml-platform;env=prod matches requests tagged with team=ml-platform and env=prod.
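The dashboard applies these filters for you. Outside the dashboard, a roughly equivalent condition can be written by hand against custom_tags in system.billing.usage. This sketch translates the filter team;env=prod, treating every entry as required, as the team=ml-platform;env=prod example suggests. It covers service tags only, because this page describes only service tags as propagating to custom_tags.

```sql
-- Hand translation of the dashboard tag filter "team;env=prod" to service tags:
--   "team"     -> the team key is present, with any value
--   "env=prod" -> the env key has the value prod
SELECT
  custom_tags['team'] AS team,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
  AND custom_tags['team'] IS NOT NULL
  AND custom_tags['env'] = 'prod'
GROUP BY team
ORDER BY dbus DESC;
```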

For information about configuring and querying request tags, see Tag requests and model services for usage tracking.

External model cost

The usage dashboard can be configured to include cost estimates for external models by specifying a model pricing table in the Pricing Table Override setting. The pricing table is user-managed and must be provided as input to the dashboard.

Image: Pricing Table Override setting in the usage dashboard.

The pricing table must include the following fields:

Field | Type | Description
model | STRING | The model name used for cost attribution in the dashboard.
input_token_price | DOUBLE | The price for input tokens.
output_token_price | DOUBLE | The price for output tokens.
cache_read_input_token_price | DOUBLE | The price for cache-read input tokens, when supported.
cache_write_input_token_price | DOUBLE | The price for cache-write input tokens, when supported.
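A minimal sketch of creating a pricing table with the required fields. The table name main.ai_gateway.external_model_pricing, the model name, and the price values are hypothetical placeholders: replace the model names with the names you want the dashboard to attribute cost to, and the prices with your provider's rates in whatever unit the Pricing Table Override setting expects. Leaving the cache prices NULL for a model without caching is an assumption; check that the dashboard accepts it.

```sql
-- Hypothetical location and name; use any catalog and schema you can write to.
CREATE TABLE IF NOT EXISTS main.ai_gateway.external_model_pricing (
  model                         STRING COMMENT 'Model name used for cost attribution in the dashboard',
  input_token_price             DOUBLE COMMENT 'Price for input tokens',
  output_token_price            DOUBLE COMMENT 'Price for output tokens',
  cache_read_input_token_price  DOUBLE COMMENT 'Price for cache-read input tokens, when supported',
  cache_write_input_token_price DOUBLE COMMENT 'Price for cache-write input tokens, when supported'
);

-- Placeholder row: the model name and prices are NOT real values.
-- NULL cache prices stand for a model without cache pricing (an assumption).
INSERT INTO main.ai_gateway.external_model_pricing VALUES
  ('example-external-model', 1.0, 2.0, NULL, NULL);
```

Because the table is user-managed, updating a price is an ordinary UPDATE or MERGE on this table.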

Note

Cost estimates for external models are for informational purposes only. These figures are calculated based on list or override prices and might not reflect your final provider invoice. Databricks is not liable for discrepancies in third-party billing.

Analyzing cost

Tip

Genie Code (Agent mode) can do this for you. Try this example prompt:

Query system.billing.usage to show AI Gateway DBU cost for the past 30 days, broken down by usage_metadata.ai_gateway_endpoint_name, destination model, and requesting user. Filter to MODEL_SERVING records. Show top 10 in each.

The following queries analyze cost for Azure Databricks-hosted models in system.billing.usage. They report usage in DBUs (usage_unit = 'DBU') and break it down by model service, target model, principal, and service tag.

By model service

SELECT
  usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY endpoint_name
ORDER BY dbus DESC;

By destination model

SELECT
  usage_metadata.ai_gateway_destination_model AS destination_model,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY destination_model
ORDER BY dbus DESC;

By user or service principal

SELECT
  identity_metadata.run_by AS run_by,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND identity_metadata.run_by IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY run_by
ORDER BY dbus DESC;
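To see the heaviest principals within each model service, rather than across all services, the per-principal totals can be ranked per service. This sketch keeps the top 10 per service, matching the "top 10" in the example prompt above; ties at the cutoff are broken arbitrarily by ROW_NUMBER.

```sql
-- Top 10 principals by DBUs within each Unity AI Gateway model service, last 30 days.
WITH per_principal AS (
  SELECT
    usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
    identity_metadata.run_by AS run_by,
    SUM(usage_quantity) AS dbus
  FROM system.billing.usage
  WHERE billing_origin_product = 'MODEL_SERVING'
    AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
    AND identity_metadata.run_by IS NOT NULL
    AND usage_unit = 'DBU'
    AND usage_date >= current_date() - INTERVAL 30 DAYS
  GROUP BY endpoint_name, run_by
)
SELECT endpoint_name, run_by, dbus
FROM per_principal
-- Keep at most 10 principals per model service
QUALIFY ROW_NUMBER() OVER (PARTITION BY endpoint_name ORDER BY dbus DESC) <= 10
ORDER BY endpoint_name, dbus DESC;
```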

By service tag

Service tags propagate to the billing records in custom_tags, so you can allocate cost by dimensions such as team, environment, project, or cost center.

SELECT
  custom_tags['team'] AS team,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND custom_tags['team'] IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY team
ORDER BY dbus DESC;
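The query above reads one known tag key. To see DBUs for every service tag key and value at once, the custom_tags map can be exploded into one row per tag. A record with several tags contributes to several rows, so the per-tag totals overlap and should not be summed into an overall total. Records without tags drop out. The sketch assumes every key in custom_tags is one you want to report on; tags from other sources, if present, also appear.

```sql
-- DBUs per service tag key and value for Unity AI Gateway traffic, last 30 days.
-- explode() emits one row per map entry; untagged records produce no rows.
SELECT
  tag_key,
  tag_value,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
LATERAL VIEW explode(custom_tags) tags AS tag_key, tag_value
WHERE billing_origin_product = 'MODEL_SERVING'
  AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND usage_unit = 'DBU'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY tag_key, tag_value
ORDER BY dbus DESC;
```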

To add tags such as team, project, or cost_center to a model service, see Configure Unity AI Gateway endpoints (legacy).
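The queries on this page report DBUs. To approximate cost in currency, one option is to join system.billing.usage with the system.billing.list_prices system table on cloud, SKU, unit, and the price validity window. This gives list prices (pricing.default) only and does not reflect negotiated discounts; treat the join as a sketch and check the pricing system table reference before relying on it.

```sql
-- Approximate list-price cost per Unity AI Gateway model service, last 30 days.
-- pricing.default is the list price per usage unit; negotiated discounts are not applied.
SELECT
  u.usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
  lp.currency_code,
  SUM(u.usage_quantity) AS dbus,
  SUM(u.usage_quantity * lp.pricing.default) AS list_cost
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS lp
  ON u.cloud = lp.cloud
  AND u.sku_name = lp.sku_name
  AND u.usage_unit = lp.usage_unit
  -- Match the price that was in effect when the usage occurred
  AND u.usage_start_time >= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_end_time <= lp.price_end_time)
WHERE u.billing_origin_product = 'MODEL_SERVING'
  AND u.usage_metadata.ai_gateway_endpoint_name IS NOT NULL
  AND u.usage_unit = 'DBU'
  AND u.usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY endpoint_name, lp.currency_code
ORDER BY list_cost DESC;
```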

Limitations

  • Spend attribution applies to MODEL_SERVING records in system.billing.usage. Requests routed to external models that are billed directly by the external provider do not appear in system.billing.usage.
  • For model services with multiple destinations, such as traffic splitting or fallbacks, ai_gateway_destination_model and ai_gateway_destination_id identify the destination that ultimately served the request.
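For model services with traffic splitting or fallbacks, grouping by both the model service and the destination model shows how each service's DBUs divide across the destinations that actually served requests. A sketch using the same filters as the queries above; the percentage column is an illustrative addition.

```sql
-- DBUs per (model service, destination model) and each destination's share of its service.
WITH per_destination AS (
  SELECT
    usage_metadata.ai_gateway_endpoint_name AS endpoint_name,
    usage_metadata.ai_gateway_destination_model AS destination_model,
    SUM(usage_quantity) AS dbus
  FROM system.billing.usage
  WHERE billing_origin_product = 'MODEL_SERVING'
    AND usage_metadata.ai_gateway_endpoint_name IS NOT NULL
    AND usage_unit = 'DBU'
    AND usage_date >= current_date() - INTERVAL 30 DAYS
  GROUP BY endpoint_name, destination_model
)
SELECT
  endpoint_name,
  destination_model,
  dbus,
  -- Share of the model service's total DBUs served by this destination
  ROUND(100 * dbus / SUM(dbus) OVER (PARTITION BY endpoint_name), 1) AS pct_of_service_dbus
FROM per_destination
ORDER BY endpoint_name, dbus DESC;
```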