Monitor usage using cluster, pool, and workspace tags

To monitor cost and accurately attribute Azure Databricks usage to your organization’s business units and teams (for chargebacks, for example), you can tag workspaces (resource groups), clusters, and pools. These tags propagate to detailed cost analysis reports that you can access in the Azure portal.

For example, here is a cost analysis invoice details report in the Azure portal that breaks down cost by clusterid tag over a one-month period:

Figure: Cost analysis by cluster ID

Tagged objects and resources

You can add custom tags for the following objects managed by Azure Databricks:

Object      Tagging interface (UI)                         Tagging interface (API)
Workspace   Azure Portal                                   Azure Resources API
Pool        Pool UI in the Azure Databricks workspace      Instance Pool API
Cluster     Cluster UI in the Azure Databricks workspace   Clusters API
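
As a rough illustration of tagging through the Clusters API, the sketch below builds a cluster creation request body that carries custom tags in the `custom_tags` field. The cluster name, Spark version, node type, and tag values are hypothetical placeholders, and the request is only printed, not sent. Check the Clusters API reference for the endpoint path and the fields your workspace requires.

```python
import json

# Hypothetical values for illustration; substitute ones valid in your workspace.
payload = {
    "cluster_name": "nightly-etl",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    # Custom tags are plain key/value string pairs.
    "custom_tags": {"CostCenter": "1234", "Team": "data-platform"},
}

# Serialize the body that would be sent to the cluster creation endpoint.
body = json.dumps(payload, indent=2)
print(body)
```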

Azure Databricks adds the following default tags to all pools and clusters:

Pool tag key name                 Value
Vendor                            Constant “Databricks”
DatabricksInstancePoolCreatorId   Azure Databricks internal identifier of the user who created the pool
DatabricksInstancePoolId          Azure Databricks internal identifier of the pool

Cluster tag key name   Value
Vendor                 Constant “Databricks”
ClusterId              Azure Databricks internal identifier of the cluster
ClusterName            Name of the cluster
Creator                Username (email address) of the user who created the cluster

On job clusters, Azure Databricks also applies the following default tags:

Cluster tag key name   Value
RunName                Job name
JobId                  Job ID

On resources used by Databricks SQL, Azure Databricks also applies the following default tag:

Cluster tag key name   Value
SqlWarehouseId         Azure Databricks internal identifier of the SQL warehouse

Tag propagation

Workspace, pool, and cluster tags are aggregated by Azure Databricks and propagated to Azure VMs for cost analysis reporting. Pool tags and cluster tags, however, are propagated differently.

Figure: Databricks object tagging hierarchy

Workspace and pool tags are aggregated and assigned as resource tags of the Azure VMs that host the pools.

Workspace and cluster tags are aggregated and assigned as resource tags of the Azure VMs that host the clusters.

When clusters are created from pools, only workspace tags and pool tags are propagated to the VMs. Cluster tags are not propagated, in order to preserve pool cluster startup performance.
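
The propagation rules above can be sketched as a small function. This is an illustrative model only, not Azure Databricks code: the function name `propagated_tags` and the sample tag names are hypothetical, and the sketch ignores default tags, name conflicts, and tag limits.

```python
from typing import Dict, Optional

Tags = Dict[str, str]


def propagated_tags(
    workspace_tags: Tags,
    pool_tags: Optional[Tags] = None,
    cluster_tags: Optional[Tags] = None,
) -> Tags:
    """Model which custom tags reach the VMs (assumes no duplicate keys)."""
    merged = dict(workspace_tags)
    if pool_tags is not None:
        # VMs belong to a pool: only workspace and pool tags are propagated,
        # even if a cluster created from the pool has its own tags.
        merged.update(pool_tags)
    elif cluster_tags is not None:
        # Cluster without a pool: workspace and cluster tags are propagated.
        merged.update(cluster_tags)
    return merged


# Cluster created from a pool: the cluster tag "Job" is not propagated.
print(propagated_tags({"Team": "data"}, pool_tags={"Pool": "shared"}, cluster_tags={"Job": "etl"}))
# Cluster without a pool: the cluster tag is propagated.
print(propagated_tags({"Team": "data"}, cluster_tags={"Job": "etl"}))
```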

Tag conflict resolution

If a custom cluster tag, pool tag, or workspace tag has the same name as an Azure Databricks default cluster or pool tag, the custom tag is prefixed with x_ when it is propagated.

For example, if a workspace is tagged with vendor = Azure Databricks, that tag will conflict with the default cluster tag vendor = Databricks. The tags will therefore be propagated as x_vendor = Azure Databricks and vendor = Databricks.
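
A minimal sketch of this renaming rule, using the example above. The helper name `resolve_conflicts` is hypothetical, and the sketch compares tag names by exact match; whether the real comparison is case-sensitive is not stated here.

```python
from typing import Dict


def resolve_conflicts(custom_tags: Dict[str, str], default_tags: Dict[str, str]) -> Dict[str, str]:
    """Prefix custom tags with x_ when their name matches a default tag."""
    resolved = dict(default_tags)  # default tags keep their names
    for key, value in custom_tags.items():
        if key in default_tags:
            # Name clash: the custom tag is propagated under an x_ prefix.
            resolved["x_" + key] = value
        else:
            resolved[key] = value
    return resolved


result = resolve_conflicts(
    custom_tags={"vendor": "Azure Databricks"},
    default_tags={"vendor": "Databricks"},
)
print(result)
```

Under these assumptions, `result` holds both `vendor = Databricks` and `x_vendor = Azure Databricks`.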

Limitations

  • It can take up to one hour for custom workspace tags to propagate to Azure Databricks after any change.
  • No more than 50 tags can be assigned to an Azure resource. If the overall count of aggregated tags exceeds this limit, x_-prefixed tags are evaluated in alphabetical order and those that exceed the limit are ignored. If all x_-prefixed tags are ignored and the count is still over the limit, the remaining tags are evaluated in alphabetical order and those that exceed the limit are ignored.
  • Tag keys and values can contain only characters from the ISO 8859-1 (latin1) set. Tags containing other characters are ignored.
  • If you change tag key names or values, these changes apply only after cluster restart or pool expansion.
  • If the cluster’s custom tags conflict with a pool’s custom tags, the cluster can’t be created.
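
The tag-count and character-set limits can be approximated with the sketch below. It reflects one reading of the rules above, not the actual Azure Databricks implementation: tags with characters outside ISO 8859-1 are dropped first, tags without the x_ prefix take priority, and within each group tags are kept in alphabetical order until the limit is reached. The names `is_latin1` and `apply_limits` are hypothetical.

```python
from typing import Dict

MAX_TAGS = 50  # limit on tags per Azure resource, as stated above


def is_latin1(text: str) -> bool:
    """Return True if every character is in the ISO 8859-1 set."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def apply_limits(tags: Dict[str, str], max_tags: int = MAX_TAGS) -> Dict[str, str]:
    # Tags whose key or value falls outside ISO 8859-1 are ignored.
    valid = {k: v for k, v in tags.items() if is_latin1(k) and is_latin1(v)}
    regular = sorted(k for k in valid if not k.startswith("x_"))
    prefixed = sorted(k for k in valid if k.startswith("x_"))
    # Unprefixed tags are kept first, in alphabetical order.
    kept = regular[:max_tags]
    # x_-prefixed tags fill whatever room remains, also alphabetically.
    kept += prefixed[: max_tags - len(kept)]
    return {k: valid[k] for k in kept}


sample = {"a": "1", "b": "caf\u00e9", "x_a": "3", "x_b": "4", "price": "\u20ac5"}
limited = apply_limits(sample, max_tags=3)
print(sorted(limited))
```

With the limit lowered to 3 for the demonstration, the `price` tag is dropped because the euro sign is not in ISO 8859-1, and `x_b` is dropped because it exceeds the limit.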