Use managed identity to access mlflow models and artifacts

Tobias Quadfasel 75 Reputation points
2025-05-16T06:09:47.9233333+00:00

Hello! I am new to Azure Databricks and have a question: In my current setup, I am running some containerized Python code within an Azure Functions app. In this code, I need to download some models and artifacts stored via MLflow in our Azure Databricks workspace.

Previously, I did this by setting the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables; within my code I just set mlflow.set_tracking_uri("databricks") and everything worked fine. However, the token is a PAT, which I do not like from a security perspective. Ideally, I would like to use the managed identity of the Functions app to authenticate with Databricks. According to the following article, this should be possible: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/azure-mi-auth
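Concretely, that previous PAT-based setup amounted to the following (the host and token values here are placeholders, not my real ones):

```shell
# Placeholder values - replace with your workspace URL and a personal access token
export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX"  # PAT; this is what I want to get rid of
```

and then in Python simply mlflow.set_tracking_uri("databricks").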

So I essentially repeated the steps in the article. Note that I omitted all account-level authorization steps, since workspace-level authorization is enough for my use case.

  • I created a user-assigned managed identity in Azure
  • I assigned the managed identity to the Functions app
  • I added a new Microsoft Entra ID managed service principal in my Azure Databricks workspace, using the client ID of the managed identity as the application ID
  • I created the config file ~/.databrickscfg with a single profile named [AZURE_MI_WORKSPACE], containing the parameters host (my Azure Databricks workspace URL), azure_workspace_resource_id (the resource ID of my Azure Databricks workspace), azure_client_id (the client ID of the managed identity), and azure_tenant_id (my Azure tenant ID), and I set azure_use_msi to true, just as in the config in the referenced article above
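For completeness, the profile in my ~/.databrickscfg looks like this (all values are placeholders):

```ini
[AZURE_MI_WORKSPACE]
host                        = https://adb-1234567890123456.7.azuredatabricks.net
azure_workspace_resource_id = /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/workspaces/<workspace-name>
azure_client_id             = <managed-identity-client-id>
azure_tenant_id             = <tenant-id>
azure_use_msi               = true
```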

Then, I changed my code to mlflow.set_tracking_uri("databricks://AZURE_MI_WORKSPACE"). The code does read the information from the .databrickscfg file, since I get the output

loading AZURE_MI_WORKSPACE profile from ~/.databrickscfg: host, azure_workspace_resource_id, azure_client_id, azure_use_msi, azure_tenant_id

But when setting the tracking uri, I get the following error:

Reading Databricks credential configuration failed with MLflow tracking URI 'databricks://AZURE_MI_WORKSPACE'. Please ensure that the 'databricks-sdk' PyPI library is installed, the tracking URI is set correctly, and Databricks authentication is properly configured. The tracking URI can be either 'databricks' (using 'DEFAULT' authentication profile) or 'databricks://{profile}'. You can configure Databricks authentication in several ways, for example by specifying environment variables (e.g. DATABRICKS_HOST + DATABRICKS_TOKEN) or logging in using 'databricks auth login'.

Do you have any leads on what could be wrong here? I triple-checked the parameters in the config file and they are definitely correct. I was wondering whether I made some kind of conceptual error and MLflow tracking can't be done via managed identity auth for some reason.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Answer accepted by question author
  1. Sina Salam 26,666 Reputation points Volunteer Moderator
    2025-05-16T12:04:55.6866667+00:00

    Hello Tobias Quadfasel,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are trying to use a managed identity in a way that is not fully supported by MLflow's current authentication flow.

    MLflow does not natively support managed identity authentication via .databrickscfg alone. The steps below align with the best-practice guidance at https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/azure-mi and are secure, scalable, and recommended for production environments:

    • Install the required libraries with a bash command: pip install azure-identity mlflow databricks-sdk
    • Manually acquire a Microsoft Entra ID token using the managed identity in Python:
           from azure.identity import ManagedIdentityCredential
           import requests

           # Get a token for the Azure Databricks resource
           credential = ManagedIdentityCredential(client_id="<your-client-id>")
           token = credential.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default")

           # Use the token in headers for the Databricks REST API
           headers = {
               "Authorization": f"Bearer {token.token}"
           }
      
      • The ID 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known application ID of the Azure Databricks resource in Microsoft Entra ID; it is the same across all tenants
    • Then use the REST API to work with artifacts or models. Since MLflow may not accept this token directly, call the MLflow REST API, for example to list a run's artifacts:
           response = requests.get(
               "https://<your-databricks-instance>.azuredatabricks.net/api/2.0/mlflow/artifacts/list",
               headers=headers,
               params={"run_id": "<your-run-id>", "path": "<artifact-path>"}
           )
      

    Alternatively, you can try MLflow with token injection. This is a practical workaround rather than an officially documented, supported method for managed identity with MLflow:

       import os
       import mlflow

       # `token` is the AccessToken acquired above with ManagedIdentityCredential
       os.environ["DATABRICKS_HOST"] = "https://<your-databricks-instance>.azuredatabricks.net"
       os.environ["DATABRICKS_TOKEN"] = token.token  # inject the Entra ID token manually
       mlflow.set_tracking_uri("databricks")
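    One caveat with this workaround: Entra ID access tokens expire (typically after about an hour), so a long-running Functions instance should re-acquire the token before it lapses. A minimal sketch of such a check, assuming the epoch-seconds expires_on field that azure-identity's AccessToken exposes (needs_refresh is a hypothetical helper, not part of any SDK):

```python
import time

def needs_refresh(expires_on, now=None, skew=300.0):
    """Return True when the token is within `skew` seconds of expiry.

    `expires_on` is the token's expiry as epoch seconds (as reported by
    azure-identity's AccessToken); `skew` leaves a safety margin so the
    token is replaced before it actually expires.
    """
    if now is None:
        now = time.time()
    return now >= expires_on - skew
```

    Before each MLflow call (or on a timer), check needs_refresh(token.expires_on) and, when it returns True, call credential.get_token(...) again and re-inject the new token into DATABRICKS_TOKEN.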
    
    

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread here by upvoting and accepting this as an answer if it is helpful.


0 additional answers
