Use Azure Databricks data on Microsoft Power Platform

This page explains how to use your Azure Databricks data from the following platforms after creating a connection:

  • Power Apps: Build applications that can read from and write to Azure Databricks, while preserving your Azure Databricks governance controls.
  • Power Automate: Build flows with actions that execute custom SQL statements or run existing jobs and return the results.
  • Copilot Studio: Build custom agents using your Azure Databricks data as a knowledge source or connect Genie spaces as tools.

Prerequisites

Before you connect to Azure Databricks from Power Platform, you must create a connection to Azure Databricks on Microsoft Power Platform.

Use your Azure Databricks data to build Power canvas apps

To add your Azure Databricks data to your application, do the following:

  1. From the leftmost navigation bar in Power Apps, click Create.
  2. Click Start with a blank canvas and choose your desired canvas size to create a new canvas app.
  3. From your application, click Add data > Connectors > Azure Databricks. Click the Azure Databricks connection you created.
  4. Select a catalog from the Choose a dataset sidebar.
  5. From the Choose a dataset sidebar, select all the tables you want to connect your canvas app to.
  6. Click Connect.

Data operations in Power Apps:

The connector supports create, update, and delete operations, but only for tables that have a primary key defined. When you perform a create operation, you must specify the primary key value.

Azure Databricks also supports generated identity columns. For these columns, primary key values are generated automatically on the server when the row is created and cannot be specified manually.
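
For example, here is a minimal sketch of a table definition that works with these operations. The catalog, schema, and column names are hypothetical; the relevant parts are the primary key constraint and the generated identity column, which Power Apps never has to supply.

-- Hypothetical table with a generated identity column as its primary key.
CREATE TABLE main.sales.customers (
  customer_id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
  customer_name STRING,
  email STRING,
  CONSTRAINT customers_pk PRIMARY KEY (customer_id)
);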

Use your Azure Databricks data to build Power Automate flows

The Azure Databricks Statement Execution API and Jobs API are exposed within Power Automate, allowing you to write SQL statements and run existing Jobs. To create a Power Automate flow using Azure Databricks as an action, do the following:

  1. From the leftmost navigation bar in Power Automate, click Create.

  2. Create a flow and add any trigger type.

  3. From your new flow, click + and search for "Databricks" to see the available actions.

  4. To write a SQL query, select one of the following actions (a sample statement is sketched after this list):

    • Execute a SQL Statement: Write and run a SQL statement. Enter the following:

      • For Body/warehouse_id, enter the ID of the SQL warehouse on which to run the statement.
      • For Body/statement, enter the SQL statement to execute.

      For more about the advanced parameters, see the Statement Execution API reference.

    • Check status and get results: Check the status of a SQL statement and gather results. Enter the following:

      • For Statement ID, enter the ID returned when the SQL statement was executed.

      For more about the parameter, see the Statement Execution API reference.

    • Cancel the execution of a statement: Terminate execution of a SQL statement. Enter the following:

      • For Statement ID, enter the ID of the SQL statement to terminate.

      For more about the parameter, see the Statement Execution API reference.

    • Get result by chunk index: Get results by chunk index, which is suitable for large result sets. Enter the following:

      • For Statement ID, enter the ID of the SQL statement whose results you want to retrieve.
      • For Chunk index, enter the target chunk index.

      For more about the parameters, see the Statement Execution API reference.

  5. To interact with an existing Azure Databricks Job, select one of the following actions:

    • List jobs: Retrieves a list of jobs. For more information, see the Jobs API reference.
    • Trigger a new job run: Runs a job and returns the run_id of the triggered run. For more information, see the Jobs API reference.
    • Get a single job run: Returns metadata about a run, including run status (for example, RUNNING, SUCCESS, or FAILED), start and end times, run duration, cluster information, and so on. For more information, see the Jobs API reference.
    • Cancel a run: Cancels a job run or a task run. For more information, see the Jobs API reference.
    • Get the output for a single run: Retrieves the output and metadata of a single job or task run. For more information, see the Jobs API reference.
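
As a concrete illustration of step 4, the following query shows the kind of text you can pass as Body/statement to the Execute a SQL Statement action. The table and parameter names are hypothetical, and the :start_date marker assumes you also supply a matching named parameter through the action's advanced parameters.

-- Hypothetical query for the Execute a SQL Statement action.
SELECT order_id, order_date, total_amount
FROM main.sales.orders
WHERE order_date >= :start_date  -- named parameter passed via the action's parameters
ORDER BY order_date DESC
LIMIT 100;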

Connect to Genie spaces in Copilot Studio

Important

This feature is in Public Preview.

Important

Before you use this feature, set up a Genie space. AI/BI Genie is an Azure Databricks feature that allows business teams to interact with their data using natural language. For more information about setting up Genie spaces on Azure Databricks, see What is an AI/BI Genie space. Make sure to give your Azure Databricks Genie space a clear name and description.

To add a Genie space as a tool in a Copilot Studio agent, do the following:

  1. From the Copilot Studio sidebar, click Agents.

  2. Select an existing agent or create a new agent by clicking + Create blank agent.

    • Describe the agent by inputting a message and then click Create.
    • Or, click Skip to manually specify the agent's information.
  3. In the Tools tab, click + Add a Tool.

  4. Search for "Azure Databricks" or select Model Context Protocol.

  5. Select Azure Databricks Genie and choose your connection to Azure Databricks.

  6. Configure the connection:

    1. Name the tool. A more descriptive tool name helps the Copilot Studio agent orchestrate requests.
    2. Enter the Genie Space ID or select the Genie space from the dropdown menu.
    3. To use the on-behalf-of (OBO) flow, select End user credentials as the Credentials to use. To use a single fixed identity instead, select Maker-provided credentials.
    4. Save the tool.
    5. (Optional) Refresh the tools section in the configuration screen to confirm that the Genie space is connected.
  7. In the upper-right, click Settings. Under the Orchestration section, click Yes to use generative AI orchestration for your agent's responses.

  8. (Optional) Publish the Genie-enabled Copilot Studio agent to Microsoft Teams or Microsoft 365 to distribute Genie's insights to other users. To learn how to publish your agent, see Microsoft's documentation.

Note

If the integration repeatedly returns “still processing,” even though the agent indicates it is polling the Genie space, add this instruction to your agent: “Continue polling the Genie space until you receive a response.”

You can add your agent to a Power App. To set up a Power App, see Use your Azure Databricks data to build Power canvas apps. To add your Copilot Studio agent to a Power App, see Microsoft's documentation.

To use the agent in your published app, open the published application, click the Copilot icon in the top right corner, and ask a question.

Use Azure Databricks as a knowledge source in Copilot Studio

To add your Azure Databricks data as a knowledge source to a Copilot Studio agent, do the following:

  1. From the Copilot Studio sidebar, click Agents.

  2. Select an existing agent or create a new agent by clicking + New agent.

    • Describe the agent by inputting a message and then click Create.
    • Or, click Skip to manually specify the agent’s information.
  3. In the Knowledge tab, click + Knowledge.

  4. Click Advanced.

  5. Click Azure Databricks as the knowledge source.

  6. Enter the name of the catalog that contains your data.

  7. Click Connect.

  8. Select the tables you want your agent to use as a knowledge source and click Add.

Create Dataverse virtual tables with your Azure Databricks data

You can also create Dataverse virtual tables with the Azure Databricks connector. Virtual tables, also known as virtual entities, integrate data from external systems with Microsoft Dataverse. A virtual table defines a table in Dataverse without storing the physical table in the Dataverse database. To learn more about virtual tables, see Get started with virtual tables (entities).

Note

Although virtual tables do not consume Dataverse storage capacity, Databricks recommends using direct connections for better performance.

You must have the System Customizer or System Admin role. For more information, see security roles for Power Platform.

Follow these steps to create a Dataverse virtual table:

  1. In Power Apps, from the sidebar, click Tables.
  2. Click + New Table from the menu bar and select Create a virtual table.
  3. Select an existing Azure Databricks connection or create a new connection to Azure Databricks. To add a new connection, see Create an Azure Databricks connection on Microsoft Power Platform.
    • Databricks recommends using a service principal connection to create a virtual table.
  4. Click Next.
  5. Select the tables to represent as a Dataverse virtual table.

    Note

    Dataverse virtual tables require a primary key. The key must be a GUID stored as a string, a BIGINT, or an integer. Views cannot be used as virtual tables, but materialized views can be. (An example table definition appears after these steps.)

  6. Click Next.
  7. Configure the virtual table by updating the details of the table, if necessary.
  8. Click Next.
  9. Confirm the details of the data source and click Finish.
  10. Use the Dataverse virtual table in Power Apps, Power Automate, and Copilot Studio.

For a list of known limitations of Dataverse virtual tables, see Known limitations and troubleshooting.
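
For illustration, the following sketch prepares an existing Unity Catalog table for use as a virtual table. The names are hypothetical; the key column is made non-nullable and then declared as a BIGINT primary key, satisfying the requirement in the note above.

-- Hypothetical table; the key column must be NOT NULL before the constraint is added.
ALTER TABLE main.sales.orders ALTER COLUMN order_id SET NOT NULL;
ALTER TABLE main.sales.orders ADD CONSTRAINT orders_pk PRIMARY KEY (order_id);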

Should you use virtual tables or a direct connection?

Dataverse virtual tables and a direct connection offer different advantages. The method you choose depends on your use case. The table below summarizes the main capabilities of each method.

Data access method       | End-user credential passthrough via OAuth | Zero-copy | Read | Write to Azure Databricks | Canvas apps | Model-driven apps
Direct connection        | Yes | Yes | Yes | Yes | Yes | No
Dataverse virtual tables | No  | Yes | Yes | Yes | Yes | Yes

Conduct batch updates

If you need to perform bulk create, update, or delete operations in response to Power Apps inputs, Databricks recommends implementing a Power Automate flow. To accomplish this, do the following:

  1. Create a canvas app using your Azure Databricks connection in Power Apps.

  2. Create a Power Automate flow using the Azure Databricks connection and use Power Apps as the trigger.

  3. In the Power Automate trigger, add the input fields that you want to pass from Power Apps to Power Automate.

  4. Create a collection object within Power Apps to collect all of your changes.

  5. Add the Power Automate flow to your canvas app.

  6. Call the Power Automate flow from your canvas app and iterate over the collection using the ForAll function:

    ForAll(collectionName, FlowName.Run(input field 1, input field 2, input field 3, …))
    

Concurrent writes

Row-level concurrency reduces conflicts between concurrent write operations by detecting changes at the row-level and automatically resolving conflicts that occur when concurrent writes update or delete different rows in the same data file.

Row-level concurrency is included in Databricks Runtime 14.2 or above. Row-level concurrency is supported by default for the following types of tables:

  • Tables with deletion vectors enabled and without partitioning
  • Tables with liquid clustering, unless deletion vectors are disabled

To enable deletion vectors, run the following SQL command.

ALTER TABLE table_name SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
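
To confirm the property is set, or to create a new table that already supports row-level concurrency through liquid clustering, you can use statements like the following. The table names are hypothetical.

-- Check whether deletion vectors are enabled on an existing table.
SHOW TBLPROPERTIES main.sales.orders ('delta.enableDeletionVectors');

-- Or create a table with liquid clustering, which supports row-level concurrency by default.
CREATE TABLE main.sales.order_events (
  event_id BIGINT,
  order_id BIGINT,
  event_time TIMESTAMP
) CLUSTER BY (order_id);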

For more information about concurrent write conflicts in Azure Databricks, see Isolation levels and write conflicts on Azure Databricks.

Limitations

Copilot Studio limitations

  • Genie spaces in Copilot Studio support up to five questions per minute due to Genie API rate limits.

Power Apps limitations

The following Power Fx formulas calculate values using only the data retrieved locally:

Category       | Formula
Table function | GroupBy, Distinct
Aggregation    | CountRows, StdevP, StdevS