Use Azure Databricks data on Microsoft Power Platform

This page explains how to use your Azure Databricks data from the following platforms after creating a connection:

  • Power Apps: Build applications that can read from and write to Azure Databricks, while preserving your Azure Databricks governance controls.
  • Power Automate: Build flows and add actions that enable executing custom SQL or an existing Job and get back the results.
  • Copilot Studio: Build custom agents using your Azure Databricks data as a knowledge source or connect Genie Spaces as tools. See Use Azure Databricks data in Microsoft Copilot Studio.

Prerequisites

Before you connect to Azure Databricks from Power Platform, you must create a connection to Azure Databricks on Microsoft Power Platform.

Use your Azure Databricks data to build Power canvas apps

To add your Azure Databricks data to your application, do the following:

  1. From the leftmost navigation bar in Power Apps, click Create.
  2. Click Start with a blank canvas and choose your desired canvas size to create a new canvas app.
  3. From your application, click Add data > Connectors > Azure Databricks. Click the Azure Databricks connection you created.
  4. Select a catalog from the Choose a dataset sidebar.
  5. From the Choose a dataset sidebar, select all the tables you want to connect your canvas app to.
  6. Click Connect.

Data operations in Power Apps:

The connector supports create, update, and delete operations, but only for tables that have a primary key defined. When performing create operations you must always specify the primary key.

Azure Databricks supports generated identity columns. In this case, primary key values are automatically generated on the server during row creation and cannot be manually specified.

Use your Azure Databricks data to build Power Automate flows

The Azure Databricks Statement Execution API and Jobs API are exposed within Power Automate, allowing you to write SQL statements and run existing Jobs. To create a Power Automate flow using Azure Databricks as an action, do the following:

  1. From the leftmost navigation bar in Power Automate, click Create.

  2. Create a flow and add any trigger type.

  3. From your new flow, click + and search for "Databricks" to see the available actions.

  4. To write a SQL query, select one of the following actions:

    • Execute a SQL Statement: Write and run a SQL statement. Enter the following:

      • For Body/warehouse_id, enter the ID of the warehouse upon which to execute the SQL statement.
      • For Body/statement_id, enter the ID of the SQL statement to execute.

      For more about advanced parameters, see the API documentation.

    • Check status and get results: Check the status of a SQL statement and gather results. Enter the following:

      • For Statement ID, enter the ID returned when the SQL statement was executed.

      For more about the parameter, see the API documentation.

    • Cancel the execution of a statement: Terminate execution of a SQL statement. Enter the following:

      • For Statement ID, enter the ID of the SQL statement to terminate.

      For more about the parameter, see the API documentation.

    • Get result by chunk index: Get results by chunk index, which is suitable for large result sets. Enter the following:

      • For Statement ID, enter the ID of the SQL statement whose results you want to retrieve.
      • For Chunk index, enter the target chunk index.

      For more about the parameters, see the API documentation.

  5. To interact with an existing Azure Databricks Job, select one of the following actions:

    • List jobs: Retrieves a list of jobs. For more information, see the API documentation.
    • Trigger a new job run: Runs a job and returns the run_id of the triggered run. For more information, see the API documentation.
    • Get a single job run: Returns metadata about a run, including run status (e.g., RUNNING, SUCCESS, FAILED), start and end time, run durations, cluster information, and so on. For more information, see the API documentation.
    • Cancel a run: Cancels a job run or a task run. For more information, see the API documentation.
    • Get the output for a single run: Retrieves the output and metadata of a single job or task run. For more information, see the API documentation.

Create Dataverse virtual tables with your Azure Databricks data

You can also create Dataverse virtual tables with the Azure Databricks connector. Virtual tables, also known as virtual entities, integrate data from external systems with Microsoft Dataverse. A virtual table defines a table in Dataverse without storing the physical table in the Dataverse database. To learn more about virtual tables, see Get started with virtual tables (entities).

Note

Although virtual tables do not consume Dataverse storage capacity, Databricks recommends using direct connections for better performance.

You must have the System Customizer or System Admin role. For more information, see security roles for Power Platform.

Follow these steps to create a Dataverse virtual table:

  1. In Power Apps, from the sidebar, click Tables.
  2. Click + New Table from the menu bar and select Create a virtual table.
  3. Select an existing Azure Databricks connection or create a new connection to Azure Databricks. To add a new connection, see Create an Azure Databricks connection on Microsoft Power Platform.
    • Databricks recommends using a service principal connection to create a virtual table.
  4. Click Next.
  5. Select the tables to represent as a Dataverse virtual table.

    Note

    Dataverse virtual tables require a primary key. The key must either be a GUID (string), bigInt, or an integer. Views cannot be virtual tables, but materialized views can be.

  6. Click Next.
  7. Configure the virtual table by updating the details of the table, if necessary.
  8. Click Next.
  9. Confirm the details of the data source and click Finish.
  10. Use the Dataverse virtual table in Power Apps, Power Automate, and Copilot Studio.

For a list of known limitations of Dataverse virtual tables, see Known limitations and troubleshooting.

Should you use virtual tables or a direct connection?

Dataverse virtual tables and a direct connection offer different advantages. The method you choose depends on your use case. The table below summarizes the main capabilities of each method.

Data access method End-user credential passthrough via OAuth Zero-copy Read Write to Azure Databricks Canvas Apps Model driven apps
Direct connection
Dataverse virtual tables

Conduct batch updates

If you need to perform bulk create, update, or delete operations in response to Power Apps inputs, Databricks recommends to implement a Power Automate flow. To accomplish this, do the following:

  1. Create a canvas app using your Azure Databricks connection in Power Apps.

  2. Create a Power Automate flow using the Azure Databricks connection and use Power Apps as the trigger.

  3. In the Power Automate trigger, add the input fields that you want to pass from Power Apps to Power Automate.

  4. Create a collection object within Power Apps to collect all of your changes.

  5. Add the Power Automate flow to your canvas app.

  6. Call the Power Automate flow from your canvas app and iterate over the collection using a ForAll command.

    ForAll(collectionName, FlowName.Run(input field 1, input field 2, input field 3, …)
    

Concurrent writes

Row-level concurrency reduces conflicts between concurrent write operations by detecting changes at the row-level and automatically resolving conflicts that occur when concurrent writes update or delete different rows in the same data file.

Row-level concurrency is included in Databricks Runtime 14.2 or above. Row-level concurrency is supported by default for the following types of tables:

  • Tables with deletion vectors enabled and without partitioning
  • Tables with liquid clustering, unless deletion vectors are disabled

To enable deletion vectors, run the following SQL command.

ALTER TABLE table_name SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);

For more information about concurrent write conflicts in Azure Databricks, see Isolation levels and write conflicts.

Limitations

Power Apps limitations

The following PowerFx formulas calculate values using only the data retrieved locally:

Category Formula
Table function
  • GroupBy
  • Distinct
Aggregation
  • CountRows
  • StdevP
  • StdevS