
Azure Databricks (Preview)

Azure Databricks offers a unified platform for scalable data management, governance, and analytics, combining streamlined workflows with the ability to handle diverse data types efficiently.

This connector is available in the following products and regions:

Service          Class     Regions
Copilot Studio   Premium   All Power Automate regions except the following:
                           - US Government (GCC)
                           - US Government (GCC High)
                           - China Cloud operated by 21Vianet
                           - US Department of Defense (DoD)
Power Apps       Premium   All Power Apps regions except the following:
                           - US Government (GCC)
                           - US Government (GCC High)
                           - China Cloud operated by 21Vianet
                           - US Department of Defense (DoD)
Power Automate   Premium   All Power Automate regions except the following:
                           - US Government (GCC)
                           - US Government (GCC High)
                           - China Cloud operated by 21Vianet
                           - US Department of Defense (DoD)
Contact
Name: Databricks Support
URL: https://help.databricks.com
Email: eng-partner-eco-help@databricks.com

Connector Metadata
Publisher: Databricks Inc.
Website: https://www.databricks.com/
Privacy policy: https://www.databricks.com/legal/privacynotice
Categories: Data

Before you begin

Before you connect to Azure Databricks from Power Platform, you must meet the following requirements:

  • You need a Microsoft Entra ID (formerly Azure Active Directory) account to sign in.
  • You need a premium Power Apps license.
  • You need an Azure Databricks account.
  • You need access to a SQL warehouse in Azure Databricks.

Optional: Virtual Network support

If your Azure Databricks workspace uses a virtual network, there are two ways to connect. If this does not apply to your environment, skip this section.

  1. Integrate Power Platform with resources inside your virtual network without exposing them over the public internet. To connect to the private endpoint of your Azure Databricks workspace, do the following after you configure private connectivity to Azure Databricks:

    1. Set up Virtual Network support for Power Platform.

      Note: If your Power Platform virtual network (whether primary or secondary) is different from your Azure Databricks virtual network, you need to use virtual network (VNet) peering to connect it to the Azure Databricks virtual network.

    For more information about virtual networks, see Virtual Network support overview.

  2. Enable access with hybrid deployment, where a front-end private link with a public endpoint is protected by a Workspace IP Access List. To enable access, do the following:

    1. Enable public access at the workspace level. For more details, see Configure IP access lists for workspaces.
    2. Add the AzureConnectors IP range, or the specific Power Platform IP range for your environment’s region, to your workspace IP access list.

Optional: Create a Microsoft Entra Service Principal

Before connecting with a service principal, create, set up, and assign a Microsoft Entra service principal to your Azure Databricks account or workspace.

Step 1: Create a connection to Azure Databricks

Note: If you're only using Copilot Studio, the Azure Databricks connection must be created in Power Apps or Power Automate. Then it can be used in Copilot Studio.

To add an Azure Databricks connection, do the following:

  1. In Power Apps, from the sidebar, click Connections.
  2. Click + New connection in the upper-right corner.
  3. Search for "Azure Databricks" and select the Azure Databricks tile.
  4. Select your Authentication type from the drop-down menu.
  5. Select your authentication method and enter your authentication information.
    • If your Power Platform deployment and Azure Databricks account are in the same Microsoft Entra tenant, you can use an OAuth connection. Enter the following information:
      • For Server Hostname, enter the Azure Databricks SQL warehouse hostname. Example: adb-3980263885549757139.2.azuredatabricks.net
      • For HTTP Path, enter the SQL warehouse HTTP path. Example: /sql/1.0/warehouses/a9c4e781bd29f315
      • Click Create.
      • Sign in with your Microsoft Entra ID.
    • A service principal connection can be used in any scenario. Before connecting, create a Microsoft Entra service principal. Enter the following information:
      • For Client ID, enter the service principal ID.
      • For Client Secret, enter the service principal secret.
      • For Tenant, enter the service principal tenant.
      • For Server Hostname, enter the Azure Databricks SQL warehouse hostname.
      • For HTTP Path, enter the SQL warehouse HTTP path.
      • Optional: After you create the connection, you can rename it or share it with your team members.
    • To find your Azure Databricks SQL warehouse connection details, see Get connection details for an Azure Databricks compute resource.

Step 2: Use the Azure Databricks connection

Use Azure Databricks to build Power Apps

To add data in Azure Databricks to your canvas app, do the following:

  1. From the leftmost navigation bar, click Create.
  2. Click Start with a blank canvas and select your desired canvas size to create a new canvas app.
  3. From your application, click Add data > Connectors > Azure Databricks. Select the Azure Databricks connection you created.
  4. Select a catalog from the Choose a dataset sidebar.
  5. From the Choose a dataset sidebar, select all the tables you want to connect your app to.
  6. Click Connect.

Use Azure Databricks to build Power Automate flows

The Azure Databricks Statement Execution API is exposed within Power Automate, allowing you to write and run SQL statements from your flows. To create a Power Automate flow using Azure Databricks as an action, do the following:

  1. From the leftmost navigation bar, click Create.
  2. Create a flow and add any trigger type.
  3. From your new flow, click + and search for Azure Databricks to see the available actions.
  4. Select your desired action from the four available options:
    • Execute a SQL Statement: Write and run a SQL statement (see the example after this list). For more information, see here.

      Note: The warehouse_id is the segment that appears after the final slash in your warehouse's HTTP path. For example, in the path /sql/1.0/warehouses/[warehouse_id], the value [warehouse_id] represents your warehouse ID.

    • Check status and get results: Check the status of a SQL statement and gather results. For more information, see here.

    • Cancel the execution of a statement: Terminate execution of a SQL statement. For more information, see here.

    • Get result by chunk index: Get results by chunk index, which is suitable for large result sets. For more information, see here.
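
For example, the statement text passed to the Execute a SQL Statement action can use named parameter markers that map to the action's name/value parameter fields. A minimal sketch, where the catalog, schema, table, column, and parameter names are all hypothetical:

-- Illustrative parameterized statement for the Execute a SQL Statement action.
-- :order_date and :region map to the action's parameter name/value pairs;
-- all names here are hypothetical.
SELECT order_id, customer_name, total_amount
FROM main.sales.orders
WHERE order_date >= :order_date
  AND region = :region
LIMIT 100;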

Use Azure Databricks as a knowledge source in Copilot Studio

To add data in Azure Databricks as a knowledge source to a Copilot Studio agent, do the following:

  1. From the sidebar, click Agent.
  2. Select an existing agent or create a new agent by clicking + New agent.
    1. When creating a new agent, you can optionally describe the agent by inputting a message. Click Create.
    2. Or, click Skip to manually specify the agent’s information.
  3. In the Knowledge tab, click + Knowledge.
  4. Click Advanced.
  5. Select Azure Databricks as the knowledge source.
  6. Enter the name of the catalog that contains your data.
  7. Click Connect.
  8. Select the tables you want your agent to use as a knowledge source and click Add.

Create/Update/Delete in Power Apps

The connector supports create, update, and delete operations, but only for tables that have a primary key defined. When performing create operations, you must always specify the primary key.

Note: Azure Databricks also supports generated identity columns. If an identity column is used, the primary key value is generated automatically on the server, and you cannot specify a value for it when creating a new record.
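
Both patterns can be sketched in SQL. A minimal sketch, assuming hypothetical catalog, schema, table, and column names:

-- Explicit primary key: Power Apps create operations must supply customer_id.
-- All names are hypothetical.
CREATE TABLE main.sales.customers (
  customer_id BIGINT NOT NULL,
  customer_name STRING,
  CONSTRAINT customers_pk PRIMARY KEY (customer_id)
);

-- Generated identity column: the server assigns event_id automatically,
-- so create operations must not supply a value for it.
CREATE TABLE main.sales.events (
  event_id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
  event_name STRING,
  CONSTRAINT events_pk PRIMARY KEY (event_id)
);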

Conduct batch updates

Azure Databricks does not support conducting concurrent updates on the same row.

When you need to perform bulk create, update, or delete operations in response to Power App inputs, the best practice is to implement a Power Automate flow. To accomplish this, do the following:

  1. Create your canvas Power App using your Azure Databricks connection.

  2. Create a Power Automate flow using the Azure Databricks connection and use Power Apps as the trigger.

    • In the Power Automate trigger, add the input fields that you want to pass from Power Apps to Power Automate.
  3. Create a collection object within Power Apps to collect all of your changes.

  4. Add the Power Automate flow to your Power App.

  5. Call the Power Automate flow from your Power App and iterate over the collection using a ForAll command.

    // Run the flow once per record in the collection, passing that record's fields as the flow's input fields.
    ForAll(collectionName, FlowName.Run(ThisRecord.Field1, ThisRecord.Field2, ThisRecord.Field3))
    

Concurrent write conflicts

Row-level concurrency reduces conflicts between concurrent write operations by detecting changes at the row-level and automatically resolving conflicts that occur when concurrent writes update or delete different rows in the same data file.

Row-level concurrency is generally available on Databricks Runtime 14.2 and above. Row-level concurrency is supported by default for the following conditions:

  • Tables with deletion vectors enabled and without partitioning.
  • Tables with liquid clustering, unless deletion vectors have been disabled.

To enable deletion vectors, execute the SQL below:

-- Enable a feature using a table property and update the table protocol.
ALTER TABLE table_name SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
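
Liquid clustering is the other qualifying path. A minimal sketch of creating such a table (the table and column names are hypothetical):

-- Liquid clustering at creation time; row-level concurrency applies as long
-- as deletion vectors stay enabled. All names are hypothetical.
CREATE TABLE main.sales.orders_clustered (
  order_id BIGINT,
  region STRING
) CLUSTER BY (region);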

For more information about concurrent write conflicts in Azure Databricks, see here.

Limitations

  • The Power Platform connector does not support government clouds.

Power App Limitations

The following Power Fx formulas only perform calculations on the local result set:

Category        Formulas
Table function  GroupBy, Distinct
Aggregation     CountRows, StdevP, StdevS
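
If an aggregation must reflect the full dataset rather than the local result set, one workaround is to pre-aggregate in Azure Databricks and connect the app to the result, for example through a view. A sketch with hypothetical names:

-- Pre-aggregate on the server so the app does not depend on
-- local-only GroupBy/CountRows. All names are hypothetical.
CREATE OR REPLACE VIEW main.sales.orders_by_region AS
SELECT region, COUNT(*) AS order_count
FROM main.sales.orders
GROUP BY region;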

Power Automate Limitations

  • The data returned from a Power Automate query must not be null; if any value is null, Power Automate raises an error.
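
One way to guard against this is to replace potential nulls in the SQL statement itself. A sketch with hypothetical table and column names:

-- Substitute defaults for NULLs before results reach Power Automate.
-- All names are hypothetical.
SELECT
  order_id,
  COALESCE(customer_name, 'unknown') AS customer_name,
  COALESCE(total_amount, 0) AS total_amount
FROM main.sales.orders;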

Creating a connection

The connector supports the following authentication types:

Name                          Regions      Shareable
OAuth Connection              All regions  Not shareable
Service Principal Connection  All regions  Shareable
Default [DEPRECATED]          All regions  Not shareable

Note: The Default [DEPRECATED] option is only for older connections without an explicit authentication type, and is provided only for backward compatibility.

OAuth Connection

Auth ID: oauth2-auth

Applicable: All regions

OAuth Connection

This connection is not shareable. If the Power App is shared with another user, that user will be prompted to create a new connection explicitly.

Name             Type    Required  Description
Server Hostname  string  True      Server name of the Databricks workspace. Example: adb-3980263885549757139.2.azuredatabricks.net
HTTP Path        string  True      HTTP path of the Databricks SQL warehouse. Example: /sql/1.0/warehouses/a9c4e781bd29f315

Service Principal Connection

Auth ID: oAuthClientCredentials

Applicable: All regions

Service Principal Connection

This connection is shareable. If the Power App is shared with another user, the connection is shared as well. For more information, see Connectors overview for canvas apps - Power Apps | Microsoft Docs.

Name             Type          Required  Description
Client ID        string        True
Client Secret    securestring  True
Tenant           string        True
Server Hostname  string        True      Server name of the Databricks workspace. Example: adb-3980263885549757139.2.azuredatabricks.net
HTTP Path        string        True      HTTP path of the Databricks SQL warehouse. Example: /sql/1.0/warehouses/a9c4e781bd29f315

Default [DEPRECATED]

Applicable: All regions

This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility.

This connection is not shareable. If the Power App is shared with another user, that user will be prompted to create a new connection explicitly.

Throttling Limits

Name                      Calls  Renewal period
API calls per connection  100    60 seconds

Actions

Cancel statement execution

Requests that an executing statement be canceled. Callers must poll for status to see the terminal state.

Check status and get results

Get the status, manifest and results of the statement

Execute a SQL statement

Execute a SQL statement and optionally await its results for a specified time.

Get result by chunk index

After the statement execution has SUCCEEDED, this request can be used to fetch any chunk by index.

Cancel statement execution

Requests that an executing statement be canceled. Callers must poll for status to see the terminal state.

Parameters

Name          Key           Required  Type    Description
Statement ID  statement_id  True      string  Statement ID

Check status and get results

Get the status, manifest and results of the statement

Parameters

Name          Key           Required  Type    Description
Statement ID  statement_id  True      string  Statement ID

Returns

Statement execution response

Execute a SQL statement

Execute a SQL statement and optionally await its results for a specified time.

Parameters

Name             Key              Required  Type     Description
warehouse_id     warehouse_id     True      string   Target warehouse ID
statement        statement        True      string   The SQL statement to execute; can optionally be parameterized (see parameters)
name             name             True      string   Parameter marker name
type             type                       string   Parameter data type
value            value                      string   Parameter value
catalog          catalog                    string   Default catalog for execution
schema           schema                     string   Default schema for execution
disposition      disposition                string   Result fetching mode
format           format                     string   Result set format
on_wait_timeout  on_wait_timeout            string   Action on timeout
wait_timeout     wait_timeout               string   Result wait timeout
byte_limit       byte_limit                 integer  Result byte limit
row_limit        row_limit                  integer  Result row limit

Returns

Statement execution response

Get result by chunk index

After the statement execution has SUCCEEDED, this request can be used to fetch any chunk by index.

Parameters

Name          Key           Required  Type    Description
Statement ID  statement_id  True      string  Statement ID
Chunk index   chunk_index   True      string  Chunk index

Returns

Definitions

SqlBaseChunkInfo

Metadata for a result set chunk

Name         Path         Type     Description
byte_count   byte_count   integer  Number of bytes in the result chunk
chunk_index  chunk_index  integer  Position in the sequence of result set chunks
row_count    row_count    integer  Number of rows in the result chunk
row_offset   row_offset   integer  Starting row offset in the result set

SqlColumnInfo

Name                Path                Type                   Description
name                name                string                 Column name
position            position            integer                Column position (0-based)
type_interval_type  type_interval_type  string                 Interval type format
type_name           type_name           SqlColumnInfoTypeName  The name of the base data type; doesn't include details for complex types such as STRUCT, MAP, or ARRAY
type_precision      type_precision      integer                Number of digits for DECIMAL type
type_scale          type_scale          integer                Number of decimal places for DECIMAL type
type_text           type_text           string                 Full SQL type specification

SqlColumnInfoTypeName

The name of the base data type. This doesn't include details for complex types such as STRUCT, MAP or ARRAY.

SqlStatementResponse

Statement execution response

Name          Path          Type                Description
manifest      manifest      SqlResultManifest   Result set schema and metadata
result        result        SqlResultData
statement_id  statement_id  string              Statement ID
status        status        SqlStatementStatus  Statement execution status

SqlResultManifest

Result set schema and metadata

Name               Path               Type                       Description
chunks             chunks             array of SqlBaseChunkInfo  Result chunk metadata
format             format             string
schema             schema             SqlResultSchema            Result set column definitions
total_byte_count   total_byte_count   integer                    Total bytes in result set
total_chunk_count  total_chunk_count  integer                    Total number of chunks
total_row_count    total_row_count    integer                    Total number of rows
truncated          truncated          boolean                    Result truncation status

SqlStatementStatus

Statement execution status

Name   Path   Type               Description
error  error  SqlServiceError
state  state  SqlStatementState  Statement execution state

SqlStatementState

Statement execution state

SqlServiceError

Name        Path        Type    Description
error_code  error_code  string
message     message     string  Error message

SqlResultSchema

Result set column definitions

Name          Path          Type                    Description
column_count  column_count  integer
columns       columns       array of SqlColumnInfo

SqlResultData

Name                      Path                      Type                      Description
byte_count                byte_count                integer                   Bytes in result chunk
chunk_index               chunk_index               integer                   Chunk position
data_array                data_array                SqlJsonArray              Array of arrays with string values
external_links            external_links            array of SqlExternalLink
next_chunk_index          next_chunk_index          integer                   Next chunk index
next_chunk_internal_link  next_chunk_internal_link  string                    Next chunk link
row_count                 row_count                 integer                   Rows in chunk
row_offset                row_offset                integer                   Starting row offset

SqlJsonArray

Array of arrays with string values

Name   Path  Type      Description
Items        array of

SqlExternalLink

Name                      Path                      Type       Description
byte_count                byte_count                integer    Bytes in chunk
chunk_index               chunk_index               integer    Chunk position
expiration                expiration                date-time  Link expiration time
external_link             external_link             string
http_headers              http_headers              object     Required HTTP headers
next_chunk_index          next_chunk_index          integer    Next chunk index
next_chunk_internal_link  next_chunk_internal_link  string     Next chunk link
row_count                 row_count                 integer    Rows in chunk
row_offset                row_offset                integer    Starting row offset