ADF Azure Databricks Linked Service job cluster using policyId that specifies pool id

James Lee 0 Reputation points
2024-11-22T22:32:26.85+00:00

Hi, how can I set up an Azure Databricks linked service that uses a new job cluster, where a job policy specifies the driver and worker pool IDs?

In the linked service definition, I have selected "New job cluster", which requires me to supply a Cluster Node Type and Cluster Driver Node Type.

I am then using the dynamic JSON to specify a "policyId" key under typeProperties, as documented here:

When I attempt to use the linked service, I get this error:

Operation on target my_activity failed: The field 'node_type_id' cannot be supplied when an instance pool ID is provided .

I then removed the "newClusterNodeType" and "newClusterDriverNodeType" keys from typeProperties, and received this error:

Operation on target my_activity failed: Databricks LinkedService should specify an existing interactive cluster ID, or an existing instance pool ID, or new cluster information for creation .

I cannot use the "Existing Instance pool" option because I do not have permissions to view the pools for security reasons. I must use the policy ID provided to me.
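
For context, the typeProperties I am working with look roughly like this (all values are placeholders), with the two node type keys either kept or removed depending on the attempt:

    {
        "domain": "https://<databricks-instance>.azuredatabricks.net",
        "accessToken": "<access-token>",
        "policyId": "<policy-id>",
        "newClusterVersion": "<runtime-version>",
        "newClusterNumOfWorker": "<number-of-workers>",
        "newClusterNodeType": "<node-type>",              // kept: triggers the node_type_id error above
        "newClusterDriverNodeType": "<driver-node-type>"  // removed along with the line above: triggers the LinkedService error
    }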


2 answers

  1. Vinodh247 25,046 Reputation points MVP
    2024-11-23T10:07:57.2233333+00:00

    The issue you're encountering arises because the ADF Databricks linked service requires proper alignment between the policyId and the cluster configuration parameters. Here's how to configure the linked service so that it uses a job policy specifying the driver and worker pool IDs without hitting either error:

    Steps to Configure the Linked Service

    1. Set the policyId only (no newClusterNodeType or newClusterDriverNodeType). A job policy in Databricks defines the allowed configurations for a job cluster. When you specify a policyId, the policy enforces configurations such as driver and worker node types, instance pool IDs, and other cluster settings. As a result, supplying node_type_id alongside a policyId that fixes an instance pool causes the conflict in your first error.

       In your case:

       • Remove newClusterNodeType and newClusterDriverNodeType entirely from the typeProperties JSON.
       • Ensure that the policy associated with the policyId includes instance pool settings for both the driver and worker nodes (a sketch of such a policy follows these steps).

    2. Dynamic JSON example. Here's how your typeProperties section in the linked service should look when using a policyId (newClusterVersion is the Databricks runtime version):
         {
             "type": "AzureDatabricks",
             "typeProperties": {
                 "domain": "https://<databricks-instance>.azuredatabricks.net",
                 "accessToken": "<your-databricks-access-token>",
                 "policyId": "<your-policy-id>",
                 "newClusterVersion": "<runtime-version>",
                 "newClusterNumOfWorker": "<number-of-workers>"
             }
         }

    3. Ensure policy compliance. Verify with your Databricks administrator that:

       • The specified policyId includes instance pool settings for the driver and worker nodes.
       • The policy allows flexibility in setting the number of workers and runtime version, as these may need to be specified dynamically in your ADF pipeline.

    4. Permissions check. If you do not have visibility into instance pools for security reasons, you can still use the policyId to enforce instance pool usage. However, ensure the policy includes these settings:

       • instance_pool_id (and driver_instance_pool_id) for the worker and driver nodes.
       • Any constraints that restrict or specify acceptable configurations.

    5. Test configuration. After updating the linked service, test it by triggering a minimal pipeline that uses the linked service. If the error persists, ask your Databricks admin to verify that the job policy is compatible with the ADF pipeline requirements.
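
    For reference, here is a minimal sketch of what such a cluster policy definition might look like on the Databricks side. The pool IDs, runtime version, and worker limit are placeholders; your admin's actual policy will differ:

         {
             "instance_pool_id": {
                 "type": "fixed",
                 "value": "<worker-pool-id>"
             },
             "driver_instance_pool_id": {
                 "type": "fixed",
                 "value": "<driver-pool-id>"
             },
             "spark_version": {
                 "type": "unlimited",
                 "defaultValue": "<runtime-version>"
             },
             "num_workers": {
                 "type": "range",
                 "maxValue": 10   // example bound only; actual policy values will differ
             }
         }

    Because instance_pool_id is fixed by such a policy, the linked service must not send node_type_id, which is exactly the conflict reported in your first error message.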

  2. phemanth 11,975 Reputation points Microsoft Vendor
    2024-11-27T12:27:40.89+00:00

    @james lee

    Thanks for using Microsoft Q&A forum and posting your query.

    It seems like the issue might be related to how the policy is being enforced and to the specific requirements of the linked service configuration. Here are a few additional steps you can try:

    Steps to Resolve the Issue

    Verify Policy Configuration:

    • Ensure that the policy associated with the policyId includes both the driver and worker instance pool IDs. The policy should not only specify the pool IDs but also ensure that no conflicting settings are present.

    Specify Cluster Information:

    • If the policy does not fully cover all required settings, you might need to provide minimal cluster information, such as specifying the newClusterVersion and newClusterNumOfWorker directly in the typeProperties.

    Example Configuration

    Here’s an updated example of how your typeProperties might look:

    {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<databricks-instance>.azuredatabricks.net",
            "accessToken": "<your-databricks-access-token>",
            "policyId": "<your-policy-id>",
            "newClusterVersion": "<runtime-version>",
            "newClusterNumOfWorker": "<number-of-workers>"
        }
    }

    • Policy Enforcement: Ensure that the policy does not forbid specifying instance_pool_id. If it does, you might need to adjust the policy settings or work with your Databricks admin to create a policy that aligns with your requirements.
    • Permissions: Since you mentioned not having permissions to view the pools, ensure that the policy is correctly set up to enforce the instance pool usage without needing additional visibility.
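
    As an illustration of the first point, a hypothetical policy fragment like the following would block pool-backed clusters entirely and would need to be relaxed by your Databricks admin:

        "instance_pool_id": {
            "type": "forbidden"   // hypothetical: this setting would prevent supplying any instance pool ID
        }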

    Testing

    After making these adjustments, test the configuration again by triggering a minimal pipeline, such as the sketch below. If the issue persists, it might be helpful to work closely with your Databricks admin to ensure the policy and linked service configurations are fully compatible.
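
    A minimal test pipeline might look like the following sketch; the pipeline name, activity name, notebook path, and linked service reference name are all placeholders:

        {
            "name": "TestDatabricksLinkedService",
            "properties": {
                "activities": [
                    {
                        "name": "my_activity",
                        "type": "DatabricksNotebook",
                        "typeProperties": {
                            "notebookPath": "/Shared/smoke-test"
                        },
                        "linkedServiceName": {
                            "referenceName": "<your-databricks-linked-service>",
                            "type": "LinkedServiceReference"
                        }
                    }
                ]
            }
        }

    If this pipeline provisions a job cluster successfully, the linked service and policy are aligned; if it fails with the same node_type_id error, the policy still conflicts with the properties being supplied.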

    Hope this helps. Do let us know if you have any further queries.

