Single Node Databricks Job cluster from Azure Data Factory

viji.e 96 Reputation points
2021-04-23T16:31:47.89+00:00

I need to create a Single Node Databricks job cluster from Azure Data Factory. Currently in Azure Data Factory there is no option to choose the cluster mode (Standard or Single Node). We can't set Workers to 0, since a Standard cluster needs at least one worker node to execute Spark commands, whereas a Single Node cluster does not. There is an alternative way to use a Single Node cluster via the Existing interactive cluster option, but we want a job cluster so that it gets deleted automatically after the process completes.

Will this feature get added to ADF in near future?

Tags: Azure Databricks · Azure Data Factory

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 89,376 Reputation points Microsoft Employee
    2021-04-26T04:26:09.363+00:00

    Hello @viji.e ,

    Thanks for the ask and using Microsoft Q&A platform.

Unfortunately, you cannot use the Single Node option with a New job cluster from Azure Data Factory.

I would suggest voting up an idea submitted by another Azure customer:

    https://feedback.azure.com/forums/270578-data-factory/suggestions/42777137-add-to-databricks-linked-service-the-cluster-mode

    All of the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.

If you want to use a single node cluster, you can create one from the Azure Databricks portal and select it by choosing Existing interactive cluster when creating a new linked service.
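For reference, a linked service that points at an existing interactive cluster looks roughly like the sketch below. The domain, resource ID, and cluster ID are placeholders, not values from this thread, and the MSI authentication style simply mirrors the other answer here:

```json
{
    "name": "DatabricksExistingCluster",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<workspace>.azuredatabricks.net",
            "authentication": "MSI",
            "workspaceResourceId": "<workspaceResourceId>",
            "existingClusterId": "<clusterId>"
        }
    }
}
```

With `existingClusterId` set, none of the `newCluster*` properties apply; the activity attaches to the cluster you created in the portal.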


Hope this helps. Do let us know if you have any further queries.

    ------------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.


2 additional answers

  1. Miles Cole 6 Reputation points
    2021-10-01T18:24:49.887+00:00

This is possible. The site below details how to get ADF to spin up a Single Node cluster; it's not too hard, just not straightforward from the UI.

Credit to the site where I found this:
utiliser-un-automated-cluster-single-node


Linked Service JSON (the properties that trigger Single Node mode are `newClusterNumOfWorker`, `spark.master`, and `spark.databricks.cluster.profile`):

    {
        "name": "Databricks",
        "properties": {
            "annotations": [],
            "type": "AzureDatabricks",
            "typeProperties": {
                "domain": "<Domain>",
                "authentication": "MSI",
                "workspaceResourceId": "<resourceId>",
                "instancePoolId": "<poolId>",
                "newClusterNodeType": "Standard_DS12_v2",
                "newClusterNumOfWorker": "0",
                "newClusterSparkConf": {
                    "spark.master": "local[*, 4]",
                    "spark.databricks.cluster.profile": "singleNode",
                    "spark.databricks.delta.preview.enabled": "true"
                },
                "newClusterSparkEnvVars": {
                    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
                },
                "newClusterVersion": "9.1.x-scala2.12",
                "newClusterInitScripts": []
            },
            "connectVia": {
                "referenceName": "AutoResolveIntegrationRuntime",
                "type": "IntegrationRuntimeReference"
            }
        }
    }
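If you keep linked service definitions in source control, a small sanity check like the sketch below can confirm the three single-node properties are present before deploying. This is a hypothetical helper, not an ADF API; the property names come from the JSON above:

```python
import json

# The three properties that together make ADF request a Single Node
# job cluster, per the linked service JSON in this answer.
REQUIRED = {
    ("typeProperties", "newClusterNumOfWorker"): "0",
    ("typeProperties", "newClusterSparkConf", "spark.master"): "local[*, 4]",
    ("typeProperties", "newClusterSparkConf",
     "spark.databricks.cluster.profile"): "singleNode",
}

def is_single_node(linked_service: dict) -> bool:
    """Return True if the linked service requests a Single Node cluster."""
    props = linked_service.get("properties", {})
    for path, expected in REQUIRED.items():
        node = props
        for key in path:
            node = node.get(key, {}) if isinstance(node, dict) else {}
        if node != expected:
            return False
    return True

example = json.loads("""
{
  "name": "Databricks",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "newClusterNumOfWorker": "0",
      "newClusterSparkConf": {
        "spark.master": "local[*, 4]",
        "spark.databricks.cluster.profile": "singleNode"
      }
    }
  }
}
""")
print(is_single_node(example))  # True
```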
    
    1 person found this answer helpful.

  2. Karthik Elavan 1 Reputation point
    2022-12-22T14:36:46.917+00:00

    Hi Team,
    I am facing similar issues which i used new job cluster choose in my ADF linkedservice, the third party libraries can't able to use it. it is throwing the error as below like.

    I am trying to execute the notebook via azure datafactory to Azure Databricks notebook but unable to success my ADF pipeline, if I run the azure databricks notebook separately on my pyspark scripts, there is no error but if run via the ADF pipeline, i am getting below like.

    ModuleNotFoundError: No module named 'prophet'

    ModuleNotFoundError                       Traceback (most recent call last)
    in
          6 import pandas as pd
          7 import pyspark.pandas as ps
    ----> 8 from prophet import Prophet
          9 from pyspark.sql.types import StructType, StructField, StringType, FloatType, TimestampType, DateType, IntegerType
         10

Also, if I choose an existing cluster that already has my libraries installed, the notebook executes properly from ADF.

How can we use the New job cluster option to execute my Databricks notebook from ADF? There is an option to append libraries by providing the DBFS FileStore location of the .whl file, but it fails to install with the error below.

    %pip install /dbfs/FileStore/jars/prophet/prophet-1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    Python interpreter will be restarted.
    Processing /dbfs/FileStore/jars/prophet/prophet-1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    Requirement already satisfied: matplotlib>=2.0.0 in /databricks/python3/lib/python3.8/site-packages (from prophet==1.1) (3.4.2)
    WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/setuptools-git/
    WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/setuptools-git/
    WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))':
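With a new job cluster, libraries are normally attached through the Databricks Notebook activity definition rather than `%pip` inside the notebook, so they are installed when the cluster is created. A sketch of the activity JSON, assuming the activity and notebook names are placeholders and the wheel path is the one from the log above:

```json
{
    "name": "RunNotebook",
    "type": "DatabricksNotebook",
    "typeProperties": {
        "notebookPath": "/Shared/my-notebook",
        "libraries": [
            { "whl": "dbfs:/FileStore/jars/prophet/prophet-1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" },
            { "pypi": { "package": "prophet" } }
        ]
    },
    "linkedServiceName": {
        "referenceName": "Databricks",
        "type": "LinkedServiceReference"
    }
}
```

The `pypi` entry requires outbound internet access from the cluster; the connection-reset warnings in the log above suggest checking the workspace's network/firewall configuration as well.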

Could someone kindly help me resolve this issue?

