Connection between Delta Lake of DataFactory and HDInsight Spark

Vaibhav 65 Reputation points
2024-03-26T06:56:39.2833333+00:00

Hi Everyone,

Below is my design:

  1. I am using ADF to ingest the data from Oracle DB to Raw Layer.
  2. Then I am using an ADF Data Flow to create and load data from the Raw Layer into Delta Lake. PFA.
  3. Now I want to use an HDInsight Spark cluster for further transformations.

Below are my queries:

  1. How can I link the Delta Lake created through ADF and perform further transformations on it from an HDInsight Spark cluster?
  2. Is there any way to do a `select *` and view the data in the Delta Lake?

1 answer

  1. Vinodh247-1375 11,206 Reputation points
    2024-03-26T12:41:52.2633333+00:00

    Hi Vaibhav,

    Thanks for reaching out to Microsoft Q&A.

    How can I link the Delta Lake created through ADF and perform further transformations on it from an HDInsight Spark cluster?

    To link your Delta Lake created through ADF and perform further transformations from an HDInsight Spark cluster, follow these steps:

    • Mount your Delta Lake storage to your HDInsight Spark cluster.
    • One option is the 'azure-datalake-store' library in Python to access ADLS from your HDInsight cluster. This allows you to access the Delta Lake files directly from Spark jobs running on the cluster.
    • Once the Delta Lake storage is accessible, use Spark to read the Delta Lake files as DataFrames and perform further transformations as required. The best way is to write Spark jobs using PySpark to read the Delta Lake data, apply your transformations, and then write the results back to Delta Lake; see the sketch after this list.
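
    A minimal PySpark sketch, assuming the Delta files written by the ADF Data Flow sink sit in an ADLS Gen2 container reachable from the cluster and that the Delta Lake (delta-core) package is configured on the HDInsight Spark cluster. The storage account, container, paths, and column names below are placeholders, not values from your environment.

    ```python
    # Sketch only: placeholder paths/columns, assumes delta-core is on the cluster classpath
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("adf-delta-transform")
        # Enable Delta Lake support in the Spark session
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Placeholder path to the Delta table produced by the ADF Data Flow sink
    delta_path = "abfss://raw@<storageaccount>.dfs.core.windows.net/deltalake/customers"

    # Read the Delta table as a DataFrame
    df = spark.read.format("delta").load(delta_path)

    # Example transformation: filter rows and add a derived column
    # ("status" is a hypothetical column used for illustration)
    transformed = (
        df.filter(F.col("status") == "ACTIVE")
          .withColumn("load_date", F.current_date())
    )

    # Write the result back to Delta Lake (overwrite mode, for illustration)
    transformed.write.format("delta").mode("overwrite").save(
        "abfss://curated@<storageaccount>.dfs.core.windows.net/deltalake/customers_curated"
    )
    ```

    Writing the output back to a separate Delta path keeps the ADF-managed table untouched, which is usually safer when ADF pipelines continue to load the original location.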

    Is there any way to do a `select *` and view the data in the Delta Lake?

    Once the Delta Lake files are accessible, you can query them using Spark SQL and view the results, for example by reading the data from the Delta table as shown below.

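    A short sketch, reusing the placeholder path and SparkSession from the example above: register the Delta location as a temporary view and run a plain `SELECT *` over it.

    ```python
    # Register the Delta table location as a temporary view for SQL queries
    delta_path = "abfss://raw@<storageaccount>.dfs.core.windows.net/deltalake/customers"
    spark.read.format("delta").load(delta_path).createOrReplaceTempView("customers_delta")

    # "select *" over the Delta data; .show() prints the rows to the console/notebook
    spark.sql("SELECT * FROM customers_delta LIMIT 20").show(truncate=False)
    ```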

    Please 'Upvote' (Thumbs-up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.