How to load data from a Lake database into a Synapse Notebook through a Pipeline?

Devender 61 Reputation points
2022-10-21T07:23:22.157+00:00

Hi Community,
I have a Spark Datalake in Synapse in which i have 6 tables. The data in all tables i have loaded from 6 different csv files. These csv files have been loaded and updated manually by third party if some new data comes. In future also the data in these files will be loaded manually. File name will always be same.

Currently, in my Synapse notebook, I use the data from those 6 tables to transform a new file that arrives for processing, and I have transformed one file using PySpark. However, at the moment I am giving the file name manually in my code, which reads from the ADLS account connected to Synapse where our source files land. In the future this process will be automated, so the code should work for every new source file that arrives for processing.

My question is about the 6 tables in my lake database: when we create an ETL process for this in Synapse and run my code in a Notebook activity, will the code still be able to read data from those 6 tables? And if new data is loaded into those 6 tables, will I see the changes in the tables, and therefore in my transformed file as well?

This is the code I am currently using to load data from one of the tables in my lake database into my notebook:
%%pyspark
df_IndustryData = spark.sql("SELECT * FROM fplus.Industry_data")
display(df_IndustryData)

Thanks in advance for your responses.


Accepted answer
  1. AnnuKumari-MSFT 31,061 Reputation points Microsoft Employee
    2022-10-25T09:33:53.643+00:00

    Hi @Devender ,

    Thank you for using the Microsoft Q&A platform and thanks for posting your question here.

    As per my understanding, you want to know if the updated data would reflect in your lake database if the source file gets updated. Please let me know if my understanding is incorrect.

    Lake databases use a data lake on the Azure Storage account to store the data of the database. The data can be stored in Parquet, Delta, or CSV format, and different settings can be used to optimize the storage. Every lake database uses a linked service to define the location of the root data folder.

    Since tables in a lake database are just references to the underlying ADLS files, they always show the latest data from those files. So yes, updates to the source files will be reflected when your notebook reads the tables.

    Coming to your next query about how to make the notebook dynamic: you can parameterize the above query and pass the table names to the Synapse notebook dynamically.
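    As a minimal sketch of that idea (the `table_name` parameter and the `load_table` helper are illustrative assumptions, not part of your existing notebook):

```python
# Parameters cell - in Synapse Studio, mark this cell as a "Parameters" cell
# so a pipeline's Notebook activity can override the value at run time.
table_name = "fplus.Industry_data"  # default for interactive runs

def load_table(spark, table_name):
    """Build and run the SELECT for whichever table name was passed in."""
    return spark.sql(f"SELECT * FROM {table_name}")

# In the notebook you would then call:
# df = load_table(spark, table_name)
# display(df)
```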

    First of all, in your Synapse pipeline, use a Get Metadata activity to get all the file names present in the ADLS folder. Then iterate through the files using a ForEach activity and pass each file name to the Synapse notebook as a base parameter inside the ForEach.
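    On the notebook side, reading the passed-in file name could look like the sketch below (the storage container, account name, and folder path are placeholders you would replace with your own):

```python
# Parameters cell (mark as "Parameters" in Synapse Studio). The Notebook
# activity's "Base parameters" setting overrides this value, e.g. with
# @item().name inside the ForEach loop.
file_name = "sample.csv"  # default for interactive runs

def source_path(file_name):
    # Hypothetical container/account/folder - adjust to your storage layout.
    return (
        "abfss://container@storageaccount.dfs.core.windows.net/"
        f"source/{file_name}"
    )

# In the notebook you would then read the incoming file:
# df_source = spark.read.csv(source_path(file_name), header=True)
```

    This way the same notebook works for every new source file without hard-coding the name.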


    To check how to parameterize your notebook, kindly refer to the following video: Parameterize Synapse notebook in Azure Synapse Analytics

    Hope this helps. Please let us know if you have any further queries.

