How do I ingest data into a Lake Database in a Synapse workspace?

ankit kumar 101 Reputation points
2022-01-29T11:47:25.61+00:00

So I followed the steps below, and now I would like to know how to ingest data.

I created a Synapse workspace, then in the Data tab chose (+) --> Lake Database, created a database, selected it, and created a table in it with columns ID, Name, and Class. The table is empty; it only has the three column names.


Next, in a notebook I created a PySpark dataframe with three columns (AssetID, AssetClass, and Name) and populated it with data.

I would like to insert this data into the Lake Database table I created above, where AssetID goes into the ID column, AssetClass goes into the Class column, and Name goes into the Name column.

I am unable to find a way to do this in PySpark in the notebook. I am also unable to add the Lake Database table to a Synapse pipeline for any activity. How do we access this database and its table and populate it with data?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. HimanshuSinha-msft 19,486 Reputation points Microsoft Employee Moderator
    2022-02-01T00:08:15.073+00:00

    Hello @ankit kumar ,
    Thanks for the ask and for using the Microsoft Q&A platform.
    As I understand it, you have the two asks below; let me know if that's not accurate.

    1. How to insert data into the table using PySpark.
    2. How to add this table to a pipeline.

    Just to let you know, the Lake Database feature is still in "preview" and we are actively working on it at this time.
    I was able to use the code snippet below to insert data into the table.

    %%sql
    INSERT INTO db2.table_1 VALUES (1,'Product','Prodycclass');
    INSERT INTO db2.table_1 VALUES (2,'Product2','Prodycclass2');

    %%pyspark
    df = spark.sql("SELECT * FROM db2.table_1")
    df.show(10)

    +---+--------+------------+
    | ID| Name| Class|
    +---+--------+------------+
    | 2|Product2|Prodycclass2|
    | 1| Product| Prodycclass|
    +---+--------+------------+

    Note: Once you create the Lake Database and table, make sure that you commit all the changes and publish them; only then will the database be discoverable by the Spark clusters.

    To view this database/table from a pipeline, please create a linked service for Synapse, select the manual option, and pass the URL of the serverless SQL endpoint along with the other details.


    Please do let me know if you have any queries.
    Thanks
    Himanshu

