API fetching data from Git, loading to Storage Account in Parquet

ThakarPrateekS-5118 0 Reputation points
2024-10-21T18:03:34.96+00:00

I have an API that I am calling via ADF using a Copy activity.

The data I am bringing in covers 28 days, and I want to build up historical data.

In the incoming data there is a column called "day" which holds a date.

I want to reference that column and build the ADF pipeline so it writes incrementally.

What would be the approach?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator
    2024-10-21T23:39:51.1733333+00:00

    Hi @ThakarPrateekS-5118

    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    As I understand, you have an API providing 28 days of data and want to build a historical data pipeline in Azure Data Factory, using the 'day' column to incrementally load new or updated data.

    The three main ADF activities needed for this use case are the Lookup activity, the Copy data activity, and the Stored procedure activity.

    Lookup activity: Lookup activity reads and returns the content of a configuration file or table. It also returns the result of executing a query.
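    As an illustrative sketch (assuming a hypothetical watermark control table dbo.watermarktable, shown under "Watermark column" below, and a hypothetical staging table dbo.api_staging_table; adjust names to your setup), two Lookup activities could run queries like these to fetch the old and new watermark values:

    ```sql
    -- Lookup 1: read the watermark stored by the previous run
    SELECT WatermarkValue
    FROM   dbo.watermarktable
    WHERE  TableName = 'api_staging_table';

    -- Lookup 2: read the newest "day" value available in the freshly landed data
    SELECT MAX([day]) AS NewWatermarkValue
    FROM   dbo.api_staging_table;
    ```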

    Copy activity: Copy activity copies data among data stores located on-premises and in the cloud.
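    In the Copy activity source, a dynamic query can then select only the rows whose "day" falls between the two watermark values. This is only a sketch using hypothetical activity names (LookupOldWatermark, LookupNewWatermark) with ADF dynamic-content expressions wrapped around the SQL:

    ```sql
    -- Copy activity source query (entered as dynamic content in ADF)
    SELECT *
    FROM   dbo.api_staging_table
    WHERE  [day] >  '@{activity('LookupOldWatermark').output.firstRow.WatermarkValue}'
      AND  [day] <= '@{activity('LookupNewWatermark').output.firstRow.NewWatermarkValue}'
    ```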

    Stored procedures: A stored procedure is prepared SQL code saved in the database so that it can be reused over and over again. You can also pass parameters to a stored procedure so that it can act based on the parameter value(s) passed to it.
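    For example, a small stored procedure (hypothetical names, shown only as a sketch) can advance the watermark after each successful copy, so the next run starts where the last one finished:

    ```sql
    -- Hypothetical stored procedure that advances the watermark after a successful copy
    CREATE PROCEDURE dbo.usp_write_watermark
        @LastModifiedTime DATETIME,
        @TableName VARCHAR(100)
    AS
    BEGIN
        UPDATE dbo.watermarktable
        SET    WatermarkValue = @LastModifiedTime
        WHERE  TableName = @TableName;
    END
    ```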

    Watermark column: A watermark is a column in each table that indicates when the corresponding row was last created or modified. The watermark column is used to find out or slice the new or updated records for every run.

    Most often, a timestamp (or date) column is chosen as the watermark column; in your case, the "day" column can serve this purpose.
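    A minimal watermark table for this scenario might look like the following (table and column names are illustrative, not a fixed ADF requirement); it is seeded once with a starting date so the first run picks up all 28 days:

    ```sql
    -- Hypothetical control table holding one watermark row per source table
    CREATE TABLE dbo.watermarktable
    (
        TableName      VARCHAR(100),
        WatermarkValue DATETIME
    );

    -- Seed it so the first pipeline run loads everything from the beginning
    INSERT INTO dbo.watermarktable (TableName, WatermarkValue)
    VALUES ('api_staging_table', '2024-01-01');
    ```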

    For repro purposes, I used a SQL database as the source and a storage account as the sink.

    Here is a step-by-step guide to building an incremental data pipeline in Azure Data Factory.

    Sample data and Stored procedure in SQL database:

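    As a rough equivalent of the screenshots, the repro setup could be a small staging table with a date column plus the watermark objects sketched above. The table and rows below are purely illustrative, not the actual data from the original post:

    ```sql
    -- Hypothetical staging table standing in for the data landed from the API
    CREATE TABLE dbo.api_staging_table
    (
        Id    INT,
        [day] DATE,
        Value VARCHAR(100)
    );

    -- A few illustrative rows
    INSERT INTO dbo.api_staging_table (Id, [day], Value)
    VALUES (1, '2024-09-24', 'row 1'),
           (2, '2024-09-25', 'row 2');
    ```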

    Lookup activity configuration: [screenshot]

    Copy data activity: [screenshot]

    Stored procedure activity: [screenshot]

    Pipeline status: [screenshot]

    Output: [screenshot]

    Then I added 2 more rows to the source table: [screenshot]

    After adding the extra rows and running the pipeline again, the output is: [screenshot]

    By following these steps, you can create a historical data pipeline in Azure Data Factory that incrementally loads data based on the "day" column in your incoming data.

    For more details, please refer to the links below:

    https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-portal

    https://www.youtube.com/watch?v=AOClU3s9jXw&t=12s

    I hope this information helps. Please do let us know if you have any further queries.

