Call API with dynamic URL and store JSON results in DataLake Storage

Lucas Kluge 20 Reputation points
2024-05-30T11:55:42.17+00:00

I am trying to make implement the following scenario using Data Factory:

I am making multiple API calls, relying on a dyanmic URL (e.g. "url.com/api/{ID}" for a list of IDs). The resulting JSON from each of the API calls should be stored as a single file in a Data Lake Storage.

The problem I am encountering is that I can't figure out how to use dynamic URLs when using Copy Acitivies.

What would be a recommended way to implement this?

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,395 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,904 questions
0 comments No comments
{count} votes

Accepted answer
  1. Amira Bedhiafi 17,791 Reputation points
    2024-05-31T08:47:11.1566667+00:00

    You can create a new linked service for the REST API where you provide the base URL and any necessary authentication details.

    Then, create a dataset for the Data Lake Storage where the JSON files will be stored.

    Configure it to use the linked service you created earlier.

    Create a dataset for the REST API, set the request method and then configure it to use the REST linked service you created earlier.

    In ADF, drag the ForEach activity into the pipeline canvas.

    Set the items property of the ForEach activity to the list of IDs you want to use in the API calls. This could be a JSON array or a dataset containing the IDs.

    Inside the ForEach Activity, add a Copy Data activity inside the ForEach activity.

    Source Configuration:

    • Set the source dataset to the REST API dataset you created.
    • In the dataset settings, use dynamic content to construct the URL with the current item from the ForEach loop. For example, if your list of IDs is called IDList, you can use @concat('https://url.com/api/', item().ID) to dynamically set the URL for each API call.

    Sink Configuration:

    • Set the sink dataset to the Azure Data Lake Storage dataset.
    • Configure the file path dynamically to store each JSON result as a separate file. For example, you could use @concat('output/', item().ID, '.json') to create a unique file for each API response.

    Your dynamic content expressions :

    • Source Dataset URL: @concat('https://url.com/api/', item().ID)
    • Sink Dataset File Path: @concat('output/', item().ID, '.json')

    If your list of IDs is static and you want to define it directly within the pipeline, you can use:

    
    {
    
      "IDList": [
    
        {"ID": "123"},
    
        {"ID": "456"},
    
        {"ID": "789"}
    
      ]
    
    }
    
    

    And set the ForEach activity items property to @activity('GetIDList').output.IDList.

    https://stackoverflow.com/questions/75931254/azure-data-factory-loop-through-parametrized-linked-service

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful