Azure Data Factory - Copy activity using REST connector - Unable to access response headers

Balachandran, Karthikeyan 25 Reputation points
2023-07-31T03:06:55.41+00:00

Hi team,

I use an ADF pipeline with a Copy activity to pull data from a remote endpoint via the REST connector and then land the data in Blob storage. I do not need the pagination feature available with the REST connector, as the response is not paginated.

Problem 1 - I run job 1 on 5th July and run job 2 on the same pipeline on 6th July. During the 2nd (and subsequent) runs I want to use the value of a response header (next_token_header) returned by job 1 as a request query parameter (next_token_query_param) for job 2. How do I access the response header value at the end of the 1st run so I can use it for the next day's run? (The question is NOT about pagination, which I understand applies only to successive requests within job 1 and not across jobs.)

Problem 2 - How do I set up sufficient logging for this pipeline to monitor progress at each step - request headers/parameters, response headers/body, whether the EndCondition was met, etc.?

Azure Data Factory

1 answer

  1. RevelinoB 3,675 Reputation points
    2023-07-31T04:00:26.4066667+00:00

    Hi Balachandran, Karthikeyan.

    Here are some proposed solutions for your problems:

    Problem 1:

    Azure Data Factory (ADF) does not directly provide a way to read headers from a response and store them for subsequent use; it primarily deals with the data in the response body. However, you can implement an intermediary solution to address this requirement.

    You could use an Azure Function or an Azure Logic App to call your REST API and manage response headers. The function would call the API, extract the response header you are interested in, and store its value (e.g., in Azure Key Vault or Azure Table Storage). That stored value can then be used as an input for your next pipeline run.

    Here are the high-level steps for this:

    1. Set up an Azure Function that calls your REST API endpoint. The function should extract the necessary header (next_token_header) from the response.

    2. Save this header value in persistent storage such as Azure Key Vault or Azure Table Storage (a minimal sketch of steps 1 and 2 follows this list).

    3. In your ADF pipeline, before the Copy activity, add an activity that fetches the header value stored by the previous run (e.g., a Lookup activity if you're using Azure Table Storage).

    4. Use the fetched value as a query parameter for the Copy activity in the current pipeline run.
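
    For illustration, here is a minimal sketch of what steps 1 and 2 could look like as a Python Azure Function (v1 programming model). The endpoint URL, header name, connection-string setting, and table/row names are assumptions for this example; adapt them to your API and storage account, and note the table is assumed to already exist.

    ```python
    # Sketch only: call the REST endpoint, capture next_token_header, and persist it
    # in an Azure Table Storage row that a later ADF Lookup activity can read.
    import os

    import requests
    import azure.functions as func
    from azure.data.tables import TableClient

    API_URL = "https://example.com/api/data"   # assumption: your REST endpoint
    TOKEN_HEADER = "next_token_header"         # header name from the question
    TABLE_NAME = "PipelineState"               # hypothetical, pre-created table


    def main(req: func.HttpRequest) -> func.HttpResponse:
        # Call the remote API; add whatever auth headers your endpoint requires.
        response = requests.get(API_URL, timeout=30)
        response.raise_for_status()

        next_token = response.headers.get(TOKEN_HEADER)
        if next_token is None:
            return func.HttpResponse("Header not found in response", status_code=404)

        # Persist the token so the next day's pipeline run can look it up.
        table = TableClient.from_connection_string(
            os.environ["STORAGE_CONNECTION_STRING"],  # app setting with the connection string
            table_name=TABLE_NAME,
        )
        table.upsert_entity({
            "PartitionKey": "rest-copy",   # fixed key for this pipeline
            "RowKey": "next_token",        # single row holding the latest token
            "value": next_token,
        })

        return func.HttpResponse(next_token, status_code=200)
    ```

    On the ADF side (steps 3 and 4), a Lookup activity pointed at the same table can read that row, and its value can be passed into the Copy activity's query parameter with a dynamic expression along the lines of @activity('LookupToken').output.firstRow.value.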

    Problem 2:

    Azure Data Factory has built-in features for monitoring pipelines. You can monitor activity and pipeline runs, review activity details, and set alerts for specific events. However, it might not provide the level of granular logging you're looking for.

    For more detailed logging, you can consider integrating Azure Monitor and Azure Log Analytics. Here's a high-level outline:

    1. In Azure Data Factory's Monitoring tab, you can view activity run details, which include the inputs and outputs of each activity. They also show whether the pipeline met its end condition.

    2. Enable diagnostic settings on your ADF instance to send logs and metrics to Log Analytics.

    3. In Log Analytics, you can create custom queries to monitor your pipeline's detailed progress, such as request headers/parameters, etc. (a sketch is shown below).
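
    As a small illustration of step 3, here is a sketch that queries activity-run logs with the azure-monitor-query Python SDK. The workspace ID and pipeline name are placeholders, and the exact table and column names (ADFActivityRun, Input, Output, and so on) depend on the diagnostic categories and destination mode (resource-specific vs. AzureDiagnostics) you enable.

    ```python
    # Sketch only: read ADF activity-run logs from a Log Analytics workspace.
    from datetime import timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    WORKSPACE_ID = "<log-analytics-workspace-id>"   # placeholder
    PIPELINE_NAME = "CopyFromRestPipeline"          # placeholder pipeline name

    client = LogsQueryClient(DefaultAzureCredential())

    # KQL: pull each activity run's status plus its input/output JSON, which is
    # where copy-activity request details and row counts typically surface.
    query = f"""
    ADFActivityRun
    | where PipelineName == '{PIPELINE_NAME}'
    | project TimeGenerated, ActivityName, Status, Input, Output
    | order by TimeGenerated desc
    """

    result = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=1))
    for table in result.tables:
        for row in table.rows:
            print(dict(zip(table.columns, row)))
    ```

    The Input and Output columns carry the same JSON the Monitoring tab shows for each activity run, so they are a reasonable place to look for request parameters and transfer details.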

    Remember that sensitive data like credentials or personal data should be carefully handled and not logged unless necessary for audit or diagnostic purposes, in which case they should be appropriately protected and masked.

    Also, ensure that logging does not significantly impact the performance of your pipeline runs.

    I hope this answers your query.

    1 person found this answer helpful.
