How to validate an API call in Data Factory OR handle a failed record

Paul Potter 30 Reputation points
2023-03-05T23:27:14.09+00:00

I have a Data Factory pipeline which calls the Microsoft Graph API to get an extract of ALL entries in Active Directory: IDs, email addresses, etc. A second pipeline runs a ForEach loop that uses the ID of each entry to get their manager's details. However, many entries do not have a manager attached, for example a departmental account rather than an actual person. So my pipeline fails when the API call fails to retrieve any data because that account isn't set up with a manager.


Is there a way to validate that the API call is a valid call BEFORE the "copydata" task runs? I don't want to set a variable on the fail path of the task, in case it fails for a genuine reason rather than because the API call is invalid.

I'm looking for suggestions, or for how you have handled this previously. I see validation tasks for datasets, but not for a REST API.

Alternatively, how should I best handle the failure, given that my pipeline hasn't necessarily "failed" in the traditional sense? Currently I see:

    "errorCode": "2200",
    "message": "Failure happened on 'Source' side. ErrorCode=RestCallFailedWithClientError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Rest call failed with client error, status code 404 NotFound, please check your activity settings.\nRequest URL: https://graph.microsoft.com/v1.0/users/%7XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXX%7D/?$expand=manager($levels=max;$select=id,displayName,userPrincipalName,createdDate).\nResponse: {\"error\":{\"code\":\"Request_ResourceNotFound\",\"message\":\"Resource '{60784341-1c36-XXXX-XXXX-XXXXXXXXXX}' does not exist or one of its queried reference-property objects are not present.\",\"innerError\":{\"date\":\"2023-03-06T03:30:30\",\"request-id\":\"XXXXXXXX-XXX-XXX-XXX-XXXXXXXXXXX\",\"client-request-id\":\"XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX\"}}},Source=Microsoft.DataTransfer.ClientLibrary,'",
    "failureType": "UserError",
    "target": "CopySourceToSink",

So maybe what I need is just a better way to keep the pipeline from failing, if I can't pre-validate the API call before the ForEach loop runs.

Thanks.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. MartinJaffer-MSFT 26,236 Reputation points
    2023-03-08T18:40:26.7666667+00:00

    @Paul Potter Hello and welcome to Microsoft Q&A.

    As I understand it, you are looking for a better way to handle and differentiate 'expected errors' and 'unexpected errors' in pipeline execution.

    A note before I get into solutions: I see you are using the Lookup activity, and you probably have a great many results. Are you aware of the limits of the Lookup activity? It returns at most the first 5,000 rows or 4 MB of output. I'm just trying to prevent future headaches if you seem to be missing some entries.

    An ideal solution would be to somehow retrieve only the IDs of people with managers, and not the accounts without one. However, I'm not an expert in the Graph API, so I'll work with the next best thing.

    You have stated that for all departmental accounts you get the same expected error message. If we can assume this error message is unique to this situation, we can craft some logic to handle it and not fail the ForEach. We can also deliberately fail the ForEach if the error message is different.

    To do this, I am thinking of an If Condition activity following your Copy activity, connected by an on-fail dependency.


    Since there is no success dependency coming out of the copy activity, the failure of copy activity will not fail the pipeline as long as the if condition succeeds.

    For the condition, we can test for part of the error message with something like:

    @contains(activity('Copy data1').error.message,'does not exist or one of its queried reference-property objects are not present.')
    
    

    So we want the condition to be successful when the message contains part of your expected message.

    When it doesn't, we put a Fail activity with a message in the false branch.
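
    The wiring described above could look roughly like the following pipeline JSON fragment. This is a minimal sketch, not an export from a real pipeline: the activity names `IfExpectedManagerError` and `FailOnUnexpectedError` and the `errorCode` value are hypothetical, and `Copy data1` is assumed to be the name of the Copy activity inside the ForEach. The expression uses `contains` for the substring test.

    ```json
    {
      "name": "IfExpectedManagerError",
      "type": "IfCondition",
      "dependsOn": [
        {
          "activity": "Copy data1",
          "dependencyConditions": ["Failed"]
        }
      ],
      "typeProperties": {
        "expression": {
          "value": "@contains(activity('Copy data1').error.message, 'does not exist or one of its queried reference-property objects are not present.')",
          "type": "Expression"
        },
        "ifFalseActivities": [
          {
            "name": "FailOnUnexpectedError",
            "type": "Fail",
            "typeProperties": {
              "message": {
                "value": "@concat('Unexpected copy failure: ', activity('Copy data1').error.message)",
                "type": "Expression"
              },
              "errorCode": "UnexpectedCopyError"
            }
          }
        ]
      }
    }
    ```

    The true branch is left empty, so the expected "no manager" 404 is simply absorbed, while any other error message reaches the Fail activity and surfaces as a real failure. It is worth comparing this against the JSON your own factory generates (the code view in the authoring UI) before relying on it.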



  2. Paul Potter 1 Reputation point
    2023-06-07T21:44:52.41+00:00

    The ULTIMATE solution MUST also include a dummy Set Variable activity connected with an on-skip dependency on the success path. This is the one thing that ensures the pipeline is marked as a success EVEN when the on-fail If Condition is triggered.

