How to extract all users using Azure Data Factory

Adrian Heald 20 Reputation points
2024-09-03T04:44:31.68+00:00

Using Azure Data Factory, I am trying to export all the User records from our AAD, there are around 700,000 records expected. But I can't get it to return more than a few thousand at best.

I have a linked service set up to MSGraph API that I know works as I get other data from AAD via the same service.

I've created a pipeline in ADF with a Copy Data activity, I've added ConsistencyLevel=eventual to the header.

the Relative URL is: users?$top=999&$count=true

When I debug this today, I got only 800 records instead of the expected 800K, strangely when I ran it yesterday, I consistently got 13666 records.

The users are students, so I thought I'd try to extract users from just one school, onPremisesExtensionAttributes/extensionAttribute5 holds the school number so I added a $filter, I also added a $select to the query string.

users?$top=999&$count=true&$filter=onPremisesExtensionAttributes/extensionAttribute5 eq '8897'&$select=accountEnabled,createdDateTime,department,displayName,givenName,jobTitle,mail,mailNickname,onPremisesExtensionAttributes,surname,userPrincipalName

I expected 2100 records but only got 3.

I'm sure I'm doing something wrong or stupid here but can't figure it out.

Any help would be very much appreciated.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,595 questions
0 comments No comments
{count} votes

Accepted answer
  1. Amira Bedhiafi 23,486 Reputation points
    2024-09-03T06:51:12.19+00:00

    The Graph API limits the number of records returned in a single request. Even if you specify $top=999, you might not get all the records in a single response. You need to handle pagination to retrieve all the records.

    The API will return a @odata.nextLink property if there are more records to retrieve. You should use this link to request the next page of records.

    Using ConsistencyLevel=eventual and $count=true helps in scenarios where you want to get the total number of records. However, this setting might still limit the number of records returned per page.

    Here is what you need to do :

    In your pipeline, use a Copy Data activity to connect to the Microsoft Graph API using the linked service you've already set up.

    In the Copy Data activity, ensure that pagination is configured correctly.

    Add a dynamic dataset that handles the @odata.nextLink for fetching subsequent pages. You can achieve this using ADF's built-in pagination support by setting the pagination rule in the dataset or activity.

    If the built-in pagination does not work as expected, consider using a loop with a ForEach activity to loop through each page of data until no more @odata.nextLink is returned.

    Adjust the $top value to something smaller, such as 100, to better handle API limits. Use $skip if you need to skip a certain number of records.

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.