Understanding SCIM Client Request Behavior and Expected Load for Large-Scale Operations

Amrit Khanna 0 Reputation points
2025-06-17T06:32:07.4666667+00:00

We are developing a SCIM 2.0 compliant service and are preparing to integrate with Microsoft Entra ID as an identity provider for our customers. To ensure we can build a robust and scalable solution, we need to understand the specific request patterns of the Entra ID SCIM client, particularly for scenarios involving a large number of users (e.g., thousands).

Our primary questions are about the expected request load for different SCIM operations:

1. Initial Bulk Synchronization / Onboarding:

  • When a customer onboards and assigns our SCIM application to a group in Entra ID containing several thousand users, at what rate will the Entra ID SCIM client send requests to our server?
  • How are these requests typically distributed across our SCIM endpoints? For example, what is the expected rate of POST /Users requests?
  • Does the client send requests sequentially or in parallel? If in parallel, what is the typical level of concurrency we should expect?

2. Bulk Deactivation / Deletion:

  • Similarly, if an administrator unassigns a large number of users from our application (or if users are deleted), what is the expected request rate for PATCH requests (to set active to false) or DELETE /Users/{id} requests?
  • How does Entra ID handle the deletion of a group with a large membership? Will it trigger individual requests for each user, and if so, at what rate?

3. Client-Side Throttling and Retry Logic:

  • If our server responds with a 429 Too Many Requests status code and includes a Retry-After header, what is the precise behavior of the Entra ID SCIM client?
  • Will it pause all provisioning activities for the duration specified in the Retry-After header for that specific customer?
  • In the case of throttling during a large group membership update, will the client retry only the failed requests, or will it re-synchronize the entire group membership?
Microsoft Security Microsoft Entra Microsoft Entra ID

1 answer

  1. Megan Truong 635 Reputation points Independent Advisor
    2025-06-18T08:46:30.0866667+00:00

    Hello @Amrit Khanna

    Thank you for posting on the Q&A forum. Here are answers to each of your points:

    1. Initial Bulk Synchronization / Onboarding: Rate limiting is currently available only for gallery apps that Microsoft has built and onboarded. There is no rate limiter for custom non-gallery apps. Each provisioning job runs independently, with no knowledge of the others. The interval between cycles is 40 minutes, though for very large sets of users or groups a cycle may take considerably longer. Reference: https://learn.microsoft.com/en-us/entra/identity/app-provisioning/application-provisioning-when-will-provisioning-finish-specific-user#how-long-will-it-take-to-provision-users
    2. Bulk Deactivation / Deletion: The request rate and total duration likewise depend on the number of users that the administrator wants to deactivate or delete. One configured instance of provisioning on a Microsoft Entra ID (formerly Azure AD) enterprise app or custom non-gallery app corresponds to one provisioning job. If you have ten customers, each with one provisioning job configured, there will be ten provisioning jobs.
    3. Client-Side Throttling and Retry Logic: If you enable automated provisioning for your SCIM application, the provisioning service will retry requests that fail with the error "429 Too Many Requests". It will suspend all provisioning activities for that particular customer for the time period indicated in the Retry-After header. Rather than resynchronizing the full group membership, it retries only the failed requests.
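
Because there is no built-in rate limiter for custom non-gallery apps, a SCIM server generally has to protect itself. The following is a minimal sketch, not a Microsoft-provided implementation, of a per-tenant token bucket that decides when to answer with 429 and a Retry-After value in seconds. The names `TokenBucket` and `throttle_decision`, the tenant identifier, and the rate and burst values are all hypothetical and would need tuning for a real service.

```python
import math
import time
from typing import Dict, Optional, Tuple


class TokenBucket:
    """Token bucket for one tenant (illustrative; values are assumptions)."""

    def __init__(self, rate: float, capacity: int, now: Optional[float] = None):
        self.rate = rate                  # tokens refilled per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)     # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def try_acquire(self, now: Optional[float] = None) -> int:
        """Return 0 if the request may proceed, else whole seconds to wait."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0
        # Round up so a client that honors the delay finds a token available.
        return max(1, math.ceil((1.0 - self.tokens) / self.rate))


# One bucket per tenant, since each provisioning job runs independently.
buckets: Dict[str, TokenBucket] = {}


def throttle_decision(tenant_id: str,
                      now: Optional[float] = None) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers): 429 with Retry-After, or 200 meaning 'proceed'."""
    if tenant_id not in buckets:
        # Assumed limits: 5 requests/second sustained, bursts of up to 10.
        buckets[tenant_id] = TokenBucket(rate=5.0, capacity=10, now=now)
    wait = buckets[tenant_id].try_acquire(now)
    if wait:
        return 429, {"Retry-After": str(wait)}
    return 200, {}
```

In a real endpoint, `throttle_decision` would be called before handling each POST, PATCH, or DELETE under /Users, and the 200 case would simply continue to the normal handler. Keying the bucket by tenant keeps one customer's large initial sync from throttling the others.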

    Please let me know whether this works for you and whether you have any further questions.

    If I have answered your question, please accept this answer as a token of appreciation and don't forget to give a thumbs up for "Was it helpful"!

    Best regards,

    Megan.

