Rate limiting for file scanning

vikram 21 Reputation points
2023-05-11T16:57:47.85+00:00

According to this doc, https://learn.microsoft.com/en-us/sharepoint/dev/general-development/how-to-avoid-getting-throttled-or-blocked-in-sharepoint-online, an app running in an org with 4K people gets a quota of 2.4M REST requests a day.

Apply this math to an app that needs to scan files. If the app makes 1 REST call per file and the average file size is 0.1 MB, the fastest the app can scan is 240 GB/day. This rate is laughably low. An organization with 4K people will typically have 100 TB of data in their O365. It will take them forever to complete their initial scan of data.
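
The arithmetic above can be written out as a short sketch. The 2.4M daily quota, the 0.1 MB average file size, and the 100 TB total are the figures from the post; the decimal units (1 TB = 1,000,000 MB) and the function names are assumptions made for illustration. With those inputs, a full scan works out to roughly 417 days.

```python
# Back-of-the-envelope scan estimate using the figures from the question.
# Assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB) and exactly one
# request per file; both are simplifications.

def daily_scan_gb(requests_per_day: int, avg_file_mb: float) -> float:
    """Data volume in GB that can be touched per day at one request per file."""
    return requests_per_day * avg_file_mb / 1_000


def days_to_scan(total_tb: float, requests_per_day: int, avg_file_mb: float) -> float:
    """Days needed for a full initial scan of total_tb terabytes of data."""
    total_gb = total_tb * 1_000
    return total_gb / daily_scan_gb(requests_per_day, avg_file_mb)


if __name__ == "__main__":
    print(f"{daily_scan_gb(2_400_000, 0.1):.0f} GB/day")
    print(f"{days_to_scan(100, 2_400_000, 0.1):.0f} days for 100 TB")
```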

What am I missing?

Microsoft Graph
A Microsoft programmability model that exposes REST APIs and client libraries to access data on Microsoft 365 services.

2 answers

  1. Antonio 250 Reputation points Microsoft Vendor
    2023-05-11T19:42:32.0333333+00:00

    Hi vikram,
    Thanks for posting on the Q&A forum. The chart you are referring to is below:

    License count   0 – 1k      1k – 5k     5k – 15k    15k – 50k   50k+
    App 1 minute    1,200       2,400       3,600       4,800       6,000
    App daily       1,200,000   2,400,000   3,600,000   4,800,000   6,000,000

    The limits are expressed in resource units, and the cost of a request can in fact be higher than the 1 unit you assumed, depending on the operation in question.

    Resource units per request   Operations
    1                            Single item query, such as get item
                                 Delta with a token
    2                            Multi item query, such as list children, except delta with a token
                                 Create, update, delete and upload
    5                            All permission resource operations, including $expand=permissions

    As the documentation notes:

    Since application limits are in resource units, the actual request rate, such as requests per minute, depends on application’s API choice and the corresponding API resource unit cost. In general, you can estimate the request rate using an average of 2 resource units per request and divide resource unit limits by 2 to get the estimated request rate.
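
To make the estimate concrete, here is a minimal sketch of the resource-unit accounting described above. The unit costs are the ones in the quoted table; the category labels and function names are hypothetical and chosen only for this example.

```python
from typing import Mapping

# Resource-unit cost per request, taken from the table quoted in this answer.
# The dictionary keys are hypothetical labels, not API names.
RESOURCE_UNITS = {
    "single_item_query": 1,   # e.g. get item
    "delta_with_token": 1,
    "multi_item_query": 2,    # e.g. list children
    "write": 2,               # create, update, delete and upload
    "permissions": 5,         # including $expand=permissions
}


def estimated_requests(unit_limit: int, avg_units_per_request: float = 2) -> float:
    """Estimated request budget: resource-unit limit divided by average cost."""
    return unit_limit / avg_units_per_request


def units_consumed(request_counts: Mapping[str, int]) -> int:
    """Total resource units used by a mix of request types."""
    return sum(RESOURCE_UNITS[kind] * count for kind, count in request_counts.items())


if __name__ == "__main__":
    # Daily limit for the 1k to 5k license tier, at the suggested average of 2.
    print(estimated_requests(2_400_000))
    print(units_consumed({"single_item_query": 10, "permissions": 2}))
```

With the suggested average of 2 units per request, the 2,400,000-unit daily limit corresponds to about 1,200,000 requests, which is half the figure assumed in the question.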

    Note that the documentation also covers best practices for discovering files and detecting changes at scale.
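
As an illustration of what change detection with a delta token can look like (the "Delta with a token" row in the table above), here is a minimal sketch of the paging loop. It assumes a caller-supplied `get_json` function (hypothetical) that performs an authenticated GET and returns the parsed JSON body. `@odata.nextLink` and `@odata.deltaLink` are the properties Microsoft Graph delta responses use for the next page and for the link to save for the next run. Throttling and retry handling are left out.

```python
from typing import Callable, List, Tuple


def walk_delta(start_url: str, get_json: Callable[[str], dict]) -> Tuple[List[dict], str]:
    """Follow a delta query to the end and return (items, delta_link).

    start_url is either an initial delta URL, for example
    https://graph.microsoft.com/v1.0/drives/{drive-id}/root/delta,
    or a delta link saved from a previous run.
    get_json is a hypothetical helper supplied by the caller.
    """
    items: List[dict] = []
    url = start_url
    while True:
        page = get_json(url)
        items.extend(page.get("value", []))
        if "@odata.nextLink" in page:
            # More pages remain in this round of changes.
            url = page["@odata.nextLink"]
        else:
            # Last page: keep the delta link so the next run only sees changes.
            return items, page.get("@odata.deltaLink", "")
```

Storing the returned delta link and passing it as `start_url` on the next run means later scans only enumerate items that changed, rather than walking every file again.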

    That being said, these limits exist so that, when an overwhelming number of requests occurs, throttling helps maintain the performance and reliability of the Microsoft Graph service.

    One thing to consider: solutions that need to extract a large volume of data from Microsoft Graph should use Microsoft Graph Data Connect instead of the Microsoft Graph REST APIs. Microsoft Graph Data Connect allows organizations to extract Microsoft 365 data in bulk without being subject to throttling limits.

    --please don't forget to upvote and Accept as answer if the reply is helpful--

  2. vikram 21 Reputation points
    2023-05-11T19:58:03.96+00:00

    Yes, I am aware that some API calls are more expensive than others. My point was that even in the near-impossible best case, where a single file costs only 1 unit, it still works out to very slow performance.

    Graph Data Connect seems interesting, but looking at the supported datasets, it doesn't look like file content from OneDrive/SharePoint is available there.