Hello,
I have written a Graph API app that visits each directory in the specified users' OneDrive accounts and reports statistics about their folders and files. It calls the API asynchronously, but I have written it with throttling and usage limits in mind. I use three limits: 1) process only a set number of users at once, 2) throttle requests to a maximum number of requests per minute, and 3) cap the total number of requests in flight at any one time. I have real-time metrics confirming that these limits behave as expected. As required, I also follow all the throttling instructions returned by the API: 1) I read the headers of every response, and if a Retry-After header is present, I sleep for that length of time; 2) if I receive a 429 or 503 error, I sleep for at least the indicated amount of time. I retry with exponential backoff, sometimes waiting five minutes or more between retries.
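A minimal sketch of the three limits described above, assuming an asyncio design: one semaphore caps how many users are processed at once, a second caps requests in flight, and a sliding-window limiter caps requests per minute. `SlidingWindowLimiter`, `GraphLimits`, `demo`, and `fake_request` are hypothetical names, not the app's actual code, and the demo shrinks the "minute" to 0.5 s and makes no real Graph calls so it runs quickly.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Grant at most `max_calls` requests in any rolling `window`-second interval."""

    def __init__(self, max_calls: int, window: float = 60.0) -> None:
        self.max_calls = max_calls
        self.window = window
        self._grants = deque()       # monotonic timestamps of recent grants
        self._lock = asyncio.Lock()  # waiters are admitted one at a time

    async def acquire(self) -> float:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget grants that have aged out of the rolling window.
                while self._grants and now - self._grants[0] >= self.window:
                    self._grants.popleft()
                if len(self._grants) < self.max_calls:
                    self._grants.append(now)
                    return now
                # Sleep until the oldest grant leaves the window, then re-check.
                await asyncio.sleep(self.window - (now - self._grants[0]))


class GraphLimits:
    """Hypothetical holder for the three limits: users, requests/minute, in-flight."""

    def __init__(self, max_users: int, max_per_minute: int, max_in_flight: int,
                 window: float = 60.0) -> None:
        self.users = asyncio.Semaphore(max_users)
        self.in_flight = asyncio.Semaphore(max_in_flight)
        self.rate = SlidingWindowLimiter(max_per_minute, window)

    async def call(self, make_request):
        """Await one request (a zero-argument coroutine function) under both request limits."""
        await self.rate.acquire()
        async with self.in_flight:
            return await make_request()


async def demo(n_users: int = 4, requests_per_user: int = 6):
    # Tiny limits and a 0.5 s "minute" so the sketch finishes quickly.
    limits = GraphLimits(max_users=2, max_per_minute=5, max_in_flight=3, window=0.5)
    stats = {"users": 0, "peak_users": 0, "in_flight": 0, "peak_in_flight": 0}

    async def fake_request():
        # Stand-in for GET drives/{drive_id}/items/{item_id}/children; no network I/O.
        stats["in_flight"] += 1
        stats["peak_in_flight"] = max(stats["peak_in_flight"], stats["in_flight"])
        await asyncio.sleep(0.02)
        stats["in_flight"] -= 1

    async def crawl_user(user_index: int):
        async with limits.users:
            stats["users"] += 1
            stats["peak_users"] = max(stats["peak_users"], stats["users"])
            await asyncio.gather(*(limits.call(fake_request)
                                   for _ in range(requests_per_user)))
            stats["users"] -= 1

    start = time.monotonic()
    await asyncio.gather(*(crawl_user(u) for u in range(n_users)))
    return stats, time.monotonic() - start


if __name__ == "__main__":
    stats, elapsed = asyncio.run(demo())
    print(stats, f"elapsed={elapsed:.2f}s")
```

With 24 requests and at most 5 grants per 0.5 s window, the last grant lands at least 2 s after the first, so the demo takes at least that long. Taking the rate slot before the in-flight slot means a request that is waiting on the rate window does not sit on an in-flight slot.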
Despite the care I have taken to stay within the required limits, I am seeing errors that I can only guess are caused by usage limits. Examples:
- I receive seemingly random 400 errors on requests to 'drives/{drive_id}/items/{item_id}/children'
- Other endpoints may be affected too, but this is the one I call most often, so that's where I see the error
- Once a 400 error like this happens, I can retry as long as I want (even resetting the session), but it doesn't clear up
- I can later make the exact same request and the 400 error will be gone
- All requests are formatted the same way, so I don't see why these would behave any differently
- These errors give no reason beyond "Invalid request":
message='{"error":{"code":"invalidRequest","message":"Invalid request"}}', url=URL('https://graph.microsoft.com/v1.0/drives/XXXX/items/XXXX/children?$select=id,name,parentReference,folder,file,filesysteminfo,size')
- Other times I get 503 errors telling me to wait 120 seconds. I apply my backoff strategy, starting at 120 seconds and increasing the wait over time. These errors never clear up either; the requests eventually fail once the maximum retry count is hit, sometimes after waiting 10 minutes or more between attempts.
- Sometimes the `retryAfterSeconds` entry is in the JSON body of the 503 response rather than in the header, where the documentation says it should be.
- I have now also received the error [WinError 10054] An existing connection was forcibly closed by the remote host. I haven't looked into this one yet.
- I have also seen [WinError 64] The specified network name is no longer available
I am using Python with aiohttp to make the requests. I'm not using the SDK because it was less reliable than what I have been able to achieve on my own. Thinking the problem might be with the connection, I changed the app to create a new aiohttp session for every request, but the error rate has not changed.
Does the Graph API have any undocumented request limits I might be hitting? Given the care I'm taking to follow the rules, what might I still be doing wrong?
If I can't resolve the issue here, is opening a support ticket the best next step?
Thanks in advance for any insight you might have.
~Sean