MSGraph Instability - Random 503, 400 errors and getting forcibly dropped

Question

Hello,

I have written a Graph API app that visits each directory in given users' OneDrive and reports statistics about the folders and files. It makes calls to the API asynchronously, but I have been careful to write it with throttling and usage limits in mind. The three limits I am using are 1) only process a set number of users at once, 2) throttle requests to a maximum number of requests per minute, and 3) cap the total number of requests in flight at any one time. I have real-time metrics to verify this is happening as expected. As required, I adhere to all the throttling requirements returned from the API: 1) I read the headers of each response and, if a Retry-After header exists, I sleep for that length of time. 2) If I receive a 429 or 503 error, I sleep for the given amount of time or more. I retry with exponential backoff, sometimes waiting five minutes or more between retries.
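For readers who want to reproduce the setup, here is a minimal sketch of limits 2 and 3 plus the backoff rule described above. It is not the code from my app; `RequestGate`, `backoff_delay`, and all the default numbers are hypothetical names and values chosen for illustration. Limit 1 (users processed at once) could be a second semaphore around the per-user task.

```python
import asyncio
import random
import time
from typing import Optional


class RequestGate:
    """Caps in-flight requests and spaces request starts to a per-minute budget."""

    def __init__(self, max_in_flight: int, max_per_minute: int) -> None:
        self._slots = asyncio.Semaphore(max_in_flight)
        self._interval = 60.0 / max_per_minute  # minimum spacing between starts
        self._next_start = 0.0
        self._lock = asyncio.Lock()

    async def __aenter__(self) -> "RequestGate":
        await self._slots.acquire()
        async with self._lock:
            # Reserve the next free start time, then wait for it outside the lock.
            now = time.monotonic()
            wait = max(0.0, self._next_start - now)
            self._next_start = max(now, self._next_start) + self._interval
        if wait:
            await asyncio.sleep(wait)
        return self

    async def __aexit__(self, *exc_info) -> None:
        self._slots.release()


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 2.0, cap: float = 600.0) -> float:
    """Exponential backoff with jitter that never undercuts the server's hint."""
    delay = min(cap, base * (2 ** attempt))
    delay = random.uniform(delay / 2, delay)
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay
```

Each request would then run inside `async with gate:`, and a 429 or 503 would sleep for `backoff_delay(attempt, retry_after)` before the next try.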

Even given the care I have used to stay within the required limits, I am seeing errors that I can only guess can be attributed to usage limits. Examples:

I will receive random 400 errors on requests to 'drives/{drive_id}/items/{item_id}/children'
1. There might be other endpoints, but this is the one I request most often, so that's where I see the error
2. Once a 400 error like this happens, I can retry as long as I want (even resetting the session), but it won't clear up
3. I can later make the same exact request and the 400 error will be gone
4. All requests are being formatted the same, so I don't see a reason for these to be any different.
5. These don't give a detailed reason other than "Invalid request"
```
      message='{"error":{"code":"invalidRequest","message":"Invalid request"}}', url=URL('https://graph.microsoft.com/v1.0/drives/XXXX/items/XXXX/children?$select=id,name,parentReference,folder,file,filesysteminfo,size')
```
Other times I will get 503 errors that tell me to wait for 120 seconds. I use my backoff strategy, starting at 120 seconds and increasing the wait over time. These errors never clear up either, and the request fails once the maximum retry count is hit, sometimes after waiting 10 minutes or more between requests.
1. Sometimes the retryAfterSeconds entry is in the response JSON of the 503 and not in the header as documented.

I have now also received the error "[WinError 10054] An existing connection was forcibly closed by the remote host". I haven't looked into this one yet.

I have also seen "[WinError 64] The specified network name is no longer available".

I am using Python with aiohttp to make the requests. I'm not using the SDK, as it was less reliable than what I have been able to achieve on my own. Thinking that the problem might be with the connection, I set the app up to create a new aiohttp session on every request, but the error rate has not changed.
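For the dropped connections, the WinError messages are Python `OSError` instances (10054 is raised as the built-in `ConnectionResetError`), so they can be retried separately from HTTP status errors. A rough sketch, where `retry_on_drop` and its defaults are hypothetical and `call` stands for whatever coroutine function performs one request:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_on_drop(call: Callable[[], Awaitable[T]],
                        attempts: int = 5, base_delay: float = 1.0) -> T:
    """Retry `call` when the connection is dropped, doubling the wait each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await call()
        except OSError:
            # ConnectionResetError and other socket-level failures are OSError subclasses.
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error to the caller
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

This only hides transient drops; it would not explain why the remote host is closing the connection in the first place.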

Does the Graph API have any undocumented request limits I might be hitting? Given the care I'm using to follow the rules, what might I still be doing wrong?

If I can't resolve the issue here, is opening a ticket the best next route to get help with this?

Thanks in advance for any insight you might have.

~Sean

Answer

Hi @Sean Dizazzo,

Thank you for posting in this community.

Based on your detailed case description and the research we have done, the issue appears complicated enough that we did not find any relevant troubleshooting methods. We recommend that you open a support ticket to have it investigated.

If the answer is helpful, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.
