Gateway Timeout on auditLogs/signIns

Question

Anonymous

When calling the auditLogs/signIns endpoint with paging via $top and $skiptoken, some applications return a 504 Gateway Timeout error after dozens of pages have been retrieved successfully.

Once we get this error, retrying with the same URL returns a 400 with "Skip Token is null". By contrast, when we get a 429 error, we are able to retry with the same URL after the throttling period has passed.

We have tried using smaller page sizes (as low as 500 records) but still run into the same problem for the same applications.

Any idea what could be causing this issue?

Perhaps a certain signin record has more data than others and leads to a timeout?

Also, is there a way to limit the data returned per sign-in? All we really need is the date of the sign-in and a user ID, but the $select option is not supported.

Here are the headers and response from the Gateway Timeout error:
URL: https://graph.microsoft.com/beta/auditLogs/signIns?$filter=createdDateTime+ge+2022-03-16+and+appId+eq+%<<appid>>%27&$top=500&$skiptoken=47d96df265450ef838de40dac783215e22736d0aede079fb5f631fe09364a515

"request-id" "de8908f3-f79e-4f1f-b672-6977089b9329"
client-request-id "de8908f3-f79e-4f1f-b672-6977089b9329"
"x-ms-ags-diagnostic" {"ServerInfo":{"DataCenter":"West US 2","Slice":"E","Ring":"1","ScaleUnit":"002","RoleInstance":"MWH0EPF0005A6AE"}}
"Date" Fri, 15 Apr 2022 03:29:02 GMT

Response Body:
{"error":{"code":"UnknownError","message":"","innerError":{"date":"2022-04-15T03:29:02","request-id":"de8908f3-f79e-4f1f-b672-6977089b9329","client-request-id":"de8908f3-f79e-4f1f-b672-6977089b9329"}}}

And then, if we retry with the same URL:
{"error":{"code":"UnknownError","message":"Invalid Skip Token, skip token is null","innerError":{"date":"2022-04-15T03:36:00","request-id":"c48d64c7-8cd7-487a-901e-b06df5090161","client-request-id":"c48d64c7-8cd7-487a-901e-b06df5090161"}}}

This was on the beta API, but we experienced the same behavior on the v1.0 version as well.

We do use the Retry-After header and back off calls for the interval given. It doesn't seem like a throttling issue so much as a problem with a specific page load.
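The behavior described above (a 429 can be retried on the same URL, while the URL that returned a 504 later fails with a 400 about the skip token) can be sketched as a small decision function. This is only an illustration: the function name and the action strings are hypothetical, and restarting the query after an invalid skip token is an assumption, not something the thread confirms as the right fix.

```python
def next_action(status, error_message=""):
    """Map an HTTP status (and Graph error message) to a paging action.

    The action names are made up for this sketch; they are not Graph API terms.
    """
    if 200 <= status < 300:
        return "continue"                 # page retrieved, follow the next link
    if status == 429:
        return "wait_and_retry_same_url"  # same URL works after the Retry-After period
    if status == 504:
        return "backoff_then_retry_once"  # the skip token may be rejected afterwards
    if status == 400 and "skip token" in error_message.lower():
        return "restart_query"            # assumption: the token is gone, start over
    return "fail"
```

For example, `next_action(400, "Invalid Skip Token, skip token is null")` returns `"restart_query"`, so the caller knows not to keep retrying the same URL.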

Vicky Kumar (Mindtree Consulting PVT LTD) 1,161 Reputation points Microsoft Employee

2022-04-14T08:12:28.297+00:00

Could you please provide request id and timestamp of the error?
Anonymous

2022-04-17T00:35:22.21+00:00

Added the additional information in the question, thanks!

Answer 1

Vicky Kumar (Mindtree Consulting PVT LTD) 1,161 Microsoft Employee

Throttling behavior can depend on the type and number of requests. For example, if you have a high volume of requests, all request types are throttled. There may also be cases where you have consumed more than 0.8 of your limit; this is mentioned in the docs.

The following are best practices for handling throttling:

Reduce the number of operations per request.
Reduce the frequency of calls.
Avoid immediate retries, because all requests accrue against your usage limits.

Note: there are also service-specific limits.

For more information about throttling, please take a look at the docs: https://learn.microsoft.com/en-us/graph/throttling
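As a rough illustration of "avoid immediate retries", the wait before each retry could be computed as below. This is a minimal sketch, not an official recipe: the function name, the base delay, the cap, and the jitter amount are all assumptions, and only a Retry-After value given in seconds is handled.

```python
import random

def backoff_delay(attempt, retry_after=None, base=2.0, cap=120.0):
    """Return the number of seconds to wait before retry `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Prefer the interval the service asked for.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number of seconds; fall back to backoff
    # Exponential backoff, capped, plus random jitter of up to half the delay.
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, delay / 2)
```

A caller would typically pass the response's Retry-After header value (if any) and sleep for the returned number of seconds before sending the request again.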

Vicky Kumar (Mindtree Consulting PVT LTD) 1,161 Reputation points Microsoft Employee

2022-04-19T10:22:21.543+00:00

Hi AndrewBatchelor-3038 ,

Have you had time to check the docs above? Please let me know if you have any questions and I can help you further!

If this answer helped you, please mark it as "Verified" so other users can reference it.
Anonymous

2022-04-19T14:26:54.303+00:00

We've implemented Retry-After handling (and add progressive backoff on top of that). I did not receive the x-ms-resource-unit or x-ms-throttle-limit-percentage headers in any of the responses. We are able to pull 50+ pages, with page sizes ranging from 100 to 1000, but seem to get the 504 error on what could be a specific record. We use the same logic successfully on hundreds of different applications within the same tenant, and across dozens of other tenants.

I've opened a ticket via our Azure subscription support so we can hopefully get more feedback on what the specific error is.

Answer 2

I know this thread is old, but since this post is the top search result, I wanted to share some information here for anyone encountering this problem.

If you're suddenly facing a 504 error, check your $filter and $top parameters. It's likely that your $filter is so specific that there aren't enough matching records to reach the $top value, causing the compute that backs the Graph API to search through EVERY row of data.

This extensive search may not complete within the ~2 minute timeout, resulting in a 504 Gateway Timeout. Graph API seems to store only the last 30 days of sign-ins, so it's possible that your API calls work perfectly on some days and fail repeatedly on others.

In my case, I had set $filter=appDisplayName eq 'oneOfMyOrganizationsAppName' and $top=100, but it started timing out one day. While debugging, I noticed the 504 error persisted even as I reduced $top to 20 and then to 10. It finally worked when I reduced $top to 2, but this made filtering by application id/name completely impractical. Since there are periods when users rarely sign in to the applications we were tracking, I decided to refactor my code to remove the application-specific filters from the Graph API $filter parameter and handle the filtering on our backend compute instead. Since then, I haven't had any issues.
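A rough sketch of what that backend-side filtering could look like is below. It assumes the sign-in records have already been fetched and parsed into dicts, and that they carry the `appDisplayName`, `createdDateTime`, and `userId` properties; the function name is hypothetical and this is not the code actually used here.

```python
def signins_for_app(records, app_name):
    """Yield (createdDateTime, userId) for sign-ins to one application.

    `records` is any iterable of sign-in dicts fetched without an app filter.
    """
    for record in records:
        if record.get("appDisplayName") == app_name:
            # Keep only the two fields of interest and drop the rest.
            yield record.get("createdDateTime"), record.get("userId")
```

Because it only keeps the date and the user ID, the same loop also covers the case from the question where $select is not available and the rest of each record is not needed.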

Your API calls would probably still be okay when filtering by application id/name in the API parameters if you also filter using a very recent createdDateTime value (e.g., $filter=createdDateTime ge <ISO-8601 datetime for 6 hours ago> and appDisplayName eq '<app name>').
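Building such a URL might look like the sketch below. The function name, the default window of 6 hours, and the default $top are assumptions for illustration; doubling single quotes inside the app name follows the usual OData string-literal convention.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "https://graph.microsoft.com/v1.0/auditLogs/signIns"

def build_signins_url(app_name, hours=6, top=100, now=None):
    """Build a signIns URL limited to the last `hours` hours for one app."""
    now = now or datetime.now(timezone.utc)
    since = (now - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    safe_name = app_name.replace("'", "''")  # escape quotes for the OData literal
    flt = f"createdDateTime ge {since} and appDisplayName eq '{safe_name}'"
    # Keep "$" readable in the parameter names; everything else is URL-encoded.
    return BASE_URL + "?" + urlencode({"$filter": flt, "$top": top}, safe="$")
```

Calling `build_signins_url("<app name>")` gives a URL whose $filter combines the recent createdDateTime bound with the appDisplayName condition, so the narrow app filter only has to scan a few hours of data.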

PC 0 Reputation points

2024-07-10T17:13:42.99+00:00

Woops, reposted above!
