Managed identity not working on one instance of a Windows VMSS

Ian Ferguson 1 Reputation point
2019-12-12T12:35:46.66+00:00

We have a Windows VMSS with a system-assigned managed identity enabled (the VMSS is used for a Service Fabric cluster).
We have applications running on the instances that connect to external dependencies using the managed identity.
We observe that apps running on one particular instance are unable to authorize with these external dependencies.
Restarting the app on the same instance leads to the same failure. However, after moving the app to another instance, it connects successfully.

I have used Azure Cloud Shell to inspect the managed identity of the VMSS and of the individual instances:

az vmss show --name {redacted} --resource-group {redacted}  
az vmss show --name {redacted} --resource-group {redacted} --instance-id 0 #the faulty instance  
az vmss show --name {redacted} --resource-group {redacted} --instance-id 1  
etc  

I see that all instances report the same principal ID / managed identity as the VMSS (as they should).

However, I then ran the following PowerShell command on the VMSS instances while logged in over RDP, to check the managed identity against the Azure resources REST endpoint
(as described in this article: https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token#get-a-token-using-azure-powershell ):

Invoke-WebRequest -Uri 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F' -Headers @{Metadata="true"}  

The above command returns an error on the faulty instance ('Unable to connect to the remote server'). On the other instances, the same command authenticates and obtains a token.
I can't understand why one scale set instance should exhibit different networking behavior from the other instances when all are subject to the same networking policies.

Restarting or reimaging may fix it, but I would like to understand what has happened, and what may have caused it, so that I can prevent it from happening again.

Thanks in advance

Azure Virtual Machine Scale Sets

1 answer

  1. Vahid Ghafarpour 17,875 Reputation points
    2023-08-21T06:07:32.82+00:00

    The error message "Unable to connect to the remote server" suggests that the instance cannot reach the Azure Instance Metadata Service (IMDS) endpoint at http://169.254.169.254. This could be due to network issues, a firewall, or a proxy configuration on that instance. Because 169.254.169.254 is a link-local address that is only reachable from inside the VM, check the instance itself: confirm that no local firewall rule, proxy setting, or routing change blocks or redirects traffic to that address.
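The checks suggested above can be run on each instance from Azure Cloud Shell, which makes it easier to compare the faulty instance with a healthy one. The following is a minimal sketch, not a definitive procedure. It assumes the VM agent on the instance is still responsive (run-command depends on it), and `check_imds` is a hypothetical helper name. The PowerShell lines passed to `--scripts` are only one possible set of diagnostics.

```shell
#!/usr/bin/env bash
# Sketch: run basic IMDS reachability diagnostics on one VMSS instance.
# check_imds is a hypothetical helper; its arguments are the resource group,
# the scale set name, and the instance ID.
check_imds() {
  local rg="$1" vmss="$2" instance_id="$3"

  # PowerShell lines executed on the instance:
  #  - Test-NetConnection: can the instance open TCP port 80 on the IMDS address?
  #  - netsh winhttp show proxy: is a system-wide WinHTTP proxy configured?
  #  - Get-NetRoute: is there a host route for the IMDS address?
  az vmss run-command invoke \
    --resource-group "$rg" \
    --name "$vmss" \
    --instance-id "$instance_id" \
    --command-id RunPowerShellScript \
    --scripts 'Test-NetConnection -ComputerName 169.254.169.254 -Port 80' \
              'netsh winhttp show proxy' \
              'Get-NetRoute -DestinationPrefix 169.254.169.254/32 -ErrorAction SilentlyContinue'
}

# Example usage (placeholders, not real resource names):
#   check_imds <resource-group> <vmss-name> 0   # the faulty instance
#   check_imds <resource-group> <vmss-name> 1   # a healthy instance
```

Comparing the output for the two instances may show whether the difference is a blocked port, a proxy setting, or a missing route. If the faulty instance cannot run the command at all, that in itself points to a problem with the instance rather than with the managed identity.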
