Hi Dominic Archual,
Thanks for reaching out to Microsoft Q&A.
This token error is typically caused by a mismatch in the identity context between the pipeline's managed environment and the serverless compute that runs train.py.
I wouldn't call the following a guaranteed solution, but here are a few steps you can try to resolve this:
- Specify Client ID in Managed Identity: As per your current configuration, ensure the client_id is specified when using ManagedIdentityCredential in train.py, and verify that the assigned managed identity on the serverless compute has the necessary permissions for the Azure ML workspace and associated resources.
```python
from azure.identity import ManagedIdentityCredential

# Client ID of the user-assigned managed identity (replace the placeholder)
credential = ManagedIdentityCredential(client_id="CLIENT_ID_OF_MANAGED_IDENTITY_ASSIGNED_TO_WORKSPACE")
```
- Use ChainedTokenCredential: ChainedTokenCredential (in azure.identity) tries a list of credentials in order and uses the first one that returns a token. This might help if there are transient issues with ManagedIdentityCredential.
```python
from azure.identity import ChainedTokenCredential, ManagedIdentityCredential, DefaultAzureCredential

# Try the user-assigned managed identity first; fall back to DefaultAzureCredential if it is unavailable
credential = ChainedTokenCredential(
    ManagedIdentityCredential(client_id="CLIENT_ID_OF_MANAGED_IDENTITY_ASSIGNED_TO_WORKSPACE"),
    DefaultAzureCredential(),
)
```
- Environment Variables: If you rely on DefaultAzureCredential, double-check that AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_CLIENT_SECRET are set on the serverless compute environment. DefaultAzureCredential reads these in its EnvironmentCredential step to authenticate as a service principal, which might resolve some token acquisition issues.
- Authentication Timeout: If the training step takes a long time to start, authentication might time out. You could try running a pre-check step that requests a token before starting train.py, so you can confirm the credential works before training begins.
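For the environment-variable check, a small diagnostic at the top of train.py can log which of the three service-principal variables are missing on the serverless compute. This is only a sketch: the helper name `missing_auth_env_vars` is hypothetical, and it reports variable names only, never values, so the client secret does not end up in job logs.

```python
import os
from typing import List, Mapping, Optional

# Variables that DefaultAzureCredential's EnvironmentCredential step reads
# for service-principal (client secret) authentication.
SERVICE_PRINCIPAL_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_auth_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of service-principal variables that are unset or empty.

    Hypothetical helper: only names are returned, never values.
    """
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]
```

Calling `print(missing_auth_env_vars())` early in train.py should show in the job output whether the variables actually reached the serverless compute.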
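The authentication pre-check could look roughly like the sketch below. It is a hypothetical helper (`precheck_token`), not an Azure API: it accepts any credential object with a `get_token(*scopes)` method (such as `ManagedIdentityCredential` from azure.identity), requests a token for the Azure Resource Manager scope, and retries with exponential backoff. The scope, the 3 attempts, and the 2-second base delay are assumptions you may need to adjust for your workspace.

```python
import time
from typing import Any, Callable, Optional

# Assumed scope: Azure Resource Manager, which Azure ML control-plane calls use.
ARM_SCOPE = "https://management.azure.com/.default"


def precheck_token(
    credential: Any,
    scope: str = ARM_SCOPE,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Acquire a token before training starts so auth problems fail fast.

    `credential` is any object with get_token(*scopes). Waits base_delay,
    2 * base_delay, ... between failed attempts and raises RuntimeError
    (chained to the last error) if every attempt fails.
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return credential.get_token(scope)
        except Exception as exc:  # deliberately broad: this is a diagnostic pre-check
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Token pre-check failed after {attempts} attempts") from last_error
```

In a pipeline you might call `precheck_token(ManagedIdentityCredential(client_id="CLIENT_ID_OF_MANAGED_IDENTITY_ASSIGNED_TO_WORKSPACE"))` in a small step before train.py, or as the first thing train.py does. The `sleep` parameter exists only so the retry timing can be tested without waiting.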
If these steps don’t resolve the issue, check whether the GitHub link you found suggests using a different compute type or upgrading the SDK versions; either might work around an underlying issue in Managed Identity support on serverless compute.
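If you try the SDK upgrade route, it helps to log which versions the serverless job actually runs with, since they can differ from your local environment. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper name `sdk_versions` is hypothetical, and the package list (azure-identity, azure-ai-ml) is an assumption about which SDKs your job uses.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable

# Assumed packages of interest; adjust to whatever your job environment installs.
DEFAULT_PACKAGES = ("azure-identity", "azure-ai-ml")


def sdk_versions(packages: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    report: Dict[str, str] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report
```

Printing `sdk_versions()` at the start of train.py, before and after changing the versions pinned in your job environment, makes it easy to confirm the upgrade actually took effect on the compute.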
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.