Hello Elias Asikainen,
Welcome to Microsoft Q&A, and thank you for reaching out.
If Machine Learning workspace run status changed events are no longer being delivered to Event Grid, and this was previously working, the issue is typically related to event type expectations, subscription configuration, endpoint health, or identity/network changes.
Steps to troubleshoot the issue:
1. Understand What RunStatusChanged Actually Emits
Microsoft.MachineLearningServices.RunStatusChanged currently fires only when a run transitions into a Failed state.
If your runs are completing successfully or being cancelled normally, you will not receive this event.
If you need notification for successful completion as well, make sure you also subscribe to:
Microsoft.MachineLearningServices.RunCompleted
This is a very common misunderstanding and often the root cause.
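To illustrate why subscribing to both event types matters, here is a minimal sketch of a handler that routes delivered events by type. It assumes the Event Grid event schema (a JSON array of events with `eventType` and `data` fields) and the `runId` / `runStatus` data fields described in the Machine Learning event schema article. The function name and the returned labels are hypothetical.

```python
from typing import Any

RUN_STATUS_CHANGED = "Microsoft.MachineLearningServices.RunStatusChanged"
RUN_COMPLETED = "Microsoft.MachineLearningServices.RunCompleted"


def classify_run_events(events: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (run_id, label) pairs for the run events in a delivered batch."""
    results = []
    for event in events:
        event_type = event.get("eventType", "")
        data = event.get("data") or {}
        run_id = data.get("runId", "<unknown>")
        if event_type == RUN_COMPLETED:
            results.append((run_id, "completed"))
        elif event_type == RUN_STATUS_CHANGED:
            # As noted above, this currently arrives only for failed runs.
            status = data.get("runStatus", "<unknown>")
            results.append((run_id, f"status-changed:{status}"))
        # Any other event type is ignored in this sketch.
    return results
```

A handler subscribed only to RunStatusChanged would never reach the "completed" branch, which is the situation described above.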
2. Verify the Event Subscription Configuration
Go to Azure Portal → Machine Learning Workspace → Events → Event Subscriptions
Check:
- The subscription status is Succeeded
- The correct event types are selected:
  - RunStatusChanged
  - (Optional but recommended) RunCompleted
- No advanced filters are unintentionally filtering events
- The subscription hasn’t expired (webhook-based subscriptions can expire)
Also review the delivery status in the Metrics blade:
- Published Events
- Matched Events
- Failed Deliveries
If Published Events = 0, the workspace is not emitting events to the topic. If Published Events > 0 but Matched Events = 0, the event types or filters on the subscription are excluding them.
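The way these three metrics narrow down the problem can be sketched as a small decision function. This is only an illustration: the function name and messages are hypothetical, and the counts are assumed to be read manually from the Metrics blade for the same time window.

```python
def diagnose_metrics(published: int, matched: int, failed: int) -> str:
    """Map Event Grid metric counts to the most likely problem area."""
    if published == 0:
        # Nothing reached the topic, so subscription filters are not the issue yet.
        return "not published: the workspace is not emitting events to the topic"
    if matched == 0:
        # Events exist, but no subscription selected them.
        return "not matched: check event types and filters on the subscription"
    if failed > 0:
        # Events matched a subscription, but the endpoint rejected or missed them.
        return "delivery failing: check endpoint health, networking, and authentication"
    return "healthy: events are published, matched, and delivered"
```

For example, `diagnose_metrics(12, 0, 0)` points at the subscription filters, while `diagnose_metrics(0, 0, 0)` points at the workspace or system topic.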
3. Validate Endpoint Health & Networking
If your subscription delivers to:
- Webhook
- Azure Function
- Logic App
- Event Hub
- Service Bus
Verify:
- Endpoint is reachable
- No recent networking changes (VNET, private endpoint, firewall rules)
- No authentication changes (key rotation, managed identity change)
- Endpoint isn’t returning 401/403/500 errors
In Event Grid:
- Check Delivery Logs
- Confirm validation handshake is successful
- If validation status shows red, recreate or revalidate the subscription
Event Grid retries for a limited time (by default, up to 24 hours or 30 delivery attempts) before dropping events, or dead-lettering them if a dead-letter destination is configured.
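For a custom webhook endpoint, the validation handshake mentioned above means answering a `Microsoft.EventGrid.SubscriptionValidationEvent` by echoing its `validationCode` back in a `validationResponse` field. The sketch below shows only that logic, independent of any web framework. The function name is hypothetical, and it assumes the Event Grid event schema rather than CloudEvents. Azure Functions with the Event Grid trigger and Logic Apps normally handle this step for you.

```python
import json
from typing import Optional

VALIDATION_EVENT = "Microsoft.EventGrid.SubscriptionValidationEvent"


def validation_response(body: str) -> Optional[dict[str, str]]:
    """Return the handshake reply for a validation request, or None otherwise."""
    events = json.loads(body)
    if isinstance(events, dict):
        # Tolerate a single event object as well as the usual JSON array.
        events = [events]
    for event in events:
        if event.get("eventType") == VALIDATION_EVENT:
            code = (event.get("data") or {}).get("validationCode")
            if code:
                # The webhook should return this as JSON with HTTP 200.
                return {"validationResponse": code}
    return None
```

If the endpoint stopped returning this reply (for example after a redeployment), the subscription validation would fail and no events would be delivered.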
4. Test with a Failing Run
To confirm the pipeline:
- Trigger a test run that you intentionally force to fail.
- Monitor Event Grid metrics in real time.
If you receive RunStatusChanged, then:
- The event system is working correctly.
- The issue was expectation-related (only fires on failure).
If you still receive nothing, the issue is likely subscription-level or workspace-level.
5. Confirm Events Are Being Published
Go to Azure Portal → Event Grid → System Topics
Check:
- Is there a system topic associated with your ML workspace?
- Are Published Events increasing?
If Published Events = 0 even for failing runs:
- The workspace may have been recreated (new resource ID)
- The event subscription may be pointing to an old resource
- There may be a service-side issue (check Azure Service Health)
If the workspace was deleted and recreated (even with the same name), the old event subscription will stop working.
6. Identity / RBAC Changes
If you are using Managed Identity, a SAS token, or AAD authentication, verify:
- Required RBAC roles still exist
- Managed identity wasn’t recreated
- Keys weren’t rotated
- No permission changes occurred
Silent RBAC changes are a very common cause of event delivery failures.
7. Enable Diagnostics for Deeper Insight
Enable diagnostic logs on the Event Grid topic:
- Send logs to Log Analytics or Storage Account
- Review delivery failures and retry attempts
You can also inspect the configuration and delivery settings of the subscription with:
az eventgrid event-subscription show --name <subscription-name> --source-resource-id <workspace-resource-id>
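The JSON returned by that command can be checked automatically for the configuration problems listed in step 2. The sketch below is one way to do it. It assumes the output contains `provisioningState`, `expirationTimeUtc`, and `filter.includedEventTypes` / `filter.advancedFilters`, as in the Event Grid subscription resource; verify these field names against your own output. The function name and messages are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional

EXPECTED_TYPES = {
    "Microsoft.MachineLearningServices.RunStatusChanged",
    "Microsoft.MachineLearningServices.RunCompleted",
}


def review_subscription(sub: dict[str, Any], now: Optional[datetime] = None) -> list[str]:
    """Return human-readable warnings for a parsed event subscription."""
    now = now or datetime.now(timezone.utc)
    warnings = []

    state = sub.get("provisioningState")
    if state != "Succeeded":
        warnings.append(f"provisioningState is {state!r}, expected 'Succeeded'")

    event_filter = sub.get("filter") or {}
    included = set(event_filter.get("includedEventTypes") or [])
    # An empty or missing list is treated as "no explicit type filter" here.
    missing = sorted(EXPECTED_TYPES - included) if included else []
    if missing:
        warnings.append("event types not selected: " + ", ".join(missing))
    if event_filter.get("advancedFilters"):
        warnings.append("advanced filters are set: confirm they are intended")

    raw_expiry = sub.get("expirationTimeUtc")
    if raw_expiry:
        try:
            expires = datetime.fromisoformat(raw_expiry.replace("Z", "+00:00"))
            if expires.tzinfo is None:
                expires = expires.replace(tzinfo=timezone.utc)  # assume UTC
            if expires <= now:
                warnings.append(f"subscription expired at {raw_expiry}")
        except ValueError:
            warnings.append(f"could not parse expirationTimeUtc: {raw_expiry}")
    return warnings
```

One way to use it is to save the command output to a file, load it with `json.load`, and pass the resulting dictionary to `review_subscription`. An empty list means none of these particular checks found a problem.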
Isolation Test
Create a new temporary event subscription:
- Point it to a simple Azure Function or Event Hub
- Subscribe to both:
  - RunStatusChanged
  - RunCompleted
- Trigger a failing run
If the new subscription works → the issue is endpoint-related
If not → the issue is workspace/system topic-related
Common Root Causes
In real-world cases, the issue is usually:
- Expecting RunStatusChanged for successful runs
- Endpoint authentication change
- Workspace recreated (new resource ID)
- Event subscription expired
- Networking changes (private endpoints/firewalls)
- RBAC changes
Please refer to these resources:
https://learn.microsoft.com/azure/event-grid/event-schema-machine-learning
Troubleshooting common Event Grid issues - https://learn.microsoft.com/azure/event-grid/troubleshoot-errors
I hope this helps. Do let me know if you have any further queries.
Thank you!