
Event Grid of Machine Learning workspace not receiving events

Elias Asikainen 50 Reputation points
2026-03-02T09:02:53.7366667+00:00

Machine Learning workspace RunStatusChanged events are no longer being sent to Event Grid.

Azure Machine Learning

1 answer

  1. SRILAKSHMI C 14,820 Reputation points Microsoft External Staff Moderator
    2026-03-02T11:18:15.9866667+00:00

    Hello Elias Asikainen,

    Welcome to Microsoft Q&A, and thank you for reaching out.

    If Machine Learning workspace run status changed events are no longer being delivered to Event Grid, and this was previously working, the issue is typically related to event type expectations, subscription configuration, endpoint health, or identity/network changes.

    Steps to troubleshoot the issue:

    1. Understand What RunStatusChanged Actually Emits

    Microsoft.MachineLearningServices.RunStatusChanged currently fires only when a run transitions into a Failed state.

    If your runs are completing successfully or being cancelled normally, you will not receive this event.

    If you need notification for successful completion as well, make sure you also subscribe to:

    Microsoft.MachineLearningServices.RunCompleted

    This is a very common misunderstanding and often the root cause.
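To make the distinction between the two event types concrete, here is a minimal Python sketch of a handler that routes on the event type. It assumes the default Event Grid event schema (top-level `eventType` and `data` fields) and that the run payload carries `runId` and `runStatus` fields. The helper name `describe_event` and the label strings are hypothetical and only illustrate the idea.

```python
RUN_STATUS_CHANGED = "Microsoft.MachineLearningServices.RunStatusChanged"
RUN_COMPLETED = "Microsoft.MachineLearningServices.RunCompleted"


def describe_event(event: dict) -> str:
    """Return a short label for a Machine Learning run event (hypothetical helper)."""
    event_type = event.get("eventType", "")
    data = event.get("data") or {}
    run_id = data.get("runId", "<unknown run>")

    if event_type == RUN_STATUS_CHANGED:
        # Assumption: the new status is exposed as data["runStatus"].
        status = data.get("runStatus", "<unknown status>")
        return f"status-changed:{run_id}:{status}"
    if event_type == RUN_COMPLETED:
        return f"completed:{run_id}"
    # Anything else is not a run event this sketch cares about.
    return f"ignored:{event_type}"
```

Logging the label for every incoming event is a quick way to see which of the two event types your endpoint actually receives.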

    2. Verify the Event Subscription Configuration

    Go to Azure Portal → Machine Learning Workspace → Events → Event Subscriptions

    Check the following:

    The subscription status is Succeeded

    The correct event types are selected:

    • RunStatusChanged
    • (Optional but recommended) RunCompleted

    No advanced filters are unintentionally filtering events

    The subscription hasn’t expired (webhook-based subscriptions can expire)

    Also review the delivery status on the Metrics blade:

    • Published Events
    • Matched Events
    • Failed Deliveries

    If Published Events = 0, events are not being emitted by the workspace. If Published Events > 0 but Matched Events = 0, filters on the subscription are excluding them.
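The way these three metrics narrow down the problem can be sketched as a small decision function. This is only an illustration, assuming that Published Events counts events reaching the topic, Matched Events counts those that pass the subscription's filters, and Failed Deliveries counts failed delivery attempts. The function name `diagnose` and the returned labels are hypothetical.

```python
def diagnose(published: int, matched: int, failed: int) -> str:
    """Map Event Grid metric counts to the most likely problem area (illustrative only)."""
    if published == 0:
        # Nothing reached the topic: the source is not emitting events.
        return "not-emitted"
    if matched == 0:
        # Events arrived but none passed the subscription's event type or advanced filters.
        return "filtered-out"
    if failed > 0:
        # Events matched but at least some delivery attempts failed at the endpoint.
        return "delivery-failing"
    return "healthy"
```

For example, `diagnose(published=12, matched=0, failed=0)` points at the subscription filters rather than at the workspace or the endpoint.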

    3. Validate Endpoint Health & Networking

    If your subscription delivers to a webhook, Azure Function, Logic App, Event Hub, or Service Bus, verify the following:

    The endpoint is reachable

    No recent networking changes (VNET, private endpoint, firewall rules)

    No authentication changes (key rotation, managed identity change)

    Endpoint isn’t returning 401/403/500 errors

    In Event Grid, check the delivery logs:

    Confirm validation handshake is successful

    If validation status shows red, recreate or revalidate the subscription

    Event Grid retries delivery for a limited time (by default up to 24 hours or 30 attempts) and then drops the event unless dead-lettering is configured.
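For webhook endpoints, the validation handshake mentioned above can be sketched as follows. This is a minimal, framework-independent example that assumes the Event Grid event schema, where the request body is a JSON array and the validation event carries a `validationCode` that the endpoint must echo back as `validationResponse`. The function name `handle_request` and its return shape (status code plus response body) are hypothetical.

```python
import json
from typing import Dict, Tuple

VALIDATION_EVENT = "Microsoft.EventGrid.SubscriptionValidationEvent"


def handle_request(body: str) -> Tuple[int, Dict[str, str]]:
    """Answer the subscription validation handshake, otherwise acknowledge the batch."""
    events = json.loads(body)  # Assumption: the body is a JSON array of events.
    for event in events:
        if event.get("eventType") == VALIDATION_EVENT:
            # Echo the code back so Event Grid marks the endpoint as validated.
            code = event["data"]["validationCode"]
            return 200, {"validationResponse": code}
    # Regular events: a real handler would process them here before acknowledging.
    return 200, {}
```

If your endpoint never returns the echoed code (for example, because an authentication layer rejects the request first), the subscription cannot be validated and no events are delivered.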

    4. Test with a Failing Run

    To confirm the pipeline:

    1. Trigger a test run that you intentionally force to fail.
    2. Monitor Event Grid metrics in real time.

    If you receive RunStatusChanged, the event system is working correctly and the issue was expectation-related (the event only fires on failure).

    If you still receive nothing, the issue is likely at the subscription level or the workspace level.

    5. Confirm Events Are Being Published

    Go to Azure Portal → Event Grid → System Topics

    Check the following:

    Is there a system topic associated with your ML workspace?

    Are Published Events increasing?

    If Published Events = 0 even for failing runs:

    The workspace may have been recreated (new resource ID)

    The event subscription may be pointing to an old resource

    There may be a service-side issue (check Azure Service Health)

    If the workspace was deleted and recreated (even with same name), the old event subscription will stop working.

    6. Identity / RBAC Changes

    If you are using a managed identity, SAS token, or AAD authentication, verify the following:

    The required RBAC roles still exist

    The managed identity wasn’t recreated

    Keys weren’t rotated

    No permission changes occurred

    Silent RBAC changes are a very common cause of event delivery failures.

    7. Enable Diagnostics for Deeper Insight

    Enable diagnostic logs on the Event Grid topic:

    Send logs to Log Analytics or Storage Account

    Review delivery failures and retry attempts

    You can also inspect the subscription's configuration and delivery settings with:

    az eventgrid event-subscription show --name <subscription-name> --source-resource-id <workspace-resource-id>
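If you save the JSON output of that command, a short script can check it against the items from step 2. The sketch below is an assumption-laden illustration: it expects the output to contain `provisioningState`, `filter.includedEventTypes`, `filter.advancedFilters`, and `expirationTimeUtc`, and it expects the expiration timestamp to be ISO 8601 with at most microsecond precision. The function name `check_subscription` and the problem messages are hypothetical.

```python
import json
from datetime import datetime, timezone


def check_subscription(raw_json: str, expected_types: set, now: datetime) -> list:
    """Return a list of likely problems found in an event subscription's JSON."""
    sub = json.loads(raw_json)
    problems = []

    if sub.get("provisioningState") != "Succeeded":
        problems.append("provisioning state is not Succeeded")

    filt = sub.get("filter") or {}
    included = filt.get("includedEventTypes")
    # Assumption: a missing/null list means no event type restriction.
    if included is not None:
        missing = expected_types - set(included)
        if missing:
            problems.append("missing event types: " + ", ".join(sorted(missing)))

    if filt.get("advancedFilters"):
        problems.append("advanced filters are configured; confirm they are intended")

    expiry = sub.get("expirationTimeUtc")
    if expiry:
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at <= now:
            problems.append("subscription has expired")

    return problems
```

Pass `datetime.now(timezone.utc)` as `now` and the set of event types you expect to receive; an empty list means none of these particular checks found a problem.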

    Isolation Test

    Create a new temporary event subscription:

    Point it to a simple Azure Function or Event Hub

    Subscribe to both:

    • RunStatusChanged
    • RunCompleted

    Trigger a failing run

    If the new subscription works → the issue is endpoint-related.

    If not → the issue is related to the workspace or system topic.

    Common Root Causes

    In real-world cases, the issue is usually:

    1. Expecting RunStatusChanged for successful runs
    2. Endpoint authentication change
    3. Workspace recreated (new resource ID)
    4. Event subscription expired
    5. Networking changes (private endpoints/firewalls)
    6. RBAC changes

    Please refer to these resources:

    https://learn.microsoft.com/azure/event-grid/event-schema-machine-learning

    https://learn.microsoft.com/azure/machine-learning/how-to-use-event-grid#consume-machine-learning-events

    Troubleshooting common Event Grid issues - https://learn.microsoft.com/azure/event-grid/troubleshoot-errors

    I hope this helps. Do let me know if you have any further queries.

    Thank you!

