What is the professional way to create an alert system to monitor machine learning costs?

ja 26 Reputation points

I am trying to come up with a system that can monitor the costs of machine learning computes and stop them if they exceed a pre-defined limit.

Right now, I am thinking of the plan below:

  1. Create a function app that accepts the resource group and subscription as query parameters and shuts down the computes available in a given workspace under that subscription.
  2. Create custom budgets and an action group for each resource group created by a user, using Terraform scripts. Reference the function app when creating the action group.
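A minimal sketch of the shutdown logic from step 1, kept independent of any Azure SDK so it can run locally. `ComputeClient`, `FakeClient`, `parse_request`, and `stop_running_computes` are hypothetical names, and the query parameter names and state strings are assumptions. In a real function app the client would wrap whichever SDK is used to list and stop computes in the workspace.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Protocol, Tuple


@dataclass
class Compute:
    name: str
    state: str  # assumed values such as "Running" or "Stopped"


class ComputeClient(Protocol):
    """Hypothetical interface; a real implementation would wrap an Azure ML client."""

    def list_computes(self) -> Iterable[Compute]: ...

    def stop(self, name: str) -> None: ...


def parse_request(params: Mapping[str, str]) -> Tuple[str, str]:
    """Validate the query parameters the function expects (names are assumed)."""
    required = ("subscription_id", "resource_group")
    missing = [key for key in required if not params.get(key)]
    if missing:
        raise ValueError("missing query parameters: " + ", ".join(missing))
    return params["subscription_id"], params["resource_group"]


def stop_running_computes(client: ComputeClient) -> list:
    """Stop every compute that reports a running state; return the names stopped."""
    stopped = []
    for compute in client.list_computes():
        if compute.state.lower() == "running":
            client.stop(compute.name)
            stopped.append(compute.name)
    return stopped


class FakeClient:
    """In-memory stand-in so the logic can be exercised without touching Azure."""

    def __init__(self, computes):
        self.computes = list(computes)

    def list_computes(self):
        return list(self.computes)

    def stop(self, name):
        for compute in self.computes:
            if compute.name == name:
                compute.state = "Stopped"


if __name__ == "__main__":
    demo = FakeClient([Compute("ci-dev", "Running"), Compute("ci-old", "Stopped")])
    print(stop_running_computes(demo))
```

Keeping the stop logic behind a small interface means the HTTP trigger only has to parse the request and build the client, and the same function can be unit tested with the fake before it is wired to an action group.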

However, I want to know whether this is how such a system is built in production, where the Azure portal is not used to configure all of this.

Initially, I thought of using Logic Apps instead of a function app, but the complexity of configuring them made me lean toward the function app.

Please suggest.

Azure Monitor
An Azure service that is used to collect, analyze, and act on telemetry data from Azure and on-premises environments.
Azure Functions
An Azure service that provides an event-driven serverless compute platform.
Azure Logic Apps
An Azure service that automates the access and use of data across clouds without writing code.

2 answers

  1. MughundhanRaveendran-MSFT 11,616 Reputation points Microsoft Employee

    Hi @ja ,

    It appears that you asked a similar question in the post below, and I have answered both questions there. Please comment on that post if you have any queries or concerns.



  2. Andrew Blumhardt 6,636 Reputation points Microsoft Employee

    It sounds like you want to create a solution that can be deployed as an ARM template. I think you are on the right track.

    I would start by creating a working solution in the portal before creating the ARM template(s).

    Use budget alerts with an action group that runs the shutdown logic app or function. I recommend basing the budget alert on a tag that can be assigned to ML compute resources. Logic Apps will be easier if you are unfamiliar with function development.
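The tag-based budget suggested in the answer above can be sketched as a request body built in code. This is only an illustration: the field names follow my understanding of the Azure Consumption Budgets REST API and should be checked against the current API reference before use. `build_budget_body`, the tag name, and the action group ID are hypothetical.

```python
import json


def build_budget_body(amount, tag_name, tag_values, action_group_id,
                      start_date, threshold_percent=80.0):
    """Build a budget definition filtered by tag that notifies an action group.

    Field names are assumed from the Consumption Budgets REST API; verify them
    against the official reference before deploying.
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    if not tag_values:
        raise ValueError("at least one tag value is required")
    return {
        "properties": {
            "category": "Cost",
            "amount": amount,
            "timeGrain": "Monthly",
            "timePeriod": {"startDate": start_date},
            # Only resources carrying this tag count toward the budget.
            "filter": {
                "tags": {
                    "name": tag_name,
                    "operator": "In",
                    "values": list(tag_values),
                }
            },
            "notifications": {
                "actualOverThreshold": {
                    "enabled": True,
                    "operator": "GreaterThan",
                    "threshold": threshold_percent,
                    "thresholdType": "Actual",
                    "contactEmails": [],
                    # The action group is what triggers the shutdown function.
                    "contactGroups": [action_group_id],
                }
            },
        }
    }


if __name__ == "__main__":
    body = build_budget_body(
        amount=500,
        tag_name="workload",           # hypothetical tag name
        tag_values=["ml-compute"],     # hypothetical tag value
        action_group_id="<action-group-resource-id>",  # placeholder
        start_date="2024-01-01T00:00:00Z",
    )
    print(json.dumps(body, indent=2))
```

The same values (amount, tag filter, threshold, action group) are what a Terraform or ARM definition of the budget would need, so a working body like this is a reasonable starting point before templating it.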

