Using MLOps to manage the AI lifecycle
MLOps helps teams manage the lifecycle of their AI projects.
AI projects do not follow a linear flow where each step happens only once. The AI project lifecycle is iterative in nature, and MLOps automation ensures that critical steps are performed consistently as the team iterates.
In addition, not all aspects of the AI lifecycle fit into a single part of the MLOps flow. For example, the AI solution likely depends on both external data sources and data generated by the solution itself (for example, through feedback loops or detected data drift).
This article examines different aspects of how MLOps is used to support the AI project lifecycle.
Monitoring
During training and deployment of ML models, it is imperative that teams can monitor the status of their ML tasks. Recommended options for monitoring include:
- The OpenCensus library is a useful, general-purpose monitoring tool.
- Azure Machine Learning (Azure ML) has built-in tools for deploying a model, and it provides multiple metrics by default that can be queried from Application Insights. These metrics are primarily performance related, such as response time, resource utilization, and failure rates. Supported deployment targets include:
  - Online Endpoints (real-time)
  - Web Service Endpoints on AKS or ACI (using the v1 SDK deployment tools)
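As a minimal illustration of the kinds of performance metrics mentioned above (response time and failure rate), the sketch below tracks them in-process. The class and method names are hypothetical and are not part of the Azure ML or OpenCensus APIs; a real deployment would rely on those tools instead.

```python
from dataclasses import dataclass, field

@dataclass
class EndpointMetrics:
    """Illustrative in-process tracker for the kinds of metrics an
    endpoint typically reports: response time and failure rate."""
    latencies_ms: list = field(default_factory=list)
    failures: int = 0

    def record(self, latency_ms: float, ok: bool) -> None:
        """Record one request's latency and whether it succeeded."""
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.failures += 1

    @property
    def avg_latency_ms(self) -> float:
        return sum(self.latencies_ms) / len(self.latencies_ms)

    @property
    def failure_rate(self) -> float:
        return self.failures / len(self.latencies_ms)

# Example: record two requests, one of which failed.
metrics = EndpointMetrics()
metrics.record(120.0, ok=True)
metrics.record(80.0, ok=False)
```

In practice these values would be emitted to a backend such as Application Insights rather than held in memory, so they can be queried and alerted on over time.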
Drift
Drift is the change in data over time that causes a deviation between the current distribution and the distribution of the data used to train the underlying model. This drift can result in models that no longer make accurate predictions on real-world data.
- See the drift and adaptation overview for a detailed treatment of drift and adaptation.
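One common way to detect the distribution shift described above is a two-sample statistical test. The sketch below uses SciPy's Kolmogorov-Smirnov test to compare a sample of training-time data against live data; the function name, sample sizes, and significance level are illustrative choices, not prescribed values.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_sample, live_sample, alpha: float = 0.05) -> bool:
    """Flag drift when a two-sample Kolmogorov-Smirnov test finds the
    live distribution significantly different from the training one."""
    statistic, p_value = ks_2samp(train_sample, live_sample)
    return p_value < alpha

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=1000)   # training-time data
shifted = rng.normal(loc=1.5, scale=1.0, size=1000)    # simulated drifted data

no_drift = detect_drift(baseline, baseline[:500])  # same distribution
drift = detect_drift(baseline, shifted)            # shifted distribution
```

A production system would typically run such a check per feature on a schedule, and treat a positive result as a signal to investigate or retrain rather than as an automatic action.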
Model Adaptation
Models must be adapted (retrained or updated) over time to remain representative of the data and continue to bring business value. For more information on how models can be adapted appropriately for the circumstances, see the "How can we adapt to drift?" section of the drift and adaptation overview.
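A minimal sketch of one adaptation strategy, assuming scikit-learn, synthetic data, and a hypothetical accuracy threshold: evaluate the deployed model on recent labeled data and retrain on the combined data when accuracy degrades. The threshold and data generator are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_batch(n: int, shift: float = 0.0):
    """Synthetic binary-classification data; `shift` moves the decision
    boundary to simulate drift (illustrative, not a real workload)."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > shift).astype(int)
    return X, y

# Train and deploy an initial model.
X_train, y_train = make_batch(500)
model = LogisticRegression().fit(X_train, y_train)

# Later, live data has drifted and accuracy degrades.
X_live, y_live = make_batch(500, shift=1.0)
live_accuracy = model.score(X_live, y_live)

# Hypothetical retraining trigger: adapt when accuracy drops below 0.9.
if live_accuracy < 0.9:
    model = LogisticRegression().fit(
        np.vstack([X_train, X_live]), np.hstack([y_train, y_live])
    )
```

Retraining on combined historical and recent data is only one option; depending on the circumstances, a sliding window of recent data or incremental updates may be more appropriate.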
Data Collection
Once a model has been deployed to production, it may need periodic updates to ensure it continues to handle new data appropriately. This includes monitoring for drift, as well as retraining the model on new data. In both cases, it is important to collect input data (barring any privacy or security concerns) so that it can be evaluated over time.
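As an illustration of collecting input data for later evaluation, the sketch below appends each inference request to a JSON Lines file. The helper name, file path, and record schema are assumptions; as noted above, a production system would also need to address privacy and security concerns such as PII scrubbing and data retention.

```python
import json
from datetime import datetime, timezone

def log_inference(path: str, features: dict, prediction) -> None:
    """Append one inference record as a JSON line so inputs can be
    replayed later for drift analysis or retraining (sketch only)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "features": features,
        "prediction": prediction,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: log a single scored request.
log_inference("inference_log.jsonl", {"age": 42, "income": 55000}, 1)
```

An append-only, timestamped log like this makes it straightforward to reconstruct the live input distribution over any time window and compare it against the training data.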
Further reading
For more information, please visit the following resources: