Anomaly detector process

Databricks
Service Bus
Storage Accounts

This article presents an architecture for a near real-time implementation of an anomaly detection process.

Architecture

Diagram of the anomaly detector process architecture.


Dataflow

  1. Time-series data can come from multiple sources, such as Azure Database for MySQL, Blob storage, Event Hubs, Azure Cosmos DB, SQL Database, and Azure Database for PostgreSQL.
  2. Data is ingested into compute from various storage sources to be monitored by Anomaly Detector.
  3. Databricks aggregates, samples, and processes the raw data to generate the time series with the detected results. Databricks can process both streaming and static data. Azure Stream Analytics and Azure Synapse Analytics are alternatives, depending on the requirements.
  4. The Anomaly Detector API detects anomalies and returns the results to compute.
  5. The anomaly-related metadata is queued.
  6. Application Insights picks the message from the message queue based on the anomaly-related metadata and sends an alert about the anomaly.
  7. The results are stored in Azure Data Lake Storage Gen2.
  8. Web applications and Power BI can visualize the results of the anomaly detection.
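
Steps 3 through 5 can be sketched in plain Python, without calling any Azure service. This is a minimal illustration under stated assumptions, not the implementation used in this architecture: the helper names (`aggregate_hourly`, `build_detect_request`, `build_alert_message`) and the alert message fields are hypothetical, and the `series`/`granularity` request shape reflects how the Anomaly Detector batch request body is commonly documented, so verify it against the API version you use.

```python
import json
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_hourly(readings):
    """Average raw (ISO timestamp, value) readings into one point per hour.

    Timestamps are assumed to be UTC and written without an offset.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return [
        {"timestamp": hour.strftime("%Y-%m-%dT%H:%M:%SZ"), "value": mean(values)}
        for hour, values in sorted(buckets.items())
    ]


def build_detect_request(series, granularity="hourly"):
    """Assemble a request body for detection (shape is an assumption)."""
    return {"series": series, "granularity": granularity}


def build_alert_message(series, is_anomaly, source):
    """Serialize anomaly-related metadata for the message queue.

    The field names here are hypothetical; choose whatever your alerting
    consumer expects.
    """
    flagged = [p["timestamp"] for p, flag in zip(series, is_anomaly) if flag]
    return json.dumps(
        {"source": source, "anomalyCount": len(flagged), "timestamps": flagged}
    )
```

In a real pipeline, the aggregation would typically run in Databricks, the request would be sent to the Anomaly Detector endpoint, and the serialized message would be sent to a Service Bus queue.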

Components

Key technologies used to implement this architecture:

  • Service Bus: Reliable cloud messaging as a service (MaaS) and simple hybrid integration.
  • Azure Databricks: Fast, easy, and collaborative Apache Spark–based analytics service.
  • Power BI: Interactive data visualization BI tools.
  • Storage Accounts: Durable, highly available, and massively scalable cloud storage.
  • Cognitive Services: Cloud-based services with REST APIs and client library SDKs available to help you build cognitive intelligence into your applications.
  • Logic Apps: Serverless platform for building enterprise workflows that integrate applications, data, and services. In this architecture, the logic apps are triggered by HTTP requests.
  • Azure Data Lake Storage Gen2: Provides file system semantics, file-level security, and scale.
  • Application Insights: A feature of Azure Monitor that provides extensible application performance management (APM) and monitoring for live web apps.

Alternatives

  • Event Hubs with Kafka: An alternative to running your own Kafka cluster. This Event Hubs feature provides an endpoint that is compatible with Kafka APIs.
  • Azure Synapse Analytics: An analytics service that brings together enterprise data warehousing and big data analytics.
  • Azure Machine Learning: Build, train, deploy, and manage custom machine learning / anomaly detection models in a cloud-based environment.

Scenario details

The Azure Cognitive Services Anomaly Detector API enables you to monitor and detect abnormalities in your time series data without having to know machine learning. The algorithms of the API adapt by automatically identifying and applying the best-fitting models to your time series data, regardless of industry, scenario, or data volume. They determine boundaries for anomaly detection, expected values, and anomalous data points.
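
To make the relationship between expected values, boundaries, and anomalous points concrete, the following sketch combines a time series with a detection result. The result field names (`expectedValues`, `upperMargins`, `lowerMargins`, `isAnomaly`) are assumed to mirror the batch detection response and should be checked against the API reference. The boundary calculation is deliberately simplified: it adds and subtracts the margins directly, whereas the service may scale margins by a sensitivity setting. The function name `summarize_detection` is hypothetical.

```python
def summarize_detection(series, result):
    """Pair each data point with its expected value and boundaries.

    `result` is assumed to hold parallel lists, one entry per data point.
    Boundaries are computed as expected value plus or minus the margin,
    which is a simplification for illustration.
    """
    points = []
    for i, point in enumerate(series):
        expected = result["expectedValues"][i]
        points.append(
            {
                "timestamp": point["timestamp"],
                "value": point["value"],
                "expected": expected,
                "lower": expected - result["lowerMargins"][i],
                "upper": expected + result["upperMargins"][i],
                "is_anomaly": result["isAnomaly"][i],
            }
        )
    return points


def anomalies_only(points):
    """Keep only the points that the detection result flagged."""
    return [p for p in points if p["is_anomaly"]]
```

A summary in this shape is convenient to store alongside the raw results and to chart in a dashboard, because each row carries its own boundary band.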

Potential use cases

Some areas that anomaly detection helps monitor:

  • Bank fraud (finance industry)
  • Structural defects (manufacturing industry)
  • Medical problems (healthcare industry)

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

Scalability

Most of the components used in this example scenario are managed services that will automatically scale.

For general guidance on designing scalable solutions, see the performance efficiency checklist in the Azure Architecture Center.

Security

Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.

Managed identities for Azure resources provide access to other resources internal to your account and are then assigned to your Azure Functions. Grant those identities access only to the resources they require, so that nothing extra is exposed to your functions (and potentially to your customers).

For general guidance on designing secure solutions, see the Azure Security Documentation.

Resiliency

All of the components in this scenario are managed services, so they're all automatically resilient at a regional level.

For general guidance on designing resilient solutions, see Designing resilient applications for Azure.

Cost optimization

Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.

To explore the cost of running this scenario, see the pre-filled calculator with all of the services. To see how the pricing would change for your particular use case, change the appropriate variables to match your expected traffic / data volumes.

We've provided a sample cost profile:

  • Example calculator: This pricing example is a calculator with all of the services in this architecture, except Power BI and the custom alerting solution.

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:


Next steps