Apache open-source scenarios on Azure
Microsoft is proud to support open-source projects, initiatives, and foundations and contribute to thousands of open-source communities. By using open-source technologies on Azure, you can run applications your way while optimizing your investments.
This article provides a summary of architectures and solutions that use Azure together with Apache open-source solutions.
Apache®, Apache Ignite, Ignite, and the flame logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by the use of these marks.
Apache Cassandra
Architecture | Summary | Technology focus |
---|---|---|
Data partitioning guidance | View guidance for separating data partitions so they can be managed and accessed separately. Understand horizontal, vertical, and functional partitioning strategies. Cassandra is ideally suited to vertical partitioning. | Databases |
High availability in Azure public MEC | Learn how to deploy workloads in active-standby mode to achieve high availability and disaster recovery in Azure public multi-access edge compute. Cassandra can be used to support geo-replication. | Hybrid |
N-tier application with Apache Cassandra | Deploy Linux virtual machines and a virtual network configured for an N-tier architecture with Apache Cassandra. | Databases |
Non-relational data and NoSQL | Learn about non-relational databases that store data as key-value pairs, graphs, time series, objects, and other storage models, based on data requirements. Azure Cosmos DB for Apache Cassandra is a recommended Azure service. | Databases |
Run Apache Cassandra on Azure VMs | Examine performance considerations for running Apache Cassandra on Azure virtual machines. Use these recommendations as a baseline to test against your workload. | Databases |
Stream processing with fully managed open-source data engines | Stream events by using fully managed Azure data services. Use open-source technologies like Kafka, Kubernetes, Cassandra, PostgreSQL, and Redis components. | Analytics |
Apache CouchDB
Architecture | Summary | Technology focus |
---|---|---|
Baseline web application with zone redundancy | Use the proven practices in this reference architecture to improve redundancy, scalability, and performance in an Azure App Service web application. CouchDB is a recommended document database. | Web |
Apache Hadoop
Architecture | Summary | Technology focus |
---|---|---|
Big data architectures | Learn about big data architectures that handle the ingestion, processing, and analysis of data that's too large or complex for traditional database systems. Azure HDInsight Hadoop clusters can be used for batch processing. | Databases |
Choose a data transfer technology | Learn about Azure data transfer options like Azure Import/Export service, Azure Data Box, Azure Data Factory, and command-line and graphical interface tools. The Hadoop ecosystem provides tools for data transfer. | Databases |
Citizen AI with Power Platform | Learn how to use Azure Machine Learning and Power Platform to quickly create a machine learning proof of concept and production version. Azure Data Lake, a Hadoop-compatible file system, stores data. | AI |
Data considerations for microservices | Learn about managing data in a microservices architecture. View an example that uses Azure Data Lake Store, a Hadoop file system. | Microservices |
Extract, transform, and load | Learn about extract-transform-load (ETL) and extract-load-transform (ELT) data transformation pipelines and how to use control flows and data flows. Hadoop can be used as a destination data store in ELT processes. | Analytics |
IoT analyze-and-optimize loops | Learn about analyze-and-optimize loops, an IoT pattern for generating and applying optimization insights based on an entire business context. Hadoop map-reduce processing can be used to process big data. | IoT |
Materialized View pattern | Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for your required query operations. Use Hadoop as a big data storage mechanism that supports indexing. | Databases |
Predict loan charge-offs with HDInsight Spark | Use HDInsight and machine learning to predict the likelihood of loans getting charged off. HDInsight supports Hadoop. | Databases |
Apache HBase
Architecture | Summary | Technology focus |
---|---|---|
Big data architectures | Learn about big data architectures that handle the ingestion, processing, and analysis of data that's too large or complex for traditional database systems. You can use HBase for data presentation in these scenarios. | Databases |
Choose a big data storage technology | Compare big data storage technology options in Azure. Includes a discussion of HBase on HDInsight. | Databases |
Choose an analytical data store | Learn about using HBase for random access and strong consistency for large amounts of unstructured and semi-structured data. | Analytics |
Data partitioning guidance | View guidance for separating data partitions so they can be managed and accessed separately. Understand horizontal, vertical, and functional partitioning strategies. HBase is ideally suited to vertical partitioning. | Databases |
Non-relational data and NoSQL | Learn about non-relational databases that store data as key-value pairs, graphs, time series, objects, and other storage models, based on data requirements. HBase can be used for columnar and time series data. | Databases |
Apache Hive
Architecture | Summary | Technology focus |
---|---|---|
Big data architectures | Learn about big data architectures that handle the ingestion, processing, and analysis of data that's too large or complex for traditional database systems. You can use Hive for batch processing and data presentation in these scenarios. | Databases |
Choose a batch processing technology | Compare technology choices for big data batch processing in Azure. Learn about the capabilities of Hive. | Analytics |
Choose an analytical data store | Evaluate analytical data store options for big data in Azure. Learn about the capabilities of Hive. | Analytics |
Extract, transform, and load | Learn about ETL and ELT data transformation pipelines and how to use control flows and data flows. In ELT, you can use Hive to query source data. You can also use it together with Hadoop as a data store. | Databases |
Loan charge-off prediction with HDInsight Spark clusters | Use HDInsight and machine learning to predict the likelihood of loans getting charged off. Analytics results are stored in Hive tables. | Analytics |
Predictive aircraft engine monitoring | Learn how to combine real-time aircraft data with analytics to create a solution for predictive aircraft engine monitoring and health. Hive scripts provide aggregations on raw events that are archived by Azure Stream Analytics. | Analytics |
Predictive insights with vehicle telematics | Learn how car dealerships, manufacturers, and insurance companies can use Azure to get predictive insights on vehicle health and driving habits. In this solution, Azure Data Factory uses HDInsight to run Hive queries to process and load data. | Analytics |
Scale AI and machine learning initiatives in regulated industries | Learn about scaling Azure AI and machine learning environments that must comply with extensive security policies. Hive is used to store metadata. | AI |
Apache JMeter
Architecture | Summary | Technology focus |
---|---|---|
Banking system cloud transformation on Azure | Use simulated and actual applications and existing workloads to monitor the reaction of a solution infrastructure for scalability and performance. A custom JMeter solution is used for load testing. | Migration |
Patterns and implementations for a banking cloud transformation | Learn about the patterns and implementations used to transform a banking system for the cloud. JMeter is used for load testing. | Migration |
Scalable cloud applications and SRE | Build scalable cloud applications by using performance modeling and other principles and practices of site reliability engineering (SRE). JMeter is used for load testing. | Web |
Apache Kafka
Architecture | Summary | Technology focus |
---|---|---|
Application data protection for AKS workloads on Azure NetApp Files | Deploy Astra Control Service with Azure NetApp Files for data protection, disaster recovery, and mobility for Azure Kubernetes Service (AKS) applications, including Kafka applications. | Containers |
Asynchronous messaging options | Learn about asynchronous messaging options in Azure, including support for Kafka clients. | Integration |
Automated guided vehicles fleet control | Learn about an end-to-end approach for an automotive original equipment manufacturer (OEM). Includes several open-source libraries that you can reuse. Back-end services in this architecture can connect to Kafka. | Web |
Azure Data Explorer monitoring | Use Azure Data Explorer in a hybrid monitoring solution that ingests streamed and batched logs from Kafka and other sources. | Analytics |
Banking system cloud transformation on Azure | Use simulated and actual applications and existing workloads to monitor the reaction of a solution infrastructure for scalability and performance. Events from Event Hubs for Kafka feed into the system. | Containers |
Choose a stream processing technology | Compare options for real-time message stream processing in Azure, including the Kafka Streams API. | Analytics |
Claim-Check pattern | Examine the Claim-Check pattern, which splits a large message into a claim check and a payload to avoid overwhelming a message bus. Learn about an example that uses Kafka for claim-check generation. | Integration |
Data streaming with AKS | Use AKS to easily ingest and process a real-time data stream with millions of data points collected via sensors. Kafka stores data for analysis. | Containers |
Ingestion, ETL, and stream processing pipelines with Azure Databricks | Create ETL pipelines for batch and streaming data with Azure Databricks to simplify data lake ingestion at any scale. Kafka is one option for ingesting data. | Analytics |
Integrate Event Hubs with Azure Functions | Learn how to architect, develop, and deploy efficient and scalable code that runs on Azure Functions and responds to Azure Event Hubs events. Learn how events can be persisted in Kafka topics. | Serverless |
IoT analytics with Azure Data Explorer | Use Azure Data Explorer for near real-time IoT telemetry analytics on fast-flowing, high-volume streaming data from a variety of data sources, including Kafka. | Analytics |
Mainframe and midrange data replication to Azure using Qlik | Use Qlik Replicate to migrate mainframe and midrange systems to the cloud, or to extend such systems with cloud applications. In this solution, Kafka stores change log information that's used to replicate the data stores. | Mainframe |
Patterns and implementations for a banking cloud transformation | Learn about the patterns and implementations used to transform a banking system for the cloud. A Kafka scaler is used to detect whether the solution needs to activate or deactivate application deployment. | Serverless |
Publisher-Subscriber pattern | Learn about the Publisher-Subscriber pattern, which enables an application to announce events to many interested consumers asynchronously. Kafka is recommended for messaging. | Integration |
Rate Limiting pattern | Use a rate limiting pattern to avoid or minimize throttling errors. This pattern can use Kafka for messaging. | Integration |
Refactor mainframe applications with Advanced | Learn how to use the automated COBOL refactoring solution from Advanced to modernize your mainframe COBOL applications, run them on Azure, and reduce costs. Kafka can be used as a data source. | Mainframe |
Stream processing with fully managed open-source data engines | Stream events by using fully managed Azure data services. Use open-source technologies like Kafka, Kubernetes, Cassandra, PostgreSQL, and Redis components. | Analytics |
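
As a rough illustration of the Claim-Check pattern listed in the table above, the following minimal Python sketch stores a large payload outside the message bus and sends only a small claim check. Everything here is an assumption for illustration: `InMemoryBlobStore`, `send_with_claim_check`, and `receive_with_claim_check` are hypothetical names, the store is a plain dictionary, and the bus is a Python list standing in for a Kafka topic. No Kafka client or Azure service is used.

```python
import uuid


class InMemoryBlobStore:
    """Hypothetical stand-in for an external payload store (for example, blob storage)."""

    def __init__(self):
        self._blobs = {}

    def put(self, payload: bytes) -> str:
        # Store the payload and return a unique claim ID that identifies it.
        claim_id = str(uuid.uuid4())
        self._blobs[claim_id] = payload
        return claim_id

    def get(self, claim_id: str) -> bytes:
        return self._blobs[claim_id]


def send_with_claim_check(payload: bytes, store: InMemoryBlobStore, bus: list) -> None:
    # Only the small claim check travels on the bus; the payload stays in the store.
    claim_id = store.put(payload)
    bus.append({"claim_id": claim_id, "size": len(payload)})


def receive_with_claim_check(store: InMemoryBlobStore, bus: list) -> bytes:
    # The consumer reads the claim check, then redeems it for the full payload.
    message = bus.pop(0)
    return store.get(message["claim_id"])


if __name__ == "__main__":
    store, bus = InMemoryBlobStore(), []
    send_with_claim_check(b"x" * 1_000_000, store, bus)
    print(bus[0]["size"])
    print(len(receive_with_claim_check(store, bus)))
```

In a real deployment, the list would be replaced by a messaging system and the dictionary by durable storage; the sketch only shows how the message and the payload take separate paths.
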
Apache MapReduce
Architecture | Summary | Technology focus |
---|---|---|
Asynchronous messaging options | Learn about asynchronous messaging options in Azure. You can use MapReduce to generate reports on events captured by Event Hubs. | Integration |
Big data architectures | Learn about big data architectures that handle the ingestion, processing, and analysis of data that's too large or complex for traditional database systems. You can use MapReduce for batch processing and to provide functionality for parallel operations in these scenarios. | Databases |
Choose a batch processing technology | Learn about technologies for big data batch processing in Azure, including HDInsight with MapReduce. | Analytics |
Geode pattern | Deploy back-end services into a set of geographical nodes, each of which can service any client request in any region. This pattern occurs in big data architectures that use MapReduce to consolidate results across machines. | Databases |
Minimize coordination | Follow these recommendations to improve scalability by minimizing coordination between application services. Use MapReduce to split work into independent tasks. | Databases |
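
To make the idea of splitting work into independent tasks more concrete, here is a minimal single-process Python sketch of the map-reduce style, using word counting as the example. It is an illustration only and doesn't use Hadoop or HDInsight; the function names `map_task`, `reduce_task`, and `word_count` are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_task(chunk: str) -> Counter:
    # Each map task sees only its own chunk, so tasks need no coordination.
    return Counter(chunk.lower().split())


def reduce_task(left: Counter, right: Counter) -> Counter:
    # Merging partial counts is associative, so results can be combined in any order.
    return left + right


def word_count(chunks: list) -> Counter:
    # The map phase runs sequentially here; a cluster would run the tasks in parallel.
    partials = [map_task(chunk) for chunk in chunks]
    return reduce(reduce_task, partials, Counter())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Because each map task depends only on its input chunk, the tasks could be distributed across machines, with coordination limited to the final merge.
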
Apache NiFi
Architecture | Summary | Technology focus |
---|---|---|
Apache NiFi on Azure | Automate data flows with Apache NiFi on Azure. Use a scalable, highly available solution to move data into the cloud or storage and between cloud systems. | Analytics |
Helm-based deployments for Apache NiFi | Use Helm charts when you deploy NiFi on AKS. Helm streamlines the process of installing and managing Kubernetes applications. | Analytics |
Azure Data Explorer monitoring | Use Azure Data Explorer and NiFi in a hybrid monitoring solution that ingests streamed and batched logs from diverse sources. | Analytics |
Apache Oozie
Architecture | Summary | Technology focus |
---|---|---|
Big data architectures | Learn about big data architectures that handle the ingestion, processing, and analysis of data that's too large or complex for traditional database systems. You can use Oozie for orchestration in these scenarios. | Databases |
Choose a data pipeline orchestration technology | Learn about the key orchestration capabilities of Oozie. | Databases |
Apache Solr
Architecture | Summary | Technology focus |
---|---|---|
Choose a search data store | Learn about the capabilities of search data stores in Azure and the key criteria for choosing one that best matches your needs. Learn about the key capabilities of HDInsight with Solr. | Databases |
Apache Spark
Architecture | Summary | Technology focus |
---|---|---|
Analytics end-to-end with Azure Synapse | Learn how to use Azure Data Services to build a modern analytics platform capable of handling the most common data challenges. The Spark Pools analytics engine is available from Azure Synapse workspaces. | Analytics |
Batch scoring of Spark on Azure Databricks | Build a scalable solution for batch scoring an Apache Spark classification model. | AI |
Big data architectures | Learn about big data architectures that handle the ingestion, processing, and analysis of data that's too large or complex for traditional database systems. You can use Spark for batch or stream processing and as an analytical data store. | Databases |
Choose a batch processing technology | Compare technology choices for big data batch processing in Azure, including options for implementing Spark. | Analytics |
Choose a stream processing technology | Compare options for real-time message stream processing in Azure, including options for implementing Spark. | Analytics |
Choose an analytical data store | Evaluate analytical data store options for big data in Azure. Learn about the capabilities of Azure Synapse Spark pools. | Analytics |
Data science and machine learning with Azure Databricks | Improve operations by using Azure Databricks, Delta Lake, and MLflow for data science and machine learning. Develop, train, and deploy machine learning models. Azure Databricks provides managed Spark clusters. | AI |
Extract, transform, and load | Learn about extract-transform-load (ETL) and extract-load-transform (ELT) data transformation pipelines and how to use control flows and data flows. In ELT, you can use Spark to query source data. You can also use it together with Hadoop as a data store. | Databases |
IoT using Azure Cosmos DB | Learn how to use Azure Cosmos DB to accommodate diverse and unpredictable IoT workloads without sacrificing ingestion or query performance. Azure Databricks, running Spark Streaming, processes event data from devices. | IoT |
Loan charge-off predictions with HDInsight Spark | Use HDInsight and machine learning to predict the likelihood of loans getting charged off. | Databases |
Many models machine learning with Spark | Learn about many models machine learning in Azure. | AI |
Microsoft machine learning products | Compare options for building, deploying, and managing your machine learning models, including the Azure Databricks Spark-based analytics platform and SynapseML. | AI |
Modern data warehouse for small and medium businesses | Use Azure Synapse, Azure SQL Database, and Azure Data Lake Storage to modernize SMB legacy and on-premises data. Tools in the Azure Synapse workspace can use Spark compute capabilities to process data. | Analytics |
Natural language processing technology | Choose a natural language processing service for sentiment analysis, topic and language detection, key phrase extraction, and document categorization. Learn about the key capabilities of Azure HDInsight with Spark. | AI |
Observability patterns and metrics | Learn how to use observability patterns and metrics to improve the processing performance of a big data system by using Azure Databricks. The Azure Databricks monitoring library streams Spark events and Spark Structured Streaming metrics from jobs. | Databases |
Stream processing with fully managed open-source data engines | Stream events by using fully managed Azure data services. Use open-source technologies like Spark, Kafka, Kubernetes, Cassandra, PostgreSQL, and Redis components. | Analytics |
Apache Sqoop
Architecture | Summary | Technology focus |
---|---|---|
Big data architectures | Learn about big data architectures that handle the ingestion, processing, and analysis of data that's too large or complex for traditional database systems. In these scenarios, you can use Sqoop to automate orchestration workflows. | Databases |
Choose a data transfer technology | Learn about data transfer options like Azure Import/Export, Data Box, and Sqoop. | Databases |
Apache ZooKeeper
Architecture | Summary | Technology focus |
---|---|---|
Apache NiFi on Azure | Automate data flows with NiFi on Azure. Use a scalable, highly available solution to move data into the cloud or storage and between cloud systems. In this solution, NiFi uses ZooKeeper to coordinate the flow of data. | Analytics |
Helm-based deployments for Apache NiFi | Use Helm charts when you deploy NiFi on AKS. Helm streamlines the process of installing and managing Kubernetes applications. In this architecture, ZooKeeper provides cluster coordination. | Analytics |
Rate Limiting pattern | Use a rate limiting pattern to avoid or minimize throttling errors. In this scenario, you can use ZooKeeper to create a system that grants temporary leases to capacity. | Integration |
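
The idea of granting temporary leases to capacity, mentioned in the Rate Limiting pattern row above, can be sketched in a few lines of Python. This is a simplified in-memory illustration under stated assumptions, not a ZooKeeper-based implementation: `LeaseManager`, `Lease`, and their parameters are hypothetical names, and the current time is passed in explicitly so that the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    units: int         # capacity units reserved for one client
    expires_at: float  # time after which the units return to the pool


class LeaseManager:
    """Hypothetical in-memory lease manager; a real system would use a coordination service."""

    def __init__(self, total_capacity: int, ttl_seconds: float):
        self.total_capacity = total_capacity
        self.ttl_seconds = ttl_seconds
        self._leases = {}

    def _expire(self, now: float) -> None:
        # Drop leases whose time has run out so their units become available again.
        self._leases = {c: l for c, l in self._leases.items() if l.expires_at > now}

    def available(self, now: float) -> int:
        self._expire(now)
        return self.total_capacity - sum(l.units for l in self._leases.values())

    def request(self, client_id: str, units: int, now: float) -> bool:
        # A new request replaces the client's current lease, so set it aside first.
        self._expire(now)
        current = self._leases.pop(client_id, None)
        if units <= self.available(now):
            self._leases[client_id] = Lease(units, now + self.ttl_seconds)
            return True
        if current is not None:
            self._leases[client_id] = current  # keep the old lease if the request fails
        return False


if __name__ == "__main__":
    manager = LeaseManager(total_capacity=10, ttl_seconds=5.0)
    print(manager.request("a", 6, now=0.0))
    print(manager.request("b", 6, now=1.0))
```

Because leases expire on their own, capacity held by a client that stops responding is reclaimed without any explicit release step.
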
Related resources
- Microsoft partner and non-open-source third-party scenarios on Azure
- Scenarios featuring Microsoft on-premises technologies
- Architecture for startups
- Azure and Power Platform scenarios
- Azure and Microsoft 365 scenarios
- Azure and Dynamics 365 scenarios
- Azure for AWS professionals
- Azure for Google Cloud professionals