In a big data architecture, there's often a need for an analytical data store that serves processed data in a structured format that can be queried by using analytical tools. Analytical data stores that support querying of both hot-path and cold-path data are collectively referred to as the serving layer, or data serving storage.
The serving layer handles processed data from both the hot path and the cold path. In the Lambda architecture, the serving layer is subdivided into two layers. The speed serving layer contains the incrementally processed data. The batch serving layer contains the batch-processed output. The serving layer requires strong support for random reads that have low latency. Data storage for the speed layer should also support random writes because batch loading data into this store introduces undesired delays. In contrast, data storage for the batch layer needs to support batch writes, not random writes.
There's no single best data management choice for all data storage tasks. Different data management solutions are optimized for different tasks. Most real-world cloud apps and big data processes have various data storage requirements and often use a combination of data storage solutions.
Modern analytical solutions, such as Microsoft Fabric, provide a comprehensive platform that integrates various data services and tools to meet diverse analytical needs. Fabric includes OneLake, which is a single, unified, logical data lake for your entire organization. OneLake is designed to store, manage, and secure all organizational data in one location. This flexibility allows your organization to address a wide range of data storage and processing requirements.
Choose an analytical data store
There are several options for data serving storage in Azure, depending on your needs:
- Fabric, specifically:
  - Fabric Lakehouse
  - Fabric Warehouse
  - Fabric Eventhouse
  - Fabric SQL database
- Azure Databricks
- Azure SQL Database
- SQL Server in Azure VM
- Azure Analysis Services
- Azure Cosmos DB
The following database models are optimized for different types of tasks:
Key-value databases store a single serialized object for each key value. They're well-suited for managing large volumes of data when retrieval is based on a specific key, without the need to query other item properties.
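To make the access pattern concrete, here is a minimal sketch of a key-based lookup using the azure-cosmos Python SDK. The account URL, key, and the database, container, and item names are all placeholders, not values from this article.

```python
from azure.cosmos import CosmosClient

# Placeholder account URL and key; in practice, load these from configuration.
client = CosmosClient("https://your-account.documents.azure.com", credential="<account-key>")
container = client.get_database_client("RetailDB").get_container_client("Products")

# A point read: retrieval is driven entirely by the key (item ID plus
# partition key), without querying any other item properties.
item = container.read_item(item="prod-1001", partition_key="electronics")
print(item["id"])
```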
Document databases are key-value databases in which the values are documents. In this context, a document is a collection of named fields and values. The database typically stores the data in a format such as XML, YAML, JSON, or binary JSON (BSON), but might use plain text. Document databases can query on non-key fields and define secondary indexes to improve querying efficiency. This capability makes a document database more suitable for applications that need to retrieve data based on criteria that are more complex than the value of the document key. For example, you could query on fields such as product ID, customer ID, or customer name.
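Continuing with the same hedged assumptions (placeholder account, database, and container names), a document query in the azure-cosmos SDK can filter on a non-key field such as a customer name:

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://your-account.documents.azure.com", credential="<account-key>")
container = client.get_database_client("RetailDB").get_container_client("Customers")

# Filter on a non-key field; a secondary index on customerName would keep
# this query efficient.
results = container.query_items(
    query="SELECT c.id, c.customerName FROM c WHERE c.customerName = @name",
    parameters=[{"name": "@name", "value": "Contoso"}],
    enable_cross_partition_query=True,
)
for doc in results:
    print(doc["id"], doc["customerName"])
```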
Column store databases are key-value data stores that store each column separately on disk. A wide column store database is a type of column store database that stores column families, not only single columns. For example, a census database might have a separate column family for each of the following items:
- A person's first, middle, and last name
- That person's address
- That person's profile information, like their date of birth or gender
The database can store each column family in a separate partition, while keeping all the data for one person related to the same key. An application can read a single column family without scanning all the data for an entity.
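The layout can be sketched in plain Python: a hypothetical census record split into column families, where reading one family never touches the others.

```python
# Hypothetical wide column store layout: each column family is kept in its
# own structure (conceptually, its own partition), all sharing one row key.
census = {
    "name":    {"person-001": {"first": "Ana", "middle": "M.", "last": "Silva"}},
    "address": {"person-001": {"street": "12 Elm St", "city": "Redmond"}},
    "profile": {"person-001": {"date_of_birth": "1984-07-02", "gender": "F"}},
}

def read_family(family: str, row_key: str) -> dict:
    """Read a single column family without scanning the rest of the entity."""
    return census[family][row_key]

print(read_family("address", "person-001"))  # only the address family is touched
```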
Graph databases store information as a collection of objects and relationships. A graph database can efficiently perform queries that traverse the network of objects and the relationships between them. For example, the objects might be employees in a human resources database, and you might want to facilitate queries such as "find all employees who directly or indirectly work for Scott."
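A traversal like the Scott example can be sketched as a breadth-first search over a hypothetical reporting graph; a graph database performs this kind of walk natively and efficiently, without the application managing the traversal itself.

```python
from collections import deque

# Hypothetical reporting relationships: manager -> direct reports.
reports_to = {
    "Scott": ["Alice", "Bob"],
    "Alice": ["Carol"],
    "Carol": ["Dave"],
}

def all_reports(manager: str) -> set[str]:
    """Find everyone who directly or indirectly works for the given manager."""
    found: set[str] = set()
    queue = deque([manager])
    while queue:
        for employee in reports_to.get(queue.popleft(), []):
            if employee not in found:
                found.add(employee)
                queue.append(employee)
    return found

print(all_reports("Scott"))  # {'Alice', 'Bob', 'Carol', 'Dave'}
```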
Telemetry and time-series databases are append-only collections of objects. Telemetry databases efficiently index data in various column stores and in-memory structures. This capability makes them the optimal choice for storing and analyzing vast quantities of telemetry and time-series data.
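The append-only pattern can be sketched in plain Python: because telemetry arrives in timestamp order, a sorted list doubles as a time index, and range queries become binary searches instead of full scans. This is an illustration of the idea, not how any particular engine implements it.

```python
import bisect
from datetime import datetime, timezone

# Append-only telemetry store: (timestamp, value) pairs arrive in order.
readings: list[tuple[datetime, float]] = []

def append(ts: datetime, value: float) -> None:
    readings.append((ts, value))  # append-only: no updates or deletes

def window(start: datetime, end: datetime) -> list[tuple[datetime, float]]:
    """Return readings in [start, end] via binary search on the time index."""
    lo = bisect.bisect_left(readings, (start,))
    hi = bisect.bisect_right(readings, (end, float("inf")))
    return readings[lo:hi]

append(datetime(2024, 1, 1, tzinfo=timezone.utc), 21.5)
append(datetime(2024, 1, 2, tzinfo=timezone.utc), 22.1)
print(window(datetime(2024, 1, 1, tzinfo=timezone.utc),
             datetime(2024, 1, 1, 23, tzinfo=timezone.utc)))
```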
Fabric supports various database models, including key-value, document, column store, graph, and telemetry databases. This flexibility ensures scalability for a wide range of analytical tasks. To choose the right Fabric datastore for your analytical workloads, see Fabric decision guide: choose a datastore.
Key selection criteria
To refine the selection process, consider the following criteria:
- Do you need serving storage that can serve as a hot path for your data? If yes, narrow your options to those that are optimized for a speed serving layer.
- Do you need massively parallel processing support, where queries are automatically distributed across several processes or nodes? If yes, select an option that supports query scale-out.
- Do you prefer to use a relational data store? If so, narrow your options to those that have a relational database model. However, some nonrelational stores support SQL syntax for querying, and tools such as the SQL analytics endpoint can be used to query nonrelational stores such as OneLake (see the sketch after this list).
- Do you collect time-series or append-only data? If yes, consider Fabric Eventhouse, which is suited to the processing and querying needs of time-series data. More broadly, Fabric OneLake supports multiple analytical engines, including Analysis Services, T-SQL, and Apache Spark.
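As referenced in the relational criterion above, data in OneLake can be queried with T-SQL through a Lakehouse's SQL analytics endpoint. The following is a minimal sketch using pyodbc; the server name, database, and table are placeholders for values you would copy from a hypothetical Fabric workspace.

```python
import pyodbc

# Placeholder endpoint; copy the real value from the Lakehouse's
# SQL analytics endpoint settings in the Fabric portal.
connection_string = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"
    "Database=SalesLakehouse;"
    "Authentication=ActiveDirectoryInteractive;"  # Microsoft Entra ID sign-in
    "Encrypt=yes;"
)

with pyodbc.connect(connection_string) as conn:
    cursor = conn.cursor()
    # Delta tables in OneLake are exposed as read-only T-SQL tables.
    cursor.execute("SELECT TOP 10 OrderId, Amount FROM dbo.Orders ORDER BY Amount DESC")
    for row in cursor.fetchall():
        print(row.OrderId, row.Amount)
```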
Capability matrix
The following tables summarize the key differences in capabilities among these managed services.
General capabilities
Capability | Fabric Lakehouse | Fabric Warehouse | Fabric Eventhouse | Fabric SQL Database | Azure SQL Database | Azure Cosmos DB | Analysis Services |
---|---|---|---|---|---|---|---|
Primary database model | Unified data lake, relational, user-managed Delta Lake format using Apache Parquet | Unified data lake, relational, system-managed Delta Lake format using Apache Parquet | Time-series, append-oriented data store, graph, vector | Relational (column store format when you use columnstore indexes) | Relational (column store format when you use columnstore indexes) | Document store, graph, key-value store, wide column store | Tabular semantic models |
SQL language support | Yes [1] | Yes | Yes [2] | Yes | Yes | Yes | No |
Optimized for speed serving layer | Yes | Yes | Yes [3] | Yes [4] | Yes [5] | Yes | No |
[1] T-SQL via SQL Analytics Endpoint.
[2] KQL has partial T-SQL language support.
[3] Supports queued ingestion and streaming ingestion.
[4] Supports transactional precision with low-latency access and real-time updates.
[5] Using memory-optimized tables and hash or nonclustered indexes.
Scalability capabilities
Capability | Fabric Lakehouse | Fabric Warehouse | Fabric Eventhouse | Fabric SQL Database | Azure SQL Database | Azure Cosmos DB | Analysis Services |
---|---|---|---|---|---|---|---|
Redundant regional servers for high availability | Yes [1][2] | Yes [1][2] | Yes | Yes | Yes | Yes | Yes |
Supports query scale-out | Yes [3] | Yes [4] | Yes [5] | Yes | No | Yes | Yes |
Dynamic scalability (scale up) | Yes [3] | Yes [4] | Yes [5] | Yes | Yes | Yes | Yes |
Supports in-memory caching of data | Yes [6] | Yes [6] | Yes [7] | Yes | Yes | Yes | No |
[1] SQL Endpoints are routed via global traffic managers, but data is always processed in the assigned Fabric capacity region.
[2] Lakehouse and Warehouse store data in OneLake using Delta Parquet format, which supports querying and replication across engines.
[3] Lakehouse supports Spark-based scale-out for unstructured and structured data.
[4] Warehouse uses T-SQL and supports multi-table transactions, autonomous workload management, and distributed query processing (DQP). DQP acts like a cluster manager, dynamically allocating compute resources based on query complexity.
[5] Eventhouse supports KQL and SQL federation, which enables real-time analytics across multiple sources, and can scale up compute resources when hot cache usage exceeds approximately 95%.
[6] Intelligent cache for Spark jobs, in-memory caching, result set caching for SQL analytics endpoints.
[7] Frequently accessed data is stored in a hot cache, which includes in-memory and SSD storage.
Security capabilities
Capability | Fabric Lakehouse | Fabric Warehouse | Fabric Eventhouse | Fabric SQL Database | Azure SQL Database | Azure Cosmos DB | Analysis Services |
---|---|---|---|---|---|---|---|
Authentication | Microsoft Entra ID | Microsoft Entra ID | Microsoft Entra ID | Microsoft Entra ID | SQL or Microsoft Entra ID | Database users or Microsoft Entra ID via access control (identity and access management) | Microsoft Entra ID |
Data encryption at rest | Yes | Yes | Yes | Yes | Yes [1] | Yes | Yes |
Row-level security | Yes | Yes | Yes | Yes | Yes | No | Yes |
Supports firewalls | Yes [2] | Yes [2] | Yes [3] | Yes | Yes | Yes | Yes |
Dynamic data masking | Yes [4] | Yes [4] | No | Yes | Yes | No | No |
[1] Requires you to use transparent data encryption to encrypt and decrypt your data at rest.
[2] Private Links and Entra Conditional Access can be used to restrict access to Fabric resources.
[3] Fabric Eventhouse and Real-Time Intelligence workloads can ingest data from secure sources like Kafka, Azure Event Hubs, and AMQP, with routing through secure endpoints.
[4] Can be applied at the Fabric SQL analytics endpoint level.
Contributors
Microsoft maintains this article. It was originally written by the following contributors.
Principal authors:
- Mohit Agarwal | Principal Cloud Solution Architect
Next steps
- Bring your Data to OneLake with Lakehouse
- Create a Fabric Warehouse
- Create an Eventhouse
- Analyze data in a relational data warehouse
- Create a single database in SQL Database
- Create an Azure Databricks workspace
- Explore Azure database and analytics services
- Query Azure Cosmos DB by using the API for NoSQL