What does it mean to build a single source of truth?
The Databricks lakehouse eliminates the need for creating and syncing copies of data across multiple systems by unifying data access and storage in a single system, establishing the lakehouse as the single source of truth (SSOT). Duplicating data often results in data silos, meaning that different teams within an organization may be working with versions of the same data that differ in quality and freshness.
How does the lakehouse control transactions and data access?
Delta Lake transactions use log files stored alongside data files to provide ACID guarantees at the table level. Because the data and log files backing Delta Lake tables live together in cloud object storage, reads and writes can occur simultaneously without concurrent queries causing performance degradation or deadlock for business-critical workloads. This means that users and applications throughout the enterprise can connect to the same single copy of the data to drive diverse workloads, and every reader is guaranteed to see the most current version of the data as of the time its query executes.
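To make the idea of a transaction log stored next to the data more concrete, the following is a minimal sketch of a log-based table. It is a toy model loosely inspired by the description above, not Delta Lake's actual implementation: the `ToyLogTable` class, the `_log` directory name, and the `add`/`remove` entry layout are all assumptions made for illustration. Each commit is one numbered JSON file, a writer claims a version by creating that file exclusively, and a reader rebuilds a consistent snapshot by replaying the log.

```python
import json
import os
import tempfile


class ToyLogTable:
    """Toy table whose state is defined by an ordered log of JSON commits.

    Hypothetical illustration only; this is not the Delta Lake protocol.
    """

    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Commit files are named by version number, so sorting orders the log.
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def latest_version(self):
        versions = self._versions()
        return versions[-1] if versions else -1

    def commit(self, read_version, add=(), remove=()):
        """Try to commit on top of read_version; return False on conflict."""
        path = os.path.join(self.log_dir, f"{read_version + 1:020d}.json")
        try:
            # Mode "x" creates the file exclusively, so only one writer
            # can claim a given version number.
            with open(path, "x") as f:
                json.dump({"add": list(add), "remove": list(remove)}, f)
        except FileExistsError:
            return False
        return True

    def snapshot(self, version=None):
        """Replay the log up to `version` and return the set of live data files."""
        if version is None:
            version = self.latest_version()
        live = set()
        for v in self._versions():
            if v > version:
                break
            with open(os.path.join(self.log_dir, f"{v:020d}.json")) as f:
                entry = json.load(f)
            live -= set(entry["remove"])
            live |= set(entry["add"])
        return live


root = tempfile.mkdtemp()
table = ToyLogTable(root)
table.commit(-1, add=["part-0.parquet"])  # becomes version 0
pinned = table.latest_version()  # a reader pins version 0
table.commit(0, add=["part-1.parquet"], remove=["part-0.parquet"])  # version 1
conflict_ok = table.commit(0, add=["part-2.parquet"])  # stale writer loses
old_view = table.snapshot(pinned)  # reader still sees its pinned version
new_view = table.snapshot()  # new readers see the latest commit
```

In this sketch the reader that pinned version 0 keeps a consistent view while a writer commits version 1, and the second writer that tried to build on the stale version is rejected rather than blocked, which is one way to avoid locks and deadlocks.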
Manage access to production data
Unity Catalog provides a centralized data governance solution that lets data stewards configure fine-grained access control for users, groups, and service principals. Unity Catalog governs permissions using access control lists (ACLs) that provide both flexibility and specificity in configuring resources. Some configurable permissions include:
- Read-only access to a handful of tables.
- Table creation and modification permissions for a database.
- Ability to read or modify data in a specific cloud storage location.
- Access to many cloud resources through Unity Catalog managed storage credentials.
For more information, see What is Unity Catalog?.
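As a rough illustration of the permissions listed above, the sketch below builds Unity Catalog `GRANT` statements as strings. The helper functions `quote` and `grant`, the object names (`main.sales.orders`, `raw_landing`), and the principal names are hypothetical and chosen only for the example; check the privilege names against the Unity Catalog documentation for your workspace before relying on them.

```python
def quote(name):
    """Backtick-quote an identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def grant(privilege, securable_type, securable_name, principal):
    """Build a GRANT statement; securable_name is a dotted path such as main.sales.orders."""
    qualified = ".".join(quote(part) for part in securable_name.split("."))
    return f"GRANT {privilege} ON {securable_type} {qualified} TO {quote(principal)}"


# All object and principal names below are placeholders.
statements = [
    # Read-only access to a single table.
    grant("SELECT", "TABLE", "main.sales.orders", "analysts"),
    # Table creation within a schema.
    grant("CREATE TABLE", "SCHEMA", "main.sales", "data_engineers"),
    # Read access to files under an external location.
    grant("READ FILES", "EXTERNAL LOCATION", "raw_landing", "ingest_service"),
]

for statement in statements:
    print(statement)
```

In a Databricks notebook, each statement could then be passed to `spark.sql(statement)`. Principals typically also need `USE CATALOG` and `USE SCHEMA` on the parent objects before a table-level grant takes effect.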
Leverage views in the lakehouse
Views on Azure Databricks represent saved queries against data stored in tables in the lakehouse. Whereas a query that produces a table executes at write time, a view executes its defining logic each time it is queried. This means that views provide up-to-date access to data from a variety of sources, and compute is spent only when results are needed.
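The difference between write-time and query-time evaluation can be seen in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a lakehouse SQL engine, so it shows only the general behavior of views and not anything specific to Azure Databricks; the `orders`, `totals_table`, and `totals_view` names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("emea", 10.0), ("amer", 20.0)],
)

# A table built from a query stores its result at write time.
conn.execute("CREATE TABLE totals_table AS SELECT SUM(amount) AS total FROM orders")
# A view stores only the query and runs it on every read.
conn.execute("CREATE VIEW totals_view AS SELECT SUM(amount) AS total FROM orders")

# New data arrives after both objects were created.
conn.execute("INSERT INTO orders VALUES ('apac', 5.0)")

table_total = conn.execute("SELECT total FROM totals_table").fetchone()[0]
view_total = conn.execute("SELECT total FROM totals_view").fetchone()[0]

print(table_total)  # total frozen when the table was written
print(view_total)   # total recomputed when the view is queried
```

The table keeps the total computed before the last insert, while the view reflects the new row because its query runs again at read time.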
You can use Unity Catalog to secure and share views alongside other data objects, allowing individuals and teams to share the logic that drives key business decisions across the organization.
For more information, see What is a view?.
Share data with collaborators
While the ACLs in Unity Catalog cover a wide range of use cases for sharing data within an enterprise organization, Delta Sharing extends this by managing read-only access to datasets that can be shared with collaborators anywhere. Use cases supported by Delta Sharing include:
- Providing real-time access to regional analytics for isolated regions of multinational corporations.
- Sharing datasets across isolated businesses that exist under the same corporate umbrella.
- Providing secure access to customer-curated datasets for third-party consumers.
On Azure Databricks, Delta Sharing is built into Unity Catalog, but it is also available as an open source project. For more information, see What is Delta Sharing?.
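From the recipient's side, reading a shared table might look like the sketch below. It assumes the open source `delta-sharing` Python package is installed and that the data provider has supplied a profile file; the file name `config.share` and the share, schema, and table names are placeholders. The helper functions `table_url` and `load_shared_table` are hypothetical wrappers written for this example, and only `delta_sharing.load_as_pandas` comes from the package itself.

```python
def table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' address used by the client."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table):
    """Read a shared table into a pandas DataFrame.

    Requires the delta-sharing package (pip install delta-sharing) and a
    valid profile file from the data provider.
    """
    import delta_sharing  # imported lazily so the sketch loads without the package

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


# Placeholder names; substitute the values given by the data provider.
url = table_url("config.share", "regional_share", "analytics", "daily_sales")
print(url)
```

Because access is read-only, the recipient needs only the profile file and the table address; no credentials for the provider's underlying storage are passed to this code.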