The Databricks Data Intelligence Platform enables data practitioners throughout your organization to collaborate and productionize data solutions using shared, securely governed data assets and tools.
This article helps you identify the correct starting point for your use case.
Many tasks on Azure Databricks require elevated permissions. Many organizations restrict these elevated permissions to a small number of users or teams. This article disambiguates actions that can be completed by most workspace users from actions that are restricted to privileged users.
Workspace administrators can help you determine if you should be requesting access to assets or requesting elevated permissions.
Find and access data
This section provides a brief overview of tasks to help you discover data assets available to you. Most of these tasks assume that an admin has configured permissions on data assets. See Configure data access.
Feature area
Resources
Data discovery
For a more detailed overview of data discovery tasks, see Discover data.
Catalogs
Catalogs are the top-level objects in the Unity Catalog data governance model. Use Catalog Explorer to find tables, views, and other data assets, or enumerate them programmatically (see the sketch after this list). See Explore database objects.
- Standard catalogs contain Unity Catalog schemas, tables, volumes, models, and other database objects. See Create catalogs.
- Foreign catalogs contain federated tables from external systems. See Manage and work with foreign catalogs.
- The hive_metastore catalog object contains tables that use the built-in legacy Hive metastore instead of Unity Catalog for data governance. See Work with Unity Catalog and the legacy Hive metastore.
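If you prefer to explore programmatically, a minimal sketch follows that enumerates catalogs, schemas, and tables from a notebook. The catalog and schema names (main, default) are illustrative; substitute objects you have access to.

```python
# Enumerate Unity Catalog objects from a notebook attached to a
# Unity Catalog-enabled cluster or SQL warehouse. `spark` and
# `display` are provided automatically in Databricks notebooks.
display(spark.sql("SHOW CATALOGS"))

# Drill into a specific catalog and schema.
display(spark.sql("SHOW SCHEMAS IN main"))
display(spark.sql("SHOW TABLES IN main.default"))
```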
Connected storage
If you have access to compute resources, you can use built-in commands to explore files in connected storage, as sketched below. See Explore storage and find data files.
In addition to tables and views, Azure Databricks uses other securable database objects such as volumes to securely govern data. See Database objects in Azure Databricks.
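For example, a minimal sketch of listing files in a Unity Catalog volume; the volume path is hypothetical:

```python
# List files in a Unity Catalog volume with dbutils, which is
# available in Databricks notebooks.
for f in dbutils.fs.ls("/Volumes/main/default/landing"):
    print(f.path, f.size)
```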
Data permissions
Unity Catalog governs all read and write operations in enabled workspaces. You must have adequate permissions to complete these operations. See Securable objects in Unity Catalog.
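Before requesting access, you can inspect the privileges already granted on an object. A minimal sketch, assuming an illustrative table name:

```python
# Check which privileges have been granted on a table before
# requesting elevated access.
display(spark.sql("SHOW GRANTS ON TABLE main.default.orders"))
```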
ETL
Extract, transform, and load (ETL) workloads are among the most common uses for Apache Spark and Azure Databricks, and much of the platform is built and optimized for ETL. See Run your first ETL workload on Azure Databricks.
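The following is a minimal batch ETL sketch, with hypothetical paths, table names, and an assumed event_id column:

```python
from pyspark.sql import functions as F

# Read raw JSON files, add an ingestion timestamp, deduplicate,
# and write the result as a Unity Catalog managed table.
raw = spark.read.json("/Volumes/main/default/landing/events/")
cleaned = (
    raw.withColumn("ingested_at", F.current_timestamp())
       .dropDuplicates(["event_id"])  # assumes an event_id column
)
cleaned.write.mode("overwrite").saveAsTable("main.default.events_bronze")
```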
Queries
- Transformations, reports, analyses, and model training runs all begin with a query against a table, view, or data files. You can query data using either batch or stream processing (a minimal sketch follows this list). See Query data.
- AI/BI dashboards allow you to extract and visualize insights easily in the UI. See Dashboards.
- Genie spaces use text prompts to answer questions and provide insights informed by your data. See What is an AI/BI Genie space?.
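A minimal sketch of the batch and streaming read patterns against the same (illustrative) table:

```python
# Batch: a static DataFrame over the table's current snapshot.
batch_df = spark.read.table("main.default.events_bronze")
print(batch_df.count())

# Streaming: the same table read incrementally as new rows arrive.
stream_df = spark.readStream.table("main.default.events_bronze")
```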
Ingest
- LakeFlow Connect ingests data from popular external systems. See LakeFlow Connect.
- Auto Loader can be used with Delta Live Tables or Structured Streaming jobs to incrementally ingest data from cloud object storage, as in the sketch below. See What is Auto Loader?.
- You can use Delta Live Tables or Structured Streaming to ingest data from message queues such as Kafka. See Query streaming data.
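A minimal Auto Loader sketch; the storage URL, checkpoint paths, and table name are illustrative:

```python
# Incrementally ingest new JSON files from cloud object storage with
# Auto Loader (the cloudFiles source) into a Delta table.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation",
            "/Volumes/main/default/checkpoints/events_schema")
    .load("abfss://raw@mystorageaccount.dfs.core.windows.net/events/")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
    .trigger(availableNow=True)   # process all pending files, then stop
    .toTable("main.default.events_raw"))
```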
Transformations
Azure Databricks uses common syntax and tooling for transformations that range in complexity from SQL CTAS statements to near real-time streaming applications. For an overview of data transformations, see Transform data.
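At the simple end of that range, a transformation can be a single CTAS statement. A minimal sketch with illustrative table and column names:

```python
# A CTAS statement that materializes a cleaned copy of a table.
spark.sql("""
    CREATE OR REPLACE TABLE main.default.events_silver AS
    SELECT event_id,
           user_id,
           CAST(event_time AS TIMESTAMP) AS event_time
    FROM main.default.events_bronze
    WHERE event_id IS NOT NULL
""")
```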
The Databricks Data Intelligence Platform provides a suite of tools for data science, machine learning, and AI applications. See AI and machine learning on Databricks.
Configure data access
Most Azure Databricks workspaces rely on a workspace admin or other power users to configure connections to external data sources and to enforce privileges on data assets based on team membership, region, or role. This section provides an overview of common tasks for configuring and controlling data access that require elevated permissions.
Note
Before requesting elevated permissions to configure a new connection to a data source, confirm that you aren't simply missing privileges on an existing connection, catalog, or table. If a data source is not available, consult your organization's policy for adding new data to your workspace.
Feature area
Resources
Unity Catalog
- Unity Catalog powers the data governance features built into the Databricks Data Intelligence Platform. See What is Unity Catalog?.
- Databricks account admins, workspace admins, and metastore admins have default privileges to manage Unity Catalog data privileges for users. See Manage privileges in Unity Catalog.
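Privileges are granted with SQL. A minimal sketch, assuming an illustrative group name and securables:

```python
# Grant a group the privileges needed to read one table: access to
# the catalog and schema, then SELECT on the table itself.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.default TO `data-analysts`")
spark.sql("GRANT SELECT ON TABLE main.default.events_silver TO `data-analysts`")
```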
Connections and access
- Configuring secure connections to cloud object storage is a keystone activity and a prerequisite for nearly all admin and end-user tasks. See Manage access to cloud storage using Unity Catalog.
- Admins can create new catalogs. Catalogs provide a high-level abstraction for data isolation and can either be tied to individual workspaces or shared across all workspaces in an account. See Create catalogs.
- AI/BI dashboards encourage owners to embed their credentials when publishing, ensuring that viewers can gain insights from shared results. For details, see Share a dashboard.
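These setup steps can also be scripted. A minimal sketch, assuming a storage credential named my_storage_credential already exists; all other names and URLs are illustrative:

```python
# Register an external location backed by an existing storage
# credential, then create a catalog. Requires elevated privileges
# such as CREATE EXTERNAL LOCATION and CREATE CATALOG.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS sales_landing
    URL 'abfss://sales@mystorageaccount.dfs.core.windows.net/landing'
    WITH (STORAGE CREDENTIAL my_storage_credential)
""")
spark.sql("CREATE CATALOG IF NOT EXISTS sales COMMENT 'Sales domain data'")
```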
Configure workspaces and infrastructure
This section provides an overview of common tasks associated with administering workspace assets and infrastructure. Broadly defined, workspace assets include the following:
Compute resources: Compute resources include all-purpose interactive clusters, SQL warehouses, job clusters, and pipeline compute. A user or workload must have permissions to connect to running compute resources in order to process specified logic.
Note
Users who do not have access to connect to any compute resources have very limited functionality on Azure Databricks.
Platform tools: The Databricks Data Intelligence Platform provides a suite of tools tailored to different use cases and personas, such as notebooks, Databricks SQL, and Mosaic AI. Admins can customize settings that include default behaviors, optional features, and user access for many of these tools.
Artifacts: Artifacts include notebooks, queries, dashboards, files, libraries, pipelines, and jobs. Artifacts contain code and configurations that users author in order to perform desired actions on their data.
Important
The user who creates a workspace asset is assigned the owner role by default. For most assets, owners can grant permissions to any other user or group in the workspace.
To ensure that data and code are secure, Databricks recommends configuring the owner role for all artifacts and compute resources deployed to a production workspace.
Feature area
Resources
Workspace entitlements
Workspace entitlements include basic workspace access, access to Databricks SQL, and unrestricted cluster creation. See Manage entitlements.
Compute resource access & policies
- Most costs on Azure Databricks come from compute resources. Controlling which users can configure, deploy, start, and use various resources is vital to controlling costs. See Connect to all-purpose and jobs compute.
- Compute policies work in tandem with workspace compute entitlements to ensure that entitled users only deploy compute resources that follow specified configuration rules, as in the sketch below. See Create and manage compute policies.
- Admins can configure default behaviors, data access policies, and user access for SQL warehouses. See SQL warehouse admin settings.
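A minimal sketch of defining a compute policy with the Databricks Python SDK (databricks-sdk); the policy name and rule values are illustrative:

```python
import json
from databricks.sdk import WorkspaceClient

# Create a compute policy that caps auto-termination and restricts
# node types. Requires workspace admin rights.
w = WorkspaceClient()  # reads credentials from the environment

definition = {
    "autotermination_minutes": {"type": "range", "maxValue": 60},
    "node_type_id": {"type": "allowlist", "values": ["Standard_DS3_v2"]},
}
policy = w.cluster_policies.create(
    name="cost-controlled-small-clusters",
    definition=json.dumps(definition),
)
print(policy.policy_id)
```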
Platform tools
Use the admin console to configure behaviors ranging from customizing workspace appearance to enabling or disabling products and features. See Manage your workspace.
Workspace ACLs
Workspace access control lists (ACLs) govern how users and groups can interact with workspace assets including compute resources, code artifacts, and jobs. See Access control lists.
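ACLs can be managed in the UI or programmatically. A minimal sketch using the Databricks Python SDK, with a hypothetical cluster ID and group name:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel

# Let a group attach notebooks to a specific cluster. Requires
# CAN_MANAGE on the cluster or admin rights.
w = WorkspaceClient()
w.permissions.set(
    request_object_type="clusters",
    request_object_id="0123-456789-abcdef12",  # hypothetical cluster ID
    access_control_list=[
        AccessControlRequest(
            group_name="data-analysts",
            permission_level=PermissionLevel.CAN_ATTACH_TO,
        )
    ],
)
```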
Productionize workloads
All Azure Databricks products are built for scale and stability and to accelerate the path from development to production. This section provides a brief introduction to the suite of tools recommended for getting workloads into production.
Feature area
Resources
ETL pipelines
Delta Live Tables provides a declarative syntax for building and productionizing ETL pipelines. See What is Delta Live Tables?.
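A minimal Delta Live Tables sketch; this code runs as part of a DLT pipeline rather than as a standalone script, and the source table name is illustrative:

```python
import dlt
from pyspark.sql import functions as F

# A table definition that DLT discovers and manages; malformed rows
# are dropped by the expectation rather than failing the pipeline.
@dlt.table(comment="Events with malformed rows dropped")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")
def events_clean():
    return (
        spark.read.table("main.default.events_raw")
             .withColumn("processed_at", F.current_timestamp())
    )
```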