Introduction to workspace assets
This article provides a high-level introduction to Azure Databricks workspace assets. You can view and organize all workspace assets in the workspace browser across personas. To create a workspace asset, you must use the appropriate persona’s workspace browser.
New Databricks SQL queries, dashboards, and alerts are visible in the workspace browser. To view and organize existing queries, dashboards, and alerts in the workspace browser, users (or admins) must migrate them into the workspace browser. For information about migration, see Migrating existing queries, dashboards, and alerts.
Azure Databricks Data Science & Engineering and Databricks Machine Learning clusters provide a unified platform for various use cases such as running production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning. A cluster is a type of Azure Databricks compute resource. Other compute resource types include Azure Databricks SQL warehouses.
For detailed information on managing and using clusters, see Clusters.
A notebook is a web-based interface to documents containing a series of runnable cells (commands) that operate on files and tables, visualizations, and narrative text. Commands can be run in sequence, referring to the output of one or more previously run commands.
Notebooks are one mechanism for running code in Azure Databricks. The other mechanism is jobs.
For detailed information on managing and using notebooks, see Introduction to Databricks notebooks.
Jobs are one mechanism for running code in Azure Databricks. The other mechanism is notebooks.
For detailed information on managing and using jobs, see Create, run, and manage Azure Databricks Jobs.
A library makes third-party or locally-built code available to notebooks and jobs running on your clusters.
For detailed information on managing and using libraries, see Libraries.
You can import data into a distributed file system mounted into an Azure Databricks workspace and work with it in Azure Databricks notebooks and clusters. You can also use a wide variety of Apache Spark data sources to access data.
For detailed information on loading data, see Load data into the Azure Databricks Lakehouse.
Support for arbitrary files in the workspace is in Public Preview.
In Databricks Runtime 11.2 and above, you can create and use arbitrary files in the Databricks workspace. Files can be any file type. Common examples include:
- .py files used in custom modules.
- .md files.
- .csv or other small data files.
- Log files.
For detailed information on using files, see How to work with files on Azure Databricks. For information about how to use files to modularize your code as you develop with Databricks notebooks, see Share code between Databricks notebooks.
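As a rough illustration of how a custom module and a small data file might be used together, the following sketch writes both into a temporary folder and then imports the module and reads the data. The folder is only a stand-in for a workspace folder, and the names unit_helpers.py, readings.csv, and to_celsius are hypothetical; nothing here is specific to the Azure Databricks API.

```python
import csv
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a workspace folder (assumption for illustration only).
project_dir = Path(tempfile.mkdtemp())

# A hypothetical custom module stored as a .py file.
(project_dir / "unit_helpers.py").write_text(
    "def to_celsius(fahrenheit):\n"
    "    return (fahrenheit - 32) * 5 / 9\n"
)

# A hypothetical small data file stored as a .csv file.
(project_dir / "readings.csv").write_text("city,temp_f\nOslo,32\nCairo,95\n")

# Put the folder on the import path, then import the module by name.
sys.path.insert(0, str(project_dir))
unit_helpers = importlib.import_module("unit_helpers")

# Read the data file with the standard csv module.
with open(project_dir / "readings.csv", newline="") as handle:
    rows = list(csv.DictReader(handle))

# Apply the module's function to every row of the data file.
celsius_by_city = {
    row["city"]: round(unit_helpers.to_celsius(float(row["temp_f"])), 1)
    for row in rows
}
print(celsius_by_city)
```

Keeping helper functions in a .py file rather than in a notebook cell means the same code can be imported from several notebooks.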
Repos are Azure Databricks folders whose contents are co-versioned together by syncing them to a remote Git repository. Using an Azure Databricks repo, you can develop notebooks in Azure Databricks and use a remote Git repository for collaboration and version control.
For detailed information on using repos, see Git integration with Databricks Repos.
Model refers to a model registered in MLflow Model Registry. Model Registry is a centralized model store that enables you to manage the full lifecycle of MLflow models. It provides chronological model lineage, model versioning, stage transitions, and model and model version annotations and descriptions.
For detailed information on managing and using models, see MLflow Model Registry on Azure Databricks.
An MLflow experiment is the primary unit of organization and access control for MLflow machine learning model training runs; all MLflow runs belong to an experiment. Each experiment lets you visualize, search, and compare runs, as well as download run artifacts or metadata for analysis in other tools.
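The relationship between runs and experiments can be sketched in plain Python. This is a conceptual model only, not the MLflow API: the Run and Experiment classes, their methods, and the example names and metric values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training run with its parameters and resulting metrics."""
    run_id: str
    params: dict
    metrics: dict


@dataclass
class Experiment:
    """A named container; every run is recorded under exactly one experiment."""
    name: str
    runs: list = field(default_factory=list)

    def log_run(self, run):
        self.runs.append(run)

    def search(self, metric, minimum):
        # Return the runs whose metric is at least `minimum`.
        return [r for r in self.runs if r.metrics.get(metric, float("-inf")) >= minimum]

    def best(self, metric):
        # Compare runs and return the one with the highest metric value.
        return max(self.runs, key=lambda r: r.metrics[metric])


experiment = Experiment("churn-model")
experiment.log_run(Run("run-1", {"max_depth": 3}, {"accuracy": 0.81}))
experiment.log_run(Run("run-2", {"max_depth": 5}, {"accuracy": 0.86}))
experiment.log_run(Run("run-3", {"max_depth": 8}, {"accuracy": 0.84}))

good_runs = experiment.search("accuracy", 0.84)
best_run = experiment.best("accuracy")
print([r.run_id for r in good_runs], best_run.run_id)
```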
For detailed information on managing and using experiments, see Organize training runs with MLflow experiments.
Queries are SQL statements that allow you to interact with your data. For more information, see Queries.
Dashboards are presentations of query visualizations and commentary. For more information, see Databricks SQL dashboards.
Alerts are notifications that a field returned by a query has reached a threshold. For more information, see Alerts.
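To make the link between a query and an alert concrete, here is a minimal sketch that runs a SQL statement and checks the returned field against a threshold. It uses an in-memory SQLite database purely as a stand-in for a SQL warehouse; the orders table, the alert_triggered helper, and the threshold value are hypothetical, and the comparison rule (greater than or equal) is an assumption.

```python
import sqlite3

# In-memory SQLite is only a stand-in for a SQL warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [("ok",), ("failed",), ("failed",), ("ok",), ("failed",)],
)

# The query: a SQL statement that returns the field the alert watches.
failed_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'failed'"
).fetchone()[0]


def alert_triggered(value, threshold):
    """Hypothetical rule: fire when the watched value reaches the threshold."""
    return value >= threshold


THRESHOLD = 3
if alert_triggered(failed_count, THRESHOLD):
    print(f"ALERT: failed orders = {failed_count} (threshold {THRESHOLD})")
```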