This page describes how to develop code in Databricks notebooks, including autocomplete, automatic formatting for Python and SQL, combining Python and SQL in a notebook, and tracking the notebook version history.
For more details about advanced functionality available with the editor, such as autocomplete, variable selection, multi-cursor support, and side-by-side diffs, see Use the Databricks notebook and file editor.
When you use the notebook or the file editor, Databricks Assistant is available to help you generate, explain, and debug code. See Use Databricks Assistant for more information.
Databricks notebooks also include a built-in interactive debugger for Python notebooks. See Debug notebooks.
Databricks Assistant is a context-aware AI assistant that you can interact with using a conversational interface, making you more productive inside Databricks. You can describe your task in English and let the assistant generate Python code or SQL queries, explain complex code, and automatically fix errors. The assistant uses Unity Catalog metadata to understand your tables, columns, descriptions, and popular data assets across your company to provide personalized responses.
Databricks Assistant can help you with the following tasks:
For information about using Databricks Assistant to help you code more efficiently, see Use Databricks Assistant. For general information about Databricks Assistant, see DatabricksIQ-powered features.
To open a notebook, use the workspace Search function or use the workspace browser to navigate to the notebook and click on the notebook’s name or icon.
Use the schema browser to explore Unity Catalog objects available for the notebook. Click at the left side of the notebook to open the schema browser.
The For you button displays only those objects that you’ve used in the current session or previously marked as a Favorite.
As you type text into the Filter box, the display changes to show only those objects that contain the text you type. Only objects that are currently open or have been opened in the current session appear. The Filter box does not do a complete search of the catalogs, schemas, tables, and volumes available for the notebook.
To open the kebab menu, hover the cursor over the object’s name as shown:
If the object is a table, you can do the following:
If the object is a catalog, schema, or volume, you can copy the object’s path or open it in Catalog Explorer.
To insert a table or column name directly into a cell:
To display keyboard shortcuts, select Help > Keyboard shortcuts. The keyboard shortcuts available depend on whether the cursor is in a code cell (edit mode) or not (command mode).
You can quickly perform actions in the notebook using the command palette. To open a panel of notebook actions, click at the lower-right corner of the workspace or use the shortcut Cmd + Shift + P on MacOS or Ctrl + Shift + P on Windows.
To find and replace text within a notebook, select Edit > Find and Replace. The current match is highlighted in orange and all other matches are highlighted in yellow.
To replace the current match, click Replace. To replace all matches in the notebook, click Replace All.
To move between matches, click the Prev and Next buttons. You can also press Shift+Enter and Enter to go to the previous and next matches, respectively.

To close the find and replace tool, click the close button or press Esc.
You can run a single cell or a collection of cells. To select a single cell, click anywhere in the cell. To select multiple cells, hold down the Command key on MacOS or the Ctrl key on Windows, and click in the cell outside of the text area.

To run the selected cells, select Run > Run selected cell(s). The behavior of this command depends on the cluster that the notebook is attached to.
Important
This feature is in Public Preview.
With Databricks Runtime 11.3 LTS and above, you can create and manage source code files in the Azure Databricks workspace, and then import these files into your notebooks as needed.
For more information on working with source code files, see Share code between Databricks notebooks and Work with Python and R modules.
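The pattern of keeping shared code in a source file and importing it into a notebook can be sketched in plain Python. In the sketch below, the module name, file location, and `add_vat` function are hypothetical stand-ins for a workspace source file; in Databricks, a file checked into the workspace alongside the notebook can be imported the same way.

```python
import importlib.util
import os
import tempfile

# Write a hypothetical shared module, standing in for a workspace source file.
module_source = (
    "def add_vat(price, rate=0.2):\n"
    "    return round(price * (1 + rate), 2)\n"
)
module_path = os.path.join(tempfile.gettempdir(), "pricing_utils.py")
with open(module_path, "w") as f:
    f.write(module_source)

# Import the module from its file path, as a notebook would import shared code.
spec = importlib.util.spec_from_file_location("pricing_utils", module_path)
pricing_utils = importlib.util.module_from_spec(spec)
spec.loader.exec_module(pricing_utils)

print(pricing_utils.add_vat(100))  # 120.0
```

Keeping logic in a module like this means several notebooks can import one implementation instead of each copying the function.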
You can highlight code or SQL statements in a notebook cell and run only that selection. This is useful when you want to quickly iterate on code and queries.
Highlight the lines you want to run.
Select Run > Run selected text or use the keyboard shortcut Ctrl+Shift+Enter. If no text is highlighted, Run selected text executes the current line.
If you are using mixed languages in a cell, you must include the %<language> line in the selection.

Run selected text also executes collapsed code, if there is any in the highlighted selection.

Special cell commands such as %run, %pip, and %sh are supported.
You cannot use Run selected text on cells that have multiple output tabs (that is, cells where you have defined a data profile or visualization).
Azure Databricks provides tools that allow you to format Python and SQL code in notebook cells quickly and easily. These tools reduce the effort to keep your code formatted and help to enforce the same coding standards across your notebooks.
Important
This feature is in Public Preview.
Azure Databricks supports Python code formatting using black within the notebook. The notebook must be attached to a cluster with the black and tokenize-rt Python packages installed.

On Databricks Runtime 11.3 LTS and above, Azure Databricks preinstalls black and tokenize-rt. You can use the formatter directly without needing to install these libraries.

On Databricks Runtime 10.4 LTS and below, you must install black==22.3.0 and tokenize-rt==4.2.1 from PyPI on your notebook or cluster to use the Python formatter. You can run the following command in your notebook:

%pip install black==22.3.0 tokenize-rt==4.2.1

or install the library on your cluster.
For more details about installing libraries, see Python environment management.
For files and notebooks in Databricks Git folders, you can configure the Python formatter based on the pyproject.toml file. To use this feature, create a pyproject.toml file in the Git folder root directory and configure it according to the Black configuration format. Edit the [tool.black] section in the file. The configuration is applied when you format any file or notebook in that Git folder.
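For example, a minimal pyproject.toml in the Git folder root might look like the following (the line-length value is illustrative, not a recommendation):

```toml
# Applied when formatting any file or notebook in this Git folder
[tool.black]
line-length = 100
```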
You must have CAN EDIT permission on the notebook to format code.
Azure Databricks uses the Gethue/sql-formatter library to format SQL and the black code formatter for Python.
You can trigger the formatter in the following ways:
Format a single cell

Select Format SQL in the command context dropdown menu of a SQL cell. This option is available only in SQL cells or those that use the %sql language magic. Select Format Python in the command context dropdown menu of a Python cell. This option is available only in Python cells or those that use the %python language magic.

Format multiple cells

Select multiple cells and then select Edit > Format Cell(s). If you select cells of more than one language, only SQL and Python cells are formatted. This includes those that use %sql and %python.

Format all Python and SQL cells in the notebook

Select Edit > Format Notebook. If your notebook contains more than one language, only SQL and Python cells are formatted. This includes those that use %sql and %python.
Azure Databricks notebooks maintain a history of notebook versions, allowing you to view and restore previous snapshots of the notebook. You can perform the following actions on versions: add comments, restore and delete versions, and clear version history.
You can also sync your work in Databricks with a remote Git repository.
To access notebook versions, click in the right sidebar. The notebook version history appears. You can also select File > Version history.
To add a comment to the latest version:
Click the version.
Click Save now.
In the Save Notebook Version dialog, enter a comment.
Click Save. The notebook version is saved with the entered comment.
To restore a version:
Click the version.
Click Restore this version.
Click Confirm. The selected version becomes the latest version of the notebook.
To delete a version entry:
Click the version.
Click the trash icon.

Click Yes, erase. The selected version is deleted from the history.
The version history cannot be recovered after it has been cleared.
To clear the version history for a notebook:

Select File > Clear version history.

Click Yes, clear. The version history is cleared and cannot be recovered.
The default language for the notebook appears next to the notebook name.
To change the default language, click the language button and select the new language from the dropdown menu. To ensure that existing commands continue to work, commands of the previous default language are automatically prefixed with a language magic command.
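For example, if you change a notebook's default language from Python to SQL, an existing Python cell such as print("hello") is automatically rewritten as a cell like the following (cell contents are illustrative):

```
%python
print("hello")
```

so the cell still runs as Python under the new default language.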
By default, cells use the default language of the notebook. You can override the default language in a cell by clicking the language button and selecting a language from the dropdown menu.
Alternatively, you can use the language magic command %<language> at the beginning of a cell. The supported magic commands are: %python, %r, %scala, and %sql.
Note
When you invoke a language magic command, the command is dispatched to the REPL in the execution context for the notebook. Variables defined in one language (and hence in the REPL for that language) are not available in the REPL of another language. REPLs can share state only through external resources such as files in DBFS or objects in object storage.
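Sharing state through an external resource can be sketched in plain Python: one REPL writes a file, and another reads it back. The file name below is a placeholder; on Databricks you might use a DBFS path such as /dbfs/tmp/state.json instead of a local temp directory.

```python
import json
import os
import tempfile

# REPLs for different languages cannot share variables directly,
# but they can share state through an external resource such as a file.
state = {"run_id": 42, "threshold": 0.75}

# Placeholder path; on Databricks this might be a DBFS path like /dbfs/tmp/state.json
path = os.path.join(tempfile.gettempdir(), "notebook_state.json")

# A Python cell writes the state...
with open(path, "w") as f:
    json.dump(state, f)

# ...and a cell in another language (shown here in Python for brevity) reads it back.
with open(path) as f:
    restored = json.load(f)

print(restored)  # {'run_id': 42, 'threshold': 0.75}
```

The same pattern works with objects in cloud object storage when the state must outlive the cluster.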
Notebooks also support a few auxiliary magic commands:

%sh: Allows you to run shell code in your notebook. To fail the cell if the shell command has a non-zero exit status, add the -e option. This command runs only on the Apache Spark driver, and not the workers. To run a shell command on all nodes, use an init script.

%fs: Allows you to use dbutils filesystem commands. For example, to run the dbutils.fs.ls command to list files, you can specify %fs ls instead. For more information, see Work with files on Azure Databricks.

%md: Allows you to include various types of documentation, including text, images, and mathematical formulas and equations. See the next section.

Syntax highlighting and SQL autocomplete are available when you use SQL inside a Python command, such as in a spark.sql command.
In a Databricks notebook, results from a SQL language cell are automatically made available as an implicit DataFrame assigned to the variable _sqldf. You can then use this variable in any Python and SQL cells you run afterward, regardless of their position in the notebook.
Note
This feature has the following limitations:

The _sqldf variable is not available in notebooks that use a SQL warehouse for compute.

Using _sqldf in subsequent Python cells is supported in Databricks Runtime 13.3 and above.

Using _sqldf in subsequent SQL cells is only supported on Databricks Runtime 14.3 and above.

If the query uses the keywords CACHE TABLE or UNCACHE TABLE, the _sqldf variable is not available.
Important
The variable _sqldf
is reassigned each time a SQL cell is run. To avoid losing reference to a specific DataFrame result, assign it to a new variable name before you run the next SQL cell:
new_dataframe_name = _sqldf
ALTER VIEW _sqldf RENAME TO new_dataframe_name
While a command is running and your notebook is attached to an interactive cluster, you can run a SQL cell simultaneously with the current command. The SQL cell is executed in a new, parallel session.
To execute a cell in parallel:
Click Run now. The cell is immediately executed.
Because the cell is run in a new session, temporary views, UDFs, and the implicit Python DataFrame (_sqldf) are not supported for cells that are executed in parallel. In addition, the default catalog and database names are used during parallel execution. If your code refers to a table in a different catalog or database, you must specify the table name using the three-level namespace (catalog.schema.table).
You can run SQL commands in a Databricks notebook on a SQL warehouse, a type of compute that is optimized for SQL analytics. See Use a notebook with a SQL warehouse.
Azure Databricks supports the display of images in Markdown cells. You can display images stored in the Workspace, Volumes, or FileStore.
You can use either absolute paths or relative paths to display images stored in the Workspace. To display an image stored in the Workspace, use the following syntax:
%md
![my_test_image](/Workspace/absolute/path/to/image.png)
![my_test_image](./relative/path/to/image.png)
You can use absolute paths to display images stored in Volumes. To display an image stored in Volumes, use the following syntax:
%md
![my_test_image](/Volumes/absolute/path/to/image.png)
To display images stored in the FileStore, use the following syntax:
%md
![my_test_image](files/image.png)
For example, suppose you have the Databricks logo image file in FileStore:
dbfs ls dbfs:/FileStore/
databricks-logo-mobile.png
When you include the following code in a Markdown cell:

%md
![test](files/databricks-logo-mobile.png)

the image is rendered in the cell.
You can drag and drop images from your local file system into Markdown cells. The image is uploaded to the current Workspace directory and displayed in the cell.
Notebooks support KaTeX for displaying mathematical formulas and equations. For example,
%md
\\(c = \\pm\\sqrt{a^2 + b^2} \\)
\\(A{_i}{_j}=B{_i}{_j}\\)
$$c = \\pm\\sqrt{a^2 + b^2}$$
\\[A{_i}{_j}=B{_i}{_j}\\]
renders as:
and
%md
\\( f(\beta)= -Y_t^T X_t \beta + \sum log( 1+{e}^{X_t\bullet\beta}) + \frac{1}{2}\delta^t S_t^{-1}\delta\\)
where \\(\delta=(\beta - \mu_{t-1})\\)
renders as:
You can include HTML in a notebook by using the function displayHTML
. See HTML, D3, and SVG in notebooks for an example of how to do this.
Note
The displayHTML iframe is served from the domain databricksusercontent.com and the iframe sandbox includes the allow-same-origin attribute. databricksusercontent.com must be accessible from your browser. If it is currently blocked by your corporate network, it must be added to an allow list.
You can link to other notebooks or folders in Markdown cells using relative paths. Specify the href attribute of an anchor tag as the relative path, starting with a $ and then following the same pattern as in Unix file systems:
%md
<a href="$./myNotebook">Link to notebook in same folder as current notebook</a>
<a href="$../myFolder">Link to folder in parent folder of current notebook</a>
<a href="$./myFolder2/myNotebook2">Link to nested notebook</a>