You can interact with workspace files stored in Azure Databricks programmatically. This enables tasks such as storing small data files alongside notebooks and code, and writing log files or other output from code execution. You can programmatically create, edit, and delete workspace files in Databricks Runtime 11.3 LTS and above.
Note
To disable writing to workspace files, set the cluster environment variable WSFS_ENABLE_WRITE_SUPPORT=false. For more information, see Environment variables.
Note
In Databricks Runtime 14.0 and above, the default current working directory (CWD) for code executed locally is the directory containing the notebook or script being run. This is a change in behavior from Databricks Runtime 13.3 LTS and below. See What is the default current working directory?.
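A quick way to confirm the working directory your code is actually using is the standard library (plain Python, nothing Databricks-specific):

import os

# Print the current working directory of the running notebook or script
print(os.getcwd())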
Use shell commands to read the locations of files, for example, in a repo or in the local filesystem. To determine the location of files, enter the following:

%sh ls

The output shows a location such as /databricks/driver (the driver's local filesystem) or ./Workspace/Repos/name@domain.com/public_repo_2/repos_file_system (inside a repo).
You can programmatically read small data files such as .csv or .json files from code in your notebooks. The following example uses Pandas to query files stored in a /data directory relative to the root of the project repo:
import pandas as pd
df = pd.read_csv("./data/winequality-red.csv")
df
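The same approach works for other small files. For example, a minimal sketch reading a hypothetical ./data/events.json with Pandas:

import pandas as pd

# Read a JSON file from the repo's data directory (hypothetical file name)
df = pd.read_json("./data/events.json")
df.head()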
You can use Spark to read data files. You must provide Spark with the fully qualified path:

file:/Workspace/Repos/<user-folder>/<repo-name>/path/to/file for files in a repo.
file:/Workspace/Users/<user-folder>/path/to/file for files in your user directory.

You can copy the absolute or relative path to a file from the dropdown menu next to the file.
The example below shows the use of os.getcwd() to get the full path.
import os
spark.read.format("csv").load(f"file:{os.getcwd()}/my_data.csv")
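If the file has a header row, the usual Spark CSV reader options apply. A minimal sketch (header and inferSchema are standard Spark options, and my_data.csv is the same hypothetical file as above):

import os

# Read a CSV with a header row, inferring column types
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(f"file:{os.getcwd()}/my_data.csv")
)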
To learn more about files on Azure Databricks, see Work with files on Azure Databricks.
In Databricks Runtime 11.3 LTS and above, you can directly manipulate workspace files in Azure Databricks. The following examples use standard Python packages and functionality to create and manipulate files and directories.
import os

# Create a new directory
os.mkdir('dir1')

# Create a new file and write to it
with open('dir1/new_file.txt', "w") as f:
    f.write("new content")

# Append to a file
with open('dir1/new_file.txt', "a") as f:
    f.write(" continued")

# Delete a file
os.remove('dir1/new_file.txt')

# Delete a directory
os.rmdir('dir1')
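Keep in mind that os.rmdir only removes empty directories. To delete a directory that still contains files, the standard library's shutil module can be used in the same way; a minimal sketch:

import shutil

# Recursively delete a directory and everything it contains
shutil.rmtree('dir1')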