DBFS API 2.0
The DBFS API is a Databricks API that makes it simple to interact with various data sources without having to include your credentials every time you read a file. See What is the Databricks File System (DBFS)? for more information. For an easy-to-use command-line client of the DBFS API, see Databricks CLI setup & documentation.
Note
To ensure high quality of service under heavy load, Azure Databricks is now enforcing API rate limits for DBFS API calls. Limits are set per workspace to ensure fair usage and high availability. Automatic retries are available using Databricks CLI version 0.12.0 and above. We advise all customers to switch to the latest Databricks CLI version.
Important
To access Databricks REST APIs, you must authenticate.
Limitations
Using the DBFS API with firewall-enabled storage containers is not supported. Databricks recommends you use Databricks Connect or az storage.
Add block
Endpoint | HTTP Method |
---|---|
2.0/dbfs/add-block | POST |
Append a block of data to the stream specified by the input handle. If the handle does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST. If the block of data exceeds 1 MB, this call throws an exception with MAX_BLOCK_SIZE_EXCEEDED.

A typical workflow for file upload is:

- Call create and get a handle.
- Make one or more add-block calls with the handle you have.
- Call close with the handle you have.
Example
curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/add-block \
--data '{ "data": "SGVsbG8sIFdvcmxkIQ==", "handle": 1234567890123456 }'
{}
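The three-step create / add-block / close workflow can be sketched as a shell function. This is a hypothetical helper, not an official client: the DATABRICKS_HOST variable, the chunk file names, and the jq dependency are assumptions, and authentication is taken from ~/.netrc as in the examples above.

```shell
# Sketch of the create / add-block / close upload workflow.
# Assumes ~/.netrc holds credentials and jq is installed.
# DATABRICKS_HOST (e.g. https://adb-....azuredatabricks.net) is a placeholder.
upload_via_stream() {
  local src="$1" dest="$2" handle chunk
  # 1. Open a stream and capture the handle from the JSON response.
  handle=$(curl --netrc -s -X POST \
    "$DATABRICKS_HOST/api/2.0/dbfs/create" \
    --data "{ \"path\": \"$dest\", \"overwrite\": true }" | jq .handle)
  # 2. Split the file into chunks of at most 1 MB of raw bytes,
  #    since each add-block call is limited to 1 MB of decoded data.
  split -b 1048576 "$src" /tmp/dbfs-chunk-
  for chunk in /tmp/dbfs-chunk-*; do
    # -w0 disables line wrapping (GNU base64; macOS base64 does not wrap).
    curl --netrc -s -X POST \
      "$DATABRICKS_HOST/api/2.0/dbfs/add-block" \
      --data "{ \"data\": \"$(base64 -w0 "$chunk")\", \"handle\": $handle }"
  done
  rm -f /tmp/dbfs-chunk-*
  # 3. Close the stream.
  curl --netrc -s -X POST \
    "$DATABRICKS_HOST/api/2.0/dbfs/close" \
    --data "{ \"handle\": $handle }"
}

# Only invoke the function when a workspace is actually configured:
if [ -n "${DATABRICKS_HOST:-}" ]; then
  upload_via_stream ./HelloWorld.txt /tmp/HelloWorld.txt
fi
```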
Request structure
Field Name | Type | Description |
---|---|---|
handle | INT64 | The handle on an open stream. This field is required. |
data | BYTES | The base64-encoded data to append to the stream. This has a limit of 1 MB. This field is required. |
Close
Endpoint | HTTP Method |
---|---|
2.0/dbfs/close | POST |
Close the stream specified by the input handle. If the handle does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST.

A typical workflow for file upload is:

- Call create and get a handle.
- Make one or more add-block calls with the handle you have.
- Call close with the handle you have.
Example
curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/close \
--data '{ "handle": 1234567890123456 }'
If the call succeeds, no output displays.
Request structure
Field Name | Type | Description |
---|---|---|
handle | INT64 | The handle on an open stream. This field is required. |
Create
Endpoint | HTTP Method |
---|---|
2.0/dbfs/create | POST |
Open a stream to write to a file and return a handle to this stream. There is a 10-minute idle timeout on this handle. If a file or directory already exists on the given path and overwrite is set to false, this call throws an exception with RESOURCE_ALREADY_EXISTS.

A typical workflow for file upload is:

- Call create and get a handle.
- Make one or more add-block calls with the handle you have.
- Call close with the handle you have.
Example
curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/create \
--data '{ "path": "/tmp/HelloWorld.txt", "overwrite": true }'
{ "handle": 1234567890123456 }
Request structure
Field Name | Type | Description |
---|---|---|
path | STRING | The path of the new file. The path should be the absolute DBFS path (for example, /mnt/my-file.txt). This field is required. |
overwrite | BOOL | The flag that specifies whether to overwrite existing file or files. |
Response structure
Field Name | Type | Description |
---|---|---|
handle | INT64 | Handle which should subsequently be passed into the add-block and close calls when writing to a file through a stream. |
Delete
Endpoint | HTTP Method |
---|---|
2.0/dbfs/delete | POST |
Delete the file or directory (optionally recursively delete all files in the directory). This call throws an exception with IO_ERROR if the path is a non-empty directory and recursive is set to false, or on other similar errors.
When you delete a large number of files, the delete operation is done in increments. The call returns a response after approximately 45 seconds with an error message (503 Service Unavailable) asking you to re-invoke the delete operation until the directory structure is fully deleted. For example:
{
"error_code": "PARTIAL_DELETE",
"message": "The requested operation has deleted 324 files. There are more files remaining. You must make another request to delete more."
}
For operations that delete more than 10K files, we discourage using the DBFS REST API and advise you to perform such operations in the context of a cluster, using the File system utility (dbutils.fs). dbutils.fs covers the functional scope of the DBFS REST API, but from notebooks. Running such operations from notebooks provides better control and manageability, such as selective deletes, and the ability to automate periodic delete jobs.
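If you do use the REST API for a large delete, the re-invocation loop described above can be sketched in shell. The function name, DATABRICKS_HOST variable, and example path are hypothetical; authentication is taken from ~/.netrc as in the examples:

```shell
# Re-invoke delete until the response no longer reports PARTIAL_DELETE.
# Assumes ~/.netrc credentials; DATABRICKS_HOST is a placeholder.
delete_until_done() {
  local path="$1" resp
  while :; do
    resp=$(curl --netrc -s -X POST \
      "$DATABRICKS_HOST/api/2.0/dbfs/delete" \
      --data "{ \"path\": \"$path\", \"recursive\": true }")
    # Each call deletes an increment of files; stop once no
    # PARTIAL_DELETE error remains in the response body.
    echo "$resp" | grep -q PARTIAL_DELETE || break
  done
}

# Only invoke the function when a workspace is actually configured:
if [ -n "${DATABRICKS_HOST:-}" ]; then
  delete_until_done /tmp/my-large-dir
fi
```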
Example
curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/delete \
--data '{ "path": "/tmp/HelloWorld.txt" }'
{}
Request structure
Field Name | Type | Description |
---|---|---|
path | STRING | The path of the file or directory to delete. The path should be the absolute DBFS path (for example, /mnt/foo/). This field is required. |
recursive | BOOL | Whether or not to recursively delete the directory's contents. Deleting empty directories can be done without providing the recursive flag. |
Get status
Endpoint | HTTP Method |
---|---|
2.0/dbfs/get-status | GET |
Get the file information of a file or directory. If the file or directory does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST.
Example
curl --netrc -X GET \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/get-status \
--data '{ "path": "/tmp/HelloWorld.txt" }' \
| jq .
{
"path": "/tmp/HelloWorld.txt",
"is_dir": false,
"file_size": 13,
"modification_time": 1622054945000
}
Request structure
Field Name | Type | Description |
---|---|---|
path | STRING | The path of the file or directory. The path should be the absolute DBFS path (for example, /mnt/my-folder/). This field is required. |
Response structure
Field Name | Type | Description |
---|---|---|
path | STRING | The path of the file or directory. |
is_dir | BOOL | Whether the path is a directory. |
file_size | INT64 | The length of the file in bytes, or zero if the path is a directory. |
modification_time | INT64 | The last time, in epoch milliseconds, the file or directory was modified. |
List
Endpoint | HTTP Method |
---|---|
2.0/dbfs/list | GET |
List the contents of a directory, or details of the file. If the file or directory does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST.

When calling list on a large directory, the list operation times out after approximately 60 seconds. We strongly recommend using list only on directories containing fewer than 10K files and discourage using the DBFS REST API for operations that list more than 10K files. Instead, we recommend that you perform such operations in the context of a cluster, using the File system utility (dbutils.fs), which provides the same functionality without timing out.
Example
curl --netrc -X GET \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/list \
--data '{ "path": "/tmp" }' \
| jq .
{
"files": [
{
"path": "/tmp/HelloWorld.txt",
"is_dir": false,
"file_size": 13,
"modification_time": 1622054945000
},
{
"..."
}
]
}
Request structure
Field Name | Type | Description |
---|---|---|
path | STRING | The path of the file or directory. The path should be the absolute DBFS path (for example, /mnt/foo/). This field is required. |
Response structure
Field Name | Type | Description |
---|---|---|
files | An array of FileInfo | A list of FileInfo objects that describe the contents of the directory or file. |
Mkdirs
Endpoint | HTTP Method |
---|---|
2.0/dbfs/mkdirs | POST |
Create the given directory and necessary parent directories if they do not exist. If a file (not a directory) exists at any prefix of the input path, this call throws an exception with RESOURCE_ALREADY_EXISTS. If this operation fails, it may have succeeded in creating some of the necessary parent directories.
Example
curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/mkdirs \
--data '{ "path": "/tmp/my-new-dir" }'
{}
Request structure
Field Name | Type | Description |
---|---|---|
path | STRING | The path of the new directory. The path should be the absolute DBFS path (for example, /mnt/my-folder/). This field is required. |
Move
Endpoint | HTTP Method |
---|---|
2.0/dbfs/move | POST |
Move a file from one location to another within DBFS. If the source file does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST. If a file already exists in the destination path, this call throws an exception with RESOURCE_ALREADY_EXISTS. If the given source path is a directory, this call always recursively moves all files.
When moving a large number of files, the API call will time out after approximately 60 seconds, potentially resulting in partially moved data. Therefore, for operations that move more than 10K files, we strongly discourage using the DBFS REST API. Instead, we recommend that you perform such operations in the context of a cluster, using the File system utility (dbutils.fs) from a notebook, which provides the same functionality without timing out.
Example
curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/move \
--data '{ "source_path": "/tmp/HelloWorld.txt", "destination_path": "/tmp/my-new-dir/HelloWorld.txt" }'
{}
Request structure
Field Name | Type | Description |
---|---|---|
source_path | STRING | The source path of the file or directory. The path should be the absolute DBFS path (for example, /mnt/my-source-folder/). This field is required. |
destination_path | STRING | The destination path of the file or directory. The path should be the absolute DBFS path (for example, /mnt/my-destination-folder/). This field is required. |
Put
Endpoint | HTTP Method |
---|---|
2.0/dbfs/put | POST |
Upload a file by using multipart form post. It is mainly used for streaming uploads, but can also be used as a convenient single call for data upload.
The amount of data that can be passed using the contents parameter is limited to 1 MB if specified as a string (MAX_BLOCK_SIZE_EXCEEDED is thrown if exceeded) and 2 GB if posted as a file.
Example
To upload a local file named HelloWorld.txt in the current directory:
curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/put \
--form contents=@HelloWorld.txt \
--form path="/tmp/HelloWorld.txt" \
--form overwrite=true
To upload the content Hello, World! as a base64-encoded string:
curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/put \
--data '{ "path": "/tmp/HelloWorld.txt", "contents": "SGVsbG8sIFdvcmxkIQ==", "overwrite": true }'
{}
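If you construct the base64 string yourself, the standard base64 tool produces the value used in the payload above. A quick sanity check:

```shell
# Encode "Hello, World!" the same way as the example payload.
# printf (not echo) avoids appending a trailing newline to the data.
CONTENTS=$(printf 'Hello, World!' | base64)
echo "$CONTENTS"   # SGVsbG8sIFdvcmxkIQ==
```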
Request structure
Field Name | Type | Description |
---|---|---|
path | STRING | The path of the new file. The path should be the absolute DBFS path (for example, /mnt/foo/). This field is required. |
contents | BYTES | This parameter might be absent, and instead a posted file will be used. |
overwrite | BOOL | The flag that specifies whether to overwrite existing files. |
Read
Endpoint | HTTP Method |
---|---|
2.0/dbfs/read | GET |
Return the contents of a file. If the file does not exist, this call throws an exception with RESOURCE_DOES_NOT_EXIST. If the path is a directory, the read length is negative, or the offset is negative, this call throws an exception with INVALID_PARAMETER_VALUE. If the read length exceeds 1 MB, this call throws an exception with MAX_READ_SIZE_EXCEEDED. If offset + length exceeds the number of bytes in the file, the call reads the contents until the end of the file.
Example
Assume the content of the specified file is the string Hello, World!. An offset of 1 and a length of 8 returns the base64-encoded string ZWxsbywgV28=, which when decoded is ello, Wo.
curl --netrc -X GET \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/dbfs/read \
--data '{ "path": "/tmp/HelloWorld.txt", "offset": 1, "length": 8 }' \
| jq .
{
"bytes_read": 8,
"data": "ZWxsbywgV28="
}
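Because the data field is base64-encoded, recovering the raw bytes takes one more step. Decoding the value above with the standard base64 tool:

```shell
# Decode the base64 payload returned by 2.0/dbfs/read.
DECODED=$(printf 'ZWxsbywgV28=' | base64 --decode)
echo "$DECODED"   # ello, Wo
```

In a pipeline, the same step would typically follow a jq extraction of the data field.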
Request structure
Field Name | Type | Description |
---|---|---|
path | STRING | The path of the file to read. The path should be the absolute DBFS path (for example, /mnt/foo/). This field is required. |
offset | INT64 | The offset to read from in bytes. |
length | INT64 | The number of bytes to read starting from the offset. This has a limit of 1 MB, and a default value of 0.5 MB. |
Response structure
Field Name | Type | Description |
---|---|---|
bytes_read | INT64 | The number of bytes read (could be less than length if we hit end of file). This refers to the number of bytes read in the unencoded version (response data is base64-encoded). |
data | BYTES | The base64-encoded contents of the file read. |
Data structures
FileInfo
The attributes of a file or directory.
Field Name | Type | Description |
---|---|---|
path | STRING | The path of the file or directory. |
is_dir | BOOL | Whether the path is a directory. |
file_size | INT64 | The length of the file in bytes, or zero if the path is a directory. |
modification_time | INT64 | The last time, in epoch milliseconds, the file or directory was modified. |