Manage files in Unity Catalog volumes with the Databricks ODBC Driver

This article describes how to upload, download, and delete files in Unity Catalog volumes using the Databricks ODBC Driver.

Requirements

  • Databricks ODBC Driver versions 2.8.2 or above.
  • By default, native query mode is enabled. Otherwise, add UseNativeQuery property to the ODBC connection string, setting its value to 1 or 2.

Upload a file

To upload a file to a volume, you must add the StagingAllowedLocalPaths property to the ODBC connection string, setting this property’s value to the path of the file to upload. To upload multiple files from separate locations, set this property to a comma-separated list of paths, for example /tmp/,/usr/tmp/.

To override the contents of any existing file in the specified upload location, add OVERWRITE.

The following Python code snippet shows how to upload a file to a volume.

conn_string = "".join([
    "DRIVER=", os.getenv("ODBC_DRIVER", "/Library/simba/spark/lib/libsparkodbc_sbu.dylib"),
    ";Host=", os.getenv("ODBC_HOST_NAME", "<<HOST_NAME>>"),
    ";PORT=443",
    ";HTTPPath=", os.getenv("ODBC_HTTP_PATH", "/sql/1.0/endpoints/1234567890"),
    ";AuthMech=11",
    ";SSL=1",
    ";ThriftTransport=2",
    ";SparkServerType=3",
    ";Auth_Flow=0",
    ";Auth_AccessToken=", os.getenv("API_TOKEN", "<<NO_ACCESS_TOKEN_IS_SET>>"),
    ";StagingAllowedLocalPaths=", "/tmp"),
    os.getenv("ODBC_OPTIONS", ""),
])

conn = pyodbc.connect(conn_string, autocommit=True)
cursor = conn.cursor()
cursor.execute("PUT '" +
    "/tmp/my-data.csv" +
    "' INTO '" +
    "/Volumes/main/default/my-volume/my-data.csv" +
    "' OVERWRITE")

Download a file

The following Python code snippet shows how to download a file from a volume.

conn = pyodbc.connect(conn_string, autocommit=True)
cursor = conn.cursor()
cursor.execute("GET '" +
    "/Volumes/main/default/my-volume/my-data.csv" +
    "' TO '" +
    "/tmp/my-downloaded-data.csv" +
    "'")

Delete a file

The following Python code snippet shows how to delete a file from a volume.

conn = pyodbc.connect(conn_string, autocommit=True)
cursor = conn.cursor()
cursor.execute("REMOVE '" +
    "/Volumes/main/default/my-volume/my-data.csv" +
    "'")