Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Note
Writes to external data sources using bundled Spark connectors on serverless compute are in Public Preview.
When writing to an external data source using a bundled connector on serverless compute, only a subset of connector options are supported. The following tables list the supported options per connector.
For setup instructions and examples, see Spark data sources.
PostgreSQL
The following options are supported when writing to PostgreSQL on serverless compute.
| Option | Description |
|---|---|
host |
Hostname of the PostgreSQL server. |
port |
Port number. Default: 5432. |
database |
Name of the database to connect to. |
connectTimeout |
Maximum time in seconds to wait for a connection. 0 disables the timeout. |
user |
Database username. |
password |
Database password. |
dbtable |
Target table name. Supports schema-qualified names (for example, myschema.mytable). |
batchsize |
Number of rows to insert per batch. Default: 1000. |
numPartitions |
Number of Spark partitions for parallel write operations. |
queryTimeout |
Maximum time in seconds to wait for a query to complete. 0 disables the timeout. |
isolationLevel |
Transaction isolation level: NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE. Default: READ_UNCOMMITTED. |
truncate |
If true, truncates the target table in overwrite mode instead of dropping and recreating it. Default: false. |
cascadeTruncate |
If true, cascades the truncation to tables with foreign key references to the target table. Default: false. |
SQL Server
The following options are supported when writing to SQL Server on serverless compute.
| Option | Description |
|---|---|
host |
Hostname of the SQL Server instance. |
port |
Port number. Default: 1433. |
database |
Name of the database to connect to. |
connectionTimeout |
Maximum time in seconds to wait for a connection. 0 disables the timeout. |
encrypt |
If true, encrypts all data sent between the client and server using TLS. Default: false. |
trustServerCertificate |
If true, trusts the server's TLS certificate without validation. For development environments only. Default: false. |
debug |
If true, enables verbose debug logging for the connector. Default: false. |
user |
Database username. |
password |
Database password. |
authentication |
Authentication type. Supported values: SqlPassword, ActiveDirectoryPassword, ActiveDirectoryMSI. |
dbtable |
Target table name. Supports schema-qualified names (for example, myschema.mytable). |
batchsize |
Number of rows to insert per batch. Default: 1000. |
numPartitions |
Number of Spark partitions for parallel write operations. |
queryTimeout |
Maximum time in seconds to wait for a query to complete. 0 disables the timeout. |
isolationLevel |
Transaction isolation level: NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE. Default: READ_UNCOMMITTED. |
truncate |
If true, truncates the target table in overwrite mode instead of dropping and recreating it. Default: false. |
MySQL
The following options are supported when writing to MySQL on serverless compute.
| Option | Description |
|---|---|
host |
Hostname of the MySQL server. |
port |
Port number. Default: 3306. |
database |
Name of the database to connect to. |
connectionTimeout |
Maximum time in seconds to wait for a connection. 0 disables the timeout. |
requireSSL |
If true, requires an SSL-encrypted connection to the server. Default: false. |
useSSL |
If true, enables SSL for the connection when supported by the server. Default: false. |
user |
Database username. |
password |
Database password. |
dbtable |
Target table name. Supports schema-qualified names (for example, myschema.mytable). |
batchsize |
Number of rows to insert per batch. Default: 1000. |
numPartitions |
Number of Spark partitions for parallel write operations. |
queryTimeout |
Maximum time in seconds to wait for a query to complete. 0 disables the timeout. |
isolationLevel |
Transaction isolation level: NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE. Default: READ_UNCOMMITTED. |
truncate |
If true, truncates the target table in overwrite mode instead of dropping and recreating it. Default: false. |
cascadeTruncate |
If true, cascades the truncation to tables with foreign key references to the target table. Default: false. |
Snowflake
The following sections list the supported options for the Snowflake connector, organized by function.
Connection
The following options configure the connection to Snowflake and control session behavior.
| Option | Description |
|---|---|
host |
Snowflake account hostname (for example, <account>.snowflakecomputing.com). |
port |
Port number. Default: 443. |
sfaccount |
Snowflake account identifier. |
sfauthenticator |
Authentication method: snowflake (password), oauth (token), or snowflake_jwt (key pair). Default: snowflake. |
networktimeout |
Timeout in seconds for network operations. |
sftimezone |
Timezone for timestamp operations (for example, America/New_York). |
client_session_keep_alive |
If true, sends keepalive signals to prevent session timeout during long-running operations. Default: false. |
ocspfailopen |
If true, allows connections to proceed when OCSP certificate validation is unavailable (fail-open mode). Default: true. |
Authentication
The following options supply credentials for the authentication method configured in sfauthenticator. Staging credentials (temporary_aws_*, awsaccesskey, temporary_azure_sas_token) are required when Snowflake stages write data through cloud storage.
| Option | Description |
|---|---|
sfuser |
Snowflake username. |
sfpassword |
Snowflake password. Used when sfauthenticator is snowflake. |
sfToken |
OAuth access token. Used when sfauthenticator is oauth. |
pem_private_key |
Private key in PEM format for key-pair authentication. Used when sfauthenticator is snowflake_jwt. |
temporary_aws_access_key_id |
Temporary AWS access key ID for S3 staging. Preferred over awsaccesskey when using short-lived credentials. |
temporary_aws_secret_access_key |
Temporary AWS secret access key for S3 staging. |
temporary_aws_session_token |
Temporary AWS session token for S3 staging. |
temporary_azure_sas_token |
Temporary Azure SAS token for Azure Blob Storage staging. |
awsaccesskey |
AWS access key for S3 staging. |
awssecretkey |
AWS secret key for S3 staging. |
Target
The following options specify the Snowflake database, schema, warehouse, and table to write to.
| Option | Description |
|---|---|
sfdatabase |
Snowflake database name. |
sfschema |
Snowflake schema name. |
sfwarehouse |
Snowflake virtual warehouse used for query execution. |
sfrole |
Snowflake role for the session. |
dbtable |
Target table name. |
Write behavior
The following options control how data is written to the target Snowflake table.
| Option | Description |
|---|---|
column_mapping |
How DataFrame columns are matched to Snowflake table columns: name (by column name) or position (by column order). Default: name. |
column_mismatch_behavior |
Behavior when DataFrame and table columns don't align: error or ignore. Default: error. |
truncate_table |
If true, truncates the target table before writing. Default: false. |
usestagingtable |
If true, stages data in a temporary table before swapping into the target, enabling atomic writes. Default: true. |
internal_execute_query_in_sync_mode |
If true, executes Snowflake queries synchronously. Default: false. |
autopushdown |
If true, pushes filter and aggregation operations down to Snowflake for execution. Default: true. |
Redshift
The following sections list the supported options for the Redshift connector, organized by function.
Connection
The following options configure the connection to the Redshift cluster.
| Option | Description |
|---|---|
host |
Redshift cluster endpoint hostname. |
port |
Port number. Default: 5439. |
database |
Redshift database name. |
connectionTimeout |
Maximum time in seconds to wait for a connection. |
Authentication
The following options configure credentials for Redshift and for the S3 staging location that Redshift uses during write operations.
| Option | Description |
|---|---|
user |
Redshift username. |
password |
Redshift password. |
aws_iam_role |
ARN of the IAM role that Redshift uses to access S3 for staging data. |
temporary_aws_access_key_id |
Temporary AWS access key ID for S3 staging. Preferred over long-lived credentials. |
temporary_aws_secret_access_key |
Temporary AWS secret access key for S3 staging. |
temporary_aws_session_token |
Temporary AWS session token for S3 staging. |
forward_spark_s3_credentials |
If true, forwards Spark's S3 credentials to Redshift for staging. Use only when Spark and Redshift share the same S3 credentials. Default: false. |
Write behavior
The following options control how data is written to the target Redshift table, including distribution, sort keys, and staging format.
| Option | Description |
|---|---|
dbtable |
Target table name. Supports schema-qualified names (for example, myschema.mytable). |
batchsize |
Number of rows per batch insert. Default: 1000. |
numPartitions |
Number of Spark partitions for parallel write operations. |
queryTimeout |
Maximum time in seconds to wait for a query to complete. |
isolationLevel |
Transaction isolation level: NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE. Default: READ_UNCOMMITTED. |
diststyle |
Redshift distribution style: EVEN, KEY, or ALL. |
distkey |
Column to use as the distribution key. Required when diststyle is KEY. |
sortkeyspec |
Sort key specification for the Redshift table (for example, SORTKEY(col1, col2)). |
csvnullstring |
String written in staging CSV files to represent NULL values. Default: empty string. |
tempformat |
Staging file format: CSV or AVRO. Default: CSV. |
truncate |
If true, truncates the target table in overwrite mode instead of dropping and recreating it. Default: false. |
Write to PostgreSQL on serverless compute
This example uses append mode and retrieves credentials from a Databricks secret scope.
df.write \
.format("postgresql") \
.option("host", dbutils.secrets.get(scope="<scope>", key="<host>")) \
.option("port", "<port>") \
.option("database", "<database-name>") \
.option("dbtable", "<table-name>") \
.option("user", dbutils.secrets.get(scope="<scope>", key="<user>")) \
.option("password", dbutils.secrets.get(scope="<scope>", key="<password>")) \
.mode("append") \
.save()
Next steps
- Spark data sources: Setup instructions, code examples, and a comparison of Spark integration strategies.
- JDBC connection: Use a Unity Catalog connection with a JDBC driver for options not supported by bundled connectors on serverless, or for data sources without a bundled connector.
- Spark API options reference: Reference for DataFrameReader, DataFrameWriter, and streaming options for file formats and streaming sources.