Databricks serverless SQL Warehouse fails to copy data from an ADLS Gen2 external stage

Sujith Siddireddy 0 Reputation points
2023-08-16T10:55:38.09+00:00

Environment details:
Account: Azure Databricks Premium tier
Workspace region: West US (serverless SQL Warehouse is supported)
Cluster: 2X-Small serverless SQL Warehouse
Unity Catalog: enabled
JDBC driver: Databricks JDBC v2.6.29

Scenario:
The expectation is to copy a CSV file from an ADLS Gen2 container into a table in a Databricks SQL Warehouse.
The following COPY INTO command was executed:

COPY INTO `default`.`sample_STAGEONE`
FROM (
  SELECT
    CAST(_c0 AS INT) AS `id`,
    CAST(_c1 AS INT) AS `value`
  FROM 'abfss://******@suwethastriim.dfs.core.windows.net/default.sample/default.sample0.csv'
    WITH (CREDENTIAL (AZURE_SAS_TOKEN = [redacted]))
)
FILEFORMAT = CSV
FORMAT_OPTIONS('multiLine' = 'true', 'escape' = '"')
COPY_OPTIONS('force' = 'true')

The query failed with the exception below.

Exception stack trace:

[Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: Failure to initialize configuration for storage account suwethastriim.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key     at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:56)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:681)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)     at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:559)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:410)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48)     at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:156)     at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:51)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:64)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:388)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:373)     at java.security.AccessController.doPrivileged(Native Method)     at javax.security.auth.Subject.doAs(Subject.java:422)     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:422)     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)     at java.util.concurrent.FutureTask.run(FutureTask.java:266)     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)     at java.lang.Thread.run(Thread.java:750) Caused by: Failure to initialize configuration for storage account suwethastriim.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key     at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)     at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:682)     at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2078)     at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:267)     at 
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:227)     at com.databricks.common.filesystem.LokiABFS.initialize(LokiABFS.scala:36)     at com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:119)     at com.databricks.common.filesystem.Cache.getOrCompute(Cache.scala:38)     at com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:116)     at com.databricks.common.filesystem.LokiFileSystem.$anonfun$initialize$1(LokiFileSystem.scala:162)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)     at com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:154)     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)     at com.databricks.sql.acl.fs.FixedCredentialsFileSystem.initialize(FixedCredentialsFileSystem.scala:90)     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.$anonfun$allFiles$1(PartitioningAwareFileIndex.scala:173)     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.$anonfun$allFiles$1$adapted(PartitioningAwareFileIndex.scala:162)     at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)     at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)     at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)     at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.allFiles(PartitioningAwareFileIndex.scala:162)     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.sizeInBytes(PartitioningAwareFileIndex.scala:154)     at org.apache.spark.sql.execution.datasources.HadoopFsRelation.sizeInBytes(HadoopFsRelation.scala:67)     at org.apache.spark.sql.execution.datasources.LogicalRelation.$anonfun$computeStats$3(LogicalRelation.scala:60)     at scala.Option.getOrElse(Option.scala:189)     at org.apache.spark.sql.execution.datasources.LogicalRelation.computeStats(LogicalRelation.scala:60)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.$anonfun$stats$1(QueryPlanStats.scala:39)     at scala.Option.getOrElse(Option.scala:189)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.stats(QueryPlanStats.scala:38)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.stats$(QueryPlanStats.scala:38)     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:33)     at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.computeStats(DataSourceScanExec.scala:1979)     at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.computeStats$(DataSourceScanExec.scala:1969)     at org.apache.spark.sql.execution.FileSourceScanExec.computeStats(DataSourceScanExec.scala:2095)     at 
org.apache.spark.sql.catalyst.plans.QueryPlanStats.$anonfun$stats$1(QueryPlanStats.scala:39)     at scala.Option.getOrElse(Option.scala:189)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.stats(QueryPlanStats.scala:38)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.stats$(QueryPlanStats.scala:38)     at org.apache.spark.sql.execution.SparkPlan.stats(SparkPlan.scala:132)     at com.databricks.sql.optimizer.EnsureRequirementsDP$$anonfun$totalScanSize$1.applyOrElse(EnsureRequirementsDP.scala:746)     at com.databricks.sql.optimizer.EnsureRequirementsDP$$anonfun$totalScanSize$1.applyOrElse(EnsureRequirementsDP.scala:742)     at scala.PartialFunction$Lifted.apply(PartialFunction.scala:228)     at scala.PartialFunction$Lifted.apply(PartialFunction.scala:224)     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1(TreeNode.scala:333)     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1$adapted(TreeNode.scala:333)     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:292)     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:293)     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:293)     at scala.collection.Iterator.foreach(Iterator.scala:943)     at scala.collection.Iterator.foreach$(Iterator.scala:943)     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)     at scala.collection.IterableLike.foreach(IterableLike.scala:74)     at scala.collection.IterableLike.foreach$(IterableLike.scala:73)     at scala.collection.AbstractIterable.foreach(Iterable.scala:56)     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:293)     at org.apache.spark.sql.catalyst.trees.TreeNode.collect(TreeNode.scala:333)     at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$collectWithSubqueries$1(QueryPlan.scala:605)     at scala.collection.immutable.List.flatMap(List.scala:366)     at org.apache.spark.sql.catalyst.plans.QueryPlan.collectWithSubqueries(QueryPlan.scala:605)     at com.databricks.sql.optimizer.EnsureRequirementsDP.totalScanSize(EnsureRequirementsDP.scala:742)     at com.databricks.sql.optimizer.EnsureRequirementsDP.apply(EnsureRequirementsDP.scala:764)     at com.databricks.sql.optimizer.EnsureRequirementsDP.apply(EnsureRequirementsDP.scala:575)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$4(RuleExecutor.scala:286)     at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:286)     at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)     at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)     at scala.collection.immutable.List.foldLeft(List.scala:91)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:283)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:266)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeEarlyStopBatch$1(RuleExecutor.scala:261)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:270)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at 
com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:266)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$8(RuleExecutor.scala:353)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$8$adapted(RuleExecutor.scala:353)     at scala.collection.immutable.List.foreach(List.scala:431)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:353)     at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:233)     at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:858)     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$2(QueryExecution.scala:378)     at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:348)     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$4(QueryExecution.scala:421)     at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:926)     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:421)     at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:417)     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1038)     at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:417)     at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:374)     at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:368)     at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:475)     at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:540)     at org.apache.spark.sql.execution.QueryExecution.explainStringLocal(QueryExecution.scala:502)

Additional details:

Using the same file, table schema, container, and SAS key, the COPY command runs successfully when an All-Purpose Compute cluster is used instead of a serverless SQL Warehouse.

We would like to know the reason for this exception.


2 answers

  1. Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2023-08-16T17:12:17.94+00:00

    Hello Sujith Siddireddy,

    Welcome to the Microsoft Q&A forum.

    Per the error message "Failure to initialize configuration for storage account suwethastriim.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key",

    it seems there is an issue with the configuration for the storage account suwethastriim.dfs.core.windows.net: an invalid configuration value was detected for fs.azure.account.key.

    Root cause:

    The required configurations are not set correctly in the cluster configuration; that is, the storage account key is missing or set incorrectly in the Databricks configuration.

    Solution:

    You can check the configuration by going to the Databricks workspace, selecting the cluster you are using, opening the "Advanced Options" tab, and updating the configuration there.
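
    For example (a minimal sketch with placeholder values, assuming account-key authentication for ADLS Gen2), the Spark config entry the error refers to would look something like this:

    # Placeholder values; set in the cluster's Spark config under Advanced Options
    spark.hadoop.fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>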


    If the storage account key is set correctly, there may be an issue with the account itself.

    In this case, you can create a new storage account and try again.

    Can you check whether the storage account is correctly referenced in the Databricks workspace using this doc: https://docs.databricks.com/en/storage/azure-storage.html

    config examples:

    Gen2 Service Principal Auth:

    fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net OAuth
    fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net <application-id>
    fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net <service-credential>
    fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token
    

    Azure Blob Storage:

    # Using an account access key
    spark.hadoop.fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-access-key>
    
    # Using a SAS token
    spark.hadoop.fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net <complete-query-string-of-sas-for-the-container>
    

    https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal

    I hope this helps. Please let me know if you have any further questions.

    If this answers your question, please consider accepting the answer by hitting the Accept answer and up-vote as it helps the community look for answers to similar questions.


  2. Sujith Siddireddy 0 Reputation points
    2023-08-16T18:49:06.3466667+00:00

    Hi @Bhargava Gunnam,

    Thank you for the quick response.

    The solution mentioned in the above response pertains to All-Purpose Compute clusters, but the issue we are facing is with serverless SQL Warehouses.

    We do not have a provision to modify the Spark config in a SQL Warehouse.

    Instead of statically configuring storage account keys in the Spark config, we supply the account key with every COPY command, as described in the doc below (a sketch of the pattern follows the link):
    https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-copy-into#syntax
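
    Roughly, the per-statement credential pattern from that doc looks like the following (placeholder names and token; a minimal sketch rather than our exact statement, which is shown in the question above):

    COPY INTO <target_table>
    FROM 'abfss://<container>@<storage-account>.dfs.core.windows.net/<path>'
    WITH (CREDENTIAL (AZURE_SAS_TOKEN = '<sas-token>'))
    FILEFORMAT = CSV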

    This was working fine as long as we were using All-Purpose Compute clusters; once we switched to serverless SQL Warehouses, it started failing.

    We are mainly interested in knowing the root cause here: the same account keys and COPY command work fine on All-Purpose Compute clusters.

