Databricks serverless SQL Warehouse fails to copy data from an ADLS Gen2 external stage

Sujith Siddireddy 0 Reputation points
2023-08-16T10:55:38.09+00:00

Environment details:
Account: Azure Databricks Premium tier
Workspace region: West US (serverless SQL Warehouse is supported)
Cluster: 2X-Small serverless SQL Warehouse
Unity Catalog: enabled
JDBC driver: Databricks JDBC v2.6.29

Scenario:
The expectation is to copy a CSV file from an ADLS Gen2 container into a table in a Databricks SQL Warehouse.
The following COPY INTO command was executed:

COPY INTO `default`.`sample_STAGEONE`
FROM (
  SELECT
    CAST(_c0 AS INT) AS `id`,
    CAST(_c1 AS INT) AS `value`
  FROM 'abfss://******@suwethastriim.dfs.core.windows.net/default.sample/default.sample0.csv'
    WITH (CREDENTIAL (AZURE_SAS_TOKEN = [redacted]))
)
FILEFORMAT = CSV
FORMAT_OPTIONS('multiLine' = 'true', 'escape' = '"')
COPY_OPTIONS('force' = 'true')

The query failed with the exception below.

Exception stack trace:

[Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: Failure to initialize configuration for storage account suwethastriim.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key     at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:56)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:681)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)     at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:559)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:410)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48)     at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:156)     at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:51)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:64)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:388)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:373)     at java.security.AccessController.doPrivileged(Native Method)     at javax.security.auth.Subject.doAs(Subject.java:422)     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:422)     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)     at java.util.concurrent.FutureTask.run(FutureTask.java:266)     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)     at java.lang.Thread.run(Thread.java:750) Caused by: Failure to initialize configuration for storage account suwethastriim.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key     at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)     at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:682)     at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2078)     at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:267)     at 
shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:227)     at com.databricks.common.filesystem.LokiABFS.initialize(LokiABFS.scala:36)     at com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:119)     at com.databricks.common.filesystem.Cache.getOrCompute(Cache.scala:38)     at com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:116)     at com.databricks.common.filesystem.LokiFileSystem.$anonfun$initialize$1(LokiFileSystem.scala:162)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)     at com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:154)     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)     at com.databricks.sql.acl.fs.FixedCredentialsFileSystem.initialize(FixedCredentialsFileSystem.scala:90)     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.$anonfun$allFiles$1(PartitioningAwareFileIndex.scala:173)     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.$anonfun$allFiles$1$adapted(PartitioningAwareFileIndex.scala:162)     at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)     at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)     at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)     at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.allFiles(PartitioningAwareFileIndex.scala:162)     at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.sizeInBytes(PartitioningAwareFileIndex.scala:154)     at org.apache.spark.sql.execution.datasources.HadoopFsRelation.sizeInBytes(HadoopFsRelation.scala:67)     at org.apache.spark.sql.execution.datasources.LogicalRelation.$anonfun$computeStats$3(LogicalRelation.scala:60)     at scala.Option.getOrElse(Option.scala:189)     at org.apache.spark.sql.execution.datasources.LogicalRelation.computeStats(LogicalRelation.scala:60)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.$anonfun$stats$1(QueryPlanStats.scala:39)     at scala.Option.getOrElse(Option.scala:189)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.stats(QueryPlanStats.scala:38)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.stats$(QueryPlanStats.scala:38)     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:33)     at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.computeStats(DataSourceScanExec.scala:1979)     at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.computeStats$(DataSourceScanExec.scala:1969)     at org.apache.spark.sql.execution.FileSourceScanExec.computeStats(DataSourceScanExec.scala:2095)     at 
org.apache.spark.sql.catalyst.plans.QueryPlanStats.$anonfun$stats$1(QueryPlanStats.scala:39)     at scala.Option.getOrElse(Option.scala:189)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.stats(QueryPlanStats.scala:38)     at org.apache.spark.sql.catalyst.plans.QueryPlanStats.stats$(QueryPlanStats.scala:38)     at org.apache.spark.sql.execution.SparkPlan.stats(SparkPlan.scala:132)     at com.databricks.sql.optimizer.EnsureRequirementsDP$$anonfun$totalScanSize$1.applyOrElse(EnsureRequirementsDP.scala:746)     at com.databricks.sql.optimizer.EnsureRequirementsDP$$anonfun$totalScanSize$1.applyOrElse(EnsureRequirementsDP.scala:742)     at scala.PartialFunction$Lifted.apply(PartialFunction.scala:228)     at scala.PartialFunction$Lifted.apply(PartialFunction.scala:224)     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1(TreeNode.scala:333)     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1$adapted(TreeNode.scala:333)     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:292)     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:293)     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:293)     at scala.collection.Iterator.foreach(Iterator.scala:943)     at scala.collection.Iterator.foreach$(Iterator.scala:943)     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)     at scala.collection.IterableLike.foreach(IterableLike.scala:74)     at scala.collection.IterableLike.foreach$(IterableLike.scala:73)     at scala.collection.AbstractIterable.foreach(Iterable.scala:56)     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:293)     at org.apache.spark.sql.catalyst.trees.TreeNode.collect(TreeNode.scala:333)     at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$collectWithSubqueries$1(QueryPlan.scala:605)     at scala.collection.immutable.List.flatMap(List.scala:366)     at org.apache.spark.sql.catalyst.plans.QueryPlan.collectWithSubqueries(QueryPlan.scala:605)     at com.databricks.sql.optimizer.EnsureRequirementsDP.totalScanSize(EnsureRequirementsDP.scala:742)     at com.databricks.sql.optimizer.EnsureRequirementsDP.apply(EnsureRequirementsDP.scala:764)     at com.databricks.sql.optimizer.EnsureRequirementsDP.apply(EnsureRequirementsDP.scala:575)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$4(RuleExecutor.scala:286)     at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:286)     at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)     at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)     at scala.collection.immutable.List.foldLeft(List.scala:91)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:283)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:266)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeEarlyStopBatch$1(RuleExecutor.scala:261)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:270)     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)     at 
com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:266)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$8(RuleExecutor.scala:353)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$8$adapted(RuleExecutor.scala:353)     at scala.collection.immutable.List.foreach(List.scala:431)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:353)     at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:233)     at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:858)     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$2(QueryExecution.scala:378)     at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)     at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:348)     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$4(QueryExecution.scala:421)     at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:926)     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:421)     at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:417)     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1038)     at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:417)     at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:374)     at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:368)     at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:475)     at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:540)     at org.apache.spark.sql.execution.QueryExecution.explainStringLocal(QueryExecution.scala:502)

Additional details:

Using the same file, table schema, container, and SAS key, the COPY command runs successfully when an All-Purpose Compute cluster is used instead of a serverless SQL Warehouse.

We would like to know the reason for this exception.


2 answers

  1. Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2023-08-16T17:12:17.94+00:00

    Hello Sujith Siddireddy,

    Welcome to the Microsoft Q&A forum.

    Per the error message "Failure to initialize configuration for storage account suwethastriim.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key",

    it seems there is an issue with the configuration for the storage account suwethastriim.dfs.core.windows.net: an invalid configuration value was detected for fs.azure.account.key.

    Root cause:

    The required configurations are not set correctly in the cluster configuration; that is, the storage account key is missing or set incorrectly in the Databricks configuration.

    Solution:

    You can check the configuration by going to the Databricks workspace, selecting the cluster you are using, opening the "Advanced Options" tab, and updating the configuration there.
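
    For example (a minimal sketch with placeholder values, assuming account-key authentication for ADLS Gen2), the Spark config entry the error refers to would look something like this:

    # Placeholder values; set in the cluster's Spark config under Advanced Options
    spark.hadoop.fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>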


    If the storage account key is set correctly, there may be an issue with the account itself.

    In this case, you can create a new storage account and try again.

    Can you check whether the storage account is correctly referenced in the Databricks workspace using this doc: https://docs.databricks.com/en/storage/azure-storage.html

    config examples:

    Gen2 Service Principal Auth:

    fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net OAuth
    fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net <application-id>
    fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net <service-credential>
    fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token
    

    Azure Blob Storage:

    # Using an account access key
    spark.hadoop.fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-access-key>
    
    # Using a SAS token
    spark.hadoop.fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net <complete-query-string-of-sas-for-the-container>
    

    https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal

    I hope this helps. Please let me know if you have any further questions.

    If this answers your question, please consider accepting the answer by hitting the Accept answer and up-vote as it helps the community look for answers to similar questions.


  2. Sujith Siddireddy 0 Reputation points
    2023-08-16T18:49:06.3466667+00:00

    Hi @Bhargava Gunnam,

    Thank you for the quick response.

    The solution mentioned in the above response pertains to All-Purpose Compute clusters, but the issue we are facing is with serverless SQL Warehouses.

    We do not have a provision to modify the Spark config in a SQL Warehouse.

    Instead of statically configuring storage account keys in the Spark config, we supply the account key with every COPY command, as described in the doc below (a sketch of the pattern follows the link):
    https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-copy-into#syntax
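
    Roughly, the per-statement credential pattern from that doc looks like the following (placeholder names and token; a minimal sketch rather than our exact statement, which is shown in the question above):

    COPY INTO <target_table>
    FROM 'abfss://<container>@<storage-account>.dfs.core.windows.net/<path>'
    WITH (CREDENTIAL (AZURE_SAS_TOKEN = '<sas-token>'))
    FILEFORMAT = CSV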

    This was working fine as long as we were using All-Purpose Compute clusters; once we switched to serverless SQL Warehouses, it started failing.

    We are mainly interested in knowing the root cause here: the same account keys and COPY command work fine on All-Purpose Compute clusters.

