Unable to write dataframe into hive table

2021-03-03T08:11:57.007+00:00

Team, we are using a Hive Interactive (LLAP) cluster and a Spark cluster. We have done the LLAP-related configuration on the Spark cluster, and both clusters now interact with each other without any issues. I tried to load a dataset (from an ADLS Gen2 filesystem) into a Hive table from spark-shell as below.

import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
val df = spark.read.csv("abfs://<adl_container>@<adlgen2_instance>.dfs.core.windows.net/data/employee")
df.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector").mode("append").option("table", "hwctesttbl").save()

and we are getting an error like this:

21/03/03 07:26:20 WARN FileStreamSink [main]: Error while looking for metadata directory.
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "adfs"
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3281)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:547)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:355)
    at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
    at org.apache.spark.sql.execution.datasources.DataSource.r

Azure HDInsight
An Azure managed cluster service for open-source analytics.

1 answer

  1. 2021-03-04T09:06:55.533+00:00

    This was a configuration issue and it has been fixed. There was a typo in the staging-directory property; we updated it with the proper value and it is working fine now.
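    For future readers: the exception "No FileSystem for scheme \"adfs\"" suggests the staging path was configured with the misspelled scheme adfs instead of abfs. As a sketch only, assuming the property in question is the HWC staging directory (spark.datasource.hive.warehouse.load.staging.dir) and with all host/container names as illustrative placeholders, a corrected spark-shell launch might look like:

    ```shell
    # Launch spark-shell with the Hive Warehouse Connector configured.
    # The key fix: the staging dir must use the "abfs" scheme, not the misspelled "adfs".
    # All <...> values and the jar path are placeholders for this environment.
    spark-shell \
      --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly.jar \
      --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<zk_quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive" \
      --conf spark.datasource.hive.warehouse.metastoreUri="thrift://<metastore_host>:9083" \
      --conf spark.hadoop.hive.llap.daemon.service.hosts="@llap0" \
      --conf spark.datasource.hive.warehouse.load.staging.dir="abfs://<adl_container>@<adlgen2_instance>.dfs.core.windows.net/tmp"
    ```

    With the staging directory pointing at a valid abfs:// path, Hadoop can resolve the filesystem class for the scheme and the HWC append succeeds.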