FileNotFoundException when accessing abfss on databricks

Millet, Aymeric (KACES) 0 Reputation points
2023-02-09T15:18:09.4533333+00:00

Hello,

I am seeing strange behavior when checking for the existence of a directory on Azure Storage using the abfss connector.

I am using the sample code below:

import org.apache.hadoop.fs._
import org.apache.hadoop.conf.Configuration

val dir = "abfss://<my_container>@<my_storage_account>.dfs.core.windows.net/data/fmdp/stream-service/output/"

val path = new Path(dir)
val fs2 = path.getFileSystem(spark.sparkContext.hadoopConfiguration) 
fs2.listStatus(path)    // This is OK

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.listStatus(path)     // This is KO (throws FileNotFoundException)

The Spark configuration for accessing the Azure storage account is set correctly, and "fs2.listStatus(path)" returns the expected result.

But "fs.listStatus(path)" throws a FileNotFoundException: "FileNotFoundException: File /1140974070106206/data/fmdp/stream-service/output does not exist." Note that the path in the error message is wrong: abfss://<my_container>@<my_storage_account>.dfs.core.windows.net has been replaced by /1140974070106206.

Why? What is the difference between "FileSystem.get" and "path.getFileSystem"?

One works and the other does not!

Unfortunately, we are porting existing Spark Java code to Databricks. That code uses FileSystem.get, and it fails to check the existence of files and directories. I am trying to understand why.

Regards

Azure Databricks
1 answer

  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-02-14T00:26:02.3466667+00:00

    Hi Millet, Aymeric (KACES),

    Welcome to Microsoft Q&A forum and thanks for posting your query.

    The difference between the two methods is that "FileSystem.get(conf)" returns the filesystem for the cluster's default URI (the "fs.defaultFS" setting, which on Databricks points to DBFS), while "path.getFileSystem(conf)" resolves the filesystem from the scheme and authority of the path itself (here abfss://<my_container>@<my_storage_account>.dfs.core.windows.net).
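    You can see this difference directly on the cluster by comparing the URIs of the two filesystem objects. This is a sketch assuming the `spark` session and the `path` value from your question above:

```scala
import org.apache.hadoop.fs.FileSystem

val conf = spark.sparkContext.hadoopConfiguration

val defaultFs = FileSystem.get(conf)   // resolved from fs.defaultFS
val pathFs = path.getFileSystem(conf)  // resolved from the path's own scheme

println(defaultFs.getUri)  // the cluster's default filesystem URI (DBFS)
println(pathFs.getUri)     // abfss://<my_container>@<my_storage_account>.dfs.core.windows.net
```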

    In your case, "FileSystem.get" is not returning an ABFS filesystem at all: because no URI is passed, it returns the default filesystem, which interprets your path relative to its own root. That is why the abfss scheme and authority disappear from the error message and a FileNotFoundException is thrown even though the directory exists.

    When working with the abfss connector, it is recommended to use "path.getFileSystem" (or the "FileSystem.get(URI, Configuration)" overload), so that the filesystem is resolved from the path's own scheme rather than from "fs.defaultFS".
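    As a minimal sketch for porting your existing code (the container, account, and path below are the placeholders from your question), passing the path's URI to FileSystem.get is equivalent to calling path.getFileSystem:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

val dir = "abfss://<my_container>@<my_storage_account>.dfs.core.windows.net/data/fmdp/stream-service/output/"
val path = new Path(dir)
val conf = spark.sparkContext.hadoopConfiguration

// Resolve the filesystem from the path's scheme (abfss), not from fs.defaultFS.
// These two calls return an equivalent ABFS filesystem instance:
val fsFromPath = path.getFileSystem(conf)
val fsFromUri  = FileSystem.get(path.toUri, conf)

// The existence check now targets the ABFS filesystem, not DBFS:
fsFromUri.exists(path)
```

    This keeps the FileSystem.get call shape of your existing Java code while still honoring the abfss URI.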

    For more helpful info, please refer to this SO thread: Hadoop: Path.getFileSystem vs FileSystem.get

    Hope this info helps.


    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.

