Error when running notebook code with pyspark in a virtual machine

Michael C 0 Reputation points
2024-01-24T23:13:09.0766667+00:00

I am running code that uses PySpark to access files in Blob Storage. The code works in an Azure notebook with a serverless Spark instance, but when I run the same code as a Python script on a virtual machine I created, it fails. One error I receive is:

24/01/24 22:57:54 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: wasbs://*****@tdp.blob.core.windows.net/41/4827/.java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure not found

How can I run a spark session that accesses blob storage in an Azure virtual machine?

Azure Virtual Machines
An Azure service that is used to provision Windows and Linux virtual machines.

2 answers

  1. Silvia Wibowo 6,041 Reputation points Microsoft Employee Volunteer Moderator
    2024-01-28T22:02:08.1066667+00:00

    Hi @Michael C, I think you need to copy the azure-storage and hadoop-azure JARs locally, as described in the answers to this Stack Overflow question: https://stackoverflow.com/questions/38254771/spark-shell-error-no-filesystem-for-scheme-wasb
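    As an alternative to copying the JARs by hand, a minimal sketch of pulling them in from the Python side looks like the following. The hadoop-azure version (3.3.4), the storage account, container, and key are all placeholders and assumptions here; match the hadoop-azure version to the Hadoop build bundled with your Spark install.

    ```python
    from pyspark.sql import SparkSession

    # <account>, <container>, and <storage-account-key> are placeholders.
    spark = (
        SparkSession.builder
        .appName("blob-access")
        # Fetches hadoop-azure (which provides NativeAzureFileSystem, the
        # class missing in the error) plus its azure-storage dependency
        # from Maven Central instead of copying JARs manually.
        .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.4")
        .config(
            "spark.hadoop.fs.azure.account.key.<account>.blob.core.windows.net",
            "<storage-account-key>",
        )
        .getOrCreate()
    )

    df = spark.read.parquet(
        "wasbs://<container>@<account>.blob.core.windows.net/path/to/data"
    )
    ```

    `spark.jars.packages` also resolves transitive dependencies, which is why it is often less error-prone than placing individual JARs on the classpath.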

    Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer.


  2. Prrudram-MSFT 28,201 Reputation points Moderator
    2024-02-09T00:49:02.0933333+00:00

    Hi MichaelCampbell-7305

    Error: WARN fs.FileSystem: Failed to initialize filesystem wasb:///: java.lang.IllegalArgumentException: Cannot initialize WASB file system, URI authority not recognized. -ls: Cannot initialize WASB file system, URI authority not recognized.

    This error occurs because the hadoop fs command cannot recognize the wasb protocol. To fix this, you need to add the hadoop-azure JAR file to the classpath of your Hadoop installation. Here are the high-level steps:

    1. Download the hadoop-azure JAR file from the Apache Hadoop website. Make sure to download the version that matches the Hadoop version you are using.
    2. Copy the JAR file to the lib directory of your Hadoop installation. For example, if your Hadoop installation is located at /usr/local/hadoop, you can copy the JAR file to /usr/local/hadoop/lib.
    3. Set the HADOOP_CLASSPATH environment variable to include the path to the hadoop-azure JAR file.
    4. Reload your .bashrc file (for example, with source ~/.bashrc). This will make the HADOOP_CLASSPATH environment variable available in your current terminal session.
    5. Try running the hadoop fs command again.
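    The steps above can be sketched as shell commands. The Hadoop version (3.3.4), the installation path, and the account/container names are examples and assumptions; adjust them to your installation.

    ```shell
    # 1. Download hadoop-azure matching your Hadoop version (3.3.4 is an example).
    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-azure/3.3.4/hadoop-azure-3.3.4.jar

    # 2. Copy it into your Hadoop installation's lib directory (example path).
    sudo cp hadoop-azure-3.3.4.jar /usr/local/hadoop/lib/

    # 3. Add it to HADOOP_CLASSPATH, appending to ~/.bashrc so it persists.
    echo 'export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hadoop/lib/hadoop-azure-3.3.4.jar' >> ~/.bashrc

    # 4. Reload .bashrc in the current terminal session.
    source ~/.bashrc

    # 5. Retry the command that failed (placeholders for container/account).
    hadoop fs -ls wasbs://<container>@<account>.blob.core.windows.net/
    ```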

    If you still get an error, I recommend checking with support.

