"Error importing delta-spark package in Azure Synapse Notebook: 'ModuleNotFoundError: No module named pyspark.errors'"

Stephen01 125 Reputation points
2024-12-11T08:02:20.1266667+00:00

I am trying to use the delta-spark package in an Azure Synapse notebook, but I encounter the following error:

ModuleNotFoundError: No module named 'pyspark.errors'

The delta-spark package is already installed, along with pyspark. Other imports from pyspark (e.g., pyspark.sql.functions) work fine, but importing DeltaTable fails with the above error. The same code and package versions work correctly in a Databricks notebook.

I also tried using findspark, but it didn’t help (likely because pyspark itself is importing correctly).

 

%pip install delta-spark

Output:

Requirement already satisfied: delta-spark in /path/to/lib (3.0.0)  

Requirement already satisfied: pyspark<3.6.0,>=3.5.0 in /path/to/lib (from delta-spark) (3.5.0)

 

import pyspark.sql.functions as F

from pyspark.sql import Window

from delta.tables import DeltaTable

Error:

ModuleNotFoundError: No module named 'pyspark.errors'

 

Environment Details:

•	Spark pool settings:

◦	SPARK_HOME -> /opt/spark

◦	PYTHONPATH -> /opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-0.10.7
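The pool settings above can be confirmed from inside the notebook itself. A minimal sketch using only the standard library (not part of the original post):

```python
import os

# Print the Spark-related environment variables the pool exposes.
# A missing variable prints as None rather than raising.
for var in ("SPARK_HOME", "PYTHONPATH"):
    print(var, "->", os.environ.get(var))
```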

Has anyone encountered this issue? Is there a workaround or specific configuration required for Azure Synapse to use the delta-spark package?

Any guidance would be appreciated.

Thank you !

 

Azure Synapse Analytics

Accepted answer
  1. Smaran Thoomu 24,110 Reputation points Microsoft External Staff Moderator
    2024-12-11T16:54:05.5933333+00:00

    Hi @Stephen-5002
    Welcome to Microsoft Q&A platform and thanks for posting your query here.

    In Azure Synapse notebooks, `from delta.tables import DeltaTable` works without installing the delta-spark package, because Delta Lake ships with the Spark pool runtime.

    Installing delta-spark appears to introduce conflicts with the native Delta package in Synapse, which likely causes the ModuleNotFoundError: No module named 'pyspark.errors'. This issue seems specific to how Synapse handles dependencies.
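    One way to see such a conflict (a diagnostic sketch, not part of the original answer) is to check which copy of pyspark the interpreter would actually import after the pip install:

```python
import importlib.util

# Diagnostic sketch: locate the pyspark copy Python would resolve, without
# importing it. spec.origin shows whether it comes from the pip-installed
# site-packages tree or the pool's bundled pyspark.zip on PYTHONPATH; two
# copies shadowing each other can produce missing-submodule errors like
# "No module named 'pyspark.errors'".
spec = importlib.util.find_spec("pyspark")
if spec is None:
    print("pyspark is not importable on this interpreter")
else:
    print("pyspark resolves to:", spec.origin)
```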

    To avoid this error, refrain from installing delta-spark via %pip install. Instead, you can directly use the Delta Lake functionality that is natively supported in Synapse.

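    A minimal sketch of what that native usage looks like (the storage path is a placeholder, not from the original post; in a Synapse notebook the `spark` session is already provided):

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable  # resolves against Synapse's bundled Delta Lake

# In a Synapse notebook `spark` already exists; building it here only
# makes the sketch self-contained.
spark = SparkSession.builder.getOrCreate()

# Hypothetical ADLS path holding a Delta table.
path = "abfss://container@account.dfs.core.windows.net/delta/events"

if DeltaTable.isDeltaTable(spark, path):
    dt = DeltaTable.forPath(spark, path)
    dt.history().show()  # e.g. inspect recent commits
```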

    This should work without additional installations in Synapse. Hope this helps; do let us know if you have any further queries or need additional clarification!


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

    1 person found this answer helpful.

0 additional answers
