Check whether your PySpark and Java versions are compatible, or whether the environment variables are configured correctly.
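A quick sanity check along these lines can be run on the build agent before launching Spark. This is only a sketch: the exact Java requirement depends on your Spark version (PySpark 3.x generally needs Java 8 or 11; Java 17 is supported from Spark 3.3 on).

```python
import os
import shutil
import subprocess

# Is JAVA_HOME set? PySpark's gateway launcher uses it to find the JVM.
java_home = os.environ.get("JAVA_HOME")
print("JAVA_HOME:", java_home)

# Is a `java` binary on PATH as a fallback?
java_bin = shutil.which("java")
print("java on PATH:", java_bin)

if java_bin:
    # `java -version` prints its output to stderr, not stdout
    result = subprocess.run([java_bin, "-version"],
                            capture_output=True, text=True)
    print(result.stderr.strip())
```

If neither `JAVA_HOME` nor `java` resolves, the gateway process cannot start and PySpark raises exactly the error shown below.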
Java gateway process exited before sending its port number when setting spark config
Matthieu Marshall · 6 Reputation points
Hello
I would appreciate it if someone could point me in the right direction for an error I am seeing. I am trying to set up PySpark on an Azure DevOps build agent to use abfs to connect to an Azure Blob Storage container.
The error I am getting is below:
tests/unit/test_abfs_read_write.py:11: in <module>
from rdslm_common import spark
rdslm_common/__init__.py:3: in <module>
spark = get_spark()
rdslm_common/__main__.py:8: in get_spark
spark_session = SparkSession.builder.config(
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/pyspark/sql/session.py:269: in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/pyspark/context.py:483: in getOrCreate
SparkContext(conf=conf or SparkConf())
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/pyspark/context.py:195: in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/pyspark/context.py:417: in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/pyspark/java_gateway.py:106: in launch_gateway
raise RuntimeError("Java gateway process exited before sending its port number")
E RuntimeError: Java gateway process exited before sending its port number
Line 8 of rdslm_common/__main__.py, referenced in the traceback, is:
SparkSession.builder.config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1,com.databricks:spark-xml_2.12:0.15.0").getOrCreate()
Does anyone know what the cause could be?
When I run the same code locally on my machine, it works fine.
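One way to narrow this down is to make the Java check explicit before building the session, so the build agent fails with a clear message instead of the opaque gateway error. This is a sketch of such a guard: `ensure_java` is a hypothetical helper, and the version note reflects the usual PySpark 3.x requirement (Java 8 or 11); the package coordinates are taken from the snippet above.

```python
import os
import shutil


def ensure_java():
    # launch_gateway() fails with "Java gateway process exited before
    # sending its port number" when no usable JVM can be started, so
    # fail fast here with a clearer message.
    if not (os.environ.get("JAVA_HOME") or shutil.which("java")):
        raise RuntimeError(
            "No Java runtime found on this agent: install a JDK and set "
            "JAVA_HOME (PySpark 3.x needs Java 8 or 11)."
        )


def get_spark():
    ensure_java()
    # Imported lazily so the Java check above runs first
    from pyspark.sql import SparkSession
    return SparkSession.builder.config(
        "spark.jars.packages",
        "org.apache.hadoop:hadoop-azure:3.3.1,"
        "com.databricks:spark-xml_2.12:0.15.0",
    ).getOrCreate()
```

Working locally but failing on the agent is consistent with the agent image simply not having a JDK installed (or `JAVA_HOME` not being exported for the test step), which the guard above would surface immediately.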