I am trying to read and process Avro files from ADLS using a Spark pool notebook in Azure Synapse Analytics. The Avro files are capture files produced by Event Hubs Capture.
When I run df = spark.read.format("avro").load(<file path>), as I would in Databricks, I get the following error:
"
AnalysisException : 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'
Traceback (most recent call last):
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 166, in load
return self._df(self._jreader.load(path))
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in call
answer, self.gateway_client, self.target_id, self.name)
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'
"
I have also tried creating a "dataset" with a linked service, but had no luck with that either.
I have also tried adding spark-avro_2.12 as a package, but I can't seem to install it; I can only install Python packages on my Spark pool.
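The only other route I could think of is pulling the package in through session configuration with the %%configure magic, roughly as below, but I don't know whether a Synapse pool will actually resolve Maven coordinates via spark.jars.packages, and the coordinate here (spark-avro_2.11:2.4.5, guessed to match the pool's Spark 2.4 runtime) is just my best guess:

%%configure -f
{
    "conf": {
        "spark.jars.packages": "org.apache.spark:spark-avro_2.11:2.4.5"
    }
}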
Is there currently a way to read Avro files within Synapse Analytics? If not, are there plans to have Avro read capabilities built in in the near future? What other methods can I use to read Avro in the meantime?
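The only workaround I've sketched so far is to bypass the Spark reader entirely and parse the capture files with fastavro, since that at least installs as a plain Python package. Something like the following, assuming the blob is first copied somewhere the driver can read (e.g. with mssparkutils.fs.cp) and relying on Capture putting the event payload in the "Body" field:

import fastavro

# Assumes the capture blob was copied down first, e.g.:
#   mssparkutils.fs.cp("abfss://<container>@<account>.dfs.core.windows.net/.../capture.avro", "file:/tmp/capture.avro")
with open("/tmp/capture.avro", "rb") as f:
    for record in fastavro.reader(f):
        payload = record["Body"]          # Event Hubs Capture stores the event body as bytes
        print(payload.decode("utf-8"))    # assuming the payloads are UTF-8 text/JSON

Is that a reasonable stopgap, or is there a better way?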
Any and all help is much appreciated, thank you!