Parse XML with an Azure Synapse Spark pool

Ahmed ELHOULA 11 Reputation points
2022-10-20T16:04:16.853+00:00

Hello

I'm looking to parse an XML file with an Azure Synapse Spark pool.
I've installed the libraries for XML, but I don't know how to call them from my Python code.

Thanks for your help.

[Screenshot of the notebook code]

Azure Synapse Analytics

2 answers

  1. MartinJaffer-MSFT 26,011 Reputation points
    2022-10-24T06:04:58.993+00:00

    Hello @Ahmed ELHOULA, and welcome to Microsoft Q&A.

    From the screenshot, it looks like you tried to import a Scala library in Python code. That mixes two languages, and Python cannot import a Scala package. Try removing the "import scala.xml" line. You also have some other things going on in that cell.

    There are several routes you could take; it depends on what you want to do. Is your data relatively flat, or is it deeply hierarchical? Do you want it in a pandas DataFrame, or do you want to process it in a specific way?

    The "import xml" line on its own is not enough if you want to read your XML with plain Python. You will want either xml.parsers.expat or xml.etree.ElementTree. Neither of these accepts "abfss" paths, so you will first need to mount the blob storage so they can treat it as a normal local file.
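    A minimal sketch of the xml.etree.ElementTree route, assuming the file is already reachable through a local (for example, mounted) path. The orders/order/customer/total layout and the temporary file are hypothetical stand-ins for illustration; on Synapse, the path would instead point at wherever the storage is mounted.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample data; in practice this would be your own XML file.
SAMPLE_XML = """<?xml version="1.0"?>
<orders>
  <order id="1">
    <customer>Alice</customer>
    <total>19.99</total>
  </order>
  <order id="2">
    <customer>Bob</customer>
    <total>5.00</total>
  </order>
</orders>
"""


def read_orders(path):
    """Parse a local XML file and return (id, customer, total) tuples."""
    # ET.parse needs a local path or file object, not an abfss:// URL.
    tree = ET.parse(path)
    root = tree.getroot()
    result = []
    for order in root.findall("order"):
        result.append((
            order.get("id"),                  # attribute value
            order.findtext("customer"),       # text of a child element
            float(order.findtext("total")),
        ))
    return result


# Write the sample to a temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "orders.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE_XML)
    orders = read_orders(path)

print(orders)  # [('1', 'Alice', 19.99), ('2', 'Bob', 5.0)]
```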

    Take a look at how-to-convert-an-xml-file-to-nice-pandas-dataframe and pandas.read_xml.html.
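    For the "flatten it into a DataFrame" case, one possible approach is to turn each repeating element into a flat dict with dotted column names and hand the list of dicts to pandas. This is a sketch under assumptions: the sample XML, the choice of <book> as the row element, and the flatten helper are all hypothetical. Namespaced tags would keep their {uri} prefix, and repeated sibling tags inside one row would overwrite each other, so very deep or list-heavy XML needs more care.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: each <book> element becomes one row.
SAMPLE_XML = """<library>
  <book id="b1">
    <title>Dune</title>
    <author><first>Frank</first><last>Herbert</last></author>
  </book>
  <book id="b2">
    <title>Emma</title>
    <author><first>Jane</first><last>Austen</last></author>
  </book>
</library>"""


def flatten(elem, prefix=""):
    """Flatten one element into a dict; nested tags become dotted column names.

    Attributes are stored under "@name" keys. Repeated sibling tags are not
    handled: a later sibling overwrites an earlier one with the same key.
    """
    row = {}
    for name, value in elem.attrib.items():
        row[f"{prefix}@{name}"] = value
    for child in elem:
        key = prefix + child.tag
        if len(child):
            # Child has sub-elements: recurse with a longer prefix.
            row.update(flatten(child, key + "."))
        else:
            # Leaf element: keep its stripped text and any attributes.
            row[key] = (child.text or "").strip()
            for name, value in child.attrib.items():
                row[f"{key}.@{name}"] = value
    return row


root = ET.fromstring(SAMPLE_XML)
rows = [flatten(book) for book in root.findall("book")]
print(rows)
```

    From there, pandas.DataFrame(rows) gives one column per flattened path, and spark.createDataFrame can take that pandas DataFrame if you want a Spark DataFrame instead.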


  2. Ahmed ELHOULA 11 Reputation points
    2022-10-25T08:23:46.827+00:00

    Hello @MartinJaffer-MSFT

    Thanks for your reply.
    I want to flatten my XML file and put it into a DataFrame.
