Excel or CSV file to XML - ADF

Santhi Dhanuskodi 325 Reputation points
2024-02-26T06:42:22.1933333+00:00

Hi, I want to convert excel/csv file into XML format. This xml is used as a body text for a rest api.

Basically i need to send excel data to a REST API using ADF. Rest API accepts xml format. excel file is present in storage container.

What are the possible ways to achieve this? Python script or data flow has inbuilt support excel to xml conversion? any other ways? If using python script, what are the dependancies for executing python script? what are the infra requirements for the same? From where this script will run and how?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,623 questions
Microsoft 365 and Office SharePoint For business Windows
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 34,551 Reputation points Microsoft Employee Moderator
    2024-02-26T11:06:12.8033333+00:00

    @Santhi Dhanuskodi

    Thanks for using Microsoft Q&A platform and thanks for posting your query here.

    I understand that you want to convert excel/csv to XML using ADF.

    • You can use Azure Function to write python code convert the Excel/CSV file to XML format and run the azure function from ADF pipeline.

    Kindly refer to the below resources:

        import pandas as pd
        account_key_value="your_storage_acc_key"
        df = pd.read_excel('abfs://******@jadls2.dfs.core.windows.net/xl_files/sample.xlsx', storage_options = {'account_key' : account_key_value})
        s_df=spark.createDataFrame(df)
        display(s_df)    
    

    Instead of account key, you can also use these options:

    storage_options = {'sas_token' : 'sas_token_value'}
    storage_options = {'connection_string' : 'connection_string_value'}
    storage_options = {'tenant_id': 'tenant_id_value', 'client_id' : 'client_id_value', 'client_secret': 'client_secret_value'}
    
    • To write data intot xml file:
        df = spark.read.format('xml').options(rowTag='book').load('books.xml')
        (df.select("author", "_id").write
          .options(rowTag='book', rootTag='books')
          .xml('newbooks.xml')
        )
        
    

    Here are the resources you can refer:

    https://learn.microsoft.com/en-us/azure/databricks/query/formats/xml

    https://stackoverflow.com/questions/76812820/how-to-read-excel-file-in-synapse-notebook-which-is-in-azure-datalake-gen-2-stor

    Hope it helps. Kindly accept the answer by clicking on Accept answer button. Thankyou


1 additional answer

Sort by: Most helpful
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2024-02-26T09:44:40.4266667+00:00

    XML format is supported for the following connectors: Amazon S3, Amazon S3 Compatible Storage, Azure Blob, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure Files, File System, FTP, Google Cloud Storage, HDFS, HTTP, Oracle Cloud Storage and SFTP. It is supported as source but not sink. Consider exploring the "Transform XML" feature in Azure Logic Apps to see if it aligns with your needs :

    1. A guide on using Azure Logic Apps for importing CSV files to SQL Server: http://blogs.recneps.net/post/Using-Azure-Logic-Apps-to-Import-CSV-to-SQL-Server
    2. A discussion on Stack Overflow about converting CSV to XML in Logic Apps: https://stackoverflow.com/questions/61417713/how-to-convert-a-csv-to-xml-in-logic-apps
    3. A forum post on MSDN about converting CSV/Excel to XML in Logic Apps: https://social.msdn.microsoft.com/Forums/en-US/dfce78db-ebd6-41d9-ac7b-423a1f83f186/convert-csvexcell-to-xml-in-logic-app Additionally, it's strongly encouraged to support existing feature requests on the ADF User Voice forum by upvoting or commenting. This can help prioritize the development of features such as:
    • XML support at the sink side
    • Support for XML files for ADLS as a sink If you have the possibility to use python :
    import pandas as pd
    import xml.etree.ElementTree as ET
    
    # Load the Excel/CSV file
    df = pd.read_csv('path_to_your_file.csv')
    
    # Convert the DataFrame to XML format
    root = ET.Element("Root")
    for _, row in df.iterrows():
        record = ET.SubElement(root, "Record")
        for col in df.columns:
            child = ET.SubElement(record, col)
            child.text = str(row[col])
    
    # Convert the XML tree to a string
    xml_str = ET.tostring(root, encoding='utf8', method='xml')
    
    # Save or use the XML string as needed
    
    
    1 person found this answer helpful.

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.