How do I use the Flatten transformation on an XML file that has schema drift enabled?

Kalim, Yarak
2023-07-20T13:09:45.1+00:00

I have a complex XML file that contains data in arrays, and I have to load this data into Snowflake. There are some conditions I have to keep in mind, listed below.

Conditions:

  1. The XML file schema will change very often.
  2. Some columns in the XML may appear on some days and not on others. If there is no data to send for a column, the entire node is absent.
  3. Some column names don't match the sink table column names.
  4. Some column names are repeated in the XML file. I want to remove some columns explicitly.
Tags: Azure Data Lake Storage, Azure Data Factory

2 answers

  1. Sahil kumar
    2023-07-20T13:53:55.7766667+00:00

    You may change the column names.
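A minimal Python sketch of that suggestion, assuming the XML has already been flattened into one dictionary per row. The mapping `RENAME_MAP`, the drop list `DROP_COLUMNS`, and the sample column names are hypothetical placeholders, not anything defined in the question. It addresses conditions 3 and 4: renaming columns to match the sink table and explicitly dropping unwanted ones.

```python
# Hypothetical mapping from XML element names to sink (Snowflake) column names.
RENAME_MAP = {"custId": "CUSTOMER_ID", "ordDt": "ORDER_DATE"}
# Hypothetical columns to drop explicitly, e.g. duplicated nodes.
DROP_COLUMNS = {"internalNote"}


def rename_and_drop(row: dict) -> dict:
    """Return a new row with unwanted columns removed and the rest renamed."""
    return {
        RENAME_MAP.get(name, name): value  # unmapped names pass through unchanged
        for name, value in row.items()
        if name not in DROP_COLUMNS
    }


row = {"custId": "42", "ordDt": "2023-07-20", "internalNote": "x", "amount": "9.99"}
print(rename_and_drop(row))
# {'CUSTOMER_ID': '42', 'ORDER_DATE': '2023-07-20', 'amount': '9.99'}
```

If two source names map to the same target name, the one that appears later in the row silently wins, so the mapping should be kept one-to-one.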


  2. AnnuKumari-MSFT (Microsoft Employee)
    2023-08-07T09:10:39.81+00:00

    Hi Kalim, Yarak,

    Apologies for the delay in responding. I tried to reproduce your scenario. Handling a dynamic schema for XML after flattening it is what makes this complex to achieve.

    Here is a clear explanation of how to handle a dynamic schema using rule-based mapping in a data flow. However, the first thing to take care of is not to import the schema from any single file, because we have to handle multiple files with different schemas. In your case, though, if we do not import the schema/projection, the data cannot be flattened, because the Flatten transformation relies on the projection to know which array to unroll. So the two requirements conflict.
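The conflict can be illustrated outside Data Factory. A flatten step needs to know which elements are arrays; with a drifting schema, that knowledge has to be discovered from each file at run time. A minimal Python sketch, using only the standard library's `xml.etree.ElementTree`, that reports the paths of elements repeated under the same parent (the function name and sample XML are hypothetical):

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_repeated_paths(root: ET.Element) -> set:
    """Return slash-separated paths of elements that occur more than once
    under the same parent, i.e. the 'arrays' a flatten step would unroll."""
    repeated = set()

    def walk(elem: ET.Element, path: str) -> None:
        counts = Counter(child.tag for child in elem)  # siblings per tag name
        for child in elem:
            child_path = f"{path}/{child.tag}"
            if counts[child.tag] > 1:
                repeated.add(child_path)
            walk(child, child_path)

    walk(root, root.tag)
    return repeated


sample = (
    "<orders>"
    "<order><id>1</id><item>A</item><item>B</item></order>"
    "<order><id>2</id><item>C</item></order>"
    "</orders>"
)
print(sorted(find_repeated_paths(ET.fromstring(sample))))
# ['orders/order', 'orders/order/item']
```

This only detects arrays that actually repeat in the given file: an array that happens to hold a single element looks like a plain node, which is one more reason a fixed projection is hard to avoid.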

    You will probably need to handle this programmatically with your own custom code.
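One possible shape for that custom code, as a sketch only: flatten each record element into a dictionary of leaf values without a predefined schema, rename to sink column names, keep only the sink columns (missing nodes become empty), and write CSV that could then be staged and loaded into Snowflake, for example with `COPY INTO`. The record tag `order`, the sink column list, and the rename map are hypothetical; XML attributes are ignored, and the first value of a repeated leaf name is kept.

```python
import csv
import io
import xml.etree.ElementTree as ET


def flatten_record(elem, prefix="", row=None):
    """Collect leaf text under `elem` into {parent_child: text}, no schema needed."""
    if row is None:
        row = {}
    for child in elem:
        name = f"{prefix}{child.tag}"
        if len(child):
            # Nested node: recurse, prefixing names with the parent tag.
            flatten_record(child, f"{name}_", row)
        else:
            # Leaf node: keep the first value if the same name repeats.
            row.setdefault(name, (child.text or "").strip())
    return row


def xml_to_rows(xml_text, record_tag):
    """One flattened row per `record_tag` element (the array being unrolled)."""
    root = ET.fromstring(xml_text)
    return [flatten_record(record) for record in root.iter(record_tag)]


def align_to_sink(rows, sink_columns, rename_map):
    """Rename to sink names, drop extra columns, fill absent ones with None."""
    aligned = []
    for row in rows:
        renamed = {rename_map.get(k, k): v for k, v in row.items()}
        aligned.append({col: renamed.get(col) for col in sink_columns})
    return aligned


def to_csv(rows, sink_columns):
    """Serialize aligned rows as CSV; None is written as an empty field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sink_columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample: the second order lacks <amount> and adds a nested <ship>.
SAMPLE = """<orders>
  <order><custId>1</custId><amount>9.99</amount><note>a</note></order>
  <order><custId>2</custId><ship><city>Oslo</city></ship></order>
</orders>"""
SINK_COLUMNS = ["CUSTOMER_ID", "AMOUNT", "SHIP_CITY"]
RENAME_MAP = {"custId": "CUSTOMER_ID", "amount": "AMOUNT", "ship_city": "SHIP_CITY"}

rows = align_to_sink(xml_to_rows(SAMPLE, "order"), SINK_COLUMNS, RENAME_MAP)
print(to_csv(rows, SINK_COLUMNS))
```

Driving the output from the sink column list, rather than from whatever appears in the file, is what keeps the CSV shape stable while the XML drifts: new nodes are ignored until they are added to the list and the rename map.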

    Hope it helps. Kindly reply in case of any concerns. Thank you.

