Azure Datafactory , multi level complex csv structure

Sven Peeters 66 Reputation points
2022-02-07T07:35:18.337+00:00

Hi,

We have to deliver a rather complex csv structure and we would like to use Data factory for this.
The structure has multiple levels with a global header and trailer+ a subheader (per topic) and it's detail lines.
The first column defines which type of line it is. I've simplified the real format just to highlight the questions I have.

HEADER - common data like export date and number sequence
SUBHEADER - topic name 1
DETAIL - detail line of above topic
DETAIL - detail line of above topic
DETAIL - detail line of above topic
SUBHEADER - topic name 2
DETAIL - detail line of above topic
DETAIL - detail line of above topic
DETAIL - detail line of above topic
TRAILER - A closing line with total linecount

The source data would be the detail lines + the topic name.

There are 2 problems I'm unable to solve :

  1. How do I convert the source data into the complex SUBHEADER + DETAIL format. To be honest no clue on how to approach this.
  2. Is there a way to add the global header + trailer with total linecount via Datafactory? An alternative would be doing this with an azure function.

All suggestions are welcome ...

Regards,
Sven Peeters

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
6,723 questions
No comments
{count} votes

1 answer

Sort by: Most helpful
  1. KranthiPakala-MSFT 36,022 Reputation points Microsoft Employee
    2022-02-08T19:16:29.62+00:00

    Hello @Sven Peeters ,

    Thanks for the question and using MS Q&A platform.

    My understanding is that your CSV file structure is complex (the structure seems similar to EDI X12 format) and you would like to know if it is possible to read the CSV data and do transformation on data to achieve the desired data format.

    To better assist on your query, I would like to get more understanding of the data structure as below:

    1. If DETAIL records will be a fixed number or it varies by SUBHEADER
    2. Is there a common naming for SUBHEADER or it is unique for each SUBHEADER

    I'm not sure at this moment if it will be possible with ADF, but definitely this will be a complex implementation.

    Would request you to please share a sample source file (with dummy data) and the expected destination sink file which would help me to give a try on my end.

    But for sure writing your own code in Azure Function would be the better approach for this requirement.

    Will look forward to your response (sample files)

    Thank you

    3 people found this answer helpful.