Update/Transform Headerless CSV Source

Nestor Fregoso
2025-03-10T23:28:23.6066667+00:00

I am attempting to create a new data flow that sinks into a SQL table from a CSV source. My CSV is missing its header row, though, so I'm stumbling on how to map the columns. The file is pipe-delimited ('|').

Previous answers I have found reference a 'Headers' source setting that is not present in the current UI, so those felt like dead ends.

Adding a Derived Column step after the source seems like the right call, but the examples I found use data with headers in their expressions, and I wasn't able to figure out how to extrapolate that logic to my case.

What are some guides or sources to achieve one of the following:

A) Insert a header row to the csv as part of the pipeline or data flow process?

B) Derive columns via the delimiter when the csv is missing the header row?

Azure Synapse Analytics

3 answers

  1. Q&A Assist
    2025-03-10T23:28:23.6333333+00:00
    AI-generated content. This answer was created with AI by Q&A Assist.

    To handle a headerless CSV source in Azure Data Factory, you can consider the following approaches:

    A) Insert a header row: While there isn't a direct method to add a header row during the pipeline execution, you can create a new dataset that includes the desired headers and then use a combination of transformations to map the data from your headerless CSV to this new dataset. You may need to use a derived column transformation to assign names to the columns based on their positions.
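    If you can run a small script against the file before the data flow reads it, adding a header is a plain text operation. The Python sketch below is only an illustration outside of Data Factory/Synapse; the function name prepend_header is hypothetical, and the column names Name, FullName, Age are borrowed from the sample in a later answer.

```python
def prepend_header(header_fields: list[str], body: str, delimiter: str = "|") -> str:
    """Return headerless delimited text with a header row added as its first line."""
    # Build the header line with the same delimiter the data rows use.
    header_line = delimiter.join(header_fields)
    # Leave the data rows untouched; a single newline separates header and data.
    return header_line + "\n" + body


if __name__ == "__main__":
    body = "row1col1|row1col2|24\nrow2col1|row2col2|26\n"
    print(prepend_header(["Name", "FullName", "Age"], body), end="")
```

    The header must use the same delimiter as the data rows, otherwise a reader that treats the first row as a header would see a single header column.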

    B) Derive columns via the delimiter: You can use the derived column transformation to create new columns based on the delimiter. Since your CSV is delimited by a pipe ('|'), you can use string functions to split the data into separate columns. For example, you can use the split function in the derived column expression to extract values based on the pipe delimiter.
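    Conceptually, deriving columns by delimiter means splitting each raw line on the pipe and naming each piece by its position. A minimal Python sketch of that idea follows; parse_headerless and the column names are hypothetical. Note that Python's csv module honors quoted fields, which a plain string split in a derived column does not.

```python
import csv
import io


def parse_headerless(text: str, names: list[str], delimiter: str = "|") -> list[dict[str, str]]:
    """Parse headerless delimited text, naming each field by its position."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:
            continue  # csv.reader yields an empty list for blank lines
        # zip pairs the i-th name with the i-th value and stops at the shorter list.
        rows.append(dict(zip(names, fields)))
    return rows
```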

    For more detailed guidance, you can refer to the Azure Data Factory documentation on data flows and derived columns.



  2. Anonymous
    2025-03-11T07:57:21.7066667+00:00

    @Nestor Fregoso

    To split your pipe-delimited data in the Azure Data Factory (or Synapse) data flow expression builder, you can use the split function. Here's how you can do it:

    1. Import the CSV as a single column: in the source dataset, set the column delimiter to none (no delimiter) so that each entire row is read as one column.
    2. Use a Derived Column transformation: add a derived column transformation that splits the single column into multiple columns.

    Here's an example of how you can use the split function in the expression builder:

    split(columnName, '|')[1]  // For the first value
    split(columnName, '|')[2]  // For the second value
    split(columnName, '|')[3]  // For the third value
    

    In your derived column transformation, you would create new columns and use the split function to extract the values based on the pipe delimiter.
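    One detail worth checking: arrays in mapping data flow expressions are, to my understanding, 1-based, so split(columnName, '|')[1] should return the first value. The Python sketch below mimics that positional lookup to make the off-by-one explicit; split_at is a hypothetical helper, and returning None for an out-of-range position is this sketch's stand-in for a NULL.

```python
from typing import Optional


def split_at(value: str, delimiter: str, position: int) -> Optional[str]:
    """Return the field at a 1-based position, like split(col, '|')[n] in a data flow."""
    parts = value.split(delimiter)
    # Convert the 1-based position to Python's 0-based index.
    if 1 <= position <= len(parts):
        return parts[position - 1]
    return None  # out-of-range positions give None in this sketch


if __name__ == "__main__":
    line = "row1col1|row1col2|24"
    print(split_at(line, "|", 1), split_at(line, "|", 3))
```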

    For more detailed information on expressions and functions in Azure Data Factory, refer to the official documentation.


  3. Anonymous
    2025-03-14T08:55:24.05+00:00

    Hello @Nestor Fregoso, glad that you have figured out a resolution for your query. Below are alternate approaches for both options, including one that adds headers dynamically from a separate header file.

    B) Derive columns via the delimiter when the csv is missing the header row?

    If you are fine with renaming the columns manually in the data flow, you can set the Column delimiter to | directly in the source dataset.


    Now import the projection in the data flow source; it automatically assigns default column names such as Column_1, Column_2, Column_3, and so on. After the source, use a Derived Column transformation to rename these columns to your required names, or use the sink mapping to map them to the correct sink columns.
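    The rename step amounts to a positional lookup from the default names to your own. As a hedged illustration outside of the data flow, the Python sketch below maps Column_1, Column_2, ... onto a list of target names; rename_default_columns and the target names are hypothetical, and the Column_N pattern is taken from the default names described above.

```python
def rename_default_columns(row: dict[str, str], names: list[str]) -> dict[str, str]:
    """Map default positional names (Column_1, Column_2, ...) to the given names."""
    renamed = {}
    for position, name in enumerate(names, start=1):
        default_name = f"Column_{position}"
        # Only copy columns that actually exist in the incoming row.
        if default_name in row:
            renamed[name] = row[default_name]
    return renamed
```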

    A) Insert a header row to the csv as part of the pipeline or data flow process?

    You can try the following workaround, which uses a separate header CSV file.

    Create a CSV file that contains only the required header row, for example:

    Name|FullName|Age

    As sample input, I used the following headerless data:

    row1col1|row1col2|24
    row2col1|row2col2|26
    row3col1|row3col2|19

    Create another DelimitedText dataset with the same column delimiter and enable First row as header.


    Add this dataset as another source and connect it to a Union transformation. Add the headerless dataset as the second stream of the Union transformation and select the Union by position option.


    The header row and the data rows are merged into one stream, and you can connect your sink to it.
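    To picture what Union by position does here, the Python sketch below models it under one assumption: the output takes its column names from the first (header) stream, which matches the merged result described above. The header file read with First row as header contributes the names and no data rows. union_by_position is a hypothetical name, and raising an error on a column-count mismatch is this sketch's choice, not documented Union behavior.

```python
def union_by_position(
    first_columns: list[str],
    first_rows: list[list[str]],
    second_rows: list[list[str]],
) -> list[dict[str, str]]:
    """Combine two streams by column position, keeping the first stream's names."""
    combined = []
    for row in first_rows + second_rows:
        if len(row) != len(first_columns):
            raise ValueError(f"expected {len(first_columns)} values, got {len(row)}")
        # The i-th value lands in the i-th column of the first stream.
        combined.append(dict(zip(first_columns, row)))
    return combined
```

    For the sample above, first_columns would be ["Name", "FullName", "Age"], first_rows would be empty, and second_rows would hold the three headerless rows.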


    NOTE: In both scenarios, you need to change the data types of the generated columns to match your sink data types.
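    In the data flow itself you would typically do this in a Derived Column, for example with toInteger(Age). The Python sketch below only illustrates the idea of a per-column cast; cast_row and the type mapping are hypothetical, and treating empty strings as missing values is an assumption of the sketch.

```python
from typing import Callable


def cast_row(row: dict[str, str], types: dict[str, Callable[[str], object]]) -> dict[str, object]:
    """Convert string values to sink types; columns without an entry stay strings."""
    converted: dict[str, object] = {}
    for name, raw in row.items():
        convert = types.get(name, str)
        # Treat empty strings as missing values instead of failing the cast.
        converted[name] = None if raw == "" else convert(raw)
    return converted
```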

    Hope this helps.


