Azure Data Factory Search and Replace tokens in CSV file

Andrew Macinnes 6 Reputation points
2022-05-26T13:49:54.863+00:00

Hi,

I have a csv file that has over 40 columns. In these fields there can be but not always, one of 10+ tokens that I want to replace.

What is the best way to search through the whole file and replace those tokens with another value? Each token has a different value that I want to replace it with.

I know I can use the replace function on a derived column schema modifier to search and replace for a string on a column by column basis (See below). This will only allow me to search for one token at a time when I need to search for 10+ tokens and replace with 10+ values.

I also don't want to have to have to create over 40 entries in the derived column schema modifier and I imagine there is a better way to achieve what i am looking for?

205820-image.png

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,639 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Andrew Macinnes 6 Reputation points
    2022-06-01T13:33:40.507+00:00

    To give a bit of context, I am looking to find and replace tokens representing special characters such as carriage returns and line feeds. An example is
    [<000010>} which represents a line feed.

    What I have done and it may not be the most efficient method, is to process each column using multiple derived columns to do multiple passes of replacing strings. This allows me to replace multiple different tokens appearing in individual columns/cells.
    I use a column pattern to process all columns of type string and do a string replace.

    It wont let me upload any images but my solution it similar to the answer in the link below.

    https://stackoverflow.com/questions/72393036/azure-data-factory-search-and-replace-tokens-in-csv-file

    The difference being my case statement is something like :-

    case( like($$, '%[<000013>]%'),replace($$,'[<000013>]','\r'),
    like($$, '%[<000010>]%'),replace($$,'[<000010>]','\n'),
    like($$, '%[<000034>]%'),replace($$,'[<000034>]','BBBBBB'),
    .
    .
    like($$, '%[<000039>]%'),replace($$,'[<000039>]','CCCCCC'), $$)


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.