Azure Data Factory Copy Activity with Base64 encoded string

Yu Geng 26 Reputation points
2020-10-28T15:10:06.797+00:00

Good afternoon.

We are using Copy activity in ADF to extract data from the EventLogFile table in Salesforce into Azure Blog Storage as CSV.

There is an attribute called "LogFile" in EventLogFile table containing an encoded base64 string.

The content of it looks like "IkVWRU5UX1RZUEUiLCJUSU1FU1RBTVAiLCJSRVFVRVNU..."

However, after running the Copy Activity, the contents has become "0x224556454E545F545950..." in the CSV due to the Byte[] type mapping from Source to Sink.

If I am using Data Flow to read the CSV, how could I change the content "0x224556454E545F545950..." back to "IkVWRU5UX1RZUEUiLCJUSU1FU1RBTVAiLCJSRVFVRVN..." in ADF?

So that I can call FromBase64 function on "IkVWRU5UX1RZUEUiLCJUSU1FU1RBTVAiLCJSRVFVRVN..." to decode the real content out.

Kind regards

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
6,305 questions
No comments
1 vote

Accepted answer
  1. MartinJaffer-MSFT 24,091 Reputation points Microsoft Employee
    2020-10-28T20:39:30.407+00:00

    Hello @Yu Geng and welcome to Microsoft Q&A, and thank you for sharing the situation. I did much experimentation, but now I have an answer.

    To reverse the conversion (0x224556454E545F545950 -> "IkVWRU5UX1R...)

    unhex(ltrim(number,'0x'))  
    

    Also, I discovered you may not need to use the fromBase64 when I tried:

    toString(unhex(ltrim(number,'0x')))  
    

    I saw readable text. See the below screenshot. I think the text is truncated because the hex value you shared is much shorter than the base64 value you shared. Base64 is more information-dense than hex notation on a per-character bases.

    35816-image.png

    Also, it may be possible to alter the Copy Activity mapping or dataset schemas so the contents do not get converted to hex. I do not have a Salesforce environment, so I cannot tell you exactly which to change, but I think it is worth trying. The change would be to type string. Probably on sink, but could be both.

    No comments

1 additional answer

Sort by: Most helpful
  1. Yu Geng 26 Reputation points
    2020-10-28T22:43:16.903+00:00

    Thanks MartinJaffer-MSFT.

    Regarding changing the mapping of Copy Activity to use String instead of byte[], I did look into it, but it seems there is no option to change it in Copy Activity. Once the schema is imported, the byte[] can't be changed.

    I just tried using toString(unhex(ltrim(number,'0x'))) approach and it works really well. Thanks for your help.

    The final decoded content are below in the LogFile column of EventLogFile table.

    "\"EVENT_TYPE\",\"TIMESTAMP\",\"REQUEST_ID\"
    \"API\",\"20201027230545.986\",\"abc_vX2-p_1R-\"
    \"API\",\"20201027230546.144\",\"abcdAX2-paZs-\"
    \"API\",\"20201027230546.323\",\"abc2-qM4x-\"
    

    Would you mind suggest what is the best to set up the Sink to save the content as a CSV file in the Blob Storage?

    I tried using Azure Blob Dataset of DelimitedText, but the final CSV always have the header. Is it possible to omit the header (LogFile)?

    LogFile
    "\"EVENT_TYPE\",\"TIMESTAMP\",\"REQUEST_ID\"
    \"API\",\"20201027230545.986\",\"abc_vX2-p_1R-\"
    \"API\",\"20201027230546.144\",\"abcdAX2-paZs-\"
    \"API\",\"20201027230546.323\",\"abc2-qM4x-\"