Convert files (.xls, .jpeg, .pdf, etc.) to base64 with Azure Data Factory

GrigoropoulosMichail-7458 21 Reputation points
2022-09-28T23:30:54.68+00:00

Hi,

I have a blob storage account that contains folders with files in different formats. I would like to get the metadata from the files, convert the files to base64 format, and then store the metadata and the base64 string in a table which I will later sync into Salesforce. Can this be done with Azure Data Factory?

Azure Data Factory

Accepted answer
  1. MartinJaffer-MSFT 26,156 Reputation points
    2022-09-29T17:37:54.683+00:00

    Hello @GrigoropoulosMichail-7458 ,
    Thanks for the question and using MS Q&A platform.

    I would like to break your ask up into several elements:

    1. Get the metadata for a variety of formats in blob storage
    2. Apply a transformation to binary format, specifically jpeg or pdf
    3. Apply transformation to file as a whole, as is, as opposed to the data inside.

    Data Factory absolutely can get metadata about your files, no matter the format (for example, with the Get Metadata activity). Item 1 is totally doable.

    Data Factory does not include image manipulation tools, and does nothing more than move or compress / uncompress that type of unstructured data. Item 2 cannot be done.

    Data Flow has a toBase64 function, but it transforms the data contained within one format in order to write it to another format, rather than transforming files whole-cloth. Item 3 isn't really doable.
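
    To make the distinction in item 3 concrete, here is a minimal sketch in plain Python (not Data Flow expression syntax). The sample CSV content and the variable names are made up for illustration. It contrasts encoding a value inside a parsed file with encoding the file itself as opaque bytes.

```python
import base64
import csv
import io

# Hypothetical sample file content; any small CSV would do.
raw_bytes = b"id,name\n1,alpha\n2,beta\n"

# Column-level encoding: parse the rows first, then encode one field per row.
# This is the kind of per-value transformation described above.
rows = list(csv.DictReader(io.StringIO(raw_bytes.decode("utf-8"))))
encoded_names = [
    base64.b64encode(row["name"].encode("utf-8")).decode("ascii") for row in rows
]

# Whole-file encoding: never parse the content, just encode every byte once.
# This is what the question asks for, and it works for any file type.
encoded_file = base64.b64encode(raw_bytes).decode("ascii")

print(encoded_names)
print(encoded_file)
```

    The first approach only works for formats that can be parsed into rows and columns; the second does not care whether the bytes are a spreadsheet, an image, or a PDF.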

    However, your asks may be possible in Azure Synapse Analytics. Synapse has Spark notebooks, which allow a much more free-form workload. For example, you could find a library / module for the base64 conversion to use in the notebook. You could tell the notebook to load the file, do the transformation, and write back to blob. Treating the file as a bytestream rather than as a specific file type allows you to handle all the files the same way. This does require some level of comfort with writing code.
    Azure Synapse also contains the functionality of Data Factory, letting you take care of the other parts of the job as well.
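
    As a rough sketch of the bytestream approach described above, the following Python uses only the standard library. It assumes the files are reachable through a local or mounted file path; connecting to blob storage and writing to the target table are left out. The function names and record fields are illustrative, not part of any Azure API.

```python
import base64
from datetime import datetime, timezone
from pathlib import Path


def file_to_record(path: Path) -> dict:
    """Return basic metadata plus the Base64 text of one file, read as raw bytes."""
    data = path.read_bytes()  # no parsing, so .xls, .jpeg and .pdf are handled alike
    stat = path.stat()
    return {
        "file_name": path.name,
        "extension": path.suffix.lower(),
        "size_bytes": len(data),
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "content_base64": base64.b64encode(data).decode("ascii"),
    }


def folder_to_records(folder: Path) -> list:
    """Walk a folder tree and build one record per file found."""
    return [file_to_record(p) for p in sorted(folder.rglob("*")) if p.is_file()]
```

    Each record could then become one row in the table that is later synced. Keep in mind that Base64 text is roughly one third larger than the original bytes, so large files may need a size check before loading.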

    There may be other Azure services also capable of better meeting your needs.

    Please do let me know if you have any queries.

    Thanks
    Martin

