Base64 format of pdf file from Azure Data Factory

Avinash Pai 20 Reputation points
2024-06-24T08:57:43.5733333+00:00

I am using Azure Data Factory to read a PDF file from a storage blob container and then convert it to base64 format using a Web activity. I need the base64 value to pass to an API. However, the base64 value returned by the Web activity consists of random garbage text and does not match the value from https://base64.guru/converter/encode/pdf

[Screenshot: Base64 from the ADF Web activity]

[Screenshot: Base64 from https://base64.guru/converter/encode/pdf]

I have tried the following headers, including "Accept": "application/octet-stream", but got the same result.


Is there any other way to read a PDF via Data Factory?

Appreciate any help on this.


Regards,

Avinash

Azure Data Factory

Accepted answer
  1. phemanth 15,755 Reputation points Microsoft External Staff Moderator
    2024-06-24T10:22:09.31+00:00

    @Avinash Pai

    Thanks for the question and for using the MS Q&A platform.

    I understand that you’re trying to read a PDF file from Azure Blob Storage and convert it to base64 format using Azure Data Factory (ADF). However, you’re encountering an issue where the base64 value returned by the Web activity doesn’t match the expected value.

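    One possible reason for this issue is that the PDF content is not being handled as binary data. PDF files are binary files, so they need to be read in binary mode to keep their contents intact. If the bytes are decoded as text (for example, as UTF-8) before being base64-encoded, byte sequences that are not valid text get replaced, and the resulting base64 no longer matches the original file.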
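    The failure mode described above can be reproduced in a few lines of Python. This is an illustrative sketch, not a description of how ADF works internally: it assumes the corruption comes from a UTF-8 text decode of the binary content, and the sample bytes are a hypothetical PDF-like header rather than a real file.

    ```python
    import base64


    def b64_of_bytes(data: bytes) -> str:
        # Correct approach: base64-encode the raw bytes exactly as stored
        return base64.b64encode(data).decode("ascii")


    def b64_after_text_decoding(data: bytes) -> str:
        # Failure mode: the bytes are first decoded as UTF-8 text, invalid
        # sequences become U+FFFD, and that text is re-encoded before base64
        text = data.decode("utf-8", errors="replace")
        return base64.b64encode(text.encode("utf-8")).decode("ascii")


    # Hypothetical sample: a PDF-style header followed by the kind of
    # non-ASCII marker bytes many PDFs place on their second line
    sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

    good = b64_of_bytes(sample)
    bad = b64_after_text_decoding(sample)
    print(good)
    print(bad)
    print(good == bad)  # False: the non-ASCII bytes were altered before encoding
    ```

    A quick sanity check on any output: a PDF that begins with `%PDF-1.x` encodes to a string starting with `JVBERi0`. Both strings above pass that check, because the damage is in the non-ASCII bytes further in, so the full values have to be compared.
    
    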

    Unfortunately, ADF does not currently support PDF as a data format. ADF can retrieve metadata about your files regardless of format, but it includes no document or image manipulation tools; for unstructured data like this, it can only move the files and compress or decompress them.

    However, there might be a workaround. Azure Synapse Analytics, which includes the functionality of Data Factory, supports much more free-form workloads. For example, you could use a library or module for the base64 conversion in a Spark notebook: have the notebook load the file, perform the conversion, and write the result back to Blob Storage. This does require some comfort with writing code.
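    A minimal sketch of the notebook logic in plain Python, assuming the PDF is reachable as a file path. Local paths stand in for the blob container here; how the notebook actually accesses storage is not shown. `pdf_to_base64_file` and `build_api_payload` are hypothetical helper names, and the JSON field names `fileName` and `content` are placeholders, since the target API's schema is not given in the question.

    ```python
    import base64
    import json
    import tempfile
    from pathlib import Path


    def pdf_to_base64_file(src: Path, dst: Path) -> str:
        """Read a PDF as raw bytes, base64-encode it, and write the text out."""
        raw = src.read_bytes()  # binary read: the PDF bytes are never decoded as text
        encoded = base64.b64encode(raw).decode("ascii")
        dst.write_text(encoded, encoding="ascii")
        return encoded


    def build_api_payload(encoded: str, file_name: str) -> str:
        # Hypothetical request body; the real API's field names are unknown
        return json.dumps({"fileName": file_name, "content": encoded})


    if __name__ == "__main__":
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / "input.pdf"
            # Tiny stand-in bytes, not a real PDF document
            src.write_bytes(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
            encoded = pdf_to_base64_file(src, Path(tmp) / "input.pdf.b64")
            print(build_api_payload(encoded, src.name))
    ```

    The key design choice is `read_bytes()`: the file is never decoded as text, so the output round-trips through `base64.b64decode` and should match what base64.guru produces for the same file.
    
    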

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".

