Read varchar(max) field from SQL Server DB and export to Data Lake as actual document

rajanisqldev-42 221 Reputation points
2025-06-09T07:32:23.8233333+00:00

Hi,

I have a table in SQL Server with the fields below.

ActivityId, DocumentBody(images/documents), filename, MimeType(application/octet-stream)

I need to read this table and export each row to a blob container as an image or document, depending on the file type.

I am happy to use ADF, Fabric pipelines, or any other method.

Regards


3 answers

  1. Venkat Reddy Navari 2,975 Reputation points Microsoft External Staff Moderator
    2025-06-09T08:40:01.07+00:00

    Hi @rajanisqldev-42, you can handle this using either Azure Data Factory (ADF) or Microsoft Fabric pipelines by exporting the varchar(max) field (which should ideally contain Base64 text; true binary content is better stored as varbinary(max)) to Azure Blob Storage or Data Lake as individual files.

    Here’s an approach using ADF:

    Linked Services: Set up connections to both your SQL Server and Blob Storage. If your SQL Server is on-premises, make sure to use a Self-hosted Integration Runtime.

    Datasets:

    • Source: A SQL dataset that includes DocumentBody, Filename, and MimeType.
    • Sink: A binary dataset pointing to your target container in Blob or Data Lake.

    Copy Activity:

    • Map the DocumentBody column as binary content.
    • Use the Filename column to dynamically name your files. For example:
        
        @concat(item().Filename, '.pdf')
      
      (You can also adjust the extension based on the MimeType if needed.)

    Note: If you have different file types (such as PDFs, images, etc.), you can use an If Condition or Switch activity to set the correct file extension or path based on the MimeType value.
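
    In Python terms, the lookup such a Switch activity performs is just a small MIME-type-to-extension map. A minimal sketch (the type/extension pairs below are common examples, not from the question):

        # Illustrative equivalent of the Switch activity's branches.
        # MIME-type/extension pairs are examples only.
        EXTENSION_BY_MIMETYPE = {
            "application/pdf": ".pdf",
            "image/png": ".png",
            "image/jpeg": ".jpg",
        }

        def pick_extension(mimetype: str) -> str:
            # Fall back to a generic extension for anything unrecognized,
            # e.g. the 'application/octet-stream' default from the question.
            return EXTENSION_BY_MIMETYPE.get(mimetype, ".bin")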

    Also, if the content is stored as Base64 rather than binary, you’ll need to decode it before writing it out. This can be done using either an Azure Function to convert Base64 to binary, or a Mapping Data Flow with a Derived Column for inline decoding.
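
    As a rough sketch of that decode step (a minimal illustration; the function name is made up, and inside an Azure Function you would receive the Base64 text from the request and write the raw bytes out):

        import base64

        def decode_document(base64_text: str) -> bytes:
            # A varchar(max) column holding Base64 comes back as str;
            # b64decode restores the original file bytes.
            return base64.b64decode(base64_text)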


    Hope this helps. If this answers your query, click Accept Answer and select Yes for "Was this answer helpful". If you have any further questions, do let us know.


  2. Alex Burlachenko 9,780 Reputation points
    2025-06-09T09:15:03.59+00:00

    Hi there rajanisqldev-42,

    Thanks for posting this, it's a nice challenge. On the Microsoft stack you can use Azure Data Factory or Fabric pipelines to handle this. In ADF, set up a Copy activity that reads the SQL table. Make sure you map the DocumentBody column as a binary source, then sink it into a blob container with a dynamic file name based on the FileName column. ADF handles binary data well; just check the binary copy option in the sink settings. https://docs.microsoft.com/en-us/azure/data-factory/format-binary Fabric pipelines can do this too, but ADF might be simpler if you're already in that ecosystem.

    This might help in other tools too. If you're using Python, you can read the varchar(max) column as bytes and upload it directly to Blob Storage. Libraries like pyodbc for SQL and azure-storage-blob for the upload will work; just set the content type from the MimeType column when uploading. A rough sketch of that route is below.
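
    (Connection strings, container, table, and column names here are assumptions based on the question, so adjust them to your schema.)

        import base64

        import pyodbc
        from azure.storage.blob import BlobServiceClient, ContentSettings

        SQL_CONN = (
            "Driver={ODBC Driver 18 for SQL Server};"
            "Server=myserver;Database=mydb;Trusted_Connection=yes;"
        )
        BLOB_CONN = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;"

        container = BlobServiceClient.from_connection_string(BLOB_CONN).get_container_client("documents")

        with pyodbc.connect(SQL_CONN) as conn:
            cursor = conn.cursor()
            cursor.execute("SELECT DocumentBody, [FileName], MimeType FROM dbo.Documents")
            for body, filename, mimetype in cursor:
                # varbinary(max) arrives as bytes; a varchar(max) column holding
                # Base64 arrives as str and needs decoding first.
                data = base64.b64decode(body) if isinstance(body, str) else bytes(body)
                container.upload_blob(
                    name=filename,
                    data=data,
                    overwrite=True,
                    content_settings=ContentSettings(content_type=mimetype or "application/octet-stream"),
                )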

    It's worth checking the MimeType field to make sure files open correctly. 'application/octet-stream' is sometimes too generic, so you may want to adjust it based on the actual file type.
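
    Python's standard library can often recover a more specific type from the file name, for example (a tiny illustration, not from the thread):

        import mimetypes

        # Prefer a concrete type guessed from the file name over the generic
        # 'application/octet-stream' stored in the table.
        guessed, _ = mimetypes.guess_type("invoice.pdf")  # -> 'application/pdf'
        content_type = guessed or "application/octet-stream"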

    And don't forget to test with a small batch first. Happy exporting!

    Regards,

    Alex

    and "yes" if you would follow me at Q&A - personaly thx.
    P.S. If my answer help to you, please Accept my answer
    PPS That is my Answer and not a Comment
    

    https://ctrlaltdel.blog/


  3. rajanisqldev-42 221 Reputation points
    2025-06-11T09:08:18.54+00:00

    Hi Venkat Reddy / Alex,

    I am trying your suggestion in a Fabric pipeline.

    I am using SELECT [Body], [FileName] FROM table as the source query for the Copy data activity.

    How can I choose [Body] as binary, and what properties should I set on the sink?

    I have set the destination format as Binary, but I am getting an error that the source should also be binary.

    Regards

