SFTP/FTP source location and Azure Fuctions

Samy Abdul 3,366 Reputation points
2021-09-11T09:26:59.577+00:00

Hi All, My source sits on FTP/SFTP which are quite a few Excel data sets in a Zip folder, what I want to do is use Azure Functions and

convert them in to CSV format and store in Blob Storage and ultimately copy in to sink ADLS. My question is can I use Azure functions to

do this task and the second question is let's say in case I want to copy as is that is Excel files by using Azure Functions in to Blob and I am

using ADF for both the scenarios to copy from Blob to ADLS . Please provide the assistance. Thanks a lot.

Azure Functions
Azure Functions
An Azure service that provides an event-driven serverless compute platform.
4,249 questions
Azure Blob Storage
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
2,427 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,526 questions
0 comments No comments
{count} votes

Accepted answer
  1. Pramod Valavala 20,516 Reputation points Microsoft Employee
    2021-09-13T07:53:04.477+00:00

    @Samy Abdul Azure Functions should be a good fit here for your use case.

    1. You could use a Timer Trigger to periodically check for new files in SFTP by keeping track of ones already processed or even manually trigger via HTTP as per your requirement.
    2. Depending on the language, read the ZIP file. For C#, you can use the built-in System.IO.Compression.ZipFile APIs.
    3. Then you could use an SDK to work with the excel files like Open XML SDK for Office for C#.
    4. And finally leverage Blob Output Binding to store into Blob Storage. Also, with multi-protocol support, the bindings can also directly output to a hierarchical namespace as well.

    Depending on how big the files, you could consider splitting this into multiple functions as well. One for extracting from the ZIP file and output them temporarily into blob storage and have a second one that picks up the XLSX files for processing and outputting to blob storage. This approach will also parallelize the processing instead of going over each file one by one.

    You could even consider using Durable Functions to create a more organized workflow that can periodically check for new files and fan-out processing for each zip file and excel files within each.


0 additional answers

Sort by: Most helpful