I need to understand a few capabilities of Azure Data Factory

Avishek Chowdhury 61 Reputation points
2022-08-23T05:20:27.837+00:00

Hi All,
We are comparing ADF and Spring Batch.
Below are the capabilities we are trying to explore in Azure Data Factory:

1. Can it work with a file as a source and an API as a destination? Meaning, read from files and call an API after massaging the data. (I know it can read from Blob file storage, but can an API be a destination?)
2. Let's say the answer to the above is yes. After it has made the API call(s), some calls might succeed and some might fail (not technical failures, but business-validation failures). Based on the API responses, can it create segregated success and failure reports/files? (For example, the source file has 10 records, 5 fail and 5 succeed; create one success file with the 5 successful records and one failure file with the 5 failed records.)
3. Chunking capabilities: does ADF support chunking? For example, if the source file contains 10k records, I want to:
   • process 1,000 records per batch while reading and massaging, and
   • chunk further into 100 records while sending to the downstream API, i.e., chunking at every step.
4. Does it also support low-code basic transformations and validations, e.g., if a value is null/empty, replace it with 0?
5. Saving the last processed record at the source and resuming from the next one after a failure: for example, there are 1,000 records in a source file, ADF processed 200 and then crashed; after the issue is resolved and the run restarts, it should start from record #201 rather than from record #1.

Also, is there any other Azure solution that could potentially be a better fit, and if so, why?

Thanks in advance.


Accepted answer

HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
2022-08-24T02:37:30.053+00:00

Hello @Avishek Chowdhury,
    Thanks for the question and using MS Q&A platform.

Comments inline:

1. Can it work with a file as a source and an API as a destination? Meaning, read from files and call an API after massaging the data. (I know it can read from Blob file storage, but can an API be a destination?)
2. Let's say the answer to the above is yes. After it has made the API call(s), some calls might succeed and some might fail (business-validation failures, not technical ones). Based on the API responses, can it create segregated success and failure reports/files?

Yes, it can. You can use a mapping data flow, e.g., read a JSON file and then pass the JSON to an API with the External Call transformation. It is worth going through the video from Mark here: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-external-call
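For reference, here is a minimal plain-Python sketch of the pattern described above: read records from a file, call an API per record, and split business successes from failures into two output files. This is an illustration of the logic, not ADF itself; the endpoint URL and the {"valid": bool} response shape are assumptions for the example. In a mapping data flow, the same shape is an External Call transformation followed by a Conditional Split writing to two sinks.

```python
import json

import requests  # third-party; pip install requests

API_URL = "https://example.com/validate"  # hypothetical validation endpoint

def process(source_path: str, success_path: str, failure_path: str) -> None:
    with open(source_path) as f:
        records = json.load(f)  # assume the source is a JSON array of records

    successes, failures = [], []
    for rec in records:
        resp = requests.post(API_URL, json=rec, timeout=30)
        # A "business failure" means the API answered but rejected the record;
        # the {"valid": bool} response shape is an assumption for the example.
        if resp.ok and resp.json().get("valid"):
            successes.append(rec)
        else:
            failures.append(rec)

    with open(success_path, "w") as f:
        json.dump(successes, f, indent=2)
    with open(failure_path, "w") as f:
        json.dump(failures, f, indent=2)

if __name__ == "__main__":
    process("input.json", "success.json", "failure.json")
```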

3. Chunking capabilities: does ADF support chunking? For example, if the source file contains 10k records, I want to process 1,000 records per batch while reading and massaging, and chunk further into 100 records while sending to the downstream API, i.e., chunking at every step.

I think you will have to build some pipelines to implement this; a sketch of the shape follows.
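To make the two-level chunking concrete, here is a minimal Python sketch: 1,000-record batches for the read/massage step, then 100-record sub-chunks per downstream API call. massage and send_to_api are hypothetical placeholders. In ADF you would typically approximate this with a Lookup feeding a ForEach over batch ranges, with an Execute Pipeline activity hosting the inner loop, since ForEach activities cannot be nested directly.

```python
from typing import Iterator, List

def chunked(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive fixed-size chunks of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def massage(rec: dict) -> dict:
    return rec  # placeholder transformation; real logic depends on the data

def send_to_api(chunk: List[dict]) -> None:
    pass  # placeholder; in ADF this would be the inner pipeline's API call

def run(records: List[dict]) -> None:
    # Level 1: read/massage the data 1,000 records at a time.
    for batch in chunked(records, 1000):
        massaged = [massage(rec) for rec in batch]
        # Level 2: send to the downstream API 100 records at a time.
        for sub_chunk in chunked(massaged, 100):
            send_to_api(sub_chunk)

if __name__ == "__main__":
    run([{"id": i} for i in range(10_000)])  # 10k-record example
```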

4. Does it also support low-code basic transformations and validations, e.g., if a value is null/empty, replace it with 0?

Yes, it does; see the transformation overview: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-transformation-overview
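As a concrete example, the null/empty-to-zero rule from the question is a one-line expression in a Derived Column transformation; the data flow expression language has iifNull() for exactly this. The equivalent logic in plain Python, with a hypothetical amount column:

```python
def clean_amount(value) -> float:
    # Null/empty becomes 0, mirroring a Derived Column expression
    # such as iifNull(toFloat(amount), 0) in a mapping data flow.
    if value is None or value == "":
        return 0.0
    return float(value)

assert clean_amount(None) == 0.0
assert clean_amount("") == 0.0
assert clean_amount("12.5") == 12.5
```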

5. Saving the last processed record at the source and resuming from the next one after a failure: for example, there are 1,000 records in a source file, ADF processed 200 and then crashed; after the issue is resolved and the run restarts, it should start from record #201 rather than from record #1.

Pipeline runs are initiated by triggers, and yes, if a run fails after some X records you have the option to rerun from the point of failure (rerun from the failed activity) rather than from the beginning. Resuming at the exact record within a file typically means persisting a watermark yourself.
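For the record-level case, here is a minimal sketch of the watermark pattern, assuming the checkpoint is just a local JSON file; in ADF the watermark would more typically live in a control table or a blob, read by a Lookup activity at the start of the run and updated as batches complete. CHECKPOINT_PATH and process_record are illustrative.

```python
import json
import os

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical watermark store

def load_watermark() -> int:
    """Index of the next record to process; 0 on a fresh run."""
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)["next_index"]
    return 0

def save_watermark(next_index: int) -> None:
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump({"next_index": next_index}, f)

def process_record(rec: dict) -> None:
    pass  # placeholder per-record work

def run(records: list) -> None:
    start = load_watermark()
    for i in range(start, len(records)):
        process_record(records[i])
        save_watermark(i + 1)  # persist progress after every record
    # A crash after record 200 leaves next_index == 200, so the next
    # run resumes at record #201 instead of #1.

if __name__ == "__main__":
    run([{"id": i} for i in range(1_000)])
```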

Also, is there any other Azure solution that could potentially be a better fit, and if so, why?

ADF and mapping data flows take care of many of these asks, but Azure Functions are also worth exploring.

Please do let me know if you have any queries.
    Thanks
    Himanshu


