Azure Batch Service - splitting or partitioning resource files

Question

hi,

Is there a way to split or partition Azure Batch Service's Resource Files (input files) so that the burden of input file processing is load-balanced and distributed among all available compute nodes? For instance, if I have an input file with 100k rows and my Azure Batch pool has 4 compute nodes, is it possible to split this into 25k rows for each node?

If this partition option is available, how do I set it up? Should it be done programmatically, or through Azure Batch Service settings?

thank you

Answer

@etl2016-6749 Thank you for your question!!!

There are a couple of ways the job can be partitioned.

The first is to handle the partitioning on the application side. The application divides the file into 4 jobs, keeps track of the progress of all 4, and then performs the reduce/aggregation operation on their results. This is easier to implement on the application side, but it is difficult to scale when multiple such workloads are running simultaneously.

The second is to perform the partitioning inside the Batch service, as a custom job. This job then spawns 4 jobs plus 1 reduce/aggregation job, which is scheduled to run only after the 4 jobs complete (the Batch service supports dependencies of this kind). This is easier to scale, but it still requires handling on the application side, because the initial job will be reported as complete while the sub-jobs it spawned are still running.
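As a rough illustration of the first approach, here is a minimal Python sketch that splits an input file into near-equal chunks and describes one unit of work per chunk plus a reduce step that depends on all of them. Several things here are assumptions rather than part of the answer above: the work is modeled as tasks inside a single job, the input has no header row, the scripts `process.py` and `reduce.py` and the chunk URLs are hypothetical, and the dictionaries only mirror the field names of a Batch task definition (`commandLine`, `resourceFiles`, `dependsOn`). They are not submitted anywhere.

```python
import os
import tempfile


def split_rows(input_path, num_parts, out_dir):
    """Split a text file into num_parts chunk files of near-equal line count."""
    with open(input_path, newline="") as f:
        lines = f.readlines()
    base, extra = divmod(len(lines), num_parts)
    paths, start = [], 0
    for i in range(num_parts):
        # The first `extra` chunks take one additional line each.
        size = base + (1 if i < extra else 0)
        path = os.path.join(out_dir, f"part-{i}.csv")
        with open(path, "w", newline="") as out:
            out.writelines(lines[start:start + size])
        paths.append(path)
        start += size
    return paths


def build_task_plan(chunk_urls):
    """Describe one task per chunk, plus a reduce task that waits for all of them."""
    tasks = []
    for i, url in enumerate(chunk_urls):
        tasks.append({
            "id": f"partition-{i}",
            # process.py is a hypothetical per-chunk worker script.
            "commandLine": f"python process.py part-{i}.csv",
            "resourceFiles": [{"httpUrl": url, "filePath": f"part-{i}.csv"}],
        })
    tasks.append({
        "id": "reduce",
        # reduce.py is a hypothetical aggregation script.
        "commandLine": "python reduce.py",
        # Evaluated before the append, so it lists only the partition tasks.
        "dependsOn": {"taskIds": [t["id"] for t in tasks]},
    })
    return tasks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        source = os.path.join(work_dir, "input.csv")
        with open(source, "w", newline="") as f:
            f.writelines(f"{n},value\n" for n in range(100_000))
        for part in split_rows(source, 4, work_dir):
            with open(part, newline="") as f:
                print(os.path.basename(part), sum(1 for _ in f))
```

In a real setup, each chunk would be uploaded to storage and its URL passed to `build_task_plan`, and the job would need task dependencies enabled for the reduce step to wait on the partition tasks. Each partition task then downloads only its own chunk as a resource file, so each of the 4 nodes can work on roughly 25k rows.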

Hope this helps.

Please 'Accept as answer' if the provided information is helpful, so that it can help others in the community looking for help on similar topics.
