Persist job and task output

06/13/2024

A task running in Azure Batch may produce output data when it runs. Task output data often needs to be stored for retrieval by other tasks in the job, the client application that executed the job, or both. Tasks write output data to the file system of a Batch compute node, but all data on the node is lost when it is reimaged or when the node leaves the pool. Tasks may also have a file retention period, after which files created by the task are deleted. For these reasons, it's important to persist task output that you'll need later to a data store such as Azure Storage.

For storage account options in Batch, see Batch accounts and Azure Storage accounts.

Some common examples of task output include:

Files created when the task processes input data.
Log files associated with task execution.

This article describes the options for persisting output data from Batch tasks and jobs to Azure Storage or to another data store.

Options for persisting output

There are multiple ways to persist output data. Choose the best method for your scenario:

Use the Batch service API.
Use the Batch File Conventions library for C# and .NET applications.
Use the Batch File Conventions standard for languages other than .NET.
Use a custom file movement solution.

Batch service API

You can use the Batch service API to persist output data. Specify output files in Azure Storage for task data when you add a task to a job or add a collection of tasks to a job.
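As a rough illustration, a task definition can carry a list of output files that the service uploads after the task runs. The sketch below only builds such a request body with the Python standard library and does not call the service. The field names (outputFiles, filePattern, destination, uploadOptions) follow the Batch REST API task schema as commonly documented and should be checked against the current API reference. The function name, task ID, command line, file patterns, and container URL are placeholder assumptions.

```python
import json


def build_task_with_output_files(task_id, command_line, container_url):
    """Return a task request body (a dict) that declares output files.

    Hypothetical helper for illustration; it does not contact Azure Batch.
    """
    return {
        "id": task_id,
        "commandLine": command_line,
        "outputFiles": [
            {
                # Result files in the task working directory,
                # uploaded only if the task succeeds.
                "filePattern": "*.txt",
                "destination": {
                    "container": {
                        "containerUrl": container_url,
                        "path": task_id,
                    }
                },
                "uploadOptions": {"uploadCondition": "tasksuccess"},
            },
            {
                # stdout/stderr logs one level above the working directory,
                # uploaded whether the task succeeds or fails.
                "filePattern": "../std*.txt",
                "destination": {
                    "container": {
                        "containerUrl": container_url,
                        "path": task_id + "/logs",
                    }
                },
                "uploadOptions": {"uploadCondition": "taskcompletion"},
            },
        ],
    }


# Placeholder values; the container URL would normally include a SAS token.
body = build_task_with_output_files(
    "task1",
    'cmd /c "doMyWork.exe"',
    "https://<account>.blob.core.windows.net/<container>?<sas>",
)
print(json.dumps(body, indent=2))
```

Using a different upload condition for logs than for results is one way to keep diagnostic output even when the task fails.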

For more information, see Persist task data to Azure Storage with the Batch service API.

Batch File Conventions library

The Batch File Conventions standard is an optional set of conventions for naming task output files in Azure Storage. The standard provides naming conventions for a file's destination container and blob path, based on the names of the job and task.

Using the File Conventions standard to name your output data files is optional. You can choose your own names for the destination container and blob path instead. If you do follow the File Conventions standard, you can view your output files in the Azure portal.

If you're building a Batch solution with C# and .NET, you can use the Batch File Conventions library for .NET. The library moves output files to Azure Storage, and names destination containers and blobs according to the Batch File Conventions standard.

For more information, see Persist job and task data to Azure Storage with the Batch File Conventions library for .NET.

Batch File Conventions standard

If you're using a language other than .NET, you can implement the Batch File Conventions standard in your own application. Use this approach when:

You want to use a common naming scheme.
You want to view task output in the Azure portal.
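A minimal sketch of what such an implementation might look like follows. It assumes a simplified reading of the standard: the container name is "job-" followed by the lowercased job ID, and task output blobs are placed under "{task ID}/$TaskOutput/". Both are assumptions here, and the published standard defines additional normalization rules for job IDs that do not yield a valid container name, which this sketch does not attempt. The function names are hypothetical.

```python
import re

# Azure Storage container names: 3-63 characters, lowercase letters, digits,
# and single hyphens, starting and ending with a letter or digit.
_CONTAINER_NAME = re.compile(r"^[a-z0-9](?:-?[a-z0-9])*$")


def container_name_for_job(job_id):
    """Derive a container name from a job ID (simple case only, assumed rule)."""
    name = "job-" + job_id.lower()
    if not (3 <= len(name) <= 63 and _CONTAINER_NAME.match(name)):
        # The full standard describes a fallback for such IDs; not covered here.
        raise ValueError(f"job ID {job_id!r} needs the standard's fallback rules")
    return name


def task_output_blob_path(task_id, file_name, kind="$TaskOutput"):
    """Build the blob path for a task output file (assumed layout)."""
    return f"{task_id}/{kind}/{file_name}"


print(container_name_for_job("MyJob-01"))
print(task_output_blob_path("task1", "results.csv"))
```

Keeping the naming logic in two small functions makes it easy to swap in the exact rules once they are confirmed against the specification.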

Custom file movement solution

You can also implement your own complete file movement solution. Use this approach when:

You want to persist task data to a data store other than Azure Storage, such as Azure SQL or Azure Data Lake. Create a custom script or executable that uploads to that location, then call it on the command line after running your primary executable. For example, on a Windows node, call doMyWork.exe && uploadMyFilesToSql.exe.
You want to do checkpointing or early uploading of initial results.
You want to maintain granular control over error handling. For example, you want to use task dependency actions to take certain upload actions based on specific task exit codes.
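The chained command line from the first scenario can be sketched as a small helper. Batch task command lines do not run under a shell by default, so an operator such as && generally needs an explicit shell invocation. The helper name and the Linux executable names below are hypothetical, and the quoting is deliberately simple: it assumes the Windows commands contain no double quotes.

```python
import shlex


def build_task_command_line(primary, upload, windows=True):
    """Chain a primary command and an upload command in one task command line.

    The upload command runs only if the primary command exits with code 0.
    """
    chained = f"{primary} && {upload}"
    if windows:
        # Assumes neither command contains double quotes.
        return f'cmd /c "{chained}"'
    # shlex.quote wraps the chained command safely for /bin/sh.
    return "/bin/sh -c " + shlex.quote(chained)


print(build_task_command_line("doMyWork.exe", "uploadMyFilesToSql.exe"))
print(build_task_command_line("./doMyWork", "./uploadMyFiles", windows=False))
```

Because && skips the upload when the primary command fails, a solution that must also upload partial results or logs on failure would need a different operator or a wrapper script.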

Design considerations

When you design your Batch solution, consider the following factors.

Compute nodes are often transient, especially in Batch pools with autoscaling enabled. You can only see output from a task:

While the node where the task is running exists.
During the file retention period that you set for the task.
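These two conditions can be expressed as a small check. This is only a sketch of the reasoning, not a call into the Batch service. It assumes the retention period is counted from the time the task completes; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def output_available_on_node(node_exists, task_completed_at, retention, now):
    """Return True if task files should still be readable from the node.

    task_completed_at is None while the task is still running.
    Assumption: the retention period starts when the task completes.
    """
    if not node_exists:
        # Node was reimaged or removed from the pool: its files are gone.
        return False
    if task_completed_at is None:
        # Task still running on an existing node.
        return True
    return now < task_completed_at + retention


completed = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
week = timedelta(days=7)
print(output_available_on_node(True, completed, week, completed + timedelta(days=3)))
print(output_available_on_node(True, completed, week, completed + timedelta(days=8)))
print(output_available_on_node(False, completed, week, completed + timedelta(days=1)))
```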

When you view a Batch task in the Azure portal, and select Files on node, you see all files for that task, not just the output files. To retrieve task output directly from the compute nodes in your pool, you need the file name and its output location on the node.

If you want to keep task output data longer, configure the task to upload its output files to a data store. Azure Storage is the recommended data store, and the Batch service API has built-in support for writing task output data to it. You can use other durable storage options to keep your data, but you need to write the application logic for them yourself.

To view your output data in Azure Storage, use the Azure portal or an Azure Storage client application, such as Azure Storage Explorer. Note your output file's location, and go to that location directly.

Next step

PersistOutputs sample project
