Azure Durable Orchestrator memory performance

Zava 25 Reputation points
2023-03-10T20:30:06.7766667+00:00

Hello!

I have created an Azure Durable Functions project using TypeScript, which includes an orchestrator that calls a sub-orchestrator to perform some data processing tasks.

The problem I'm facing is that every time the sub-orchestrator replays, the memory usage increases significantly. The project runs locally using Node 16.10.

The sub-orchestrator reads and transforms approximately 200MB of data from 10,000 JSON files, and stores some additional data (~30MB) in a durable entity. When the main orchestrator starts, the heap memory usage is around 50MB.

As the sub-orchestrator executes, the heap memory usage increases to around 100MB. After the data is read, the memory usage increases to around 300MB, and after the data is transformed, it increases to around 600MB. Each time the sub-orchestrator replays, the memory usage increases further, reaching up to 1GB by the end of its execution.

When the sub-orchestrator is done, the main orchestrator resumes, with memory usage between 300MB and 800MB. Each time the sub-orchestrator starts during the main orchestrator's execution, the memory usage increases by an additional 200MB. After a few replays, the heap memory usage reaches its limit, even though the sub-orchestrator has finished its job and all of its memory should have been released.

Flow:

When mainOrchestrator starts, the heapMemory is ~50MB

  • after the etlOrchestrator is started the heapMemory increases to ~100MB
  • after the data is read, the memory increases to ~300MB
  • etlOrchestrator replays and the memory increases to ~400MB
  • after the data is transformed the memory increases to ~600MB
  • etlOrchestrator replays and the memory increases to ~800MB
  • after the data is stored in durable entity the memory increases to ~850MB
  • etlOrchestrator replays and the memory increases to ~1GB
  • after the data is written by the activity functions and etlOrchestrator replays, the memory increases beyond 1GB
  • etlOrchestrator is done, so mainOrchestrator resumes, and the memory is between 300-800MB
  • etlOrchestrator replays and the memory increases to (300-800MB) + 200MB

In the end, the heap memory error occurs when etlOrchestrator replays: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory. I don't want to increase the max memory for my function app because it already has 3.5GB.

Is this expected behaviour when I run Durable Functions locally?

Is there a known issue with this Node version?


import * as df from 'durable-functions';
import { IOrchestrationFunctionContext } from 'durable-functions/lib/src/classes';

export default df.orchestrator(function* mainOrchestrator(context: IOrchestrationFunctionContext) {
  const entityId = new df.EntityId('temp-storage', 'temp-storage-key');

  // one etlOrchestrator task per chunk of 10,000 files
  const etlTasks = [10000, 10000, 10000].map((chunk) =>
    context.df.callSubOrchestrator('etlOrchestrator', chunk),
  );

  let index = 0;

  // sequentially awaits each etlOrchestrator task
  while (index < etlTasks.length) {
    yield etlTasks[index];

    index += 1;
  }

  // insert the additional info collected by the sub-orchestrators
  // (assuming 'writer' is an activity function, hence callActivity rather than callEntity)
  const additionalInfo = yield context.df.callEntity(entityId, 'get');
  yield context.df.callActivity('writer', additionalInfo);

  // clear the temporary entity state on exit
  yield context.df.callEntity(entityId, 'delete');
});


import * as df from 'durable-functions';
import { IOrchestrationFunctionContext } from 'durable-functions/lib/src/classes';

export default df.orchestrator(function* etlOrchestrator(context: IOrchestrationFunctionContext) {
  const entityId = new df.EntityId('temp-storage', 'temp-storage-key');

  // creates multiple activity functions that read data in parallel
  // and collects their combined results
  const data = yield context.df.Task.all(createReaderActivities(context, context.df.getInput()));

  // transforms the data read by the reader activities
  const transformedData = yield context.df.callActivity('transformData', data);

  // stores the additional data in a temporary durable entity
  yield context.df.callEntity(entityId, 'add', transformedData.additionalData);

  // creates multiple activity functions that write data in parallel
  yield context.df.Task.all(createWriterActivities(context, transformedData));
});

1 answer

  1. MughundhanRaveendran-MSFT 12,486 Reputation points
    2023-03-29T05:59:44.7333333+00:00

    @Zava

    The increased memory usage you are seeing is likely related to orchestrator function replays. According to the Azure documentation, when extended sessions are enabled, orchestrator function instances are held in memory longer so that new messages can be processed without a full history replay. This can increase overall function app memory usage because idle instances are not unloaded from memory as quickly.

    https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-perf-and-scale
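
    If extended sessions are turned on in your host.json, you could try disabling them so that idle orchestrator instances are unloaded from memory sooner. A minimal host.json sketch (extendedSessionsEnabled is the documented setting name; whether it is currently enabled in your project is an assumption):

    {
      "extensions": {
        "durableTask": {
          "extendedSessionsEnabled": false
        }
      }
    }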

    You can reduce memory usage by lowering the value of controlQueueBufferThreshold in the host.json file. This setting caps how many control-queue messages are prefetched and buffered in memory at a time.

    https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/azure-functions/durable/durable-functions-azure-storage-provider.md
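
    For example, a host.json sketch lowering this threshold (assuming the Durable Functions 2.x extension, where the setting sits under storageProvider; the value 32 is only an illustrative reduction from the default of 256):

    {
      "extensions": {
        "durableTask": {
          "storageProvider": {
            "controlQueueBufferThreshold": 32
          }
        }
      }
    }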

    You can also try to reduce the memory usage by limiting the maximum number of orchestrator, entity, or activity functions that are loaded into memory on a single worker. You can do this by configuring the durableTask/maxConcurrentActivityFunctions for activity functions and durableTask/maxConcurrentOrchestratorFunctions for both orchestrator and entity functions in the host.json file.
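
    A host.json sketch combining both concurrency throttles (the values 4 and 2 are illustrative; tune them against your workload):

    {
      "extensions": {
        "durableTask": {
          "maxConcurrentActivityFunctions": 4,
          "maxConcurrentOrchestratorFunctions": 2
        }
      }
    }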

    2 people found this answer helpful.
