Hello!
I have created an Azure Durable Functions project using TypeScript, which includes an orchestrator that calls a sub-orchestrator to perform some data processing tasks.
The problem I'm facing is that every time the sub-orchestrator replays, the memory usage increases significantly. The project runs locally using Node 16.10.
The sub-orchestrator reads and transforms approximately 200MB of data from 10,000 JSON files, and stores some additional data (~30MB) in a durable entity. When the main orchestrator starts, the heap memory usage is around 50MB.
As the sub-orchestrator executes, the heap memory usage increases to around 100MB. After the data is read, the memory usage increases to around 300MB, and after the data is transformed, it increases to around 600MB. Each time the sub-orchestrator replays, the memory usage increases further, reaching up to 1GB by the end of its execution.
When the sub-orchestrator is done, the main orchestrator resumes, with memory usage between 300MB and 800MB. Each time the next sub-orchestrator instance starts during main orchestrator execution, the memory usage increases by an additional 200MB. After a few replays, the heap memory usage reaches its limit, even though the sub-orchestrator has finished its job and all of that memory should have been released.
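(The heap numbers below are logged at each step roughly like this; the helper is a simplification, not my exact logging code.)

// logs current V8 heap usage in MB (simplified sketch of the measurement)
const heapMB = (): number => Math.round(process.memoryUsage().heapUsed / 1024 / 1024);
context.log(`heap used: ${heapMB()}MB`);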
Flow:
- when mainOrchestrator starts, the heap memory is ~50MB
- after etlOrchestrator is started, the heap memory increases to ~100MB
- after the data is read, the memory increases to ~300MB
- etlOrchestrator replays and the memory increases to ~400MB
- after the data is transformed the memory increases to ~600MB
- etlOrchestrator replays and the memory increases to ~800MB
- after the data is stored in durable entity the memory increases to ~850MB
- etlOrchestrator replays and the memory increases to ~1GB
- after the data is written by the activity functions and etlOrchestrator replays, the memory increases to over 1GB
- etlOrchestrator is done, so mainOrchestrator resumes, and the memory is between 300-800MB
- etlOrchestrator replays and the memory increases to (300-800MB) + 200MB
In the end, the heap memory error occurs when etlOrchestrator replays:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

I don't want to increase the max memory for my function app, because it is already at 3.5GB.
Is this expected behaviour when running Durable Functions locally?
Is there a known issue with this Node version?
// mainOrchestrator/index.ts
import * as df from 'durable-functions';
import { IOrchestrationFunctionContext } from 'durable-functions';

export default df.orchestrator(function* mainOrchestrator(context: IOrchestrationFunctionContext) {
  const entityId = new df.EntityId('temp-storage', 'temp-storage-key');
  // one etlOrchestrator run per chunk of 10,000 files
  const etlTasks = [10000, 10000, 10000].map((chunk) =>
    context.df.callSubOrchestrator('etlOrchestrator', chunk),
  );
  // sequentially runs the etlOrchestrator instances
  let index = 0;
  while (index < etlTasks.length) {
    yield etlTasks[index];
    index += 1;
  }
  // insert additional info
  const additionalInfo = yield context.df.callEntity(entityId, 'get');
  yield context.df.callActivity('writer', additionalInfo);
  // context.df.destructOnExit();
  yield context.df.callEntity(entityId, 'delete');
});
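The temp-storage entity is essentially this (a simplified sketch; the real one accumulates the ~30MB of additional data):

// temp-storage/index.ts (simplified sketch of the actual entity)
import * as df from 'durable-functions';
import { IEntityFunctionContext } from 'durable-functions';

export default df.entity(function (context: IEntityFunctionContext<unknown[]>) {
  const state = context.df.getState(() => []) as unknown[];
  switch (context.df.operationName) {
    case 'add':
      // appends the additionalData sent by etlOrchestrator
      state.push(context.df.getInput());
      context.df.setState(state);
      break;
    case 'get':
      // returns everything accumulated so far
      context.df.return(state);
      break;
    case 'delete':
      // removes the entity and its state
      context.df.destructOnExit();
      break;
  }
});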
// etlOrchestrator/index.ts
import * as df from 'durable-functions';
import { IOrchestrationFunctionContext } from 'durable-functions';

export default df.orchestrator(function* etlOrchestrator(context: IOrchestrationFunctionContext) {
  const entityId = new df.EntityId('temp-storage', 'temp-storage-key');
  // creates multiple activity functions that read data in parallel
  const data = yield context.df.Task.all(createReaderActivities(context, context.df.getInput()));
  // transforms data
  const transformedData = yield context.df.callActivity('transformData', data);
  // stores the additional data in temporary storage (the durable entity)
  yield context.df.callEntity(entityId, 'add', transformedData.additionalData);
  // creates multiple activity functions that write data in parallel
  yield context.df.Task.all(createWriterActivities(context, transformedData));
});
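createReaderActivities and createWriterActivities just fan out one activity task per batch, which Task.all runs in parallel. Simplified sketches below (the batch size of 1,000 and the activity input shapes are placeholders, not my exact code):

import { IOrchestrationFunctionContext, Task } from 'durable-functions';

// builds one readData activity per batch of 1,000 files (simplified sketch)
function createReaderActivities(context: IOrchestrationFunctionContext, fileCount: number): Task[] {
  const tasks: Task[] = [];
  for (let offset = 0; offset < fileCount; offset += 1000) {
    tasks.push(context.df.callActivity('readData', { offset, limit: 1000 }));
  }
  return tasks;
}

// builds one writeData activity per chunk of transformed records (simplified sketch)
function createWriterActivities(context: IOrchestrationFunctionContext, transformedData: { records: unknown[] }): Task[] {
  const tasks: Task[] = [];
  for (let i = 0; i < transformedData.records.length; i += 1000) {
    tasks.push(context.df.callActivity('writeData', transformedData.records.slice(i, i + 1000)));
  }
  return tasks;
}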