Enhancing Event Processing Efficiency
Hello,
I have documents that I read and use to create events of several types. The system_id serves as a unique identifier for each event and is stored as a property on the corresponding vertex. I send these events to Event Grid, which delivers them to an Azure Storage queue endpoint. An eventgrid_trigger Azure Function then loads the events into Cosmos DB (Gremlin API) as graph vertices. Before loading, I check whether a vertex with the same system_id already exists in the database: if it does, I update that vertex's properties; if not, I create a new vertex.
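For reference, the check-then-update/insert step described above can be collapsed into a single server-side Gremlin traversal using the fold/coalesce upsert pattern, which makes the write idempotent regardless of ordering. This is only a sketch: build_upsert_query and the property names are illustrative, and in practice the query would be submitted via a Gremlin client (e.g. gremlinpython) with parameter binding rather than string interpolation.

```python
def build_upsert_query(label: str, system_id: str, props: dict) -> str:
    """Build one Gremlin traversal that updates the vertex with the given
    system_id if it exists, and creates it otherwise (fold/coalesce upsert)."""
    set_props = "".join(f".property('{k}', '{v}')" for k, v in props.items())
    return (
        f"g.V().has('{label}', 'system_id', '{system_id}')"
        f".fold().coalesce(unfold(){set_props}, "
        f"addV('{label}').property('system_id', '{system_id}'){set_props})"
    )

# Example: upsert an 'event' vertex keyed by its system_id.
query = build_upsert_query("event", "sys-123", {"status": "processed"})
print(query)
```

Because the existence check and the write happen in one traversal, two concurrent deliveries of the same event no longer race to create duplicate vertices.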
I need to load events into the database sequentially to prevent the same vertex from being created multiple times. For this reason, I have configured the host.json file as follows:
{
  "version": "2.0",
  "functionTimeout": "00:10:00",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensions": {
    "queues": {
      "maxPollingInterval": "00:00:02",
      "visibilityTimeout": "00:00:01",
      "batchSize": 1,
      "maxDequeueCount": 5,
      "newBatchThreshold": 0,
      "messageEncoding": "base64"
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[4.*, 5.0.0)"
  }
}
Additionally, I have set "WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT": "1" in local.settings.json to prevent multiple instances from running simultaneously. The data now load into the database without the errors I saw before (such as TooManyRequests or PreconditionFailedException), but the loading time is too long.
Could you please suggest how to improve the process? For example, would it help to switch to Event Hubs, send the different event types (there are six) to different containers in the Storage account, and then process each container with its own blob_trigger Azure Function?
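To make the fan-out idea above concrete, the routing step could be as simple as mapping each event's type to its own container, so that six independent blob_trigger functions each consume one type in parallel. This is just a sketch of the proposal: EVENT_CONTAINERS, target_container, and the type strings are all made-up names, not existing code.

```python
# One Storage container per event type (names are illustrative).
EVENT_CONTAINERS = {
    "type-1": "events-type-1",
    "type-2": "events-type-2",
    # ... one entry per event type, six in total
}

def target_container(event: dict) -> str:
    """Pick the destination container from the event's 'type' field,
    falling back to a catch-all container for unrecognized types."""
    return EVENT_CONTAINERS.get(event.get("type"), "events-unrouted")
```

Partitioning by type only removes the duplicate-vertex risk if events sharing a system_id always have the same type; otherwise the per-function processing would still need to stay sequential or use an idempotent upsert.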
I work on the Azure Functions Consumption plan, and there are around 500,000 vertices.
Many thanks