When you set up diagnostic logging to Azure Blob Storage, the logs are written as blobs, and the naming pattern of those blobs gives you an idea of how often they're generated. For example, a typical blob name looks like this:

resourceId=/SUBSCRIPTIONS/{subscription_id}/RESOURCEGROUPS/{resource_group_name}/PROVIDERS/MICROSOFT.COMPUTE/VIRTUALMACHINES/{resource_name}/y=2023/m=09/d=01/h=00/m=00/PT1H.json

From the naming, you can see that the path is broken down by year, month, day, hour, and minute (in practice the minute segment is always m=00). The PT1H.json suffix is an ISO 8601 duration meaning a period of one hour, so each log file covers a 1-hour window.
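To make the convention concrete, here is a minimal Python sketch that extracts the hour window a given PT1H.json blob covers from its path segments. The helper name, regex, and example path are my own illustration, not part of any Azure SDK:

```python
import re
from datetime import datetime, timezone

# Match the y=/m=/d=/h=/m= segments at the end of a diagnostics blob path.
PATH_PATTERN = re.compile(
    r"y=(?P<year>\d{4})/m=(?P<month>\d{2})/d=(?P<day>\d{2})/h=(?P<hour>\d{2})/m=\d{2}/PT1H\.json$"
)

def blob_hour_window(blob_path: str) -> datetime:
    """Return the UTC start of the 1-hour window this blob covers."""
    match = PATH_PATTERN.search(blob_path)
    if match is None:
        raise ValueError(f"Not a PT1H.json diagnostics blob path: {blob_path}")
    return datetime(
        int(match["year"]), int(match["month"]), int(match["day"]),
        int(match["hour"]), tzinfo=timezone.utc,
    )

# Example path with placeholder subscription/resource names.
path = (
    "resourceId=/SUBSCRIPTIONS/xxx/RESOURCEGROUPS/rg/PROVIDERS/MICROSOFT.COMPUTE/"
    "VIRTUALMACHINES/vm1/y=2023/m=09/d=01/h=00/m=00/PT1H.json"
)
print(blob_hour_window(path))  # 2023-09-01 00:00:00+00:00
```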
Your assumption seems correct: the diagnostic setting typically creates a new blob for each hour in which log entries are generated. If more log data for that hour arrives after the blob has been created, it is appended to the existing blob for that hour rather than written to a new blob.
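Because of that append behavior, one practical check before ingesting is to look at when the blob was last modified and only proceed once it has been quiet for a while. Below is a minimal sketch using the azure-storage-blob package; the connection string, container name, blob path, and quiet period are placeholders you would substitute with your own values:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobClient  # pip install azure-storage-blob

# Placeholder values -- substitute your own storage account details.
CONN_STR = "<storage-account-connection-string>"
CONTAINER = "insights-logs-<category>"
BLOB_PATH = "resourceId=/.../y=2023/m=09/d=01/h=13/m=00/PT1H.json"

def blob_has_settled(quiet_period: timedelta = timedelta(minutes=15)) -> bool:
    """True if the blob hasn't been appended to for at least `quiet_period`."""
    blob = BlobClient.from_connection_string(CONN_STR, CONTAINER, BLOB_PATH)
    props = blob.get_blob_properties()
    age = datetime.now(timezone.utc) - props.last_modified
    return age >= quiet_period

if blob_has_settled():
    print("Safe to ingest this hour's blob.")
else:
    print("Blob was modified recently; wait for late-arriving appends.")
```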
Using a trigger based on blob creation in Azure Blob Storage makes sense. However, since the blob may continue to receive appends during its hour, it's a good idea to add some delay to your Data Factory pipeline's trigger to ensure that you capture all the logs for that hour. For instance, if the blob covering the 1:00 PM - 2:00 PM window is finalized around 2:00 PM, you might have your pipeline start ingesting it only at 2:15 PM or 2:30 PM to account for late-arriving log data.
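Here is a small Python sketch of that delay calculation; the function name and the 15-minute delay are illustrative, not anything Data Factory provides out of the box:

```python
from datetime import datetime, timedelta, timezone

def hour_window_to_ingest(now: datetime, delay: timedelta = timedelta(minutes=15)) -> datetime:
    """Return the UTC start of the most recent 1-hour window whose blob
    should be complete, given a safety delay for late-arriving appends."""
    cutoff = now - delay                      # only consider hours that ended at least `delay` ago
    top_of_hour = cutoff.replace(minute=0, second=0, microsecond=0)
    return top_of_hour - timedelta(hours=1)   # the last fully elapsed hour

# Example: triggered at 14:20 UTC with a 15-minute delay,
# the pipeline should ingest the blob for the 13:00-14:00 window.
trigger_time = datetime(2023, 9, 1, 14, 20, tzinfo=timezone.utc)
window_start = hour_window_to_ingest(trigger_time)
print(window_start)            # 2023-09-01 13:00:00+00:00
print(f"h={window_start:%H}")  # maps to the h=13 segment of the blob path
```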
If you require real-time ingestion, it might be better to consider using a service like Azure Stream Analytics. But if a slight delay is acceptable, the above approach with Azure Data Factory should work for your needs.