I am running a functional powershell script that queries Log Analytics for a specified time range (usually within 2 monhts), retrieves the data from Log Analytics and send it to Event Hub.
Problem:
Query results are very large. The following are the counts:
24 hours of security logs query = count is 15,446,817 rows
1 hour of security logs query = count is 363,303 rows
1 minute of security logs query = count is 6,166 rows
If I run the Powershell code in Azure Automation, it is runs fine if query period is less than 3 minutes. If I attempt queries larger than 3 minutes, it times out (it exceeds 400MB memory limit). If I run the same code from my local machine, I can do queries for a period of 10 hours without timeouts.
Question:
- Should the best solution in this case break long query period and do increments of up to 3 minutes? For example, if the user requests a query for the entire month of March 2021, I should start a counter to increment time by 3 minutes until I reach the entire month?
2)Another suggestion I received is to implement a Hybrid Runbook Worker and use more computing resources from a VM running agent-job. But if I can avoid a VM additional cost by break long queries into small segments, that would be preferred.
Please let me know best approach in this large data set scenario.
Blockquote
function QueryLogAnalyticsSend {
Param(
$TableName,
$startperiod,
$endperiod,
$LogAnalyticsWorkspaceId
)
$makehub = $TableName.ToLower()
$EventHub = "am-"+$makehub
$elapsedvar1 = Get-Date
Write-Host "Executing query on Log Analytics, table $TableName..."
$query = "$($TableName)| where TimeGenerated >= datetime($startperiod) and TimeGenerated <= datetime($endperiod)"
Write-Host "Query is $query"
Try {
$output = (Invoke-AzOperationalInsightsQuery -WorkspaceId $LogAnalyticsWorkspaceId -Query $query).Results
}
Catch {
Write-Host "Error during Invoke-AzOperationalInsightsQuery. Verify 1) PowerShell Module Az is installed 2) LA Work ID is correct 3) DateTime syntax is correct. "
}
$arrayrows = @()
$total_size_sent = 0
$current_size = 0
$allowed_size = 1024 * 1024 - 3000
foreach ($row in $output)
{
$irow++
$converted_row = $row | ConvertTo-Json
$row_size = [System.Text.Encoding]::UTF8.GetByteCount($converted_row)
$current_size = $current_size+$row_size
IF ($current_size -gt $allowed_size)
{
Write-Host "*** Current size to be sent: $current_size Bytes Table is: $TableName *** "
$SAStoken = get-SAStoken -EventHubNamespace $EventHubNamespace -EventHub $EventHub -Access_Policy_Name $AccessPolicyName -Access_Policy_Key $Access_Policy_Key
SendDataToEventHub -token $SAStoken -row $arrayrows -Namespace $EventHubNamespace -EventHub $EventHub
$total_size_sent = $total_size_sent + $current_size
$current_size = 0
$arrayrows = @()
}
$arrayrows = $arrayrows + $converted_row
}
Write-Host "*** Current size to be sent: $current_size Bytes Table is: $TableName *** "
$SAStoken = get-SAStoken -EventHubNamespace $EventHubNamespace -EventHub $EventHub -Access_Policy_Name $AccessPolicyName -Access_Policy_Key $Access_Policy_Key
SendDataToEventHub -token $SAStoken -row $arrayrows -Namespace $EventHubNamespace -EventHub $EventHub
$elapsedvar2 = Get-Date
$elapsedtime = $elapsedvar2 - $elapsedvar1
$total_size_sent = $total_size_sent + $current_size
Write-Host "Elapsed time:$elapsedtime Total size sent:$total_size_sent Bytes Start:$elapsedvar1 End:$elapsedvar2 Rows:$irow"
Write-Host "Original query: $query"
}
Blockquote