Use Hybrid Runbook Worker or make shorter queries to Log Analytics?

MrSky 21 Reputation points
2021-05-31T16:55:44.733+00:00

I am running a functional powershell script that queries Log Analytics for a specified time range (usually within 2 monhts), retrieves the data from Log Analytics and send it to Event Hub.

Problem:
Query results are very large. The following are the counts:
24 hours of security logs query = count is 15,446,817 rows
1 hour of security logs query = count is 363,303 rows
1 minute of security logs query = count is 6,166 rows

If I run the Powershell code in Azure Automation, it is runs fine if query period is less than 3 minutes. If I attempt queries larger than 3 minutes, it times out (it exceeds 400MB memory limit). If I run the same code from my local machine, I can do queries for a period of 10 hours without timeouts.

Question:

  1. Should the best solution in this case break long query period and do increments of up to 3 minutes? For example, if the user requests a query for the entire month of March 2021, I should start a counter to increment time by 3 minutes until I reach the entire month?

2)Another suggestion I received is to implement a Hybrid Runbook Worker and use more computing resources from a VM running agent-job. But if I can avoid a VM additional cost by break long queries into small segments, that would be preferred.

Please let me know best approach in this large data set scenario.

Blockquote
function QueryLogAnalyticsSend {

    Param(
    $TableName, 
    $startperiod, 
    $endperiod, 
    $LogAnalyticsWorkspaceId
    )

    $makehub = $TableName.ToLower() 
    $EventHub = "am-"+$makehub


    $elapsedvar1 = Get-Date

    Write-Host "Executing query on Log Analytics, table $TableName..."

    $query = "$($TableName)| where TimeGenerated >= datetime($startperiod) and TimeGenerated <= datetime($endperiod)"
    Write-Host "Query is $query"

    Try {

        $output = (Invoke-AzOperationalInsightsQuery -WorkspaceId $LogAnalyticsWorkspaceId -Query $query).Results
    }

    Catch {
        Write-Host "Error during Invoke-AzOperationalInsightsQuery. Verify 1) PowerShell Module Az is installed 2) LA Work ID is correct 3) DateTime syntax is correct. "
    }


    $arrayrows = @()
    $total_size_sent = 0
    $current_size = 0
    $allowed_size = 1024 * 1024 - 3000

    foreach ($row in $output) 
    {

            $irow++
            $converted_row = $row | ConvertTo-Json
            $row_size = [System.Text.Encoding]::UTF8.GetByteCount($converted_row)
            $current_size = $current_size+$row_size


            IF ($current_size -gt $allowed_size) 
            {
                Write-Host "*** Current size to be sent: $current_size Bytes   Table is: $TableName *** "

                $SAStoken = get-SAStoken -EventHubNamespace $EventHubNamespace -EventHub $EventHub -Access_Policy_Name $AccessPolicyName -Access_Policy_Key $Access_Policy_Key

                SendDataToEventHub -token $SAStoken -row $arrayrows -Namespace $EventHubNamespace -EventHub $EventHub

                $total_size_sent = $total_size_sent + $current_size
                $current_size = 0
                $arrayrows = @()
            }

            $arrayrows = $arrayrows + $converted_row

    }

     Write-Host "*** Current size to be sent: $current_size Bytes   Table is: $TableName *** "

     $SAStoken = get-SAStoken -EventHubNamespace $EventHubNamespace -EventHub $EventHub -Access_Policy_Name $AccessPolicyName -Access_Policy_Key $Access_Policy_Key

     SendDataToEventHub -token $SAStoken -row $arrayrows -Namespace $EventHubNamespace -EventHub $EventHub

     $elapsedvar2 = Get-Date
     $elapsedtime = $elapsedvar2 - $elapsedvar1

     $total_size_sent = $total_size_sent + $current_size
     Write-Host "Elapsed time:$elapsedtime      Total size sent:$total_size_sent Bytes    Start:$elapsedvar1    End:$elapsedvar2    Rows:$irow"
     Write-Host "Original query: $query"

}

Blockquote

Azure Automation
Azure Automation
An Azure service that is used to automate, configure, and install updates across hybrid environments.
1,175 questions
{count} votes

1 answer

Sort by: Most helpful
  1. tbgangav-MSFT 10,416 Reputation points
    2021-06-01T07:49:47.967+00:00

    Hi @MrSky / @MrSky,

    Yes, as per this, maximum amount of memory given to a sandbox is 400MB hence the situation. Even though the intention of HRW (Hybrid Runbook Worker) is different as mentioned here but in this case I would recommend to go with option 2 i.e., HRW rather than option 1 i.e., incremental sandbox runbook execution.

    On the other hand, you may try to leverage this new feature that's currently in preview.

    0 comments No comments