question

0 Votes
simonbrami-1127 asked · SamaraSoucy-MSFT answered

Processing Azure Storage Logs to Azure SQL with Stream Analytics

Hi Azure Community!

I have a Storage Account and a CDN to make my images and videos faster to download/view.
I would like to periodically count the views/downloads of my files into an Azure SQL database, say every 2 hours.
I was thinking of using Azure Stream Analytics for this, but I have a concern: since my files are cached by the CDN, does the storage account still log every read/write operation?

Is there any best practice to achieve this?

Tags: azure-storage-accounts, azure-cdn, azure-stream-analytics

0 Votes
SamaraSoucy-MSFT answered · simonbrami-1127 edited

The specifics of getting the logs from your CDN will depend on which service you are using. For something like a file download, you will need to get the numbers from the CDN itself to get accurate counts; one of the reasons for having a CDN is to make fewer requests to your storage account, so cached reads never reach the storage logs.

Stream Analytics is not a good choice for retrieving data from an API on a regular schedule. It essentially takes a stream of data fed to it from something like Event Hubs and does analysis on that data in real time, for things like reporting and alerts.

I would instead suggest looking at Logic Apps, or possibly Azure Functions. Either option will let you call an API endpoint on a timer and write the retrieved data to your database. Logic Apps can be easier to develop because there is no code involved, but Functions gives you more flexibility if the process for retrieving the data is difficult or impossible within the options available in Logic Apps.
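The timer-driven pattern described above (on each tick, call an API and write the results to a database) can be sketched in plain Python. This is a hedged illustration only; `fetch` and `store` are hypothetical stand-ins for the HTTP call and the SQL insert, not real Azure SDK calls.

```python
# Sketch of the scheduled collect-and-store pattern. 'fetch' and 'store'
# are hypothetical callables standing in for the metrics API call and
# the database insert.
def collect_once(fetch, store):
    """One scheduled run: pull the latest counts and persist each row."""
    rows = fetch()          # e.g. GET the CDN metrics endpoint
    for row in rows:
        store(row)          # e.g. INSERT into the Azure SQL table
    return len(rows)

# Usage with stubs in place of the real API and database:
saved = []
n = collect_once(lambda: [{"file": "a.jpg", "views": 3}], saved.append)
```

In Logic Apps the timer is the Recurrence trigger and the two callables become an HTTP action and a SQL "Insert row" action; in Functions they would be ordinary code inside a timer-triggered function.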


Thank you for your answer! Very interesting.
In your opinion, should I configure the "AzureCdnAccessLog" diagnostic setting to save the logs to a Storage account and then process them in a Logic App/Function?
Or is there an API that can be called directly from the Logic App without saving the logs?

1 Vote
SamaraSoucy-MSFT answered · simonbrami-1127 edited

If you are using Azure CDN you can save the logs to Storage or to a Log Analytics workspace, but you can also query the built-in monitoring metrics directly via the API:

  1. Create your Logic App and turn on "System Assigned Managed Identity" (Search for "Identity" in the left-hand options bar on the resource)

  2. On the Azure CDN resource, assign the Logic App the "Log Analytics Reader" role. This will let it read metrics.

  3. Set the Logic App trigger to a timer and add an HTTP action to retrieve the metrics. The API endpoint is GET https://management.azure.com/{resourceUri}/providers/microsoft.insights/metrics?api-version=2018-01-01. This returns quite a bit of data by default, but there are additional parameters you can use to filter it down to just the data you want. The options are in the API reference docs: https://docs.microsoft.com/en-us/rest/api/monitor/metrics/list

  4. In the authentication box set authentication to Managed Identity and Audience to https://management.azure.com

  5. Use a 'Parse JSON' action to tell the Logic App what the data format is, so you can use the parsed fields in your SQL query

  6. Use SQL Server 'Insert Row' to add the data into your SQL table
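To make step 3 concrete, here is a small Python sketch of how the Monitor metrics URL in that step is assembled. The resource URI, metric name, and timespan below are placeholder assumptions for illustration, not values from this thread; the real filter parameters are listed in the API reference linked above.

```python
# Sketch (with placeholder values) of building the Azure Monitor
# metrics list URL described in step 3.
from urllib.parse import urlencode

def build_metrics_url(resource_uri, metric_names, timespan, interval="PT1H"):
    """Build the metrics list URL for a resource."""
    base = "https://management.azure.com"
    query = urlencode({
        "api-version": "2018-01-01",
        "metricnames": ",".join(metric_names),   # filter to specific metrics
        "timespan": timespan,                    # start/end in ISO 8601
        "interval": interval,                    # aggregation grain
    })
    return f"{base}{resource_uri}/providers/microsoft.insights/metrics?{query}"

# Hypothetical resource URI and metric name, for illustration only:
url = build_metrics_url(
    "/subscriptions/XXXX/resourceGroups/rg/providers/Microsoft.Cdn/profiles/myCdn",
    ["RequestCount"],
    "2021-04-01T00:00:00Z/2021-04-01T02:00:00Z",
)
```

In the Logic App itself you would paste the equivalent URL (or add the query parameters in the HTTP action's "Queries" fields) rather than run Python, but the shape of the request is the same.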



As I already save the logs to my storage, I will build a Logic App to extract those raw JSON logs to SQL.
I have one issue: the "PT1H.json" files are not valid JSON documents. There are no commas and no array, just "JSON lines" like this:

{ "id": 1, "foo": "bar" }
{ "id": 2, "foo": "bar" }
{ "id": 3, "foo": "bar" }

So the JSON parser only parses the first element, as it doesn't understand that there are multiple objects...

@SamaraSoucy-MSFT do you have any idea how to fix that?

Thank you very much for your help

0 Votes
SamaraSoucy-MSFT answered

Logic Apps unfortunately doesn't currently have a setting for splitting a newline-separated list, so a bit of parsing has to happen to convert it. It may be simpler to grab the data from the API instead, but either approach is fine.

The preview version of Logic Apps will let you do the split in JavaScript, but there isn't yet a consumption billing option for that, so you probably don't want to go that route for something small like this. The current version allows you to run code within an integration account, which has the same cost issue.

Here is the option I put together; there are usually a few different approaches that will work.

  1. Parse the blob contents into a string to convert them from the raw base-64 encoding into something we can work with later.

  2. Set up an array variable to hold the values as they are parsed.

  3. Run a loop, splitting the string on the newline characters. You must edit the expression in code view; the UI isn't able to handle the newline characters properly. The correct expression is @split(variables('converttostring'), '\r\n')

  4. The last line is blank, so add a check that the first character is '{' as expected.

  5. Parse the JSON based on whatever properties you need to pull. When generating the schema from a sample, Logic Apps set a couple of properties to integer instead of number, so be mindful of that.

  6. Add the parsed data to the array.

  7. Do whatever else you want with the data and load it into SQL. Depending on what transformations you need, you may want to apply some of them before adding items to the array.
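Steps 3 through 6 can be sketched in plain Python to show what the Logic App expressions are doing with the newline-separated log file. The sample lines below are hypothetical, shaped roughly like the PT1H.json entries, not actual log data.

```python
import json

# Minimal sketch of the split-filter-parse flow from the steps above.
# 'raw' is a hypothetical stand-in for the blob contents after the
# base-64 decode (step 1); note the trailing blank line after the
# final CRLF, which is why the '{' check in step 4 is needed.
raw = (
    '{"metricName": "RequestCount", "total": 12}\r\n'
    '{"metricName": "RequestCount", "total": 7}\r\n'
)

parsed = []                                # step 2: array to collect results
for line in raw.split("\r\n"):             # step 3: split on CRLF
    if not line.startswith("{"):           # step 4: skip the trailing blank line
        continue
    parsed.append(json.loads(line))        # steps 5-6: parse and append

# step 7: 'parsed' is now an array ready to insert into SQL row by row
```

In the Logic App the same logic is spread across the For_each loop, the If condition, the Parse JSON action, and the Append to array variable action shown in the workflow definition below.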

(screenshot of the Logic Apps designer view: 83775-2021-04-01-15-12-41-logic-apps-designer-microsoft.png)

Here's the workflow definition JSON, with my storage connection parameters removed:

 {
     "definition": {
         "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
         "actions": {
             "Convert_to_string": {
                 "inputs": {
                     "variables": [
                         {
                             "name": "converttostring",
                             "type": "string",
                             "value": "@{body('Get_blob_content')}"
                         }
                     ]
                 },
                 "runAfter": {
                     "Get_blob_content": [
                         "Succeeded"
                     ]
                 },
                 "type": "InitializeVariable"
             },
             "For_each": {
                 "actions": {
                     "Condition": {
                         "actions": {},
                         "else": {
                             "actions": {
                                 "Append_to_array_variable_2": {
                                     "inputs": {
                                         "name": "parsedArray",
                                         "value": "@body('Parse_JSON_2')"
                                     },
                                     "runAfter": {
                                         "Parse_JSON_2": [
                                             "Succeeded"
                                         ]
                                     },
                                     "type": "AppendToArrayVariable"
                                 },
                                 "Parse_JSON_2": {
                                     "inputs": {
                                         "content": "@items('For_each')",
                                         "schema": {
                                             "properties": {
                                                 "average": {
                                                     "type": "number"
                                                 },
                                                 "count": {
                                                     "type": "number"
                                                 },
                                                 "metricName": {
                                                     "type": "string"
                                                 },
                                                 "resourceId": {
                                                     "type": "string"
                                                 },
                                                 "time": {
                                                     "type": "string"
                                                 },
                                                 "timeGrain": {
                                                     "type": "string"
                                                 },
                                                 "total": {
                                                     "type": "number"
                                                 }
                                             },
                                             "type": "object"
                                         }
                                     },
                                     "runAfter": {},
                                     "type": "ParseJson"
                                 }
                             }
                         },
                         "expression": {
                             "and": [
                                 {
                                     "not": {
                                         "startsWith": [
                                             "@items('For_each')",
                                             "{"
                                         ]
                                     }
                                 }
                             ]
                         },
                         "runAfter": {},
                         "type": "If"
                     }
                 },
                 "foreach": "@split(variables('converttostring'), '\r\n')",
                 "runAfter": {
                     "Initialize_variable": [
                         "Succeeded"
                     ]
                 },
                 "type": "Foreach"
             },
             "Get_blob_content": {
                 "inputs": {
                     "host": {
                         "connection": {
                             "name": "@parameters('$connections')['azureblob']['connectionId']"
                         }
                     },
                     "method": "get",
                     "path": "/datasets/default/files/@{encodeURIComponent(encodeURIComponent('XXXXXXXX'))}/content",
                     "queries": {
                         "inferContentType": true
                     }
                 },
                 "metadata": {
                     "JTJmdGVzdCUyZlBUMUguanNvbg==": "/test/PT1H.json"
                 },
                 "runAfter": {},
                 "type": "ApiConnection"
             },
             "Initialize_variable": {
                 "inputs": {
                     "variables": [
                         {
                             "name": "parsedArray",
                             "type": "array"
                         }
                     ]
                 },
                 "runAfter": {
                     "Convert_to_string": [
                         "Succeeded"
                     ]
                 },
                 "type": "InitializeVariable"
             },
             "Initialize_variable_2": {
                 "inputs": {
                     "variables": [
                         {
                             "name": "testArray",
                             "type": "array",
                             "value": "@variables('parsedArray')"
                         }
                     ]
                 },
                 "runAfter": {
                     "For_each": [
                         "Succeeded"
                     ]
                 },
                 "type": "InitializeVariable"
             }
         },
         "contentVersion": "1.0.0.0",
         "outputs": {},
         "parameters": {
             "$connections": {
                 "defaultValue": {},
                 "type": "Object"
             }
         },
         "triggers": {
             "Recurrence": {
                 "recurrence": {
                     "frequency": "Week",
                     "interval": 3
                 },
                 "type": "Recurrence"
             }
         }
     },
     "parameters": {
         "$connections": {
             "value": {
                 "azureblob": {
                     "connectionId": "XXXXXXXX",
                     "connectionName": "azureblob",
                     "id": "XXXXXXXX"
                 }
             }
         }
     }
 }
