Processing Azure Storage Logs to Azure SQL with Stream Analytics

simon brami 1 Reputation point
2021-03-25T10:56:45.873+00:00

Hi Azure Community!

I have a Storage Account and a CDN to make my images and videos faster to download/view.
I would like to periodically count the views/downloads of my files and write the counts to an Azure database, let's say every 2 hours.
I was thinking of using Azure Stream Analytics for this, but I have a concern: since my files are cached by the CDN, does the storage account still log every read/write operation?

Is there any best practice to achieve this?

Tags: Azure Storage Accounts, Azure Content Delivery Network, Azure Stream Analytics

3 answers

  1. Samara Soucy - MSFT 5,051 Reputation points
    2021-03-29T16:12:50.437+00:00

    If you are using Azure CDN, you can export its logs to Storage or to a Log Analytics workspace, but you can also query the built-in monitoring metrics directly via the API.

    1. Create your Logic App and turn on "System Assigned Managed Identity" (search for "Identity" in the left-hand options bar on the resource).
    2. On the Azure CDN resource, assign the Logic App the "Log Analytics Reader" role. This will let it read metrics.
    3. Set the Logic App trigger to a Recurrence (timer) and add an HTTP action to retrieve the metrics. The API endpoint is GET https://management.azure.com/{resourceUri}/providers/microsoft.insights/metrics?api-version=2018-01-01 and it will return quite a bit of data by default, but there are additional parameters you can use to filter it down to just the data you want. The options are in the API reference docs: https://learn.microsoft.com/en-us/rest/api/monitor/metrics/list (a sketch of an equivalent call, useful for testing, follows these steps).
    4. In the HTTP action's authentication settings, set the authentication type to Managed Identity and the Audience to https://management.azure.com
    5. Use a 'Parse JSON' action to tell the Logic App what the data format is, so you can use the values in your SQL query.
    6. Use the SQL Server 'Insert Row' action to add the data to your SQL table.
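    For reference, here is a minimal sketch of an equivalent metrics call made outside Logic Apps, e.g. to check what the HTTP action will return. It assumes the azure-identity and requests packages; the resource URI and the metric name/aggregation shown are placeholders you would replace with your own values.

    # Sketch: query the Azure Monitor Metrics API the same way the Logic App's
    # HTTP action would. The identity running this needs a metrics-reading role
    # on the resource.
    import requests
    from azure.identity import DefaultAzureCredential

    # Hypothetical resource URI of the CDN endpoint - replace with your own.
    resource_uri = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Cdn/profiles/<profile>/endpoints/<endpoint>"
    )

    credential = DefaultAzureCredential()
    token = credential.get_token("https://management.azure.com/.default")

    resp = requests.get(
        f"https://management.azure.com{resource_uri}/providers/microsoft.insights/metrics",
        headers={"Authorization": f"Bearer {token.token}"},
        params={
            "api-version": "2018-01-01",
            # Assumed metric name and aggregation - check the metric definitions
            # for your CDN profile and adjust.
            "metricnames": "RequestCount",
            "aggregation": "Total",
            "interval": "PT1H",
        },
    )
    resp.raise_for_status()
    for metric in resp.json()["value"]:
        print(metric["name"]["value"], metric["timeseries"])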
    1 person found this answer helpful.

  2. Samara Soucy - MSFT 5,051 Reputation points
    2021-03-25T18:13:42.56+00:00

    The specifics of how to get the logs from your CDN will depend on which service you are using. For something like a file download, you will need to get the numbers from the CDN to get accurate counts, since one of the reasons for having a CDN is that it makes fewer requests to your storage account.

    Stream Analytics is not a good choice for retrieving data from an API on a regular basis. It essentially takes a stream of data fed to it from something like Event Hubs and does analysis on it in real time, for things like reporting and alerts.

    I would instead suggest looking at Logic Apps, or possibly Azure Functions. Either option will allow you to call an API endpoint on a timer and write the retrieved data to your database. Logic Apps can be easier to develop because there is no code involved, but Functions would give you more flexibility if the process for retrieving the data is difficult or impossible with the options available in Logic Apps.
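    If you go the Functions route, a minimal sketch of the timer-driven pattern might look like the following. It assumes the Python v2 programming model with the azure-functions and pyodbc packages; the schedule, table and column names, connection-string app setting, and the get_cdn_view_counts() helper are all placeholders for whatever retrieval logic you end up with.

    # function_app.py - a timer-triggered Azure Function that polls for counts
    # every 2 hours and writes them to an Azure SQL table.
    import os

    import azure.functions as func
    import pyodbc

    app = func.FunctionApp()


    def get_cdn_view_counts():
        # Hypothetical stand-in: call the CDN/metrics API here and return a
        # list of (timestamp, count) tuples for the period you care about.
        return []


    # NCRONTAB expression: run at the top of every 2nd hour.
    @app.timer_trigger(schedule="0 0 */2 * * *", arg_name="timer")
    def poll_view_counts(timer: func.TimerRequest) -> None:
        rows = get_cdn_view_counts()
        if not rows:
            return

        # Connection string kept in an app setting; table/columns are placeholders.
        conn = pyodbc.connect(os.environ["SQL_CONNECTION_STRING"])
        with conn:  # commits on success, rolls back on error
            conn.cursor().executemany(
                "INSERT INTO dbo.CdnViewCounts (MetricTime, ViewCount) VALUES (?, ?)",
                rows,
            )
        conn.close()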


  3. Samara Soucy - MSFT 5,051 Reputation points
    2021-04-01T19:45:45.313+00:00

    Logic Apps unfortunately doesn't currently have a built-in action for splitting a newline-separated log into an array, so a bit of parsing needs to happen to convert it. It may be simpler to grab the data from the API, but either approach is fine.

    The preview version of Logic Apps will let you do the split in JavaScript, but there isn't yet a consumption billing option for that, so it's probably not worth going that route for something small like this. The current version allows you to run code with an Integration Account, which has the same issue.

    Here is the option I put together; there are usually a few different ways that will work. (A short plain-code sketch of the same parsing logic follows these steps.)

    1. Convert the blob contents into a string, so the raw base64-encoded content becomes something we can work with later.
    2. Set up an array variable to hold values as they are parsed.
    3. Run a loop over the string split on the newline characters. You must edit the expression in code view, since the designer UI can't handle the newline characters properly. The correct expression is @split(variables('converttostring'), '\r\n')
    4. The last line is blank, so add a check to make sure the first character is '{' as expected.
    5. Parse the JSON based on whatever properties you need to pull. When generating the schema from a sample, Logic Apps set a couple of properties to integer instead of number for me, so be mindful of that.
    6. Add the parsed data to the array.
    7. Do whatever else you want to the data and load it into SQL. Depending on the transformations you need, you may want to apply some of them before adding the items to the array.
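    In plain code, the loop above boils down to something like this (a minimal sketch; the local file name is hypothetical, and the properties printed are the ones from the schema below):

    # Sketch of the same parsing the Logic App does: read the log content,
    # split it on CRLF, skip the blank/non-JSON lines, and collect the rest.
    import json

    # Hypothetical local copy of the blob the 'Get blob content' action reads.
    with open("PT1H.json", encoding="utf-8") as f:
        raw = f.read()

    parsed = []
    for line in raw.split("\r\n"):
        # The last line is blank, so only keep lines that start with '{'.
        if line.startswith("{"):
            parsed.append(json.loads(line))

    for entry in parsed:
        # Property names assumed to match the Parse JSON schema below.
        print(entry.get("time"), entry.get("metricName"), entry.get("total"))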

    [Screenshot: Logic Apps designer showing the workflow steps described above]

    Here's the Logic App definition JSON, with my storage connection parameters removed:

    {  
        "definition": {  
            "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",  
            "actions": {  
                "Convert_to_string": {  
                    "inputs": {  
                        "variables": [  
                            {  
                                "name": "converttostring",  
                                "type": "string",  
                                "value": "@{body('Get_blob_content')}"  
                            }  
                        ]  
                    },  
                    "runAfter": {  
                        "Get_blob_content": [  
                            "Succeeded"  
                        ]  
                    },  
                    "type": "InitializeVariable"  
                },  
                "For_each": {  
                    "actions": {  
                        "Condition": {  
                            "actions": {},  
                            "else": {  
                                "actions": {  
                                    "Append_to_array_variable_2": {  
                                        "inputs": {  
                                            "name": "parsedArray",  
                                            "value": "@body('Parse_JSON_2')"  
                                        },  
                                        "runAfter": {  
                                            "Parse_JSON_2": [  
                                                "Succeeded"  
                                            ]  
                                        },  
                                        "type": "AppendToArrayVariable"  
                                    },  
                                    "Parse_JSON_2": {  
                                        "inputs": {  
                                            "content": "@items('For_each')",  
                                            "schema": {  
                                                "properties": {  
                                                    "average": {  
                                                        "type": "number"  
                                                    },  
                                                    "count": {  
                                                        "type": "number"  
                                                    },  
                                                    "metricName": {  
                                                        "type": "string"  
                                                    },  
                                                    "resourceId": {  
                                                        "type": "string"  
                                                    },  
                                                    "time": {  
                                                        "type": "string"  
                                                    },  
                                                    "timeGrain": {  
                                                        "type": "string"  
                                                    },  
                                                    "total": {  
                                                        "type": "number"  
                                                    }  
                                                },  
                                                "type": "object"  
                                            }  
                                        },  
                                        "runAfter": {},  
                                        "type": "ParseJson"  
                                    }  
                                }  
                            },  
                            "expression": {  
                                "and": [  
                                    {  
                                        "not": {  
                                            "startsWith": [  
                                                "@items('For_each')",  
                                                "{"  
                                            ]  
                                        }  
                                    }  
                                ]  
                            },  
                            "runAfter": {},  
                            "type": "If"  
                        }  
                    },  
                    "foreach": "@split(variables('converttostring'), '\r\n')",  
                    "runAfter": {  
                        "Initialize_variable": [  
                            "Succeeded"  
                        ]  
                    },  
                    "type": "Foreach"  
                },  
                "Get_blob_content": {  
                    "inputs": {  
                        "host": {  
                            "connection": {  
                                "name": "@parameters('$connections')['azureblob']['connectionId']"  
                            }  
                        },  
                        "method": "get",  
                        "path": "/datasets/default/files/@{encodeURIComponent(encodeURIComponent('XXXXXXXX'))}/content",  
                        "queries": {  
                            "inferContentType": true  
                        }  
                    },  
                    "metadata": {  
                        "JTJmdGVzdCUyZlBUMUguanNvbg==": "/test/PT1H.json"  
                    },  
                    "runAfter": {},  
                    "type": "ApiConnection"  
                },  
                "Initialize_variable": {  
                    "inputs": {  
                        "variables": [  
                            {  
                                "name": "parsedArray",  
                                "type": "array"  
                            }  
                        ]  
                    },  
                    "runAfter": {  
                        "Convert_to_string": [  
                            "Succeeded"  
                        ]  
                    },  
                    "type": "InitializeVariable"  
                },  
                "Initialize_variable_2": {  
                    "inputs": {  
                        "variables": [  
                            {  
                                "name": "testArray",  
                                "type": "array",  
                                "value": "@variables('parsedArray')"  
                            }  
                        ]  
                    },  
                    "runAfter": {  
                        "For_each": [  
                            "Succeeded"  
                        ]  
                    },  
                    "type": "InitializeVariable"  
                }  
            },  
            "contentVersion": "1.0.0.0",  
            "outputs": {},  
            "parameters": {  
                "$connections": {  
                    "defaultValue": {},  
                    "type": "Object"  
                }  
            },  
            "triggers": {  
                "Recurrence": {  
                    "recurrence": {  
                        "frequency": "Week",  
                        "interval": 3  
                    },  
                    "type": "Recurrence"  
                }  
            }  
        },  
        "parameters": {  
            "$connections": {  
                "value": {  
                    "azureblob": {  
                        "connectionId": "XXXXXXXX",  
                        "connectionName": "azureblob",  
                        "id": "XXXXXXXX"  
                    }  
                }  
            }  
        }  
    }  
    