Analyzing clickstream data is an effective way for businesses to optimize website traffic and gain insights into user behavior. This quickstart outlines how you can build a streaming application for analyzing website clickstream data.
The method outlined in this guide uses a PowerShell script to deploy the Azure resources together with autogenerated sample data streams. The autogenerated data lets you explore various stream analytics scenarios without preparing your own input.
Here are the typical scenarios for processing and analyzing clickstream:
In this example, you learn to extract GET and POST requests from a website clickstream and store the output in Azure Blob Storage. Here's the architecture for this example:
Sample of a website clickstream:
{
  "EventTime": "2022-09-09 08:58:59 UTC",
  "UserID": 465,
  "IP": "145.140.61.170",
  "Request": {
    "Method": "GET",
    "URI": "/index.html",
    "Protocol": "HTTP/1.1"
  },
  "Response": {
    "Code": 200,
    "Bytes": 42682
  },
  "Browser": "Chrome"
}
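To make the record layout concrete, here is a minimal Python sketch that decodes the sample event and reads its nested fields. It is only a local illustration, not part of the deployed resources; the helper name `parse_event` is hypothetical, and the timestamp layout is assumed from the single sample above.

```python
import json
from datetime import datetime, timezone

# The sample clickstream record from this article, as a JSON string.
SAMPLE_EVENT = """
{
  "EventTime": "2022-09-09 08:58:59 UTC",
  "UserID": 465,
  "IP": "145.140.61.170",
  "Request": {"Method": "GET", "URI": "/index.html", "Protocol": "HTTP/1.1"},
  "Response": {"Code": 200, "Bytes": 42682},
  "Browser": "Chrome"
}
"""


def parse_event(raw: str) -> dict:
    """Decode one record and convert EventTime to a timezone-aware datetime."""
    event = json.loads(raw)
    # Assumed layout 'YYYY-MM-DD HH:MM:SS UTC'; the 'UTC' suffix is matched literally.
    parsed = datetime.strptime(event["EventTime"], "%Y-%m-%d %H:%M:%S UTC")
    event["EventTime"] = parsed.replace(tzinfo=timezone.utc)
    return event


event = parse_event(SAMPLE_EVENT)
# Nested objects such as Request and Response are plain dictionaries.
print(event["Request"]["Method"], event["Response"]["Code"], event["EventTime"].isoformat())
# prints: GET 200 2022-09-09T08:58:59+00:00
```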
We'll be using the scripts available in the GitHub repository for deploying our required resources:
Open PowerShell from the Start menu and clone this GitHub repository to your working directory.
git clone https://github.com/Azure/azure-stream-analytics.git
Go to the BuildApplications folder.
cd .\azure-stream-analytics\BuildApplications\
Sign in to Azure and enter your Azure credentials in the pop-up browser.
Connect-AzAccount
Replace $subscriptionId with your Azure subscription ID and run the following command to deploy Azure resources. This process may take a few minutes to complete.
.\CreateJob.ps1 -job ClickStream-Filter -eventsPerMinute 11 -subscriptionid $subscriptionId
eventsPerMinute is the input rate for generated data. In this case, the input source generates 11 events per minute.
Once the deployment is complete, your browser opens automatically, and you can see a resource group named ClickStream-Filter-rg-* in the Azure portal. The resource group contains the following five resources:
Resource Type | Name | Description |
---|---|---|
Azure Function | clickstream* | Generate clickstream data |
Event Hubs | clickstream* | Ingest clickstream data for consuming |
Stream Analytics Job | ClickStream-Filter | Define a query to extract GET and POST requests from the clickstream input |
Blob Storage | clickstream* | Output destination for the ASA job |
App Service Plan | clickstream* | A necessity for Azure Function |
Congratulations! You've deployed a streaming application to extract requests from a website clickstream.
The ASA job ClickStream-Filter uses the following query to extract HTTP requests from the clickstream. Select Test query in the query editor to preview the output results.
SELECT System.Timestamp Systime, UserId, Request.Method, Response.Code, Browser
INTO BlobOutput
FROM ClickStream TIMESTAMP BY EventTime
WHERE Request.Method = 'GET' OR Request.Method = 'POST'
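As a rough local illustration of what this filter does, the Python sketch below applies the same predicate and projection to a few in-memory events. It does not run in Stream Analytics. The function name `filter_requests` and the three events are hypothetical, and it assumes that the event time field is EventTime, as in the sample record.

```python
from typing import Iterable, Iterator


def filter_requests(events: Iterable[dict], methods=("GET", "POST")) -> Iterator[dict]:
    """Keep GET/POST events and project the columns selected by the query."""
    for event in events:
        if event["Request"]["Method"] in methods:
            yield {
                # For a pass-through query, the output timestamp is the event time.
                "Systime": event["EventTime"],
                "UserId": event["UserID"],
                "Method": event["Request"]["Method"],
                "Code": event["Response"]["Code"],
                "Browser": event["Browser"],
            }


# Hypothetical events for illustration only; they follow the sample record's shape.
events = [
    {"EventTime": "2022-09-09 08:58:59 UTC", "UserID": 465,
     "Request": {"Method": "GET"}, "Response": {"Code": 200}, "Browser": "Chrome"},
    {"EventTime": "2022-09-09 08:59:10 UTC", "UserID": 12,
     "Request": {"Method": "PUT"}, "Response": {"Code": 403}, "Browser": "Edge"},
    {"EventTime": "2022-09-09 08:59:30 UTC", "UserID": 465,
     "Request": {"Method": "POST"}, "Response": {"Code": 201}, "Browser": "Chrome"},
]

results = list(filter_requests(events))
for row in results:
    print(row)
```

The PUT event is dropped, and the two remaining rows carry only the projected columns.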
The query comments contain sample code that you can use for other stream analytics scenarios with a single stream input.
Count clicks for every hour
SELECT System.Timestamp AS Systime, COUNT(*)
FROM clickstream
TIMESTAMP BY EventTime
GROUP BY TumblingWindow(hour, 1)
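The windowing behind this query can be sketched locally in Python. This is an illustration under stated assumptions, not Stream Analytics code: the function names and timestamps are hypothetical, and it assumes that each one-hour tumbling window excludes its start, includes its end, and is reported under its end time.

```python
from collections import Counter
from datetime import datetime, timedelta


def window_end(t: datetime) -> datetime:
    """Return the end of the one-hour tumbling window that contains t.

    Assumption: windows are (start, end], so an event exactly on the hour
    belongs to the window that ends at that hour.
    """
    floor = t.replace(minute=0, second=0, microsecond=0)
    return floor if t == floor else floor + timedelta(hours=1)


def count_per_hour(event_times):
    """Count events per one-hour tumbling window, keyed by window end time."""
    counts = Counter(window_end(t) for t in event_times)
    return dict(sorted(counts.items()))


# Hypothetical event times for illustration only.
times = [
    datetime(2022, 9, 9, 8, 58, 59),
    datetime(2022, 9, 9, 9, 0, 0),
    datetime(2022, 9, 9, 9, 15, 0),
    datetime(2022, 9, 9, 9, 59, 59),
]

counts = count_per_hour(times)
for end, n in counts.items():
    print(end.isoformat(), n)
```

With these inputs, two events fall in the window ending at 09:00 and two in the window ending at 10:00.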
Select distinct users
SELECT *
FROM clickstream
TIMESTAMP BY EventTime
WHERE ISFIRST(hour, 1) OVER(PARTITION BY userId) = 1
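One way to picture the ISFIRST filter is as "keep the first event each user produces in each one-hour interval". The Python sketch below emulates that idea locally. It assumes that events arrive in time order and that intervals are aligned like one-hour tumbling windows; the function names and events are hypothetical.

```python
from datetime import datetime, timedelta


def hour_bucket(t: datetime) -> datetime:
    """Identify the one-hour interval containing t by its end time (assumed (start, end])."""
    floor = t.replace(minute=0, second=0, microsecond=0)
    return floor if t == floor else floor + timedelta(hours=1)


def first_click_per_user_per_hour(events):
    """Yield only the first event seen for each (user, hour interval) pair."""
    seen = set()
    for event in events:  # assumed to be sorted by EventTime
        key = (event["UserID"], hour_bucket(event["EventTime"]))
        if key not in seen:
            seen.add(key)
            yield event


# Hypothetical events for illustration only.
events = [
    {"UserID": 465, "EventTime": datetime(2022, 9, 9, 9, 5)},
    {"UserID": 465, "EventTime": datetime(2022, 9, 9, 9, 40)},   # same user, same hour: dropped
    {"UserID": 12, "EventTime": datetime(2022, 9, 9, 9, 50)},
    {"UserID": 465, "EventTime": datetime(2022, 9, 9, 10, 10)},  # new hour: kept
]

firsts = list(first_click_per_user_per_hour(events))
for e in firsts:
    print(e["UserID"], e["EventTime"].isoformat())
```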
All output results are stored as JSON files in Blob Storage. You can find them via: Blob Storage > Containers > job-output.
If you want to find out the username for each clickstream event using a user file in storage, you can join the clickstream with a reference input, as shown in the following architecture:
Assuming you've completed the steps for the previous example, run the following commands to create a new resource group:
Replace $subscriptionId with your Azure subscription ID and run the following command to deploy Azure resources. This process may take a few minutes to complete.
.\CreateJob.ps1 -job ClickStream-RefJoin -eventsPerMinute 11 -subscriptionid $subscriptionId
Once the deployment is complete, your browser opens automatically, and you can see a resource group named ClickStream-RefJoin-rg-* in the Azure portal. The resource group contains five resources.
The ASA job ClickStream-RefJoin uses the following query to join the clickstream with the reference SQL input.
CREATE TABLE UserInfo(
UserId bigint,
UserName nvarchar(max),
Gender nvarchar(max)
);
SELECT System.Timestamp Systime, ClickStream.UserId, ClickStream.Response.Code, UserInfo.UserName, UserInfo.Gender
INTO BlobOutput
FROM ClickStream TIMESTAMP BY EventTime
LEFT JOIN UserInfo ON ClickStream.UserId = UserInfo.UserId
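The effect of the LEFT JOIN can be sketched locally in Python: every clickstream event is kept, and user details are attached when the reference data has a matching UserId. This is an illustration only. The function name, the user rows, and the events are hypothetical, and the reference data is modeled as a simple in-memory lookup.

```python
def left_join_user_info(events, user_info):
    """Attach UserName and Gender to each event; unmatched users get None."""
    for event in events:
        user = user_info.get(event["UserID"], {})
        yield {
            "Systime": event["EventTime"],
            "UserId": event["UserID"],
            "Code": event["Response"]["Code"],
            "UserName": user.get("UserName"),
            "Gender": user.get("Gender"),
        }


# Hypothetical reference data keyed by UserId, mirroring the UserInfo columns.
user_info = {
    465: {"UserName": "example-user", "Gender": "F"},
}

# Hypothetical events for illustration only.
events = [
    {"EventTime": "2022-09-09 08:58:59 UTC", "UserID": 465, "Response": {"Code": 200}},
    {"EventTime": "2022-09-09 08:59:30 UTC", "UserID": 999, "Response": {"Code": 404}},
]

joined = list(left_join_user_info(events, user_info))
for row in joined:
    print(row)
```

The event for user 999 still appears in the output, with UserName and Gender left empty, which is what distinguishes a left join from an inner join.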
Congratulations! You've deployed a streaming application to join your user file with a website clickstream.
If you've tried out this project and no longer need the resource group, run this command in PowerShell to delete the resource group.
Remove-AzResourceGroup -Name $resourceGroup
If you're planning to use this project in the future, you can skip deleting it and stop the job for now.
To learn about Azure Stream Analytics, continue to the following articles: