Kusto Query Language with Azure Notebooks and Azure Data Explorer

Getting Started with KQL

Microsoft Azure notebooks https://notebooks.azure.com now supports KQL, which enables notebooks to query big data stores such as Azure Log Analytics and Application Insights Analytics. You can get started using the KQL Sample notebooks in GitHub.

This becomes even more interesting as Azure Data Explorer and its documentation is an excellent place to educate yourself on the Kusto Query Language.  You can also see the KQL Samples on Azure Notebooks at https://aka.ms/kqlmagicsamples.

Why Azure Data Explorer?

Perform ad-hoc queries on terabytes of data with Azure Data Explorer—a lightning-fast indexing and querying service to help you build near real-time and complex analytics solutions. Azure Data Explorer allows you to quickly identify trends, patterns, or anomalies in all data types inclusive of structured, semi structured and unstructured data.

  • Build powerful, interactive analytics solutions
  • Get started and scale to terabytes of data, in minutes
  • Quickly discover insights from large volumes of event data
  • The Azure Data Explorer white paper also covers the basics of the query language.

Learning about KQL

There's also a 4-hour Pluralsight course which will really jump start you on KQL.

Here a quick overview of KQL

Queries generally begin by either referencing a table or a function. You start with that tabular data and then run it through a set of statements connected by pipes to shape your data into the final result.

KQL Basics

  • where - The "where" operator allows you to filter your data. So if you start with TableA and you want to only keep events that have a certain key,you would use:

     TableA | where MyKey='MyValue'
    
  • project/extend - These two operators help you pick the fields that you want to show up in your output. Extend adds a new field and project can either choose from the existing set of fields or add a new field. These two statements produce the same result:

     T | project A, B, C=D+E
    
    T | project A, B | extend C=D+E
    
  • count - The count operator returns a scalar value of the number of rows in the tabular set.

     T | count
    
  • summarize - This is a big topic, but we'll keep it light for now. The summarize operator can perform aggregations on your dataset. For example, the count operator mentioned above is short for:

     T | summarize count()
    

    You can specify a number of aggregations over a variety of fields:

     T | summarize count(), sum(A), avg(B) by C, D
    

    The bin() function is often used in conjunction with summarize statements. It lets you group times (or numbers) into buckets.

     T | summarize count() by bin(TimeStamp, 1d) // count the number of rows per day
    
  • join - Many types of joins are supported but the common ones are inner join (keep rows that match on both sides) and leftouter (keep all rows from the left side and include matching rows from the right). You technically don't have to specify a join kind but I recommend that you always do. It makes for easier readability and the default probably isn't what you expect.

     T | join kind=inner (U) on A
    

    Note that joins are only on equality and generally it's expected that the keys have the same name on both sides. If they aren't the same, you can use a project statement to make them the same or use an alternate key specification syntax:

     T | join kind=inner (U) on $left.A == $right.B
    
  • now(), ago() and datetime math - Azure Data Explorer excels at time series data analysis. There are some handy functions to get used to like "now()" which gives the current UTC time and "ago()". The ago function is especially handy when you're looking for recent data.

     T | where A > ago(5m) // where A is greater than 5 minutes ago
    
    T | where A > ago(1d) // where A is greater than 1 day ago
    

    You can also do easy datetime math.

     print ago(1d) + 10m
    
  • arg_max(), arg_min() - These are somewhat advanced topics but I include them when I introduce people to Kusto because they are so handy. Imagine that you have a bunch of entities and each one sends a row to your table periodically. You want to run a query over the latest message from each entity.

     T | summarize arg_max(Timestamp, *) by Id // for every Id, get the row with the maximum Timestamp
    

    Use these functions with care though. If they are used on a huge table and the cardinality of the grouping is high, it can destroy performance.

  • Rendering charts - Both  the Kusto Explorer desktop client and the web client have the ability to easily render charts. You can read the documentation to learn about the various types, but since I deal with a lot of time series data, the one I use the most is timechart. It's a line chart where the x-axis is a datetime and everything else goes on the y-axis. It automatically keeps the x-axis spaced nicely even if your data doesn't have every time specified.

     T | render timechart
    

So by using Azure Notebooks you can get quickly up to speed on Kusto Query Language and create some replicable notebooks and resources.