Query diagnostics

Article
07/29/2024

With Query Diagnostics, you can achieve a better understanding of what Power Query is doing at authoring and at refresh time in Power BI Desktop. While we'll be expanding on this feature in the future, including adding the ability to use it during full refreshes, at this time you can use it to understand what kind of queries you're emitting, what slowdowns you might run into during authoring refresh, and what kind of background events are happening.

To use Query Diagnostics, go to the Tools tab in the Power Query editor ribbon.

Screenshot of the query diagnostics features under the Power Query Tools ribbon.

By default, Query Diagnostics might require administrative rights to run (depending on IT policy). If you find yourself unable to run Query Diagnostics, open the Power BI Desktop options page, and in the Diagnostics tab, select Enable in Query Editor (does not require running as admin). This selection prevents you from tracing diagnostics during a full refresh into Power BI (as opposed to a refresh in the Power Query editor), but you can still use diagnostics when previewing, authoring, and so on.

Whenever you start diagnostics, Power Query begins tracing any evaluations that you cause. The evaluation that most users think of is the one that runs when you press refresh, or when you retrieve data for the first time. But many other actions can cause evaluations, depending on the connector. For example, with the SQL connector, retrieving a list of values to filter kicks off an evaluation as well. That evaluation isn't associated with a user query, and the diagnostics record it as such. Other system-generated queries might come from the navigator or the get data experience.

When you press Diagnose Step, Power Query runs a special evaluation of just the step you're looking at. It then shows you the diagnostics for that step, without showing you the diagnostics for other steps in the query. This evaluation can make it much easier to get a narrow view into a problem.

If you're recording all traces with Start Diagnostics, it's important that you press Stop Diagnostics when you're done. Stopping the diagnostics allows the engine to collect the recorded traces and parse them into the proper output. Without this step, traces are lost.

Types of diagnostics

We currently provide three types of diagnostics, one of which has two levels of detail.

The first of these is the primary diagnostics, which have a detailed view and a summarized view. The summarized view aims to give you immediate insight into where time is being spent in your query. The detailed view goes much deeper, line by line, and is generally only needed for serious diagnosing by power users.

For this view, some capabilities, like the Data Source Query column, are currently available only on certain connectors. We'll be working to extend the breadth of this coverage in the future.

Data privacy partitions provide you with a better understanding of the logical partitions used for data privacy.

Performance counters take periodic snapshots of resource utilization while an evaluation runs.

Note

Power Query might perform evaluations that you didn't directly trigger. Some of these evaluations are performed in order to retrieve metadata so we can best optimize our queries or to provide a better user experience (such as retrieving the list of distinct values within a column that are displayed in the Filter Rows experience). Others might be related to how a connector handles parallel evaluations. That said, if your query diagnostics show repeated queries that you don't believe make sense, feel free to reach out through normal support channels. Your feedback is how we improve our product.

Summarized vs. detailed view

Query diagnostics provides two views: summarized and detailed. The summarized view "collapses" multiple related operations into a single operation. In this process, details collected by each operation are combined, and the exclusive durations are summed. No information is lost as part of this process.

The summarized view provides an overview of what occurred during an evaluation for easy high-level review. If you want a further breakdown of a specific operation, note its group ID and look up the corresponding operations that were grouped together in the detailed view.

Explaining multiple evaluations

When a refresh occurs in the Power Query editor, a lot is done behind the scenes to give you a fluid user experience. For example, when you Refresh Preview, the evaluator executes the final step of each given query. Then, in the background, it sequentially runs step n-1, step n-2, and so on, so that if you step back through your steps, the results are already available.

To provide higher performance, currently some caching happens so that it doesn't have to rerun every part of the final query plan as it goes back through the steps. While this caching is useful for normal authoring, it means that you don't always get correct step comparison information because of later evaluations pulling on cached data.

Diagnostics schema

Id

When analyzing the results of a recording, it's important to filter the recording session by Id, so that columns such as Exclusive Duration % make sense.

Id is a composite identifier. It's formed of two numbers—one before the dot, and one after. The first number is the same for all evaluations that resulted from a single user action. In other words, if you press refresh twice, there are two different numbers leading the dot, one for each user activity taken. This numbering is sequential for a given diagnostics recording.

The second number represents an evaluation by the engine. This number is sequential for the lifetime of the process where the evaluation is queued. If you run multiple diagnostics recording sessions, this number continues to grow across the different sessions.

To summarize, if you start recording, perform a single action (such as pressing refresh once), and stop recording, some number of Ids appear in your diagnostics. But since you only took one action, they're all 1.1, 1.2, 1.3, and so on.

The combination of the activityId and the evaluationId, separated by the dot, provides a unique identifier for an evaluation of a single recording session.
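As a minimal sketch of how the composite Id can be taken apart, the following Python snippet splits Id values on the dot and groups evaluations by user activity. The sample Ids and the helper name `split_id` are illustrative assumptions, not part of Power Query, and the sketch assumes the Id has been exported as text.

```python
from collections import defaultdict

def split_id(composite_id):
    """Split an Id such as "1.12" into (activity, evaluation) integers."""
    activity, evaluation = composite_id.split(".")
    return int(activity), int(evaluation)

# Hypothetical Ids from a recording that captured two user actions.
sample_ids = ["1.1", "1.2", "1.3", "2.4", "2.5"]

# Map each activity number to the evaluation numbers it produced.
evaluations_by_activity = defaultdict(list)
for composite_id in sample_ids:
    activity, evaluation = split_id(composite_id)
    evaluations_by_activity[activity].append(evaluation)

print(dict(evaluations_by_activity))
```

Handling the Id as text rather than as a decimal number is deliberate in this sketch: read as numbers, "1.1" and "1.10" would collapse into the same value even though they would denote different evaluations.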

Query

The name of the Query in the left-hand pane of the Power Query editor.

Step

The name of the Step in the right-hand pane of the Power Query editor. Things like filter dropdowns generally associate with the step you filter on, even if you aren't refreshing the step.

Data source kind

This data tells you what kind of data source you're accessing, such as SQL or Oracle.

Operation

The actual operation being performed. This operation can include evaluator work, opening connections, sending queries to the data source, and many more.

Start time

The time that the operation started.

End time

The time that the operation ended.

Exclusive duration (%)

The Exclusive Duration column of an event is the amount of time the event was active. This contrasts with the "duration" value that results from subtracting an event's Start Time from its End Time. This "duration" value represents the total time that elapsed between when an event began and when it ended, which might include times when the event was suspended or inactive and another event was consuming resources.

Exclusive duration % adds up to approximately 100% within a given evaluation, as represented by the Id column. For example, if you filter on rows with Id 1.x, the Exclusive Duration percentages would sum to approximately 100%. This isn't the case if you sum the Exclusive Duration % values of all rows in a given diagnostic table.
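To illustrate why the percentages only make sense per Id, here is a small Python sketch. It assumes that each row's percentage is its exclusive duration divided by the total exclusive duration for the same Id; the rows, operation names, and durations are made up for the example.

```python
from collections import defaultdict

# Hypothetical rows: (Id, Operation, exclusive duration in seconds).
rows = [
    ("1.1", "Evaluation", 0.0),
    ("1.1", "Open connection", 0.5),
    ("1.1", "Send query", 1.5),
    ("1.2", "Evaluation", 0.0),
    ("1.2", "Send query", 4.0),
]

# Total exclusive duration per evaluation (that is, per Id).
totals = defaultdict(float)
for row_id, _, exclusive in rows:
    totals[row_id] += exclusive

# Assumed definition: percentage relative to the row's own evaluation.
percentages = [
    (row_id, operation, 100.0 * exclusive / totals[row_id])
    for row_id, operation, exclusive in rows
]

# Sum the percentages within each Id, and across the whole table.
per_id_sum = defaultdict(float)
for row_id, _, pct in percentages:
    per_id_sum[row_id] += pct
table_sum = sum(pct for _, _, pct in percentages)

print(dict(per_id_sum))  # each Id sums to 100
print(table_sum)         # the whole table does not
```

With two evaluations in the table, the per-Id sums are each 100 while the table-wide sum is 200, which is why filtering by Id comes first.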

Exclusive duration

The absolute time, rather than %, of exclusive duration. The total duration (that is, exclusive duration + time when the event was inactive) of an evaluation can be calculated in one of two ways:

Find the operation called "Evaluation". Its End Time minus its Start Time gives the total duration of the evaluation.
Subtract the minimum start time of all operations in the evaluation from the maximum end time. In cases when the information collected for an evaluation doesn't account for the total duration, an operation called "Trace Gaps" is generated to account for this time gap.
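Both calculations can be sketched in a few lines of Python. The rows below stand in for the operations of a single evaluation (already filtered by Id); the timestamps, operation names other than "Evaluation", and function names are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical operations belonging to one evaluation.
rows = [
    {"Operation": "Evaluation",
     "Start": datetime(2024, 7, 29, 9, 0, 0), "End": datetime(2024, 7, 29, 9, 0, 10)},
    {"Operation": "Open connection",
     "Start": datetime(2024, 7, 29, 9, 0, 1), "End": datetime(2024, 7, 29, 9, 0, 2)},
    {"Operation": "Send query",
     "Start": datetime(2024, 7, 29, 9, 0, 2), "End": datetime(2024, 7, 29, 9, 0, 9)},
]

def total_via_evaluation(rows):
    """Way 1: End Time minus Start Time of the "Evaluation" operation."""
    root = next(r for r in rows if r["Operation"] == "Evaluation")
    return root["End"] - root["Start"]

def total_via_bounds(rows):
    """Way 2: latest end time minus earliest start time across all operations."""
    return max(r["End"] for r in rows) - min(r["Start"] for r in rows)

print(total_via_evaluation(rows).total_seconds())
print(total_via_bounds(rows).total_seconds())
```

In this sample both approaches give the same 10-second total, because the root operation spans every other operation.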

Resource

The resource you're accessing for data. The exact format of this resource depends on the data source.

Data source query

Power Query does something called folding, which is the act of running as many parts of the query against the back-end data source as possible. In DirectQuery mode (over Power Query), where enabled, only transforms that fold run. In import mode, transforms that can't fold are instead run locally.

The Data Source Query column allows you to see the query or HTTP request/response sent against the back-end data source. As you author your Query in the editor, many Data Source Queries are emitted. Some of these queries are the actual final Data Source Query to render the preview. But others might be for data profiling, filter dropdowns, information on joins, retrieving metadata for schemas, and any number of other small queries.

In general, you shouldn't be concerned by the number of Data Source Queries emitted unless there are specific reasons to be concerned. Instead, you should focus on making sure the proper content is being retrieved. This column might also help determine if the Power Query evaluation was fully folded.

Additional info

There's a lot of information retrieved by our connectors. Much of it's ragged and doesn't fit well into a standard column hierarchy. This information is put in a record in the additional info column. Information logged from custom connectors also appears here.

Row count

The number of rows returned by a Data Source Query. Not enabled on all connectors.

Content length

Content length returned by HTTP requests, as commonly defined. This column isn't enabled in all connectors, and it isn't accurate for connectors that retrieve content in chunks.

Is user query

A Boolean value that indicates if it's a query authored by the user and present in the left-hand pane, or if it was generated by some other user action. Other user actions can include things such as filter selection or using the navigator in the get data experience.

Path

Path represents the relative route of the operation when viewed as part of an interval tree for all operations within a single evaluation. At the top (root) of the tree, there's a single operation called Evaluation with path "0". The start time of this operation corresponds to the start of the evaluation as a whole. The end time of this operation shows when the whole evaluation finished. This top-level operation has an exclusive duration of 0, as its only purpose is to serve as the root of the tree.

Further operations branch from the root. For example, an operation might have "0/1/5" as a path. This path would be understood as:

0: tree root
1: current operation's parent
5: index of current operation

Operation "0/1/5" might have a child node, in which case the child's path has the form "0/1/5/8", with 8 representing the index of the child.
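The path convention described above can be sketched in Python as follows. The helper name `describe_path` and the returned field names are illustrative assumptions; only the slash-separated format comes from the diagnostics output.

```python
def describe_path(path):
    """Break a Path value such as "0/1/5" into its tree components."""
    parts = path.split("/")
    return {
        "root": parts[0],                              # always "0" for the Evaluation root
        "parent_path": "/".join(parts[:-1]) or None,   # None for the root itself
        "index": parts[-1],                            # index of the current operation
        "depth": len(parts) - 1,                       # 0 for the root
    }

def is_child_of(child_path, parent_path):
    """True if child_path is a direct child of parent_path."""
    return describe_path(child_path)["parent_path"] == parent_path

print(describe_path("0/1/5"))
print(is_child_of("0/1/5/8", "0/1/5"))
```

Grouping rows by `parent_path` in this way is one option for rebuilding the interval tree of a single evaluation.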

Group ID

Two or more operations aren't combined if doing so would lead to a loss of detail. The grouping is designed to approximate "commands" executed during the evaluation. In the detailed view, multiple operations share a Group Id, corresponding to the groups that are aggregated in the summarized view.

As with most columns, the group ID is only relevant within a specific evaluation, as filtered by the Id column.
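As a rough sketch of the relationship between the two views, the Python snippet below collapses detailed rows that share a Group Id and sums their exclusive durations. It assumes the rows were already filtered to a single Id; the operations, durations, and output field names are hypothetical, and the actual summarization performed by Power Query may combine more details than shown here.

```python
# Hypothetical detailed-view rows for one evaluation.
detailed = [
    {"Group Id": 1, "Operation": "Open connection", "Exclusive": 0.25},
    {"Group Id": 1, "Operation": "Send query", "Exclusive": 0.5},
    {"Group Id": 2, "Operation": "Read rows", "Exclusive": 1.5},
]

# Collapse rows by Group Id: keep every operation name, sum the durations.
summary = {}
for row in detailed:
    group = summary.setdefault(
        row["Group Id"], {"Operations": [], "Exclusive": 0.0}
    )
    group["Operations"].append(row["Operation"])
    group["Exclusive"] += row["Exclusive"]

for group_id, group in summary.items():
    print(group_id, group["Operations"], group["Exclusive"])
```

Going the other way, filtering the detailed rows on one Group Id recovers the operations behind a single summarized row.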

Data privacy partitions schema

Id

Same as the ID for the other query diagnostics results. The integer part represents a single activity ID, while the fractional part represents a single evaluation.

Partition key

Corresponds to the Query/Step that's used as a firewall partition.

Firewall group

Categorization that explains why this partition has to be evaluated separately, including details on the privacy level of the partition.

Accessed resources

List of resource paths for all the resources accessed by this partition. In general, this list uniquely identifies a data source.

Partition inputs

List of partition keys upon which the current partition depends (this list could be used to build a graph).
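One way to build such a graph is sketched below in Python, using the standard library's `graphlib` module (Python 3.9 or later) to order partitions so that each one comes after its inputs. The partition keys are invented for the example, and their real format may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: partition key -> list of partition inputs.
partition_inputs = {
    "Sales/Source": [],
    "Customers/Source": [],
    "Merged/Joined": ["Sales/Source", "Customers/Source"],
}

# TopologicalSorter expects each node mapped to its predecessors,
# which matches the "partition inputs" relationship directly.
order = list(TopologicalSorter(partition_inputs).static_order())

print(order)  # inputs appear before the partitions that depend on them
```

The relative order of independent partitions isn't significant; only the constraint that inputs precede their dependents is.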

Expression

The expression that gets evaluated on top of the partition's query/step. In several cases, it coincides with the query/step.

Start time

Time when evaluation started for this partition.

End time

Time when evaluation ended for this partition.

Duration

A value derived from End Time minus Start Time.

Exclusive duration

If partitions are assumed to execute in a single thread, exclusive duration is the "real" duration that can be attributed to this partition.

Exclusive duration %

Exclusive duration as a percentage.

Diagnostics

This column only appears when the "Aggregated" or "Detailed" query diagnostics are also captured, allowing you to cross-reference the two diagnostics outputs.

Performance counters schema

When you run performance counters, every half second Power Query takes a snapshot of resource utilization. This snapshot isn't useful for very fast queries, but can be helpful for queries that use up a lot more resources.

% processor time

Percent of time spent by processors on the query. This percentage might reach above 100% because of multiple processors.
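A short arithmetic sketch in Python shows how a value above 100% can arise. It assumes the counter is computed as processor time consumed during a sampling interval divided by the wall-clock length of that interval, which is a common definition for this kind of counter but isn't confirmed here for Power Query; the numbers are made up.

```python
# Length of one sampling interval, in seconds (snapshots are taken every half second).
interval_seconds = 0.5

# Hypothetical processor time used during that interval, per processor.
cpu_seconds_per_processor = [0.5, 0.25]  # one fully busy, one half busy

# Assumed definition: total processor time over wall-clock time, as a percentage.
total_cpu_seconds = sum(cpu_seconds_per_processor)
percent_processor_time = 100.0 * total_cpu_seconds / interval_seconds

print(percent_processor_time)
```

Under that assumption, two processors working during the same half-second interval yield 150%, even though neither processor individually exceeds 100%.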

Total processor time

Total duration of processor time spent on the query.

IO data bytes per second

Throughput speed of data received from the data source, expressed in bytes per second.

Commit (bytes)

Amount of virtual memory reserved by the evaluation.

Working set (bytes)

Amount of memory reserved by the evaluation.
