Optimize Field-Level Table Comparison in Power Query M

Rich Janoso 21 Reputation points
2021-04-05T15:34:00.037+00:00

Hi!

I'm a relatively new Power Query M user, so I suspect I'm attacking this problem in a very naïve way. I'm trying to compare two relatively small tables (<300K,<2000 rows, <=25 columns). These are actually two snapshots of the same data set a week apart, and I'm trying to highlight any fields/cells that have changed from one snapshot to the next.

I have some code that runs relatively well on smaller data sets (<25 rows), but runtime seems to grow exponentially as the rows increase, to the point where it's taking over an hour on larger sets. (And I haven't even let the full data set complete.) ...

The basic gist of the code is that the two tables are lined up using "ID" as the key, then each matching row is pulled into a list (with 2 records in each list element ... so a list of lists.) Then these list elements are transposed with List.Transform, and a custom column is added that compares each field and adds a tag character for changed fields. Then the list is un-transposed to result in a table with changed record fields tagged with the specified character.

Watching the mashup containers run, it looks like all of them are staying well below 256MB, and I've tried Table.Buffer and List.Buffer at various points in the code with little change in performance.

Can anyone suggest a different approach to this, or recommend some code profiling tips that might help me improve query performance here?

Here's the M code: (qry_PS_Last and qry_PS_Current are static Excel tables in the same workbook as the M query. No disk or network access required, I assume.) The code really starts bogging down when I get into the List Transforms...

let
tblOriginal = qry_PS_Last,
tblHasChanges = qry_PS_Current,

ColumnNames = Table.ColumnNames(qry_PS_Last),
NullRecord = List.Transform(List.Positions(ColumnNames), each null),
ChangeMarker = "^",
KeyColumn = "ID",

#"Removed Duplicates" = Table.Distinct(tblOriginal, {KeyColumn}),
#"Merged Queries" = Table.NestedJoin(#"Removed Duplicates",{KeyColumn},tblHasChanges,{KeyColumn},"Join",JoinKind.LeftOuter),
AddRowCount = Table.AddColumn(#"Merged Queries", "ValidateRowCount", each Table.RowCount([Join])),
FixEmptyTables = Table.AddColumn(AddRowCount, "Validate", each if [ValidateRowCount] = 0 then #table(ColumnNames,{NullRecord}) else [Join]),
normOriginal = Table.Sort(Table.SelectColumns(FixEmptyTables,ColumnNames),{{KeyColumn, Order.Ascending}}),
normChanges = Table.Sort(Table.ExpandTableColumn(Table.SelectColumns(FixEmptyTables,{"Validate"}), "Validate", ColumnNames, ColumnNames),{{KeyColumn, Order.Ascending}}),

lChanges = Table.ToRecords(Table.AddColumn(normChanges,"Source", each "lChanges")),
lOriginal = Table.ToRecords(Table.AddColumn(normOriginal,"Source", each "lOriginal")),

lRecordsToCompare      = List.Buffer(List.Transform(List.Positions(lOriginal),   each {lChanges{_},lOriginal{_}} )),
lMergeRecordsIntoTable = List.Transform(lRecordsToCompare,             each Table.FromRecords(_)),
lDemotedHeaders        = List.Transform(lMergeRecordsIntoTable,        each Table.DemoteHeaders(_)),
lTranspose             = List.Transform(lDemotedHeaders,               each Table.Transpose(_)),
lCompare = List.Transform(lTranspose,                    each Table.AddColumn(_, "Compare", each if [Column2] = [Column3] then Text.From([Column2]) else ChangeMarker & Text.From([Column2]))),
lRemoveColumns = List.Transform(lCompare,each Table.RemoveColumns(_,{"Column2", "Column3"})),
lUnTranspose = List.Transform(lRemoveColumns,each Table.Transpose(_)),
lPromoteHeaders = List.Transform(lUnTranspose,each Table.PromoteHeaders(_)),
RecombineRows = Table.Combine(lPromoteHeaders),

Output = RecombineRows

in
Output

Community Center Not monitored
0 comments No comments
{count} votes

Accepted answer
  1. Ehren (MSFT) 1,781 Reputation points Microsoft Employee
    2021-04-09T22:17:58.51+00:00

    So I have to admit I skipped reading your code. But after reading your problem description, I came up with the following. Let me know if this helps:

    // DeletedRows
    let
        Source = Table.NestedJoin(tblOriginal, {"ID"}, tblHasChanges, {"ID"}, "tblHasChanges", JoinKind.LeftAnti),
        #"Removed Columns" = Table.RemoveColumns(Source,{"tblHasChanges"})
    in
        #"Removed Columns"
    
    // AddedRows
    let
        Source = Table.NestedJoin(tblHasChanges, {"ID"}, tblOriginal, {"ID"}, "tblHasChanges", JoinKind.LeftAnti),
        #"Removed Columns" = Table.RemoveColumns(Source,{"tblHasChanges"})
    in
        #"Removed Columns"
    
    // ChangedRows
    let
        Source = Table.NestedJoin(tblOriginal, {"ID"}, tblHasChanges, {"ID"}, "tblHasChanges", JoinKind.Inner),
        #"Added Custom" = Table.AddColumn(Source, "Custom", each List.Select(Table.ColumnNames(Source), (c) => Record.HasFields([tblHasChanges]{0}, {c}) and Record.Field(_, c) <> Record.Field([tblHasChanges]{0}, c))),
        #"Renamed Columns" = Table.RenameColumns(#"Added Custom",{{"Custom", "ChangedColumns"}}),
        #"Extracted Values" = Table.TransformColumns(#"Renamed Columns", {"ChangedColumns", each Text.Combine(List.Transform(_, Text.From), ","), type text})
    in
        #"Extracted Values"
    

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.