RichJanoso-8566 avatar image
0 Votes"
RichJanoso-8566 asked RichJanoso-8566 commented

Optimize Field-Level Table Comparison in Power Query M


I'm a relatively new Power Query M user, so I suspect I'm attacking this problem in a very naïve way. I'm trying to compare two relatively small tables (<300K,<2000 rows, <=25 columns). These are actually two snapshots of the same data set a week apart, and I'm trying to highlight any fields/cells that have changed from one snapshot to the next.

I have some code that runs relatively well on smaller data sets (<25 rows), but runtime seems to grow exponentially as the rows increase, to the point where it's taking over an hour on larger sets. (And I haven't even let the full data set complete.) ...

The basic gist of the code is that the two tables are lined up using "ID" as the key, then each matching row is pulled into a list (with 2 records in each list element ... so a list of lists.) Then these list elements are transposed with List.Transform, and a custom column is added that compares each field and adds a tag character for changed fields. Then the list is un-transposed to result in a table with changed record fields tagged with the specified character.

Watching the mashup containers run, it looks like all of them are staying well below 256MB, and I've tried Table.Buffer and List.Buffer at various points in the code with little change in performance.

Can anyone suggest a different approach to this, or recommend some code profiling tips that might help me improve query performance here?

Here's the M code: (qry_PS_Last and qry_PS_Current are static Excel tables in the same workbook as the M query. No disk or network access required, I assume.) The code really starts bogging down when I get into the List Transforms...

tblOriginal = qry_PS_Last,
tblHasChanges = qry_PS_Current,

 ColumnNames = Table.ColumnNames(qry_PS_Last),
 NullRecord = List.Transform(List.Positions(ColumnNames), each null),
 ChangeMarker = "^",
 KeyColumn = "ID",

 #"Removed Duplicates" = Table.Distinct(tblOriginal, {KeyColumn}),
 #"Merged Queries" = Table.NestedJoin(#"Removed Duplicates",{KeyColumn},tblHasChanges,{KeyColumn},"Join",JoinKind.LeftOuter),
 AddRowCount = Table.AddColumn(#"Merged Queries", "ValidateRowCount", each Table.RowCount([Join])),
 FixEmptyTables = Table.AddColumn(AddRowCount, "Validate", each if [ValidateRowCount] = 0 then #table(ColumnNames,{NullRecord}) else [Join]),
 normOriginal = Table.Sort(Table.SelectColumns(FixEmptyTables,ColumnNames),{
                 {KeyColumn, Order.Ascending}}),
 normChanges = Table.Sort(Table.ExpandTableColumn(Table.SelectColumns(FixEmptyTables,{"Validate"}), "Validate", ColumnNames, ColumnNames),{
                 {KeyColumn, Order.Ascending}}),

 lChanges = Table.ToRecords(Table.AddColumn(normChanges,"Source", each "lChanges")),
 lOriginal = Table.ToRecords(Table.AddColumn(normOriginal,"Source", each "lOriginal")),

 lRecordsToCompare      = List.Buffer(List.Transform(List.Positions(lOriginal),   each {lChanges{_},lOriginal{_}} )),
 lMergeRecordsIntoTable = List.Transform(lRecordsToCompare,             each Table.FromRecords(_)),
 lDemotedHeaders        = List.Transform(lMergeRecordsIntoTable,        each Table.DemoteHeaders(_)),
 lTranspose             = List.Transform(lDemotedHeaders,               each Table.Transpose(_)),
 lCompare = List.Transform(lTranspose,                    each Table.AddColumn(_, "Compare", each if [Column2] = [Column3] then Text.From([Column2]) else ChangeMarker & Text.From([Column2]))),
 lRemoveColumns = List.Transform(lCompare,each Table.RemoveColumns(_,{"Column2", "Column3"})),
 lUnTranspose = List.Transform(lRemoveColumns,each Table.Transpose(_)),
 lPromoteHeaders = List.Transform(lUnTranspose,each Table.PromoteHeaders(_)),
 RecombineRows = Table.Combine(lPromoteHeaders),

 Output = RecombineRows


5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

1 Answer

Ehren avatar image
0 Votes"
Ehren answered RichJanoso-8566 commented

So I have to admit I skipped reading your code. But after reading your problem description, I came up with the following. Let me know if this helps:

 // DeletedRows
     Source = Table.NestedJoin(tblOriginal, {"ID"}, tblHasChanges, {"ID"}, "tblHasChanges", JoinKind.LeftAnti),
     #"Removed Columns" = Table.RemoveColumns(Source,{"tblHasChanges"})
     #"Removed Columns"
 // AddedRows
     Source = Table.NestedJoin(tblHasChanges, {"ID"}, tblOriginal, {"ID"}, "tblHasChanges", JoinKind.LeftAnti),
     #"Removed Columns" = Table.RemoveColumns(Source,{"tblHasChanges"})
     #"Removed Columns"
 // ChangedRows
     Source = Table.NestedJoin(tblOriginal, {"ID"}, tblHasChanges, {"ID"}, "tblHasChanges", JoinKind.Inner),
     #"Added Custom" = Table.AddColumn(Source, "Custom", each List.Select(Table.ColumnNames(Source), (c) => Record.HasFields([tblHasChanges]{0}, {c}) and Record.Field(_, c) <> Record.Field([tblHasChanges]{0}, c))),
     #"Renamed Columns" = Table.RenameColumns(#"Added Custom",{
                 {"Custom", "ChangedColumns"}}),
     #"Extracted Values" = Table.TransformColumns(#"Renamed Columns", {"ChangedColumns", each Text.Combine(List.Transform(_, Text.From), ","), type text})
     #"Extracted Values"
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Nice! And much faster than mine. I will have to make the conditional highlighting formula in Excel more complex to flag the changed cells ... but using the INDIRECT trick and some structured references should be workable. Perhaps something along the lines of: =IFERROR(FIND(INDEX(INDIRECT("ExcelTableName[#Headers]"),COLUMN()),INDIRECT("ExcelTableName[@ChangedColumns]")),-1)>0 would work...

0 Votes 0 ·