Hi!
I'm a relatively new Power Query M user, so I suspect I'm attacking this problem in a very naïve way. I'm trying to compare two relatively small tables (<300K,<2000 rows, <=25 columns). These are actually two snapshots of the same data set a week apart, and I'm trying to highlight any fields/cells that have changed from one snapshot to the next.
I have some code that runs relatively well on smaller data sets (<25 rows), but runtime seems to grow exponentially as the rows increase, to the point where it's taking over an hour on larger sets. (And I haven't even let the full data set complete.) ...
The basic gist of the code is that the two tables are lined up using "ID" as the key, then each matching row is pulled into a list (with 2 records in each list element ... so a list of lists.) Then these list elements are transposed with List.Transform, and a custom column is added that compares each field and adds a tag character for changed fields. Then the list is un-transposed to result in a table with changed record fields tagged with the specified character.
Watching the mashup containers run, it looks like all of them are staying well below 256MB, and I've tried Table.Buffer and List.Buffer at various points in the code with little change in performance.
Can anyone suggest a different approach to this, or recommend some code profiling tips that might help me improve query performance here?
Here's the M code: (qry_PS_Last and qry_PS_Current are static Excel tables in the same workbook as the M query. No disk or network access required, I assume.) The code really starts bogging down when I get into the List Transforms...
let
tblOriginal = qry_PS_Last,
tblHasChanges = qry_PS_Current,
ColumnNames = Table.ColumnNames(qry_PS_Last),
NullRecord = List.Transform(List.Positions(ColumnNames), each null),
ChangeMarker = "^",
KeyColumn = "ID",
#"Removed Duplicates" = Table.Distinct(tblOriginal, {KeyColumn}),
#"Merged Queries" = Table.NestedJoin(#"Removed Duplicates",{KeyColumn},tblHasChanges,{KeyColumn},"Join",JoinKind.LeftOuter),
AddRowCount = Table.AddColumn(#"Merged Queries", "ValidateRowCount", each Table.RowCount([Join])),
FixEmptyTables = Table.AddColumn(AddRowCount, "Validate", each if [ValidateRowCount] = 0 then #table(ColumnNames,{NullRecord}) else [Join]),
normOriginal = Table.Sort(Table.SelectColumns(FixEmptyTables,ColumnNames),{{KeyColumn, Order.Ascending}}),
normChanges = Table.Sort(Table.ExpandTableColumn(Table.SelectColumns(FixEmptyTables,{"Validate"}), "Validate", ColumnNames, ColumnNames),{{KeyColumn, Order.Ascending}}),
lChanges = Table.ToRecords(Table.AddColumn(normChanges,"Source", each "lChanges")),
lOriginal = Table.ToRecords(Table.AddColumn(normOriginal,"Source", each "lOriginal")),
lRecordsToCompare = List.Buffer(List.Transform(List.Positions(lOriginal), each {lChanges{_},lOriginal{_}} )),
lMergeRecordsIntoTable = List.Transform(lRecordsToCompare, each Table.FromRecords(_)),
lDemotedHeaders = List.Transform(lMergeRecordsIntoTable, each Table.DemoteHeaders(_)),
lTranspose = List.Transform(lDemotedHeaders, each Table.Transpose(_)),
lCompare = List.Transform(lTranspose, each Table.AddColumn(_, "Compare", each if [Column2] = [Column3] then Text.From([Column2]) else ChangeMarker & Text.From([Column2]))),
lRemoveColumns = List.Transform(lCompare,each Table.RemoveColumns(_,{"Column2", "Column3"})),
lUnTranspose = List.Transform(lRemoveColumns,each Table.Transpose(_)),
lPromoteHeaders = List.Transform(lUnTranspose,each Table.PromoteHeaders(_)),
RecombineRows = Table.Combine(lPromoteHeaders),
Output = RecombineRows
in
Output