Hello, I am having trouble using vlookup to merge textual data on 2 different sheets where the names are slightly different. For example, it has trouble finding David v Dave, Chris v Christopher, or Adam John Smith vs Adam Smith. The primary list I want to check against tends to have the longer versions of names.
The overall project is to take two different reports from two different enterprise-level systems, one for funding and one for training. I want to see which funded personnel have done their mandatory training. Currently, I am using Power Query to pull the reports, clean the data, and insert them into one worksheet. I then use vlookup on a third tab to merge the lists and show yes/no whether they have taken the training. Since the data comes from two different systems, there is no employee ID or other unifying key aside from their names.
My current solution is to use Power Query to cut off names with 3+ parts from the funding list with longer names via the Extract Text Before Delimiter tool. This has helped remove a lot of false negatives, but obviously risks creating false positives. I then use the following formula to identify whether or not they have been trained:
=IFERROR(IF(VLOOKUP("*"&A2, 'CITI RCR training list'!$A$2:$B$5029,1,FALSE) = A2, "Yes"), "No")
I have tried using "*"&A2, "*"&A2&"*", and A2&"*". None of these options capture 100% of mismatched names. I have given up entirely on vlookup TRUE approximate match because it doesn't seem to work well with strings.
I am using Office Professional Plus 2019, Version 1808. Build 10402.20023 Click to Run. This version does not seem to have the Fuzzy Match merge option for Power Query, so that is off the table. I also cannot using the Fuzzy Lookup Add-in because this project needs to run on multiple computers via a network drive, and I cannot install the add-in everywhere.
The goal is to be able to hand this off to someone with little Excel knowledge and have them do little to no manual work to get the final report. My current set up allows them to simply paste in new report data into the source files and then hit refresh on the main sheet. I need to keep it at that level of simplicity once it is running. To be clear, the system works as is, but I need to improve the name matching and capture as many mismatched names as possible.
Is there a more clever way to improve name matching in Excel beyond adding the vlookup wildcard characters? Maybe additional data cleaning steps I didn't think of?
Thank you!
Here is an example of the final product so far for reference:
