Comparing table columns in Excel is a common task. You may need to identify items that are the same, different, or missing from these columns.
In Power Query, table columns are lists and you can compare these lists using table merges. But merging can only be done on two tables at a time. If you need to compare three or more table columns then using List Functions is the way to do it.
It does require some manual M coding and if the thought of that puts you off, it's not hard. And if you never get your hands dirty with some coding, you're going to miss out on the real power of Power Query.
The three examples I'll show you contain only two lines of code each and do what can take multiple lines using table merges or appends.
Download Sample Excel Workbook
Enter your email address below to download the sample workbook.
Excel Workbook. Note: This is a .xlsx file please ensure your browser doesn't change the file extension on download.
Source Data Tables
I've got four tables. The first contains the names of some imaginary staff members and an ID code assigned to them.
The other three tables show the names of staff that attended three different days of training that were arranged for them.
My tasks are to work out what staff
- Attended every day of training
- Attended at least one day
- Attended no days
I've already loaded three of these tables into PQ so let's load the last one.
Click into the table, Data -> From Table/Range.
Then under Close and Load click on the black arrow, Close & Load to and choose Only Create Connection.
Who Attended Every Day of Training
I have to find out who appears in every table for Training Days 1, 2 and 3.
If you were using table merges then you'd do an Inner Join on the Training_1 table and the Training_2 table to return only matching rows.
You then have to do another Inner Join on the result of the first join and the Training_3 table.
If you are eagle eyed you may have noticed that in the Training_2 table the name agueda jonson is lower case. So both of these joins have to be Fuzzy Joins and set to ignore case so that you can do case insensitive comparisons.
You can do all of these steps in 1 line using the List.Intersect function.
First I need to create a new blank query. Right click on the Queries area of the editor and then New Query -> Other Sources -> Blank Query
Name the query Attended All Days.
If you remember your set theory, Intersect returns whatever is common between sets, or in this case, our Name columns.
Remember that table columns are lists which is why you can use list functions to do this.
Click the Attended All Days query and under Applied Steps click on Source.
Now in the formula bar type this, and then press Enter.
= List.Intersect( { Training_1[Name], Training_2[Name], Training_3[Name] } , Comparer.OrdinalIgnoreCase )
The result is this list of names
What List.Intersect is Doing
I'm passing in two arguments to the function. The first are the columns I want to compare and these are enclosed in { }
The columns are of course the Name columns from the three tables of attendees at each days' training.
The second argument Comparer.OrdinalIgnoreCase tells the function to ignore case in text comparisons. So agueda jonson will look the same as Agueda Jonson.
Filtering the Staff Table
With the list of names of those who attended all days of training, I can use that to filter the Staff table and return all rows associated with those names.
Right-click on the Source step and select Insert Step After. With the new step selected press F2 and rename it to Tab, which is my abbreviation for Table. You can call it what you like really.
With the new step still selected, type this into the formula bar
= Table.SelectRows( Staff , each List.ContainsAny( {[Name]} , Source ) )
So what's happening here?
This line of code is doing the following
- Select rows from the table
- called Staff
- where each row
- in the Name column
- appears in the list Source
If you filter rows by using the menus, Power Query uses the same function. I'm just manually calling it here to do what I want.
The result is the Staff table filtered to just the rows showing those people who attended all days of training.
Who Attended At Least 1 Day of Training
This involves checking every day and seeing who was there at least once.
You could do this by combining the three tables (appending the queries) and then removing any duplicates.
Or you could just use the List.Union function
= List.Union( { Training_1[Name], Training_2[Name], Training_3[Name] } , Comparer.OrdinalIgnoreCase )
Using Set Theory again, a Union of the Name columns from the Training_1, Training_2 and Training_3 tables, using case insensitive text comparison because of Comparer.OrdinalIgnoreCase, gives a list of those who attended at least one day.
If you want to, you can then filter the Staff table exactly the same way as before using the Table.SelectRows function.
Who Attended No Training Days
To work this out you need to start with the full list of staff and then remove the names of those who attended on each of the 3 days.
Using table merges you can do this with Left Anti Joins, but you need to do 3 separate joins to get the final list.
Or you can do it with in one line with List.Difference
= List.Difference( Staff[Name], List.Union( { Training_1[Name], Training_2[Name], Training_3[Name] } , Comparer.OrdinalIgnoreCase ) )
Filtering the Staff table using these names gives
Tip - Quick Sanity Check
The number of people who attended at least 1 day (16) added to the number of people who attended no days (3) should equal the total number of staff (19).
Conclusion
I've shown you how to use List functions to compare lists and filter tables. In the sample file I've created I've also included queries that do the same job but use table merges and appends. You can have a look for yourself at the different methods.
If you're not sure about learning M functions like List.Intersect I'd encourage you to read through the Microsoft function definitions to become familiar with them and try them out.
Although I do have a programming background, I didn't wake up one day and know these M functions. I had to put time in to learn what functions existed and how they worked.
When you first started using Excel did you know how to use VLOOKUP or SUMIF? No, you had to learn them. So if you're at all hesitant about learning M functions, just dive in and get started.
Rob
In the output table “attended all courses” the Map Code can remove additional members eg if you make Agueda Jonson lower case in Staffing, it is visible in the Source but then removed in the Map Output. Not figured out how or why yet to resolve/understand, assume it needs “Comparer.OrdinalIgnoreCase” adding into Map?.
Thanks for this really informative and precise.
Philip Treacy
Hi Rob,
Yes I mention using Comparer.OrdinalIgnoreCase in this post.
Regards
Phil
Rob
Apologies Phil, I meant “Tab”, not map in the Attended All Events Query.
The Tab code shouldn’t but excludes Agueda Jonson when I make it lower case in the Staff table? “= Table.SelectRows(Staff, each List.ContainsAny ({[Name]}, Source ))”. I don’t understand why and haven’t been able to resolve?
Philip Treacy
Hi Rob,
The Source step creates this list by intersecting the 3 Training tables
= List.Intersect( { Training_1[Name], Training_2[Name], Training_3[Name] } , Comparer.OrdinalIgnoreCase )
This is a case insensitive intersection.
The Tab step then use Table.SelectRows to get the rows in the Staff table where the Name is in this list (created by List.Intersect above)
agueda jonson isn’t in the Staff table (it’s a case sensitive select) so that row isn’t selected.
You can select agueda jonson by using Text.Proper around [Name] like this
= Table.SelectRows( Staff , each List.ContainsAny( {Text.Proper([Name])} , Source ) )
but you will end up with lower case agueda jonson in your table. That can be fixed by right clicking on the column then Transform -> Capitalize Each Word.
Regards
Phil
Conrad
This is EXCELLENT!
Is there a way to compare two tables and find all differences between each in one table?
In my example I have a listing of employees for June 2021 and July 2021. Is there a way to extract all differences between the two months? So if an employee changed departments from June to July as well as finding employees who were hired in July?
Mynda Treacy
I expect you’d have to do this as two separate queries. One to compare changes in existing employee records and one to find new employees.
Matthias
Hi Phil,
This was really good. Thanks!
Could you do more of this?
I use quite some patterns with List.xyz. There are so many List functions and they are so powerful. You have selected really nice examples and it would be good to see more from you about other List functions you regard as especially useful.
Your third case is great for the introduction of the important List.Difference. I love M because instead you could also copy the second case and just filter by each not.
Thanks again,
Matthias
Philip Treacy
Thanks Matthias,
Yes I plan on doing more posts like this in the near future.
Regards
Phil
chahine atallah
amazing, i didnt know about these functions, i used to use merge