Fuzzy Matching in Power Query

Philip Treacy

June 10, 2021

Fuzzy matching is the ability to match non-identical text based on how similar one string is compared to the other.

You may have text that has been entered as the answer to a survey question, in which case you can't control what the respondent types.

This can result in misspelling of words, for example, take this table of companies (named Table1) and their products

sample table of data

It's clear to you and I that Msoft, MICROSOFT. and Micro Soft mean the same company based in Redmond, Washington.

But if we want to clean up that column and replace all the different versions of Microsoft with a single version Microsoft you can't do it with a regular match (query merge) in Power Query. Fuzzy matching to the rescue.

Fuzzy Matching Availability

Fuzzy matching is only available in Excel in Microsoft 365, and in Power BI.

Watch the Video

Subscribe YouTube

 

Download Sample Excel Workbook

Enter your email address below to download the sample workbook.

By submitting your email address you agree that we can email you our Excel newsletter.

 

If I have a second table (named Table2) consisting of the companies names as I want them displayed

table of company names

I can load both tables into Power Query as connection only queries.

From within the Power Query Advanced Editor, select either query and then from the Ribbon choose Merge Queries -> Merge Queries as New

merge queries as new

In the Merge dialog window select Table1 as the first table and Table2 as the second table. Leave Join Kind as Left Outer.

Click on the Company column in both tables to make the join on those columns.

merge query dialogYou can see with these settings that only 3 out of 14 rows match, because as it is, PQ is doing an exact match.

Check Use fuzzy matching to perform the merge and you can now see that 10 rows match. The fuzzy matching algorithm has been able to identify a further 7 matches.

check fuzzy matching box

Click on Fuzzy matching options to see what options are available to fine tune the matching. You'll need to use the scroll bar to see all the options. Why can't they make the window bigger?

fuzzy matching options

The first thing you'll see is the Similarity threshold which has a default value of 0.8.

A value of 1.0 means only look for exact matches. As you lower the value, the fuzzy matching algorithm gets less strict with how it matches text strings.

If I make the value 0.7, you can see that there are now 11 matches.

reduce fuzzy matching threshold

Reducing the value further, in this case, doesn't make any more matches. But depending on your own situation, you may find that reducing this threshold value gives you more matches.

Just be careful that you don't reduce it too low or you'll end up with incorrect matches because the algorithm has become very 'relaxed' with matching up strings.

For now, let's click OK to close the Merge dialog and see what our results are.

The merge gives me another column with a table in it

merged queries table

Clicking the icon in the new column header to expand these tables gives this

expand table column

Rows 1, 9 and 12 don't have a match. Lets open the Source step to adjust the Similarity Threshold and see if that helps.

open source step

Gradually lowering the threshold by 0.1 each time, even going all the way down to 0.1 doesn't give me any more matches. Looks like I'm going to have to use a Transformation Table.

lower fuzzy similarity threshold

A Transformation Table is a table with 2 columns (From and To) that explicitly states what the string in the From column will be changed To.

To create a Transformation Table, I go back into Excel and create this table that I named Transform.

transformation table

It lists the three companies in Table1 that don't have a match, and specifies in the To column what to use when the query merge is performed.

Load this table into Power Query (connection only).

In the Merge1 query, open the Source step again. In the Fuzzy Matching Options, scroll to the bottom until you see the Transformation table (optional) section.

Click on the drop down and select the Transform table.

select transformation table

The merge now tells me that it can match all 14 rows

all matches complete

With all rows matched in my merge I can get rid of the first column of incomplete or messed up company names, and end up with this nice table.

final fuzzy merged table

AUTHOR Philip Treacy Co-Founder / Owner at My Online Training Hub

Systems Engineer with 30+ years working for companies like Credit Suisse and E.D.S. in roles as varied as Network & Server Support, Team Leader and Consultant Project Manager.

These days Philip does a lot of programming in VBA for Excel, as well as PHP, JavaScript and HTML/CSS for web development.

He's particularly keen on Power Query where he writes a lot of M code.

When not writing blog posts or programming for My Online Training Hub, Philip can be found answering questions on the Microsoft Power BI Community forums where he is a Super User.

Leave a Comment

Current ye@r *