Let's say we have a column of text and we want to search it to see if each row contains a list of words.
The source data that I'll be searching through is a table in Excel named TextTable, and the list of words I'm looking for are stored in another table named WordList..
I'm loading this into Power Query with a query also called WordList, and then transforming the table into a list. By using the name WordList in other queries I can refer to the list of words that query produces.
The query to create a list of words
The list created by the query
The first thing I'm going to do is look for substrings. An example : searching for apple in the phrase four green apples will return True because apple is part of the word apples.
This query will not be looking for exact word matches, but another query later on will be doing this.
The query to look for substrings is:
After loading the source data from TextTable, the query is adding a custom column named Check.
The 3rd parameter for Table.AddColumn is a function, and I'll need to use this capability.
To search for substrings, I'm using Text.Contains but this is only able to compare one string with another. It doesn't work its way through a list checking each word in that list against the text in the current row of the TextCol column.
I need to write a function that will do this and this section of code is that function
You can write your own custom functions in Power Query but you can also write them inline in your M code. This is known as a lambda function, or an anonymous function. You don't have to name it or declare it somewhere else.
The word FindStrings is actually the parameter name for the value passed into the function. I called it FindStrings to make it clear what the function was doing, but the function would work just as well if you called the parameter x.
To explain what this function does let's work from the inside of the function out.
You have to look at the List.Transform function at the same time as the Text.Contains function to see what's happening, but first let's understand what each functions does.
List.Transform takes a list, in our case the output from the WordList query, and changes, or transforms, that list according to the result of the 2nd parameter. In this case the 2nd parameter is each Text.Contains((FindStrings[TextCol]), _ )
So for each item in WordList, call Text.Contains to check if that item is in the current row of the TextCol column.
The _ is short hand for the current item and in Text.Contains((FindStrings[TextCol]), _ )) refers to the current item in WordList.
As the function works through the WordList the result from Text.Contains is used by List.Transform to modify WordList into a new list.
Text.Contains returns TRUE or FALSE indicating if the substring was found or not. To help visualize the first test look at this
Text.Contains("One yellow mango", "apple" ))
Which yields FALSE and List.Transform changes WordList to
NOTE: WordList itself is not changed. List.Transform works with a copy of the WordList and returns a new list as its result.
The second test is
Text.Contains("One yellow mango", "yellow" ))
Which yields TRUE and List.Transform changes WordList to
The 3rd and 4th tests both give FALSE because neither PEAR nor RED are in the string one yellow mango.
WordList ends up as a list of TRUE or FALSE.
which is evaluated by List.AnyTrue as TRUE
This process is carried out for each row in the TextCol column with the end result being a column of TRUE or FALSE added to the original table.
In row 1 the result is TRUE because the word yellow is in row 1 of TextCol. Row 5 is also TRUE because apple is in the string in row 5 of TextCol.
The results for rows 2, 3 and 4 are FALSE. In rows 3 and 4 we get FALSE because the search is case sensitive. The word PEAR in WordList is not the same as pear.
Finds Substrings - Ignoring Case
To do a case insensitive search for substrings we can use the optional 3rd parameter for Text.Contains.
This 3rd parameter is a comparer which tells Text.Contains how do do the comparison.
By specifying Comparer.OrdinalIgnoreCase as the comparer, the text comparisons become case insensitive.
Rows 3 and 4 now give a True result too.
Exact Match String Searches
Let's say that we want to search only for exact words, so searching for apple in apples would return false.
To do this we need to use a different function, List.ContainsAny. This takes two lists as inputs and checks if any of the values in one list are in the other list.
Here, the list to search is the text in each row of TextCol, and the words we're looking for are in WordList.
The query looks like this
For each row in TextCol, List.ContainsAny compares the string of text in TextCol with WordList. Because List.ContainsAny takes lists as inputs, we have to use Text.Split to split the string in the TextCol column into a list of separate words.
The results show that only the first row gives a TRUE result because yellow is the only exact match. This search is case sensitive.
Exact Match String Searches - Ignoring Case
The exact match search can also be made case insensitive by utilising the 3rd optional parameter for List.ContainsAny.
In the official documentation this 3rd parameter is cryptically decribed as equationCriteria. In practice you can actually write your own function for this parameter and specify how you want the comparison performed.
In the following image I've laid out the lines of the query so it is easier to see what is going on.
We already know what the first two parameters are so I'll explain what the 3rd one is doing. It's another lambda function taking two parameters x and y. In this situation x is Text.Split( [TextCol] , " ") and y is WordList.
The function uses the Comparer.Equals function to tell List.ContainsAny that when it does the comparison of the values x and y, ignore case.
The results look like this because the words yellow, pear and red are exact matches when case is ignored.
I hope you find these queries useful.