Let's say we have a column of text and we want to search it to see if each row contains a list of words.
The source data that I'll be searching through is a table in Excel named TextTable, and the list of words I'm looking for are stored in another table named WordList.
.If you want to list the words your searches find, then read this post : Create a List of Matching Words When Searching Text in Power Query.
I'm loading this into Power Query with a query also called WordList, and then transforming the table into a list. By using the name WordList in other queries I can refer to the list of words that query produces.
The query to create a list of words
The list created by the query
Download the Excel Workbook With All the Sample Queries in This Post
The queries in this Excel file can be copied/pasted into the Power BI Desktop Advanced Editor and will work there too.
Enter your email address below to download the workbook with the data and code from this post.
Finding Substrings
The first thing I'm going to do is look for substrings. An example : searching for apple in the phrase four green apples will return True because apple is part of the word apples.
This query will not be looking for exact word matches, but another query later on will be doing this.
The query to look for substrings is:
After loading the source data from TextTable, the query is adding a custom column named Check.
The 3rd parameter for Table.AddColumn is a function, and I'll need to use this capability.
To search for substrings, I'm using Text.Contains but this is only able to compare one string with another. It doesn't work its way through a list checking each word in that list against the text in the current row of the TextCol column.
I need to write a function that will do this and this section of code is that function
You can write your own custom functions in Power Query but you can also write them inline in your M code. This is known as a lambda function, or an anonymous function. You don't have to name it or declare it somewhere else.
The word FindStrings is actually the parameter name for the value passed into the function. I called it FindStrings to make it clear what the function was doing, but the function would work just as well if you called the parameter x.
To explain what this function does let's work from the inside of the function out.
You have to look at the List.Transform function at the same time as the Text.Contains function to see what's happening, but first let's understand what each functions does.
List.Transform takes a list, in our case the output from the WordList query, and changes, or transforms, that list according to the result of the 2nd parameter. In this case the 2nd parameter is each Text.Contains((FindStrings[TextCol]), _ )
So for each item in WordList, call Text.Contains to check if that item is in the current row of the TextCol column.
The _ is short hand for the current item and in Text.Contains((FindStrings[TextCol]), _ )) refers to the current item in WordList.
As the function works through the WordList the result from Text.Contains is used by List.Transform to modify WordList into a new list.
Text.Contains returns TRUE or FALSE indicating if the substring was found or not. To help visualize the first test look at this
Text.Contains("One yellow mango", "apple" ))
Which yields FALSE and List.Transform changes WordList to
NOTE: WordList itself is not changed. List.Transform works with a copy of the WordList and returns a new list as its result.
The second test is
Text.Contains("One yellow mango", "yellow" ))
Which yields TRUE and List.Transform changes WordList to
The 3rd and 4th tests both give FALSE because neither PEAR nor RED are in the string one yellow mango.
WordList ends up as a list of TRUE or FALSE.
which is evaluated by List.AnyTrue as TRUE
This process is carried out for each row in the TextCol column with the end result being a column of TRUE or FALSE added to the original table.
In row 1 the result is TRUE because the word yellow is in row 1 of TextCol. Row 5 is also TRUE because apple is in the string in row 5 of TextCol.
The results for rows 2, 3 and 4 are FALSE. In rows 3 and 4 we get FALSE because the search is case sensitive. The word PEAR in WordList is not the same as pear.
Finds Substrings - Ignoring Case
To do a case insensitive search for substrings we can use the optional 3rd parameter for Text.Contains.
This 3rd parameter is a comparer which tells Text.Contains how do do the comparison.
By specifying Comparer.OrdinalIgnoreCase as the comparer, the text comparisons become case insensitive.
Rows 3 and 4 now give a True result too.
Exact Match String Searches
Let's say that we want to search only for exact words, so searching for apple in apples would return false.
To do this we need to use a different function, List.ContainsAny. This takes two lists as inputs and checks if any of the values in one list are in the other list.
Here, the list to search is the text in each row of TextCol, and the words we're looking for are in WordList.
The query looks like this
For each row in TextCol, List.ContainsAny compares the string of text in TextCol with WordList. Because List.ContainsAny takes lists as inputs, we have to use Text.Split to split the string in the TextCol column into a list of separate words.
The results show that only the first row gives a TRUE result because yellow is the only exact match. This search is case sensitive.
Exact Match String Searches - Ignoring Case
The exact match search can also be made case insensitive by utilising the 3rd optional parameter for List.ContainsAny.
In the official documentation this 3rd parameter is cryptically decribed as equationCriteria. In practice you can actually write your own function for this parameter and specify how you want the comparison performed.
In the following image I've laid out the lines of the query so it is easier to see what is going on.
We already know what the first two parameters are so I'll explain what the 3rd one is doing. It's another lambda function taking two parameters x and y. In this situation x is Text.Split( [TextCol] , " ") and y is WordList.
The function uses the Comparer.Equals function to tell List.ContainsAny that when it does the comparison of the values x and y, ignore case.
The results look like this because the words yellow, pear and red are exact matches when case is ignored.
I hope you find these queries useful.
Hector mdes. de los santos feliz
Hello friends.I saw that power query, is a poweful tool to work in excel, however if i have a reading file without any table to transform, how can power query transform it? lets suppose i have a little paragraph with somes information and somes numbers., in that case, how can i transform it? can i create a table with the paragraph with power query in excel 2013 or other? my name is Hector mdes. de los santos feliz from Dominican Republic. thank you and God bless you
Philip Treacy
Hi Hector,
Yes you can create a table from a paragraph of text/numbers. But to properly answer you I’d need to see the paragraph(s) and know exactly what transformations you want to do.
You can start a topic on our forum and attach your file(s) there with an explanation of what you are trying to achieve.
Regards
Phil
Alan Sidman
Thanks for the prompt reply. Works great. I was using it to find the state in a full address. I’m wondering if there is a means to remove it from the address field when it is placed into the new state field.
Alan
Philip Treacy
Hi Alan,
You can use Text.Replace for this.
https://docs.microsoft.com/en-us/powerquery-m/text-replace
If you have 2 columns, Address and State, where Address contains the State you just extracted into its own column, create a new step:
Regards
Phil
Alan Sidman
Custom1 = Text.Replace([Address],[States],””) generates an error message
Expression.Error: There is an unknown identifier. Did you use the [field] shorthand for a _[field] outside of an ‘each’ expression?
I looked at the link you provided MS and this puzzles me as the line of Mcode seems to follow exactly except that MS used actual strings instead of fields.
Your thoughts. I think I’ll post this to your help site with the file for analysis.
Philip Treacy
Hi Alan,
That piece of code works fine for me but it is dependent on how the data is structured. So yes, please post your file on our forum so I can have a look.
Regards
Phil
Alan Sidman
Posted and Val suggested a workable solution. Really surprised that your suggestion did not work for me as it was exactly the same as MS had suggested in the link you provided.
Alan
Philip Treacy
Hi Alan,
In my comment I did say add a new step so I’m, not sure how you did this, by using the Adv Editor or the GUI interface. Looks like the code you entered is missing some crucial bits but it seems that the guys on the forum have sorted it out for you.
Regards
Phil
Alan Sidman
Philip, how would you change the Mcode to actually show the word found instead of showing True or False?
Alan
Philip Treacy
Hi Alan,
I wrote about that here
https://www.myonlinetraininghub.com/create-a-list-of-matching-words-when-searching-text-in-power-query
Regards
Phil
Bettina Marchl
Hi Philip,
this is a great post and I am trying to adapt it for my task at hand (but I am still fairly new to power query). I have come across 2 challenges and I would like to get your feedback on these
1. I am trying to use it on a dataset of 550k rows – but so far I cannot get the data to load when I close power query. Would this size be too much for this solution?
2. Is there a way to run the query not only on one column but on multiple columns at the same time (if any textstring defined in Wordlist appears in Column A, B, C or D, then return true) ? My solution so far is to do it column by column and then add a conditional column to get a final result. But I am sure there is a more elegant way…..
Thanks
Bettina
Philip Treacy
Thanks Bettina.
550K rows should be fine, you may need to use buffer functions to speed things up. Hard to say without seeing your data and code. Can you post your data on our forum?
You can run a query against multiple columns, for example combining 2 calls to 2 different columns like this
Regards
Phil
David Nightingale
Hi – got a problem… instead of getting a column with true or false, I just get one that has a “Function” in it.
My Code:
let
Source = tbl_MaterialImports_Refined1,
#”Renamed Columns” = Table.RenameColumns(Source,{{“Description”, “TextCol”}}),
#”Added Custom” = Table.AddColumn(#”Renamed Columns”, “Check”, each (FindStrings) =>
List.AnyTrue(List.Transform(WordList, each Text.Contains((FindStrings[TextCol]),_))))
in
#”Added Custom”
Catalin Bombea
Hi David,
You are using: each (FindStrings) =>
Get rid of the “each” to make it work.
Tharindu Palipane
Hi Philip,
Thank you for the awesome explanation. needing some direction. So I’m trying to extract rather than the “TRUE” value, the words that actually made the value true in the concatinated manner. For e.g. “One yellow mango” will have “yellow” in the column in front. If the word was “One yellow apple” it would be “yellow;apple”.
Any idea how this can be done. appreciate a little direction to this.
Thanks
Tharindu
Philip Treacy
Hi Tharindu,
I wrote this post to do just that
Create list of matching words from text in Power Query
Regards
Phil
Cedric McKeever
This is probably a stupid question.
I do not understand what “FindStrings” does. You define it as “the parameter name for the value passed into the function. What is the value that is passed into the function by “FindStrings?” What does it do?
Also, just above “Finds Substrings – Ignoring Case,’ you say “In row 1 the result is TRUE because the word yellow is in row 1 of TexCol. Row 4 is also TRUE …” Should that be “Row 5 is also true because apple is in the string in row 5 of TextCol.”
Philip Treacy
Hi Cedric,
FindStrings is a lambda function, it’s declared inline in the code rather than separately elsewhere. The value passed into it is the current (row) value in the TextCol column.
Yes it should say Row 5, thanks for that, I’ve changed it.
regards
Phil
hue
Tracy,
my table with over 64,000 rows it takes 2 hours to return the result.
is there any way to make it faster?
thanks
hue
Philip Treacy
Hi Hue,
Possibly if you used List.Buffer. I’d need to see your code and data though to test it.
Please start a topic on the forum and attach your file.
Regards
Phil
Sarah
Please provide a video walk through of this!
Philip Treacy
Hi Sarah,
It’s being planned.
In the meantime check my latest post on listing words that are found in searches
https://www.myonlinetraininghub.com/create-a-list-of-matching-words-when-searching-text-in-power-query
Regards
Phil
Torstein J
Hi Phil!
I really enjoyed your lesson about searching for strings using Power query.
I have been using Power Query for a while, but I realize that there’s a lot I don’t know.
When using PQ I have been able to transfer the results easily from PQ to the worksheet.
With your quieries I was not able to figure out how to get the results back to the data sheet.
I would like to have the true/false list in column B to be able to for example filter table based on the results in that column.
I hope I have explained my problem understandable, and that you can tell me how to achieve this.
Philip Treacy
Thanks Torstein. All the queries in the workbook are set to Connection only. To load the results into a worksheet, right click on the query in the Queries and Connections list, then click on Load To … then choose Table and either New Worksheet or Existing Worksheet.
Regards
Phil
Torstein J
Thanks, Phil! Obviously, but I didn’t catch it. Now it works as i Wanted it to!
regards Torstein
Philip Treacy
No worries 🙂
Donald Parish
As a former Ruby and Clojure programmer, I’m impressed by your use of functional programming concepts applied to Power Query. I love how there is a function for what you need in m, so you can write such clean code.
Philip Treacy
Thanks Donald. Clean code is good 🙂
Jim Fitch
Hi Phil,
Thanks for the detailed, inside-out explanation of how the M code functions work inside the formula. That is very helpful instruction. Thanks, too, for defining “lambda function”. I look forward to impressing my children with it; two of them are software developers. We’ll be together for Thanksgiving (not too far off); it should be a good topic for official time-outs during the football games. Ha!
Philip Treacy
Thanks Jim. Our eldest son is just getting into programming and really enjoying it. It makes for fun conversations 🙂
Matthias
Hi Phil,
thanks, that was very well explained.
As you could have e.g. apple(s) and pear(s) in one sentence, you might want to know how many matches you have per sentence, instead of just match TRUE or FALSE:
List.Count(Splitter.SplitTextByAnyDelimiter(WordList)([TextCol]))-1
L.E.:
Exact Match – Ignore Case is quite complex. You explained it well, but what is the reason to use the function with Comparer.Equals instead of simply using Comparer.OrdinalIgnoreCase. They render the same results for your example data.
Regards,
Matthias
Catalin Bombea
Hi Matthias,
There are many ways to achieve the same result. You might also want to look at another way to do things, using regular expressions.
AlexJ Jankowski
Phi, Minda,
These are very useful indeed.
I am trying to build a function to provide the same functionality, but I am struggling with the syntax. Could you please possible post the correct syntax for a function that would be invoked in a new column function for one of the above cases?
Thanks in advance
Philip Treacy
Hi Alex,
In this file I’ve modified the Find Substrings query into a function and a separate query that calls this function. You’ll find these in the fx group when you open the Query Editor. Hope that gives you some pointers.
I’ve also written about Power Query Custom Functions.
Regards
Phil
Alex Jankowski
Thanks, Phil. I appreciate your reply.
In fact, my struggles with functions are not in concept or implementation of basic functions. (I had already read your post on functions). It is more in the syntax of more complex M statements which, embedded in functions, get even more complex. I’ll e workin on it!
Thanks again,
Alex
Philip Treacy
Good luck with your learning.
Glenn
Excellent post, Philip!
Philip Treacy
Thanks Glenn 🙂
Stephen Wright
Thank, that’s really interesting, but how could you return the item that was matched instead of just knowing that there was a match?
Catalin Bombea
Hi Stephen,
Power Query has more surprises..
You can use any JavaScript function, PQ has a function Web.Page, inside that page you can add your text to be analyzed and run a script to process data.
Here is an example of using a JavaScript function in a Web.Page created only to use the JavaScript Regular Expressions pattern matching.
Philip Treacy
Hi Stephen,
This code returns the matches
https://www.myonlinetraininghub.com/create-a-list-of-matching-words-when-searching-text-in-power-query
Regards
Phil