Hi all,
I have an interesting problem (at least for me) which I solved already, but I'm not really happy with my solution.
So, let me explain.
We have two simple tables:
- Articles - 1. column: Art.No. - 2. column: Description
- Categories - 1. column: Keyword - 2. column: category
The task sounded simple to me:
- For every Article from table 1 assign a Category from table 2 by comparing the keyword in table 2 with the description in table 1.
But when I received the Article data it became quickly obvious to me that this would not be an easy one.
20 years of nearly uncontrolled data entry into the Article table (an extract from SAP btw) resulted in a mess...
...same articles written in different ways (sorry for German language: Abdeckkappe...Abdeck Kappe...Abdeckkape and so on) and also not only single words, but also something like "Abdeckkappe Type TB 25 - Laenge 25mm" could easily be found in different variations.
Finally, I found a solution that works somehow:
- I splitted the Description column by delimiter (space) into rows ( so instead of my original 13000 records I had 59000)
- then I merged the two tables (LOJ - table 1 first, table 2 second) by comparing the (splitted) Description of Table 1 with the Keywords in Table 2.
- after some cleaning (removed nulls, duplicates etc.) I finally ended up with the requested result.
But for me it looks like not the best solution.
I would prefer something where I could take every Keyword from Table 2 in the format '*keyword*' run through my Table 1 by comparing with Description.
What I learned until now from a Youtube video by Oz du Soleil is , that although we have that fuzzy search thing while merging tables it has some shortcomings (which I only can confirm after trying out this possibility).
I would be very happy to learn from you experienced guys out there how to do it better (and more reliable) than I did.
Best regards Martin
P.S. Unfortunately I can not upload the data, because I'm not allowed to do so by company rules.
Hi Martin,
You don't have to upload company data, but I'm sure you can prepare a file with sample lookup tables, sample data table and an example of the desired result.
Without a file, I can only provide a theoretical answer, we can even use regular expressions.
Hi Catalin,
you are right: it's difficult to just THINK about what I was writing, but better and easier to understand, if you see something.
So, I prepared an exerpt from the original data and uploaded it.
Let's see, if this helps to understand my problem.
Please take a look at this similar forum topic: https://www.myonlinetraininghub.com/excel-forum/power-query/search-text-or-phrases-from-a-string-in-table-a-and-match-with-another-list-and-return-the-values
It's very easy to update to your situation, you can make it case sensitive or not, many alternatives available.
Hi Catalin,
yes indeed, this example is very similar.
But since I never worked with functions and Parameters I have to have another look at the details.
Right now I understand the principle (which I somehow 'simulated' with a lot of steps in my solution), but I don't understand how this done in your example.
So, I currently subscribed to the Power Query Course and maybe in some days I come to the functions part 😉
But nevertheless, many thanks already.
Hi Martin,
I tested 2 more alternatives for you, after splitting the description into words (split by space).
One using List.PositionOfAny to identify which word matches to keywords table, which seems to be very slow,
The second attempt using List.Intersect, comparing the list of words to the search criteria list, this will keep only matching items from both lists. Looks much faster than PositionOfAny.
See attached file.
Morning Catalin,
and thanks for your 2 examples.
Unfortunately I'm not able to view the steps inside Power Query in my Office 365 version on my company laptop.
When opening the file, a message is displayed telling me that this workbook is using some new functions that my Excel version might not be supporting.
This seems to be true, because when I try to open editor, it loads and loads and never finishes.
Therefore I have to wait until this evening and try it out with my private laptop and a 'newer' Excel version.
BR Martin
Good evening, Catalin.
Unfortunately I cannot open the Power Query in your workbook on my private laptop as well.
Most of the times Excel freezes completely (only the Taskmanager can help me out).
My Excel version reads -> s. attachment.
Looks like I have to wait for the next update.
Hi Martin,
There is nothing wrong with your version, you have the latest version and more, it's an office insider version.
Try the attached file.
Hi Catalin,
this file works, thank you.
I added now my full dataset (13.379 articles) to your worksheet and tried it out.
Yes, this also worked, BUT a refresh takes about 4 minutes to finish.
The same refresh with my solution (I'm far away from offending you, just mentioning the facts) takes 5 seconds.
Looks like, although my way is far more complicated/longer, the calcutation engine is optimized for that type complicated queries 😉
Indeed, this happens frequently when adding custom M code.
There are many things that can prevent query folding, you can read more on web about this.
There are a few things that can be done to speed up: use Table.Buffer, List.Buffer in previous steps when you use a time consuming function, change settings to disable background refresh, Fast Data Load, Background data download.
The attached version takes now under 1 minute to refresh, with over 25.000 rows of data. (in my computer, of course)
Yes, that Version is much faster...
Hopefully I will be able to create such M Code after finishing my Power Query Course 😉
Many thanks until now, Catalin.