I have a list of Merchant category code groups, merchant category codes and merchant names. The problem is there are almost 87,000 merchant names on it and for the analysis being done we want to group all the stores from the same store brand together. For example a store called Mark's Work Warehouse may be listed as #store number Mark's; or as Mark's #store number or some other variation of a the name of the chain with the store number but then there are stores with Mark's in the name that have nothing to do with Mark's Work Warehouse so need to be left out of the list for this particular store. Wal-Mart is even worse for varieties in how it is listed as the Merchant Name.
One of the group working on the analysis we are doing used filtering and searching to try to identify and get these stores name down to store brand rather then individual store but I'm trying to figure out the best way to use Power Pivot so when we repeat this with newer data, we don't have hours of work to clean up the merchant list.
What is a good way to do this? I was looking at the Power Query video on transformation tables but not sure with the number we have listed and the variety of patterns for store names and store number that it would work.
Thanks
Karen
There's fuzzy matching in PQ. Without an example of the data, I'm not sure we'll be able to give adequate advice.
I've removed the merchant list from the overall data mix and limited it to one merchant category.
One easier example (hopefully) is No Frills, it can be listed as
(name) No Frills (Store number)
No Frills (store number)
(name) No Frills Store
Nofrills (name) (Store number)
There may or may not be spacing so it can be Nofrills or No Frills and at times it may be 's at the end instead of just the s. They are all the same chain and it would be nice to know how much we spend overall at the various No Frills stores.
Thanks for any help.
Karen
In the column containing the Store Name, click on the Sort drop down. Select Text filer. Select Begins with No Frill and select the Or function in the window and select begins with NoFrill. You now have all the No Frills store available for analysis.
Alan
We have done that. I'm looking for a way that we can automate it. With close to 87,000 individual stores on the entire list and with a lot of time spent with the filters we got down to over 23,700. However, we don't want to have to repeat that process everytime we refresh the data (probably quarterly).
I was hoping to avoid macros, as I have very limited experience with programming and maintaining them. So was looking at some of the features of power query. We are already using power query to clean up and set up DIM tables for using in power pivot.
Thanks
Karen
Karen,
I am trying to get fuzzy grouping to work, I haven't gotten the syntax right and my version doesn't have it in the ribbon. I didn't have time to play with it this weekend, but I'm going to try and play tonight.
Hey Karen,
Fuzzy matching is your answer. I got it to work, but I think a fuzzy join of two tables would be better than fuzzy grouping. Here's a video showing exactly what you want. 🙂 Go, be the hero of the office.