
New Member
Power Query

May 27, 2016

Hello - I'm not super great with Power Query yet but I think my data is horrible honestly. I have about 5000 rows which could be worse but looking for some easy suggestions outside of manually changing all this data. I've provided a sample of fictitious data that replicates as best I can the types of data I have.
3 columns: Category, Type, City
Issue: Type & City columns - lots of a variations and typos. Ideally in the Type column I just want the type of meeting (i.e product launch, product launch with simulation, and remove the extraneous data that would give me product detail or location. In the City column I have many typos and some have the City, State entered and some just the city.
Can power query help me here or is this a manual job?


November 8, 2013

Hi Kristine,
For Type, you can simply extract text before delimiter:
= Table.TransformColumns(#"Changed Type", {{"Type", each Text.BeforeDelimiter(_, "-"), type text}})
City is more complicated, as there can be lots of variations.
It is possible though, if ou download the list of cities from somewhere, and merge the city table with your data table, with the fuzzy lookup option:
= Table.FuzzyNestedJoin(#"Extracted Text Before Delimiter", {"City"}, Cities, {"City"}, "Cities", JoinKind.LeftOuter, [IgnoreCase=true, IgnoreSpace=true, Threshold=0.6])
With the default matching threshold of 80%, I had only 10 matches out of 14 records, had to decrease the threshold to 60% to match all records.
See file attached
1 Guest(s)
