Notifications

Clear all

Removing duplicate values based on results from two columns

Power Query

Last Post by Tracey Hartley 2 years ago

5 Posts

3 Users

0 Reactions

729 Views

RSS

Tracey Hartley

(@confused-com)

Posts: 13

Eminent Member

Topic starter

Hi all. I receive monthly payroll data and one of the tables I use is dd every month. The data is staff structure, so there are always duplicate values. I need the query to remove duplicates from Personal Reference (which needs to be a Unique Identifier) and only keep the version where the Date is the most recent date. How do I do this?

Posted : 07/03/2023 9:08 am

Lionel Baijot

(@questvba)

Posts: 125

Estimable Member

Hi Tracey Hartley,

Can you attach a file with anonymised data that shows the basic data to be processed and then the result you want to achieve?

BR,

Lionel

Posted : 07/03/2023 9:35 am

Tracey Hartley

(@confused-com)

Posts: 13

Eminent Member

Topic starter

Hi. Please find attached the output I get from my Power Query.

I need it to remove duplicates on Personal Reference:People (column I) so that I can use this as the Unique Identifier. I need it to remove all those versions and just keep the latest version.

So, you can see that Personal Reference:People number 0000025 appears each mt, and I would only like to keep the latest version (so date 01/02/23):

01/01/2023	0000025	KB
01/02/2023	0000025	KB
01/04/2022	0000025	KB
01/05/2022	0000025	KB
01/06/2022	0000025	KB
01/07/2022	0000025	KB
01/08/2022	0000025	KB
01/09/2022	0000025	KB
01/10/2022	0000025	KB
01/11/2022	0000025	KB
01/12/2022	0000025	KB

In this example, Personal Reference:People number 5006718 last appeared on the 01/12/2022 file so this would be the record we would need to keep:

01/04/2022	5006718	EB
01/05/2022	5006718	EB
01/06/2022	5006718	EB
01/07/2022	5006718	EB
01/08/2022	5006718	EB
01/09/2022	5006718	EB
01/10/2022	5006718	EB
01/11/2022	5006718	EB
01/12/2022	5006718	EB	05/12/2022

Posted : 07/03/2023 10:13 am

Alan Sidman

(@alansidman)

Posts: 236

Member Moderator

If I understood you correctly, I first grouped your table by Personal Reference and Max Date. I then merged (joined) that table back onto the original table to give you only the data for the Max Date for the Personal Reference

See the attached. I was unable to open your query as it was linked to your PC.

Posted : 07/03/2023 1:50 pm

Tracey Hartley

(@confused-com)

Posts: 13

Eminent Member

Topic starter

That is amazing, Alan! Thank you so much for your help.

Sadly, there's one last bug. PAYROLL_Structure returns 169 unique rows, but that increases to 178 rows with Merge1. Looking at it, that's because 8 out of 169 Personal Reference:People are not unique as those 8 people have more than one role within the organisation (something I hadn't realised before doing this!)

Can you suggest any way around this, because otherwise I'm stuck without a Unique Identifier for example, is it possible to add a _1 or _2 to that final data set where there's a duplicate?

Many thanks!

Posted : 09/03/2023 9:57 am

26 Forums
7,319 Topics
32.1 K Posts
3 Online
35.7 K Members

Forum Icons: Forum contains no unread posts Forum contains unread posts

Topic Icons: Not Replied Replied Active Hot Sticky Unapproved Solved Private Closed

Removing duplicate values based on results from two columns

Super Globals

Options and Features