Hi all. I receive monthly payroll data and one of the tables I use is dd every month. The data is staff structure, so there are always duplicate values. I need the query to remove duplicates from Personal Reference (which needs to be a Unique Identifier) and only keep the version where the Date is the most recent date. How do I do this?
Hi Tracey Hartley,
Can you attach a file with anonymised data that shows the basic data to be processed and then the result you want to achieve?
BR,
Lionel
Hi. Please find attached the output I get from my Power Query.
I need it to remove duplicates on Personal Reference:People (column I) so that I can use this as the Unique Identifier. I need it to remove all those versions and just keep the latest version.
So, you can see that Personal Reference:People number 0000025 appears each mt, and I would only like to keep the latest version (so date 01/02/23):
01/01/2023 | 0000025 | KB |
01/02/2023 | 0000025 | KB |
01/04/2022 | 0000025 | KB |
01/05/2022 | 0000025 | KB |
01/06/2022 | 0000025 | KB |
01/07/2022 | 0000025 | KB |
01/08/2022 | 0000025 | KB |
01/09/2022 | 0000025 | KB |
01/10/2022 | 0000025 | KB |
01/11/2022 | 0000025 | KB |
01/12/2022 | 0000025 | KB |
In this example, Personal Reference:People number 5006718 last appeared on the 01/12/2022 file so this would be the record we would need to keep:
01/04/2022 | 5006718 | EB | |
01/05/2022 | 5006718 | EB | |
01/06/2022 | 5006718 | EB | |
01/07/2022 | 5006718 | EB | |
01/08/2022 | 5006718 | EB | |
01/09/2022 | 5006718 | EB | |
01/10/2022 | 5006718 | EB | |
01/11/2022 | 5006718 | EB | |
01/12/2022 | 5006718 | EB | 05/12/2022 |
If I understood you correctly, I first grouped your table by Personal Reference and Max Date. I then merged (joined) that table back onto the original table to give you only the data for the Max Date for the Personal Reference
See the attached. I was unable to open your query as it was linked to your PC.
That is amazing, Alan! Thank you so much for your help.
Sadly, there's one last bug. PAYROLL_Structure returns 169 unique rows, but that increases to 178 rows with Merge1. Looking at it, that's because 8 out of 169 Personal Reference:People are not unique as those 8 people have more than one role within the organisation (something I hadn't realised before doing this!)
Can you suggest any way around this, because otherwise I'm stuck without a Unique Identifier for example, is it possible to add a _1 or _2 to that final data set where there's a duplicate?
Many thanks!