April 4, 2019
Hi -
I'm trying to solve what (to me at least) is a complex remove duplicates problem.
The data contains forecasted costs by month for unique ID numbers. The forecasts are updated monthly and the data includes a manually generated forecast name ("Name" in the sample data file), which is always the year and month they are updated. For example, a forecast updated in July 2022 would have the name "2022-07." The system I'm pulling from also automatically creates a version number ("Version" in the sample data file) for each forecast, i.e. 1, 2, 3, etc.
I recently discovered that forecasts deleted by users in the system of record are still being extracted in the data set, resulting in a duplicate combination of ID and forecast name. In other words, the data is erroneously giving me duplicate forecasts. Thankfully, the system-generated version number is not duplicated so there is a differentiator. I cannot find a way to stop the system of record from including the deleted forecasts so I'm trying to figure out a way to do that in the Excel file that I download from the system.
I only want to keep the records/rows where the combination of ID and forecast name are associated with the maximum version number of that combination. All other instances of that combination should be removed. So in the sample data I've uploaded, I want the rows where the combination of ID 1234567 and Name 2022-07 include anything less than Version "5" removed from the data set. The other sample data included needs to remain because it has a different combination of ID and Name.
I'm using Excel 2016 on a PC.
Moderators
January 31, 2022
Hi Eric,
I believe Power Query (PQ) is ideal for this kind of task. Connect to the "blue table" that comes from the system. Then PQ can group all rows by ID and keep only the rows with the highest version number for each individual ID. So, if one ID only had 4 updates so far, the 4th version will be taken for that particular ID.
If you are not familiar with PQ, it's well worth learning. The attached file now includes such a PQ solution (the "green table") and should work in your Excel version.
1 Guest(s)