Hi everyone, I wonder if anyone could help me...I have a set of data in which I have some duplicates which I need to remove. In one column I have stock number that duplicated and in another I have inv and credit note 33,333,33 & -33,333,33 which I also need to remove leaving only the recent invoice for 33,500. What would be the best way to approach this in power query? Many thanks for any suggestions.
Invoice/Credit Note No. | Date | Stock Number | Price Sold |
TM1425516 | 1240423 | 03321830 | 33,333.33 |
TM1426080 | 1240423 | 03321830 | -33,333.33 |
TM1426101 | 1240430 | 03321830 | 33,500.00 |
Thank you
Gosia
Hi Gosia,
You could add a column that converts the values in the Price Sold column to contain all positive values, let's call this column 'Value'. Then you could simply remove duplicates based on the Stock Number and Value columns. Using the example data above, you'd be left with the last item for 33,500.00.
However, there is a risk that you sell items for the same price and stock number that could get interpreted as a duplicate, when they're not. This isn't a limitation of Power Query, just a result of the nature of your data.
Mynda
Hi Mynda, in terms of removing duplicates. I did as suggested below, added column with value sold, converted it to absolute number. Then highlighted two columes ctr key+ stock number & column with value sold then removed duplicates but it doesn't quite work, I ended up with 2 lines, this is my data
stock number | new value |
03321830 | 33333.00 |
03321830 | 33333.00 |
03321830 | 33500.00 |
and after removing duplications : I still have line with 33333 which I actually wanted to get rid of, I'm wondering what I could be doing wrong?
stock number | new value |
03321830 | 33333.00 |
03321830 | 33500.00 |
Thank you
Gosia
Remove Duplicates in PQ actually filters for distinct values. So, it returns every existing value once. If you want to keep only unique values (i.e. values that exist exactly one time only), you need to group by the two columns, count, and keep only rows where the count equals 1.
You'll find an example of the code in the attached file.
That is helpful, thank you. Removing these with method you suggested would be ok for anything with count 2(where new invoice value is different to the previous 2 documents), however I have another challenge where stock number has 3 rows with same value(after changing to absolute number) but I only want to keep 1. I need to be able to keep one row. Does anyone have any tips? Thank you how
Stock Number | Price Sold | absolute number |
03265654 | 23766 | 23766 |
03265654 | -23766 | 23766 |
03265654 | 23766 | 23766 |
Based on stock number and price (or absolute value) alone, it will be difficult. I believe that is what Mynda pointed out in her response. What logic would you apply if you were to perform this task manually. I guess you would look at the customer, a transaction date, and a document reference to the customer order perhaps. How would you otherwise be able to know that an invoice, a credit note and a corrected invoice 'belong' together?
Perhaps you can share some more complete data that resembles your real data and indicate which rows would have to be removed.