July 20, 2019
HI,
I'm trying to setup an import process that will consolidate information downloaded from a bank in a .csv format. I'm putting the downloaded files into a folder and have used the 'from folder' option to collect the data into PowerQuery. The problem I have is that downloaded files may create duplicate entries simply by two, three or more files including data from the same date. My immediate thought is to simply exclude duplicates, but because the information isn't time stamped it is quite possible to have two or more valid records that are apparently duplicates of one another, for example if the same round of drinks is bought multiple times on the same day. This means that a record is only a duplicate if it occurs in more than one download. Perhaps this example will help:
In Download File 1, contains two valid records:
Date - 10 Jun 19, Retailer - Buggins Refreshment, Amount - £12
Date - 10 Jun 19, Retailer - Buggins Refreshment, Amount - £12
In Download File 2, and possibly 3 & 4 ....
Date - 10 Jun 19, Retailer - Buggins Refreshment, Amount - £12
Date - 10 Jun 19, Retailer - Buggins Refreshment, Amount - £12
the valid entry is that in Download File 1 and I want to exclude the records from the other downloads.
I have had it suggested to work with referenced copies of the data and some clever use of merges, but I haven't managed to make that work.
I'd be grateful for any suggestions or advice.
Best Regards
Peter
July 20, 2019
HI,
I believe I've found the answer and am posting in case it helps anyone else with the same problem:
1. Group by Download File, Date, Retailer and Amount - this collects any duplicate records on that day into a table.
2. Select Date, Payee and Amount and delete rows of duplicates. This leaves a single entry for each transaction including the table that contains the information about how may duplicates there are on one day.
3. Expand the table and delete any unnecessary columns, which is probably all the new column(s) you select.
Hopefully this is useful.
Regards
Peter
1 Guest(s)