Notifications

Clear all

Query is duplicating rows over 100 thousand times.

Power Query

Last Post by Mynda Treacy 2 years ago

4 Posts

2 Users

0 Reactions

138 Views

RSS

Rob Infinger

(@paparob)

Posts: 5

Active Member

Topic starter

I have a self-referencing query that is based on an append merge from two source sheets. One sheet is about 2k rows and the other about 500. Each sheet has an index column I created: one has even numbers and the has odd only. The merged query has over night gone to over 500k rows and is 52MB!

Source Sheet One:
- Facility One's Lab submission Records (What and how much)
Source Sheet Two:
- Facility Two's Lab submission records
Query:
- Self-referencing Append.
- Accounting marks an "X" in one column when the clients have been billed.
- Once marked, I filter it off (all "X"'s are filtered out)

1) I looked for duplicate index numbers in the two source sheets. I cannot find any.
2) Only one source sheet is duplicating the rows in the query.
3) The source sheets are used often throughout the day by multiple users: both Excel application and sharepoint web version.
4) The sheets are often sorted.

While the query is new (bout two weeks) it was working properly, for the most part.

The other day someone did insert a bunch of new rows. I deleted them as they were creating issues with the query.

Any ideas?

Posted : 15/07/2023 1:50 pm

Mynda Treacy

(@mynda)

Posts: 4770

Member Admin

Hi Rob,

That's a lot to try and visualise and as a result, I'm not sure I completely understand the scenario. For one, I don't know where the source data sheets are stored i.e. are they in the same file as the query, or an external file? How are you loading them i.e. by selecting the table names specifically, or using Excel.CurrentWorkbook and getting all tables in the file?

Are the Accounting users adding a column to the query output table in which to add an X? i.e. a column that is not part of the origina query. If so, this is not ideal. The X should be put in the source data tables, not the query output table. It's dangerous to do this because each time you load the query, the position of the data in the output table could become out of sync with where the Xs were inserted.

I suspect the query is duplicating itself because you are referencing the query output table as though it's a new data source in order to capture the X status.

Of course, you could always add a step to 'remove duplicates', but it will still probably be a very slow query to run, and become slower over time.

Mynda

Posted : 15/07/2023 8:05 pm

Rob Infinger

(@paparob)

Posts: 5

Active Member

Topic starter

Hello Mynda!

Sorry, that was a complicated description.

In the end, I was not able to find the duplicate values. However, since the error was acting like a duplicate value, I changed the Index Values of the repeating rows in the source sheets and gave them new unique ID numbers. This cleared up the problem. The sheet returned to normal; no duplicate rows.

Maybe I'll go back and try the "Remove Duplicates" in the source sheets, but I fear I will not see anything. I did add conditional formatting to highlight duplicates and nothing showed up.

To try and answer your questions:
- The source sheets are in the same workbook as the queries.
- I created the query by Merge / Append and selecting the two worksheets by name, using the unique Index column as the link.
- That was my first mistake i made. Indeed, adding the additional columns / table was dangerous. We lost all indexing (positioning). As you point out, the query and the new table are not "referenced" to each other. I used a "Self Referencing" query to solve this problem. We are now able to allow accounting to mark the billed entries. Here, the self-referencing sheet did the trick.

Thanks!

Posted : 28/07/2023 3:19 pm

Mynda Treacy

(@mynda)

Posts: 4770

Member Admin

Glad you got it working, Rob.

Posted : 28/07/2023 8:18 pm

26 Forums
7,320 Topics
32.2 K Posts
13 Online
35.7 K Members

Forum Icons: Forum contains no unread posts Forum contains unread posts

Topic Icons: Not Replied Replied Active Hot Sticky Unapproved Solved Private Closed

Query is duplicating rows over 100 thousand times.

Super Globals

Options and Features