
Table.Buffer w/ Thousands of Records

(@patri0t82)
Posts: 43
Trusted Member
Topic starter
 

Hello, in conjunction with a previous post of mine:

https://www.myonlinetraininghub.com/excel-forum/power-query/calculating-time-between-two-rows-of-datetime-stamps#p19048

I've been able to modify my table of 30k+ records and return the information I'm looking for (I believe). However, even when wrapped in Table.Buffer, it refreshes at roughly 10 rows per second and will take quite a while to get through the 30k.

Is this likely due to the calculation in Power Query just being too heavy, or something I'm not doing correctly with Table.Buffer?

 
Posted : 11/03/2021 1:10 pm
(@catalinb)
Posts: 1937
Member Admin
 

Hi Adam,

The FilterTable function is executed for each row; filtering the entire source data to get the exit time for each record is not efficient.

It can be improved: instead of filtering the entire initial table, we can filter a smaller table containing Exits only, already sorted so we avoid sorting at each record in the FilterTable function. Filtering a smaller, already sorted table should be faster.

let
    Source = Excel.CurrentWorkbook(){[Name="AdamS"]}[Content],
    ChangedType = Table.TransformColumnTypes(Source, {{"Field Date Time", type datetime}}),

    // keep only the Entry rows - these are the rows we add the matching Exit to
    FilteredRows = Table.SelectRows(ChangedType, each ([Direction] = "Entry")),

    // the Exits are filtered out once, sorted once and buffered in memory,
    // so the per-row lookup scans a small table and never has to re-sort
    Exits = Table.SelectRows(ChangedType, each ([Direction] = "Exit")),
    ExitsSorted = Table.Buffer(Table.Sort(Exits, {{"Field Date Time", Order.Ascending}})),

    // returns the first Exit after the given Entry for the same person and company
    FilterTable = (tbl as table, Person as text, Company as text, EntryDate as datetime) =>
        let
            Filtered = Table.SelectRows(tbl, each ([Name] = Person and [Company] = Company and [Field Date Time] > EntryDate))
        in
            Table.FirstN(Filtered, 1),

    AddedCustom = Table.AddColumn(FilteredRows, "Exit", each FilterTable(ExitsSorted, _[Name], _[Company], _[Field Date Time])),
    #"Expanded Exit" = Table.ExpandTableColumn(AddedCustom, "Exit", {"Field Date Time", "Direction"}, {"Exit.Field Date Time", "Exit.Direction"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Exit", {{"Exit.Field Date Time", type datetime}})
in
    #"Changed Type"

If you can prepare sample data for 1-5 thousand records or more, I can test it on my side to see what else can be improved.

 
Posted : 12/03/2021 2:11 am
(@catalinb)
Posts: 1937
Member Admin
 

Because you mentioned Power BI, you could write a measure to calculate the Exit dates...

Split the initial data into 2 tables (one for Entry, one for Exits).

In the Entry table, add a measure to calculate the Exits from the Exits table.

It should be much faster than PQ, which is mostly for data transformations, not complex calculations.

 
Posted : 12/03/2021 2:20 am
(@patri0t82)
Posts: 43
Trusted Member
Topic starter
 

Thank you for your responses, so very much.

I have been struggling mightily with this and I worry a great deal that my current solution won't be sustainable. With regards to Power BI, where our company's solution is currently implemented, a fantastic user in the official Power BI forum provided this calculated column for me, which returns the total number of minutes on site (I then convert it to hh.mm):

 

Time on Site (Minutes) (Unique Identifier) =
IF (
    'Gate Scans'[Transit Direction] = "Exit",
    // latest earlier scan for the same Company / SBIID1 / Trade (i.e. the matching Entry)
    VAR aux_ =
        CALCULATE (
            MAX ( 'Gate Scans'[FieldDateTime] ),
            ALLEXCEPT ( 'Gate Scans', 'Gate Scans'[Company], 'Gate Scans'[SBIID1], 'Gate Scans'[Trade] ),
            'Gate Scans'[FieldDateTime] < EARLIER ( 'Gate Scans'[FieldDateTime] )
        )
    RETURN
        // minutes between that scan and this Exit
        IF ( NOT ISBLANK ( aux_ ), 24 * 60 * ( 'Gate Scans'[FieldDateTime] - aux_ ) )
)
 
 
 
 
The problem is, after 25K records I'm getting memory allocation errors in the Power BI service, which is why I returned to Excel to see if I couldn't perform the calculations outside of PBI.
 
I managed to implement the code you just provided, though curiously in one step you filter to Entry and then in the very next step you filter back to Exit. I'm not sure I understand that behaviour, though it's beside the point I think. The code works, and it's quicker until I get to around 10k, when it begins to slow.
 
I've also wondered if I couldn't break down the PBI solution above into a couple different stages to ease the burden.
 
 
 
I've added a file with 20k dummy records. Currently YTD we're up to ~110K rows, so I figure we'll be into ~450-500k by the year's end. I would love to not be constrained by a rolling 30 days. 
 
 
Also, your code only worked for me when I changed the source to a table and not a queried SharePoint folder, but that may be a problem for another time.
 
Your help, as always is greatly appreciated.
 
Posted : 12/03/2021 7:58 am
(@catalinb)
Posts: 1937
Member Admin
 

Hi Adam,
The last query provided took 2 min and 20 seconds on my PC to process those 20k records and generated a little over 10k rows.

If we extrapolate, 100k records will take around 12 min. Will that work for you?

 
Posted : 12/03/2021 10:06 am
(@patri0t82)
Posts: 43
Trusted Member
Topic starter
 

Hi Catalin,

I suppose it would get the job done. Though for our purposes, where ~35k = 1 month, ~110k = 3 months and ~450k = 1 yr, I'm not sure anyone has the appetite for 3 months. Maybe it's better than 1.

I guess my hope in all this was a stronger solution that wouldn't burden the system in the same way the calculated DAX column does, yet also performs faster and gives us a whole year view.

My suspicion is that using your query for a rolling 30 days is the most viable, stable and efficient method of accomplishing the goal.

If you wouldn't mind answering a couple of questions: 1) Can you think of a better way of utilizing the DAX solution that accomplishes the same goal with less strain? 2) When I attempted to implement your PQ code with our SharePoint folder source, I was getting errors; it only worked when I ran the query against a flat table. Would you have any thoughts on that?

 
Posted : 12/03/2021 12:16 pm
(@catalinb)
Posts: 1937
Member Admin
 

Power Pivot (PP) is much faster, that's why I mentioned it.

See the attached PP version; it takes a few seconds for the 20k test data.

I split the data into Entry and Exits; the calculated column formula is very simple, and no relationship is needed.
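
The exact formula is in the attached file. As a rough sketch of the idea only (assuming the Entry and Exits tables both have Name, Company and Field Date Time columns, as in the sample data, and an illustrative column name), a calculated column in the Entry table can pick up the first exit after each entry along these lines:

Exit Time =
MINX (
    FILTER (
        Exits,
        Exits[Name] = Entry[Name]                              // same person
            && Exits[Company] = Entry[Company]                 // same company
            && Exits[Field Date Time] > Entry[Field Date Time] // first scan after this entry
    ),
    Exits[Field Date Time]
)

Because the lookup compares plain column values per row, no relationship between the two tables is required.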

 
Posted : 12/03/2021 12:45 pm
(@catalinb)
Posts: 1937
Member Admin
 

Getting data from a SharePoint folder is not the same thing as getting data from a local file or folder; you have to adjust the query.

Microsoft 365 has the option to get data from a SP folder: Get Data > From File > From SharePoint Folder
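
As a rough sketch only (the site URL and the file name below are placeholders, and the exact steps depend on how the files are stored), the Source part of the query would change to something like this, with the rest of the transformations staying the same:

let
    // connect to the SharePoint site that hosts the folder (placeholder URL)
    Folder = SharePoint.Files("https://yourtenant.sharepoint.com/sites/YourSite", [ApiVersion = 15]),
    // pick the workbook that holds the gate scan data (placeholder file name)
    File = Table.SelectRows(Folder, each [Name] = "GateScans.xlsx"){0}[Content],
    // from here on, the steps mirror the local-file version of the query
    Source = Excel.Workbook(File, true){[Name = "AdamS"]}[Data]
in
    Source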

 
Posted : 12/03/2021 1:00 pm
(@patri0t82)
Posts: 43
Trusted Member
Topic starter
 

Thanks for sending along the sample file, though unfortunately it's confused me a little. I see Query1 is the AdamS table, Entry2 has the corresponding Exit timestamps, Entry fails to load (it gives "Expression.Error: 5 arguments were passed to a function which expects between 2 and 4."), and Exits is loaded to the data model, which I'm assuming populates the pivot table.

When I refreshed Query1 it took about 2:15, and the same for Entry2. For only ~10k records this seems like a long time, especially if we hope to go over 1 yr of data.

Also, just to be clear, the DAX measure I'm using in Power BI runs against a SharePoint folder source (which I've also tried using in Excel Power Query). The SharePoint folder model appears not to work with this Excel query you're providing.

 
Posted : 12/03/2021 3:42 pm
(@catalinb)
Posts: 1937
Member Admin
 

Query1 and Entry2 are 2 different PQ solutions; the processing time is almost the same, but I doubt it can be made faster than it is.

Entry and Exits are loaded to the data model. If you get an error in Entry, you might have an older version of PQ that doesn't support the fifth (column type) argument of Table.AddIndexColumn.
Try = Table.AddIndexColumn(FilteredRows, "Index", 0, 1) instead of = Table.AddIndexColumn(FilteredRows, "Index", 0, 1, Int64.Type)

Again, just to be clear:

Getting data from a SharePoint folder is not the same thing as getting data from a local file or folder; you have to adjust the query.
Microsoft 365 has the option to get data from a SP folder: Get Data > From File > From SharePoint Folder

If the PP version is acceptable, you can replicate the structure in your PBI model; you have everything you need in the file provided.

The SharePoint Folder model appears to not work with this Excel query you're providing.

There can be many reasons for this; my guess is that you did not properly update the query for the PBI version.

Of course, without seeing what you did and without a proper error description, I have to take your word for it 😉

 
Posted : 12/03/2021 4:15 pm
(@patri0t82)
Posts: 43
Trusted Member
Topic starter
 

I really appreciate you taking the time to support this. I am building two reference tables in PBI from the Gate Scans source (Entry and Exit). I will attempt to replicate what you've done in PP. I see you created a key from Name and Company; would the unique SBIID1 column not be sufficient?

 
Posted : 13/03/2021 12:20 pm
(@catalinb)
Posts: 1937
Member Admin
 

I used the fields you indicated originally:

Is there any easy way to subtract one line from the previous line given the name and company match[..]

Use as a key whatever you know can be used as an identifier. I have no idea what that SBIID represents; you know best.

 
Posted : 13/03/2021 12:30 pm
(@patri0t82)
Posts: 43
Trusted Member
Topic starter
 

I was optimistic about the PP solution initially. I had no idea how to make it work in PBI, so I went back to your Excel file, removed the two tables I didn't need, and then swapped the AdamS dummy table for the correct information (100k rows). Once I did that, the two columns for the Entry and Exit timestamps were removed from the pivot table. I've attempted to re-add the Entry column first, and it's been taking at least 20 minutes to put back on there (it still seems to be working away as I type this). Something tells me the PP solution may not be viable after all.

The GateScans query is now 114k rows, Entry 57,834 and Exit 56,332.

I feel like I've reached the end of the road though.

 
Posted : 13/03/2021 4:11 pm
(@catalinb)
Posts: 1937
Member Admin
 

Columns are removed from a pivot table ONLY if you overwrite the headers as well when you paste the real data into the source table.

Why are you replacing the headers? Just paste the data.

 
Posted : 14/03/2021 12:39 am
(@patri0t82)
Posts: 43
Trusted Member
Topic starter
 

Please forgive the delay in responding. I had to change the data source to connect with our actual source data, at which point it lost those two columns, which I was unable to re-add.

I ran the report last night using the original filtering solution (in the link at the very top), and it worked, though it took about 7 hours to get through the 59k rows for the three months. I meant to have only one month selected. That may be a solution, though. To the point of this topic, I've got the code wrapped in Table.Buffer and it didn't seem to increase the speed at all, which tells me I may not quite understand what it does.
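
For context, Table.Buffer only pays off when the buffered table is built once, in its own step, and then reused many times, for example by a per-row lookup; wrapping it around a table that is only read once, or calling it inside the row-level expression, gives no speed-up. A minimal, self-contained sketch with made-up data (not the real query) of where the buffer belongs:

let
    // tiny stand-in tables just to show where the buffer belongs (hypothetical data)
    Entries = #table({"Name", "EntryTime"}, {{"A", 1}, {"B", 2}}),
    Exits = #table({"Name", "ExitTime"}, {{"A", 3}, {"B", 4}}),

    // buffer the lookup table ONCE, in its own step...
    ExitsBuffered = Table.Buffer(Exits),

    // ...then reuse the buffered step inside the per-row expression.
    // Writing Table.Buffer(Exits) inside this "each" instead would rebuild
    // the buffer for every row and give no speed-up at all.
    WithExit = Table.AddColumn(
        Entries,
        "ExitTime",
        each
            let CurrentName = [Name]
            in Table.SelectRows(ExitsBuffered, (r) => r[Name] = CurrentName){0}[ExitTime]
    )
in
    WithExit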

 

Finally, in the file you sent me above, where you had a query for Entry scans and another for Exit scans, can you explain how you intended to rejoin the data? I saw only one index column and I couldn't figure it out (which is why I re-tried the original solution).

 
Posted : 17/03/2021 6:50 am