I've had a chance to start to build out a costing model with a lot of data manipulations (e.g. category rollups and splits). To start small, I'm using some test data. The raw, test data has about 4 Excel tables, with anywhere from 3 to 30 rows of data in it. I also added 2 other Excel tables, each with 3,000 rows of data in them (about 3 columns wide).
In my model so far, I've got about 30 queries. It's not that I've built the queries any more granular than necessary, but it's the beginning of a very complex financial model, query-wise, although not math-wise. The model is solely in Power Query. Somewhere along the way in my model building, I've observed very slow Excel performance on the refresh. In particular, this occurs when I refresh any 1 of the 5 pivot tables I've built from the model, and also when I relaunch Power Query and click on any of the queries to view its data. The refreshes sometimes take about 30 secs to 1 minute, although not all the time.
My hypothesis is that it's not the data sources that are slowing things down, but the 30 or so queries I've built. I don't have any proof though. Since I'm building the model, I'm tinkering with it, and constantly refreshing it. So, although 1 minute is not a long time to wait, it feels like it especially when I'm making a small change to the model and waiting to update a pivot table to ensure I've made that change correctly.
I've looked online and one site said that PQ updates ALL the queries when you do a refresh, so the more queries, the longer it takes. Another site said to enable the Fast Data Load under Query Options. I've done that, but it doesn't seem to help much.
Is there any workaround to this issue? My source data is in the spreadsheet where I have my PQ model. I could move my source (Excel) data back out to external XLS files, but I would think that this would only slow things down further. I do have 8 GB RAM. Any thoughts would be appreciated.
Thanks,
Cory
Hi Cory,
Power Query can push processing back to external databases like SQL etc., which is called query folding. However, as your data is in Excel it won't fold the queries anywhere. In other words, Excel is doing all the work as query folding to an external database isn't available.
You might want to try moving your source data to a CSV file to see if this improves performance, as described here.
This post explains how you can use a process monitor to find out how much data Power Query reads from a file.
This post has some information about how Power Query caches data, which is particularly important if you have queries referencing other queries.
Hope that helps.
Mynda
Hi Mynda,
To speed up performance of my small example, I tried converting all the Excel tables to external .CSV sources and then updating my queries to read them. the good news is that I improved performance when opening the file and query editor from scratch. The bad news, when I opened the last query in the stack, is that performance improved only from 4 min, 8 secs to 3 min, 50 secs. So, unfortunately, that didn't help that much.
I did appreciate the article in your 2nd link in learning more about how PQ caches data. In particular, that article led me to others which talk about the Table.Buffer command. That put me into the deep end of the pool though in that I'm trying to figure out where that should be use (on both source and subsequent queries) and exactly what the syntax is. So, that would be a great topic for a future blog entry on exactly how to use that.
Another site recommended doing as much manipulation in DAX (PP) as possible since the compute engine is faster in DAX than it is in M. The caveat to that advice was to keep the manipulations in PP in case you need it for subsequent queries, which I do.
I'll continue to investigate how to use Table.Buffer in my queries to see if and how this can improve performance though.
Regards,
Cory
Hi Cory,
Another idea is if you don't need to preview all of the queries you can turn off Background Data; go to Query Options > Current Workbook: Data Load > deselect 'Allow data preview to download in the background'.
The syntax for the buffer functions is to simply wrap it around the Source line of code e.g.
Binary.Buffer (preferred):
Source = Excel.Workbook(Binary.Buffer(File.Contents("C:myndamy_file.xlsx")), null, true),
Table.Buffer:
Source = Table.Buffer(Excel.Workbook(File.Contents("C:myndamy_file.xlsx"), null, true)),
Mynda
Hi Mynda,
Thanks for your suggestion about turning off Background Data in Power Query. That was a game changer! That's the only change I made and it allowed my queries to be refreshed in a matter of seconds. I haven't tried using the Excel buffer command, but I may need to try it when I load my model with larger amounts of data. But at least, I now have the correct syntax for using the buffer command, if needed.
Thanks again.
Cory
Great! So glad it was a simple change.
Mynda Treacy said
Great! So glad it was a simple change.
I registered on here just to say thank you as thats saved so much time! Thanks!
🙂 glad I could help, Joe!
I have an Excel file where I was stacking queries pulling from Salesforce. I simply tried your tip Mynda and it changed my refresh time from 15-20 minutes to 3! This is awesome, Thanks!
Thanks Mynda! Turning off the data preview literally saved my life! Thanks! 😀
Mynda Treacy said
Hi Cory,Another idea is if you don't need to preview all of the queries you can turn off Background Data; go to Query Options > Data Load > deselect 'Allow data preview to download in the background'.
The syntax for Table.Buffer is to simply wrap it around the Source line of code e.g.
Table.Buffer(Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content]),
I would apply it to the first query, assuming that is the one that your other queries are referring to and see if that helps.
Mynda
Hi Mynda,
Let me add my kudos and praises for this elegant solution. My workbook takes very long to refresh 23 queries. I've been wrestling with it all day! This solution radically speeds things up!
Great to know it was helpful, Gary!
Hey Mynda
My problem is also regarding slow refresh but my database is google sheets.
I have few google sheets with around 150-200 tabs in each sheet, though the data in those sheets is not more than 20 columns and 100 rows. And the file size is just 3 MB.
Still the refresh time is more than 20 minutes. Also any change I do in like changing data type or deleting any column takes 2-3 minutes. And if I have to save the file its almost like a nightmare. I can take a nap, have lunch and do some shopping till the time it get saved. I hope now you can imagine how slow it is.
Any suggestions what should I do for a faster refresh?
Thanks in advance 🙂
Hi Mayank,
You can take a look at this topic: https://www.myonlinetraininghub.com/excel-forum/power-query/power-query-very-slow
As you can see there, it's a matter of what your query does. As I said before, trying to imitate excel functions in PQ might lead to highly inefficient queries.
You will have to provide more details to see what can be improved.
Hi Catalin
Thank you for the response.
Here I am adding a sample file (though its not the complete file, but I hope it would be helpful in clearing the situation).
This sheet contains just 3 tabs where as the original one contains over 100 tabs.