May 16, 2020
Hi Mynda,
Thank you for the tips to speedup refresh times, however, despite disabling the background data preview, I am unable to see much of an improvement in the refresh times. Moreover, the load to data model time is even horrible.
For your reference, I usually work with data that averages 5 million rows and my hardware configuration is as follows:
Processor - i5
RAM - 8GB
Do you think the processor and RAM could be the issue?
Any help would be appreciated
May 16, 2020
Hi Mynda,
Shared below is the M code for one of the many transformations I do in a single excel file.
Data source is a shared location on a network. The rows count is close to 5 million and the time it takes to load to data model is at least 35 mins.
let
Source = Folder.Files("\\ant.amazon.com\dept-as\BLR12\TRMS\BLR BRI\Salman\QBR\Raw Data\FCB\FCB$"),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
#"Filtered Hidden Files1" = Table.SelectRows(#"Removed Other Columns", each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from FCB$", each #"Transform File from FCB$"([Content])),
#"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File from FCB$"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from FCB$", Table.ColumnNames(#"Transform File from FCB$"(#"Sample File (2)"))),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"FCB", type number}, {"Max Week", type text}, {"Rpt Week", Int64.Type}, {"Rpt Year", Int64.Type}, {"Goal * Vol", type number}, {"Goal", type number}, {"Achieved", type number}, {"Rpt Month", type date}, {"Temporal Dimension", type text}, {"SUM([Numerator])/ 10000", type number}, {"Category1", type text}, {"Function Name1", type text}, {"Marketplace Id1", type text}, {"Number of Records", Int64.Type}, {"Period Name Mth1", type date}, {"Proposed Goals", type number}, {"Sub Category1", type text}, {"Sub Metric Name1", type text}, {"Actual Denom", type number}, {"Attribute Id", type text}, {"Calendar Date", type date}, {"Category", type text}, {"Denominator", type number}, {"Function Name", type text}, {"Location Name", type text}, {"Login", type text}, {"Manager", type text}, {"Marketplace Id", type text}, {"Metric Name", type text}, {"Numerator", Int64.Type}, {"Outsource Flag", type text}, {"Period Name Mth", type date}, {"Rpt Period Name Week", type text}, {"Sub Category", type text}, {"Sub Metric Name", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Location Name", "Site"}}),
#"Inserted Month" = Table.AddColumn(#"Renamed Columns", "Month", each Date.Month([Calendar Date]), Int64.Type),
#"Added Conditional Column" = Table.AddColumn(#"Inserted Month", "RoN", each if [Site] = "BLR" then "BLR" else if [Site] = "VIR" then "BLR" else "Rest of Network"),
#"Merged Queries" = Table.NestedJoin(#"Added Conditional Column",{"Category"},CatAlias,{"Category"},"CatAlias",JoinKind.LeftOuter),
#"Expanded CatAlias" = Table.ExpandTableColumn(#"Merged Queries", "CatAlias", {"Alias"}, {"Alias"}),
#"Renamed Columns1" = Table.RenameColumns(#"Expanded CatAlias",{{"Alias", "Cat Final"}}),
#"Replaced Value" = Table.ReplaceValue(#"Renamed Columns1",12,1,Replacer.ReplaceValue,{"Month"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",5,4,Replacer.ReplaceValue,{"Month"})
in
#"Replaced Value1"
Power Query
Power Pivot
Xtreme Pivot Tables
Excel for Decision Making
Excel for Finance
Excel Analysis Toolpak
Power BI
Excel
Word
Outlook
Excel Expert
Excel Customer Service
PowerPoint
November 8, 2013
Hi Salman,
Background data preview might help only when you are in power query editor, that's the place where the preview is used. In a normal refresh there is no preview.
With large data, you have to optimize the process as much as possible.
Merging queries might not be a good idea, because it makes no sense at this point, I assume you will not load data to sheet.
Load data to power pivot, then create a relationship between those 2 tables, instead of merging them, the result will be the same but more efficient.
If not possible, try adding Table.Buffer before merge.
You are using default combiner, which is not usually flexible, you should build your own functions to combine files and sheets, see an example here: https://www.myonlinetraininghu.....npivotting
Also, if you have data that does not change (historical data), might be a good idea to convert it once and store the converted data, next time load the converted data instead of processing same data each time, will be faster to load.
May 16, 2020
Hi Catalin,
Thank you so much for the reply.
- Now, I have an even more clear picture about the background data preview.
- With respect to merging queries, what you said actually makes sense, will try building relationship between the two tables.
- Could you please show me the exact place and syntax to add the Table.Buffer
- I am quite new to power query, so at this time, building my own functions to combine files would a challenge.
- Right now, there's no historical data. Will keep your suggestion in mind.
May 16, 2020
Hi Mynda,
Thanks for the reply.
This almost clears the confusion that I have about the Table.Buffer except for one thing - In the post "Where to place Table.Buffer", you had asked to wrap it around the Source (which is ideally the first step in power query) whereas Catalin (reply# 4) had suggested to add Table.Buffer before merge. Let's say I'm doing a merge operation at step #10, does it mean I need to add Table.Buffer at step 9?
Besides, I really appreciate Catalin's suggestion on building relationship between the two tables instead of merging them. I'm able to save quite some time following his advice. However, I have a follow up question to what Catalin suggested - if I need to use the merged column for some operation in the source data, then just building a relationship won't help is what I think, you actually need to do the merge so that you have the column in the source data. Please correct me if I'm wrong.
Thanks
Power Query
Power Pivot
Xtreme Pivot Tables
Excel for Decision Making
Excel for Finance
Excel Analysis Toolpak
Power BI
Excel
Word
Outlook
Excel Expert
Excel Customer Service
PowerPoint
November 8, 2013
Hi Salman,
Try both places (Source, step 9), see which one is faster. Buffering is a trial-error process, not a fail-proof recipe, sometimes works, sometimes don't.
About merging or using relationships:
it really depends on what you need to do. Best is to reconsider the operations needed, if some of them can be done in power pivot without merging tables in power query, will help you speed up the process.
Most operations can have more than one solution, if you can reconsider that merge operation you might be able to find a solution without merging.
1 Guest(s)