July 7, 2016

My Excel forecasting program has slowed down. When I enter a number it takes half a minute to recalculate, when once it was almost instant. Calculation is set to automatic and there are no external connections. There are no Pivot Tables or Power Queries. There are no "volatile" functions and any VLOOKUPS have been replaced by INDEX/MATCH formulas. There are no functions that reference entire columns or rows ("C:C") or ("1:1"). The file stands at 12MB in binary xlxb format (probably 20MB in normal xlxs format). However, the file is loaded with array and SUMPRODUCT formulas.

I ran an experiment to see where the problem might be. I copied and pasted values over two of the sheets in the workbook. These two sheets have formulas like this:

=IF(INDEX(Data_Load_Complete_DRG_DB,MATCH(BI$1,Dates_DRG_DB,0))=TRUE,IFERROR(SUMPRODUCT((Maps_To_Class="Medicaid")*(MDC=$G33)*(Dates_DRG_DB=BI$1)*(DRG_Weight)*(DRG_DB))/SUMPRODUCT((Maps_To_Class="Medicaid")*(MDC=$G33)*(Dates_DRG_DB=BI$1)*(DRG_DB)),1),BH33)

What these formulas do is extract data from a database and arrange it for use in a summary format as named ranges elsewhere in the model. It first evaluates whether we are dealing with an actual results (historical) month or whether it hasn't been reported yet (it's a future reporting period for which we will prepare a forecast). And if there's an error, for example, if there is no volume which would produce a #DIV! result, then default to the prior month. It actually works really well. These two sheets have 120 months along the top columns and 180 categories down the rows. There are two similar sheets like this with a combined 52,000 cells.

I ran a profiler program, and what it showed was that 3/4 of the total calculation time is taken up by the above. There is one other sheet in the workbook also costing a lot of time, and it also performs calculations against the same database, this time to figure out the weighting for different classes of patients which further determine expense levels and revenues.

There are some cells that reference these sheets in the Assumptions tab, and these areas are where the slowness issues occur. They are not themselves formulas, but simple inputs at the head of large dependency trees. Some of these assumption inputs affect small dependency trees, and these react in 2-3 seconds. But those inputs at the head of large dependencies can take 30 seconds.

Do you have anything to suggest? I'm running an 8GB RAM Mac using Excel for Windows. That may be part of the problem. Perhaps a 64 bit, 16GB RAM Windows-only machine might do the trick. Since this file will be used by others, I'd rather not stipulate that they must have a very fast machine or be prepared to wait, but if that's the only solution, then so be it.

Anything you can suggest would be greatly appreciated.

Paul

Dashboards

Power Query

Power Pivot

January 13, 2016

Dashboards

Power Query

Power Pivot

January 13, 2016

July 16, 2010

Hi Paul,

The solution is to find another way to achieve what those bottleneck formulas are doing. Avoiding array formulas if possible.

Without fully understanding your formula in the context of your file it's difficult to be specific, but some suggestions are:

1. break the formulas down into helper columns. Less nesting of functions = less processing power required (generally).

2. reformat your data so you can use PivotTables instead of those formulas

3. use Power Query to clean the data and then extract what you need with either Power Query or PivotTables.

Hope that helps.

Mynda

July 16, 2010

Hi Paul,

There's no set of rules that I'm aware of. Array formulas can perform multiple calculations on one or more of the items in an array. Sometimes the same results can be achieved using multiple steps/formulas, which can be quicker to calculate than the single array formula.

Sometimes an array formula is the only way to achieve the desired result, but having looked at your example above I think you should be able to replace (at least some of ) those formulas.

Mynda

July 7, 2016

By the way, a friend ran this file on his machine. It's about the same as my computer, except that it has 16GB RAM whereas mine has 8. We both have about 2.3-2.6 gigahertz processors. He could recalculate a cell in about 15 seconds, whereas my recalculation time for the identical cell was 35 seconds. For a good user experience, that's still too long. I'm wondering if 32GB or even 64GB of RAM, and possibly a 4 gigahertz processor would fix everything. But I can't test this assumption until I actually buy a new computer, something I don't want to do just to find out.

November 8, 2013

Hi Paul,

Attached is a visual basic module with a code that will measure the calculation time for different options:

Range recalculation time (select a specific range, then run the code RangeTimer). This will give you the time used to calculate the selected range. It will help you identify the most "expensive" formulas, in terms of calculation power.

Sheet recalculation time

Active workbook recalculation time

Full recalculation on all open workbooks.

Note that if you make structural changes in a sheet (inserting cells, rows, columns), Excel will rebuild the entire calculation tree, taking the most time.

As Mynda said, the only option is to use power query and /or pivot tables to aggregate data and perform preliminary calculations, that will allow you to use simplified formulas.

Even if you don't use CSE key combination to enter SUMPRODUCT formulas, this function is a native array formula, and works exactly the same as an array formula. 52000 rows with such formulas will certainly slow down a file.

If you upload a simplified model with personal data removed or replaced, we can try to identify alternative solutions.

July 7, 2016

Hi Catalin,

I did run a profiler program before. It took 3-1/2 days. What it showed was that 3/4 of the total calculation time is taken up by formulas that extract data from a database and perform calculations against it.

I've attached an extract file with the culprits. The first tab is the inpatient database. There are no formulas in there except for a total at the bottom. The other tab, Avg Inpt Wts, extracts the data from the database, performs calculations to determine the average weight for each patient group, and assembles it into a format useful for other parts of the model. The weights from this tab are used for determining workload and expenses. Other parts of the model use Avg Inpt Wts as a database to find the patient weights for the different patient groups.

The formulas in Avg Inpt Wts are all the same, like so:

=IF(INDEX(Data_Load_Complete_DRG_DB,MATCH(BE$1,Dates_DRG_DB,0))=TRUE,IFERROR(SUMPRODUCT((Maps_To_Class="Medicare")*(MDC=$G12)*(Dates_DRG_DB=BE$1)*(DRG_Weight)*(DRG_DB))/SUMPRODUCT((Maps_To_Class="Medicare")*(MDC=$G12)*(Dates_DRG_DB=BE$1)*(DRG_DB)),1),BD12)

The formula checks if the data load is complete for that month, and if so, finds Medicare patients by diagnostic category for that month, multiplies by the patient weights that match that criteria, and then divides by the count of those patients to get the average weight. If the data load isn't complete, then it uses the prior month's weight. Basically what I have here is a SUMPRODUCT formula multiplying the weights from column E in the database by the patient sum meeting the criteria. There are 120 date columns and 212 rows, for about 25,000 cells containing the above formula.

There is an identical database and weights sheet for the outpatient side, so that makes 50,000 cells with these formulas in the workbook.

The formulas don't seem to me too complicated, so I don't know what's going on, but it sure takes a lot of time to recalculate. If you change one of the patient counts in the DRG Dbase tab, you'll see what I mean.

Paul

November 8, 2013

Hi Paul,

From the first look, the data structure is not right. Any database should expand in rows, not in columns. Data strored in excel should respect databases structure, then you will be able to use excel built in tools for reporting.

Attached is an example of a simple pivot table report made from your data. I reorganized the data properly with power query and added a calculated coilumn in Power Query for DRG_Wt * Value, I used this column in report aggregated as an average.

I definitely do not recomend using such a large number of formulas, a simple pivot table will work much faster and with larger data sets.

You mentioned that you are not referring to full columns in your formulas, but found a much worse case in your file: there is a formula referring to a defined name DRG_DB, multiplying it with DRG_Wt. That named range has 120 columns! 50000 rows multiplied with 120 columns, that's a huge load on the calculation tree.

Regards,

Catalin

Answers Post

July 7, 2016

Thank you. I had no idea that database structure mattered. I had organized it with dates across the top report-style instead of down the rows, because that's the way that my report worksheets are organized throughout the model. I will need to experiment with this approach.

Paul

Most Users Ever Online: 57

Currently Online: Mark Lund, Rudi, Max Neumann

16 Guest(s)

Currently Browsing this Page:

1 Guest(s)

Top Posters:

SunnyKow: 651

Frans Visser: 210

David_Ng: 96

mey tithveasna: 71

A.Maurizio: 60

rathanak: 58

yhooithin05: 54

Anders Sehlstedt: 47

julian: 46

PaulFogel: 37

Newest Members:

Ajay Mukhi

Stephen Motte

Rodney Hoyle

michael pitre

Samantha Burleson

Krishnaa Kurugod

David McSwain

Shameem Nassiree

Alanna Henderson

Janel Oz

Forum Stats:

Groups: 2

Forums: 18

Topics: 935

Posts: 4405

Member Stats:

Guest Posters: 1

Members: 42350

Moderators: 1

Admins: 3

Administrators: Mynda Treacy, Philip Treacy, Catalin Bombea

Moderators: Genevieve Tupas