A running total (or cumulative sum) is when you add the previous value to the next, basically you're summing values every step of the way.
Creating a running total in Excel is easy. Mynda has written about a couple of ways you can do it
Running Totals in Excel Tables
But creating a running total in Power Query is not so straight forward, at least if you want a fast, usable query.
Grouped Running Totals
Follow this link if you are looking to calculate grouped running totals in Power Query.
Watch the Video
Download the Workbook with Data and Queries from this Blog
Enter your email address below to download the workbook with the data and code from this post.
Running Totals in Power Query
You could use List.Sum and List.FirstN, but this approach is extremely slow for a lot of rows.
For a handful of rows you won't notice it much but I did some tests with 100,000 rows and after 45 minutes it hadn't even gone through half the records.
The formula to calculate a running total this way is
where our table looks like this
List.FirstN takes two parameters and returns a list as its output. The first parameter is a list, which in this case is the values in the Value column.
The second parameter is a number or a condition that tells List.FirstN how many elements in the list to return. I'm supplying it with a number specified by Index.
So if Index is 3, List.FirstN returns a list comprised of the first 3 items in the Values column {36, 63, 89}.
List.Sum adds these values together to give the result 188.
By doing this for every row we can generate the running total.
What makes this approach slow is that for every row in the table, the Value and Index lists are re-evaluated.
The official word on how evaluation works (at least what I can find) is a bit sketchy. But my understanding is that for each row the query must check the entire Value and Index lists. It's like it forgets the lists exist and then for every row goes, oh, better go and see what these lists are, how many items in them, check that they contain valid data, etc.
Doing this 100,000 times for two lists takes a lot of time.
You can speed up the query by using List.Buffer. If you click that link you'll see the description of the function is a bit cryptic. But it works something like this.
List.Buffer loads a list into memory where it stays whilst the query is running. Most importantly, the query doesn't 'forget' about the list. So every time you refer to the list, Power Query doesn't have to go and evaluate the list all over again.
These combination of things makes using buffered lists significantly faster than non-buffered. There are some trade-offs though, buffering breaks query folding so queries to databases can actually be made slower by using buffering. But in this scenario query folding isn't being used so there's no downside.
Creating and Using a Buffered List
To buffer the list just call List.Buffer and then alter the reference to the Value list in the List.FirstN function to use that buffered list.
This query is a lot faster than our original non-buffered query but there's still an issue.
As the query is using an index column to indicate how many rows (values) List.FirstN should return and hence be summed by List.Sum, the query is still having to evaluate the Index list on each pass.
Trying to buffer the Index list actually doesn't help. Even if you did create a buffered Index list, you'd need to explicitly keep track of what row the query is currently operating on and then use that variable to access the buffered Index. There's no mechanism to do this in a query like this.
List.Generate
Being able to work with lists that are all buffered, and explicitly track what row you're working on will give you the best performance in this scenario, using List.Generate allows you to do this.
Based on a set of initial values, and rules that determine how those values change, List.Generate creates a list of values.
Iteration/looping is controlled by a loop counter - you aren't evaluating a list on each iteration. Which is why you don't need an index column for this approach.
The function syntax is
written in a way that's easier to understand
Thinking of it as a While loop is useful:
- Start with : x = 0, y =0
- while x < 20
- y = y + 1, x = x + y
- Return the list of x values
The above psuedo-code generates this list
Creating a Running Total with List.Generate
One of the great things about using List.Generate versus the List.Sum/List.FirstN approach is that we only need the list of values. An index column isn't needed because List.Generate can keep track of what row it's working on using a counter variable.
I'm creating a custom function called fxRunningTotal to generate the running total, and passing in the list of values as the only parameter. The function returns the running total as a list. Here's the code.
Each of the four lines in the List.Generate code can be explained as:
- Start with : RT = values{0} (the first item in the list), counter = 0
- while counter < the number of items in the values list
- RT = RT + the next item in the list, counter = counter + 1
- Return the list of RT values
The query to call this is:
As before, we buffer the Values, then pass them into the fxRunningTotal function.
The query then creates a table from the Source[Value] column and the list returned from the function that contains the running total.
Giving us this:
This query creates a running total for 100,000 rows in a couple of seconds on my PC. Your experience may be a bit different but it is an order of magnitude faster than using List.FirstN.
Grouped Running Totals
If you need to create a running total that is grouped based on the values in another column, for example, the image below shows running totals grouped by country, then you need this Grouped Running Totals in Power Query.
David Evans
Hi Philip,
In PQ, I’m trying to create a rolling sum of the last 3 months of values rather than a cumulative of all the values. I know how to do this in DAX and your examples show that, but i’m really struggling to do it in PQ when following the example on this page where a buffer is used.
Thanks in advance.
Philip Treacy
Hi David,
Could you please start a topic on our forum and attach your data/file so we can give you an answer that is specific to your situation.
Thanks
Phil
Sven
Thanks for the explanation! The code works but I have substantial runtime issues. Not a big dataset, 40 columns and ~15k lines.
I have a sneaking suspicion it´s the last step when you add the base columns and “running total” as lists back into one table. Any thing I´ve missed here to adjust? Do I need to buffer the whole base table, not just one column from it? If so, how would I do this?
Thanks!
Philip Treacy
Hi Sven,
I don’t think you need to buffer the entire table as it’s the columns (lists) that the operations are being carried out on.
But if you can start a topic on our forum and attach a file with your data/query, I can take a look.
Phil
John Karagoulis
For the list.generate method, is there an easy way to create a running total up to the prior period? I have 50 years of data, and this method worked for creating a running total up to the current year for every row, but I want to also create a prior year running total that would sum all the values in the column where the year is less than the current year. Is this simple to accomplish with the fxRunningTotal function? Or is it more difficult?
Philip Treacy
Hi John,
Sounds like it can be done but I’d need to see your data to write any code. Please start a topic on our forum and attach a sample file of your data.
Regards
Phil
Andrea Hagel
Thank you very much for the post and the explanation.
The running total works well and my list is really not long. I only have 66 lines are I always generate the data for a specific time period.
What I need to know, if I can do the running total always for 12 months, so in January 2022 I want to have the total of the period from 02.2021 to 01.2022 in February 2022 the period from 03.2021 to 02.2022 and so on.
In excel that is easy but I am really struggling with Power Query.
Thank you for your help!
Philip Treacy
Hi Andrea,
Yes this is possible, but without seeing your data I’m not 100% sure how to implement it. Assuming that each row holds the data for 1 month, you can implement the running total like this
Download this example file to see this.
If that doesn't work for you, please start a topic on our forum and attach a sample of your data.
Regards
Phil
Venkatesh Avvaru
Thanks A Lot, its working
it is very helpful to Me…
Philip Treacy
No worries.
Paul Staggers
Hi, Thank you for the excellent article/video. What would the query look like if I had several distinct running totals to calculate, all under the same query? I’d like to call fxRunningTotal several times in the same query, each time for a different item. Possible?
Application is a Pareto Chart in Power BI. I am trying to calculate running totals and % of total as a function of time. I can do it using DAX measures, but wondered if calculated columns using your function would be a better way than creating tons of measures. Data generically looks like this:
Date | Fruit Total | Apples | Apples (%) | Oranges | Oranges (%) | Bananas | Bananas(%)
Catalin Bombea
Power Query is much slower than DAX, using tons of PQ functions will slow down your app.
Carlos
Hi all,
I wanted to ask you if it would be possible to use this function inside AddColumn. I was trying to add the result as a column in the source table but I dont find the way.
Greetings and thanks
Catalin Bombea
Hi Carlos,
A classic way to add a running total column looks like this:
let
Source = Excel.CurrentWorkbook(){[Name=”APIA”]}[Content],
#”Changed Type” = Table.TransformColumnTypes(Source,{ {“Amount”, type number}}),
#”Added Index” = Table.AddIndexColumn(#”Changed Type”, “Index”, 1, 1, Int64.Type),
BufferedValues = List.Buffer(#”Added Index”[Amount]),
#”Added Custom” = Table.AddColumn(#”Added Index”, “Custom”, each List.Sum(List.FirstN(BufferedValues ,[Index])))
in
#”Added Custom”
Basically, just add an index column, buffer the list of values separately from the original table, then Sum then values in the new column up to the current row index.
Alexis
This is great, but should we include the caveat to start the index at 1 instead of at 0. Starting at 0 seems to make it so the most recent entry isn’t added on, whereas starting at 1 does. I think this is because starting at 0 numerically decrements the count by one, stopping the running total from going to the most recent entry.
Alternatively, include the caveat that you can time-shift the running total by adjusting the index either side of “1”
I love this super-comprehensive post with all the main ways you would do this, and the comments are amazing too!
Philip Treacy
Hi Alexis,
Are you talking about using an Index column with the List.FirstN function? Yes you have to start the Index from 1 because to get the first row in the value column requires List.FirstN(Bufferedvalues, 1).
List.FirstN(Bufferedvalues, 0) returns nothing.
Regards
Phil
Artur
Hi,
Great function. I faced some issues with lists containing null values. I solved it by replacing null values with zeros while buffering the list using List.ReplaceMatchingItems function. I was trying to update the function but no luck. Is updating the list more efficient then changing the function? How the function can be changed to not be prone to null vales in list?
Regards
Artur
Philip Treacy
Hi Artur,
You could either change the list to replace nulls with 0 as you’ve done, or change the function to allow for nulls, like this
RT = List.Generate
(
()=> [RT = values{0}, counter = 0],
each [counter] < List.Count(values), each [RT = if values{[counter] + 1} is null then [RT] else [RT] + values{[counter] + 1}, counter = [counter] + 1], each [RT] )
Regards
Phil
Berkhan Eminsoy
The one thing is, you can’t sort running totals later on. The original list is hardcoded into the steps and if you wish to reorder or sort the columns you’re stuck with the running total of the original unordered, unsorted list. Otherwise great function, great insight!
Guillaume
Hi Phil, I would like to create a running total like this.
Everytime the value is 1, the running total starts and adds up with every other 1 values.
The running total stops (goes to 0) everytime there’s a value bigger than 1.
Is it possible?
Here’s an exemple. Thank you
Value RTotal
1 1
2 0
1 1
1 2
1 3
2 0
1 1
3 0
1 1
1 2
Philip Treacy
Hi Guillaume,
Yes this is possible, just need to change the Running Total function like so
(values as list) as list =>
let
RT = List.Generate
(
()=> [RT = if values{0} = 1 then 1 else 0, counter = 0],
each [counter] < List.Count(values), each [RT = if values{counter} = 1 then [RT] + 1 else 0, counter = [counter] + 1], each [RT] ) in RT
You can download a workbook with this code and your example data
Running Totals - Guillaume
Regards
Phil
Matthias
Hi Phil,
nice post with a good description of List.Generate. Thanks!
Off topic: You are succesfully redefining the data type for Index, whereas I get an Error:
= Table.AddIndexColumn(#”Kept First Rows”, “Index”, 1, 1, Int64.Type)
Expression.Error: 5 arguments were passed to a function which expects between 2 and 4.
Details:
Pattern=
Arguments=[List]
The Syntax description below cleary includes the optional type element. What could be the reason that it doesn’t work at my side (SAC 2002)?
Table.AddIndexColumn(table as table, newColumnName as text, optional initialValue as nullable number, optional increment as nullable number, optional columnType as nullable type) as table
Philip Treacy
Thanks Matthias,
I think you have an older version of Power Query where Table.AddIndexColumn takes a max of 4 arguments. I don’t know what version of PQ you have – what does SAC 2002 mean? You can try updating PQ but if that doesn’t work remove the last argument and set the data type in a separate step.
Regards
Phil
Matthias
Thanks Phil, I am on the Semi Anual Channel and I was just upgraded to the 2020 Feb version.
That would mean that this 5th element was introduced relatively lately. Actually my Power BI is from June 2020 and there too it is not yet available.
I tried this one step approach in the past without success and was surprised to see that you did apply it in your code.
Thanks for making me aware of the change!
Philip Treacy
Hi Matthias,
No worries. There are so many channels and updates it’s hard to keep track of who’s got what.
Regards
Phil
Jim Fitch
Terrific post, Phil. Thanks for the detailed explanations of List.Buffer & the List.Generate technique to create running totals. Understanding how the code works beneath the covers is very helpful. Looking forward to the Grouped Running Totals post.
Philip Treacy
Thanks Jim.
Jerry Helm
Philip, this is great information! I have had instances before where I needed to accomplish this but didn’t know how to do it.
In your website explanation (I got the email version), the image after “written in a way that’s easier to understand” is not being displayed.
Thanks for the great inside tip!!
Philip Treacy
Thanks Jerry, and thanks for spotting the image issue. I wasn’t seeing that in my browser (Chrome), I think it was handling the typo in the page code that was causing the image not to appear.