There are several ways to rank things, dense ranking is when items that compare equally receive the same ranking number, with subsequent items receiving the next ranking number, with no gaps between numbers, like this:
I have some data in an Excel table for students who are studying Spanish or English. They've just taken exams and I'm going to use Power Query to create a dense ranking for the scores they received in those exams.
I'm going to use two queries to load the table of data. The first query called Scores just gives us the same 20 row table, sorted first by Course (ascending) and then by Score (descending).
The 2nd query is called Ranks and this is where most of the work is done. After loading the same source table, the first thing to do is remove all columns except Course and Score.
Then remove duplicates in the Score column
Next, Group By the Course column
To give us a table with two rows, one for each course. Each row in the Count column contains a table that contains all the scores for that course.
Now the best part, by adding a Custom Column and using this to add an Index Column to each item in the Count column, because each item is a table of scores for that course,
you end up with another table in each row of the new custom column (called Rank) that has assigned a ranking (index) to every score for each course.
Now by expanding the tables in the Rank column you end up with this, a table with each score in each course ranked.
There are only 13 rows in this table but we have 20 rows in our source data so we need to merge (join) the Scores and Rank tables together into a new query
The result is a table in every row of our new query that includes the rank for that combination of Course and Score.
Expanding the column of tables gives this
Can you see the problem? All the ranks are wrong.
So what is going on? I'm not 100% sure. I've read several blog posts and articles where similar issues are described, and I've seen this same kind of problem occur with sorting and removing duplicates.
My understanding is that Power Query presents one view of how data is stored, as in the end result of the Ranks query above, but it actually stores it in another way/order.
This does seem odd but the explanation I've seen given is that PQ uses lazy evaluation - it only really evaluates something when it is actually needed. So as you are going through building a query with various steps, the data you see in the preview isn't necessarily the data you're going to get when you run the query for real.
I'm not convinced that this is desirable behaviour, but the solution appears to be to use Table.Buffer. Table.Buffer takes a table and stores it in memory after evaluating it. This seems to be the key point.
As you add steps to your query, and as that query is run, each step is evaluated and the data in the step may be evaluated many times. What does it mean to evaluate? It means PQ checks the data to see what it is. But there appears to be no guarantee that the data is stored in an expected, ordered state.
You could sort a list but in a subsequent step that sorting is lost. Or as we have here, we've created a ranking that isn't applied correctly, even though when you examine the table in the Ranks column, it shows you the correct rank.
What is really puzzling is that as the query is doing a merge, it is matching up two columns, the Course and the Score, so shouldn't it follow that the Dense Rank value in that row in the Ranks table should be correct?
The fact that it isn't would imply that the join isn't working properly. If the join can attach the correct Course and Score from the Ranks table to the Scores table, why is the Dense rank value wrong?
Anyway, the fix is to wrap the Ranks table in Table.Buffer inside the join step.
Buffering the table like this means the table is held in computer memory in a known state. The query evaluates Ranks once and then does not evaluate it again. The order of the elements in the table won't change.
With Table.Buffer in place, the result of the join is now correct.
We might have to ask Microsoft what is actually going on here.