Today I want to share with you an Excel secret. It’s so powerful that when you know this your Excel life will become much easier.
Most of the time they’re struggling for one reason…
Their data is in the wrong format.
5 , oops, make that 6 main data format categories based on what I see most often.
Let’s first look at the perfect data format so you know what to aim for.
Excel Tabular Data - The Perfect Format
Tabular data is typically at a granular level of detail. Often transactional i.e. every row represents one transaction.
As is the case with the example above, where every row represents one order and has the related data for that order in a single row i.e. country, salesperson, units and amount.
When you start at this bottom level of detail it’s easy to summarise or slice and dice your data into any number of combinations with PivotTables, SUMIFS, COUNTIFS etc.
But when you skip this step and start with data that has already been summarised it is much more difficult to change the summary format, and it often means you can’t use a lot of the built in Excel tools and functions in the way they were intended.
You might think you're saving time by skipping this step but it will eventually come back to haunt you.
The rules for tabular data:
- Every record is housed on one row.
- Each Column contains a type of data e.g. date, order number, quantity, amount, salesperson, region etc.
- There are no blank rows or columns. Note: blank rows aren’t the end of the world but they will make your life more difficult than it needs to be.
- Column labels are in one cell per column and located on the first row. Note: they don’t have to be on the first row, but they need to be in one row only. i.e. not split over multiple rows.
- There are no subtotals interspersed in the data.
Now that you know the rules for perfect tabular data let’s look at some examples that break these rules so you know what not to do.
A semi-report is data that has had some degree of summarising already applied.
Like in the example below we can see that the data has been summarised into regions (UK and USA) as these have their own column labels, as opposed to the ideal tabular data format above which has a column label for ‘Country’ and the actual country/region is recorded on each row.
Signs of a semi-report (some or all of these will be present):
- Column labels are on two rows.
- Column(s) for Totals.
- Columns for time periods (weeks, months, years) or other grouping of data.
- Blank rows or columns.
- Rows containing subtotals and or a Total.
What’s wrong with a Semi-report:
- You can’t use this data in a PivotTable because a PivotTable can’t have blank columns or duplicate column labels, and column labels must be on one row only.In the semi-report above row 3 has ‘units’ and ‘order amount’ repeated. Note: this can be fixed with a Flat Data table – more on that below.
- You can’t easily change the way the data is summarised. For example what if your boss wanted you to change the view so that it was summarised by year like the example below?
You can do this in a PivotTable in seconds when your source data is in a tabular format but it’s not as easy when it’s in a semi-report format…I’m not saying it’s impossible, it’s just more complicated than it needs to be.
Flat Data Table
A flat data table is similar to a semi-report in that it has some degree of summarisation already applied, however it doesn’t have blank columns or column labels on two rows, which means you can use the data in a PivotTable report.
Signs of a Flat Data Table
- Column labels are on one row.
- Columns for time periods (weeks, months, years) or other summary levels like regions, departments etc.
- No blank rows or columns.
- No rows containing subtotals and or a Total.
Whilst the flat data table can be used in a PivotTable it still has limitations because there is already a degree of summarisation in the data.
That is; there is a limit to how much you can manipulate the data in your PivotTable. For example you can’t create the PivotTable report below using the Flat Data table above, but you can with the first Tabular Data example:
Note: you could use some complicated formulas to summarise the flat data table into the above report format, but why make your life difficult when you can do it more easily with tabular data.
Data Entry Format
The data entry format gets its name due to the intuitive layout which makes it easy for the person keying in the data.
However it doesn’t give any consideration to further analysis of the data.
This is perhaps the most frequent mistake I see people make and the one which I get the most questions from because people try to use functions they think should work, like SUMIF(S) and COUNTIF(S) etc.
However when your data is in the wrong format you can't use the built in functions the way they were intended.
Just imagine this data spans more than 100 columns (data for years 2011 to 2013) and your boss has asked you to tell him the total ‘Order Amt’ for UK and total for USA.
Sure you can do it with a complicated array formula, but if you had recorded your data in a Tabular data format you could create the above report and all the totals in a matter of seconds using a PivotTable.
And you could quickly calculate the total sales for UK and USA with a couple of SUMIF formulas referencing the Tabular data like this:
Note: my formulas above use Structured References because my tabular data is formatted as an Excel Table.
Excel Tables have many advantages from quickly writing formulas to automatic set up of dynamic ranges. If you aren't familiar with them I highly recommend you learn how to leverage their benefits.
The worst type! Often 3rd party systems attempt to ‘help’ by providing you with a series of default reports, all nicely formatted of course!
Unfortunately these reports often fall short of what you actually need and manipulating them into a format so that you can further analyse the data can be time consuming.
If you have a system that spits out data in a report format and you end up spending hours manipulating it to get what you actually need, find out if there is a way to get the transactional data i.e. tabular data, out of the system and import it into Excel.
Rest assured it’ll be there (after all it’s creating reports for you from this very data), you just have to find a way to get it out.
Then you can use the tools God gave you, I mean Microsoft gave you and wield your Excel magic to get what you really want.
Multi-Sheet or Multi-Workbook
I can't believe I initially forgot this format.
This is where you have a separate worksheet (tab) or workbook for each month, year, region, department, salesperson etc.
You might also have some of the other data formats present on each of these sheets/workbooks, but the mere fact that you have split your data into separate sheets/workbooks makes things very difficult to summarise or compare.
Since Excel 2007 you have more than 1 million rows to play with. It can handle a fair bit of data. You are only making things difficult if you split it over multiple sheets or workbooks.
Instead add a new column to your source/raw data sheet for the date/region/salesperson, whatever the category is that you have separate sheets for, and put the information from the tab name in the cells of your new column.
Think Tabular format - each row represents a record.
If you split it into multiple workbooks you cause even more problems for yourself and often end up with bloated files full of links to external places.
Tip: If your source data isn't going to change then you don't need a live link to it. Just bring the data into your analysis file as a value and analyse it from there.
Excel was designed to work with data in a tabular format. Tools like PivotTables and many of the functions work best with tabular data.
This is the format I recommend to my dashboard course members when building dashboards, because when your data is in a tabular format you can easily build dynamic reports that are quick to update.
I hope that from this you will have understood the difference between the different data formats and know that the best by far is Tabular data.
That’s not to say you can’t work with data in any of the other formats, it’s just that you’re making extra work for yourself if you do.
So on your head be it!
Converting Data to Tabular Format
If you suffer from any of the data problems above I have a few options for you:
- Use Power Query to unpivot and clean the data.
- Convert data with this Reverse PivotTable trick.
- Reverse engineer the data with Jon Acampora's Formula method.
Download the file and play around with analysing the different formats.
See how easily you can analyse the Tabular data compared to the other formats.
If you liked this or know someone who could use it please click the buttons below to share it with your friends on LinkedIn, Google+, Facebook and Twitter.