I guess this is a general question about how to handle data where you are interested in different levels of aggregation (in this case, different levels of geography) ...
To reinforce my understanding of the PowerBI coursework that I am going through, I'm trying to emulate an official output - https://commonslibrary.parliament.uk/social-policy/housing/home-ownership/constituency-data-house-prices/ - that is powered by PowerBI. The aim is to showcase house prices by parliamentary constituency, alongside corresponding regional and national figures.
I can mostly replicate the official version, by using a modified version of the dataset attached and a dates table to achieve desired ordering, but just wondering if the most efficient approach is to have one mega data file containing constituency, regional and national figures, or whether I should be splitting this into separate tables for constituency, region and country. When I tried the latter approach, I struggled to establish relationships between the different tables and certainly did not get anything that looked like a star (or snowflake).
Mega or separate ...? Any thoughts?
Bob
Hi Bob,
Your file didn't get attached. Can you try again please?
Mynda
Hi Mynda
I think original file exceeded max file size. Truncated version (first 1000 rows of data only) hopefully attached.
Bob
Thanks, Bob.
I would split this data into two tables:
1. A Dimension table that contains the first 6 columns from ConstID to CountryName. You can use Power Query to extract the columns and then remove duplicates.
2. A Fact table that contains the remaining columns plus the ConstID.
You would create a relationship between the ConstID columns.
This will result in a better compressed file because you'll be removing duplicates from the ConstituencyName column through to the CountryName column from your fact table.
It's possible you could also replace the columns that calculate the percentage change, median and ratios with DAX measures, but I haven't looked at them closely to be sure all of the required data is there.
Mynda