May 2, 2014
cvxcvx
I have a large flat file, which I got from Kaggle which I want to normalise into a number of tables: gender, neighbourhood, NoShow and the actual appointments table to load into a PowerBI model. What I have in mind is to start with the flat file and then create a number of queries to create the gender, neighbourhood, NoShow tables using the Reference option e.g. start with the flat file and then use the Reference option to create another query in which I would extract just the neighbourhood details and remove duplicates - thereby giving me a table for neighbourhood. I also want to create a calendar table as well..Am I on track with this? Am attaching just a data sample.
July 16, 2010
Hi Anne,
I can't see any benefit to creating a neighbourhood table since there is only one column containing this data.
If there were multiple columns e.g. neighbourhood, suburb, postcode that you could move to a dim table and then remove from the fact table all but the column you need for the relationship, then it would be more efficient to create a dim table for this data. However, as it is, there'd be no benefit. In fact it would just add data to your model.
Mynda
Answers Post
May 2, 2014
So if I wanted to use this flat file as the basis for a PowerBI demo - should I normalise it with reference queries or just go with it as is? I want to use it to demo different outcomes of NoShow by using slicers by gender, neighbourhood and possibly day of the week. (Still flailing around a bit with it - so just want to see what is the most effective way to set up the data..
1 Guest(s)