Active Member
June 15, 2021
While I'm knowledgeable with excel I am new to Power Query (been using it for about a week) and I'm struggling with some raw data from a .csv file. I'm struggling to find a work around within power query to automate a task. In excel I would use the INDEX/MATCH function to solve but I can feel that there is a better way in PQ.
My task is as follows:
- Download a .csv file from a third party containing raw data
- Use power query to tidy it up so as to have one row of data per set of information
- Upload the clean data onto my website
My problem:
- The raw .csv has each set of information duplicated over severals rows and I need one row per set of information
- The information is identical until it gets to the heading "Role" (see attachment) and then it forks and each "Role" has its own information (contact person, phone number etc.)
- NB: in the attached image I filtered the data to show one Information Set but the raw data has hundreds of different Information Sets
My goal:
- Have a column/header for each and every Role type (which was easily done with Conditional Column)
- Have the follow on information (phone number of Role etc) fill up/down to make each row identical
- Remove duplicates to be left with one row per data set
Help:
- My knowledge of Power Query is failing me here. Fill Up/Down doesn't seem to be a solution because there is blank cells and fill down spills into the next information set
- I suppose I can Close and Load to excel and then manually Index/Match before removing duplicates, but then its the manual input I wanted to get away from doing.
- Can someone see a betetr way of doing this?
Many thanks in advance,
VIP
Trusted Members
December 7, 2016
July 16, 2010
Hi Stephen,
I think what you want is the wrong approach. The current layout of your data is correct. i.e. it's in a tabular format. Your goal of "Have a column/header for each and every Role type" would make this data very difficult to summarise/analyse later. Perhaps Power Query is the wrong tool for your goal. A PivotTable would be better, but it's difficult to say without seeing a sample Excel file containing your data (desensitised if required) and a mock up of your desired result.
Mynda
Active Member
June 15, 2021
Thanks for taking the time to reply folks.
I should have clarified that the information isn't going to be displayed in excel; excel is only the vehicle to tidy up the data before exporting as .csv onto my website.
I'm restricted by the how the website wants the information input. Long story short, the wesbsite will display each excel row as a unique URL page, so I need to condense the information into one row.
See attachment with dummy data with the raw data on tab 1 and the desired format in tab 2.
Power Query
Power Pivot
Xtreme Pivot Tables
Excel for Decision Making
Excel for Finance
Excel Analysis Toolpak
Power BI
Excel
Word
Outlook
Excel Expert
Excel Customer Service
PowerPoint
November 8, 2013
Hi Stephen,
Looks like a simple Pivot column operation:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Pivoted Column" = Table.Pivot(Source, List.Distinct(Source[Role]), "Role", "Company Name")
in
#"Pivoted Column"
However, you seem to have duplicate roles for the same project, is this possible in the real data? Not sure if it's just a sample data error.
For example, Project 4 has Roofing subcontractor role 3 times.
For this case, the solution is more complicated, duplicated should be merged under that role.
Active Member
June 15, 2021
Thank you Catalin,rn rnIt looks like you have done exactly what I was aiming to do (and yes, the raw data may contain a number of different companies fulfilling the same "roles" so that solution of merging would have been my next question!)
As I am new to PQ, I didn't quite follow each of the steps. I pasted each of your step into a word document so I could have it open while I followed it in PQ, but:
- I don't know where to post the Source code above. I thought maybe in View > Advanced Editor but it gave me and error: Expression.SyntaxError: Invalid identifier.
- I got as far as "Add custom column" and pasted in the code from your first step, albeit it showed up as a 'Function' whereas yours showed up as 'Table'.?
- I then looked at how to go about with the "Expanded custom" step but I see no option for this
How does one carry out the step "Expanded custom" - google was not very helpful. I even tried watching Pivot Column videos on YouTube.
I'm sorry, I really am quite puzzled and you kind of jumped to the final answer without showing your workings haha. Without taking up too much of your time, could you slow it down a bit and provide commentary on each step?
Sincerest thanks in advance and thansk for sticking with this one!
Power Query
Power Pivot
Xtreme Pivot Tables
Excel for Decision Making
Excel for Finance
Excel Analysis Toolpak
Power BI
Excel
Word
Outlook
Excel Expert
Excel Customer Service
PowerPoint
November 8, 2013
Sorry, that merging question should be asked from the very beginning, because that changes the entire solution. It's not something you can patch on the go, it should be part of the design stage.
This time I took a guess based on the sample data, but it will be really helpful to have a clear image from the beginning, otherwise it will waste time.
The query is not easy to understand if you have no PQ experience unfortunately.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", (x)=> Table.SelectRows(Table.Group(Source, {"Project Title"}, {{"Count", each Table.SelectColumns(_,{"Role","Company Name"}), type table}}), each _[Project Title] =x[Project Title])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Count"}, {"Count"}),
#"Removed Duplicates" = Table.Distinct(#"Expanded Custom", {"Project Title"}),
#"Added Custom1" = Table.AddColumn(#"Removed Duplicates", "Custom", each ConvertTable([Count])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Count"}),
#"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns", "Custom", Table.ColumnNames(Table.Combine(#"Removed Columns"[Custom])), Table.ColumnNames(Table.Combine(#"Removed Columns"[Custom])))
in
#"Expanded Custom1"
Added Custom: in this column, I grouped Roles and Companies for the current Project title.
After collecting the roles and companies, we can remove duplicate project rows, we have what we need in the added column;
in #"Added Custom1", we use the custom function ConvertTable, that performs the company merge by role and will also transpose the roles into columns, so it's an important step.
#"Expanded Custom1" will simply expand the table converted by ConvertTable function.
To help you understand how ConvertTable works, I added a sample table and a query that does the same thing as the function mentioned ("SampleRoleMerging" query) so you can see each step.
1 Guest(s)