Hi,
I want to create a query that combines all files and all sheets within those files. However I have come across an issue with the file format that the folder contains.
All of the files have .xls extensions. When I create the query I can see all of the files in the editor. I am then adding a custom column using Excel.Workbook([Content]). I then expand the column but I am not seeing the columns I expect. I don't see the "Item", "Kind", or "Hidden" ones. I want to filter on the "Kind" one to say include "Sheet".
If I save the files as .xlsx files then I do see the additional columns. I can't convert all of the files to this format as they are a scheduled download from a third party and there are many of them.
Any ideas how to get around this? Is there another way to combine the files and pick up all sheets?
I have attached three example .xls files. These are the ones that I want to combine. There are two worksheets in each file but there could be more or less in the actual files I need to combine.
The Summary.xlsx file is the desired output I need.
Thanks
Bax
Hi Bax,
You can try adding a query for a single workbook, process it as you need, then transform this query into a function by simply adding this line before the first let statement:
(FilePath)=>
Then, in the Source step, replace the path with the parameter above:
Source = Excel.Workbook(File.Contents(FilePath), null, true),
Now, close the query and you have to change the query name to something more relevant: ProcessFile
Instead of adding a column with Excel.Workbook function in the folder query, add a column with the new function: ProcessFile(FolderPath&Name)
This will process all the files from that folder in the exact way you processed the first file transformed into a function.
You can also expand the Content column, in the newest versions of Power Query this will also combine the files automatically, then you have to apply the steps needed to cleanup data.
Hi Catalin,
Thanks for the response. What I can't work out is how to include all of the sheets from all of the files.
Bax
Not sure what you mean.
Even if you add a new column and use Excel.Workbook([Content]), 2 columns will show up after you expand the new column: Name and Data. The Name column will contain all sheet names from the folder files, all you have to do is to expand the Data column and cleanup data, remove duplicate headers, and so on, all sheets data will be there.
Is each sheet from a workbook the same? If not, you will have to process each sheet type separately. If you can upload a sample file, will help us understand your situation.
Hi Catalin,
There are three example files attached to my original post. There is also a file called Summary which indicates what I am hoping to get when combining the files using PQ.
Let me know if you can't see the files and I will try and upload them again.
Thanks
Bax
Hi Bax,
Atached is a query made from a folder with your 3 sample files.
In CombinedFiles query, in "Expanded Table Column1" step, there are 3 columns: Source.Name (this is the file name), Name (Sheet Name) and Data (this column contains data from all those sheets)
All I did is to expand the Content binary column, this automatically extracts and combines the files, as mentioned before there is no need to use the Excel.Workbook([Content]) function.
I also made a query using Excel.Workbook([Content]) function, in CombinedFiles (2) query.
Same results, when you expand the column, you will get the sheet names and data.
Still not seeing what the problem is, you can filter out some sheets only if you don't need them, otherwise the queries will extract data from all sheets by default.
Make sure you change the path to the folder in the Source step, if you want to test the queries on your side.
Hi Catalin,
Thanks for your input. I have worked out where I was going wrong. When I was combining the files it asks you to select the object to be extracted from each file. I was clicking on a specific worksheet name rather than the top level option "Sample file parameter1" (see image attached). This meant that the subsequent queries were using the specific worksheet name I had selected. When it couldn't find it it caused an error.
Thanks again.
Bax