In this post we’re going to look at how you can import multiple files containing multiple sheets with Power Query, even if the data isn’t formatted in an Excel Table. In other words, the worst data layout ever!
The data in these sheets also doesn’t include the date that the data relates to, but thankfully the file name does.
As you can see below, I’ve got three of these terrible files in a folder:
Watch the Video
Download the Files
Enter your email address below to download the sample workbook.
Import Multiple Files Containing Multiple Sheets with Power Query
Step 1: Get the data
Data tab > Get Data > From File > From Folder
Step 2: Select the folder
Click ok and at the folder screen click Transform Data:
Step 3: Remove Other Columns
Hold SHIFT to select the columns containing the meta data that you want to retain, plus the Content column > right-click > Remove other columns. In this example I want to keep the file name (partially hidden behind the right-click menu in the image below) because it contains the date information for the data in each file:
Step 4: Add Custom Column
On the Add Column tab > Custom Column. In the formula field enter the formula shown below:
Step 5: Expand the Tables
Click on the double down arrow on the Custom column > click OK:
Step 6: Apply Filters
We can now see a list of the objects in the files (sheets, tables, named ranges etc.). If your files contain tables, sheets or named ranges that you don’t want, use the ‘Kind’ column to filter them out. As you can see in the image below, my files don’t contain any unwanted data:
If you click in the white space beside the ‘Table’ in the Custom.Data column, you’ll get a preview of the data in the preview pane at the bottom of the window:
Step 7: Remove Unwanted Columns
Before expanding the individual tables in the Custom.Data column, you should remove columns that you don’t want in your final dataset. Note: you must retain the Custom.Data column at the very least!
I want to keep the file name as this contains the date information and the Custom.Name column as this contains the product category information from the worksheet tab names.
Step 8: Expand Tables
Click on the double arrow on the Custom.Data column and deselect the ‘Use original column name as prefix’ check box before clicking OK:
Step 9: Promote Headers
You should now see the data from each file and worksheet in one table.
It’s time to tidy up the headers. The first row contains header labels. To promote them, click the drop down in the top left of the table > Use First Row as Headers:
Step 10: Rename Columns
Rename the first two columns: File Name and Category. Double click the column header to edit.
Step 11: Remove Extra Header Rows
The headers from the other sheets are still occupying rows in the data set. Click the Product Column filter button and remove ‘Product’ from the list:
Extracting Dates from File Names
At this point you may be finished. However, I want to extract the date from the file name, so I have a few more steps to go.
Step 12: Extract the Date from the File Name
The Date column contains the file names, so let’s extract the date from the file name and convert it to a proper date.
Select the File Name column > Transform tab > Extract > First Characters:
I want to discard the .xlsx and keep the first 7 characters:
Step 13: Split Month and Year
Now I need to split the month and year into separate columns. Select the File Name column > Home tab > Split Column > By Delimiter:
In the Split Column by Delimiter dialog box choose the following:
You now have the month in File.Name1 and the year in File.Name.2. Power Query should automatically apply a ‘Changed Type’ step to convert these columns to data type; date and you should see it in the Applied Steps pane. If not, select File.Name.1 and File.Name.2 columns > Home tab > Data Type > Whole Number:
Step 14: Create Date Column
Now we can join the columns back together with a custom column (Add Column > Custom Column), as a date using the #date function. The syntax is:
#date( year, month, day)
Notice in the image below that I’ve added 1 for the day value, so all data is as at the first of each month:
Step 15: Remove Columns
Now we can tidy up the columns, deleting File Name.1 and File.Name2 as we don’t need these anymore.
Step 16: Set Data Types
The Sales & Date columns need data types set. Use the drop-down icons to the left of each column header to set the data types:
Step 17: Rename Query
Lastly, rename the query in the Query Settings Properties with something useful as this will be the Table name in Excel/Power Pivot.
Now you’re ready to Close & Load:
So, there you have a relatively easy way to import Multiple Files Containing Multiple Sheets with Power Query. There are a lot of steps, but everything is point and click simplicity.
Similar Power Query Scenarios
|Get data from one file containing multiple sheets
Splitting data over multiple sheets is perhaps one of the worst Excel crimes I see. It’s a crime because it breaks the rule that source data should be in a tabular format. Tabular data is what we need for PivotTables and many functions like SUMIFS, COUNTIFS, INDEX, VLOOKUP etc. However, it’s easily fixed with Power Query.
If you receive your data in separate daily, weekly or monthly files then before you can even begin analyzing the data, you need to consolidate it into one table.