Forum

Combine files with ...
 
Notifications
Clear all

Combine files with different header names

12 Posts
3 Users
0 Reactions
261 Views
(@lanser)
Posts: 61
Estimable Member
Topic starter
 

Hi all,

I am sure I have done this before but can't remember how

I have a folder with csv files that I need to combine, they contain sales forcasts for the next 14 days so everyday a new csv file is generated with columns for the next 14 days..

We keep the old data to refer back to and also the number of products can vary month to month and the file date is above the headers. I know how to extract that its dealing with the different header names that I am stuck on.

Ive attached some example files so you can see what I need.

 

John

 
Posted : 22/12/2020 10:09 am
(@catalinb)
Posts: 1937
Member Admin
(@duncanwil)
Posts: 11
Active Member
 

Firstly, John, remove that first row in every testx.csv files because it is redundant as it seems the entry in row 2, column B relates to the date give in row 1rnThat has the benefit of allowing Query to use the new first row as the header row for each file.

I then agree with Mynda that you need to create a separate Query for each .csv file ... no big deal in this example.

Then once you have Appended your files, you can unpivot the data, Close & Load it and create a Pivot Table very easily now and analyse your data. 

 
Posted : 24/12/2020 10:03 am
(@catalinb)
Posts: 1937
Member Admin
 

1.jpg

Using the File attached, any csv files can be combined, no matter how different their headers are.

Of course, like Duncan said, first row is irrelevant and must be removed. The image attached shows how the result looks like.

However, this demonstrates that you can combine from folder, without having to create a query for each source file.

 
Posted : 24/12/2020 10:32 am
(@duncanwil)
Posts: 11
Active Member
 

Thanks Catalin

I did take a look at your Get CSV function earlier but when I tried to get it to import all three csv files at once, it asked me to give it a file path rather than a folder path. That led to three files, three queries ... did I use the function wrongly?

 
Posted : 24/12/2020 4:16 pm
(@catalinb)
Posts: 1937
Member Admin
 

@Duncan:

If you tried to open the function, you actually invoke the function and that indeed asks for a file path.

The file is supposed to be used as is, the only change you need to do is to change the FOLDER path from settings sheet. Then you refresh the query, data will be extracted from ALL csv files from that folder with a single query.

If you want to look at the GetCSV function, avoid opening it directly, this action invokes the function and creates a new query. Instead, open another query, from the queries list on the left select the function and open the Advanced Editor, it's the only way to see the code without invoking the function.

12.jpg

 
Posted : 24/12/2020 11:55 pm
(@duncanwil)
Posts: 11
Active Member
 

Thanks again, Catalin,

I read the file more carefully this time and pasted my own folder path into the first sheet and it worked as you designed it.

I know the Advanced Editor, thanks, even though I will not say I am an M expert at all. I have installed a couple of other functions before. 

I tried an example of my own in your file but I think I have found that your function does not pick up the column headers from those csv files ... I will look at your code after Christmas Day and see what I might suggest ... either because of my file or because of the code.

Merry Christmas if you celebrate it!

Duncan

 
Posted : 25/12/2020 9:36 am
(@catalinb)
Posts: 1937
Member Admin
 

Hi Duncan,
The function is adjusted for this topic, John's test files have the headers AFTER the first row. You have to remove the line in GetCSV function that removes the first row.
The code for this scenario is:

(FilePath)=>
let

#"Imported CSV" = Csv.Document(File.Contents(FilePath),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
Skip = Table.Skip(#"Imported CSV",1),
PromoteHeaders = Table.PromoteHeaders(Skip, [PromoteAllScalars=true])

in
PromoteHeaders

 

For a normal csv, the code should not have the Skip first row step:

(FilePath)=>
let

#"Imported CSV" = Csv.Document(File.Contents(FilePath),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
PromoteHeaders = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true])

in
PromoteHeaders

 
Posted : 25/12/2020 10:01 am
(@duncanwil)
Posts: 11
Active Member
 

There you are! Solved.

Thanks, Catalin, I will be using that function from now on!

 

Duncan

 
Posted : 26/12/2020 7:12 pm
(@lanser)
Posts: 61
Estimable Member
Topic starter
 

Hi Guys,

Thanks for all the replies so far but one key thing is missing the date in the first row isn't irrelevant.

If you look at the result csv I need to filter by the product and that date, it basically tracks forecasts for example

on the 1st it was forecast that on the 3rd abc would require 1 unit

but on the 2nd the forecast for the 3rd had changed to 2 units

date product 01/01/21 02/01/21 03/01/21 04/01/21 05/01/21 06/01/21 07/01/21
01/01/21 abc 1 1 1 1 1    
02/01/21 abc   2 2 2 2 2  

so I need to keep that date in the first row linked to each product in that csv and of course durin a year that will end up at 365 csv's which is why I was trying to avoid a query for each csv file.

at the moment I am looking at 2 querys one to pull the first line date linked to a id based on csv name and a second to do as in the GetCsv function then merge based on the csv name

regards

John

 
Posted : 30/12/2020 10:40 am
(@catalinb)
Posts: 1937
Member Admin
 

If you need to keep the first value, you can add it as a new column. This is how the function should look like:

(I collected the first cell value in a parameter before removing the first row, then added the value in a new column)

(FilePath)=>
let
Source = Csv.Document(File.Contents(FilePath),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
FirstCellValue = Table.ToColumns(Source){0}{0},
Skip = Table.Skip(Source,1),
PromoteHeaders = Table.PromoteHeaders(Skip, [PromoteAllScalars=true]),
AddColumn = Table.AddColumn(PromoteHeaders, "Date", each FirstCellValue)
in
AddColumn

A normal query , will look like:

let
Source = Csv.Document(File.Contents("C:CsvFile.csv"),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
FirstCellValue = Table.ToColumns(Source){0}{0},
Skip = Table.Skip(Source,1),
PromoteHeaders = Table.PromoteHeaders(Skip, [PromoteAllScalars=true]),
AddColumn = Table.AddColumn(PromoteHeaders, "Date", each FirstCellValue)
in
AddColumn

The only difference between the function and this query is that the first line disappears. That's what converts any normal query to a function.
Because the file path will no longer be provided, hard type the sample file path:
File.Contents("C:CsvFile.csv")

This will allow you to view the steps, add more processing if needed.

After you are done with converting one of the csv files, convert the query back to a function.

This is the best way to build a function. Start with a simple query from one of the files, then convert the query into a function by adding the first row with one or more parameters as needed, building such a custom function is very easy.

 
Posted : 31/12/2020 4:46 am
(@duncanwil)
Posts: 11
Active Member
 

I understand the points you make John but I still wonder about that first line. As far as I can tell, the date in cell B2 is exactly the same date that is in cell B3 of all of your files. That is why I said that first line is redundant and it would make your life a little easier if you were to work with cell B3 and remove that first line. Otherwise, Catalin has your problem in hand, I think!

 
Posted : 31/12/2020 6:22 pm
Share: