September 9, 2020
Hi Forum,
I have the following directory with three PDFs (for the moment).
They all have the same structure. The only thing that changes is the year in question (A1) and the reporting data.
The actual steps:
- NOK : I want to combine all the PDFs in this folder
- OK : delete the rows before and after (not relevant)
- OK : delete the Total columns (and Total rows)
- OK : change the first row to a title
- OK : make a pivot
- OK : reflect the year on my column headings. From the year mentioned in cell A1, 'Year: 2020', take 2020 and add it to the column with my months: jan → jan-2020.
The current code looks like this but does not work for combining all the files in a single folder.
let Source = Pdf.Tables(File.Contents("C:\Users\00060106\Downloads\Cadastre\Data\Decompte_2020.pdf"), [Implementation="1.3"]), Page1 = Source{[Id="Page001"]}[Data], #"En-têtes promus" = Table.PromoteHeaders(Page1, [PromoteAllScalars=true]), #"Type modifié" = Table.TransformColumnTypes(#"En-têtes promus",{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Nombre de primes", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}, {"Column9", type text}, {"Column10", type text}, {"Column11", type text}, {"Column12", type text}, {"Column13", type text}, {"Column14", type text}, {"Column15", type text}, {"Column16", type text}, {"Column17", type text}, {"Column18", type text}, {"Column19", type text}, {"Column20", type time}}), #"Premières lignes supprimées" = Table.Skip(#"Type modifié",1), #"Dernières lignes supprimées" = Table.RemoveLastN(#"Premières lignes supprimées",5), #"Lignes filtrées" = Table.SelectRows(#"Dernières lignes supprimées", each ([Column2] <> "Total")), #"Colonnes supprimées" = Table.RemoveColumns(#"Lignes filtrées",{"Column6", "Column10", "Column14", "Column18", "Column19", "Column20"}), #"En-têtes promus1" = Table.PromoteHeaders(#"Colonnes supprimées", [PromoteAllScalars=true]), #"Type modifié1" = Table.TransformColumnTypes(#"En-têtes promus1",{{"Column1", type text}, {"Année : 2020", type text}, {"Jan", Int64.Type}, {"Fév", Int64.Type}, {"Mar", Int64.Type}, {"Avr", Int64.Type}, {"Mai", Int64.Type}, {"Jun", Int64.Type}, {"Jul", Int64.Type}, {"Aoû", Int64.Type}, {"Sep", Int64.Type}, {"Oct", Int64.Type}, {"Nov", Int64.Type}, {"Déc", Int64.Type}}), #"Personnalisée ajoutée" = Table.RenameColumns(#"Type modifié1", List.Transform(List.LastN(Table.ColumnNames(#"Type modifié1"), 12), each {_, _ & "-" & Text.End(Table.ColumnNames(#"Type modifié1"){1}, 2)})), #"Rempli vers le bas" = Table.FillDown(#"Personnalisée ajoutée",{"Column1"}), #"Tableau croisé dynamique des colonnes supprimé" = Table.UnpivotOtherColumns(#"Rempli vers le bas", {"Column1", "Année : 2020"}, "Attribut", "Valeur") in #"Tableau croisé dynamique des colonnes supprimé"
I thank you for all the tips that would lead me to the solution.
BR,
Lionel
Power Query
Power Pivot
Xtreme Pivot Tables
Excel for Decision Making
Excel for Finance
Excel Analysis Toolpak
Power BI
Excel
Word
Outlook
Excel Expert
Excel Customer Service
PowerPoint
November 8, 2013
Hi Lionel,
Transform your file query into a function like this:
let //Source = Pdf.Tables(File.Contents("C:\Users\00060106\Downloads\Cadastre\Data\Decompte_2020.pdf"), [Implementation="1.3"]), Page1 = tbl, // (commented out the original text) Source{[Id="Page001"]}[Data], #"En-têtes promus" = Table.PromoteHeaders(Page1, [PromoteAllScalars=true]), #"Type modifié" = Table.TransformColumnTypes(#"En-têtes promus",{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Nombre de primes", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}, {"Column9", type text}, {"Column10", type text}, {"Column11", type text}, {"Column12", type text}, {"Column13", type text}, {"Column14", type text}, {"Column15", type text}, {"Column16", type text}, {"Column17", type text}, {"Column18", type text}, {"Column19", type text}, {"Column20", type time}}), #"Premières lignes supprimées" = Table.Skip(#"Type modifié",1), #"Dernières lignes supprimées" = Table.RemoveLastN(#"Premières lignes supprimées",5), #"Lignes filtrées" = Table.SelectRows(#"Dernières lignes supprimées", each ([Column2] <> "Total")), #"Colonnes supprimées" = Table.RemoveColumns(#"Lignes filtrées",{"Column6", "Column10", "Column14", "Column18", "Column19", "Column20"}), #"En-têtes promus1" = Table.PromoteHeaders(#"Colonnes supprimées", [PromoteAllScalars=true]), #"Type modifié1" = Table.TransformColumnTypes(#"En-têtes promus1",{{"Column1", type text}, {"Année : 2020", type text}, {"Jan", Int64.Type}, {"Fév", Int64.Type}, {"Mar", Int64.Type}, {"Avr", Int64.Type}, {"Mai", Int64.Type}, {"Jun", Int64.Type}, {"Jul", Int64.Type}, {"Aoû", Int64.Type}, {"Sep", Int64.Type}, {"Oct", Int64.Type}, {"Nov", Int64.Type}, {"Déc", Int64.Type}}), #"Personnalisée ajoutée" = Table.RenameColumns(#"Type modifié1", List.Transform(List.LastN(Table.ColumnNames(#"Type modifié1"), 12), each {_, _ & "-" & Text.End(Table.ColumnNames(#"Type modifié1"){1}, 2)})), #"Rempli vers le bas" = Table.FillDown(#"Personnalisée ajoutée",{"Column1"}), #"Tableau croisé dynamique des colonnes supprimé" = Table.UnpivotOtherColumns(#"Rempli vers le bas", {"Column1", "Année : 2020"}, "Attribut", "Valeur") in #"Tableau croisé dynamique des colonnes supprimé"
Now, change the file query name to GetData
Start a new query FROM FOLDER, browse to this path where you have all the pdf files: C:\Users\00060106\Downloads\Cadastre\Data
Add a new column with the formula:
=Pdf.Tables(File.Contents([Path] & [Name]), [Implementation="1.3"]),
Expand this new column, you should see the pdf tables. Filter this table until you leave only the tables you are interested into, then add a new column with the new function, passing the Data column tables:
=GetData([Data])
If the files are the same, then the same processing steps set in the GetData function should work on each file (unfortunately, PDF files are not consistent, so you might get errors, even if the structure is apparently the same).
September 9, 2020
Hi Catalin Bombea,
Thanks for the reply which led me to a solution that seems to work.
I have created a function to retrieve all the PDFs in my folder (with some pre-processing):
let FonctionFolder=(FolderPath,SheetName)=>
let
Source = Pdf.Tables(File.Contents(FolderPath), [Implementation="1.3"]),
Page1 = Source{[Id=SheetName]}[Data],
#"En-têtes promus" = Table.PromoteHeaders(Page1, [PromoteAllScalars=true]),
#"Type modifié" = Table.TransformColumnTypes(#"En-têtes promus",{{"Column1", type text}, {"[image]", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}, {"Column9", type text}, {"Column10", type text}, {"Nombre de primes de naissance par CAF, rang et mois", type text}, {"Column12", type text}, {"Column13", type text}, {"Column14", type text}, {"Column15", type text}, {"Column16", type text}, {"Column17", type text}, {"Column18", type text}, {"Column19", type text}, {"Column20", type time}}),
#"Premières lignes supprimées" = Table.Skip(#"Type modifié",1),
#"Dernières lignes supprimées" = Table.RemoveLastN(#"Premières lignes supprimées",5),
#"Colonnes supprimées" = Table.RemoveColumns(#"Dernières lignes supprimées",{"Column6", "Column10", "Column14", "Column18", "Column19", "Column20"}),
#"Personnalisée ajoutée" = Table.AddColumn(#"Colonnes supprimées", "Personnalisé", each if Text.Start([#"[image]"],5) = "Année" then Text.End([#"[image]"],4) else null),
#"Rempli vers le bas" = Table.FillDown(#"Personnalisée ajoutée",{"Personnalisé"})
in
#"Rempli vers le bas"
in FonctionFolder
Then I do the final processing with a pivot at the end:
let
Source = Folder.Files("C:\Users\00060106\Downloads\Cadastre\Data"),
#"Autres colonnes supprimées" = Table.SelectColumns(Source,{"Name", "Folder Path"}),
#"Personnalisée ajoutée" = Table.AddColumn(#"Autres colonnes supprimées", "Personnalisé", each FonctionFolder([Folder Path]&[Name],"Page001")),
#"Personnalisé développé" = Table.ExpandTableColumn(#"Personnalisée ajoutée", "Personnalisé", {"Column1", "[image]", "Column3", "Column4", "Column5", "Column7", "Column8", "Column9", "Nombre de primes de naissance par CAF, rang et mois", "Column12", "Column13", "Column15", "Column16", "Column17", "Personnalisé"}, {"Column1", "[image]", "Column3", "Column4", "Column5", "Column7", "Column8", "Column9", "Nombre de primes de naissance par CAF, rang et mois", "Column12", "Column13", "Column15", "Column16", "Column17", "Personnalisé.1"}),
#"Colonnes supprimées" = Table.RemoveColumns(#"Personnalisé développé",{"Name", "Folder Path"}),
#"Colonnes renommées" = Table.RenameColumns(#"Colonnes supprimées",{{"[image]", "RANK"}, {"Column3", "01"}, {"Column4", "02"}, {"Column5", "03"}, {"Column7", "04"}, {"Column8", "05"}, {"Column9", "06"}, {"Nombre de primes de naissance par CAF, rang et mois", "07"}, {"Column12", "08"}, {"Column13", "09"}, {"Column15", "10"}, {"Column16", "11"}, {"Column17", "12"}, {"Personnalisé.1", "YEAR"}}),
#"Lignes filtrées" = Table.SelectRows(#"Colonnes renommées", each ([RANK] <> "Total") and ([01] <> "Jan")),
#"Colonnes permutées" = Table.ReorderColumns(#"Lignes filtrées",{"Column1", "RANK", "YEAR", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"}),
#"Rempli vers le bas" = Table.FillDown(#"Colonnes permutées",{"Column1"}),
#"Tableau croisé dynamique des colonnes supprimé" = Table.UnpivotOtherColumns(#"Rempli vers le bas", {"Column1", "RANK", "YEAR"}, "Attribut", "Valeur"),
#"Colonnes renommées1" = Table.RenameColumns(#"Tableau croisé dynamique des colonnes supprimé",{{"Attribut", "MONTH"}, {"Valeur", "DATA"}}),
#"Type modifié" = Table.TransformColumnTypes(#"Colonnes renommées1",{{"Column1", type text}, {"RANK", type text}, {"YEAR", type text}, {"DATA", Int64.Type}}),
#"Personnalisée ajoutée1" = Table.AddColumn(#"Type modifié", "Personnalisé", each [YEAR] & [MONTH]),
#"Type modifié1" = Table.TransformColumnTypes(#"Personnalisée ajoutée1",{{"Personnalisé", type text}}),
#"Colonnes renommées2" = Table.RenameColumns(#"Type modifié1",{{"Personnalisé", "MONTHYEAR"}}),
#"Colonnes permutées1" = Table.ReorderColumns(#"Colonnes renommées2",{"Column1", "RANK", "MONTHYEAR", "YEAR", "MONTH", "DATA"})
in
#"Colonnes permutées1"
BR,
Lionel
1 Guest(s)