Hi,
I have 100's of pdf's like the one attached. I would like to be able to import them all and reconfigure into a table like the one I have also attached. I have tried it with PowerQuery but I don't know enough to make it work.
I am running Office 365 on Windows 11.
Your assistance would be greatly appreciated.
Wayne
Hi Wayne,
You should use Get Data> From File> From Folder, then you can filter for a specific file extension-.pdf in your case.
Example attached, change the folder path from Path sheet to your pdf folder path.
sample queries:
let
FolderPath = Excel.CurrentWorkbook(){[Name="FolderPath"]}[Content]{0}[Folder Path],
Source = Folder.Files(FolderPath),
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".pdf")),
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each GetPDFData([Content])),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
ColNames = Table.ColumnNames(Table.Combine(#"Removed Other Columns"[Custom])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", ColNames, ColNames)
in
#"Expanded Custom"
Code for the function that reads data from a single pdf file GetPDFData:
(bin as binary)=>
let
#"Imported PDF" = Pdf.Tables(bin),
#"Filtered Rows1" = Table.SelectRows(#"Imported PDF", each ([Kind] = "Table")),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows1",{"Data"}),
ColNames = Table.ColumnNames(Table.Combine(#"Removed Other Columns"[Data])),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", ColNames, ColNames)
in
#"Expanded Data"
Thank you, Catalin. I will let you know if it works as soon as I get a chance to try it.