Active Member
August 5, 2022
Hi All,
I am a bit stuck with combining (..a lot..) of pdfs containing multiple pages into one query. The data on the pdfs is in the pages (Page001, Page001, etc) and the tables (Table001, Table002, etc.) do not contain data that I am after.
There is no possibility to select multiple pages in the "import from folder" data query as far as I am aware.
A single pdf file with multiple pages opens as follows:
let
Source = Pdf.Tables(File.Contents("C:\Temp\File_001.pdf"), [StartPage=1, EndPage=5])
in
Source
How can I loop the above for a number of files, the names of which I specify in a dynamic table on an Excel Worksheet?
I could e.g. generate a function like below and invoke it, but this still leaves me with manually generating one query per file.
(FileID) =>
let
Source = Pdf.Tables(File.Contents("C:\Temp\"& FileID &".pdf"), [StartPage=1, EndPage=5])
in
Source
Help greatly appreciated.
Power Query
Power Pivot
Xtreme Pivot Tables
Excel for Decision Making
Excel for Finance
Excel Analysis Toolpak
Power BI
Excel
Word
Outlook
Excel Expert
Excel Customer Service
PowerPoint
November 8, 2013
Hi Sjaak,
When you import from folder, you should have a table with all the files in that folder. In this table, add a new column calling the function you mentioned that processes one file, the formula for the new column should look like:
=ProcessFile(FileID)
This way, you will have a column with extracted data, remove other columns and expand this new column.
Active Member
August 5, 2022
Hi Catalin,
Many thanks for your helpful reply.
It worked like a charm!
I've named my function "ProcessFile" and called it from the query by adding a column to the data file table you mentioned. Next, I expanded the data in the newly created column.
My code now reads:
let
Source = Folder.Files("C:\Temp"),
#"Added Custom" = Table.AddColumn(Source, "Test", each ProcessFile(FileID)),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Data"}, {"Data"}),
in
#"Expanded Custom"
1 Guest(s)